Table Partition on Postgres for Easier Archiving Data

Sylvia Sari
4 min readMar 20, 2024

Problem Statement

At first, when Software Engineers develop a feature, maybe they will be missing out one thing. The thing is the growth of data from the feature that they created. In system design phase, Engineers usually will do estimation calculation of the growth of data. Let say, in 1 year, the ETA of data will be x GB or maybe x TB. We do know about this but we think we can handle this later. Until one day…

Our database are consuming a lot of disk space. We need to add disk before it becomes full. And of course, cost that we spent for adding disk is also increasing.

Not only cost. Sometimes, it also makes an impact to the performance. It can make slower performance and hard if we want to do query for retrieving data with that a lot of data.

What We Want to Achieve?

The goal that we want to achieve is easier to archive data, so we can save disk space and do cost optimization. We would like to do archiving regularly in an effective way.

Note: in this case, we executed on postgres based on my experience.

First Step that We Did

For the first time, we tried to delete old and unused data by using script. We run the script at a certain time (at low traffic). It worked well until we realized that:

The total inserted row was higher than total deleted row in a day.

Even we’re only able to do deletion at certain time. After deletion process, we still need to do vacuum. Vacuum process itself also took quite long (because the data is large). The larger data, the longer vacuum process. By doing this, we considered it’s not an effective way. Therefore, we need to think the alternative way.

First Thing First — Table Partition

What is Table Partition?

One of the alternative way that we can do is doing table partition. Table partition is table that divided into smaller table. It has parent and child tables.

This below sample is non partition or usual table.

Example of Non Partition Table

This below sample is partition table. Let say, we do partition by using range partitioning based on created time monthly.

E.g: On A_partition_1 consist of January 2024 data. On A_partition_2 consist of February 2024 data, and so on.

Example of Partition Table

Why We Use This?

This is an effective way to archive data without row deletion one by one and doing vacuum manually. We only need to detach child table and archive and/or drop the child table.

Step by Step

  1. Pre-requisite: Install pg_partman Extension. By installing this extension, it can make partition process easier because it provides automation process for creating child table, detach child table, etc.
  2. Analyze Query and Determine the Correct Partition Key. This is the most important part. If we do not choose the right partition key, it will cause slower performance than before. So, we need to analyze and define the correct partition key. Please see the details on postgres documentation here.
  3. Revamp Query. All query must consist of partition key. For example, if we use created time as partition key, we need to add filter created time on our query. If it’s not, it will scan all of child table which can cause slower performance.
  4. Revamp Index. All index must consist of partition key. Once we add filter with partition key, we also need to add index that consist partition key also.
  5. Execution. It’s the step to migrate from non partition table to partition table.

Archive Data

After we do table partition, we can easily detach child table. Then we are able to archive and drop the child table without doing vacuum. We can also set automation to detach it automatically. It makes easier and effective for archiving data process.

Result

  1. Disk Space. Once old data has been archived and dropped, it will decrease disk space. We have more disk space on our database. It depends on the use case. Let say, the average of total data monthly is 100GB, then we can save 100GB monthly.
  2. Cost Optimization. If we are able to manage this, we can migrate to new server with lower disk space specification to save the cost.

Reference

https://www.postgresql.org/docs/current/ddl-partitioning.html

--

--