How to Backup and Restore PostgreSQL 9.3 Databases

In my previous blog post, I explained how to install PostgreSQL 9.3 on Oracle Linux. As a DBA, one of the first things we want to learn is how we can recover the database in case of a failure. So in this blog post, I’ll show how to backup and restore PostgreSQL databases in a server. We’ll backup PGDATA directory, so it’ll contain all databases in the instance (remember that PostgreSQL is a multitenant database).

PostgreSQL supports Write Ahead Log (WAL) mechanism like Oracle. So everything will be written to (redo)logs before they written into actual datafiles. So we will use a similar method to Oracle. We need to start “the backup mode”, copy the (data) files, and stop the backup mode, and add the archived logs to our backup. There are SQL commands for starting backup mode (pg_start_backup) and for stopping backup mode (pg_stop_backup), and we can copy the files using OS commands. Good thing is, since 9.1, PostgreSQL comes with a backup tool named “pg_basebackup”. It’ll do everything for us.

To be able to use pg_basebackup, we need to turn on archiving and also turn on wal_sender process. Archiving controlled by “archive_mode”, “archive_command” and “wal_level” parameters. An interesting thing about PostgreSQL is, you need to write OS commands to copy redologs (WAL) files to an archive location. While writing a copy command, you can use %p and %f variables. %p variable holds the full path of wal file, and %f holds only the file name. So something like “cp %p /archives/%f”, will copy the wal file to /archives directory.

WAL Sender process is used to feed logs to a replicate database, and pg_basebackup also uses it to receive archive logs. It’s controlled by max_wal_senders parameter. We’ll set it to 1, so our pg_basebackup will be able to connect PostgreSQL server to fetch the required WAL files (the ones created during backup period). Ok Let’s create directories for backups and archives (login as root):

Now I’ll set the required parameters for archiving and enabling WAL sender process. I switch to postgres user and modify postgresql.conf:

I hope you haven’t surprised by $PGDATA. The PGDATA variable should point to the data directory of PostgreSQL. In my previous post, I recommend to declare these variables in bash_profile script. Wal_level controls the logging level, it can be set to hot_standby or archive to enable log-archiving. I recommend you to set wal_level to hot_standby instead of “archive”. If you also set “hot_standby” to “on”, you can run the queries in recovery mode. I’ll explain it in point in time recovery. We also need to give permission to pg_basebackup to connect our server, so we issue the following line to pg_hba.conf, and then restart the server:

Now here’s the magical command to backup the PostgreSQL database:

It will create a directory such as 20141117 (depending the date it’s executed), compress all database files into a TAR file (–format=t) and put it into that folder. The name of backup file will be base.tar. This TAR file will contain the required WAL files (–xlog) to be able to recover the database. It even contains postgresql.conf and pg_hba.conf.

So we have a backup, how to recover it?

All we need is to extract tar file into the $PGDATA folder. Make sure that PostgreSQL services are down before you copy the files. If we want to restore the databases on a new server, we need to install PostgreSQL software first.

So we restored our database files, and have required wal (redolog) files to recover the database, but if we open the database now, we’ll lose the data since the backup is done. We may want to apply the logs which are created after the backup. To be able to do it, create a “recovery.conf” file which has the below line:

When we issue “pg_ctl start”, PostgreSQL will see the “recovery.conf” file and start recovering the database. When the recovery is done, the “recovery.conf” file will be renamed to “recovery.done”.

Point in-time Recovery

PostgreSQL supports point in time recovery (PIT). You need to include one of the following parameters into the recovery.conf file for PIT:

  • recovery_target_name (string): You specify a named restore point, created with pg_create_restore_point(). For example:

    Then you can use this name while recovering by adding the below line to your recovery.conf file:

  • recovery_target_time (timestamp): Instead of giving a name, (in most cases you won’t be prepared so well), you enter a timestamp.
  • recovery_target_xid (string): This parameter specifies the transaction ID up to which recovery will proceed.

You may remember that I said it’s better to set wal_level to hot_standby instead of archive. In point in-time recovery, you can make PostgreSQL pause the recovery on the recovery_target point, and wait for your approval to end the recovery process. So you can run queries on the database to check if it’s the correct time before you open the database for read write operations. To be able to do it, you also need to set hot_standby=on parameter. It sounds weird to enable hot_standby on a stand-alone server but this mode actually tells PostgreSQL to accept queries while recovery in process.

If the wal_level is hot_standby, and hot_standby is enabled on postgres.conf, you can write a recovery.conf file similar to below:

This recovery.conf file will cause to stop the recovery process before committing of transactions later than 2014-12-01 10:19:42. You will be able to connect the database, run (read-only) queries and after you verify that you reached the corrent point in-time, you run the following command to end the recovery and open the database:

It will resume (in fact ends) the recovery (no further logs will be applied). If you want to apply more logs, change recovery.conf file and set another recovery_target_time, and then restart the server.

What will happen to archived WAL files?

You may write a script to compress the archived log files and keep them with your base backups. After backing up, you better delete the old archived files. Here’s a sample script to backup and clean the archive logs older than 2 days, do not forget the change the folder name according to your real archive location:

I’ll explain how to take logical backups (dumps) in future blog posts.

Please share
  • 3
  •  
  •  
  • 7
  •  
  •  
  •  

Gokhan Atil is a database administrator who has hands-on experience with both RDBMS and noSQL databases, and strong background on software development. He is certified as Oracle Certified Professional (OCP) and is awarded as Oracle ACE (in 2011) and Oracle ACE Director (in 2016) for his continuous contributions to the Oracle users community.

20 Comments

  1. K K Pal

    Hi Gokhan,

    I have stuck in an issue while taking dump, actually I do not have enough disk space free (“for /” partition) on my live server, so I have started moving files in “/var/lib/pgsql/9.2/data/pg_xlog/” in some other folder as I do not have any option other than this.

    As you mentioned in your article “This TAR file will contain the required WAL files (–xlog) to be able to recover the database.” …what will happen in my case because some WAL files will not be there when the process will end?

    So kindly let me know if there is any manual way to create that required TAR file.

    Thanks in advance for your co-operation.

    Regards//K K Pal

     

     

  2. bilal

    Hello

    I want to take full backup but if I need to restore single object for example table or schema. how can I do this. please help about that.

    best regards

  3. Joakim Lundgren

    Hello!

    I’m about to set up full backup & recovery for Postgres-XL – a three node cluster with data on all three nodes. It would be nice if you can elaborate the concept to do this based on the single node approach that you described. Thanks!

  4. sruthi

    >pg_restore: [archiver] WARNING: archive is compressed, but this
    installation does not support compression — no data will be available
    >pg_restore: [archiver] cannot restore from compressed archive (compressionnot supported in this installation)
    I have compiled postgres with zlib also still above errors are coming while doing restoring the linux backup to windows using postgres 9.4 in an appilication, please do needful.

  5. Cristian Quagliozzi

    This is very good example, but it would be better with a remote server.

    Thanks for share!

  6. Martin

    Hi Gokhan…I have a PEM Server 5.0 which manages aroung 30+ Servers and its PEM database has reached to 88% usage ie 88% used.all the PEM data is pressent at PEM database in /base dir only .I want to clean the /base directory  i.e archive the  log files to another destination without affecting other databases.pls send me the procedure to cleanup the 88% used space in /base directory.Pls do the procedure and  as soon as possible as I am in urgency  and top priority task.

    • Gokhan Atil

      Martin, I recommend you to open a ticket to EnterpriseDB. They will be glad to help you and they usually respond very quickly.

  7. Ramkumar

    HI,

    I wanted to do file system backup of postgresql but for that I need to shutdown the database, which i cant do. I prefer to do pg_basebackup but it needs archive_mode to be “on” and enabling wal. But i dont need any data after i trigger pg_basebackup.  In other word, simple data folder copy is required.  So i enabled wal_level = hot_standby and max_wal_senders = 1 and i can successfully able to copy data folder and able to restore successfully.

    But, I am afraid that is it really correct?

    Please let me know

    • Gokhan Atil

      Hi Ramkumar,

      It shouldn’t cause a problem as long as your “online” wal files can keep all transactions completed during your backup process.

  8. Ramkumar

    Thanks Gokhan.

    I have one confusion because I am not going to replay the wal files when I am restoring the backup. Hence I don’t need any transaction happened during backup process and no need to worry aboutt “online wal”.

    This is the question i posted in psql forum. This question has the elaborated information’s.

    We are currently using multiple DBMS in our project and postgresql is one of them. Our private DBMS keeps the online data and postgresql keeps online as well as historical data.

    At present, we are doing the backup/restore process for our project. So we planned to use Pg_basebackup instead of pg_dump.

    Below is the backup steps.

    Time-T1   = Start the backup of private DBMS.

    Time-T2   = Finished the private backup DBMS.

    Time-T3   = Start the pg_basebackup.

    Time-T4   = End the Pg_basebackup.

    Here the requirement is we don’t want to restore the data after Time-T3. But when I followed this approach https://opensourcedbms.com/dbms/point-in-time-recovery-pitr-using-pg_basebackup-with-postgresql-9-2/, I am still getting the information’s archived from Time-T3 to TimeT4.

    Seems, WAL archives are holding all the transactions, which are happened between Time T3 – Time T4.

    Also, I don’t want enable archive_mode = on as it needs to maintain archives files.

    So I decided the enable only these parameters.

    Postgresql.conf

    ———————————

    wal_level = hot_standby

    max_wal_senders = 1

    and added replication permissions for the current user in pg_hba.conf.

    It does, what I need it. In the backup I did not have the data between T3-T4.

    Is this correct or is there anything I missing it.

  9. Abdullah MD

    Hi ,

    We have an issue with one of the database in postgresql 9.3

    By mistakenly we have deleted the database and we don’t have any recent backups for that.

    How to recover it ? Can you help us on this how move ahead.

  10. Sahoong

    Was this carried out on a single server or 2 servers? These is no illustration about doing this with 2 servers or a second remote server. How can the backup taken here be restored to a remote server?

     

     

    • Gokhan Atil

      If you want to restore your backup to another server, you need to install PostgreSQL software first to the new server, copy the backup files, then follow how to recover part of this blog post.

  11. Prabakaran

    Hi,
    Please explain about deadlock error. Why it happen and how to resolve the error. Kindly help me

    Thanks in advance

Leave Comment

Your email address will not be published. Required fields are marked *