Keeping track of changes to database table data is a common software development requirement. Auditing a table this way can be used to revert data to a previous state, to restore deleted rows, or simply to build audit reports.
The audit approach described in this article is based on hstore and Postgres triggers. Hstore is a key-value data type within Postgres, available through the contrib extensions.
To start using the audit we will need at least a database and a table to monitor for data modifications. I'll use a generic database named 'demo_db' and a generic table named 'demo_users' within it.
First, we need to enable the hstore extension for the database in use. The extension must be created by a superuser (eg the postgres user) by executing the following command:
CREATE EXTENSION hstore;
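As a quick sanity check that the extension is active, hstore literals can be queried directly (the keys and values here are arbitrary, chosen to match this article's table):

```sql
-- build an hstore from a literal and fetch one value by key;
-- the -> operator returns the value as text
SELECT 'username=>demo1, level=>0'::hstore -> 'level' AS level;
```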
From now on, all SQL commands can be executed as a regular user, either through the PgAdmin query tool or through the psql console (server shell or PgAdmin plugin).
Second, we need to execute the 2ndQuadrant audit.sql file available on GitHub (link at the bottom of this article); it creates the audit schema, tables, and trigger functions that will record the modification history of the monitored tables. The audit.sql content can be executed in the PgAdmin query window, or the file can be imported with psql's \i command (local download needed).
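For the console route, the import looks like this (the 'demo_db' database is the one used in this article; adjust the path to wherever you downloaded the script):

```
-- from within a psql session connected to demo_db:
\i audit.sql

-- or directly from the server shell:
-- psql -d demo_db -f audit.sql
```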
And that's it: once the audit schema exists, we can enable the audit on any desired table with a single call to the trigger-setup function it provides.
The 'demo_users' table used in this example has a basic structure: three data columns plus a serial column (id) used as the primary key (pk):
CREATE TABLE demo_users (
  id serial NOT NULL,
  username character varying NOT NULL,
  birthdate date NOT NULL,
  level integer NOT NULL DEFAULT 0,
  CONSTRAINT demo_users_pkey PRIMARY KEY (id)
);
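With the table in place, auditing is enabled through the audit_table function that the 2ndQuadrant script creates (this is the entry point documented in the audit-trigger project):

```sql
-- attach the audit triggers to demo_users; from now on every
-- INSERT/UPDATE/DELETE on it is recorded in audit.logged_actions
SELECT audit.audit_table('demo_users');
```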
Any INSERT/UPDATE/DELETE operation on it will be logged to the 'audit.logged_actions' table. The important columns of 'logged_actions' are 'row_data' and 'changed_fields', both of type hstore; they store an exact copy of the old row data and of the modified fields for each operation. Combined with the other columns (eg event time, event id, and possibly custom columns such as an application session user), each row represents a restore point whose data can be used to restore a production table to a previous state, or to build and display a modification history for that table.
To check that our audit triggers do their job, let's run some INSERT/UPDATE/DELETE operations on the audited 'demo_users' table and see how the modifications get stored and, later, how to work with these restore points.
INSERT INTO demo_users (username, birthdate, level)
VALUES ('demo1', '2000-01-01'::date, 0);

UPDATE demo_users SET birthdate = '2001-02-01'::date, level = 1
WHERE lower(username) = 'demo1';

UPDATE demo_users SET level = 2
WHERE lower(username) = 'demo1';

DELETE FROM demo_users
WHERE lower(username) = 'demo1';
After all these operations, the 'demo_users' table will be empty and 'audit.logged_actions' will hold a restore point for each operation executed above.
A basic report that shows all the restore points for our 'demo_users' Postgres table:
SELECT event_id,
       action_tstamp_clk::timestamp(0),
       action,
       (populate_record(null::public.demo_users,
            coalesce(logged_actions.row_data, '')::hstore ||
            coalesce(logged_actions.changed_fields, '')::hstore)).*
FROM audit.logged_actions
WHERE table_name = 'demo_users';

 event_id |  action_tstamp_clk  | action | id | username | birthdate  | level
----------+---------------------+--------+----+----------+------------+-------
        1 | 2019-11-03 08:17:18 | I      |  1 | demo1    | 2000-01-01 |     0
        2 | 2019-11-03 08:17:29 | U      |  1 | demo1    | 2001-02-01 |     1
        3 | 2019-11-03 08:17:43 | U      |  1 | demo1    | 2001-02-01 |     2
        4 | 2019-11-03 08:19:34 | D      |  1 | demo1    | 2001-02-01 |     2
This report shows the 'demo_users' modification history as restore points, so we can use it to restore rows of 'public.demo_users' (entirely, by a restore point id, or by other advanced filters). The only secret here is the hstore 'populate_record' function; beyond that, it's up to us which pieces of information to add to the report (eg standard columns from 'logged_actions') and how to fine-tune the filters if we need a targeted report or specific restore points, eg:

... AND (row_data->'id')::INT = 1
Case study: restore deleted rows using restore points from the audit schema
INSERT INTO demo_users
SELECT * FROM (
  SELECT (populate_record(null::public.demo_users, logged_actions.row_data)).*
  FROM audit.logged_actions
  WHERE table_name = 'demo_users'
    AND action = 'D'
    -- AND (row_data->'id')::INT = 1
    -- AND event_id = 4
) q
ON CONFLICT DO NOTHING;
This query first generates a restore point report (the subquery q) and uses the result to insert rows back into our public schema table. The (q) subquery can be fine-tuned with filters just as in the basic report (eg we can restore all of a table's deleted data, restore deleted data just for a specific user, or restore data from a specific restore point id). The second and third cases are already present in the query, commented out with '--'. It's up to us what data we need back.
After executing the restore query, our deleted user is back in the exact state it was in before deletion.
SELECT * FROM demo_users;

 id | username | birthdate  | level
----+----------+------------+-------
  1 | demo1    | 2001-02-01 |     2
We now have the data back in the state it was in upon deletion, but we can go further and restore the data to any logged restore point, this time by replacing row content with the data available in those restore points.
Case study: restore row data (by updating row content) with data available in the audit schema
UPDATE demo_users
SET (id, username, birthdate, level) = (q.id, q.username, q.birthdate, q.level)
FROM (
  SELECT (populate_record(null::public.demo_users,
              coalesce(logged_actions.row_data, '')::hstore ||
              coalesce(logged_actions.changed_fields, '')::hstore)).*
  FROM audit.logged_actions
  WHERE event_id = 2
  -- WHERE table_name = 'demo_users' AND action = 'U'
  --   AND (row_data->'id')::INT IN (1,2,3,4)
) q
WHERE demo_users.id = q.id;
As we can see, this query uses the same logic as the INSERT one, this time via UPDATE ... FROM. Again, the secret is the (q) subquery, which can be extended with filters to match an exact use case. As a precaution, execute the (q) subquery on its own before anything else (especially when playing with advanced filters) to see exactly which rows the full query will update.
After that, we can browse the table and see that the user's data is now in the desired previous state.
SELECT * FROM demo_users;

 id | username | birthdate  | level
----+----------+------------+-------
  1 | demo1    | 2001-02-01 |     1
ProTip 1: Don't forget to create indexes that match your query filters. hstore has GiST and GIN index support, but for our audit schema (and these examples) what matters more is that hstore keys can be used in btree indexes just like regular columns. A proper index matching our example query filters would be:
CREATE INDEX logged_actions_table_name_row_data_id_idx
  ON audit.logged_actions
  USING btree (table_name COLLATE pg_catalog."default",
               ((row_data -> 'id'::text)::integer));
To check that the planner actually uses the index (on tiny test data it may prefer a sequential scan), temporarily disable sequential scans and inspect the plan:

SET enable_seqscan = 0;
EXPLAIN ANALYSE SELECT ...;
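For example, with the filter used in the basic report (the exact plan output depends on your data, but it should show an index scan on the index created above; note the filter expression must match the indexed expression):

```sql
SET enable_seqscan = 0;
EXPLAIN ANALYSE
SELECT * FROM audit.logged_actions
WHERE table_name = 'demo_users'
  AND (row_data -> 'id'::text)::integer = 1;
```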
ProTip 2: Extend the data stored in the default 'audit.logged_actions' table.
In production, applications have logged-in users (session users) who operate on the data through the software logic. If we want to store this piece of information (the application user id) with every audit row, PostgreSQL configuration parameters make it possible. What we need is one more column called 'app_logged_user' (or anything you like) whose default reads such a parameter, eg via current_setting('APP_LOGGED_USER.id').
APP_LOGGED_USER.id is a custom Postgres configuration parameter that can be declared globally inside postgresql.conf (1) or, better, directly at the database level (2). (Run SHOW config_file; to find the exact location of postgresql.conf.)
#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------
APP_LOGGED_USER.id = 0
ALTER DATABASE demo_db SET APP_LOGGED_USER.id TO '0';
We can use this configuration variable to store the information we need (eg the application user id) from the application logic by issuing an SQL statement. Note that SET cannot take bind parameters, so the set_config() function is used instead:

$db->Execute("SELECT set_config('APP_LOGGED_USER.id', $1, false)", array($_SESSION['logged_user_id']));
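One way to wire this up is a new column whose default reads the parameter (a sketch; the column name and the integer cast are choices made for this example):

```sql
-- capture the session's APP_LOGGED_USER.id setting with each audit row
ALTER TABLE audit.logged_actions
  ADD COLUMN app_logged_user integer
  DEFAULT current_setting('APP_LOGGED_USER.id')::integer;
```

Depending on how the trigger function builds its INSERT, the default may be bypassed (an explicitly inserted NULL overrides it); in that case, populate the column inside audit.if_modified_func instead of relying on the default.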
ProTip 3: Store less data in the default 'audit.logged_actions' table.
This is quite easy actually: eg we can use only one timestamp column instead of three, and we can get rid of any extra columns that don't fit our application logic, eg client_addr, client_port, client_query, etc.
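A sketch of this trimming (column names are from the 2ndQuadrant schema; the trigger function audit.if_modified_func also fills these columns, so it must be edited to match before dropping them):

```sql
-- drop the per-connection details we don't need in this application
ALTER TABLE audit.logged_actions
  DROP COLUMN client_addr,
  DROP COLUMN client_port,
  DROP COLUMN client_query;
```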
Bonus: a basic report that shows the flow between restore points (old and new values) at a column level.
By a restore point id:
SELECT event_id,
       action_tstamp_clk,
       skeys(row_data - (row_data || changed_fields)) AS column,
       svals(row_data - (row_data || changed_fields)) AS old_values,
       svals(changed_fields) AS new_values
FROM audit.logged_actions
WHERE event_id = 11;

 event_id |       action_tstamp_clk       |  column   | old_values | new_values
----------+-------------------------------+-----------+------------+------------
       11 | 2019-11-03 09:46:55.542603+00 | level     | 0          | 1
       11 | 2019-11-03 09:46:55.542603+00 | birthdate | 2000-01-01 | 2001-02-01
(2 rows)
By advanced filtering (table_name and the audited table's primary key):
SELECT event_id,
       action_tstamp_clk,
       skeys(row_data - (row_data || changed_fields)) AS column,
       svals(row_data - (row_data || changed_fields)) AS old_values,
       svals(changed_fields) AS new_values
FROM audit.logged_actions
WHERE table_name = 'demo_users'
  AND (row_data->'id')::int = 1;

 event_id |       action_tstamp_clk       |  column   | old_values | new_values
----------+-------------------------------+-----------+------------+------------
        2 | 2019-11-03 09:02:51.445477+00 | level     | 0          | 1
        2 | 2019-11-03 09:02:51.445477+00 | birthdate | 2000-01-01 | 2001-02-01
        3 | 2019-11-03 09:02:51.445674+00 | level     | 1          | 2
        7 | 2019-11-03 09:42:53.517047+00 | level     | 2          | 3
        8 | 2019-11-03 09:43:01.912676+00 | level     | 3          | 2
        9 | 2019-11-03 09:44:06.185648+00 | level     | 2          | 3
       13 | 2019-11-03 10:02:04.237171+00 | level     | 3          | 1
(7 rows)
Audit trigger 91plus - PostgreSQL wiki
GitHub - 2ndQuadrant/audit-trigger: Simple, easily customised trigger-based auditing for PostgreSQL (Postgres)
PostgreSQL: Documentation: 11: F.16. hstore