Keeping track of changes to database table data is a common software development requirement. Auditing a table this way can be used to revert data to a previous state, to restore deleted rows, or simply to build audit reports.
The audit approach described in this article is based on hstore and Postgres triggers. Hstore is a key-value data type within Postgres, available through the contrib extensions.
To start using the audit we will need at least a database and a table to monitor for data modifications. I'll use a generic database named 'demo_db' and a generic table named 'demo_users' within it.
First, we need to enable the hstore extension for the database in use. The extension must be created by a superuser (eg the postgres user) by executing the following command:
CREATE EXTENSION hstore;
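As a quick sanity check that the extension is active, hstore literals can be queried directly (the keys and values here are arbitrary, chosen to match this article's table):

```sql
-- build an hstore from a literal and fetch one value by key;
-- the -> operator returns the value as text
SELECT 'username=>demo1, level=>0'::hstore -> 'level' AS level;
```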
From now on, all SQL commands can be executed as a regular user, either through the PgAdmin query tool or through the psql console (server shell or PgAdmin plugin).
Second, we need to execute the 2ndQuadrant audit.sql file available on GitHub (link at the bottom of this article); it creates the audit schema, tables, and trigger functions that will record the modification history of the monitored tables. The audit.sql content can be executed in the PgAdmin query window, or the file can be imported with psql's \i command (local download needed).
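For the console route, the import looks like this (the 'demo_db' database is the one used in this article; adjust the path to wherever you downloaded the script):

```
-- from within a psql session connected to demo_db:
\i audit.sql

-- or directly from the server shell:
-- psql -d demo_db -f audit.sql
```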
And that's it: once the audit schema exists, we can enable the audit on any desired table with a single call to the trigger-setup function it provides.
The 'demo_users' table used in this example has a basic structure: three data columns plus a serial column (id) used as the primary key (pk):
CREATE TABLE demo_users (
  id serial NOT NULL,
  username character varying NOT NULL,
  birthdate date NOT NULL,
  level integer NOT NULL DEFAULT 0,
  CONSTRAINT demo_users_pkey PRIMARY KEY (id)
);
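With the table in place, auditing is enabled through the audit_table function that the 2ndQuadrant script creates (this is the entry point documented in the audit-trigger project):

```sql
-- attach the audit triggers to demo_users; from now on every
-- INSERT/UPDATE/DELETE on it is recorded in audit.logged_actions
SELECT audit.audit_table('demo_users');
```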
Any INSERT/UPDATE/DELETE operation on it will be logged to the 'audit.logged_actions' table. The important columns of 'logged_actions' are 'row_data' and 'changed_fields', both of type hstore; they store an exact copy of the old row data and of the modified fields for each operation. Combined with the other columns (eg event time, event id, and possibly custom columns such as an application session user), each row represents a restore point whose data can be used to restore a production table to a previous state, or to build and display a modification history for that table.
To check that our audit triggers do their job, let's run some INSERT/UPDATE/DELETE operations on the audited 'demo_users' table and see how the modifications get stored and, later, how to work with these restore points.
INSERT INTO demo_users (username, birthdate, level)
VALUES ('demo1', '2000-01-01'::date, 0);

UPDATE demo_users SET birthdate = '2001-02-01'::date, level = 1
WHERE lower(username) = 'demo1';

UPDATE demo_users SET level = 2
WHERE lower(username) = 'demo1';

DELETE FROM demo_users
WHERE lower(username) = 'demo1';
After all these operations, the 'demo_users' table will be empty and 'audit.logged_actions' will hold a restore point for each operation executed above.
A basic report that shows all the restore points for our 'demo_users' Postgres table:
SELECT event_id,
       action_tstamp_clk::timestamp(0),
       action,
       (populate_record(null::public.demo_users,
            coalesce(logged_actions.row_data, '')::hstore ||
            coalesce(logged_actions.changed_fields, '')::hstore)).*
FROM audit.logged_actions
WHERE table_name = 'demo_users';

 event_id |  action_tstamp_clk  | action | id | username | birthdate  | level
----------+---------------------+--------+----+----------+------------+-------
        1 | 2019-11-03 08:17:18 | I      |  1 | demo1    | 2000-01-01 |     0
        2 | 2019-11-03 08:17:29 | U      |  1 | demo1    | 2001-02-01 |     1
        3 | 2019-11-03 08:17:43 | U      |  1 | demo1    | 2001-02-01 |     2
        4 | 2019-11-03 08:19:34 | D      |  1 | demo1    | 2001-02-01 |     2
This report shows the 'demo_users' modification history as restore points, so we can use it to restore rows of 'public.demo_users' (entirely, by a restore point id, or by other advanced filters). The only secret here is the hstore 'populate_record' function; beyond that, it's up to us which pieces of information to add to the report (eg standard columns from 'logged_actions') and how to fine-tune the filters if we need a targeted report or specific restore points, eg:

... AND (row_data->'id')::INT = 1
Case study: restore deleted rows using restore points from the audit schema
INSERT INTO demo_users
SELECT * FROM (
  SELECT (populate_record(null::public.demo_users, logged_actions.row_data)).*
  FROM audit.logged_actions
  WHERE table_name = 'demo_users'
    AND action = 'D'
    -- AND (row_data->'id')::INT = 1
    -- AND event_id = 4
) q
ON CONFLICT DO NOTHING;
This query first generates a restore point report (the subquery q) and uses the result to insert rows back into our public schema table. The (q) subquery can be fine-tuned with filters just as in the basic report (eg we can restore all of a table's deleted data, restore deleted data just for a specific user, or restore data from a specific restore point id). The second and third cases are already present in the query, commented out with '--'. It's up to us what data we need back.
After executing the restore query, our deleted user is back in the exact state it was in before deletion.
SELECT * FROM demo_users;

 id | username | birthdate  | level
----+----------+------------+-------
  1 | demo1    | 2001-02-01 |     2
We now have the data back in the state it was in upon deletion, but we can go further and restore the data to any logged restore point, this time by replacing row content with the data available in those restore points.
Case study: restore row data (by updating row content) with data available in the audit schema
UPDATE demo_users
SET (id, username, birthdate, level) = (q.id, q.username, q.birthdate, q.level)
FROM (
  SELECT (populate_record(null::public.demo_users,
              coalesce(logged_actions.row_data, '')::hstore ||
              coalesce(logged_actions.changed_fields, '')::hstore)).*
  FROM audit.logged_actions
  WHERE event_id = 2
  -- WHERE table_name = 'demo_users' AND action = 'U'
  --   AND (row_data->'id')::INT IN (1,2,3,4)
) q
WHERE demo_users.id = q.id;
As we can see, this query uses the same logic as the INSERT one, this time via UPDATE ... FROM. Again, the secret is the (q) subquery, which can be extended with filters to match an exact use case. As a precaution, execute the (q) subquery on its own before anything else (especially when playing with advanced filters) to see exactly which rows the full query will update.
After that, we can browse the table and see that the user's data is now in the desired previous state.
SELECT * FROM demo_users;

 id | username | birthdate  | level
----+----------+------------+-------
  1 | demo1    | 2001-02-01 |     1
ProTip 1: Don't forget to create indexes that match your query filters. hstore has GiST and GIN index support, but for our audit schema (and these examples) what matters more is that hstore keys can be used in btree indexes just like regular columns. A proper index matching our example query filters would be:
CREATE INDEX logged_actions_table_name_row_data_id_idx
  ON audit.logged_actions
  USING btree (table_name COLLATE pg_catalog."default",
               ((row_data -> 'id'::text)::integer));
To check that the planner actually uses the index (on tiny test data it may prefer a sequential scan), temporarily disable sequential scans and inspect the plan:

SET enable_seqscan = 0;
EXPLAIN ANALYSE SELECT ...;
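For example, with the filter used in the basic report (the exact plan output depends on your data, but it should show an index scan on the index created above; note the filter expression must match the indexed expression):

```sql
SET enable_seqscan = 0;
EXPLAIN ANALYSE
SELECT * FROM audit.logged_actions
WHERE table_name = 'demo_users'
  AND (row_data -> 'id'::text)::integer = 1;
```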
ProTip 2: Extend the data stored in the default 'audit.logged_actions' table.
In production, applications have logged-in users (session users) who operate on the data through the software logic. If we want to store this piece of information (the application user id) with every audit row, PostgreSQL configuration parameters make it possible. What we need is one more column called 'app_logged_user' (or anything you like) whose default reads such a parameter, eg via current_setting('APP_LOGGED_USER.id').
APP_LOGGED_USER.id is a custom Postgres configuration parameter that can be declared globally inside postgresql.conf (1) or, better, directly at the database level (2). (Run SHOW config_file; to find the exact location of postgresql.conf.)
#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------
APP_LOGGED_USER.id = 0
ALTER DATABASE demo_db SET APP_LOGGED_USER.id TO '0';
We can use this configuration variable to store the information we need (eg the application user id) from the application logic by issuing an SQL statement. Note that SET cannot take bind parameters, so the set_config() function is used instead:

$db->Execute("SELECT set_config('APP_LOGGED_USER.id', $1, false)", array($_SESSION['logged_user_id']));
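One way to wire this up is a new column whose default reads the parameter (a sketch; the column name and the integer cast are choices made for this example):

```sql
-- capture the session's APP_LOGGED_USER.id setting with each audit row
ALTER TABLE audit.logged_actions
  ADD COLUMN app_logged_user integer
  DEFAULT current_setting('APP_LOGGED_USER.id')::integer;
```

Depending on how the trigger function builds its INSERT, the default may be bypassed (an explicitly inserted NULL overrides it); in that case, populate the column inside audit.if_modified_func instead of relying on the default.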
ProTip 3: Store less data in the default 'audit.logged_actions' table.
This is quite easy actually: eg we can use only one timestamp column instead of three, and we can get rid of any extra columns that don't fit our application logic, eg client_addr, client_port, client_query, etc.
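A sketch of this trimming (column names are from the 2ndQuadrant schema; the trigger function audit.if_modified_func also fills these columns, so it must be edited to match before dropping them):

```sql
-- drop the per-connection details we don't need in this application
ALTER TABLE audit.logged_actions
  DROP COLUMN client_addr,
  DROP COLUMN client_port,
  DROP COLUMN client_query;
```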
Bonus: a basic report that shows the flow between restore points (old and new values) at a column level.
By a restore point id:
SELECT event_id,
       action_tstamp_clk,
       skeys(row_data - (row_data || changed_fields)) AS column,
       svals(row_data - (row_data || changed_fields)) AS old_values,
       svals(changed_fields) AS new_values
FROM audit.logged_actions
WHERE event_id = 11;

 event_id |       action_tstamp_clk       |  column   | old_values | new_values
----------+-------------------------------+-----------+------------+------------
       11 | 2019-11-03 09:46:55.542603+00 | level     | 0          | 1
       11 | 2019-11-03 09:46:55.542603+00 | birthdate | 2000-01-01 | 2001-02-01
(2 rows)
By advanced filtering (table_name and the audited table's primary key):
SELECT event_id,
       action_tstamp_clk,
       skeys(row_data - (row_data || changed_fields)) AS column,
       svals(row_data - (row_data || changed_fields)) AS old_values,
       svals(changed_fields) AS new_values
FROM audit.logged_actions
WHERE table_name = 'demo_users'
  AND (row_data->'id')::int = 1;

 event_id |       action_tstamp_clk       |  column   | old_values | new_values
----------+-------------------------------+-----------+------------+------------
        2 | 2019-11-03 09:02:51.445477+00 | level     | 0          | 1
        2 | 2019-11-03 09:02:51.445477+00 | birthdate | 2000-01-01 | 2001-02-01
        3 | 2019-11-03 09:02:51.445674+00 | level     | 1          | 2
        7 | 2019-11-03 09:42:53.517047+00 | level     | 2          | 3
        8 | 2019-11-03 09:43:01.912676+00 | level     | 3          | 2
        9 | 2019-11-03 09:44:06.185648+00 | level     | 2          | 3
       13 | 2019-11-03 10:02:04.237171+00 | level     | 3          | 1
(7 rows)
Audit trigger 91plus - PostgreSQL wiki
GitHub - 2ndQuadrant/audit-trigger: Simple, easily customised trigger-based auditing for PostgreSQL (Postgres)
PostgreSQL: Documentation: 11: F.16. hstore