Chapter 32. Data Masking

Table of Contents

32.1. Data Masking via pg_dump
32.2. Data Masking via Logical Replication

Data masking (also known as data anonymization) is a feature that allows for transfer of table data to another database or cluster without leaking confidential information. The typical use case is that application developers, who have no direct access to the production database, need a database as similar to the production database as possible. Testing and troubleshooting are the typical use cases.

To mask particular column, you need to define a masking expression for it. For example, in the following table

CREATE TABLE public.customer (
    customer_id integer NOT NULL,
    store_id    integer NOT NULL,
    first_name  text NOT NULL,
    last_name   text NOT NULL,
    email       text,
    address_id  integer NOT NULL,
    active      boolean DEFAULT true NOT NULL,
    create_date date DEFAULT CURRENT_DATE NOT NULL,
    last_update timestamp with time zone DEFAULT now(),
    active      integer
);

you might need to mask the email column. You can do so by setting the sha224 function as the masking expression:

ALTER TABLE customer ALTER COLUMN email SET MASK sha224(email::bytea);

If you then replicate the table via logical decoding, queries on the subscriber database will only return the masked email address:

SELECT customer_id, email
FROM customer
WHERE customer_id = 8;
 customer_id |                           email
-------------+------------------------------------------------------------
           8 | \x0ca936fc361bb8e4c88e76ed113cdeebaadc0685ee24fd10c249b2c9
(1 row)

Another example shows how to apply hashing to numeric columns.

CREATE TABLE payment (
    payment_id          integer	NOT NULL,
    customer_id         integer NOT NULL,
    staff_id            integer NOT NULL,
    units               integer NOT NULL,
    amount              numeric NOT NULL,
    payment_date        timestamp with time zone NOT NULL
);

With this table, you can use the following command to set masking expression on the amount column:

ALTER TABLE payment ALTER COLUMN amount
    SET MASK substring(sha224(numeric_send(amount)), 1, 4)::int4::numeric;

Here, the numeric_send() function produces the binary representation of the numeric value and sha256() computes its secure hash. substring() then extracts four bytes that fit into an integer value, and the ::int4 cast converts bytes to integer value. Finally, the ::numeric cast computes the numeric value. (The intermediate integer value is needed because there is no SQL function that converts the array of bytes to numeric values directly.)

Likewise, you can setup masking of an integer column units:

ALTER TABLE payment ALTER COLUMN units
    SET MASK substring(sha224(int4send(units)), 1, 4)::int4;

Alternatively, you can prevent arbitrary column from being replicated to the masked database by using NULL as masking expression:

ALTER TABLE customer ALTER COLUMN email SET MASK NULL;

Once the masking expression exists, there are two ways to apply the masking.