Chapter 32. Data Masking

Table of Contents

32.1. Data Masking via pg_dump
32.2. Data Masking via Logical Replication

Data masking (also known as data anonymization) is a feature that allows for transfer of table data to another database or cluster without leaking confidential information. The typical use case is that application developers, who have no direct access to the production database, need a database as similar to the production database as possible, both for testing and troubleshooting.

To mask particular column, you need to define a masking expression for it. For example, you can replace a text string with its hash:

ALTER TABLE customer ALTER COLUMN email SET MASK sha224(email::bytea);

Alternatively, you can remove the values from the masked database by setting the value to NULL:

ALTER TABLE customer ALTER COLUMN email SET MASK NULL;

As another example, numeric column can be masked this way:

ALTER TABLE payment ALTER COLUMN amount
    SET MASK substring(sha224(numeric_send(amount)), 1, 4)::int4::numeric;

Here, the numeric_send() function produces the binary representation of the numeric value and sha256() computes its secure hash. substring() then extracts four bytes that fit into an integer value, and the ::int4 cast converts bytes to integer value. Finally, the ::numeric cast computes the numeric value. (The intermediate integer value is needed because there is no SQL function that converts the array of bytes to numeric values directly.)

Once the masking expression exists, there are two ways to transfer the data.