Data masking (also known as data anonymization) is a feature that allows for transfer of table data to another database or cluster without leaking confidential information. The typical use case is that application developers, who have no direct access to the production database, need a database as similar to the production database as possible, both for testing and troubleshooting.
To mask particular column, you need to define a masking expression for it. For example, you can replace a text string with its hash:
ALTER TABLE customer ALTER COLUMN email SET MASK sha224(email::bytea);
Alternatively, you can remove the values from the masked database by setting
the value to NULL
:
ALTER TABLE customer ALTER COLUMN email SET MASK NULL;
As another example, numeric column can be masked this way:
ALTER TABLE payment ALTER COLUMN amount SET MASK substring(sha224(numeric_send(amount)), 1, 4)::int4::numeric;
Here, the numeric_send()
function produces the binary
representation of the numeric value and sha256()
computes
its secure hash. substring()
then extracts four bytes that
fit into an integer value, and the ::int4
cast converts
bytes to integer value. Finally, the ::numeric
cast
computes the numeric value. (The intermediate integer value is needed because
there is no SQL function that converts the array of bytes to numeric values
directly.)
Once the masking expression exists, there are two ways to transfer the data.