Data masking (also known as data anonymization) is a feature that allows for transfer of table data to another database or cluster without leaking confidential information. The typical use case is that application developers, who have no direct access to the production database, need a database as similar to the production database as possible. Testing and troubleshooting are the typical use cases.
To mask particular column, you need to define a masking expression for it. For example, in the following table
CREATE TABLE public.customer (
customer_id integer NOT NULL,
store_id integer NOT NULL,
first_name text NOT NULL,
last_name text NOT NULL,
email text,
address_id integer NOT NULL,
active boolean DEFAULT true NOT NULL,
create_date date DEFAULT CURRENT_DATE NOT NULL,
last_update timestamp with time zone DEFAULT now(),
active integer
);
you might need to mask the email column. You can do so by
setting the sha224 function as the masking expression:
ALTER TABLE customer ALTER COLUMN email SET MASK sha224(email::bytea);
If you then replicate the table via logical decoding, queries on the subscriber database will only return the masked email address:
SELECT customer_id, email
FROM customer
WHERE customer_id = 8;
customer_id | email
-------------+------------------------------------------------------------
8 | \x0ca936fc361bb8e4c88e76ed113cdeebaadc0685ee24fd10c249b2c9
(1 row)
Another example shows how to apply hashing to numeric columns.
CREATE TABLE payment (
payment_id integer NOT NULL,
customer_id integer NOT NULL,
staff_id integer NOT NULL,
units integer NOT NULL,
amount numeric NOT NULL,
payment_date timestamp with time zone NOT NULL
);
With this table, you can use the following command to set masking expression
on the amount column:
ALTER TABLE payment ALTER COLUMN amount
SET MASK substring(sha224(numeric_send(amount)), 1, 4)::int4::numeric;
Here, the numeric_send() function produces the binary
representation of the numeric value and sha256() computes
its secure hash. substring() then extracts four bytes that
fit into an integer value, and the ::int4 cast converts
bytes to integer value. Finally, the ::numeric cast
computes the numeric value. (The intermediate integer value is needed because
there is no SQL function that converts the array of bytes to numeric values
directly.)
Likewise, you can setup masking of an integer
column units:
ALTER TABLE payment ALTER COLUMN units
SET MASK substring(sha224(int4send(units)), 1, 4)::int4;
Alternatively, you can prevent arbitrary column from being replicated to the
masked database by using NULL as masking expression:
ALTER TABLE customer ALTER COLUMN email SET MASK NULL;
Once the masking expression exists, there are two ways to apply the masking.