This section describes functions and operators for examining and
manipulating binary strings, that is values of type bytea
.
Many of these are equivalent, in purpose and syntax, to the
text-string functions described in the previous section.
SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in Table 9.11. PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see Table 9.12).
Table 9.11. SQL Binary String Functions and Operators
Additional binary string manipulation functions are available and are listed in Table 9.12. Some of them are used internally to implement the SQL-standard string functions listed in Table 9.11.
Table 9.12. Other Binary String Functions
Function Description Example(s) |
---|
Returns the number of bits set in the binary string (also known as “popcount”).
|
Removes the longest string containing only bytes appearing in
|
Extracts n'th bit from binary string.
|
Extracts n'th byte from binary string.
|
Returns the number of bytes in the binary string.
|
Returns the number of characters in the binary string, assuming
that it is text in the given
|
Removes the longest string containing only bytes appearing in
|
Computes the MD5 hash of the binary string, with the result written in hexadecimal.
|
Removes the longest string containing only bytes appearing in
|
Sets n'th bit in
binary string to
|
Sets n'th byte in
binary string to
|
Computes the SHA-224 hash of the binary string.
|
Computes the SHA-256 hash of the binary string.
|
Computes the SHA-384 hash of the binary string.
|
Computes the SHA-512 hash of the binary string.
|
Extracts the substring of
|
Functions get_byte
and set_byte
number the first byte of a binary string as byte 0.
Functions get_bit
and set_bit
number bits from the right within each byte; for example bit 0 is the least
significant bit of the first byte, and bit 15 is the most significant bit
of the second byte.
For historical reasons, the function md5
returns a hex-encoded value of type text
whereas the SHA-2
functions return type bytea
. Use the functions
encode
and decode
to
convert between the two. For example write encode(sha256('abc'),
'hex')
to get a hex-encoded text representation,
or decode(md5('abc'), 'hex')
to get
a bytea
value.
Functions for converting strings between different character sets
(encodings), and for representing arbitrary binary data in textual
form, are shown in
Table 9.13. For these
functions, an argument or result of type text
is expressed
in the database's default encoding, while arguments or results of
type bytea
are in an encoding named by another argument.
Table 9.13. Text/Binary String Conversion Functions
Function Description Example(s) |
---|
Converts a binary string representing text in
encoding
|
Converts a binary string representing text in
encoding
|
Converts a
|
Encodes binary data into a textual representation; supported
|
Decodes binary data from a textual representation; supported
|
The encode
and decode
functions support the following textual formats:
The base64
format is that
of RFC
2045 Section 6.8. As per the RFC, encoded lines are
broken at 76 characters. However instead of the MIME CRLF
end-of-line marker, only a newline is used for end-of-line.
The decode
function ignores carriage-return,
newline, space, and tab characters. Otherwise, an error is
raised when decode
is supplied invalid
base64 data — including when trailing padding is incorrect.
The escape
format converts zero bytes and
bytes with the high bit set into octal escape sequences
(\
nnn
), and it doubles
backslashes. Other byte values are represented literally.
The decode
function will raise an error if a
backslash is not followed by either a second backslash or three
octal digits; it accepts other byte values unchanged.
The hex
format represents each 4 bits of
data as one hexadecimal digit, 0
through f
, writing the higher-order digit of
each byte first. The encode
function outputs
the a
-f
hex digits in lower
case. Because the smallest unit of data is 8 bits, there are
always an even number of characters returned
by encode
.
The decode
function
accepts the a
-f
characters in
either upper or lower case. An error is raised
when decode
is given invalid hex data
— including when given an odd number of characters.
See also the aggregate function string_agg
in
Section 9.21 and the large object functions
in Section 37.4.