Use RoaringBitMap Bitmap Operations
SynxDB supports RoaringBitMap, an efficient bitmap compression algorithm that has been widely adopted across various programming languages and big data platforms. By integrating RoaringBitmap functionality, SynxDB provides a native database data type that supports rich database functions and aggregation operations.
RoaringBitmap outperforms traditional compressed bitmaps such as WAH, EWAH, or Concise, particularly in terms of performance. In some cases, RoaringBitmap can be hundreds of times faster than traditional compression methods while providing superior compression efficiency. In certain application scenarios, its performance can even exceed that of uncompressed bitmaps. This high efficiency in bit operations is particularly suitable for big data cardinality calculations, commonly used in data deduplication, tag filtering, and time series complex calculations.
Usage
Load the plugin
Before using RoaringBitMap, you need to load the roaringbitmap
plugin first. Please ensure that the plugin file has been synchronized to the corresponding directories on all nodes.
CREATE EXTENSION roaringbitmap;
Create tables
You can specify fields of type roaringbitmap
in the CREATE TABLE
statement.
The following table t1
contains a field named bitmap
with the data type roaringbitmap
. This field is specifically used to store and manipulate RoaringBitmap
type data.
CREATE TABLE t1 (id integer, bitmap roaringbitmap);
Create a bitmap
Use the
RB_BUILD
function to create a bitmap:INSERT INTO t1 SELECT 1,RB_BUILD(ARRAY[1,2,3,4,5,6,7,8,9,200]);
The above statement inserts a row of data into table
t1
. Theid
field has a value of1
, while thebitmap
field is generated through theRB_BUILD
function. TheRB_BUILD
function accepts an integer array as a parameter and constructs aRoaringBitmap
object. In this example, it constructs a bitmap containing the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, and 200. This means that in this bitmap, these specific positions will be set to 1, indicating that these elements exist.Use the
RB_BUILD_AGG
function to aggregate values into a bitmap:INSERT INTO t1 SELECT 2,RB_BUILD_AGG(e) FROM GENERATE_SERIES(1,100) e;
The above statement uses PostgreSQL’s
GENERATE_SERIES
function to generate a sequence from 1 to 100, with each number serving as the value of variablee
. TheRB_BUILD_AGG
function aggregates these values into aRoaringBitmap
. This function is typically used for aggregation operations, integrating multiple input values into oneRoaringBitmap
. Therefore, theRoaringBitmap
created here contains all integers from 1 to 100. In this insert statement, theid
field has a value of2
.
Bitmap union operations
SELECT RB_OR(a.bitmap,b.bitmap) FROM (SELECT bitmap FROM t1 WHERE id = 1) AS a,(SELECT bitmap FROM t1 WHERE id = 2) AS b;
The above SELECT
statement uses the RB_OR
function to perform a logical “OR” operation on two RoaringBitmap
data sets, which is a union operation.
This statement merges the contents of two specific RoaringBitmap
data sets, making the resulting new RoaringBitmap
contain all elements from both original data sets. This is particularly useful when processing large data sets, such as merging user behavior data from two different time periods or conditions for comprehensive analysis.
Bitmap union and intersection aggregation operations
Use
RB_OR_AGG
for union aggregation operations:SELECT RB_OR_AGG(bitmap) FROM t1;
The above statement uses the
RB_OR_AGG
function to perform a logical “OR” aggregation operation on thebitmap
column of all rows in tablet1
, which is a union operation. This means it will traverse thebitmap
data in each row of the table and merge allbitmaps
into one, where any bit marked as 1 in any row will also be marked as 1 in the resultbitmap
. This is suitable for merging multiple data sets to obtain elements that appear in any of the data sets.Use
RB_AND_AGG
for intersection aggregation operations:SELECT RB_AND_AGG(bitmap) FROM t1;
The
RB_AND_AGG
function performs a logical “AND” aggregation operation on thebitmap
data in tablet1
, which is an intersection operation. This function only sets a corresponding bit in the resultbitmap
to 1 when the corresponding bits in allbitmaps
are 1. This operation is typically used to find common elements across all data sets.Use
RB_XOR_AGG
for symmetric difference aggregation operations:SELECT RB_XOR_AGG(bitmap) FROM t1;
The
RB_XOR_AGG
function performs a logical “XOR” aggregation operation, which is a symmetric difference. This operation compares allbitmaps
and sets those bits in the resultbitmap
that appear in an odd number ofbitmaps
. This means if a bit is 1 in an odd number ofbitmaps
, it will also be 1 in the result; if it is 1 in an even number ofbitmaps
, it will be 0 in the result. This helps identify special elements that appear only in some data sets.Use the
RB_BUILD_AGG
function for aggregation operations:SELECT RB_BUILD_AGG(e) FROM GENERATE_SERIES(1,100) e;
The above statement uses the
RB_BUILD_AGG
function to aggregate consecutive integers generated fromGENERATE_SERIES(1,100)
, creating aRoaringBitmap
that contains these integers. This function is typically used to create a compactRoaringBitmap
from a series of individual elements, which is very useful when processing continuous data or when efficient compression is needed.
Statistical cardinality
SELECT RB_CARDINALITY(bitmap) FROM t1;
The above statement uses the RB_CARDINALITY
function, which aims to calculate the cardinality of the bitmap
field in each row of table t1
, that is, the number of bits set to 1 in that RoaringBitmap
. Cardinality refers to the number of unique elements in a set. In the context of RoaringBitmap
, this means calculating how many different integers are in the set represented by the bitmap. For example, if a RoaringBitmap
contains the numbers 1, 2, 3, and 5, its cardinality is 4.
Convert bitmap to SETOF integer
Use the
RB_ITERATE
function for conversion:SELECT RB_ITERATE(bitmap) FROM t1 WHERE id = 1;
The
RB_ITERATE
function is typically used to iterate through all bits set to 1 in aRoaringBitmap
and return the values they represent. In the above query, this function will iterate through all elements in thebitmap
field whereid
is 1, returning a result set containing all these elements. This makes it convenient to view or process all elements in this bitmap.Use the
RB_ITERATE_DECREMENT
function for conversion:SELECT RB_ITERATE_DECREMENT(bitmap) FROM t1 WHERE id = 1;
The
RB_ITERATE_DECREMENT
function works similarly toRB_ITERATE
, but it returns elements in the bitmap in descending order during iteration. This means if the bitmap contains elements 1, 2, 3, 4, this function will return 4, 3, 2, 1. This is very useful when you need to process or examine elements in the bitmap in order from high to low.
Conversion between bytea and roaringbitmap data types
Convert
roaringbitmap
type tobytea
type:SELECT RB_BUILD('{1,2,3}')::BYTEA;
The above statement first uses the
RB_BUILD
function to create aRoaringBitmap
containing integers 1, 2, and 3. Subsequently, through the type conversion operator::BYTEA
, thisRoaringBitmap
object is converted tobytea
type.bytea
is a data type in PostgreSQL used to store binary data, which means the converted result is the binary representation of theRoaringBitmap
object.Convert
bytea
type toroaringbitmap
type:SELECT '\x3a3000000100000000000000100000000100'::ROARINGBITMAP;
The above statement starts with a binary string of
bytea
type, given in literal form (starting with\x
to indicate hexadecimal binary data). Through the type conversion operator::ROARINGBITMAP
, this binary data is converted toRoaringBitmap
type. This indicates that you can convertRoaringBitmap
data stored in binary format back to its corresponding bitmap data structure.