Overview
GreptimeDB provides a continuous aggregation feature that allows you to aggregate data in real-time. This feature is useful when you need to calculate and query the sum, average, or other aggregations on the fly. The continuous aggregation feature is provided by the Flow engine. It continuously updates the aggregated data based on the incoming data and materialize it.
When you insert data into the source table, the data is also sent to and stored in the Flow engine. The Flow engine calculate the aggregation by time windows and store the result in the sink table. The entire process is illustrated in the following image:
Quick start with an example
Here is a complete example of how a continuous aggregation query looks like.
First, create a source table numbers_input
and a sink table out_num_cnt
with following clauses:
CREATE TABLE numbers_input (
number INT,
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(number),
TIME INDEX(ts)
);
CREATE TABLE out_num_cnt (
sum_number BIGINT,
start_window TIMESTAMP TIME INDEX,
end_window TIMESTAMP,
update_at TIMESTAMP,
);
Then create the flow test_numbers
to aggregate the sum of number
column in numbers_input
table. The aggregation is calculated in 1-second fixed windows.
CREATE FLOW test_numbers
SINK TO out_num_cnt
AS
SELECT sum(number) FROM numbers_input GROUP BY tumble(ts, '1 second', '2021-07-01 00:00:00');
To observe the outcome of the continuous aggregation in the out_num_cnt
table, insert some data into the source table numbers_input
.
INSERT INTO numbers_input
VALUES
(20, "2021-07-01 00:00:00.200"),
(22, "2021-07-01 00:00:00.600");
The sum of the number
column is 42 (20+22), so the sink table out_num_cnt
should have the following data:
SELECT * FROM out_num_cnt;
sum_number | start_window | end_window | update_at
------------+----------------------------+----------------------------+----------------------------
42 | 2021-07-01 00:00:00.000000 | 2021-07-01 00:00:01.000000 | 2024-05-17 08:32:56.026000
(1 row)
Try to insert more data into the numbers_input
table:
INSERT INTO numbers_input
VALUES
(23,"2021-07-01 00:00:01.000"),
(24,"2021-07-01 00:00:01.500");
The sink table out_num_cnt
now contains two rows: representing the sum data 42 and 47 (23+24) for the two respective 1-second windows.
SELECT * FROM out_num_cnt;
sum_number | start_window | end_window | update_at
------------+----------------------------+----------------------------+----------------------------
42 | 2021-07-01 00:00:00.000000 | 2021-07-01 00:00:01.000000 | 2024-05-17 08:32:56.026000
47 | 2021-07-01 00:00:01.000000 | 2021-07-01 00:00:02.000000 | 2024-05-17 08:33:10.048000
(2 rows)
Here is the explanation of the columns in the out_num_cnt
table:
sum_number
: the sum of thenumber
column in the window.start_window
: the start time of the window.end_window
: the end time of the window.update_at
: the time when the row data is updated.
The start_window
, end_window
, and update_at
columns are automatically added by the time window functions of Flow engine.
Next Steps
Congratulations you already have a preliminary understanding of the continuous aggregation feature. Please refer to the following sections to learn more:
- Manage Flows describes how to create, update, and delete a flow. Each of your continuous aggregation query is a flow.
- Write a Query describes how to write a continuous aggregation query.
- Define Time Window describes how to define the time window for the continuous aggregation. Time window is an important attribute of your continuous aggregation query. It defines the time interval for the aggregation.
- Expression is a reference of available expressions in the continuous aggregation query.