PostgreSQL: Real-Time Streaming Statistics at Millions of Records per Second

PipelineDB is a streaming relational database built on PostgreSQL (version 0.8.1 is based on 9.4.4). Its defining characteristic is that it processes streaming data automatically: the raw events are never stored, only the processed results are. This makes it an excellent fit for the real-time stream processing workloads that are popular today, such as website traffic statistics, IT service monitoring, App Store browsing statistics, and so on.
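
For the website-traffic use case just mentioned, a single continuous view per metric is usually all it takes. A minimal hedged sketch (the stream name click_stream and its columns are hypothetical):

-- Page views per URL, updated incrementally as events arrive;
-- the raw click events themselves are never stored.
CREATE CONTINUOUS VIEW page_views AS
    SELECT url::text, COUNT(*) AS views
    FROM click_stream
    GROUP BY url;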

http://www.postgresql.org/about/news/1596/
PipelineDB, an open-source relational streaming-SQL database, publicly released version (0.7.7) today and made the product available as open-source via their website and GitHub. PipelineDB is based on, and is wire compatible with, PostgreSQL 9.4 and has added functionality including continuous SQL queries, probabilistic data structures, sliding windowing, and stream-table joins. For a full description of PipelineDB and its capabilities see their technical documentation.

PipelineDB’s fundamental abstraction is what is called a continuous view. These are much like regular SQL views, except that their defining SELECT queries can include streams as a source to read from. The most important property of continuous views is that they only store their output in the database. That output is then continuously updated incrementally as new data flows through streams, and raw stream data is discarded once all continuous views have read it. Let's look at a canonical example:

   CREATE CONTINUOUS VIEW v AS SELECT COUNT(*) FROM stream
Only one row would ever physically exist in PipelineDB for this continuous view, and its value would simply be incremented for each new event ingested.

For more information on PipelineDB as a company, product and for examples and benefits, please check out their first blog post on their new website.
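
Besides plain aggregates, the announcement above mentions sliding windows and stream-table joins. A hedged sketch of both, modeled on the 0.8-era documentation (the stream columns and the regions table are hypothetical; arrival_timestamp is the ingestion time PipelineDB attaches to every stream event):

-- Sliding window: only events from the last minute contribute to the count.
CREATE CONTINUOUS VIEW recent_count AS
    SELECT COUNT(*) FROM stream
    WHERE arrival_timestamp > clock_timestamp() - interval '1 minute';

-- Stream-table join: enrich stream events against a regular table.
CREATE CONTINUOUS VIEW per_region AS
    SELECT r.region, COUNT(*)
    FROM stream s JOIN regions r ON s.region_id::int = r.id
    GROUP BY r.region;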

Example:
Create dynamic continuous views on a stream. No table definition is required at all, which is wonderful; it is like NoSQL living inside a relational database.

pipeline=# CREATE CONTINUOUS VIEW v0 AS SELECT COUNT(*) FROM stream;
CREATE CONTINUOUS VIEW
pipeline=# CREATE CONTINUOUS VIEW v1 AS SELECT COUNT(*) FROM stream;
CREATE CONTINUOUS VIEW

Activate the continuous views

pipeline=# ACTIVATE;
ACTIVATE 2

Write data into the stream

pipeline=# INSERT INTO stream (x) VALUES (1);
INSERT 0 1
pipeline=# SET stream_targets TO v0;
SET
pipeline=# INSERT INTO stream (x) VALUES (1);
INSERT 0 1
pipeline=# SET stream_targets TO DEFAULT;
SET
pipeline=# INSERT INTO stream (x) VALUES (1);
INSERT 0 1
-- to stop consuming stream data, simply deactivate
pipeline=# DEACTIVATE;
DEACTIVATE 2

Query the continuous views. Note that v0 received all three inserts, while v1 missed the one made while stream_targets was set to v0:

pipeline=# SELECT count FROM v0;
 count
-------
     3
(1 row)
pipeline=# SELECT count FROM v1;
 count
-------
     2
(1 row)
pipeline=#

Try it out in a local virtual machine
Installation

[root@digoal soft_bak]# rpm -ivh pipelinedb-0.8.1-centos6-x86_64.rpm
Preparing...                ########################################### [100%]
   1:pipelinedb             ########################################### [100%]
/sbin/ldconfig: /opt/gcc4.9.3/lib/libstdc++.so.6.0.20-gdb.py is not an ELF file - it has the wrong magic bytes at the start.

/sbin/ldconfig: /opt/gcc4.9.3/lib64/libstdc++.so.6.0.20-gdb.py is not an ELF file - it has the wrong magic bytes at the start.

    ____  _            ___            ____  ____
   / __ \(_)___  ___  / (_)___  ___  / __ \/ __ )
  / /_/ / / __ \/ _ \/ / / __ \/ _ \/ / / / __  |
 / ____/ / /_/ /  __/ / / / / /  __/ /_/ / /_/ /
/_/   /_/ .___/\___/_/_/_/ /_/\___/_____/_____/
       /_/

PipelineDB successfully installed. To get started, initialize a
database directory:

pipeline-init -D <data directory>

where <data directory> is a nonexistent directory where you'd
like all of your database files to live. 

You can find the PipelineDB documentation at:

http://docs.pipelinedb.com

Configuration

[root@digoal soft_bak]# cd /usr/lib/pipelinedb
[root@digoal pipelinedb]# ll
total 16
drwxr-xr-x 2 root root 4096 Oct 15 10:47 bin
drwxr-xr-x 5 root root 4096 Oct 15 10:47 include
drwxr-xr-x 6 root root 4096 Oct 15 10:47 lib
drwxr-xr-x 4 root root 4096 Oct 15 10:47 share

[root@digoal pipelinedb]# useradd pdb
[root@digoal pipelinedb]# vi /home/pdb/.bash_profile
# add by digoal
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=1953
export PGDATA=/data01/pg_root_1953
export LANG=en_US.utf8
export PGHOME=/usr/lib/pipelinedb
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGDATABASE=pipeline
export PGUSER=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi

[root@digoal pipelinedb]# mkdir /data01/pg_root_1953
[root@digoal pipelinedb]# chown pdb:pdb /data01/pg_root_1953
[root@digoal pipelinedb]# chmod 700 /data01/pg_root_1953

[root@digoal pipelinedb]# su - pdb
pdb@digoal-> which psql
/usr/lib/pipelinedb/bin/psql

Initialize the database

pdb@digoal-> psql -V
psql (PostgreSQL) 9.4.4

pdb@digoal-> cd /usr/lib/pipelinedb/bin/
pdb@digoal-> ll
total 13M
-rwxr-xr-x 1 root root  62K Sep 18 01:01 clusterdb
-rwxr-xr-x 1 root root  62K Sep 18 01:01 createdb
-rwxr-xr-x 1 root root  66K Sep 18 01:01 createlang
-rwxr-xr-x 1 root root  63K Sep 18 01:01 createuser
-rwxr-xr-x 1 root root  44K Sep 18 01:02 cs2cs
-rwxr-xr-x 1 root root  58K Sep 18 01:01 dropdb
-rwxr-xr-x 1 root root  66K Sep 18 01:01 droplang
-rwxr-xr-x 1 root root  58K Sep 18 01:01 dropuser
-rwxr-xr-x 1 root root 776K Sep 18 01:01 ecpg
-rwxr-xr-x 1 root root  28K Sep 18 00:57 gdaladdo
-rwxr-xr-x 1 root root  79K Sep 18 00:57 gdalbuildvrt
-rwxr-xr-x 1 root root 1.3K Sep 18 00:57 gdal-config
-rwxr-xr-x 1 root root  33K Sep 18 00:57 gdal_contour
-rwxr-xr-x 1 root root 188K Sep 18 00:57 gdaldem
-rwxr-xr-x 1 root root  74K Sep 18 00:57 gdalenhance
-rwxr-xr-x 1 root root 131K Sep 18 00:57 gdal_grid
-rwxr-xr-x 1 root root  83K Sep 18 00:57 gdalinfo
-rwxr-xr-x 1 root root  90K Sep 18 00:57 gdallocationinfo
-rwxr-xr-x 1 root root  42K Sep 18 00:57 gdalmanage
-rwxr-xr-x 1 root root 236K Sep 18 00:57 gdal_rasterize
-rwxr-xr-x 1 root root  25K Sep 18 00:57 gdalserver
-rwxr-xr-x 1 root root  77K Sep 18 00:57 gdalsrsinfo
-rwxr-xr-x 1 root root  49K Sep 18 00:57 gdaltindex
-rwxr-xr-x 1 root root  33K Sep 18 00:57 gdaltransform
-rwxr-xr-x 1 root root 158K Sep 18 00:57 gdal_translate
-rwxr-xr-x 1 root root 168K Sep 18 00:57 gdalwarp
-rwxr-xr-x 1 root root  41K Sep 18 01:02 geod
-rwxr-xr-x 1 root root 1.3K Sep 18 00:51 geos-config
lrwxrwxrwx 1 root root    4 Oct 15 10:47 invgeod -> geod
lrwxrwxrwx 1 root root    4 Oct 15 10:47 invproj -> proj
-rwxr-xr-x 1 root root  20K Sep 18 01:02 nad2bin
-rwxr-xr-x 1 root root 186K Sep 18 00:57 nearblack
-rwxr-xr-x 1 root root 374K Sep 18 00:57 ogr2ogr
-rwxr-xr-x 1 root root  77K Sep 18 00:57 ogrinfo
-rwxr-xr-x 1 root root 283K Sep 18 00:57 ogrlineref
-rwxr-xr-x 1 root root  47K Sep 18 00:57 ogrtindex
-rwxr-xr-x 1 root root  30K Sep 18 01:01 pg_config
-rwxr-xr-x 1 root root  30K Sep 18 01:01 pg_controldata
-rwxr-xr-x 1 root root  33K Sep 18 01:01 pg_isready
-rwxr-xr-x 1 root root  39K Sep 18 01:01 pg_resetxlog
-rwxr-xr-x 1 root root 183K Sep 18 01:02 pgsql2shp
lrwxrwxrwx 1 root root    4 Oct 15 10:47 pipeline -> psql
-rwxr-xr-x 1 root root  74K Sep 18 01:01 pipeline-basebackup
lrwxrwxrwx 1 root root    9 Oct 15 10:47 pipeline-config -> pg_config
-rwxr-xr-x 1 root root  44K Sep 18 01:01 pipeline-ctl
-rwxr-xr-x 1 root root 355K Sep 18 01:01 pipeline-dump
-rwxr-xr-x 1 root root  83K Sep 18 01:01 pipeline-dumpall
-rwxr-xr-x 1 root root 105K Sep 18 01:01 pipeline-init
-rwxr-xr-x 1 root root  50K Sep 18 01:01 pipeline-receivexlog
-rwxr-xr-x 1 root root  56K Sep 18 01:01 pipeline-recvlogical
-rwxr-xr-x 1 root root 153K Sep 18 01:01 pipeline-restore
-rwxr-xr-x 1 root root 6.2M Sep 18 01:01 pipeline-server
lrwxrwxrwx 1 root root   15 Oct 15 10:47 postmaster -> pipeline-server
-rwxr-xr-x 1 root root  49K Sep 18 01:02 proj
-rwxr-xr-x 1 root root 445K Sep 18 01:01 psql
-rwxr-xr-x 1 root root 439K Sep 18 01:02 raster2pgsql
-rwxr-xr-x 1 root root  62K Sep 18 01:01 reindexdb
-rwxr-xr-x 1 root root 181K Sep 18 01:02 shp2pgsql
-rwxr-xr-x 1 root root  27K Sep 18 00:57 testepsg
-rwxr-xr-x 1 root root  63K Sep 18 01:01 vacuumdb

pdb@digoal-> pipeline-init -D $PGDATA -U postgres -E UTF8 --locale=C -W
pdb@digoal-> cd $PGDATA
pdb@digoal-> ll
total 108K
drwx------ 5 pdb pdb 4.0K Oct 15 10:57 base
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 global
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_clog
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_dynshmem
-rw------- 1 pdb pdb 4.4K Oct 15 10:57 pg_hba.conf
-rw------- 1 pdb pdb 1.6K Oct 15 10:57 pg_ident.conf
drwx------ 4 pdb pdb 4.0K Oct 15 10:57 pg_logical
drwx------ 4 pdb pdb 4.0K Oct 15 10:57 pg_multixact
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_notify
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_replslot
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_serial
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_snapshots
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_stat
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_stat_tmp
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_subtrans
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_tblspc
drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_twophase
-rw------- 1 pdb pdb    4 Oct 15 10:57 PG_VERSION
drwx------ 3 pdb pdb 4.0K Oct 15 10:57 pg_xlog
-rw------- 1 pdb pdb   88 Oct 15 10:57 pipelinedb.auto.conf
-rw------- 1 pdb pdb  23K Oct 15 10:57 pipelinedb.conf

Parameters related to stream processing include the in-memory buffer size, synchronous commit behavior, batch sizes for combining, the number of worker processes, and so on.

pipelinedb.conf
#------------------------------------------------------------------------------
# CONTINUOUS VIEW OPTIONS
#------------------------------------------------------------------------------

# size of the buffer for storing unread stream tuples
#tuple_buffer_blocks = 128MB

# synchronization level for combiner commits; off, local, remote_write, or on
#continuous_query_combiner_synchronous_commit = off

# maximum amount of memory to use for combiner query executions
#continuous_query_combiner_work_mem = 256MB

# maximum memory to be used by the combiner for caching; this is independent
# of combiner_work_mem
#continuous_query_combiner_cache_mem = 32MB

# the default fillfactor to use for continuous views
#continuous_view_fillfactor = 50

# the time in milliseconds a continuous query process will wait for a batch
# to accumulate
# continuous_query_max_wait = 10

# the maximum number of events to accumulate before executing a continuous query
# plan on them
#continuous_query_batch_size = 10000

# the number of parallel continuous query combiner processes to use for
# each database
#continuous_query_num_combiners = 2

# the number of parallel continuous query worker processes to use for
# each database
#continuous_query_num_workers = 2

# allow direct changes to be made to materialization tables?
#continuous_query_materialization_table_updatable = off

# inserts into streams should be synchronous?
#synchronous_stream_insert = off

# continuous views that should be affected when writing to streams.
# it is string with comma separated values for continuous view names.
#stream_targets = ''
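
Several of these appear to be ordinary GUCs, so presumably they can also be changed per session rather than only in pipelinedb.conf. A hedged sketch, assuming the 0.8.x parameter names above behave as regular session-settable settings (stream_targets is in fact set this way in the example earlier):

-- Make stream INSERTs wait until continuous views have consumed them
-- (trades throughput for read-your-writes semantics).
SET synchronous_stream_insert = on;

-- Route subsequent writes to a single continuous view, as demonstrated earlier.
SET stream_targets TO v0;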

Start the database. You can see that PostGIS is supported out of the box; one has to wonder, was this project developed specifically for NASA?

pdb@digoal-> pipeline-ctl start
pdb@digoal-> psql pipeline postgres
psql (9.4.4)
Type "help" for help.

pipeline=# \l
                             List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges
-----------+----------+----------+---------+-------+-----------------------
 pipeline  | postgres | UTF8     | C       | C     |
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
(3 rows)
pipeline=# \dx
                                          List of installed extensions
       Name       | Version  |   Schema   |                             Description
------------------+----------+------------+---------------------------------------------------------------------
 plpgsql          | 1.0      | pg_catalog | PL/pgSQL procedural language
 postgis          | 2.2.0dev | pg_catalog | PostGIS geometry, geography, and raster spatial types and functions
 postgis_topology | 2.2.0dev | topology   | PostGIS topology spatial types and functions
(3 rows)

Check which functions PipelineDB adds. Some arrive as extensions, such as PostGIS; others are built in and can be used directly, or borrowed as reference implementations.

pipeline=# select proname from pg_proc order by oid desc;
......
 second
 minute
 hour
 day
 month
 year
......
 cmsketch_empty
 tdigest_add
 tdigest_empty
 tdigest_empty
 bloom_add
 bloom_empty
 bloom_empty
 hll_add
 hll_empty
 hll_empty
......

As you can see, PipelineDB ships the HLL (HyperLogLog), bloom filter, t-digest, and count-min sketch algorithms. There is plenty more to explore, such as continuous views that support grouping sets and window queries. A sketch of the probabilistic aggregates follows below.
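
For instance, COUNT(DISTINCT ...) in a continuous view is reportedly backed by HyperLogLog, and percentile_cont by t-digest, so the stored state stays small no matter how many events flow through. A hedged sketch (stream and column names are hypothetical; day() is among the functions listed above):

-- Approximate distinct users per day; internally a HyperLogLog, not a hash table.
CREATE CONTINUOUS VIEW daily_uv AS
    SELECT day(arrival_timestamp) AS d, COUNT(DISTINCT uid::int) AS uv
    FROM click_stream
    GROUP BY d;

-- Approximate 99th percentile latency; internally a t-digest.
CREATE CONTINUOUS VIEW p99_latency AS
    SELECT percentile_cont(0.99) WITHIN GROUP (ORDER BY latency::int)
    FROM request_stream;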

Performance test in a virtual machine on my laptop:
Create five dynamic continuous views. A dynamic continuous view is simply a continuous view that requires no base table to be created first.

CREATE CONTINUOUS VIEW v0 AS SELECT COUNT(*) FROM stream;
CREATE CONTINUOUS VIEW v1 AS SELECT sum(x::int),count(*),avg(y::int) FROM stream;
CREATE CONTINUOUS VIEW v001 AS SELECT sum(x::int),count(*),avg(y::int) FROM stream1;
CREATE CONTINUOUS VIEW v002 AS SELECT sum(x::int),count(*),avg(y::int) FROM stream2;
CREATE CONTINUOUS VIEW v003 AS SELECT sum(x::int),count(*),avg(y::int) FROM stream3;

Activate stream processing

activate;

Check the data dictionary (continuous views are registered in pg_class with relkind = 'C')

select relname from pg_class where relkind='C';

Batch insert test

pdb@digoal-> vi test.sql
insert into stream(x,y,z) select generate_series(1,1000),1,1;
insert into stream1(x,y,z) select generate_series(1,1000),1,1;
insert into stream2(x,y,z) select generate_series(1,1000),1,1;
insert into stream3(x,y,z) select generate_series(1,1000),1,1;

Test results. Note that the simple or extended protocol must be used here; with prepared, only the last SQL statement takes effect. It is not yet clear whether this is a bug in PipelineDB or in pgbench.

pdb@digoal-> /opt/pgsql/bin/pgbench -M extended -n -r -f ./test.sql -P 1 -c 10 -j 10 -T 100000
progress: 1.0 s, 133.8 tps, lat 68.279 ms stddev 58.444
progress: 2.0 s, 143.9 tps, lat 71.623 ms stddev 53.880
progress: 3.0 s, 149.5 tps, lat 66.452 ms stddev 49.727
progress: 4.0 s, 148.3 tps, lat 67.085 ms stddev 55.484
progress: 5.1 s, 145.7 tps, lat 68.624 ms stddev 67.795

That is roughly 145 transactions per second, each transaction running four INSERT ... SELECT statements of 1,000 rows, so about 145 × 4 × 1000 ≈ 580,000 records are ingested per second while all five continuous views are kept up to date.
On a physical machine this could presumably reach the level of 5 million records per second; I will try that when I have time.
Everything happens in memory, which is why it is so fast.
PipelineDB uses background worker processes (workers and combiners) to execute the continuous queries and merge their results.
top output during the stress test:

top - 11:23:07 up  2:49,  4 users,  load average: 1.83, 3.08, 1.78
Tasks: 177 total,   5 running, 172 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.6%us, 15.0%sy, 10.3%ni, 63.0%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3916744k total,   605084k used,  3311660k free,    27872k buffers
Swap:  1048572k total,        0k used,  1048572k free,   401748k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11469 pdb       25   5  405m  75m  67m R 52.9  2.0   1:56.45 pipeline: bgworker: worker0 [pipeline]
12246 pdb       20   0  400m  69m  67m S 14.3  1.8   0:10.55 pipeline: postgres pipeline [local] idle
12243 pdb       20   0  400m  69m  67m S 13.3  1.8   0:10.45 pipeline: postgres pipeline [local] idle
12248 pdb       20   0  400m  69m  67m S 13.3  1.8   0:10.40 pipeline: postgres pipeline [local] idle
12244 pdb       20   0  400m  69m  67m S 12.6  1.8   0:10.50 pipeline: postgres pipeline [local] idle
12237 pdb       20   0  400m  69m  67m R 12.3  1.8   0:10.52 pipeline: postgres pipeline [local] idle
12247 pdb       20   0  402m  70m  67m R 12.3  1.8   0:10.70 pipeline: postgres pipeline [local] idle
12245 pdb       20   0  401m  69m  67m S 12.0  1.8   0:10.78 pipeline: postgres pipeline [local] idle
12235 pdb       20   0  400m  69m  67m S 11.3  1.8   0:10.88 pipeline: postgres pipeline [local] idle
12239 pdb       20   0  400m  69m  67m S 11.0  1.8   0:10.79 pipeline: postgres pipeline [local] idle
12241 pdb       20   0  400m  69m  67m S 11.0  1.8   0:10.53 pipeline: postgres pipeline [local] idle
11466 pdb       20   0  119m 1480  908 R  5.3  0.0   0:58.39 pipeline: stats collector process
11468 pdb       25   5  401m  12m 9744 S  2.3  0.3   0:16.49 pipeline: bgworker: combiner0 [pipeline]
12228 pdb       20   0  678m 3408  884 S  2.3  0.1   0:02.36 /opt/pgsql/bin/pgbench -M extended -n -r -f ./test.sql -P 1 -c 10 -j 10 -T 100000
11464 pdb       20   0  398m  17m  16m S  1.7  0.4   0:10.47 pipeline: wal writer process
11459 pdb       20   0  398m 153m 153m S  0.0  4.0   0:00.37 /usr/lib/pipelinedb/bin/pipeline-server
11460 pdb       20   0  115m  852  424 S  0.0  0.0   0:00.02 pipeline: logger process
11462 pdb       20   0  398m 3336 2816 S  0.0  0.1   0:00.06 pipeline: checkpointer process
11463 pdb       20   0  398m 2080 1604 S  0.0  0.1   0:00.08 pipeline: writer process
11465 pdb       20   0  401m 4460 1184 S  0.0  0.1   0:00.33 pipeline: autovacuum launcher process
11467 pdb       20   0  398m 1992 1056 S  0.0  0.1   0:00.00 pipeline: continuous query scheduler process

pdb@digoal-> psql
psql (9.4.4)
Type "help" for help.
pipeline=# select * from v0;
  count
---------
 9732439
(1 row)

pipeline=# select * from v1;
    sum     |  count  |          avg
------------+---------+------------------------
 4923514276 | 9837585 | 1.00000000000000000000
(1 row)

pipeline=# select * from v001;
     sum      |  count   |          avg
--------------+----------+------------------------
 505023543131 | 11036501 | 1.00000000000000000000
(1 row)

pipeline=# select * from v002;
      sum      |  count   |          avg
---------------+----------+------------------------
 1005065536319 | 12119513 | 1.00000000000000000000
(1 row)

pipeline=# select * from v003;
     sum     |  count   |          avg
-------------+----------+------------------------
 14948355485 | 29867002 | 1.00000000000000000000
(1 row)

After 1 billion stream records have been written, the database is still only 13 MB in size, because stream data lives in memory and is discarded once it has been processed.

pipeline=# \l+
                                                              List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges   | Size  | Tablespace |                Description
-----------+----------+----------+---------+-------+-----------------------+-------+------------+--------------------------------------------
 pipeline  | postgres | UTF8     | C       | C     |                       | 13 MB | pg_default | default administrative connection database
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +| 12 MB | pg_default | unmodifiable empty database
           |          |          |         |       | postgres=CTc/postgres |       |            |
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +| 12 MB | pg_default | default template for new databases
           |          |          |         |       | postgres=CTc/postgres |       |            |
(3 rows)

If your application has a similar scenario, congratulations: you have found a killer feature.

[References]
https://github.com/pipelinedb/pipelinedb
https://www.pipelinedb.com/
