Below is a simple test comparing Presto and Impala, two typical in-memory SQL query engines; Spark SQL belongs to the same family. Engines of this kind show their advantages when the data volume is large and queries join multiple tables. Here is a set of performance comparisons between Impala and Presto:

Environment: 1 machine with 32 GB of memory and 2 machines with 16 GB of memory; the memory configuration was not fully saturated

Test data: 3 Hive tables of 20 million rows each

Cluster: Impala and Presto are both deployed on the 3 machines

Presto version: presto-server-0.191
(Presto installation:

Impala version: 2.8.0-cdh5.11.0
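Each test below was run several times and the individual times are listed. The post does not say how the times were collected, so the following is only a minimal sketch of one way to repeat a query and record wall-clock durations. `run_query` is a hypothetical callable standing in for whatever submits the SQL to Presto or Impala and fetches the result; it is not part of either product's API.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], runs: int = 3) -> List[float]:
    """Run the query `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical: executes one query and fetches its result
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Report min, mean and max so a single slow run stays visible."""
    return {
        "min": min(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a function that submits SQL to the engine.
    def fake_query() -> None:
        time.sleep(0.01)

    print(summarize(time_query(fake_query)))
```

Timing on the client side like this includes network and result-fetch overhead, so the numbers will not match an engine's own reported elapsed time exactly.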

1. Single-table aggregation


1s (Presto currently reports time only in whole seconds, so anything under 1s is also shown as 1s)




Three runs took 4, 3, 3 (s)


Three runs took 0.74, 0.75, 0.76 (s)

2. Single-value query

Presto: query a single record by ID

Three runs: 6, 5, 6 (s)


All three runs took about 1.7s

3. Two-table join (joining 2 tables of 20 million rows each)


Results of three runs: 9, 11, 9 (s)


Results of three runs: about 7s

4. Three-table join (joining 3 tables of 20 million rows each)


Results of four runs: 13, 11, 15, 12 (s)


Results of three runs: about 8.9s
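The timings listed in the four tests can be reduced to one ratio per test. The sketch below averages each set and divides the slower by the faster. The engine labels are an assumption: the text only marks the first result of test 2 as Presto, so the first set in every test is taken to be Presto and the second Impala, which matches the conclusion in the summary. Results given only as "about N s" are entered as a single value.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Seconds, copied from the four tests. Assumed order: (Presto runs, Impala runs).
RESULTS: Dict[str, Tuple[List[float], List[float]]] = {
    "single-table aggregation": ([4, 3, 3], [0.74, 0.75, 0.76]),
    "single-value query": ([6, 5, 6], [1.7]),
    "two-table join": ([9, 11, 9], [7]),
    "three-table join": ([13, 11, 15, 12], [8.9]),
}


def slowdown_ratios(
    results: Dict[str, Tuple[List[float], List[float]]]
) -> Dict[str, float]:
    """Return mean(first set) / mean(second set) for every test."""
    return {
        name: mean(first) / mean(second)
        for name, (first, second) in results.items()
    }


if __name__ == "__main__":
    for name, ratio in slowdown_ratios(RESULTS).items():
        print(f"{name}: first set took about {ratio:.1f}x as long as the second")
```

With only three or four runs per test and whole-second resolution on one side, these ratios are rough indicators rather than precise measurements.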

Summary: This compares query efficiency in a few scenarios. The data volume is not large, but some points are already visible. What the two have in common is heavy memory consumption; with enough memory and a cluster of appropriate scale, performance should be better. The results above show Impala slightly ahead of Presto in performance. Presto, however, supports a rich set of data sources, including Hive, graph databases, traditional relational databases, Redis, and so on.

Shortcomings: Neither engine supports HBase well, and Presto does not support it at all, but both are well compatible with HDFS and Hive, which is only natural. How the data source is handled is therefore very important. For HBase, Phoenix with its secondary-index queries is also a decent option.