Below is a simple comparison test of Presto and Impala, two typical in-memory databases. Engines of this kind are similar to Spark SQL: they show their advantages when the data volume is large and queries join multiple tables. Here is a set of performance comparisons between Impala and Presto:



Environment: 1 machine with 32 GB of memory and 2 machines with 16 GB of memory; the memory configuration was not set to full capacity

Test data: 3 Hive tables with 20 million rows each

Cluster: Impala and Presto are both deployed on the 3 machines



Presto version: presto-server-0.191
(Presto installation: http://blog.csdn.net/u012551524/article/details/79013194)

Impala version: 2.8.0-cdh5.11.0




1. Single-table aggregation




Presto: count







1s (Presto currently reports elapsed time only in whole seconds, so anything under 1s is also shown as 1s)
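Because the Presto timing here is rounded to whole seconds, sub-second queries are easier to compare when they are timed on the client side. The following is a minimal sketch, assuming a hypothetical `run_query` callable that submits a query and blocks until the result arrives; it is not part of Presto or Impala.

```python
import time


def time_runs(run_query, repeats=3):
    """Return wall-clock durations in seconds for `repeats` calls of run_query."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: submit the query and wait for the result
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a cluster.
    print(time_runs(lambda: sum(range(100_000))))
```

Timings taken this way include client and network overhead, so they are not directly comparable to the times each engine reports itself.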




Impala: count




0.24s




Presto: count, distinct





3 runs: 4, 3, 3 (s)







Impala: count, distinct




3 runs: 0.74, 0.75, 0.76 (s)
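The exact statements used for the count and count-distinct tests are not reproduced here, but queries of this kind usually have the shape sketched below. The table name `orders` and the column `user_id` are hypothetical, and `sqlite3` is used only as a local stand-in so the sketch runs without Presto or Impala.

```python
import sqlite3

# Hypothetical table and column names; the original test tables are not shown.
COUNT_SQL = "SELECT COUNT(*) FROM orders"
COUNT_DISTINCT_SQL = "SELECT COUNT(DISTINCT user_id) FROM orders"


def demo_aggregates():
    """Run both aggregate shapes on a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")  # local stand-in, not Presto or Impala
    conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10), (2, 10), (3, 20), (4, 30)],
    )
    total = conn.execute(COUNT_SQL).fetchone()[0]
    distinct_users = conn.execute(COUNT_DISTINCT_SQL).fetchone()[0]
    conn.close()
    return total, distinct_users


if __name__ == "__main__":
    print(demo_aggregates())
```

A distinct count generally costs more than a plain count because the engine has to deduplicate values before counting them, which fits the longer times measured for it on both engines.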




2. Single-value query



Presto: query the record for a single ID


3 runs: 6, 5, 6 (s)




Impala:




All 3 runs took about 1.7s




3. Two-table join (join of 2 tables with 20 million rows each)

Presto:




3 runs: 9, 11, 9 (s)




Impala:




3 runs: about 7s each




4. Three-table join (join of 3 tables with 20 million rows each)

Presto:




4 runs: 13, 11, 15, 12 (s)




Impala:







3 runs: about 8.9s each
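The join statements themselves are not reproduced here. As an illustration only, a two-table and a three-table equi-join might look like the sketch below. The tables `a`, `b`, `c` and the key `user_id` are hypothetical, and `sqlite3` again stands in for the real engines so the sketch runs locally.

```python
import sqlite3

# Hypothetical schema: three tables sharing a user_id key.
TWO_TABLE_SQL = """
SELECT COUNT(*)
FROM a
JOIN b ON a.user_id = b.user_id
"""
THREE_TABLE_SQL = """
SELECT COUNT(*)
FROM a
JOIN b ON a.user_id = b.user_id
JOIN c ON b.user_id = c.user_id
"""


def demo_joins():
    """Run both join shapes on tiny in-memory tables."""
    conn = sqlite3.connect(":memory:")  # local stand-in, not Presto or Impala
    for name in ("a", "b", "c"):
        conn.execute(f"CREATE TABLE {name} (user_id INTEGER, val TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "a1"), (2, "a2"), (3, "a3")])
    conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, "b1"), (2, "b2")])
    conn.executemany("INSERT INTO c VALUES (?, ?)", [(2, "c2"), (3, "c3")])
    two = conn.execute(TWO_TABLE_SQL).fetchone()[0]
    three = conn.execute(THREE_TABLE_SQL).fetchone()[0]
    conn.close()
    return two, three


if __name__ == "__main__":
    print(demo_joins())
```

Each additional join adds another matching stage over large inputs, which fits the longer times measured for the three-table case on both engines.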





Summary: this is a comparison of query efficiency in a few scenarios. The data volume is not large, but some things are already visible. What the two engines have in common is that they are memory-hungry; with enough memory and a cluster of suitable size, performance should be better. As the results above show, Impala is slightly ahead of Presto in performance. Presto, however, supports a rich set of data sources, including Hive, graph databases, traditional relational databases, Redis, and so on.
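To put a number on the gap, the timings listed above can be averaged per test and compared. This is a small sketch using only the figures reported in this post; where the post says "about X s", that single value stands in for all runs. The plain count test is left out because Presto's 1s reading is rounded.

```python
from statistics import mean

# Timings in seconds as reported above. Where only "about X s" was given,
# that single value is used in place of the individual runs.
TIMINGS = {
    "count distinct": {"presto": [4, 3, 3], "impala": [0.74, 0.75, 0.76]},
    "single-ID lookup": {"presto": [6, 5, 6], "impala": [1.7]},
    "two-table join": {"presto": [9, 11, 9], "impala": [7]},
    "three-table join": {"presto": [13, 11, 15, 12], "impala": [8.9]},
}


def speed_ratios(timings):
    """Map each test to mean Presto time divided by mean Impala time."""
    return {
        name: mean(runs["presto"]) / mean(runs["impala"])
        for name, runs in timings.items()
    }


if __name__ == "__main__":
    for name, ratio in speed_ratios(TIMINGS).items():
        print(f"{name}: Presto took about {ratio:.1f}x as long as Impala")
```

On these numbers Presto took roughly 4.4x as long for count distinct, 3.3x for the single-ID lookup, and 1.4x for each of the two joins, so the gap is widest on the lighter queries and narrows on the joins.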




Shortcomings: neither engine has good support for HBase, and Presto does not support it at all. Both are well compatible with HDFS and Hive, which is only natural. Data source handling is therefore very important. For HBase, Phoenix, with its secondary-index queries, is not bad either.