Linux.中国 - 开源社区

 找回密码
 骑士注册

QQ登录

微博登录


Apache Drill 1.0发布

2015-5-21 14:53       

https://dn-linuxcn.qbox.me/data/attachment/album/201505/21/145302d7pbp8dr7pp084p4.jpg

虽然大数据往往将关系型数据库当作靶子,但事实上真正生产环境的Hadoop和Spark等大数据平台,每天大部分工作仍然是为SQL查询提供服务,所以,SQL on Hadoop就成了竞争最激烈的技术领域。

5月19日,Apache基金会宣布针对Hadoop、NoSQL(MongoDB和HBase)和云存储(Amazon S3, Google Cloud Storage, Azure Blog Storage, Swift)的无模式SQL查询引擎Drill 1.0发布。

项目的PMC成员Tomer Shiran说:

这是许多公司数十名工程师将近三年开发的成果。Apache Drill的灵活性和易用性已经吸引了数千维护,而1.0版的企业级可靠性、安全与性能将进一步加速采用。

发布声明中列出的相对于0.9的重要改进包括:

  • Substantial improvements in stability, memory handling and performance
  • Improvements in Drill CLI experience with addition of convenience shortcuts and improved colors/alignment
  • Substantial additions to documentation including coverage of troubleshooting, performance tuning and many additions to the SQL reference
  • Enhancements in join planning to facilitate high speed planning of large and complicated joins
  • Add support for new context functions including CURRENTUSER and CURRENTSCHEMA
  • Ability to treat all numbers as approximate decimals when reading JSON
  • Enhancements in Drill's text and CSV handling to support first row skipping, configurable field/line delimiters and configurable quoting
  • Improved JDBC compatibility (and tracing proxy for easy debugging).
  • Ability to do JDBC connections with direct urls (avoiding ZooKeeper)
  • Automatic selection of spooling or back-pressure exchange semantics to avoid distributed deadlocks in complex sort-heavy queries
  • Improvements in query profile reporting
  • Addition of ILIKE(VARCHAR, PATTERN) and SUBSTR(VARCHAR, REGEX) functions

更多详情可以参考官方网站:http://drill.apache.org

Drill实际上是MapR在主导的,项目负责人和核心开发者大多来自MapR。它实际上是众多SQL on Hadoop中的一个,此外还包括:

  • Hadoop上原生的Hive
  • Hortonworks主导的Hive演进项目Stinger
  • Cloudera主导的Impala
  • MapR主导的Apache Drill
  • Facebook的Presto
  • Pivotal的Greenplum
  • Salesforce最初开发的Apache Phoenix
  • 出自韩国的Apache Tajo(Google Tenzing的模仿)
  • Spark社区的Spark SQL
  • Splice Machine

从逻辑上来说,除了Spark SQL会借助Spark的火势取得一定优势外,其余最值得关注的还是Hadoop三巨头分别支持的Impala、Stinger和Drill。Impala和Drill都是在Google Dremel启发下产生的,之前好像Impala势头较猛,但现在Drill有迎头赶上的意思。

Drill这次正式发布,在FAQ里专门做了比较:

https://dn-linuxcn.qbox.me/data/attachment/album/201505/21/145302dv99v0p30zy2kf2c.png

 

发表评论


最新评论

我也要发表评论

相关阅读

返回顶部

分享到微信朋友圈

打开微信,点击底部的“发现”,
使用“扫一扫”将网页分享至朋友圈。