With Hadoop 2.2 in mind, let’s have a look at the major players in the Hadoop ecosystem: Hortonworks, Cloudera and MapR.
In June, Hortonworks announced the general availability of its Hortonworks Data Platform (HDP). The HDP distro is 100% Apache open source code, with no proprietary layers, so there is no vendor lock-in. The major difference from Cloudera and MapR is that HDP uses Apache Ambari for cluster management and monitoring. In its current 0.9 version, Ambari cannot yet be as mature as Cloudera Manager or MapR’s Heatmap. HDP has recently been migrated to the latest Hadoop 2.0 codebase.
Microsoft and Teradata have both announced partnerships with Hortonworks.
Cloudera was the first on the market with its Cloudera Distribution including Apache Hadoop (CDH). This helped the company gain valuable experience and establish a solid customer base. Besides the core Hadoop platform (HDFS, MapReduce, Hadoop Common), CDH integrates 10 open source projects, including HBase, Mahout, Pig and ZooKeeper. Cloudera offers CDH, which is 100% open source, as a free download, together with a free edition of its mature Cloudera Manager console for administering and managing Hadoop clusters of up to 50 nodes. The enterprise version, on the other hand, combines CDH with a more sophisticated Manager and an enterprise support package.
Recently, Cloudera inked two significant relationships. IBM announced that BigInsights will run the CDH distro in addition to IBM’s own Hadoop distribution. This was closely followed by a partnership with HP.
The major difference from CDH and HDP is that MapR uses its proprietary file system, MapR-FS, instead of HDFS. MapR built its own Unix-based file system because it regards HDFS, with its single NameNode, as a single point of failure. The current version (v2.0) of the product is based on Apache Hadoop 0.20.2 and comes in two editions, M3 and M5. The fundamental difference between the free community edition, M3, and the enterprise edition, M5, is that M5 adds high-availability features. A MapR 2.0 Beta is also available, which I expect to be built on Hadoop 2.0.
The company announced two prominent partnerships in June. First, Amazon selected both editions (M3 and M5) for its Elastic MapReduce service, alongside Amazon’s own version of Hadoop (0.20.205). Second, MapR is now available on Google Compute Engine.
Key players offering a Hadoop platform (in alphabetical order):
Amazon Web Services, Bigtop, Cloudera, Cloudspace, Datameer, Data Mine Lab, Datasalt, DataStax, Debian, Greenplum (a division of EMC), Hortonworks, HStreaming, IBM, Impetus, Intel, Karmasphere, Mahout, MapR Technologies, NGDATA, Nutch, Pentaho, Pervasive Software, Platform Computing, Sematext International, Talend, Think Big Analytics, Tresata, VMware (Serengeti), WANdisco
Hortonworks, Cloudera and MapR currently stand above the rest. Although they all offer a Hadoop platform, their distributions differ slightly in the projects and versions they include. Hortonworks relies on stable, fully tested and 100% open source products. Cloudera focuses on innovation, or rather technology, to drive growth. MapR takes a different path from the other two with its largely proprietary Hadoop distribution, and its sophisticated architecture appears to be gaining traction, as its recent partnerships with Amazon and Google show. In the end, it all comes down to what you need from a distribution, as each one offers something different.