HSQLDB – Hyper SQL Database

HSQLDB (Hyper SQL Database) is a relational database management system written in Java. It has a JDBC driver and supports a large subset of the SQL-92 and SQL:2008 standards.[2] It offers a fast,[3] small (around 1300 kilobytes in version 2.2) database engine with both in-memory and disk-based tables. Both embedded and server modes are available.
Additionally, it includes tools such as a minimal web server, command-line and GUI management tools (which can be run as applets), and a number of demonstration examples. It can run on Java runtimes from version 1.1 upwards, including free Java runtimes such as Kaffe.
HSQLDB is available under a BSD license. It is used as a database and persistence engine in many open source software projects, such as OpenOffice Base, LibreOffice Base, and the Standalone Roller Demo,[4] as well as in commercial products, such as Mathematica and InstallAnywhere (starting with version 8.0).[5]

Posted in General

A Comparative Analysis of Hadoop Players

For Hadoop 2.2, let’s have a look at the major players in the Hadoop ecosystem: Hortonworks, Cloudera and MapR.


Hortonworks announced the general availability of its Hortonworks Data Platform (HDP) in June. The HDP distribution is 100% Apache open source code and is open source to its core, with no proprietary layers, so you never face vendor lock-in. The major difference from Cloudera and MapR is that HDP uses Apache Ambari for cluster management and monitoring. At its current version 0.9, Ambari is certainly not yet as mature as Cloudera Manager or MapR’s Heatmap. More recently, HDP has been migrated to the latest Hadoop 2.0 codebase.

Microsoft and Teradata have announced partnerships with Hortonworks.


Cloudera was the first on the market with its Cloudera Distribution including Apache Hadoop (CDH). This helped it gain valuable experience and establish a solid customer base. Besides the core Hadoop platform (HDFS, MapReduce, Hadoop Common), CDH integrates 10 open source projects, including HBase, Mahout, Pig, ZooKeeper and others. Cloudera offers CDH, which is 100% open source, as a free download, along with a free edition of its mature Cloudera Manager console for administering and managing Hadoop clusters of up to 50 nodes. The enterprise version, on the other hand, combines CDH with a more sophisticated Manager and an enterprise support package.

Recently, Cloudera signed two significant partnerships. IBM announced that, besides IBM’s own Hadoop distribution, BigInsights will also run on the CDH distro. This was closely followed by a partnership with HP.


The major difference from CDH and HDP is that MapR uses its proprietary file system, MapR-FS, instead of HDFS. MapR built its own Unix-based file system because it considers HDFS (specifically its NameNode) a single point of failure. The current version (v2.0) of its product is based on Apache Hadoop 0.20.2 and comes in two editions, M3 and M5. The fundamental difference between the free community edition, M3, and the enterprise edition, M5, is that M5 adds high-availability features. A MapR 2.0 beta is also available, which I suppose will be built on Hadoop 2.0.

The company announced two prominent partnerships in June. First, both editions (M3 and M5) were selected for Amazon’s Elastic MapReduce service, in addition to Amazon’s own version of Hadoop (0.20.205). Second, MapR is now available on Google Compute Engine.

A list of key players offering a Hadoop platform (in alphabetical order):

Amazon Web Services, Bigtop, Cloudera, Cloudspace, Datameer, Data Mine Lab, Datasalt, DataStax, Debian, Greenplum (a division of EMC), Hortonworks, HStreaming, IBM, Impetus, Intel, Karmasphere, Mahout, MapR Technologies, NGDATA, Nutch, Pentaho, Pervasive Software, Platform Computing, Sematext International, Talend, Think Big Analytics, Tresata, VMware, Serengeti, WANdisco


Hortonworks, Cloudera and MapR currently stand above the rest. Although they all offer a Hadoop platform, their distributions differ slightly in the projects and versions they include. Hortonworks relies on stable, fully tested and 100% open source products. Cloudera focuses on innovation (that is, on better technology) to drive growth. MapR is taking a different path from the other two with its largely proprietary Hadoop distribution, and its sophisticated architecture is gaining traction, as shown by its recent partnerships with Amazon and Google. In the end, it comes down to what you need from a distribution, as each one offers something different.

Posted in Cloud Computing, Database Technologies, Emerging Trends, General, Open Source

What Is Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:

  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
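
To make the MapReduce programming model a little more concrete, here is a minimal, single-process Python sketch of the classic word-count job. It is not Hadoop's Java MapReduce API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only imitate the map, shuffle/sort and reduce steps that Hadoop would distribute across the nodes of a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop each input split would be mapped on a different node;
    # here every step runs sequentially in a single process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "YARN schedules Hadoop jobs"]
    print(word_count(docs))
    # {'data': 1, 'hadoop': 2, 'hdfs': 1, 'in': 1, 'jobs': 1, 'schedules': 1, 'stores': 1, 'yarn': 1}
```

Because the mapper and reducer depend only on their inputs, a framework such as Hadoop can run many copies of them in parallel and simply re-run a task on another node if one fails, which is how the library handles failures at the application layer.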

Other Hadoop-related projects at Apache include:

  • Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
  • Avro™: A data serialization system.
  • Cassandra™: A scalable multi-master database with no single points of failure.
  • Chukwa™: A data collection system for managing large distributed systems.
  • HBase™: A scalable, distributed database that supports structured data storage for large tables.
  • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout™: A scalable machine learning and data mining library.
  • Pig™: A high-level data-flow language and execution framework for parallel computation.
  • Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
  • ZooKeeper™: A high-performance coordination service for distributed applications.

Posted in General

Big Data

Without doubt, “big data” is the hottest topic in enterprise IT since cloud computing came to prominence five years ago. And the most concrete technology behind the big data trend is Hadoop.

Most enterprises are at least experimenting with Hadoop, and the potential for transformative business improvement is real.

Big data analytics is an over-hyped, poorly defined and overused term. Despite that, and despite the challenges it presents, I believe that for many businesses the opportunities offered by the big data revolution are as significant and fundamental as those offered by e-commerce 15 years ago. Companies (particularly retailers) should be bold and determined in responding to these challenges.

Posted in General

Oracle Database License price with OEM

Product                                 Named-User License   Processor License
Oracle Database Server
  Standard Edition One                  $180                 $5,800
  Standard Edition                      $350                 $17,500
  Enterprise Edition                    $950                 $47,500
    - Real Application Clusters (RAC)   $460                 $23,000
    - Active Data Guard                 $120                 $5,800
    - Partitioning                      $230                 $11,500
    - OLAP                              $460                 $23,000
    - Data Mining                       $460                 $23,000
    - Spatial                           $230                 $11,500
    - Advanced Security                 $230                 $11,500
    - Label Security                    $230                 $11,500

Oracle license prices are in US dollars (per named user, or per processor, i.e. CPU/core), plus an annual support cost of 15-20%.

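Summary: Named-user licensing is the better choice if you have 50 or fewer users per licensed processor; with more than 50 users, license per processor (core). For Standard Edition and Enterprise Edition, one processor license costs exactly as much as 50 named-user licenses, whereas for Standard Edition One the break-even point is lower, at about 32 users.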
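
The break-even arithmetic behind this rule of thumb can be sketched in Python. The prices are copied from the table above; the helper names (`break_even_users`, `first_year_cost`) are hypothetical, the 20% support rate is an assumption taken from the upper end of the quoted 15-20% range, and the sketch deliberately ignores real-world details such as Oracle's processor core factors and minimum named-user counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    name: str
    named_user: int  # USD per named user (from the price table)
    processor: int   # USD per processor (CPU/core) license


EDITIONS = [
    Edition("Standard Edition One", 180, 5_800),
    Edition("Standard Edition", 350, 17_500),
    Edition("Enterprise Edition", 950, 47_500),
]

# Assumption: 20% annual support, the upper end of the quoted 15-20% range.
SUPPORT_PERCENT = 20


def break_even_users(ed: Edition, processors: int = 1) -> float:
    """Named users whose licenses cost the same as `processors` processor licenses."""
    return processors * ed.processor / ed.named_user


def first_year_cost(ed: Edition, users: int, processors: int = 1) -> dict:
    """License price plus one year of support under each model (hypothetical helper)."""
    named = users * ed.named_user
    per_cpu = processors * ed.processor
    # The same support percentage applies to both options,
    # so support never moves the break-even point.
    return {
        "named_user": named * (100 + SUPPORT_PERCENT) // 100,
        "processor": per_cpu * (100 + SUPPORT_PERCENT) // 100,
    }


if __name__ == "__main__":
    for ed in EDITIONS:
        print(f"{ed.name}: break-even at {break_even_users(ed):.1f} users per processor")
    ee = EDITIONS[2]
    print(first_year_cost(ee, users=30))  # {'named_user': 34200, 'processor': 57000}
```

With 30 users on one processor, the named-user option comes out well ahead, which is consistent with the 50-user rule of thumb for Enterprise Edition.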

For Oracle Enterprise Manager (OEM), the following packs are required:

Product                                 Named-User License   Processor License
Diagnostics Pack                        $70                  $3,500
Tuning Pack                             $70                  $3,500

Summary: Since not many people are likely to use OEM, it may make sense to license these packs per named user for just a few users.

Posted in General

Oracle Database 12c: Deprecated and Desupported Features

Oracle Database Changes

Oracle Database 12c introduces changes that affect Oracle Database in general.

This section contains these topics:

Posted in General

Windows Azure SQL Database (formerly SQL Azure)

Windows Azure SQL Database (previously known as SQL Server Data Services, then SQL Services, and most recently SQL Azure) is a cloud-based service from Microsoft offering data-storage capabilities (similar to Amazon Relational Database Service) as part of the Azure Services Platform.

Unlike similar cloud-based databases, SQL Azure allows users to make relational queries against stored data, which can either be structured or semi-structured, or even unstructured documents. SQL Azure features querying data, search, data analysis and data synchronization.
SQL Azure uses a special version of Microsoft SQL Server as its backend.

It provides high availability (by storing multiple copies of each database), elastic scale, and rapid provisioning.

It exposes a subset of the full SQL Server functionality and supports only a subset of its data types, including string, numeric, date and boolean types.

It uses an XML-based format for data transfer. Like Microsoft SQL Server, SQL Azure uses T-SQL as the query language and Tabular Data Stream (TDS) as the protocol to access the service over the internet. (The product does not provide a REST-based API to access the service over HTTP; Microsoft recommends using ADO.NET Data Services for this purpose.)

Posted in General