What is multitenancy, and why is it becoming so important?

The term “software multitenancy” refers to a software architecture in which a single instance of software runs on a server and serves multiple tenants. A tenant is a group of users who share common access, with specific privileges, to the software instance. With a multitenant architecture, a software application is designed to provide every tenant a dedicated share of the instance – including its data, configuration, user management, tenant-specific functionality and non-functional properties. Multitenancy contrasts with multi-instance architectures, where separate software instances operate on behalf of different tenants.
Some regard multitenancy as an important feature of cloud computing.

While multi-tenancy carries forward some of the concepts of mainframe computing to the x86 server ecosystem, the ongoing efforts to scale these mainframe concepts up to support thousands of intra- and inter-enterprise tenants (not just users) are complex, commendable and quite revolutionary. It is only when the required degree of multi-tenancy is incorporated into all the layers of public and private clouds that the promises of improved scalability, agility and economies of scale can be fully delivered.

In cloud computing, the meaning of multi-tenancy architecture has broadened because of new service models that take advantage of virtualization and remote access. A software-as-a-service (SaaS) provider, for example, can run one instance of its application on one instance of a database and provide web access to multiple customers. In such a scenario, each tenant’s data is isolated and remains invisible to other tenants.

What is Oracle Database Multitenancy?

A new option for Oracle Database 12c, Oracle Multitenant helps customers reduce IT costs by simplifying consolidation, provisioning, upgrades, and more.

  • High Consolidation Density
  • Rapid Provisioning and Cloning Using SQL
  • Rapid Patching and Upgrades
  • Manage Many Databases as One
  • Pluggable Database Resource Management
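
These benefits come from the container database (CDB) / pluggable database (PDB) model introduced in 12c. A minimal sketch of PDB provisioning and cloning with SQL (database and user names are illustrative; file-name conversion clauses may be needed depending on storage):

```sql
-- Provision a new pluggable database from the seed (PDB$SEED)
CREATE PLUGGABLE DATABASE salespdb
  ADMIN USER sales_adm IDENTIFIED BY sales_pwd;

-- Clone an existing PDB with a single statement
CREATE PLUGGABLE DATABASE salespdb_dev FROM salespdb;

-- New PDBs start in MOUNTED state; open them for use
ALTER PLUGGABLE DATABASE salespdb OPEN;
ALTER PLUGGABLE DATABASE salespdb_dev OPEN;
```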
Posted in Cloud Computing, Database Technologies, Emerging Trends, Oracle

Oracle 12c top 12 features

Tom Kyte has picked his top 12 features of Oracle Database 12c and put them into a presentation. Here are his picks:

  1. Even better PL/SQL from SQL
  2. Improved defaults
  3. Increased size limits for some datatypes
  4. Easy top-n and pagination queries
  5. Row pattern matching
  6. Partitioning improvements
  7. Adaptive execution plans
  8. Enhanced statistics
  9. Temporary undo
  10. Data optimization capabilities
  11. Application Continuity and Transaction Guard
  12. Pluggable databases




Posted in Database Technologies, Oracle

In-Memory Column Store in Oracle Database 12c Release 1

The In-Memory Column Store feature in Oracle Database 12c Release 1 promises to address such real-time reporting and analytics requirements with ease. It offers users the best of both worlds: the power of OLTP and of OLAP/analytics within a single database. One can place an entire table in memory or choose only selected columns to be in-memory.
It comes at a cost (the 12c option license and a large SGA), but it is worth exploring and should assume a place in our product roadmap.
This feature allows you to store table columns in memory in a columnar format, rather than the typical row format.
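
Enabling the column store is declarative. A minimal sketch, assuming the In-Memory option is licensed and the INMEMORY_SIZE initialization parameter is set (table and column names are illustrative):

```sql
-- Place the whole table in the In-Memory Column Store
ALTER TABLE sales INMEMORY;

-- Keep the table in-memory but exclude selected columns
ALTER TABLE sales INMEMORY NO INMEMORY (customer_notes);

-- Compression level and population priority can also be specified
ALTER TABLE sales INMEMORY MEMCOMPRESS FOR QUERY LOW PRIORITY HIGH;
```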

Row-oriented systems
A row-oriented database serializes all of the values in a row together, then the next row, and so on. For illustration, take a simple employee table with columns (EmpId, Lastname, Salary); a row store lays the rows out one after another:

1,Smith,40000; 2,Jones,50000; 3,Johnson,44000;

Column-oriented systems
A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on. For the same example table, the data would be stored in this fashion:

1,2,3; Smith,Jones,Johnson; 40000,50000,44000;


Comparisons and benefits
If an application can reasonably be assured that most or all of its data fits in memory, huge optimizations are available from in-memory database systems.

Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.

In practice, row-oriented storage layouts are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions. Column-oriented storage layouts are well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve a smaller number of highly complex queries over all data (possibly terabytes).




Posted in Cloud Computing, Database Technologies, Oracle

Re-Organize / Defrag database schema using ALTER TABLE MOVE TABLESPACE option

There are several ways to perform a complete reorganization and reclaim space. This post provides a trick to reorganize all schema objects by generating dynamic SQL built around the ALTER TABLE ... MOVE TABLESPACE option.
A full reorganization (including LOB and partition segments) involves:

  • Moving tables
  • Rebuilding indexes
  • Moving table partition segments
  • Rebuilding index partition segments
  • Moving LOB segments, if present

The following query generates the LOB-move statements:
select 'ALTER TABLE DASCORE_37.' || table_name || ' MOVE LOB(' || column_name || ') STORE AS (TABLESPACE DASCORE_37);'
  from dba_tab_columns
 where owner = 'DASCORE_37'
   and data_type like '%LOB%';

-- Each generated statement has the form:
-- ALTER TABLE tab1 MOVE LOB(lob_column_name) STORE AS (TABLESPACE new_ts);
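
The table-move and index-rebuild steps can be generated the same way; a sketch assuming the same DASCORE_37 schema and target tablespace:

```sql
-- Generate MOVE statements for every table in the schema
select 'ALTER TABLE DASCORE_37.' || table_name || ' MOVE TABLESPACE DASCORE_37;'
  from dba_tables
 where owner = 'DASCORE_37';

-- Moving a table marks its indexes UNUSABLE, so generate rebuilds too
select 'ALTER INDEX DASCORE_37.' || index_name || ' REBUILD TABLESPACE DASCORE_37;'
  from dba_indexes
 where owner = 'DASCORE_37'
   and index_type != 'LOB';
```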
One then needs to execute the output of the dynamic SQL above to reorganize and defragment the tablespace and free up disk space.
Posted in Database Technologies, Oracle

HSQLDB – Hyper SQL Database

HSQLDB (Hyper SQL Database) is a relational database management system written in Java. It has a JDBC driver and supports a large subset of the SQL-92 and SQL:2008 standards.[2] It offers a fast,[3] small (around 1300 kilobytes in version 2.2) database engine with both in-memory and disk-based tables. Both embedded and server modes are available.
Additionally, it includes tools such as a minimal web server, command line and GUI management tools (can be run as applets), and a number of demonstration examples. It can run on Java runtimes from version 1.1 upwards, including free Java runtimes such as Kaffe.
HSQLDB is available under a BSD license. It is used as a database and persistence engine in many open source software projects, such as OpenOffice Base, LibreOffice Base, and the Standalone Roller Demo,[4] as well as in commercial products, such as Mathematica and InstallAnywhere (starting with version 8.0).[5]
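
The in-memory versus disk-based distinction shows up directly in HSQLDB's SQL: MEMORY tables (the default) hold all rows in RAM, while CACHED tables keep rows on disk with a memory cache. A small sketch with illustrative table names:

```sql
-- Connect in-process with JDBC URL jdbc:hsqldb:mem:testdb (pure in-memory)
-- or jdbc:hsqldb:file:/path/to/db (file-backed).

-- MEMORY table: all data held in RAM (default table type)
CREATE MEMORY TABLE fast_lookup (id INT PRIMARY KEY, name VARCHAR(50));

-- CACHED table: data stored on disk, with only part cached in memory
CREATE CACHED TABLE big_store (id INT PRIMARY KEY, payload VARCHAR(10000));
```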

Posted in Database Technologies

A Comparative Analysis of Hadoop Players

With Hadoop 2.2 out, let's have a look at the major players in the Hadoop ecosystem: Hortonworks, Cloudera and MapR.


Hortonworks announced in June the general availability of their Hortonworks Data Platform (HDP). The HDP distro is 100% Apache open source code. The major difference from Cloudera and MapR is that HDP uses Apache Ambari for cluster management and monitoring. In its current 0.9 version, Ambari is certainly not yet as mature as Cloudera's Manager or MapR's Heatmap. The Hortonworks Data Platform is open source to its core, with no proprietary layers, so you will never face vendor lock-in. Lately, HDP has migrated to the latest Hadoop 2.0 codebase.

Microsoft and Teradata have announced that they partnered up with Hortonworks.


Cloudera was the first on the market with their Cloudera Distribution Including Apache Hadoop (CDH). This helped them to acquire valuable experience and to establish a solid customer base. Besides the core Hadoop platform (HDFS, MapReduce, Hadoop Common), CDH integrates ten open source projects including HBase, Mahout, Pig, ZooKeeper and others. Cloudera offers CDH, which is 100% open source, as a free download, as well as a free edition of their mature Cloudera Manager console for administering and managing Hadoop clusters of up to 50 nodes. The enterprise version, on the other hand, combines CDH with a more sophisticated Manager plus an enterprise support package.

Recently, Cloudera inked two significant relationships. IBM announced that, besides its own Hadoop distribution, its BigInsights platform will run the CDH distro. This was closely followed by a partnership with HP.


The major difference from CDH and HDP is that MapR uses its proprietary file system, MapR-FS, instead of HDFS. The reason for this Unix-based file system is that MapR considers HDFS a single point of failure. The current version (v2.0) of their product is based on Apache Hadoop 0.20.2 and ships as the M3 and M5 editions. The fundamental difference between the free community edition M3 and the enterprise edition M5 is M5's extra high-availability features. There is a MapR 2.0 Beta available, which I suppose will be built on Hadoop 2.0.

The company announced two prominent partnerships in June: firstly, both editions (M3 and M5) have been selected as options, alongside Amazon's own version of Hadoop (version 0.20.205), on the Elastic MapReduce service; secondly, MapR is now available on Google Compute Engine.

A list of key players offering Hadoop platform (in alphabetical order):

Amazon Web Services, Bigtop, Cloudera, Cloudspace, Datameer, Data Mine Lab, Datasalt, DataStax, Debian, Greenplum (a division of EMC), Hortonworks, HStreaming, IBM, Impetus, Intel, Karmasphere, Mahout, MapR Technologies, Nutch, NGDATA, Pentaho, Pervasive Software, Platform Computing, Sematext International, Talend, Think Big Analytics, Tresata, VMware (Serengeti) and WANdisco.


The aforementioned companies stand above the rest at the moment. Although they all offer a Hadoop platform, there are slight differences between the distributions in terms of included projects and versions. Hortonworks relies on stable, fully tested and 100% open source products. Cloudera focuses on innovation (or better technology) to drive growth. MapR is taking a different path from the other two with its largely proprietary Hadoop distribution. MapR's sophisticated architecture is getting some leverage, as shown by the recent partnerships with Amazon and Google. In the end it all comes down to what you need from each distribution, as each offers something different.

Posted in Cloud Computing, Database Technologies, Emerging Trends, General, Open Source

What Is Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:

  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Other Hadoop-related projects at Apache include:

  • Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
  • Avro™: A data serialization system.
  • Cassandra™: A scalable multi-master database with no single points of failure.
  • Chukwa™: A data collection system for managing large distributed systems.
  • HBase™: A scalable, distributed database that supports structured data storage for large tables.
  • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout™: A scalable machine learning and data mining library.
  • Pig™: A high-level data-flow language and execution framework for parallel computation.
  • Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
  • ZooKeeper™: A high-performance coordination service for distributed applications.
Posted in Database Technologies, Emerging Trends, Open Source