Using Hadoop data in a report
With data storage requirements approaching several petabytes, relational databases no longer meet the needs of many organizations. Facebook, for example, analyzes 15 terabytes of log data each day. To store and process vast amounts of data, organizations use “big data” systems such as Hadoop. An open-source software framework designed for scalable, distributed computing, Hadoop spreads and manages data on clusters of servers and coordinates work among them.
To achieve reliability and efficiency in distributed processing, Hadoop uses the MapReduce programming model, which divides data-intensive jobs, such as data searches or data aggregation, into discrete tasks that run in parallel across a cluster of servers. In the map phase, the discrete tasks are distributed, or mapped, across the servers, and each server processes its portion of the data. In the reduce phase, the intermediate results are merged, or reduced, into one result set.
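The following Python sketch illustrates the map and reduce phases on a single machine. It is not Hadoop code and does not use any Hadoop API. The log-line format, the function names, and the use of a thread pool to stand in for a cluster of servers are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map task: emit a (status, 1) pair for each log line in one chunk.

    Assumes each line ends with an HTTP status code (hypothetical format).
    """
    return [(line.split()[-1], 1) for line in lines if line.strip()]


def group_by_key(mapped_chunks):
    """Collect the intermediate pairs from all map tasks by key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_group(key, values):
    """Reduce task: merge all values for one key into a single count."""
    return key, sum(values)


def run_job(chunks):
    """Run the map tasks in parallel, then reduce the grouped results."""
    # Worker threads stand in for the servers of a cluster.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    grouped = group_by_key(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two chunks of sample data, as if stored on two different servers.
    sample_chunks = [
        ["GET /index.html 200", "GET /missing 404"],
        ["GET /about.html 200", "POST /login 500"],
    ]
    print(run_job(sample_chunks))
```

Each call to map_chunk works only on its own chunk, which is what allows the map tasks to run independently. Only the grouping step and the reduce tasks need to see results from more than one chunk.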
Actuate BIRT Designer supports access to Hadoop data through Hive, which is a data warehouse infrastructure built on top of Hadoop. Hive facilitates data summarization, queries, and analysis. It provides a mechanism for structuring large data sets and querying the data using a SQL-like language called Hive Query Language (HQL). By using Hive to access data, you can write HQL queries, instead of MapReduce functions, to specify the data to retrieve.
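As a rough illustration of the difference, the sketch below holds an HQL query in a Python string and pairs it with a plain-Python function that computes the same kind of result over a few in-memory rows. The table name web_logs and the columns status and log_date are hypothetical, and the code does not connect to Hive. It only shows how a single declarative query can state an aggregation that would otherwise require hand-written map and reduce functions.

```python
from collections import Counter

# Hypothetical table and column names, for illustration only.
HQL_QUERY = """
SELECT status, COUNT(*) AS hits
FROM web_logs
WHERE log_date = '2012-01-15'
GROUP BY status
"""


def count_hits_by_status(rows, log_date):
    """Plain-Python equivalent of HQL_QUERY over in-memory rows.

    Each row is a dict with 'status' and 'log_date' keys (assumed shape).
    """
    matching = (row["status"] for row in rows if row["log_date"] == log_date)
    return dict(Counter(matching))


if __name__ == "__main__":
    sample_rows = [
        {"status": "200", "log_date": "2012-01-15"},
        {"status": "404", "log_date": "2012-01-15"},
        {"status": "200", "log_date": "2012-01-15"},
        {"status": "200", "log_date": "2012-01-16"},
    ]
    print(HQL_QUERY.strip())
    print(count_hits_by_status(sample_rows, "2012-01-15"))
```

In a report, only the query text would be needed. Hive translates the query into the underlying distributed jobs.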
As with other types of data sources, for a report to use data from a Hadoop system, you must create the following BIRT objects:

Copyright Actuate Corporation 2012