Accessing data in HBase
HBase is an open-source, distributed data storage system modeled on Google’s Bigtable. HBase runs on top of either the Hadoop Distributed File System (HDFS) or Amazon Simple Storage Service (S3).
At a very basic level, HBase is a map, an abstract data type composed of key/value pairs. Each value can also carry an additional dimension. For example, if a key identifies a cell in a table, a timestamp can serve as the additional dimension, enabling users to retrieve the value that the cell held at a particular time. HBase stores the key/value pairs sorted in lexicographic (byte) order of their keys, and keeps the versions of each value ordered by time, where the highest time values are the most recent.
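The following Python sketch illustrates this data model. It is a toy, in-memory approximation and not HBase code: the class name `VersionedSortedMap` and its methods are hypothetical, keys are plain strings rather than byte arrays, and the rule for reading a value "as of" a time is a simplification.

```python
import bisect


class VersionedSortedMap:
    """Toy model: keys kept in sorted order, each key holding timestamped versions."""

    def __init__(self):
        self._keys = []    # keys in sorted order
        self._cells = {}   # key -> {timestamp: value}

    def put(self, key, value, timestamp):
        # Insert a new key at its sorted position; existing keys gain a version.
        if key not in self._cells:
            bisect.insort(self._keys, key)
            self._cells[key] = {}
        self._cells[key][timestamp] = value

    def get(self, key, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get(key, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        # The highest timestamp is the most recent version.
        return versions[max(candidates)]

    def scan(self):
        """Yield (key, most recent value) pairs in sorted key order."""
        for key in self._keys:
            yield key, self.get(key)


table = VersionedSortedMap()
table.put("row2", "b", timestamp=1)
table.put("row1", "a-old", timestamp=1)
table.put("row1", "a-new", timestamp=5)
```

In this sketch, `table.scan()` visits `row1` before `row2` regardless of insertion order, and `table.get("row1", timestamp=3)` returns the older version because the newer one was written at time 5.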
BIRT Designer Professional supports access to HBase data through Hive, which is a data warehouse infrastructure built on top of Hadoop. Hive facilitates data summarization, queries, and analysis. It provides a mechanism for structuring large data sets and querying the data using a SQL-like language called Hive Query Language (HQL). By using Hive to access data, you can write HQL queries to specify the data to retrieve.
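As a rough illustration of what such HQL can look like, the sketch below holds two statements in Python strings. It assumes that a Hive administrator exposes the HBase table through Hive's HBase storage handler, which is one common approach and may differ in your environment. The HBase table `sales`, the column family `info`, the Hive table `hbase_sales`, and the helper `build_query` are all hypothetical names.

```python
# HQL text held in Python strings purely for illustration.
# All table, column family, and column names are hypothetical.

# Maps an existing HBase table to a Hive table so that HQL can query it.
CREATE_MAPPING = """
CREATE EXTERNAL TABLE hbase_sales (rowkey STRING, customer STRING, amount DOUBLE)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:customer,info:amount")
TBLPROPERTIES ("hbase.table.name" = "sales")
""".strip()


def build_query(min_amount):
    """Return an HQL query that totals amounts per customer above a threshold."""
    # float() rejects non-numeric input before it reaches the query text.
    threshold = float(min_amount)
    return (
        "SELECT customer, SUM(amount) AS total "
        "FROM hbase_sales "
        f"WHERE amount >= {threshold} "
        "GROUP BY customer"
    )
```

A query such as the one returned by `build_query(100)` is the kind of statement that specifies the data to retrieve, while the mapping statement would typically be run once on the Hive side.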
As with other types of data sources, for a report to use data from an HBase system, you must create the following BIRT objects:
* A data source that contains the information to connect to a Hive system
* A data set that specifies the data to retrieve