Tuesday, March 6, 2012

HBase Composite Row Key Design: Doing Table Scan Using Partial Row Key

HBase is a column-oriented NoSQL database that uses the Hadoop Distributed File System (HDFS) for data storage. Rows in HBase are stored sorted by row key, compared byte by byte (lexicographically), which looks alphabetical for plain ASCII keys. Typically one creates an HTable with one or more column families and stores data in it.
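Row keys are compared byte by byte, with each byte treated as an unsigned value, so the order only looks alphabetical for simple lowercase ASCII keys. The following plain-Java sketch (no HBase dependency; the class RowKeyOrder and its methods are hypothetical names) reproduces that ordering with a TreeSet. It shows why "user10" sorts before "user9" and why uppercase keys sort before lowercase ones. In real client code, HBase's own org.apache.hadoop.hbase.util.Bytes.compareTo utility performs this kind of comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class RowKeyOrder {
    // Compares two row keys byte by byte, treating each byte as unsigned
    // (0..255); a key that is a prefix of a longer key sorts first.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Returns the keys in the order a full-table scan would visit them.
    static List<String> sorted(String... keys) {
        TreeSet<byte[]> set = new TreeSet<>(RowKeyOrder::compare);
        for (String k : keys) {
            set.add(k.getBytes(StandardCharsets.UTF_8));
        }
        List<String> out = new ArrayList<>();
        for (byte[] k : set) {
            out.add(new String(k, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [User2, user1, user10, user9]
        System.out.println(sorted("user9", "user10", "User2", "user1"));
    }
}
```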

A column family is a collection of dynamic columns: column names do not have to be declared in the schema and can be chosen when data is stored, so a row can have any number of columns with no schema-imposed limit. Data in a table is stored using a row key together with a column family, column name and column value. The row key is just a byte array.
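The logical layout can be pictured as nested sorted maps: row key, then column family, then column name, then value. The sketch below is a simplified in-memory model, not HBase code: LogicalTable is a hypothetical class, keys are Strings rather than byte[], and cell timestamps and versions are ignored. It does capture the split described above: column families are declared with the table, while column names can be invented on every put.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

final class LogicalTable {
    // Column families are fixed up front, like a table schema.
    private final Set<String> families;
    // row key -> column family -> column name -> value
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    LogicalTable(String... families) {
        this.families = new HashSet<>(Arrays.asList(families));
    }

    // Any column name may be used; only the family must already exist.
    void put(String row, String family, String column, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(column, value);
    }

    // Returns null when the row, family or column is absent.
    String get(String row, String family, String column) {
        TreeMap<String, TreeMap<String, String>> byFamily = rows.get(row);
        if (byFamily == null) return null;
        TreeMap<String, String> byColumn = byFamily.get(family);
        return byColumn == null ? null : byColumn.get(column);
    }

    int columnCount(String row, String family) {
        TreeMap<String, TreeMap<String, String>> byFamily = rows.get(row);
        if (byFamily == null || !byFamily.containsKey(family)) return 0;
        return byFamily.get(family).size();
    }

    public static void main(String[] args) {
        LogicalTable t = new LogicalTable("info");
        t.put("row1", "info", "name", "alice");
        t.put("row1", "info", "email", "alice@example.com");
        System.out.println(t.columnCount("row1", "info")); // 2
        System.out.println(t.get("row1", "info", "name"));  // alice
    }
}
```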


The row key of an HBase table can be a composite key consisting of multiple individual keys. For example, one can design a key to keep track of users and the sessions each user has. In this case the row key can be:


userIdBytes + separatorByte + sessionIdBytes


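Here separatorByte should be chosen so that it never occurs in the userId or sessionId bytes, for example LF (decimal 10). If the separator could appear inside an id, the row key could no longer be split unambiguously into its userId and sessionId parts.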
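A minimal sketch of building the composite key, assuming userId and sessionId are Strings encoded as UTF-8 (the post does not say how the ids are represented). SessionKey is a hypothetical class: it concatenates the three parts and rejects any id that contains the LF separator, since such a key could not be split back apart. In real HBase client code, helpers in org.apache.hadoop.hbase.util.Bytes such as Bytes.toBytes and Bytes.add can do the concatenation.

```java
import java.nio.charset.StandardCharsets;

final class SessionKey {
    // LF (decimal 10), the separator suggested in the post.
    static final byte SEPARATOR = 10;

    // Builds userIdBytes + separatorByte + sessionIdBytes.
    static byte[] rowKey(String userId, String sessionId) {
        byte[] user = idBytes(userId);
        byte[] session = idBytes(sessionId);
        byte[] key = new byte[user.length + 1 + session.length];
        System.arraycopy(user, 0, key, 0, user.length);
        key[user.length] = SEPARATOR;
        System.arraycopy(session, 0, key, user.length + 1, session.length);
        return key;
    }

    // Encodes an id as UTF-8 and refuses ids containing the separator byte.
    private static byte[] idBytes(String id) {
        byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
        for (byte b : bytes) {
            if (b == SEPARATOR) {
                throw new IllegalArgumentException("id contains the separator byte: " + id);
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] key = rowKey("user42", "s7");
        System.out.println(key.length); // 9
        System.out.println(key[6]);     // 10
    }
}
```

Multi-byte UTF-8 characters never contain byte 10, so with this encoding only an actual newline character inside an id can trigger the check.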

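This row key design allows partial table scanning. To get all the sessionIds for a user, one can simply create an HBase Scan object whose start row is userIdBytes + separatorByte and whose stop row is userIdBytes + (separatorByte + 1). Because the stop row is exclusive, the scan returns exactly the rows whose key starts with this userId followed by the separator, meaning all the rows for this userId. Setting only the start row is not enough: the scan would continue past this user's rows to the end of the table. Including the separator in the prefix also keeps a scan for user4 from matching the rows of user42.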
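To see what the scan range has to look like, the sketch below stands in for an HBase table with a TreeMap sorted by unsigned row-key bytes (PrefixScan and its methods are hypothetical; no HBase API is called, and the LF separator is assumed). scanUser reads the half-open range from userIdBytes + separatorByte (inclusive) to userIdBytes + (separatorByte + 1) (exclusive), which is exactly one user's rows. scanFromStartOnly sets only a start row of userIdBytes and therefore runs on into the rows of user421.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

final class PrefixScan {
    static final byte SEPARATOR = 10; // LF; must not be 0xFF for the +1 below

    // Unsigned byte-by-byte comparison, the order HBase keeps row keys in.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    static byte[] append(byte[] bytes, byte last) {
        byte[] out = Arrays.copyOf(bytes, bytes.length + 1);
        out[bytes.length] = last;
        return out;
    }

    // userIdBytes + separatorByte + sessionIdBytes
    static byte[] rowKey(String userId, String sessionId) {
        byte[] prefix = append(userId.getBytes(StandardCharsets.UTF_8), SEPARATOR);
        byte[] session = sessionId.getBytes(StandardCharsets.UTF_8);
        byte[] key = Arrays.copyOf(prefix, prefix.length + session.length);
        System.arraycopy(session, 0, key, prefix.length, session.length);
        return key;
    }

    // In-memory stand-in for a table, sorted like HBase sorts row keys.
    static TreeMap<byte[], String> newTable() {
        return new TreeMap<>(PrefixScan::compareUnsigned);
    }

    // [userId + SEP, userId + (SEP + 1)): exactly this user's rows.
    static List<String> scanUser(TreeMap<byte[], String> table, String userId) {
        byte[] user = userId.getBytes(StandardCharsets.UTF_8);
        byte[] start = append(user, SEPARATOR);            // inclusive
        byte[] stop = append(user, (byte) (SEPARATOR + 1)); // exclusive
        return new ArrayList<>(table.subMap(start, stop).values());
    }

    // Start row only: keeps going to the end of the table.
    static List<String> scanFromStartOnly(TreeMap<byte[], String> table, String userId) {
        byte[] start = userId.getBytes(StandardCharsets.UTF_8);
        return new ArrayList<>(table.tailMap(start).values());
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = newTable();
        table.put(rowKey("user4", "s1"), "a");
        table.put(rowKey("user42", "s1"), "b");
        table.put(rowKey("user42", "s2"), "c");
        table.put(rowKey("user421", "s1"), "d");
        System.out.println(scanUser(table, "user42"));          // [b, c]
        System.out.println(scanFromStartOnly(table, "user42")); // [b, c, d]
    }
}
```

In the HBase client the same two bounds would go into the Scan's start row and stop row (Scan.setStartRow and Scan.setStopRow in clients of this era; newer releases use withStartRow and withStopRow). Check the Scan javadoc for the version in use.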

The other advantage of this key design is that the sessionId can be extracted from the row key itself, so it does not have to be stored or read as a separate column value.
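Because both ids live in the row key, they can be recovered from the key bytes alone. A minimal sketch, assuming the LF separator from the text and UTF-8-encoded ids; SessionKeyParser is a hypothetical helper. With the HBase client, the key of each scan result is available from Result.getRow(), and those bytes could be passed to a helper like this.

```java
import java.nio.charset.StandardCharsets;

final class SessionKeyParser {
    static final byte SEPARATOR = 10; // LF

    // Position of the first separator; ids are assumed never to contain it.
    private static int separatorIndex(byte[] rowKey) {
        for (int i = 0; i < rowKey.length; i++) {
            if (rowKey[i] == SEPARATOR) {
                return i;
            }
        }
        throw new IllegalArgumentException("row key contains no separator");
    }

    // Bytes before the separator.
    static String userId(byte[] rowKey) {
        int sep = separatorIndex(rowKey);
        return new String(rowKey, 0, sep, StandardCharsets.UTF_8);
    }

    // Bytes after the separator.
    static String sessionId(byte[] rowKey) {
        int sep = separatorIndex(rowKey);
        return new String(rowKey, sep + 1, rowKey.length - sep - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rowKey = "user42\ns7".getBytes(StandardCharsets.UTF_8);
        System.out.println(userId(rowKey));    // user42
        System.out.println(sessionId(rowKey)); // s7
    }
}
```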



