Entities

Introduction

The description of an application’s domain is essentially declarative and amounts to defining the entities as Scala case classes.

The HFactory framework automatically handles the HBase storage of entities and provides access to them through a JSON-based REST API. HFactory also allows you to write your own entity handling, should you wish to do so.

Simple entities

Defining an HFactory entity is as simple as defining a Scala case class and registering it with the entity registry (more on that later). Hence an entity has the following form:

case class e(f1: t1, f2: t2, ..., fn: tn)

where e is the name of the entity and f1,…,fn are its fields with respective types t1,…,tn. By default, the name of the HBase table in which an entity instance is stored is the entity name in lowercase, and the entity’s first field is used as rowkey. The table name, the rowkey, the column families, etc. can all be customized (see Rowkeys and Custom entities).

As an example, let’s say we want to write a user directory application called UserDir. The application’s domain consists of Group and User entities. The Group entity can be defined as follows (we will define the User entity later and expand on this example throughout this guide):

case class Group(id: Int, name: String)

In this example, the Group entities are stored in the HBase table group and the field id is used as the entity’s rowkey.
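The two default conventions above can be sketched with nothing but Scala's standard Product API. This is a minimal illustration, not HFactory's implementation: the DefaultMapping object and its two helpers are hypothetical names introduced for this example.

```scala
// Sketch of the default conventions only; HFactory's own code is not shown here.
final case class Group(id: Int, name: String)

object DefaultMapping { // hypothetical helper, not part of HFactory
  /** Default table name: the case class name in lowercase. */
  def tableName(entity: Product): String = entity.productPrefix.toLowerCase

  /** Default rowkey: the value of the first field. */
  def rowKey(entity: Product): Any = entity.productElement(0)
}
```

For Group(1, "devs"), the helpers yield the table name group and the rowkey 1, matching the description above.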

From the case class definition, HFactory is able to generate the code to store/retrieve entities in HBase as well as the code to provide a REST API to these entities, provided the following requirements are met:

  • An instance of the BytesConv typeclass is in scope for all field types, including the rowkey’s (so they can be stored in HBase)
  • An instance of the JsonConv typeclass is in scope for all field types, including the rowkey’s (so they can be transmitted through the REST API)
  • An instance of the StringConv typeclass for the rowkey type is in scope

HFactory provides instances for the base Scala/Java types: Boolean, Int, Float, Double, String (etc.) as well as for HFactory-specific base types, e.g. Timestamp. For the full list, please refer to the API documentation of the typeclasses.

HFactory automatically generates JsonConv instances for non-generic case classes as well as collections (Option[T], Iterable[T], List[T], Array[T], etc.), provided type T meets the above requirements (again, for the full list, please refer to the API documentation of the typeclasses). Of course, if these automatically generated instances do not suit your needs, you can define your own instances: just bring them in scope (by importing them) and HFactory will use them.
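The "bring your own instance in scope" mechanism relies on ordinary Scala implicit resolution: an imported instance takes precedence over a default one found through a companion object. The sketch below shows that mechanism with a stand-in typeclass; JsonText, CustomInstances and Demo are hypothetical names, and the real JsonConv API may well look different.

```scala
// Stand-in typeclass used only to illustrate instance overriding.
trait JsonText[T] { def render(x: T): String }

object JsonText {
  // Default instance, found through the companion object (lowest priority).
  implicit val intDefault: JsonText[Int] = new JsonText[Int] {
    def render(x: Int): String = x.toString
  }

  def render[T](x: T)(implicit ev: JsonText[T]): String = ev.render(x)
}

object CustomInstances {
  // Custom instance: renders integers as JSON strings.
  implicit val intAsString: JsonText[Int] = new JsonText[Int] {
    def render(x: Int): String = "\"" + x + "\""
  }
}

object Demo {
  // No import: the default instance from the companion object is used.
  def default: String = JsonText.render(42)

  // The imported instance is in lexical scope and wins over the default.
  def custom: String = {
    import CustomInstances._
    JsonText.render(42)
  }
}
```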

NOTE: When defining an instance for a type whose representation in JSON is a string, you should extend StringBasedJsonConv rather than JsonConv: it simplifies the implementation a bit and, more importantly, it will allow the values of your type to appear in URLs without requiring quotes (which is cumbersome in URLs).

Optional fields

An optional field, i.e. a field of type Option[T], is stored as follows:

  • When storing an entity into HBase, an optional field is stored iff its value is Some(...). If the field’s value is None, it is not stored at all: no column is created for that field.
  • When retrieving an entity from HBase, an optional field’s value in the resulting entity is Some(...) iff the corresponding column exists in HBase.

NOTE: If an entity was stored with an optional field Some(...) and the entity is stored again but with the optional field None, the existing column in HBase is not deleted. This means that a subsequent retrieval of that entity will have its field set to the original, Some(...) value!
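The behaviour described in this note can be reproduced with a small in-memory model in which a mutable map plays the role of an HBase row. This is only a sketch under that assumption: Profile and OptionalFieldModel are hypothetical names, and no HBase or HFactory call is involved.

```scala
import scala.collection.mutable

// Hypothetical entity with one optional field.
final case class Profile(id: Int, nickname: Option[String])

object OptionalFieldModel {
  // A row is modelled as column name -> value.
  type Row = mutable.Map[String, String]

  /** Writes the entity: a None field writes nothing and deletes nothing. */
  def put(row: Row, p: Profile): Unit = {
    row.update("id", p.id.toString)
    p.nickname.foreach(n => row.update("nickname", n))
  }

  /** Reads the entity: the optional field is Some iff its column exists. */
  def get(row: Row): Profile = Profile(row("id").toInt, row.get("nickname"))
}
```

Storing Profile(1, Some("bob")) and then Profile(1, None) into the same row leaves the nickname column in place, so reading the row back yields Profile(1, Some("bob")).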

REST representation of optional fields:

  • When POSTing an entity, an optional field must appear in the JSON representation iff it is Some(...).
  • When GETting an entity, the JSON representation returned contains the optional field iff the field is Some(...).

Sequence fields

A sequence field is of type Seq[String] or of a subtype thereof (List[String], Array[String], …).

The items of a string sequence field are each stored in a separate column in HBase. As such, it is strongly advised that a sequence field be stored in its own column family; otherwise a sequence item and another field of the entity might end up “sharing” a column and overwrite each other. (The naming of the item columns has been carefully chosen to make this unlikely, but it could theoretically happen.)

NOTE: Sequence support will be generalized to arbitrary item types (Seq[T]) in a future version.

Map fields

A map field is of type Map[String, String].

The <value> of a (<key>, <value>) entry of a string map field is stored in column <key>. As such, a map field must be stored in its own separate column family, otherwise a map key and another field of the entity might end up “sharing” a column, making them overwrite each other.
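To see why a map field needs its own column family, consider the following model, in which a cell is addressed by a (column family, column name) pair. It is a sketch, not HFactory code: MapFieldModel and its store method are hypothetical, and the family names "d" and "m" are chosen for the example.

```scala
object MapFieldModel {
  // A cell is addressed by (column family, column name).
  type Row = Map[(String, String), String]

  /** Lays out regular fields and map entries as cells of one row.
    * Cells written later overwrite earlier cells with the same address.
    */
  def store(fields: Map[String, String], fieldFamily: String,
            mapEntries: Map[String, String], mapFamily: String): Row = {
    val fieldCells = fields.map { case (k, v) => (fieldFamily, k) -> v }
    val mapCells   = mapEntries.map { case (k, v) => (mapFamily, k) -> v }
    fieldCells ++ mapCells
  }
}
```

With a regular field name and a map entry whose key is also name, sharing family "d" leaves a single cell holding the map entry's value, while placing the map in family "m" keeps both values.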

NOTE: Map support will be generalized to arbitrary value types (Map[String, T]) in a future version.

Enum fields

In the context of HFactory, an enum has a very specific meaning: it is a sealed trait (or abstract class) that extends class StringEnum and whose subclasses are all case objects. As the name implies, only string-valued enums are supported. The case objects must implement the value member defined by StringEnum:

sealed trait Month extends StringEnum
case object January extends Month { val value = "Jan" }
case object February extends Month { val value = "Feb" }
case object March extends Month { val value = "Mar" }
...
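Because each case object carries its string value, converting a stored string back to an enum member is a simple lookup. The sketch below is self-contained and therefore uses a stand-in trait, StringEnumLike, in place of HFactory's StringEnum; the companion object and its fromValue method are hypothetical additions for illustration.

```scala
// Stand-in for HFactory's StringEnum so that the example compiles on its own.
trait StringEnumLike { def value: String }

sealed trait Month extends StringEnumLike
case object January  extends Month { val value = "Jan" }
case object February extends Month { val value = "Feb" }
case object March    extends Month { val value = "Mar" }

object Month {
  /** All members, listed by hand (hypothetical helper). */
  val all: List[Month] = List(January, February, March)

  /** Finds the member with the given string value, if any. */
  def fromValue(s: String): Option[Month] = all.find(_.value == s)
}
```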

Linked entities

An entity B is said to be linked from entity A if it appears as a field of A.

When entity A is serialized, only B’s rowkey is serialized as part of A’s serialization: B is an entity and as such is serialized in its own table.

In our example, a user belongs to a group (just one group, to keep the example simple), thus we’ll define the User entity with a group field of type Group which is an entity:

case class User(id: Int, login: String, first: String, last: String, group: Group)

At the Scala level, the group field is a Group case class; it is nested in case class User.

At the HBase level however, we say that Group is linked from User. The group field won’t be serialized in extenso as part of User’s serialization: instead, only the Group’s rowkey is serialized (the group’s id in the example).

Conversely, HFactory is able to reconstruct a user instance, including its group, when reading from HBase.
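The link mechanism can be pictured as follows: the stored form of a user holds only the group's rowkey, and reading a user back requires looking the group up in its own table. This is a simplified model, not HFactory's code; StoredUser, LinkModel and the Map standing in for the group table are assumptions made for the example.

```scala
final case class Group(id: Int, name: String)
final case class User(id: Int, login: String, first: String, last: String, group: Group)

object LinkModel { // hypothetical model of linked-entity storage
  /** Stored form of a user: the group is reduced to its rowkey. */
  final case class StoredUser(id: Int, login: String, first: String, last: String, groupKey: Int)

  def toStored(u: User): StoredUser =
    StoredUser(u.id, u.login, u.first, u.last, u.group.id)

  /** Rebuilds the user by reading the linked group from the group table.
    * Returns None if the linked group cannot be found.
    */
  def fromStored(s: StoredUser, groupTable: Map[Int, Group]): Option[User] =
    groupTable.get(s.groupKey).map(g => User(s.id, s.login, s.first, s.last, g))
}
```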

Rowkeys

As mentioned previously, by default HFactory uses the first field of an entity as the entity’s rowkey. To use another field as the rowkey or to define a rowkey that is a computed value, you can define a parameterless method with the name rowKey and HFactory will then use the value returned by that method as the HBase rowkey of the entity:

case class ...(...) {
  def rowKey: K = ...
}

Note that K can be any type: it can be an Array[Byte] blob, as everywhere in HBase’s API, or a more meaningful type; the only constraint is that BytesConv and StringConv instances for K are in scope.
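As a small illustration of a computed rowkey, the hypothetical Reading entity below builds a String key from two of its fields. The entity and its key layout are assumptions made for this sketch; String is used as the key type because the base types listed earlier already have the required instances.

```scala
// Hypothetical entity: one sensor reading per minute.
final case class Reading(sensor: String, minute: Long, value: Double) {
  // Computed rowkey: the sensor id followed by the zero-padded minute, so that
  // lexicographic order on the key matches chronological order per sensor.
  def rowKey: String = f"$sensor%s-$minute%010d"
}
```

Zero-padding matters here: without it, the key for minute 10 would sort before the key for minute 9.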

Departing from our User Directory example, here’s an entity describing a Wifi Hotspot [1]. A hotspot has an id, a name and a location (latitude and longitude). In order to quickly determine the hotspots closest to a given location, a spatial index is used, where the rowkey for a hotspot is its geohash [2]:

case class Hotspot(id: Int, name: String, latitude: Double, longitude: Double) {
  def rowKey: GeoHash = Hotspot.geoHash(latitude, longitude, 12)
}

Since the above rowkey is a custom type, we have to provide its BytesConv and StringConv instances:

implicit object GeoHashStringConv extends StringConv[GeoHash] {
  protected def fromString(s: String): GeoHash = GeoHash.fromGeohashString(s)
  protected def toString(x: GeoHash): String = x.toBase32
}

implicit object GeoHashBytesConv extends BytesConv[GeoHash] {
  protected def fromBytes(b: Array[Byte]): GeoHash = GeoHash.fromGeohashString(bytesTo[String](b))
  protected def toBytes(h: GeoHash): Array[Byte] = bytesFrom[String](h.toBase32)
}

These instances will be used by HFactory everywhere a GeoHash appears, not just for rowkeys.

Timeseries

For entities that may contain a high number of timestamped values, HFactory provides the TimeSeries[T] type, which declares a sequence of values of type T where each value is timestamped.

As an example, let’s say we want to track stock market quotes. Let’s call this app “StockMarket”. The quotes can be described as follows:

case class Quotes(name: String, description: String, values: TimeSeries[Double])

This makes the values field a sequence of timestamped doubles. TimeSeries[T] really is a Seq[TimeStamped[T]], which means that all operations applicable to Seq are applicable to TimeSeries.
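Since a timeseries behaves like a Seq, the usual collection operations apply. The sketch below uses stand-in types (Stamped and Series are hypothetical, defined here so that the example is self-contained) rather than HFactory's TimeStamped and TimeSeries, whose exact members are not shown in this guide.

```scala
object TimeSeriesModel {
  // Stand-ins for TimeStamped[T] and TimeSeries[T].
  final case class Stamped[T](ts: Long, value: T)
  type Series[T] = Seq[Stamped[T]]

  val quotes: Series[Double] = Seq(
    Stamped(1425549146262L, 128.54),
    Stamped(1425549146993L, 127.39),
    Stamped(1425549147195L, 129.10)
  )

  /** Values whose timestamp lies in [from, until). */
  def between[T](s: Series[T], from: Long, until: Long): Series[T] =
    s.filter(x => x.ts >= from && x.ts < until)

  /** The most recent value, if the series is not empty. */
  def latest[T](s: Series[T]): Option[Stamped[T]] =
    if (s.isEmpty) None else Some(s.maxBy(_.ts))
}
```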

NOTE: Due to the way timeseries are stored in HBase, we advise that you keep a timeseries’ name as short as possible.

The creation of a timeseries entity is done in two steps:

  1. First, the entity itself is created with a POST request with an empty timeseries.
  2. Values are then added to the entity’s timeseries field with POST requests on the field itself, with the requests’ body containing a JSON array of timestamped values.

The REST representation of a timestamped value is an object with fields ts and value containing the timestamp and the value, respectively:

{ "ts": <timestamp>, "value": <value> }

Continuing the StockMarket example above, the creation of an AAPL quote and the insertion of values in its timeseries is done as follows (not actual values):

# Create the entity.
POST /StockMarket/Quotes
    { "name": "AAPL", "description": "Apple", "values": [] }

# Insert timeseries values.
POST /StockMarket/Quotes/AAPL/values
    [ { "ts": 1425549146262, "value": 128.54 },
      { "ts": 1425549146993, "value": 127.39 },
      { "ts": 1425549147195, "value": 129.10 }
    ]

# Insert another value.
POST /StockMarket/Quotes/AAPL/values
    [ { "ts": 1425549147199, "value": 130.03 } ]

The target of the second and third POST requests is the values timeseries field of the AAPL entity. As you can see, you can POST an arbitrary number of values to a timeseries. The body of the request must always be a JSON array, even if the request contains a single value.
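On the client side, the request body for such a POST can be assembled from (timestamp, value) pairs. The helper below is a hypothetical sketch that only builds the JSON text; it does not send anything and is not part of HFactory.

```scala
object TimeSeriesBody { // hypothetical client-side helper
  /** Renders timestamped values as a JSON array, even for a single value. */
  def render(values: Seq[(Long, Double)]): String =
    values
      .map { case (ts, v) => "{ \"ts\": " + ts + ", \"value\": " + v + " }" }
      .mkString("[ ", ", ", " ]")
}
```

For the single pair (1425549147199L, 130.03), the result is the one-element array shown in the last request above.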

Entity controllers

Entity controllers are methods exposed through the REST API that are defined in an entity’s companion object. Controllers are explained in the Controllers chapter.

Generated REST API

JSON representation

The REST API represents entities in JSON format. The JSON representation of an entity defined by

case class e(f1: t1, ..., fn: tn)

is as follows:

{
  "f1": <v1>,
  "f2": <v2>,
  ...
  "fn": <vn>
}

where <v1>,…,<vn> are the JSON representation of the values of fields f1,…,fn (of type t1,…,tn).
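The shape of this representation, including the omission of None fields described under Optional fields, can be sketched with Scala's Product API. The code below is an illustration only: JsonSketch and the Contact entity are hypothetical, strings are not escaped, nested entities and collections are not handled, and HFactory's actual rendering goes through JsonConv instances.

```scala
// Requires Scala 2.13+ (productElementNames). Sketch only: strings are not escaped.
final case class Group(id: Int, name: String)
final case class Contact(id: Int, email: Option[String]) // hypothetical entity

object JsonSketch {
  private def value(v: Any): String = v match {
    case s: String => "\"" + s + "\""
    case other     => other.toString
  }

  private def field(name: String, v: Any): String =
    "\"" + name + "\": " + value(v)

  /** Renders a flat case class as a JSON object, skipping None fields. */
  def render(entity: Product): String = {
    val fields = entity.productElementNames.zip(entity.productIterator).flatMap {
      case (_, None)       => None
      case (name, Some(v)) => Some(field(name, v))
      case (name, v)       => Some(field(name, v))
    }
    fields.mkString("{ ", ", ", " }")
  }
}
```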

Routes

For each entity in an app, HServer exposes routes for the creation, modification, retrieval, deletion and listing of entities. The bodies of the client’s requests and of the server’s responses are always in JSON form.

Creation

To create one or more entities, send a POST request with its body containing the entities in the form described in JSON representation:

POST /<app>/<entity>[?return=<what>]

When a single entity is to be created, its JSON representation can appear by itself in the body; when several entities are to be created, their JSON representations must appear in a JSON array.

On success, the response body is as requested by the request’s optional return parameter:

  • (nothing) - don’t return anything (default)
  • entity - return the list of created entities in full
  • rowkey - return the list of created entities’ rowkeys

On failure, the response may contain one of the following error messages:

Status             Message              Meaning
400 - Bad Request  No entity specified  The request body was empty.
400 - Bad Request  Invalid form data    The request body was not valid entity data.
400 - Bad Request  Invalid T value: V   Value V in the entity was not of the expected type T.

Retrieval

To retrieve an entity, send a GET request with the rowkey of the entity:

GET /<app>/<entity>/<rowkey>

On success, the response body contains the requested entity (in the form described in JSON representation) and its rowkey in the form:

{ "rowkey": <rk>, "fields": <entity> }

On failure, the response may contain one of the following error messages:

Status             Message                    Meaning
400 - Bad Request  Invalid entity rowkey ‘V’  The specified rowkey V was not a valid value for the expected rowkey type.
404 - Not Found    Entity not found           No entity was found with the specified rowkey.

Deletion

To delete an entity, send a DELETE request with the rowkey of the entity:

DELETE /<app>/<entity>/<rowkey>

On success, the response is 204 - No Content. Note that this operation succeeds even if the entity with the specified rowkey does not exist.

On failure, the response may contain one of the following error messages:

Status             Message                    Meaning
400 - Bad Request  Invalid entity rowkey ‘V’  The specified rowkey V was not a valid value for the expected rowkey type.

Listing

To list (retrieve) entities of a specific type, send a GET request to the entity route without specifying any rowkey:

GET /<app>/<entity>[?<args>]

where the optional <args> are as follows:

Parameter name  Argument      Description
start           rowkey        The rowkey the returned list starts at.
reverse         true | false  Whether to list the entities in reverse order (default: false).
limit           int >= 0      Limit the number of entities returned.

By specifying a start rowkey and a limit, one can easily page through large datasets, both in “normal” and reverse directions.
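The interplay of start, reverse and limit can be modelled on a sorted in-memory map. This is a sketch under stated assumptions, not HServer's implementation: ListingModel is a hypothetical name, the start rowkey is assumed to be included in the result, and keys are compared as strings.

```scala
import scala.collection.immutable.SortedMap

object ListingModel { // hypothetical model of the listing parameters
  /** Lists (rowkey, entity) pairs in key order.
    * Assumption: the start rowkey itself is part of the result.
    */
  def list[V](rows: SortedMap[String, V],
              start: Option[String] = None,
              reverse: Boolean = false,
              limit: Option[Int] = None): List[(String, V)] = {
    val ordered = if (reverse) rows.toList.reverse else rows.toList
    val fromStart = start match {
      case Some(k) if reverse => ordered.dropWhile(_._1 > k)
      case Some(k)            => ordered.dropWhile(_._1 < k)
      case None               => ordered
    }
    limit.fold(fromStart)(n => fromStart.take(n))
  }
}
```

Under these assumptions, a client pages forward by requesting a limit of n entities and using the rowkey following the last one received as the next start.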

The response contains a list where each item is an object containing an entity and its rowkey, in the same form as shown above for the retrieval of specific entities:

[ { "rowkey": <rk1>, "fields": <entity1> },
  ...
  { "rowkey": <rkN>, "fields": <entityN> }
]

In our example, the routes generated would be:

GET /UserDir/Group
GET /UserDir/Group/<gid>
POST /UserDir/Group
DELETE /UserDir/Group/<gid>

GET /UserDir/User
GET /UserDir/User/<uid>
POST /UserDir/User
DELETE /UserDir/User/<uid>

where <gid> and <uid> are the group and user entity rowkeys, respectively.
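The route pattern is regular enough to be generated from the app and entity names. The helper below is a hypothetical sketch that merely reproduces the list above as strings; it is not how HServer registers its routes.

```scala
object RouteSketch { // hypothetical helper
  /** The four generated routes for one entity; key names the rowkey placeholder. */
  def routes(app: String, entity: String, key: String): List[String] = List(
    s"GET /$app/$entity",
    s"GET /$app/$entity/<$key>",
    s"POST /$app/$entity",
    s"DELETE /$app/$entity/<$key>"
  )
}
```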

To create groups “devs” and “marketing” with ids 1 and 2, send POST /UserDir/Group requests to HServer with JSON bodies { "id": 1, "name": "devs" } and { "id": 2, "name": "marketing" }, respectively.

To list the groups, send GET /UserDir/Group. HServer will respond with:

[ { "rowkey": "1", "fields": { "id": 1, "name": "devs" } },
  { "rowkey": "2", "fields": { "id": 2, "name": "marketing" } } ]

Note that, since we haven’t explicitly defined a rowKey, the first field of the entity is used as rowkey, id in this case.

Entity metadata and operations

An entity (as returned by HEntityRegistry.getEntity(), see Entity Registry) provides the following API:

abstract class HEntity[T] {
  /** Canonical name (fully qualified name) of the entity's class. */
  final def canonicalName: String = ct.runtimeClass.getCanonicalName

  /** Name of the entity, i.e. the short name of the entity's class. */
  final def name: String = ct.runtimeClass.getSimpleName

  /** The entity description: its name and its fields (name and type).
    *
    * Base types provided by Scala are abbreviated to their short name,
    * while less common types and user-defined types are fully qualified.
    */
  def desc: EntityDesc

  /** Instantiate an entity from a JSON object.
    * Raises RowKeyParseException in case the rowkey of a linked entity is invalid.
    * Raises LinkedEntityException if the linked entity couldn't be read from HBase.
    */
  def fromJson(o: JObject)(implicit entityReg: HEntityRegistry, hContext: HContext): T

  /** The entity's controllers building block.
    *
    * You should set it to `implicitly[HEntityControllers[T]]`, which will use the
    * instance in scope, whether it is an instance you defined or HFactory's
    * automatically generated instance.
    */
  val ctl: HEntityControllers[T]

  /** The entity's HBase IO building block.
    *
    * You should set it to `implicitly[HEntityIO[T, Id]]`, which will use the
    * instance in scope, whether it is an instance you defined or HFactory's
    * automatically generated instance.
    */
  val io: HEntityIO[T, Id]
}

While you can provide your own HEntity instance for your case classes, you should normally let HFactory provide it automatically: most of the time you will want to customize only the HBase IO building block (HEntityIO) rather than the full entity.

Custom entities

If an entity automatically generated by HFactory does not suit your needs, you can customize it by providing some of its building blocks: HBase table name, IO operations, etc.

HBase table name

By default, the name of the HBase table for an entity is the name of that entity in lowercase. To override this, annotate the entity with the @Table annotation from com.ubeeko.hfactory.entities.annotations.HBase.

The annotated User entity below would be stored in HBase table persons rather than the default user:

import com.ubeeko.hfactory.entities.annotations.HBase._

@Table("persons")
case class User(id: Int, login: String, first: String, last: String, group: Group)

Column families

By default, all the fields of an entity are stored in column family “d”. This can be changed on a per-field basis with field-level annotations @CF and @DefaultCF, as follows:

  • If a field is annotated with @CF(c), it is stored in column family c.
  • If a field is annotated with @DefaultCF, it is stored in the default column family.
  • If a field is annotated with neither, it is stored in the same column family as the field preceding it. If there’s no preceding field, it is stored in the default column family.

The annotated User entity below has its fields id and login stored in column family “u”, first and last in “r”, and group in “g”. (The fields are indented and aligned to make column families stand out, but it could be written all on one line.)

import com.ubeeko.hfactory.entities.annotations.HBase._

case class User(
  @CF("u") id   : Int,
           login: String,
  @CF("r") first: String,
           last : String,
  @CF("g") group: Group
)
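The three rules above amount to a left-to-right pass over the fields that carries the current column family along. The sketch below models the annotations as plain values so that it is self-contained; ColumnFamilyModel and its Spec types are hypothetical and are not HFactory's annotation classes.

```scala
object ColumnFamilyModel { // hypothetical model of the resolution rules
  sealed trait Spec
  final case class Explicit(family: String) extends Spec // models @CF(c)
  case object UseDefault extends Spec                     // models @DefaultCF
  case object Unannotated extends Spec                    // no annotation

  val defaultFamily = "d"

  /** Resolves the column family of each field, in declaration order. */
  def resolve(fields: List[(String, Spec)]): List[(String, String)] = {
    val families = fields.map(_._2).scanLeft(defaultFamily) {
      case (_, Explicit(f))        => f
      case (_, UseDefault)         => defaultFamily
      case (previous, Unannotated) => previous
    }.tail // drop the seed value
    fields.map(_._1).zip(families)
  }
}
```

Applied to the User entity above, the pass assigns "u" to id and login, "r" to first and last, and "g" to group.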

HBase IO

Writing custom HBase IO operations for an entity consists of providing your own instance of the HEntityIO building block for that entity. This is quite involved for now; it will be simplified in future versions of HFactory.

The HEntityIO API is as follows [3]:

abstract class HEntityIO[T] {
  /** HBase table the entities of this type are stored in.
    * By default, the table name is the entity name in lowercase.
    */
  val table: TableInfo = TableInfo(name = ct.runtimeClass.getSimpleName.toLowerCase,
                                   columnFamilies = Set(HBaseManager.defaultFamilyName))

  /** Type of the entity's rowkey. */
  type RowKey

  /** Parses a rowkey from a string.
    * Raises RowKeyParseException on failure.
    */
  def parseRowKey(s: String): RowKey

  /** Interpret data as a rowkey for this entity.
    * Raises InvalidRowKeyException on failure.
    */
  def asRowKey(v: Any): RowKey

  /** Get the rowkey of the specified entity. */
  def getRowKey(x: T): RowKey

  /** Get the string representation of the entity's rowkey.
    * The representation is as defined by the rowkey's `StringConv` instance.
    */
  def getRowKeyAsString(x: T): String

  /** Conversion to/from HBase bytes.
    * It is strongly advised that you let HFactory provide the instance.
    */
  val conv: HEntityConv[T, Id]

  /** Writes an entity into HBase table `tableName`.
    * The entity's id is used as rowkey.
    */
  def put(x: T)(implicit entityReg: HEntityRegistry, hContext: HContext): Unit

  /** Writes a bunch of entities into the HBase table.
    * The commit is only performed at the end of the batch or when HBase's
    * write buffer is full.
    */
  def batchPut(xs: Iterable[T])(implicit entityReg: HEntityRegistry, hContext: HContext): Unit

  /** Reads the entity with the specified rowkey from HBase table `tableName`.
    * It is not an error if the entity isn't found: None is returned in that case.
    */
  def get(rowkey: RowKey)(implicit entityReg: HEntityRegistry, hContext: HContext): Option[T]

  /** Lists the entities in the HBase table.
    *
    * @param start    Rowkey to start at.
    * @param filter   Entity filter. If not specified, no filtering is performed.
    * @param reverse  Whether to scan in reverse (default: false).
    * @param limit    Number of entities to return. If not specified, all entities are returned.
    */
  def scan(start: Option[RowKey] = None, filter: Option[Filter] = None, reverse: Boolean = false,
           limit: Option[Int] = None)
          (implicit entityReg: HEntityRegistry, hContext: HContext): Iterable[T]

  /** Deletes the entity with the specified rowkey from HBase table `tableName`.
    * It is not an error if the entity isn't found.
    */
  def delete(rowKey: RowKey)(implicit hContext: HContext): Unit
}

where T is the entity (case class) type.

You must provide an implementation for all these members except table, which has a default value. You may override it, but beware that your value will take precedence over the @Table annotation, if any.

You can customize the HEntityConv value, but most of the time you should rely on HFactory to generate the instance for you:

val conv = implicitly[HEntityConv[T]]
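Before writing a real HEntityIO instance, it can help to pin down the contract of the basic operations: get returns None for a missing entity, and delete succeeds whether or not the entity exists. The class below is an in-memory stand-in that mirrors only this part of the contract; InMemoryStore is a hypothetical name, it takes no HContext or HEntityRegistry, and it does not extend HEntityIO.

```scala
import scala.collection.mutable

final case class Group(id: Int, name: String)

// In-memory stand-in for the put/get/delete contract; not HFactory's HEntityIO.
class InMemoryStore[T, K](rowKeyOf: T => K) {
  private val rows = mutable.Map.empty[K, T]

  /** Writes the entity under its rowkey, replacing any previous value. */
  def put(x: T): Unit = rows.update(rowKeyOf(x), x)

  /** Writes several entities. */
  def batchPut(xs: Iterable[T]): Unit = xs.foreach(put)

  /** Returns None if no entity has the given rowkey. */
  def get(rowKey: K): Option[T] = rows.get(rowKey)

  /** Succeeds even if no entity has the given rowkey. */
  def delete(rowKey: K): Unit = { rows.remove(rowKey); () }
}
```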

[1] Example taken from the book “HBase in Action”, chapter 8.

[2] op. cit.

[3] Default implementations omitted for clarity.