I work for a company whose goals include being the main community for exercisers, a place where they can interact with trainers and fellow exercisers, join challenges, and set and achieve goals. Clubs, chains and OEMs round out the players in this ecosystem. Workout data from all kinds of sources, be it gym equipment, tracking devices such as Fitbit, or tracking apps such as Moves and Map My Fitness, is stored, analyzed and used for many different purposes.
About two months ago, a project commenced to collect, display (by day, week, month and year) and analyze the daily changes in a person’s exercising lifestyle, such as steps taken, weight and food intake, using data coming from fitness devices and apps. I was tasked with handling it. I knew immediately that this was a prime candidate for Big Data because of the potential volume and because one of our goals is to reach a massive scale. Cassandra was chosen for this purpose because of its column-family (wide-column) data model.
One of my goals when I evaluated the existing drivers and clients for Cassandra was to find a solution that would give us an abstraction layer, so that we would not have to program against the driver and write language-specific commands. Of course, we still need to understand Cassandra’s underlying principles: how it stores data and how it handles partitioning, to name a few. Ideally, this tool would do its mapping through JPA annotations. In short, we were searching for Hibernate’s “counterpart” in the NoSQL world. We were also interested in using a single column family to represent these daily changes and, lastly, we wanted to use the latest version of Cassandra, which came with CQL 3.0, for its ease of use.
After evaluating Kundera, Play ORM, Easy Cassandra, Hector Object Mapper and Hibernate OGM, I chose Easy Cassandra because it is lightweight and easy to use, supports CQL 3.0 and Spring, and offers JPA annotations without the hassle of persistence.xml. For the complete list of its features, click here. Under the hood, it uses the DataStax driver. Here’s a short comparison against the other clients: Kundera offered more than we needed, as it supports all kinds of NoSQL and RDBMS flavors. It also hard-codes the location of persistence.xml. Hector Object Mapper and Play ORM do not currently support CQL 3.0, while Hibernate OGM is still in beta as of this writing.
That being said, we still needed features that weren’t supported by Easy Cassandra, so I went ahead and contacted the community and started contributing them. These are now available in the next version of the build:
- Allow an embedded id to be inherited from the mapped superclass
- Query by primary key (partition key) plus the field marked with the @Index annotation (which maps to the clustering column)
- Query by primary key and index range
And here’s the full solution to the problem above. Full details can be found at Easy Cassandra’s GitHub repository here.
CQL 3.0 command to create the column family:
I used personid, companyid, type (weight, step, food) and date as the primary key, with the first three forming the partition key and date as the clustering column, since data will be read by date range so that graphs of daily, weekly, monthly and yearly summaries can be displayed to users. There is still room for improvement here: we could add a feature that lets the @Index annotation be picked up from the superclass so that it is not repeated in every subclass.
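To make the key layout concrete, here is a minimal sketch of what such a table and its range query could look like, held as plain Java strings. Only the four key columns (personid, companyid, type, date) come from the description above; the table name `daily_change`, the column types, the `value` column and the class name `DailyChangeCql` are assumptions for illustration, not the project's actual schema.

```java
/**
 * Hypothetical CQL 3 statements for the key layout described in the post.
 * Table name, column types and the "value" column are assumptions.
 */
class DailyChangeCql {

    // The inner parentheses mark the composite partition key;
    // "date" after them is the clustering column.
    static final String CREATE_TABLE =
        "CREATE TABLE daily_change ("
        + " personid bigint,"
        + " companyid bigint,"
        + " type text,"
        + " date timestamp,"
        + " value double,"
        + " PRIMARY KEY ((personid, companyid, type), date)"
        + ")";

    // Every partition key column is fixed with '=', so the query hits one
    // partition; the clustering column is then sliced by a date range.
    static final String SELECT_RANGE =
        "SELECT date, value FROM daily_change"
        + " WHERE personid = ? AND companyid = ? AND type = ?"
        + " AND date >= ? AND date < ?";

    public static void main(String[] args) {
        System.out.println(CREATE_TABLE);
        System.out.println(SELECT_RANGE);
    }
}
```

Because rows within a partition are stored sorted by the clustering column, a date-range read for one person, company and type becomes a contiguous slice of a single partition, which is what the summary graphs need.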
Calendar cal = Calendar.getInstance();
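The rest of the original snippet is not available here. As a rough sketch of how a `Calendar` can produce the date bounds for a daily, weekly, monthly or yearly range query, the helper below uses only the JDK. The class `DateRanges` and the method `rangeFor` are hypothetical names and not part of Easy Cassandra.

```java
import java.util.Calendar;
import java.util.Date;

/** Hypothetical helper that turns a day into [start, end) bounds for a range query. */
class DateRanges {

    /**
     * Returns {start, end} for the period containing 'day'; start is inclusive,
     * end is exclusive. 'field' is one of Calendar.DAY_OF_MONTH,
     * Calendar.WEEK_OF_YEAR, Calendar.MONTH or Calendar.YEAR.
     */
    static Date[] rangeFor(Calendar day, int field) {
        Calendar start = (Calendar) day.clone();
        // Truncate to midnight in the calendar's own time zone.
        start.set(Calendar.HOUR_OF_DAY, 0);
        start.set(Calendar.MINUTE, 0);
        start.set(Calendar.SECOND, 0);
        start.set(Calendar.MILLISECOND, 0);

        if (field == Calendar.WEEK_OF_YEAR) {
            // Step back to the locale's first day of the week.
            int offset = (start.get(Calendar.DAY_OF_WEEK) - start.getFirstDayOfWeek() + 7) % 7;
            start.add(Calendar.DAY_OF_MONTH, -offset);
        } else if (field == Calendar.MONTH) {
            start.set(Calendar.DAY_OF_MONTH, 1);
        } else if (field == Calendar.YEAR) {
            start.set(Calendar.MONTH, Calendar.JANUARY);
            start.set(Calendar.DAY_OF_MONTH, 1);
        }

        // The end bound is one period after the start.
        Calendar end = (Calendar) start.clone();
        end.add(field, 1);
        return new Date[] { start.getTime(), end.getTime() };
    }

    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance();
        Date[] week = rangeFor(cal, Calendar.WEEK_OF_YEAR);
        System.out.println("This week: " + week[0] + " to " + week[1]);
    }
}
```

The two dates can then be passed as the lower and upper bounds of a query on the clustering column, with the end bound treated as exclusive.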