This post is the start of a new series on NoSQL/Big Data.
Today I will share some tricks I discovered with Apache Cassandra while working on Tatami, and point out some gotchas to avoid.
If you are not familiar with the Cassandra data model, you can have a look here:
- http://www.slideshare.net/ebenhewitt/cassandra-datamodel-4985524
- http://www.datastax.com/docs/0.8/ddl/index
I Tricks
First we start with the good things: the tricks that will help you be more efficient with Cassandra.
A Simple text-based searching
The first disappointment when people discover Cassandra is the lack of support for searching. We are so used to the SQL “like %xxx%” syntax that we take it for granted.
With Cassandra, there is no such syntax for text search. So are we stuck?
Of course not. There is a small trick that relies on a clever use of the column sorting provided natively by Cassandra.
Since we know that all columns are sorted by their type (string, number, date, UUID…), we can take advantage of this sorting for text search.
Let’s suppose we have a list of user logins and we put them in a wide row as string-ordered column names:
- jduboi
- jdubois
- jduboiseaux
- jduboisier
- jduboit
- jduboita
- xxx
Naturally, Cassandra will use lexicographic ordering for these column names.
Now we want to search for all logins starting with “jdubois”. With a quick SliceQuery, setting start = “jdubois“, end = “”, we will get:
- jdubois
- jduboiseaux
- jduboisier
- jduboit
- jduboita
- xxx
That’s not exactly what we want. Indeed, Cassandra filters out the first login (“jduboi“) because it sorts lexicographically before the start limit “jdubois“. Since the end limit was set to the empty string, Cassandra takes all entries following the first exact match (if there is one). If there is no exact match, Cassandra still takes all logins that follow the start limit lexicographically.
If we had set start = “jduboisa“, end = “”, we would have gotten:
- jduboiseaux : because it is the first login that follows “jduboisa” in lexicographic order
- jduboisier
- jduboit
- jduboita
- xxx
Now if we want to implement the “starts with jdubois” search semantics, we need to find a way to make Cassandra stop scanning when there is no longer a partial match. The trick is to set the end limit to the start limit with its last letter advanced one notch.
For start = “jdubois“, end = “jduboit”, we will get:
- jdubois
- jduboiseaux
- jduboisier
- jduboit : to be filtered out
This time, the “starts with” semantics is respected. Of course we need to get rid of the last entry “jduboit” because it does not match our input. Cassandra selected it because both SliceQuery range limits are inclusive by default (it is currently not possible to exclude the bounds).
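To make this concrete, here is a minimal sketch of the “startsWith” slice with the Hector client; the column family name "UserLogin" and the row key "logins" are hypothetical, and the helper assumes a non-empty prefix:

import java.util.List;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;

public static List<HColumn<String, String>> startsWith(Keyspace keyspace, String prefix) {
    // Build the end limit: the prefix with its last letter advanced one notch
    char last = prefix.charAt(prefix.length() - 1);
    String end = prefix.substring(0, prefix.length() - 1) + (char) (last + 1);

    StringSerializer se = StringSerializer.get();
    List<HColumn<String, String>> columns = HFactory
            .createSliceQuery(keyspace, se, se, se)
            .setColumnFamily("UserLogin") // hypothetical column family
            .setKey("logins")             // hypothetical wide row key
            .setRange(prefix, end, false, 100)
            .execute().get().getColumns();

    // Both range limits are inclusive: drop a trailing column equal to the end limit
    if (!columns.isEmpty() && columns.get(columns.size() - 1).getName().equals(end)) {
        columns = columns.subList(0, columns.size() - 1);
    }
    return columns;
}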
This technique only addresses the “startsWith” search semantics. For the “endsWith” semantics, you can reverse all logins and proceed similarly, as sketched below.
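A quick illustration of the reversed-login idea, reusing the hypothetical startsWith() helper above:

// Store each login reversed in a dedicated index row...
String reversedLogin = new StringBuilder("jdubois").reverse().toString(); // "siobudj"
// ...then "ends with ois" becomes a prefix search on the reversed suffix:
List<HColumn<String, String>> matches = startsWith(keyspace, "sio"); // "ois" reversed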
For the “contains” search semantics, I have not found any solution so far apart from in-memory filtering… But honestly, if you have complex search requirements, you would be better off looking at dedicated solutions like ElasticSearch.
B Lexicographic TimeUUID ordering
Cassandra provides, among all the primitive types, support for UUID values of type 1 (time and server based) and type 4 (random).
The primary use of a UUID (Universally Unique IDentifier) is to obtain a truly unique identifier in a potentially distributed environment.
One naive idea when working with unique identifiers is to rely on System.currentTimeMillis(). It works fine for most “basic” applications, but for Big Data use cases where we need to deal with thousands or millions of objects per second, it is clearly not sufficient. In one millisecond your server may create more than one message: System.currentTimeMillis() is not fine-grained enough to guarantee uniqueness.
The JDK also provides System.nanoTime(), but the Javadoc clearly states that it should only be used to compute durations, not as a time reference.
So we end up with the good old UUID.
Cassandra does support version 1 UUIDs. A version 1 UUID gives you a unique identifier by combining the computer’s MAC address with the number of 100-nanosecond intervals since the beginning of the Gregorian calendar.
As you can see, the precision is only 100 nanoseconds, but fortunately it is mixed with a clock sequence to add randomness. Furthermore, since the MAC address is also used to compute the UUID, it is very unlikely that you will face a collision on one cluster of machines, unless you need to process a really, really huge volume of data (don’t forget, not everyone is Twitter or Facebook).
One of the most relevant use cases for UUIDs, and especially TimeUUIDs, is as column keys. Since Cassandra column keys are sorted, we can take advantage of this feature to get a natural ordering for our column families.
The problem with the default com.eaio.uuid.UUID provided by the Hector client is that it is not easy to work with. As an ID, you may need to carry this value from the server all the way up to the view layer, and that’s the gotcha.
Basically, com.eaio.uuid.UUID overrides toString() to give a String representation of the UUID. However, this String format cannot be sorted lexicographically…
Below are some TimeUUID generated consecutively:
- 8e4cab00-c481-11e1-983b-20cf309ff6dc at some t1
- 2b6e3160-c482-11e1-addf-20cf309ff6dc at some t2 with t2 > t1
“2b6e3160-c482-11e1-addf-20cf309ff6dc”.compareTo(“8e4cab00-c481-11e1-983b-20cf309ff6dc”) gives -6, meaning that “2b6e3160-c482-11e1-addf-20cf309ff6dc” sorts before “8e4cab00-c481-11e1-983b-20cf309ff6dc”, which is incorrect.
The current textual display of a TimeUUID is split as follows:
time_low – time_mid – time_high_and_version – variant_and_sequence – node
If we re-order it starting with time_high_and_version, we can then sort it lexicographically:
time_high_and_version – time_mid – time_low – variant_and_sequence – node
The utility method is given below:
import java.util.StringTokenizer;

public static String reorderTimeUUId(String originalTimeUUID) {
    // Expected input layout: time_low - time_mid - time_high_and_version - variant_and_sequence - node
    StringTokenizer tokens = new StringTokenizer(originalTimeUUID, "-");
    if (tokens.countTokens() == 5) {
        String time_low = tokens.nextToken();
        String time_mid = tokens.nextToken();
        String time_high_and_version = tokens.nextToken();
        String variant_and_sequence = tokens.nextToken();
        String node = tokens.nextToken();
        // Move time_high_and_version to the front so the string sorts chronologically
        return time_high_and_version + '-' + time_mid + '-' + time_low
                + '-' + variant_and_sequence + '-' + node;
    }
    return originalTimeUUID;
}
The TimeUUIDs become:
- 11e1-c481-8e4cab00-983b-20cf309ff6dc
- 11e1-c482-2b6e3160-addf-20cf309ff6dc
Now we get “11e1-c481-8e4cab00-983b-20cf309ff6dc“.compareTo(“11e1-c482-2b6e3160-addf-20cf309ff6dc“) = -1
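A quick sanity check with the two TimeUUIDs above:

String t1 = reorderTimeUUId("8e4cab00-c481-11e1-983b-20cf309ff6dc"); // created first
String t2 = reorderTimeUUId("2b6e3160-c482-11e1-addf-20cf309ff6dc"); // created second
// t1 now sorts lexicographically before t2, matching the creation order
assert t1.compareTo(t2) < 0;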
C Paging data with sorted column names
If you are using Cassandra, it means that your application is dealing with a huge volume of data. If it is not, then there is something wrong with your data model, and you should probably look at conventional SQL databases instead of NoSQL.
With high volume comes the need to page data. For this purpose, the SliceQuery comes to the rescue again!
Let’s say that your mail application manages a list of messages in the inbox. The messages are reverse-ordered with respect to their reception time.
The inbox will only display the latest N messages. When the user clicks “Next”, it fetches the following N messages, and so on.
To build a paging system, we should first create an index for the message IDs. It is merely a column family whose column names are of type TimeUUID (see above) for time ordering, and whose column values point to message IDs.
To fetch the first N messages, just use SliceQuery with:
- start = null
- end = null
- reverse = true
- limit = N
for a descending sort order, latest messages first.
Once we get the list of the first N messages, it is pretty easy to read the TimeUUID column name of the last element in the list. To fetch the next N messages:
- start = TimeUUID column name of the last element of the previous page
- end = null
- reverse = true
- limit = N
And that’s all! Just remember that both slice limits are inclusive, so the first column of each new page is the last column of the previous one: fetch N + 1 columns and discard the first, as in the sketch below.
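Here is a minimal paging sketch with the Hector client; the column family name "MessageIndex", the row key and the page size are hypothetical. It assumes an index whose column names are TimeUUIDs and whose column values are message IDs, as described above:

import java.util.List;
import java.util.UUID;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

final int N = 10; // hypothetical page size

SliceQuery<String, UUID, String> query = HFactory
        .createSliceQuery(keyspace, StringSerializer.get(),
                UUIDSerializer.get(), StringSerializer.get())
        .setColumnFamily("MessageIndex") // hypothetical index column family
        .setKey("jdubois");              // one index row per user

// First page: the latest N messages (reverse = true, unbounded limits)
List<HColumn<UUID, String>> page = query
        .setRange(null, null, true, N)
        .execute().get().getColumns();

// Next page: restart from the TimeUUID of the last column of the previous
// page; fetch N + 1 columns and drop the first one, which duplicates the
// last column of the previous page because range limits are inclusive
UUID lastSeen = page.get(page.size() - 1).getName();
List<HColumn<UUID, String>> nextPage = query
        .setRange(lastSeen, null, true, N + 1)
        .execute().get().getColumns();
nextPage = nextPage.subList(1, nextPage.size());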
II Traps
And now the traps that you should absolutely avoid if you don’t want to suffer endless hours of debugging 🙂
A The real semantic of composite column key
In recent versions of Cassandra, composite column keys have been added. What are they?
Basically, instead of having a simple column key (column name) of one type, you can aggregate several values (also called components) of several types to form one single column key.
Examples: login:location, surname:firstname:age …
In the early days, people used to concatenate their fields into a string value with an arbitrarily chosen separator to form a kind of composite key. But this approach is rather limited because the sort order is lexicographic (string-based). It works well when all the components are of string type, because concatenating them does not modify the sort order. If the components are of different types, you’re screwed.
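A one-line illustration of the broken ordering with a numeric component stored as text:

// Lexicographically "bob:100" sorts before "bob:25" ('1' < '2'), yet 25 < 100
System.out.println("bob:100".compareTo("bob:25") < 0); // prints true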
Now with the composite type, it is possible to mix components of different types!
Cool, right?
Not really, as it turns out…
Let’s look at a simple example. We have a list of users and want to find them by their login, age or city. Usually we would have to create 3 indexes: one for the login, one for the age and one for the city. Now the naive developer will instead create a composite column key as login:age:city.
Let’s consider the following data set:
- alice:27:New York
- bob:32:New York
- bob:35:Seattle
- boby:25:Atlanta
- jack:27:Los Angeles
Now let’s perform a SliceQuery with composite type. If we want to retrieve all people whose login is exactly “bob”:
Composite start = new Composite();
start.addComponent(0, "bob", Composite.ComponentEquality.EQUAL);
Composite end = new Composite();
end.addComponent(0, "bob", Composite.ComponentEquality.GREATER_THAN_EQUAL);
List<HColumn<Composite, Object>> columns = HFactory.createSliceQuery(keyspace, se, ce, oe)
        .setColumnFamily("composite").setKey("test")
        .setRange(start, end, false, 100)
        .execute().get().getColumns();
The result is:
- bob:32:New York
- bob:35:Seattle
Please notice the end.addComponent() call in the above code. Intuitively we would set Composite.ComponentEquality.EQUAL for the end value, but paradoxically Cassandra returns no result if it is set to EQUAL, so I set it to GREATER_THAN_EQUAL.
Now, what if we want to get all users whose login starts with “bob”?
Composite start = new Composite();
start.addComponent(0, "bob", Composite.ComponentEquality.EQUAL);
Composite end = new Composite();
end.addComponent(0, "boc", Composite.ComponentEquality.LESS_THAN_EQUAL);
List<HColumn<Composite, Object>> columns = HFactory.createSliceQuery(keyspace, se, ce, oe)
        .setColumnFamily("composite").setKey("test")
        .setRange(start, end, false, 100)
        .execute().get().getColumns();
The result is:
- bob:32:New York
- bob:35:Seattle
- boby:25:Atlanta
Please notice the trick on the end bound: we shift the last letter of “bob” up one notch (“boc”) and use LESS_THAN_EQUAL. This time the result includes “boby”.
Now we want to get users whose login = “bob” and who are 32 years old.
Composite start = new Composite();
start.addComponent(0, "bob", Composite.ComponentEquality.EQUAL);
start.addComponent(1, 32, Composite.ComponentEquality.EQUAL);
Composite end = new Composite();
end.addComponent(0, "bob", Composite.ComponentEquality.EQUAL);
end.addComponent(1, 32, Composite.ComponentEquality.GREATER_THAN_EQUAL);
List<HColumn<Composite, Object>> columns = HFactory.createSliceQuery(keyspace, se, ce, oe)
        .setColumnFamily("composite").setKey("test")
        .setRange(start, end, false, 100)
        .execute().get().getColumns();
The result is:
- bob:32:New York
Please notice that now, for the first component (login), we can use EQUAL for both the start and end limits. For the second component (age), it is again EQUAL for the start limit and GREATER_THAN_EQUAL for the end limit if we want an exact match.
What if we want users whose login = “bob” and whose age is between 32 and 35 years old, inclusive?
Composite start = new Composite();
start.addComponent(0, "bob", Composite.ComponentEquality.EQUAL);
start.addComponent(1, 32, Composite.ComponentEquality.EQUAL);
Composite end = new Composite();
end.addComponent(0, "bob", Composite.ComponentEquality.EQUAL);
end.addComponent(1, 36, Composite.ComponentEquality.LESS_THAN_EQUAL);
List<HColumn<Composite, Object>> columns = HFactory.createSliceQuery(keyspace, se, ce, oe)
        .setColumnFamily("composite").setKey("test")
        .setRange(start, end, false, 100)
        .execute().get().getColumns();
The result is:
- bob:32:New York
- bob:35:Seattle
Again, we need to set 36 with LESS_THAN_EQUAL on the end limit to retrieve bob:35:Seattle, because the LESS_THAN_EQUAL operator here actually behaves as a strict <.
Now, we want to get all users with age = 27
Composite start = new Composite();
start.addComponent(0, "", Composite.ComponentEquality.EQUAL);
start.addComponent(1, 27, Composite.ComponentEquality.EQUAL);
Composite end = new Composite();
end.addComponent(0, Character.MAX_VALUE + "", Composite.ComponentEquality.EQUAL);
end.addComponent(1, 27, Composite.ComponentEquality.GREATER_THAN_EQUAL);
List<HColumn<Composite, Object>> columns = HFactory.createSliceQuery(keyspace, se, ce, oe)
        .setColumnFamily("composite").setKey("test")
        .setRange(start, end, false, 100)
        .execute().get().getColumns();
The result:
- alice:27:New York
- bob:32:New York
- bob:35:Seattle
- boby:25:Atlanta
- jack:27:Los Angeles
It is mandatory to provide a value for the first component. Since we do not want to filter by login, I set “” as the start limit and Character.MAX_VALUE (the end.addComponent(0, …) call) as the end limit.
We expect Cassandra to return only alice:27:New York and jack:27:Los Angeles, but it returns all the users! Why?
Simply because of the way Cassandra orders composite columns. Composite columns are ordered first by their first component, then by their second component, and so on.
To get all users who are 27 years old, Cassandra would need to scan all possible values of the first component (login) before it could look at the second one (age). In the end it retrieves all the columns because there is no other way to proceed.
Most developers naively think that composite columns give them multi-dimensional axes for data filtering, but it’s just an illusion. The way Cassandra sorts columns is the real limitation.
The only way to use composite columns effectively is to filter:
- only by the first component
- or by fixing the first component, then filtering by the second component
- or by fixing the first and second components, then filtering by the third component
- etc…
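To make the limitation concrete, here is a small sketch in plain Java (outside Cassandra) showing that composite columns order exactly like tuples compared component by component; the two age-27 entries end up far apart, so no single contiguous slice can capture them:

import java.util.Arrays;
import java.util.Comparator;

String[][] columns = {
        {"alice", "27", "New York"}, {"bob", "32", "New York"},
        {"bob", "35", "Seattle"}, {"boby", "25", "Atlanta"},
        {"jack", "27", "Los Angeles"}
};
// Composite ordering: compare the first component, then the second, then the third
Arrays.sort(columns, new Comparator<String[]>() {
    public int compare(String[] a, String[] b) {
        int byLogin = a[0].compareTo(b[0]);
        if (byLogin != 0) return byLogin;
        int byAge = Integer.compare(Integer.parseInt(a[1]), Integer.parseInt(b[1]));
        if (byAge != 0) return byAge;
        return a[2].compareTo(b[2]);
    }
});
// Resulting layout: alice:27, bob:32, bob:35, boby:25, jack:27
// A SliceQuery always returns one contiguous range of this layout, so the
// age-27 columns (alice and jack) cannot be isolated in a single slice.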
B Row or column indexing? The (not so) difficult choice
With Cassandra you can filter data either using the row key as search key (exact match or range match with RangeQuery) or using the column key as described previously.
This leads to 2 main designs: wide rows and skinny rows.
The wide row pattern consists of a column family structure with very few rows and, for each row, many, many columns.
The skinny row pattern consists of a column family structure with many rows and, for each row, very few columns.
Each of these designs matches different technical needs. However, for indexing data, the skinny row pattern should be avoided.
Basically you build indexes to accelerate data access and retrieval. Instead of scanning and reading values from various column families we only need to read data from one column family, the index.
Now, what if the index data is spread over many Cassandra nodes? Well, the cluster needs to fetch it all from different nodes and merge it, and that has a cost.
Why does this occur? Because when you index data using the row key as the search key, with the RandomPartitioner (the default configuration) a hash is computed from the row key and the data is sent to a node depending on this hashed value.
Of course, you can choose the OrderPreservingPartitioner, but it has many drawbacks, an unbalanced cluster to cite only one (complete details here). Officially, the use of the OrderPreservingPartitioner is strongly discouraged unless you have no other choice.
So that leaves us with the wide row pattern as the indexing solution, and it is the right choice. All the columns of a row are stored on the same node, in the same data block, and sorted on disk, so accessing and scanning these columns is extremely fast. And that’s exactly what you want for your index!
Thanks first. The topic you have written about is rarely covered. Thanks once more.
Covers some of the very important points – Really useful!
Great Post!
I am particularly interested in the section “A The real semantic of composite column key”. My question is: is this still the case? Does the Jira https://issues.apache.org/jira/browse/CASSANDRA-3680 address this problem, or does it address a different one?
Also, for the very last query, will you always get back 5 columns, or are there other ways to write the Java code to get back only 2 columns? I understand why Cassandra may have to scan all columns to come up with the answer (“there is no other way to proceed”), but what prevents it from filtering the results and sending back only 2 columns? It may still have to read all columns from disk.
This is extremely important to us as it affects how we design our system, so your answer is greatly appreciated.
Hello Stanley
You’re right, they’ve added secondary indexes on composite columns, making it now possible to filter on any component of a composite.
Basically, in the index they keep track of the row key and the composite components that come ‘before’ the component being indexed.
That makes index updates heavier, as Sylvain Lebresne said in the JIRA.
For your second question “Also, for the very last query, will you always get back 5 columns or there are other ways to write the Java code to get back only 2 columns?”
The answer depends on the protocol you use.
CQL3 will probably filter out undesired results on the server and send back only the matching columns (not tested though, just an assumption), but a Thrift client will surely get back all the columns.
Anyway, thanks for sharing your thoughts.
Hey,
Very nice article. I am using the DataStax Java driver and am new to it; can you please tell me how to do this “startsWith” with the java driver-core?
“startsWith” translates in CQL3 to “<” or “<=”, depending on whether you want strict or loose inequality.