Randomized Sort: Cassandra Range Query Using CompositeType

Monday, November 21, 2011

Cassandra Range Query Using CompositeType

CompositeType is a powerful technique for building indices with regular column families instead of super column families. But there is a dearth of information on how to use CompositeType in Cassandra. Introduced in 0.8.1 in May 2011, it is a relative newcomer to Cassandra. It doesn't help that it is not even in the "official" datatype documentation for Cassandra 1.0 and 0.8! This article pieces together various tidbits into a complete how-to guide on programming CompositeType. The code examples use Hector.

Let's say we want to define a column family as follows:
row key: string
column key: composite of an integer and a string
column value: string

We can define this schema in the Cassandra CLI:

create column family MyCF
    with comparator = 'CompositeType(IntegerType,UTF8Type)'
    and key_validation_class = 'UTF8Type'
    and default_validation_class = 'UTF8Type';

We can also define the same schema programmatically in Hector:

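import java.util.Arrays;

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.ThriftKsDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;
import org.apache.cassandra.db.marshal.UTF8Type;

// Step 1: Create a cluster
CassandraHostConfigurator chc 
      = new CassandraHostConfigurator("localhost");
Cluster cluster = HFactory.getOrCreateCluster(
                        "Test Cluster", chc);

// Step 2: Create the schema
ColumnFamilyDefinition myCfd 
      = HFactory.createColumnFamilyDefinition(
            "MyKS", "MyCF", ComparatorType.COMPOSITETYPE);
// Thanks to Shane Perry for this tip.
// http://groups.google.com/group/hector-users/
//       browse_thread/thread/ffd0895a17c7b43e
// The alias supplies the component types of the composite comparator.
myCfd.setComparatorTypeAlias("(IntegerType, UTF8Type)");
myCfd.setKeyValidationClass(UTF8Type.class.getName());
myCfd.setDefaultValidationClass(UTF8Type.class.getName());
KeyspaceDefinition myKs = HFactory.createKeyspaceDefinition(
      "MyKS", ThriftKsDef.DEF_STRATEGY_CLASS, 1, 
      Arrays.asList(myCfd));

// Step 3: Add schema to the cluster
cluster.addKeyspace(myKs, true);
// createKeyspace takes the keyspace name, not the definition
Keyspace ks = HFactory.createKeyspace("MyKS", cluster);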

Now let's insert a single row with 2 columns:

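import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.IntegerSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

String rowKey = "row1";

// First column key: {1, "c1"}
Composite colKey1 = new Composite();
colKey1.addComponent(1, IntegerSerializer.get());
colKey1.addComponent("c1", StringSerializer.get());

// Second column key: {2, "c2"}
Composite colKey2 = new Composite();
colKey2.addComponent(2, IntegerSerializer.get());
colKey2.addComponent("c2", StringSerializer.get());

// Insert both columns into row1 in one batch.
// The row key is a String, so the mutator needs a String key serializer.
Mutator<String> m 
      = HFactory.createMutator(ks, StringSerializer.get());
m.addInsertion(rowKey, "MyCF", 
      HFactory.createColumn(colKey1, "foo", 
                            new CompositeSerializer(), 
                            StringSerializer.get()));
m.addInsertion(rowKey, "MyCF", 
      HFactory.createColumn(colKey2, "bar", 
                            new CompositeSerializer(), 
                            StringSerializer.get()));
m.execute();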

After the insertion, the column family should look like this table:

row1    {1, "c1"}   {2, "c2"}
        foo         bar

Now let's retrieve the first column with a slice query that constrains only the first (integer) component of the composite column key. Because Cassandra orders composite column names component by component (first by the integer, then by the string), we can construct a search range from {0, "a"} to {1, "\uFFFF"}, which includes {1, "c1"} but not {2, "c2"}.

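import java.util.List;

import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.IntegerSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;

SliceQuery<String, Composite, String> sq 
      =  HFactory.createSliceQuery(ks, StringSerializer.get(), 
                                   new CompositeSerializer(), 
                                   StringSerializer.get());
sq.setColumnFamily("MyCF");
sq.setKey("row1");

// Create a composite search range
Composite start = new Composite();
start.addComponent(0, IntegerSerializer.get());
start.addComponent("a", StringSerializer.get());
Composite finish = new Composite();
finish.addComponent(1, IntegerSerializer.get());
finish.addComponent(Character.toString(Character.MAX_VALUE), 
                    StringSerializer.get());
// Not reversed, at most 100 columns
sq.setRange(start, finish, false, 100);

// Now search.
QueryResult<ColumnSlice<Composite, String>> result = sq.execute();

// Matching columns come back in comparator order; only {1, "c1"} matches here.
List<HColumn<Composite, String>> columns = result.get().getColumns();
HColumn<Composite, String> first = columns.get(0);
String value = first.getValue(); // "foo"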
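To see why this range picks out only the first column, here is a minimal in-memory sketch in plain Java (no Hector or Cassandra involved). It models row1 as a TreeMap whose comparator imitates CompositeType(IntegerType,UTF8Type): compare the integer component first, and fall back to the string component only when the integers tie, comparing strings as unsigned UTF-8 bytes. The names CompositeRangeSketch, ColKey, compareUtf8 and slice are hypothetical, and the model ignores details such as the end-of-component byte, so treat it as an illustration of the ordering rather than Cassandra's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CompositeRangeSketch {

    // Hypothetical stand-in for a CompositeType(IntegerType,UTF8Type) column name.
    record ColKey(int num, String str) {}

    // UTF8Type-style comparison: unsigned byte order of the UTF-8 encodings.
    static int compareUtf8(String a, String b) {
        return Arrays.compareUnsigned(a.getBytes(StandardCharsets.UTF_8),
                                      b.getBytes(StandardCharsets.UTF_8));
    }

    // Component by component: the integer decides unless the integers tie.
    static final Comparator<ColKey> COMPOSITE_ORDER = (x, y) -> {
        int c = Integer.compare(x.num(), y.num());
        return c != 0 ? c : compareUtf8(x.str(), y.str());
    };

    // Inclusive on both ends, like the start/finish bounds of the slice query.
    static NavigableMap<ColKey, String> slice(NavigableMap<ColKey, String> row,
                                              ColKey start, ColKey finish) {
        return row.subMap(start, true, finish, true);
    }

    public static void main(String[] args) {
        // row1 from the example; the TreeMap keeps columns in comparator order.
        NavigableMap<ColKey, String> row1 = new TreeMap<>(COMPOSITE_ORDER);
        row1.put(new ColKey(2, "c2"), "bar");
        row1.put(new ColKey(1, "c1"), "foo");

        ColKey start = new ColKey(0, "a");
        ColKey finish = new ColKey(1, Character.toString(Character.MAX_VALUE));

        System.out.println(slice(row1, start, finish).values()); // prints [foo]
    }
}
```

In this model the start bound {0, "a"} would also admit a column such as {0, "b"}, because any name whose integer is 0 and whose string sorts at or after "a" falls inside the range. Starting the range at {1, ""} instead restricts the slice to names whose integer component is exactly 1.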

Unfortunately, a Javadoc typo in the Cassandra source code prevents tools like Eclipse from displaying the documentation for CompositeType. But you can always read the source online for the precise definition and encoding scheme of CompositeType. Reading the source code has been, and still is, the best way to learn new features in Cassandra.
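As a companion to reading the source, here is a small sketch of the byte layout as I understand it from the CompositeType source comments: each component is written as a 2-byte big-endian length, then the component's bytes, then a one-byte end-of-component marker, which is 0 for an ordinary stored column name. The class and method names are hypothetical. The integer is encoded the way IntegerType represents values (a minimal two's-complement BigInteger); a client serializer may write a different, for example fixed-width, encoding for the same number, so the exact bytes Hector produces may differ.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class CompositeEncodingSketch {

    // Append one component: 2-byte big-endian length, raw bytes, end-of-component byte.
    static void writeComponent(ByteArrayOutputStream out, byte[] value) {
        if (value.length > 0xFFFF) {
            throw new IllegalArgumentException("component longer than 65535 bytes");
        }
        out.write((value.length >>> 8) & 0xFF);
        out.write(value.length & 0xFF);
        out.write(value, 0, value.length);
        out.write(0); // end-of-component marker for a stored column name
    }

    // Encode an (IntegerType, UTF8Type) composite such as {1, "c1"}.
    static byte[] encode(int num, String str) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Assumption: minimal two's-complement bytes, as BigInteger.toByteArray() gives.
        writeComponent(out, BigInteger.valueOf(num).toByteArray());
        writeComponent(out, str.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode(1, "c1")) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints 00 01 01 00 00 02 63 31 00
    }
}
```

Because every component carries its own length, Cassandra can split a column name back into components and compare them one at a time with each component's own type, which is what makes the component-by-component ordering used by the range query possible.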

13 comments:

sumit, November 24, 2011 at 11:34 PM
After // Step 3: Add schema to the cluster, if I need to add another super column family to this schema, how is that possible?
YC, December 11, 2011 at 3:26 PM
Super column families are frowned upon these days. The suggestion I got from experienced Cassandra devs is to use CompositeType instead. Someone even told me that there have been discussions about deprecating super column families in a future Cassandra release. Ed Anuff wrote a great post on composites vs. super column families here:
http://www.anuff.com/2011/02/indexing-in-cassandra.html
Siva, January 16, 2012 at 11:29 PM
If I use CompositeType for row keys, as in key_validation_class = 'CompositeType(UTF8Type, UTF8Type)', with RandomPartitioner, how do I perform range queries where only the first component is known and I have to fetch all the rows matching that first component? The second component can be anything. When I set only the first component for both the start and end keys in the RangeSliceQuery.setKeys method, I don't get anything back.
hardikbhalani, May 29, 2013 at 3:27 AM
How do I make this work with TimeUUIDType instead of IntegerType in the composite key? What should I do if I don't want to include the TimeUUID in the range query? It only works if I put the TimeUUID as the second part of the composite key.