Sunday, December 11, 2011
Fixing libssl and libcrypto Errors in DataStax OpsCenter Startup
The AWS Linux AMI I use has openssl 1.0.0 but DataStax OpsCenter 1.3.1 requires version 0.9.8 of libssl and libcrypto. Why didn't they say so in the docs?? The worst customer experience you can give to your user base is to let your software blow up at startup like this:
Failed to load application: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory
This problem was apparently reported a month ago:
http://www.datastax.com/support-forums/topic/issue-starting-opscenterd-service
But no action has been taken to correct it...sigh...
Here is how we can fix it temporarily on our own until the DataStax devs get their act together:
1) Install openssl 0.9.8
sudo yum install openssl098e-0.9.8e-17.7.amzn1.i686
2) Change to /usr/lib and manually create the following two symbolic links:
sudo ln -s libssl.so.0.9.8e libssl.so.0.9.8
sudo ln -s libcrypto.so.0.9.8e libcrypto.so.0.9.8
Now OpsCenter will start without the dreaded ssl error.
Monday, November 21, 2011
Cassandra Range Query Using CompositeType
CompositeType is a powerful technique for creating indices using regular column families instead of super column families. But there is a dearth of information on how to use CompositeType in Cassandra. Introduced in 0.8.1 in May 2011, it is a relative newcomer to Cassandra. It doesn't help that it is not even in the "official" datatype documentation for Cassandra 1.0 and 0.8! This article pieces together various tidbits to bring you a complete how-to guide on programming CompositeType. The code examples use Hector.
Let's say we want to define a column family as the following:
row key: string
column key: composite of an integer and a string
column value: string
We can define the following schema on the cli:
create column family MyCF
with comparator = 'CompositeType(IntegerType,UTF8Type)'
and key_validation_class = 'UTF8Type'
and default_validation_class = 'UTF8Type';
We can also define the same schema programmatically in Hector:
// Step 1: Create a cluster
CassandraHostConfigurator chc
    = new CassandraHostConfigurator("localhost");
Cluster cluster = HFactory.getOrCreateCluster(
    "Test Cluster", chc);
// Step 2: Create the schema
ColumnFamilyDefinition myCfd
    = HFactory.createColumnFamilyDefinition(
        "MyKS", "MyCF", ComparatorType.COMPOSITETYPE);
// Thanks to Shane Perry for this tip.
// http://groups.google.com/group/hector-users/
// browse_thread/thread/ffd0895a17c7b43e
myCfd.setComparatorTypeAlias("(IntegerType, UTF8Type)");
myCfd.setKeyValidationClass(UTF8Type.class.getName());
myCfd.setDefaultValidationClass(UTF8Type.class.getName());
KeyspaceDefinition myKs = HFactory.createKeyspaceDefinition(
    "MyKS", ThriftKsDef.DEF_STRATEGY_CLASS, 1,
    Arrays.asList(myCfd));
// Step 3: Add schema to the cluster
cluster.addKeyspace(myKs, true);
// createKeyspace takes the keyspace name, not the definition
Keyspace ks = HFactory.createKeyspace("MyKS", cluster);
Now let's insert a single row with 2 columns:
String rowKey = "row1";
// First column key
Composite colKey1 = new Composite();
colKey1.addComponent(1, IntegerSerializer.get());
colKey1.addComponent("c1", StringSerializer.get());
// Second column key
Composite colKey2 = new Composite();
colKey2.addComponent(2, IntegerSerializer.get());
colKey2.addComponent("c2", StringSerializer.get());
// Insert both columns into row1 at once.
// The row key is a String, so the mutator needs a StringSerializer.
Mutator<String> m
    = HFactory.createMutator(ks, StringSerializer.get());
m.addInsertion(rowKey, "MyCF",
    HFactory.createColumn(colKey1, "foo",
        new CompositeSerializer(),
        StringSerializer.get()));
m.addInsertion(rowKey, "MyCF",
    HFactory.createColumn(colKey2, "bar",
        new CompositeSerializer(),
        StringSerializer.get()));
m.execute();
After the insertion, the column family should look like this table:
| row key | column {1, c1} | column {2, c2} |
| row1    | foo            | bar            |
Now let's retrieve the first column using a slice query on only the first integer component of the composite column key. Since Cassandra orders composite keys component by component, we can construct a search range from {0, "a"} to {1, "\uFFFF"}, which will include {1, "c1"} but not {2, "c2"}.
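SliceQuery<String, Composite, String> sq
    = HFactory.createSliceQuery(ks, StringSerializer.get(),
        new CompositeSerializer(),
        StringSerializer.get());
sq.setColumnFamily("MyCF");
sq.setKey("row1");
// Create a composite search range
Composite start = new Composite();
start.addComponent(0, IntegerSerializer.get());
start.addComponent("a", StringSerializer.get());
Composite finish = new Composite();
finish.addComponent(1, IntegerSerializer.get());
finish.addComponent(Character.toString(Character.MAX_VALUE),
    StringSerializer.get());
sq.setRange(start, finish, false, 100);
// Now search.
QueryResult<ColumnSlice<Composite, String>> result = sq.execute();
// Read the first column of the returned slice, if there is one
List<HColumn<Composite, String>> columns = result.get().getColumns();
if (!columns.isEmpty()) {
    HColumn<Composite, String> first = columns.get(0);
    // Component 0 of the column key is the integer part
    Integer number = first.getName().get(0, IntegerSerializer.get());
    String value = first.getValue(); // "foo" for the row inserted above
}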
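The range reasoning above can be checked without a running cluster. The following is only a simplified model in plain Java, not Cassandra's comparator: the Key class and the slice method are hypothetical names made up for this sketch, and strings are compared with String.compareTo rather than by UTF-8 bytes. It shows that ordering by the first component and then the second puts {1, "c1"} inside the range and leaves {2, "c2"} outside.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CompositeRangeDemo {

    // Hypothetical stand-in for a two-component composite column key.
    static final class Key {
        final int number;
        final String text;

        Key(int number, String text) {
            this.number = number;
            this.text = text;
        }
    }

    // Compare the integer component first, then the string component.
    static final Comparator<Key> ORDER =
            Comparator.comparingInt((Key k) -> k.number)
                      .thenComparing(k -> k.text);

    // Return the keys that fall inside the inclusive range [start, finish].
    static List<Key> slice(List<Key> columns, Key start, Key finish) {
        List<Key> hits = new ArrayList<>();
        for (Key k : columns) {
            if (ORDER.compare(k, start) >= 0 && ORDER.compare(k, finish) <= 0) {
                hits.add(k);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Key> columns = Arrays.asList(new Key(1, "c1"), new Key(2, "c2"));
        Key start = new Key(0, "a");
        Key finish = new Key(1, Character.toString(Character.MAX_VALUE));
        for (Key k : slice(columns, start, finish)) {
            System.out.println("{" + k.number + ", " + k.text + "}");
        }
    }
}
```

The upper bound uses the largest possible char as its second component, so any string paired with 1 sorts at or below it, while anything paired with 2 sorts above it.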
It is unfortunate that a JavaDoc typo in the Cassandra source code prevents tools like Eclipse from displaying documentation about CompositeType. But you can always view the source online to get the precise definition and encoding scheme of CompositeType. Reading source code has been and still is the best way of learning new features in Cassandra.
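As far as I can tell from the source, each component of a composite is written as a 2-byte length, the value bytes, and one end-of-component byte (0 for a plain column name). Here is a minimal sketch of that layout in plain Java. The class and method names are made up for illustration, and it assumes the integer component is written as a 4-byte big-endian int, which is what I would expect from Hector's IntegerSerializer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CompositeEncodingDemo {

    // Append one component: 2-byte big-endian length, the value bytes,
    // then an end-of-component byte (0 in this sketch).
    static void writeComponent(ByteArrayOutputStream out, byte[] value) {
        out.write((value.length >> 8) & 0xFF);
        out.write(value.length & 0xFF);
        out.write(value, 0, value.length);
        out.write(0);
    }

    // Encode a composite of an int (assumed 4 bytes, big-endian) and a UTF-8 string.
    static byte[] encode(int number, String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeComponent(out, ByteBuffer.allocate(4).putInt(number).array());
        writeComponent(out, text.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Render bytes as lowercase hex for easy inspection.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Column key {1, "c1"} from the example above
        System.out.println(toHex(encode(1, "c1")));
    }
}
```

Under these assumptions the key {1, "c1"} takes 12 bytes: 7 for the integer component and 5 for the string component.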
Wednesday, October 26, 2011
The “initial_token” in Cassandra Means the “Very First Time”
Monday, October 17, 2011
Counting All Rows in Cassandra
The SQL language makes counting rows deceptively simple:
SELECT count(*) FROM MYTABLE;
The count function in the select clause iterates through all rows retrieved from MYTABLE to arrive at a total count. But it is an anti-pattern to iterate through all rows in a column family in Cassandra because Cassandra is a distributed datastore. By the very nature of Big Data, the total row count of a column family may not even fit in memory on a single 32-bit machine! But sometimes when you load a large static lookup table into a column family, you may want to verify that all rows are indeed stored in the cluster. However, before you start writing code to count rows, you should remember that:
- Counting by retrieving all rows is slow.
- The first scan may not return the total count due to delay in replication.
With those caveats in mind, here is a Hector method that counts rows by paging through the key range:
public int totalRowCount() {
    String start = null;
    String lastEnd = null;
    int count = 0;
    while (true) {
        RangeSlicesQuery<String, String, String> rsq =
            HFactory.createRangeSlicesQuery(ksp, StringSerializer.get(),
                StringSerializer.get(), StringSerializer.get());
        rsq.setColumnFamily("MY_CF");
        rsq.setColumnNames("MY_CNAME");
        // Nulls are the same as get_range_slices with empty strs.
        rsq.setKeys(start, null);
        rsq.setReturnKeysOnly(); // Return row keys only, without column data
        rsq.setRowCount(1000); // Arbitrary page size
        OrderedRows<String, String, String> rows = rsq.execute().get();
        int rowCount = rows.getCount();
        if (rowCount == 0) {
            break;
        } else {
            start = rows.peekLast().getKey();
            if (lastEnd != null && start.compareTo(lastEnd) == 0) {
                break;
            }
            // Key range is inclusive, so the last key of this page
            // comes back as the first key of the next page
            count += rowCount - 1;
            lastEnd = start;
        }
    }
    // Count the final key once if any page was seen. Testing
    // count > 0 here would report 0 for a single-row column family.
    if (lastEnd != null) {
        count += 1;
    }
    return count;
}
Recursion would be a more elegant solution but be aware of the stack limitation in Java.
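The paging idea can be tried without a cluster. The sketch below swaps the Hector query for a hypothetical fetchPage method over an in-memory TreeSet, which is only a rough stand-in for get_range_slices with an inclusive start key (a real cluster returns keys in partitioner order, not string order). All names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class PagedCountDemo {

    // Hypothetical stand-in for a range query: up to pageSize keys,
    // starting at 'start' inclusive, or from the beginning when start is null.
    static List<String> fetchPage(TreeSet<String> keys, String start, int pageSize) {
        Iterable<String> tail = (start == null) ? keys : keys.tailSet(start, true);
        List<String> page = new ArrayList<>();
        for (String k : tail) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(k);
        }
        return page;
    }

    static int totalRowCount(TreeSet<String> keys, int pageSize) {
        if (pageSize < 2) {
            // A page of one key can never advance past an inclusive start key
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        String start = null;
        String lastEnd = null;
        int count = 0;
        while (true) {
            List<String> page = fetchPage(keys, start, pageSize);
            if (page.isEmpty()) {
                break;
            }
            start = page.get(page.size() - 1);
            if (start.equals(lastEnd)) {
                break; // The page held only the key we already ended on
            }
            count += page.size() - 1; // Last key is re-read as the next start
            lastEnd = start;
        }
        if (lastEnd != null) {
            count += 1; // Count the final key once
        }
        return count;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            keys.add("k" + i);
        }
        System.out.println(totalRowCount(keys, 3));
    }
}
```

Each page overlaps the next by exactly one key, which is why every page contributes its size minus one and the final key is added back at the end.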