Tuesday, October 16, 2012

How to Quickly Evaluate Cloud Fitness

Cloud computing is an effective platform for building highly scalable and fault-tolerant software products. But cloud computing is often confused with clustering, and the so-called private cloud muddies the waters even more. While virtualization gives programmers the illusion that physical server boundaries have disappeared, it also means that CPUs and network links themselves can disappear on a whim. As a result, a software product designed to run on physical servers in a data center often fails to live up to its promise in an elastic cloud environment. This article offers four simple clues that reveal whether a software product is fit for the cloud. The “cloud” used here refers to Amazon Web Services (AWS).

Clue #1: Are the addresses of all cluster nodes defined in a configuration file?
This is the first sign of trouble. An elastic virtual machine doesn’t have an IP address until it boots up, so a cluster that must know every member’s address ahead of time cannot grow or shrink without reconfiguration. A cloud-fit product discovers its peers at runtime.
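
As an illustration, here is a minimal sketch of runtime peer discovery with the AWS SDK for Java. The "cluster" tag name, its value, and the credential handling are assumptions made up for this example:

import java.util.ArrayList;
import java.util.List;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.*;

// Discover peer addresses at runtime by EC2 tag instead of
// reading them from a static configuration file.
String accessKey = "...", secretKey = "..."; // your credentials
AmazonEC2 ec2 = new AmazonEC2Client(
      new BasicAWSCredentials(accessKey, secretKey));
DescribeInstancesResult result = ec2.describeInstances(
      new DescribeInstancesRequest().withFilters(
            new Filter("tag:cluster").withValues("my-cluster")));
List<String> peers = new ArrayList<String>();
for (Reservation r : result.getReservations()) {
   for (Instance i : r.getInstances()) {
      peers.add(i.getPrivateIpAddress()); // known only at runtime
   }
}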

Clue #2: Is shared storage required between cluster nodes?
There is no shared block storage in the cloud: an EBS volume can be attached to only one instance at a time, and S3 is an object store, not a shared file system. A product that assumes a shared disk between nodes will not survive the move.

Clue #3: Does the cluster use multicast?
Multicast strikes fear into the hearts of network admins. Multicast packets are problematic in traversing subnets, and a layer 2 network emulated on top of layer 3 only makes the problem worse. No wonder multicast is disabled in AWS.

Clue #4: Does the cluster rely on UDP to manage cluster membership?
Your “subnet” in the cloud actually runs on an overlay network. Ever heard of routers dropping UDP packets in times of congestion? A membership protocol that treats every lost datagram as a node failure will flap constantly in such an environment.

Armed with these four clues, one can quickly filter out a lot of vendor noise on the web today.

Tuesday, July 3, 2012

Jersey Unit Testing with Guice and EasyMock

This post will expand on my earlier postings on Guice integration in Jersey here
and here. Jersey has a little known but powerful in-memory test framework. Using
an in-memory test container along with Guice allows you to unit test resource
lookup without the expense of full HTTP protocol handling. This post shows you
how.

First, a sample software stack in Maven POM:
   <properties>
       <guice.version>3.0</guice.version>
       <jersey.version>1.12</jersey.version>
       <easymock.version>3.1</easymock.version>
   </properties>
   <dependencies>
       <dependency>
           <groupId>com.sun.jersey</groupId>
           <artifactId>jersey-server</artifactId>
           <version>${jersey.version}</version>
       </dependency>
       <dependency>
           <groupId>com.sun.jersey.contribs</groupId>
           <artifactId>jersey-guice</artifactId>
           <version>${jersey.version}</version>
       </dependency>
       <dependency>
           <groupId>com.google.inject.extensions</groupId>
           <artifactId>guice-assistedinject</artifactId>
           <version>${guice.version}</version>
       </dependency>
       <dependency>
           <groupId>com.sun.jersey.jersey-test-framework</groupId>
           <artifactId>jersey-test-framework-inmemory</artifactId>
           <version>${jersey.version}</version>
           <scope>test</scope>
       </dependency>
       <dependency>
           <groupId>org.easymock</groupId>
           <artifactId>easymock</artifactId>
           <version>${easymock.version}</version>
           <scope>test</scope>
       </dependency>
   </dependencies>

We will now work through unit-testing the sub-resource locator example from my
last post. The resource classes are reproduced here for convenience:
// In BarResource.java file
class BarResource {
   @GET
   public Response get() { /* ... */ }
}
 
// In FooResource.java file
import com.google.inject.Provider;

@Path("/foo")
class FooResource {
   private final Provider<BarResource> barProvider;
 
   @Inject 
   FooResource(final Provider<BarResource> barProvider) {
      this.barProvider = barProvider;
   }
   
   @GET
   @Path("bar") 
   @Produces(MediaType.APPLICATION_JSON)
   public Response getBar() {
      // A client request to /foo/bar is delegated
      // to BarResource
      BarResource bar = barProvider.get();
      return bar.get();
   }
}
Our goal is to mock the sub-resource BarResource so we can unit-test resource lookup in FooResource without launching a full-scale HTTP client and server. How do we do this? The trick lies in Jersey's InMemoryTestContainerFactory. Unfortunately, it is not obvious that you can provide your own IoC container with this factory. You only need to make a one-line change in the start() method.

Change:

webApp.initiate(resourceConfig);
To:
webApp.initiate(resourceConfig, new GuiceComponentProviderFactory(resourceConfig, injector));
We would have preferred InMemoryTestContainerFactory to be extensible so that we could simply pass in our injector. But we make do for now by creating our own GuiceInMemoryTestContainerFactory class based on the InMemoryTestContainerFactory code, with this one-line change. I will only show a skeleton implementation here to save space:
public final class GuiceInMemoryTestContainerFactory implements
        TestContainerFactory {

    private final Injector injector;

    public GuiceInMemoryTestContainerFactory(final Injector injector) {
        this.injector = injector;
    }

    @Override
    public Class<LowLevelAppDescriptor> supports() {
        return LowLevelAppDescriptor.class;
    }

    @Override
    public TestContainer create(final URI baseUri, final AppDescriptor ad) {
        if (!(ad instanceof LowLevelAppDescriptor)) {
            throw new IllegalArgumentException(
                    "The application descriptor must be an instance of LowLevelAppDescriptor");
        }

        return new GuiceInMemoryTestContainer(baseUri, (LowLevelAppDescriptor) ad, injector);
    }

    /**
     * The class defines methods for starting/stopping an in-memory test container,
     * and for running tests on the container.
     */
    private static final class GuiceInMemoryTestContainer implements TestContainer {

        // Copy other fields from InMemoryTestContainer here.
        
        final Injector injector;

        /**
         * Creates an instance of {@link GuiceInMemoryTestContainer}.
         * @param baseUri URI of the application
         * @param ad instance of {@link LowLevelAppDescriptor}
         * @param injector the Guice injector backing the container
         */
        private GuiceInMemoryTestContainer(final URI baseUri, final LowLevelAppDescriptor ad, 
                final Injector injector) {
            // Copy other statements from InMemoryTestContainer here
            this.injector = injector;
        }

        // Copy other methods from InMemoryTestContainer here

        @Override public void start() {
            if (!webApp.isInitiated()) {
                LOGGER.info("Starting low level InMemory test container");

                webApp.initiate(resourceConfig, 
                                new GuiceComponentProviderFactory(resourceConfig, injector));
            }
        }
    }
}    
Now we use Jersey's test framework JerseyTest to write our unit test for FooResource.
The key elements are:
  1. Statically initialize a Guice injector;
  2. Use GuiceInMemoryTestContainerFactory to initialize the test framework;
  3. Use a JerseyServletModule to mock up dependencies.
Here is the code:
public class FooResourceTest extends JerseyTest {
    private static Injector injector;
    @BeforeClass public static void init() {
        injector = Guice.createInjector(new MockServletModule());
    }
    
    public FooResourceTest() {
        super(new GuiceInMemoryTestContainerFactory(injector));
    }
    
    @Test public void testGetBar() {
        BarResource barMock = injector.getInstance(BarResource.class);
        barMock.get();
        EasyMock.expectLastCall().andStubReturn(EasyMock.createMock(Response.class));
        EasyMock.replay(barMock);
        
        WebResource wr = resource();
        ClientResponse r = wr.path("/foo/bar").get(ClientResponse.class);
        assertNotNull(r.getStatus());
        
        EasyMock.verify(barMock);
    }
    

    private static class MockServletModule extends JerseyServletModule {
        @Override protected void configureServlets() {
            bind(FooResource.class);
        }
        
        // @Singleton ensures FooResource and the test share one mock instance.
        @Provides @Singleton BarResource providesBarResource() {
            return EasyMock.createMock(BarResource.class);
        }
    }
    
}
Run this test and you will be on your way to testing RESTful interactions in a
Guice-enabled POJO fashion.

Thursday, June 28, 2012

On-Demand Object Injection with Guice in Jersey

In Java, you can create a new object instance with the new operator. This allows you to use an object instance on-demand, e.g. when a condition is met:
class Bar {
   void doSomething() { /* ... */ }
}

class Foo {
   void process() {
      // Do something and then check a condition
      boolean condition = checkCondition(); // hypothetical check
      if (condition) {
         // Create a new Bar instance to do something
         // only when condition is true
         Bar bar = new Bar();
         bar.doSomething();
      }
   }
}
In the example above, a new Bar instance is created on-demand when condition is true. But how would you do this in Guice, when you are using Guice to "inject", a.k.a. create, your objects? Guice is designed around the principle of eager dependency specification at the time of object construction: when an object Foo is created, all its dependencies should have been "injected" by Guice during the construction phase.

This kind of question is typical for a "framework" like Guice. A framework codifies a practice; Guice codifies the Factory pattern. But a framework often obfuscates idioms outside the codified pattern, and Guice is no exception. So how do you "new" an object Bar on-demand without first creating it in the constructor of the enclosing class Foo? It is actually quite easy in Guice. The technique is called "provider injection", i.e. injecting an object factory. Guice automatically creates a provider for every object class that it injects. So assume both Bar and Foo are injected by Guice like this:
import com.google.inject.AbstractModule;

class GuiceModule extends AbstractModule {
   @Override
   protected final void configure() {
      bind(Bar.class);
      bind(Foo.class);
   }
}
You can then inject a provider of Bar into Foo so you can ask Guice for a new instance of Bar whenever you need it:
class Bar {
   void doSomething() { /* ... */ }
}

import com.google.inject.Provider;
class Foo {
   private final Provider<Bar> barProvider;

   @Inject
   Foo(final Provider<Bar> barProvider) {
      this.barProvider = barProvider;
   }
  
   void process() {
      // Do something and then check a condition
      boolean condition = checkCondition(); // hypothetical check
      if (condition) {
         // Create a new Bar instance to do something
         // only when condition is true
         Bar bar = barProvider.get();
         bar.doSomething();
      }
   }
}
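
For completeness, here is a minimal sketch of bootstrapping the example, assuming the GuiceModule shown above:

Injector injector = Guice.createInjector(new GuiceModule());
Foo foo = injector.getInstance(Foo.class);
// A new Bar is created inside process() only when the condition holds.
foo.process();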
This is the technique to use when you write sub-resource locators in Jersey with
Guice as the IoC container:
public class BarResource {
   @GET
   public Response get() { /* ... */ }
}

import com.google.inject.Provider;
@Path("/")
public class FooResource {
   private final Provider<BarResource> barProvider;

   @Inject
   FooResource(final Provider<BarResource> barProvider) {
      this.barProvider = barProvider;
   }
  
   @Path("bar")
   @Produces(MediaType.APPLICATION_JSON)
   public Response getBar() {
      // Client request /bar will will be redirected 
      //to BarResource
      BarResource bar = barProvider.get();
      bar.get();
   }
}
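
To serve these resources with Guice as the IoC container, the resources have to be bound in a JerseyServletModule. Here is a minimal wiring sketch using the jersey-guice contrib; the listener class name is an assumption, and you would register it in your web.xml:

import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceServletContextListener;
import com.sun.jersey.guice.JerseyServletModule;
import com.sun.jersey.guice.spi.container.servlet.GuiceContainer;

// Hypothetical listener name; register it in web.xml.
public class MyGuiceConfig extends GuiceServletContextListener {
   @Override
   protected Injector getInjector() {
      return Guice.createInjector(new JerseyServletModule() {
         @Override
         protected void configureServlets() {
            bind(FooResource.class);
            bind(BarResource.class);
            // Route all requests through Guice
            serve("/*").with(GuiceContainer.class);
         }
      });
   }
}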

Monday, June 25, 2012

Poor Man's Static IP for EC2 a.k.a. Elastic Network Interface

Amazon's Elastic Network Interface (ENI) allows you to "reserve" an IP address. This is immensely useful in a VPC because an ENI can function as a pseudo static IP for elastic instances. Granted, you have to use two IPs for a single instance, but the ENI lets you assign a fixed private IP address to an elastic instance without having to go through the trouble of setting up dynamic DNS updates. Unfortunately, Amazon's documentation is missing key information on configuring a secondary IP with an ENI. Even if you have attached an ENI to an elastic instance, you cannot access the instance using the private IP associated with the ENI. What gives? The missing piece is IP interface and routing configuration.

Below is a step-by-step guide to configuring the ENI. This guide assumes that you have followed the official AWS guide to the point where you have configured an ENI and have brought up an elastic instance attached to it. Further, it assumes that the primary interface is assigned the IP 10.3.1.190 and the secondary interface, which is the ENI, is assigned the IP 10.3.1.191. At the end of the exercise, we will be able to ssh to the secondary IP address in addition to the primary one.

First, check IP address binding to each network interface.

$ sudo ip address

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 02:26:69:f0:87:46 brd ff:ff:ff:ff:ff:ff
inet 10.3.1.190/24 brd 10.3.1.255 scope global eth0
inet6 fe80::26:69ff:fef0:8746/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 02:26:69:dc:cc:62 brd ff:ff:ff:ff:ff:ff


We can see from the output that the current network interface assignment is as follows:

eth0: 10.3.1.190
eth1: none

Therefore, the first order of business is to assign the ENI's IP address to the interface eth1:

$ sudo ip address add 10.3.1.191/24 brd + dev eth1

Next, bring up the interface:

$ sudo ip link set dev eth1 up

Verify that eth1 is indeed up:

$ sudo ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 02:26:69:f0:87:46 brd ff:ff:ff:ff:ff:ff
inet 10.3.1.190/24 brd 10.3.1.255 scope global eth0
inet6 fe80::26:69ff:fef0:8746/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 02:26:69:dc:cc:62 brd ff:ff:ff:ff:ff:ff
inet 10.3.1.191/24 brd 10.3.1.255 scope global eth1
inet6 fe80::26:69ff:fedc:cc62/64 scope link
valid_lft forever preferred_lft forever


Next, find out the default gateway:

$ ip route show
default via 10.3.1.1 dev eth0
10.3.1.0/24 dev eth0 proto kernel scope link src 10.3.1.190
10.3.1.0/24 dev eth1 proto kernel scope link src 10.3.1.191


The default gateway in the output is 10.3.1.1, bound to the virtual gateway associated with the VPC. Since the default route currently only goes through eth0, any traffic from eth1 destined for IP addresses outside the 10.3.1.0/24 block will be dropped! We need to reconfigure IP routing on the elastic instance so that packets leaving eth1 are also routed through the default gateway. Here is how you do it.

First, add a new routing table called "awsein". Note that the output redirection in "sudo echo ... >>" would run as the unprivileged user and fail, so use tee instead:

$ echo "2 awsein" | sudo tee -a /etc/iproute2/rt_tables

This adds a table called "awsein" to rt_tables as entry 2:
$ cat /etc/iproute2/rt_tables
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
#1 inr.ruhep
2 awsein


Now add a default route to the new table, using the same default gateway as the one used by eth0:

$ sudo ip route add default via 10.3.1.1 dev eth1 table awsein
$ sudo ip route flush cache

Confirm that the new route is indeed added:

$ ip route show table awsein
default via 10.3.1.1 dev eth1 metric 1000


Next, we need to create a new routing rule that selects the new table for traffic originating from eth1's IP. To do this, we first check the existing rules:
$ ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default


Note the priority number 32766 for the rule "main". We will now add a new rule that looks up "awsein", with a priority number smaller than the one for "main" (rules with lower numbers are evaluated first).

$ sudo ip rule add from 10.3.1.191 lookup awsein prio 1000
Finally, verify the new rule configuration:

$ ip rule
0:      from all lookup local
1000:   from 10.3.1.191 lookup awsein
32766:  from all lookup main
32767:  from all lookup default

Now you can ssh into the instance using the ENI IP 10.3.1.191! Happy hacking.


Monday, January 9, 2012

ec2-bundle-vol Error "cert-ec2.pem: No such file or directory"

cert-ec2.pem is Amazon's public X.509 certificate. ec2-bundle-vol needs it to bundle up an image to S3. But the EC2 AMI tools shipped with the Amazon Linux AMI seem to exclude this crucial file when bundling up from an instance-store backed instance. This can cause problems down the road when you want to further customize your AMI. For example, Amazon has a 32-bit instance-store backed AMI in the us-east region with the ID "ami-4b814f22". We launch an EC2 instance with this AMI, customize it, bundle it up using ec2-bundle-vol, and finally register the bundle. Say we are now at ami-12345678. We launch a new EC2 instance with ami-12345678, customize it again, and then bundle up the new customization. But this time the ec2-bundle-vol command will fail with an error like this:

error reading certificate file /opt/aws/amitools/ec2/etc/ec2/amitools/cert-ec2.pem: No such file or directory - /opt/aws/amitools/ec2/etc/ec2/amitools/cert-ec2.pem
This looks like a bug in the EC2 tools shipped by Amazon. An easy but tedious workaround is to launch an instance off the original Amazon AMI, i.e. ami-4b814f22, and copy over the cert-ec2.pem before running ec2-bundle-vol.

This problem has been reported here. Hopefully Amazon will ship a fix soon to save users from this misery.




Sunday, December 11, 2011

Fixing libssl and libcrypto Errors in Datastax OpsCenter Startup

Update 2011-12-19: For the 64-bit Amazon Linux AMI, install openssl 0.9.8 with the command "sudo yum install openssl098e-0.9.8e-17.7.amzn1.x86_64". Thanks to thobbs from the DataStax forum for this tip.

The AWS Linux AMI I use has openssl 1.0.0, but DataStax OpsCenter 1.3.1 requires version 0.9.8 of libssl and libcrypto. Why didn't they say so in the docs? The worst customer experience you can give your user base is to let your software blow up at startup like this:

Failed to load application: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory

This problem was apparently reported a month ago:
http://www.datastax.com/support-forums/topic/issue-starting-opscenterd-service

But no action has been taken to correct it...sigh...

Here is how we can fix it temporarily on our own before the Cassandra devs get their act together:

1) Install openssl 0.9.8
sudo yum install openssl098e-0.9.8e-17.7.amzn1.i686

2) Change to /usr/lib and manually create the following two symbolic links:
sudo ln -s libssl.so.0.9.8e libssl.so.0.9.8
sudo ln -s libcrypto.so.0.9.8e libcrypto.so.0.9.8

Now OpsCenter will start without the dreaded ssl error.

Monday, November 21, 2011

Cassandra Range Query Using CompositeType

CompositeType is a powerful technique for creating indices using regular column families instead of super column families. But there is a dearth of information on how to use CompositeType in Cassandra. Introduced in 0.8.1 in May 2011, it is a relative newcomer to Cassandra. It doesn't help that it is not even in the "official" data type documentation for Cassandra 1.0 and 0.8! This article pieces together various tidbits to bring you a complete how-to guide on programming with CompositeType. The code examples use Hector.

Let's say we want to define a column family as the following:
row key: string
column key: composite of an integer and a string
column value: string

We can define the following schema on the cli:

create column family MyCF
    with comparator = 'CompositeType(IntegerType,UTF8Type)'
    and key_validation_class = 'UTF8Type'
    and default_validation_class = 'UTF8Type';

We can also define the same schema programmatically in Hector:

// Step 1: Create a cluster
CassandraHostConfigurator chc 
      = new CassandraHostConfigurator("localhost");
Cluster cluster = HFactory.getOrCreateCluster(
                        "Test Cluster", chc);

// Step 2: Create the schema
ColumnFamilyDefinition myCfd 
      = HFactory.createColumnFamilyDefinition(
            "MyKS", "MyCF", ComparatorType.COMPOSITETYPE);
// Thanks to Shane Perry for this tip.
// http://groups.google.com/group/hector-users/
//       browse_thread/thread/ffd0895a17c7b43e)
myCfd.setComparatorTypeAlias("(IntegerType, UTF8Type)");
myCfd.setKeyValidationClass(UTF8Type.class.getName());
myCfd.setDefaultValidationClass(UTF8Type.class.getName());
KeyspaceDefinition myKs = HFactory.createKeyspaceDefinition(
      "MyKS", ThriftKsDef.DEF_STRATEGY_CLASS, 1, 
      Arrays.asList(myCfd));

// Step 3: Add schema to the cluster
cluster.addKeyspace(myKs, true);
Keyspace ks = HFactory.createKeyspace("MyKS", cluster);

Now let's insert a single row with 2 columns:

String rowKey = "row1";

// First column key
Composite colKey1 = new Composite();
colKey1.addComponent(1, IntegerSerializer.get());
colKey1.addComponent("c1", StringSerializer.get());

// Second column key
Composite colKey2 = new Composite();
colKey2.addComponent(2, IntegerSerializer.get());
colKey2.addComponent("c2", StringSerializer.get());

// Insert both columns into row1 at once
Mutator<String> m 
      = HFactory.createMutator(ks, StringSerializer.get());
m.addInsertion(rowKey, "MyCF", 
      HFactory.createColumn(colKey1, "foo", 
                            new CompositeSerializer(), 
                            StringSerializer.get()));
m.addInsertion(rowKey, "MyCF", 
      HFactory.createColumn(colKey2, "bar", 
                            new CompositeSerializer(), 
                            StringSerializer.get()));
m.execute();

After the insertion, the column family should look like this table:

        {1, c1}   {2, c2}
row1    foo       bar

Now let's retrieve the first column using a slice query on only the first (integer) component of the composite column key. Since Cassandra orders composite column keys component by component, we can construct a search range from {0, "a"} to {1, "\uFFFF"}, which will include {1, "c1"} but not {2, "c2"}.

SliceQuery<String, Composite, String> sq 
      = HFactory.createSliceQuery(ks, StringSerializer.get(), 
                                  new CompositeSerializer(), 
                                  StringSerializer.get());
sq.setColumnFamily("MyCF");
sq.setKey("row1");

// Create a composite search range
Composite start = new Composite();
start.addComponent(0, IntegerSerializer.get());
start.addComponent("a", StringSerliazer.get());
Composite finish = new Composite();
finish.addComponent(1, IntegerSerializer.get());
finish.addComponent(Character.toString(Character.MAX_VALUE), 
                    StringSerializer.get());
sq.setRange(start, finish, false, 100);

// Now search.
QueryResult<ColumnSlice<Composite, String>> result = sq.execute();
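
To parse the result, walk the returned slice and decompose each composite column key back into its components. Here is a minimal sketch against Hector's QueryResult and ColumnSlice API:

// HColumn, ColumnSlice and QueryResult come from
// me.prettyprint.hector.api.beans and me.prettyprint.hector.api.query.
for (HColumn<Composite, String> col : result.get().getColumns()) {
   // Pull the typed components back out of the composite column key
   int num = col.getName().get(0, IntegerSerializer.get());
   String str = col.getName().get(1, StringSerializer.get());
   System.out.println(num + ":" + str + " = " + col.getValue());
}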

It is unfortunate that a JavaDoc typo in the Cassandra source code prevents tools like Eclipse from displaying documentation for CompositeType. But you can always view the source online to get the precise definition and encoding scheme of CompositeType. Reading the source code has been, and still is, the best way to learn new features in Cassandra.