Saturday, February 22, 2014

Cassandra cluster setup in 2 minutes

Cassandra is a very popular distributed NoSQL DBMS and one of the most scalable, fastest and most robust NoSQL databases. The steps documented in this post are very basic, and you should consider tuning them for a production-grade cluster. However, they are good enough to get a cluster running and explore Cassandra's capabilities.

Basic Cluster Configuration:

Step 1: Setting up on a single node.

Replace the download URL with your closest mirror.
Here is a sample command for version 2.0.5. It downloads the tarball, extracts it and renames the folder:

wget && tar xvzf apache-cassandra-2.0.5-bin.tar.gz && mv apache-cassandra-2.0.5 cassandra25_node1_dc1

Step 2 (Optional): Edit the configuration to change the following settings as per your standards.

In conf/cassandra.yaml:

data_file_directories:
    - /home/cassandra/data
commitlog_directory: /home/cassandra/data/commitlog
saved_caches_directory: /home/cassandra/saved_caches

In conf/log4j-server.properties:

log4j.appender.R.File: /home/cassandra/system.log
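If you prefer to script the cassandra.yaml part of Step 2 instead of editing it by hand, here is a minimal sketch. It assumes a POSIX awk, that the three keys appear uncommented at the start of a line, and that data_file_directories is followed by a single list item as in the sample above. The helper name set_storage_dirs and the /home/cassandra base directory are illustrative only; run it against each node's own copy of the file.

```shell
#!/usr/bin/env bash
# Minimal sketch: point Cassandra's storage directories at one base directory.
# set_storage_dirs is a hypothetical helper, not part of Cassandra.
# Assumes the keys start at column 0 and data_file_directories is followed
# by exactly one "    - <path>" list item.
set_storage_dirs() {
  local file="$1" base="$2"
  awk -v base="$base" '
    # Rewrite the first list item that follows data_file_directories.
    in_data && /^[ \t]*-[ \t]/ { print "    - " base "/data"; in_data = 0; next }
    /^data_file_directories:/  { print; in_data = 1; next }
    /^commitlog_directory:/    { print "commitlog_directory: " base "/data/commitlog"; next }
    /^saved_caches_directory:/ { print "saved_caches_directory: " base "/saved_caches"; next }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Only touch the config if this node's folder exists (folder name from Step 1).
conf="cassandra25_node1_dc1/conf/cassandra.yaml"
if [ -f "$conf" ]; then
  set_storage_dirs "$conf" /home/cassandra
fi
```

Every other line of the file is printed unchanged, so the rest of cassandra.yaml is left alone.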

Repeat Steps 1 and 2 on another machine or VM.

At this point we have a basic setup configured, and you should be able to launch each node independently (-f keeps Cassandra running in the foreground):
./bin/cassandra -f
However, the nodes are not yet clustered and cannot communicate with each other.

Step 3: Cluster the nodes

We need to make a few more changes to conf/cassandra.yaml to let the nodes form a cluster.
First, provide a logical name for your cluster (it must be identical on every node), e.g.:
cluster_name: 'hari_cassandra_ring'
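Because nodes with different cluster names refuse to join the same ring, a quick consistency check can save some head-scratching. The following is a minimal sketch, assuming you have copied each node's cassandra.yaml to one machine. same_cluster_name is a hypothetical helper, and it only compares the cluster_name lines.

```shell
#!/usr/bin/env bash
# Minimal sketch: verify that several cassandra.yaml files declare the same
# cluster_name. same_cluster_name is a hypothetical helper.
same_cluster_name() {
  local first="" name f i=0
  for f in "$@"; do
    # Value after "cluster_name:", with surrounding quotes removed.
    name="$(sed -n 's/^cluster_name:[[:space:]]*//p' "$f" | tr -d "'\"")"
    i=$((i + 1))
    if [ "$i" -eq 1 ]; then
      first="$name"
    elif [ "$name" != "$first" ]; then
      echo "cluster_name mismatch: $f has '$name', expected '$first'" >&2
      return 1
    fi
  done
  echo "all files use cluster_name '$first'"
}
```

For example, `same_cluster_name node1.yaml node2.yaml` returns a non-zero status and names the offending file if the two copies disagree.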

Seeds - For a Cassandra node to participate in a cluster, it has to know about at least one other node in the datacenter; in the Cassandra config file this node is called a "seed" node. The setting (seeds, nested under seed_provider) can be a comma-separated list of servers, and the documentation suggests avoiding a chicken-and-egg reference when defining the seed nodes.

seeds: ""
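Since seeds sits inside the nested seed_provider block rather than at the top level, a scripted edit has to keep its indentation. Here is a minimal sketch, assuming the stock layout where the value is on a `- seeds: "..."` line. set_seeds is a hypothetical helper, and 192.0.2.10 and 192.0.2.11 are documentation-only placeholder addresses, not the ones used in this post.

```shell
#!/usr/bin/env bash
# Minimal sketch: set the comma-separated seed list in cassandra.yaml.
# set_seeds is a hypothetical helper; the addresses below are placeholders
# from the 192.0.2.0/24 documentation range.
set_seeds() {
  local file="$1" seeds="$2"
  # Keep the original indentation and "- seeds:" prefix, replace only the value.
  # Commented example lines ("# Ex: ...") do not match because of the "- " prefix.
  sed -i.bak "s|^\([[:space:]]*- seeds:\).*|\1 \"${seeds}\"|" "$file"
}

conf="cassandra25_node1_dc1/conf/cassandra.yaml"
if [ -f "$conf" ]; then
  set_seeds "$conf" "192.0.2.10,192.0.2.11"
fi
```

Typically the same seed list is used on every node, so the same call can be repeated unchanged on each machine.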

listen_address - This should be a private address that nodes connect to for inter-node communication.
For a simple configuration, we can set it to the IP address or hostname of the node.

rpc_address - This is the RPC (client) communication interface; for a basic configuration we will set it to the same value as listen_address.
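To apply the basic "same address for both" setup described above, a minimal sketch might look like this. It assumes listen_address and rpc_address appear uncommented at the start of a line. set_node_address is a hypothetical helper, and 192.0.2.11 is a documentation-only placeholder for the node's own IP.

```shell
#!/usr/bin/env bash
# Minimal sketch: bind inter-node (listen_address) and client (rpc_address)
# traffic to the same address. set_node_address is a hypothetical helper;
# 192.0.2.11 is a placeholder, replace it with this node's IP or hostname.
set_node_address() {
  local file="$1" addr="$2"
  # Anchored patterns so only the top-level keys are rewritten.
  sed -i.bak \
    -e "s|^listen_address:.*|listen_address: ${addr}|" \
    -e "s|^rpc_address:.*|rpc_address: ${addr}|" \
    "$file"
}

conf="cassandra25_node1_dc1/conf/cassandra.yaml"
if [ -f "$conf" ]; then
  set_node_address "$conf" 192.0.2.11
fi
```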

initial_token - This is another important aspect of cluster configuration; it governs how data is distributed across nodes. For the purpose of this demo I will leave it blank (the default num_tokens: 256 setting enables virtual nodes, which assign tokens automatically). You may refer to the Cassandra documentation on how to calculate it based on the number of nodes within the datacenter.
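If you do not use virtual nodes, one common approach for the default Murmur3Partitioner is to space tokens evenly across its -2^63 to 2^63-1 range, with token_i = i * (2^64 / N) - 2^63 for node i of N. The sketch below assumes python3 is on the PATH, because plain bash integers overflow at 2^63. murmur3_tokens is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal sketch: evenly spaced initial_token values for N nodes under the
# Murmur3Partitioner token range. Only relevant without virtual nodes.
# murmur3_tokens is a hypothetical helper; assumes python3 is installed
# (the 64-bit arithmetic overflows bash's signed integers).
murmur3_tokens() {
  local n="$1"
  python3 -c 'import sys; n = int(sys.argv[1]); print("\n".join(str((2**64 // n) * i - 2**63) for i in range(n)))' "$n"
}
```

`murmur3_tokens 2` prints one token per line; the i-th value would go into initial_token on the i-th node.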

Step 4: Test cluster setup

You can now fire up one node at a time as follows "cassandra25_node1_dc1/bin/cassandra -f"

As you bring up more nodes, you should see messages similar to the following, indicating the cluster node handshake:

INFO 22:06:20,974 Handshaking version with /
INFO 22:06:23,023 Node / is now part of the cluster
INFO 22:06:23,047 Handshaking version with /
INFO 22:06:23,061 InetAddress / is now UP
INFO 22:06:23,207 InetAddress / is now DOWN
INFO 22:06:23,212 Handshaking version with /
INFO 22:06:24,037 InetAddress / is now UP
INFO 22:06:53,449 [Stream #6e422a30-99e4-11e3-858d-e535fdb952e8] Received streaming plan for Bootstrap
INFO 22:06:53,590 [Stream #6e422a30-99e4-11e3-858d-e535fdb952e8] Session with / is complete

Another way to check cluster and node status is the nodetool command:

./cassandra25_node1_dc1/bin/nodetool status

Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  68.61 KB   256     100.0%            005d1cea-aa68-41b0-9a75-0051dd431930  rack1
UN  73.14 KB   256     100.0%            8ca40713-2eb5-44df-8a52-6cd838a492e3  rack1
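In the status output, each node line starts with a two-letter code such as UN (Up/Normal). To turn that into a pass/fail check, for example in a setup script, here is a minimal sketch that reads nodetool status output from stdin. count_up_normal and expect_nodes are hypothetical helpers, and the guard simply skips the example when the node folder from Step 1 is absent.

```shell
#!/usr/bin/env bash
# Minimal sketch: count nodes reported as Up/Normal ("UN") by nodetool status.
# count_up_normal and expect_nodes are hypothetical helpers reading stdin.
count_up_normal() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

expect_nodes() {
  local want="$1" got
  got="$(count_up_normal)"
  if [ "$got" -lt "$want" ]; then
    echo "only $got of $want nodes are Up/Normal" >&2
    return 1
  fi
  echo "$got nodes Up/Normal"
}

# Example for the two-node ring in this post; skipped if nodetool is absent.
nodetool_bin="./cassandra25_node1_dc1/bin/nodetool"
if [ -x "$nodetool_bin" ]; then
  "$nodetool_bin" status | expect_nodes 2
fi
```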

Sunday, February 2, 2014

WCS SQL Cheat Sheet

I have compiled a list of frequently used WCS SQL queries, organized by subsystem.
Please leave a comment if you have a useful snippet that you would like to share with the community; I will update the corresponding Gist frequently to keep the list up to date.