Basic Apache Cassandra installation (and some fine-tuning improvements)

Lately I’ve been working a lot with Apache Cassandra clusters, so I’ll write some posts about it. I will start with the obvious: Cassandra installation. It’s a very simple process that you could make even easier using the existing .rpm or .deb packages. But I’ll do it in a distro-independent way, so it’s useful for everybody, no matter what distro you use.

1- Prerequisite: Install Java Virtual Machine

For starters, Apache Cassandra is a Java application, so we need the Java Virtual Machine (JVM) to run it. There are different JVM implementations (OpenJDK, for instance), but using the official Sun/Oracle version is highly recommended. We can find it at this link: http://www.java.com/en/download/linux_manual.jsp?locale=en, or we can install it with each distro’s package manager (it’s packaged at least for the main distros: Red Hat/CentOS, Debian, Ubuntu, etc.).

2- Downloading Cassandra

Now we download Cassandra. If it’s our first installation, we probably want the latest version, which we can find on the front page of their site (it’s 1.1.6 right now). Then we unpack it wherever we want (it should be under /opt, but you might disagree) and symlink that directory to “/opt/cassandra” to make future upgrades easier.

cd /opt
wget http://apache.rediris.es/cassandra/1.1.6/apache-cassandra-1.1.6-bin.tar.gz
tar zxvf apache-cassandra-1.1.6-bin.tar.gz
ln -s apache-cassandra-1.1.6 cassandra
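The symlink is what makes upgrades easy: a new release is unpacked next to the old one and the link is repointed. A minimal sketch, assuming the same /opt layout; the function name `switch_cassandra` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: repoint the "cassandra" symlink inside BASE_DIR
# (e.g. /opt) to an already unpacked apache-cassandra-<version> directory.
switch_cassandra() {
    base_dir="$1"   # directory holding the unpacked releases, e.g. /opt
    version="$2"    # release to activate, e.g. 1.1.6
    target="apache-cassandra-$version"

    if [ ! -d "$base_dir/$target" ]; then
        echo "Release directory $base_dir/$target not found" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace the existing link,
    # -n: don't follow an existing link to a directory (without it, ln
    #     would create the new link *inside* the old release directory)
    ln -sfn "$target" "$base_dir/cassandra"
}
```

After unpacking a newer release you would stop Cassandra, run something like `switch_cassandra /opt <new-version>`, give the new directory to the cassandra user as in step 3, and start it again.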

3- Managing cassandra user’s permissions

It’s a best practice not to give a service more permissions than it needs. So we will create a “cassandra” user, which will run our process, and some directories Cassandra will use while running (logs, pids, data):

adduser cassandra
mkdir /var/log/cassandra
chown -R cassandra /var/log/cassandra/
mkdir /var/run/cassandra
chown -R cassandra /var/run/cassandra/
mkdir /var/lib/cassandra
chown -R cassandra /var/lib/cassandra/
chown -R cassandra /opt/cassandra/
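The same steps can be written so they are safe to re-run (`mkdir -p` doesn’t fail if a directory already exists). A sketch with a hypothetical `prepare_cassandra_dirs` function; the prefix argument is only there so it can be tried out in a scratch directory, and on a real server you would call it as root with an empty prefix:

```shell
#!/bin/sh
# Hypothetical helper: create Cassandra's runtime directories and give
# them to the service user. Safe to run more than once.
prepare_cassandra_dirs() {
    prefix="$1"   # "" for the real system, or a scratch dir for testing
    owner="$2"    # user that runs Cassandra, e.g. cassandra

    for dir in /var/log/cassandra /var/run/cassandra /var/lib/cassandra; do
        mkdir -p "$prefix$dir" || return 1
        chown -R "$owner" "$prefix$dir" || return 1
    done
}
```

For example, `prepare_cassandra_dirs "" cassandra` followed by `chown -R cassandra /opt/cassandra/`. On distros where /var/run is a tmpfs it is emptied at every boot, so there the init script would have to recreate /var/run/cassandra.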

4- Cassandra as a system service

Now we need to run it as a system service. The official distribution doesn’t include an init script, but here is a standard one:

wget http://www.tomas.cat/blog/sites/default/files/cassandra.initd -O /etc/init.d/cassandra
chmod a+x /etc/init.d/cassandra
chkconfig --add cassandra
chkconfig cassandra on

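This lets us start and stop Cassandra as a system service, and makes it start when the server boots. (chkconfig is the Red Hat/CentOS tool; on Debian/Ubuntu the rough equivalent of the last two commands is “update-rc.d cassandra defaults”.)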

5- Cassandra configuration changes

Right now we have a one-node Cassandra cluster. We may want to make some configuration changes, such as how much RAM Cassandra gets. By default, the Java heap size is calculated from the server’s total RAM, and the new generation gets up to 100MB per CPU core, but you can override both (MAX_HEAP_SIZE and HEAP_NEWSIZE) in the /opt/cassandra/conf/cassandra-env.sh file. You can also change the Java command-line options, the JMX port and so on. It’s pretty self-explanatory.
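The defaults are computed inside cassandra-env.sh itself (in its calculate_heap_sizes function). As I read the 1.1 version of that script, the heap is max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)) and the new generation is min(100MB × cores, heap/4). The sketch below reproduces that so you can see what a given server would get; treat it as an approximation, check it against the cassandra-env.sh you actually installed, and note that the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical re-implementation of the default heap sizing for a given
# amount of RAM (in MB) and number of CPU cores. Prints the two variables
# in the form used for overrides in cassandra-env.sh.
default_heap_sizes() {
    ram_mb="$1"
    cores="$2"

    half=$((ram_mb / 2))
    quarter=$((half / 2))
    [ "$half" -gt 1024 ] && half=1024        # min(1/2 RAM, 1024MB)
    [ "$quarter" -gt 8192 ] && quarter=8192  # min(1/4 RAM, 8192MB)
    if [ "$half" -gt "$quarter" ]; then heap=$half; else heap=$quarter; fi

    young_cap=$((100 * cores))               # 100MB per core
    young=$((heap / 4))                      # at most 1/4 of the heap
    [ "$young" -gt "$young_cap" ] && young=$young_cap

    echo "MAX_HEAP_SIZE=\"${heap}M\""
    echo "HEAP_NEWSIZE=\"${young}M\""
}
```

For example, `default_heap_sizes 16384 8` gives a 4096M heap and an 800M new generation. If you override them, as far as I know the script expects MAX_HEAP_SIZE and HEAP_NEWSIZE to be set together, not just one of them.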

Another file we may be interested in is /opt/cassandra/conf/cassandra.yaml, where we can configure a lot of things, such as where data is saved, or the IP address and port to listen on. Examples:

  • rpc_address: address where Thrift will listen. It must be an existing IP address of the server (it can be localhost, if we want), or 0.0.0.0 to listen on all of them
  • data_file_directories: directories where we want to store Cassandra SSTables
  • commitlog_directory: directory where we want to store the commit log
  • saved_caches_directory: directory where we want to store saved caches
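When configuring several nodes, editing these by hand gets tedious. A sketch that rewrites a top-level scalar key in cassandra.yaml with sed; the function name is hypothetical, it assumes the key appears uncommented at the start of a line (as rpc_address, commitlog_directory and saved_caches_directory do in the stock file), and it won’t work for data_file_directories, which is a list whose entries sit on the following lines:

```shell
#!/bin/sh
# Hypothetical helper: set "key: value" for a top-level scalar key in a
# YAML file, keeping a .bak copy of the original file.
set_yaml_scalar() {
    file="$1"; key="$2"; value="$3"

    if ! grep -q "^$key:" "$file"; then
        echo "$key not found (uncommented) in $file" >&2
        return 1
    fi
    # Only lines that start with "key:" are touched, so comments stay intact.
    # Values containing "|" or "&" would need escaping for sed.
    sed -i.bak "s|^$key:.*|$key: $value|" "$file"
}
```

For example, `set_yaml_scalar /opt/cassandra/conf/cassandra.yaml rpc_address 0.0.0.0`.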

In case we want to make a cluster, we will be interested in these parameters as well:

  • cluster_name: the name you want for your cluster (it must be the same on all cluster nodes and, MOST IMPORTANTLY, different between different clusters, so they don’t step on each other)
  • initial_token: this node’s token, i.e. its position in the ring
  • listen_address: address where gossip will listen. It can’t be localhost or 0.0.0.0, because the other nodes will connect to this address.
  • seeds: the servers a node asks for the list of cluster nodes

6 – Fine tune

There are some things that aren’t enabled by default but improve stability and performance. I always enable them! Here they are:

6.1- Fine tune 1 – Batch-certified cassandra init script

I think the former init script has a limitation. As Cassandra is a Java app, the init script just sends the “start” or “stop” order and doesn’t wait for the real startup or shutdown. That’s why we can’t get a REAL confirmation that it has started, and if one of our batch processes depends on that, you’re screwed. So I added some lines that use netcat to check whether the Thrift port is listening (and therefore whether Cassandra is up and running) before returning to the shell. These are the lines:

First we extract thrift address and port from the config file:

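# Read the Thrift address and port from the uncommented lines of cassandra.yaml
CASS_CLI_ADDRESS=$(grep '^rpc_address:' "$CASS_HOME/conf/cassandra.yaml" | cut -d':' -f2 | tr -d ' ')
CASS_CLI_PORT=$(grep '^rpc_port:' "$CASS_HOME/conf/cassandra.yaml" | cut -d':' -f2 | tr -d ' ')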

Then we add this check to the “start” function:

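start_date=$(date +%s)   # when the start attempt began, for the startup timeout
# Poll the Thrift port until it accepts connections
until nc -z -w 1 "$CASS_CLI_ADDRESS" "$CASS_CLI_PORT"; do
  # Give up if the log file hasn't been written for $TIMEOUT_LOG minutes...
  [ "$(date -r "$CASS_LOG" +%s)" -lt "$(date -d "$TIMEOUT_LOG minutes ago" +%s)" ] && echo "Too much time of inactivity in the log file. Aborting..." && exit 1
  # ...or if the whole startup has taken more than $TIMEOUT_STARTUP minutes
  [ "$start_date" -lt "$(date -d "$TIMEOUT_STARTUP minutes ago" +%s)" ] && echo "It's taking too long to start. Aborting..." && exit 1
  sleep 10
done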
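The log-inactivity test is the part of that loop most easily broken by quoting, and it is easier to try out on its own. A sketch of it as a function, assuming GNU date (for `-r` and `-d`); the name `is_stale` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) if FILE has not been modified
# during the last MINUTES minutes. Returns 2 if FILE can't be read.
is_stale() {
    file="$1"; minutes="$2"
    last_write=$(date -r "$file" +%s) || return 2
    limit=$(date -d "$minutes minutes ago" +%s)
    [ "$last_write" -lt "$limit" ]
}
```

Inside the loop it would read `is_stale "$CASS_LOG" "$TIMEOUT_LOG" && echo "Too much time of inactivity in the log file. Aborting..." && exit 1`.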

It’s a bit rough, but it works: when we get back to the shell, we know for sure Cassandra is running. If you want the “tuned” version, just do this:

wget http://www.tomas.cat/blog/sites/default/files/cassandra_tuned.initd -O /etc/init.d/cassandra
chmod a+x /etc/init.d/cassandra
chkconfig --add cassandra
chkconfig cassandra on

Keep in mind that this script depends on netcat, so you’ll need to install it for the script to work (“yum install nc”, “apt-get install netcat”, or whatever your distro needs).
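If installing netcat isn’t an option, bash can open TCP connections by itself through its /dev/tcp/HOST/PORT redirection. A sketch of the same wait-until-listening idea with that swapped in; it is bash-specific, and the function name and parameters are hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections,
# checking every INTERVAL seconds and giving up after TIMEOUT seconds.
# Uses bash's /dev/tcp redirection instead of netcat.
wait_for_port() {
    local host="$1" port="$2" timeout="$3" interval="${4:-10}"
    local deadline=$(( $(date +%s) + timeout ))

    # The subshell opens the connection on fd 3 and closes it on exit.
    until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "$host:$port not listening after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, `wait_for_port "$CASS_CLI_ADDRESS" "$CASS_CLI_PORT" 600`. On localhost a closed port is refused immediately; against an unreachable remote host a single attempt can block for the kernel’s TCP connect timeout.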

6.2- Fine tune 2 – Enable Java Native Access (JNA) in cassandra

Disk access, among other things, improves a lot if Cassandra uses the OS native libraries instead of Java ones (which are slower and more resource-consuming). We can do this by installing JNA, which acts as a bridge between Java and native libraries. Installing it is as easy as putting the .jar files in the /opt/cassandra/lib directory; Cassandra will pick them up automatically:

wget https://github.com/twall/jna/blob/3.5.1/dist/jna.jar?raw=true -O /opt/cassandra/lib/jna.jar
wget https://github.com/twall/jna/blob/3.5.1/dist/platform.jar?raw=true -O /opt/cassandra/lib/platform.jar

We can make sure Cassandra is using it by looking at Cassandra’s log file (/var/log/cassandra/system.log). If it’s not using JNA, we will find this message:

INFO [main] 2012-09-26 16:52:40,051 CLibrary.java (line 62) JNA not found. Native methods will be disabled.

If it has correctly found JNA, we’ll get this other message:

INFO [main] 2012-10-30 16:41:00,970 CLibrary.java (line 109) JNA mlockall successful
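To check this from a script instead of reading the log by eye, a sketch that looks for the two messages above and reports the most recent one, so a restart after installing JNA is reported correctly; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: report whether Cassandra loaded JNA, based on the
# last of the two startup messages found in the log.
jna_status() {
    log="${1:-/var/log/cassandra/system.log}"
    last=$(grep -E 'JNA not found|JNA mlockall successful' "$log" | tail -n 1)
    case "$last" in
        *"mlockall successful"*) echo "enabled" ;;
        *"JNA not found"*)       echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}
```

“unknown” can also mean that JNA was found but mlockall didn’t succeed, in which case Cassandra logs a different message that is worth reading.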

6.3- Fine tune 3 – Raising open files limit for cassandra

If we stress Cassandra a little, it will usually need more than 1024 open files (the default maximum). To keep this from causing problems, we should raise that limit in the /etc/security/limits.conf file. An advisable value is 65536:

root soft nofile 65536
root hard nofile 65536
cassandra soft nofile 65536
cassandra hard nofile 65536
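The new limit only applies to processes started after the change (and only through a path that applies limits.conf), so it’s worth checking what the running process actually got. A Linux-only sketch reading /proc; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers (Linux only): how many files a running process has
# open, and the soft "Max open files" limit it is running with.
open_files_count() {
    ls "/proc/$1/fd" | wc -l
}
open_files_soft_limit() {
    # Line format: "Max open files   <soft>   <hard>   files"
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}
```

For example, as root, `pid=$(pgrep -f CassandraDaemon)` (assuming a single Cassandra process on the machine), then `open_files_count "$pid"` and `open_files_soft_limit "$pid"`.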

6.4- Fine tune 4 – Putting a limit on Cassandra memory use

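Finally, we all know Java and its memory consumption: you can configure a number, but it will use more than that. Between heap, off-heap memory, JVM overhead, etc., we can find an 8GB-configured Java process using 16GB or more. To keep this in check, it’s highly advisable to put a limit in the /etc/security/limits.conf file; in our case we’ve found half the server memory to be a wise value (memlock is expressed in KB, so 8388608 means 8GB, half of a 16GB server). Keep in mind that, strictly speaking, memlock limits how much memory the process can lock into RAM (which is what JNA’s mlockall does), rather than its total memory use: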

root soft memlock 8388608
root hard memlock 8388608
cassandra soft memlock 8388608
cassandra hard memlock 8388608
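The “half the server memory” value can be computed instead of typed by hand: limits.conf expresses memlock in KB and /proc/meminfo reports MemTotal in kB, so halving MemTotal gives the number directly. A sketch with a hypothetical function name (reading /proc/meminfo makes it Linux-only when called without an argument):

```shell
#!/bin/sh
# Hypothetical helper: print limits.conf memlock lines set to half of the
# machine's RAM. An explicit total in kB can be passed as the first argument.
memlock_lines() {
    total_kb="${1:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}"
    half_kb=$((total_kb / 2))
    for user in root cassandra; do
        echo "$user soft memlock $half_kb"
        echo "$user hard memlock $half_kb"
    done
}
```

On a 16GB server (16777216 kB) this prints the same 8388608 lines shown above. MemTotal is usually a bit lower than the installed RAM, so the result will be slightly smaller on a real machine.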

7 – Checking cassandra is up & running

And we’re done: we just need to start the daemon (“/etc/init.d/cassandra start”, “service cassandra start”, or whatever). We can check it using nodetool (which connects to Cassandra via JMX) or cassandra-cli (which connects via Thrift). The output will look something like this:

# /opt/cassandra/bin/nodetool -h localhost ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Status State Load Owns Token
127.0.0.1 datacenter1 rack1 Up Normal 11,21 KB 100,00% 66508542233540571552076363838168202092
# /opt/cassandra/bin/cassandra-cli -h localhost
Connected to: "Test Cluster" on localhost/9160
Welcome to Cassandra CLI version 1.1.6

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] describe cluster;
Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
59adb24e-f3cd-3e02-97f0-5b395827453f: [127.0.0.1]

[default@unknown] quit;
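For batch processes like the ones mentioned in 6.1, the ring output can also be checked automatically. A sketch that counts Up and Down nodes from “nodetool ring” output, assuming the column layout shown above (Status is the 4th field, and the header and note lines never have Up or Down there); the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read "nodetool ring" output on stdin and print how
# many nodes are Up and how many are Down.
ring_summary() {
    awk '$4 == "Up" {up++} $4 == "Down" {down++}
         END {printf "up=%d down=%d\n", up, down}'
}
```

For example, `/opt/cassandra/bin/nodetool -h localhost ring | ring_summary`, which for the output above prints `up=1 down=0`.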

And that’s it!! We have a basic, slightly tuned Cassandra installation.

How to solve SMTP-AUTH errors on Postfix (or any other mailserver) behind a Cisco PIX firewall

You have configured authentication on outgoing mail (SMTP-AUTH) on your mailserver (Postfix, in this case) and it works great. But when you put it into production, the users complain because they can’t send emails.

What do you do? You try to follow the conversation step by step: you telnet to port 25 and go through the authentication by hand. The conversation goes like this (the lines beginning with “->” are typed by me, without the “->” part):

$ telnet smtp.example.com 25
Trying 1.2.3.4...
Connected to smtp.example.com.
Escape character is '^]'.
220 smtp.example.com ESMTP server ready
-> EHLO example.com
250-smtp.example.com
250 AUTH CRAM-MD5 DIGEST-MD5
-> AUTH FOOBAR
504 Unrecognized authentication type.
-> AUTH CRAM-MD5
334 PENCeUxFREJoU0NnbmhNWitOMjNGNndAZWx3b29kLmlubm9zb2Z0LmNvbT4=
-> ZnJlZCA5ZTk1YWVlMDljNDBhZjJiODRhMGMyYjNiYmFlNzg2ZQ==
235 Authentication successful.
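To reproduce this exchange by hand you have to compute the answer to the server’s challenge yourself. A sketch following RFC 2195 (the answer is base64 of the user name, a space, and the hex HMAC-MD5 of the decoded challenge keyed with the password); the function name is hypothetical, and it assumes the openssl command-line tool and GNU base64:

```shell
#!/bin/sh
# Hypothetical helper: compute the client's answer to an AUTH CRAM-MD5
# challenge (the base64 text after "334 ").
cram_md5_response() {
    user="$1"; password="$2"; challenge_b64="$3"

    # HMAC-MD5 of the decoded challenge; the hex digest is the last field
    # of openssl's output whatever label it prints before it.
    digest=$(printf '%s' "$challenge_b64" | base64 -d |
             openssl dgst -md5 -hmac "$password" | awk '{print $NF}')
    printf '%s %s' "$user" "$digest" | base64 | tr -d '\n'
    echo
}
```

With the challenge above it would be called as `cram_md5_response fred 'fred-password' PENCeUxFREJoU0NnbmhNWitOMjNGNndAZWx3b29kLmlubm9zb2Z0LmNvbT4=`, where the password is a placeholder. A password passed as an argument is visible in the process list, so only do this on a machine you trust.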


Everything is fine… Then you tell the customer to do the same, but he says he can’t see the line “220 smtp.example.com ESMTP server ready”, and he only sees a bunch of asterisks. You try it yourself, and it’s true… Plus, it doesn’t recognise the AUTH command!!


$ telnet smtp.example.com 25
Trying 1.2.3.4...
Connected to smtp.example.com.
Escape character is '^]'.
220*******************************************************0*2******0***********************
2002*******2***0*00
-> EHLO example.com
250-smtp.example.com
250 AUTH CRAM-MD5 DIGEST-MD5
-> AUTH FOOBAR
500 5.5.2 Error: bad syntax
-> AUTH CRAM-MD5
500 5.5.2 Error: bad syntax

What’s going on? Why this difference? It seems the answer is pretty simple…

Cisco enables on PIX firewalls a feature meant to prevent attacks and increase security: it intercepts every command sent to the mail server and filters it, acting as a proxy. This feature is called MailGuard, and it only accepts basic SMTP commands, not the extended ESMTP ones, which makes it incompatible with SMTP-AUTH. So the only way to make SMTP-AUTH work is to disable it.

That’s easy to do: once connected to the PIX (through its telnet port, for instance), the commands are pretty simple:

enable
configure terminal
no fixup protocol smtp 25
write mem

The difficult part is realising that someone is messing with the communication… But once you’ve discovered it, problem solved!

Using Kubuntu 9.10, or any linux distribution, from a USB device

A year ago we explained how to use Kubuntu 8.10 from a USB device. Now, a year later, I was in the same situation, but with Kubuntu 9.10 instead. And things have improved since then.

Ubuntu itself ships an application to make a USB device boot Ubuntu. In the panel there is the application “K -> Applications -> System -> USB Startup Disk Creator” (also available as the /usr/bin/usb-creator-kde command), which is very simple.

We choose the ISO file, we choose the USB drive, and we click “Make a boot disk”. We wait a while and then we can boot from this pendrive.

But this only works for Ubuntu… What about the other distros? Well, there is UNetbootin, which does the same with all Linux distributions (well, at least with a bunch of them).

Here we choose distribution, the ISO file, drive to write, and it handles everything.

Obviously, this only works from Linux. If you want to do the same from Windows, you should ask our friends at pendrivelinux for advice.

Fighting SPAM in blog comments: Mollom

Sometimes the amount of pending work doesn’t allow you to update your blog. You won’t even visit it. Then a rabid horde of spammers comes, with an unhealthy lust for flooding your blog with ads for Viagra, Cialis and this kind of shit. This has been exactly the case with this blog. Result? More than 350,000 commercial comments, a full database, nobody able to add comments for months… a mess.

I’ve deleted the comments to start over. But five minutes later there were 20 new SPAM comments! Obviously, I should do something to prevent this from happening…

So I’ve added Mollom to the blog. Apart from being a “capicua” (a palindrome), it is powerful: it analyzes the text, and if it suspects SPAM, it asks for a CAPTCHA. Since then, there have been no SPAM comments on this blog.

Installation was very easy. First you register on the mollom.com website and add a new site (Site manager -> Add new site). We choose “Mollom free” and answer the four questions asked. Next, we’re given two keys to put in our blog later.

Once this is done, it’s time for the part on our site. We download the module from the mollom project at drupal.org, uncompress it into the Drupal modules folder, and activate it at “Administer -> Site building -> Modules”. Once activated, we go to the configuration page (Administer -> Site configuration -> Mollom), enter the keys we were given before, and configure when we want the antispam filter to be active.

Et voilà! It’s running! We killed SPAM in 5 minutes.

Renewing a HTTPS certificate for IIS without starting a renewal request

It seems the usual procedure for renewing HTTPS certificates for IIS is to create a renewal request, send it to the CA (VeriSign, for example), wait for the reply file and import it into IIS.

But what can we do if the renewed certificate was issued from a former CSR? You get an email with a part like this:

-----BEGIN CERTIFICATE-----
AoGBAOv4w3UeEEarsyIXsBL1zdBi67fC7jFiqhbs0f7/tDRuvnQvj5V7NF7Awhah
9K3J9fPkOPMfTBMmQCFVTLAlUxioh1jLEZOWDPvrB8h7msO5gM1MpufOh4NRS79J
LvyOKdDtXGfYdVRj/TNpNTFu10wLO2y9o8HAkRUlkCDb/xS3AgMBAAGjggF6MIIB
djAJBgNVHRMEAjAAMAsGA1UdDwQEAwIFoDBGBgNVHR8EPzA9MDugOaA3hjVodHRw
Oi8vY3JsLnZlcmlzaWduLmNvbS9DbGFzczNJbnRlcm5hdGlvbmFsU2VydmVyLmNy
f4&dBgNVHSAEPTA7MDkGC2CGSAGG+EUBBxcDMCowKAYIKwYBBQUHAgEWHGh0dHBz
(...)
-----END CERTIFICATE-----

How can we import this into IIS? We should follow these steps:

First we export the current certificate. To do this, we go to the site properties, “Directory Security” tab, and start the wizard by clicking on “Server Certificate”. We click “Next” and, on the following screen, choose “Export the current certificate to a .pfx file”. After that, we’re asked where to save it and for a password to protect the export. This way we have our certificate exported.

If we look inside the file, we’ll see it’s binary. To convert it to the same format we received in the email, we can use openssl with this command:

openssl pkcs12 -in cert.pfx -out cert.pem

It will ask us for the password we set before, and for another password to protect the private key in the resulting .pem file.
If we open this file with any text editor, we’ll see it contains a “certificate” part, delimited by the “BEGIN CERTIFICATE” and “END CERTIFICATE” lines, in exactly the same format as the part we got in the email. We just need to replace the old certificate text with the new one. Once we’ve done this, we can convert it back to binary, which IIS understands. To do this, we use openssl again:

openssl pkcs12 -export -in cert.pem -out cert-new.pfx
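The text-editing step before this export (swapping the old certificate block in cert.pem for the one received by email) can also be scripted. A sketch with awk, assuming the new certificate was saved to its own file and that the server certificate is the first CERTIFICATE block in cert.pem, which is typical for openssl pkcs12 output but worth checking if the .pfx also contains CA certificates; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: copy PEM_IN to PEM_OUT, replacing the FIRST
# certificate block with the contents of NEW_CERT. The private key and any
# other blocks are copied unchanged.
replace_first_cert() {
    pem_in="$1"; new_cert="$2"; pem_out="$3"
    awk -v newcert="$new_cert" '
        /-----BEGIN CERTIFICATE-----/ && !done { skipping = 1 }
        skipping {
            # At the end of the old block, emit the new certificate instead
            if (/-----END CERTIFICATE-----/) {
                while ((getline line < newcert) > 0) print line
                close(newcert)
                skipping = 0; done = 1
            }
            next
        }
        { print }
    ' "$pem_in" > "$pem_out"
}
```

For example, `replace_first_cert cert.pem new.crt cert-updated.pem`, and then run the export above on cert-updated.pem instead of cert.pem.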

This export will ask us for the .pem password, and for another password to protect the resulting .pfx. Now, to put it on the IIS site, we should first remove the former certificate. In the “Directory Security” tab we start the wizard again, but this time we choose “Remove the current certificate” and click “Next” until the former certificate is removed.

Now we import the new certificate. The wizard now shows a new option: “Import certificate from a .pfx file”. It asks for the file to import (we choose cert-new.pfx), its password, and the port to listen on (usually the default, 443), and finally the certificate is imported.

If we look at certificate properties, we will see expiration date has changed. We have the certificate renewed!
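The new expiration date can also be checked from the command line before (or after) importing the file. A sketch using openssl; the function name is hypothetical, and PASSWORD is the .pfx export password:

```shell
#!/bin/sh
# Hypothetical helper: print the expiry date ("notAfter=...") of the
# certificate inside a .pfx file. -clcerts keeps only the certificate that
# goes with the private key (not CA certificates), -nokeys skips the key.
pfx_expiry() {
    pfx="$1"; password="$2"
    openssl pkcs12 -in "$pfx" -passin "pass:$password" -nokeys -clcerts |
        openssl x509 -noout -enddate
}
```

For example, `pfx_expiry cert-new.pfx 'export-password'` should show the renewed date, while the same command on the exported cert.pfx shows the old one.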