Troubleshooting pacemaker: Discarding cib_apply_diff message (xxx) from server2: not in our membership

Yesterday, when we rebooted one of our high availability servers (to update the goddamn vmware-tools, screwing my precious uptime), we faced a serious problem: pacemaker was not syncing, so we had lost high availability. The active node could see the cluster as if there was no problem:

[[email protected]]# crm status
============
Last updated: Wed Nov 7 12:36:01 2012
Last change: Tue Nov 6 18:33:15 2012 via crmd on server2
Stack: openais
Current DC: server2 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
6 Resources configured.
============

Online: [ server2 server1 ]
(...)

The passive node, instead, saw the cluster as if all nodes, including itself, were offline:

[[email protected] ]# crm status
============
Last updated: Wed Nov 7 12:36:27 2012
Last change: Wed Nov 7 12:35:57 2012 via cibadmin on server1
Stack: openais
Current DC: NONE
2 Nodes configured, 2 expected votes
6 Resources configured.
============

OFFLINE: [ server2 server1 ]

Looking at the logfiles on the passive node we could see the startup was fine, but at a certain point there were errors like:

Nov 6 16:40:29 server1 cib[4607]: warning: cib_peer_callback: Discarding cib_replace message (776) from server2: not in our membership
Nov 6 16:40:29 server1 cib[4607]: warning: cib_peer_callback: Discarding cib_apply_diff message (777) from server2: not in our membership
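
To get a quick idea of how often this is happening and which node the discarded messages come from, something like the following sketch can help. The function name is made up for illustration, and the log path in the usage line is an assumption (it depends on your syslog setup); only the message text comes from the log lines above.

```shell
# Count "not in our membership" warnings per sending node.
# Prints one "<node> <count>" line per sender found in the log file.
# Usage (log path is an assumption): membership_discards /var/log/messages
membership_discards() {
    grep 'not in our membership' "$1" |
        sed -n 's/.* from \([^:]*\): not in our membership.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

An empty output means the node is not discarding anything, at least in that log file.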

On the active node there were no strange messages. Looking at corosync, we checked that the nodes were talking to each other with no problems. Both of them were returning the same output:

[[email protected] ]# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.200.ip=r(0) ip(10.10.10.10)
runtime.totem.pg.mrp.srp.members.200.join_count=1
runtime.totem.pg.mrp.srp.members.200.status=joined
runtime.totem.pg.mrp.srp.members.201.ip=r(0) ip(10.10.10.11)
runtime.totem.pg.mrp.srp.members.201.join_count=1
runtime.totem.pg.mrp.srp.members.201.status=joined
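
If you have to compare this output between several nodes, a compact summary is easier to read. This is a minimal sketch that assumes the output format shown above (the ip line of each member appearing before its status line); the function name is invented for the example.

```shell
# Summarise `corosync-objctl | grep member` output as "<nodeid> <ip> <status>".
# Reads the objctl output on stdin.
corosync_members() {
    awk '
        /\.members\.[0-9]+\.ip=/ {
            id = $0; sub(/.*\.members\./, "", id); sub(/\..*/, "", id)
            ip = $0; sub(/.*ip\(/, "", ip); sub(/\).*/, "", ip)
            ips[id] = ip
        }
        /\.members\.[0-9]+\.status=/ {
            id = $0; sub(/.*\.members\./, "", id); sub(/\..*/, "", id)
            st = $0; sub(/.*status=/, "", st)
            print id, ips[id], st
        }'
}
```

It would be used as `corosync-objctl | grep member | corosync_members` on each node, and the resulting lines should be identical everywhere.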

We used tcpdump listening on the corosync port and we checked there was traffic (it was obvious, but at this point we doubted everything), so it was clear that the problem was in pacemaker, and also pretty clear that we had no idea what it was. On further investigation we found some links (for instance, this one: http://comments.gmane.org/gmane.linux.highavailability.pacemaker/13185) pointing at this problem as a bug, fixed with this commit https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f, which went into version 1.1.8 of pacemaker. And ours was 1.1.7. Crap.
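
If you manage several clusters and want to know which ones still run a version older than the fix, a version comparison like this one can do the job. It is only a sketch: the function name is made up, and it only compares the dotted numeric part, ignoring everything after the first "-" (package release and git hash).

```shell
# Return 0 when dotted version $1 is lower than $2, comparing field by
# field as numbers. Anything after the first "-" is ignored.
version_lt() {
    awk -v a="${1%%-*}" -v b="${2%%-*}" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1   # both versions are equal
    }'
}
```

For instance, `version_lt 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14 1.1.8 && echo "affected"` would flag the version shown in our crm status output.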

Trying to make things safer, instead of upgrading the existing machine, we created another one and installed pacemaker from scratch, following instructions from http://www.clusterlabs.org/rpm-next/ and http://www.clusterlabs.org/wiki/Install. Basically we did:

yum install -y corosync corosynclib.x86_64 corosynclib-devel.x86_64
wget -O /etc/yum.repos.d/pacemaker.repo http://clusterlabs.org/rpm-next/rhel-6/clusterlabs.repo
yum install -y pacemaker cman

We copied our corosync.conf, adapting it (just changing the nodeid), and started the new node, which joined the cluster without any problem:

[[email protected]]# crm status

Last updated: Wed Nov 7 18:14:08 2012
Last change: Wed Nov 7 18:07:01 2012 via cibadmin on balance03
Stack: openais
Current DC: balance03 - partition with quorum
Version: 1.1.8-1.el6-394e906
3 Nodes configured, 3 expected votes
6 Resources configured.

Online: [ server3 server2 ]
OFFLINE: [ server1 ]
(...)

We did a smooth migration to the new node and everything went well. We upgraded the rest and we got our high availability back.

But the new version has its inconveniences, because crm, the tool we use to configure pacemaker, is now distributed separately, following the maintainer’s will. It has become a project of its own with the name crmsh, and it has its own website: http://savannah.nongnu.org/projects/crmsh/. The compiled package is available here http://download.opensuse.org/repositories/network:/ha-clustering/ but it depends on the pssh package, which has dependencies of its own. Bottom line, we did this:


wget http://apt.sw.be/redhat/el6/en/i386/rpmforge/RPMS/pssh-2.0-1.el6.rf.noarch.rpm
rpm -Uvh pssh-2.0-1.el6.rf.noarch.rpm
yum -y install python-dateutil.noarch
yum -y install redhat-rpm-config
wget http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/crmsh-1.2.5-55.3.x86_64.rpm
rpm -Uvh crmsh-1.2.5-55.3.x86_64.rpm

And with that we had everything online again.

Troubleshooting cassandra: Saved cluster name XXXX != configured name YYYY

If at any time you can’t start cassandra, and the logfile shows this error:

INFO [SSTableBatchOpen:3] 2012-10-31 16:51:35,669 SSTableReader.java (line 153) Opening /cassandra/data/system/LocationInfo-hd-56 (696 bytes)
ERROR [main] 2012-10-31 16:51:35,717 AbstractCassandraDaemon.java (line 173) Fatal exception during initialization
org.apache.cassandra.config.ConfigurationException: Saved cluster name XXXX != configured name YYYY
at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:299)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:169)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)

The problem is a mismatch between the cluster name in the config file ($CASSANDRA_HOME/conf/cassandra.yaml) and the value LocationInfo[utf8(‘L’)][utf8(‘ClusterName’)] inside the database. To fix this we must either change one or the other.

Changing the config file is obvious, but if we want to change the other one, we need cassandra-cli (the new cluster name goes between the quotes of the last utf8('')):

$CASSANDRA_HOME/bin/cassandra-cli -h localhost
use system;
set LocationInfo[utf8('L')][utf8('ClusterName')]=utf8('');
exit;

This can happen when we are trying to move data from a cluster to another (moving from a production environment to a non-production, for instance), or simply if we want to change cassandra cluster name, for whatever reason.
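
Before touching anything it helps to see which name the config file is actually carrying. This is a minimal sketch, not part of cassandra: the function name is made up, and it assumes a simple top-level "key: value" line without a trailing comment.

```shell
# Print the value of a top-level "key: value" setting from a yaml file,
# stripped of surrounding whitespace and quotes.
# Usage: yaml_value cluster_name $CASSANDRA_HOME/conf/cassandra.yaml
yaml_value() {
    sed -n "s/^$1:[[:space:]]*//p" "$2" |
        sed -e "s/[[:space:]]*\$//" -e "s/^['\"]//" -e "s/['\"]\$//" |
        head -n 1
}
```

The printed value is the YYYY of the error message, so comparing it with the XXXX from the log tells you which of the two sides you want to change.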

Basic apache cassandra installation (and some fine tune improvements)

Lately I’ve been working a lot with apache Cassandra clusters, so I’ll write some posts about it. I will start with the obvious: cassandra installation. It’s a very simple process that you could make even easier using existing .rpm or .deb. But I’ll do it distro-independent so it can be useful for everybody, no matter what distro you use.

1- Prerequisite: Install Java Virtual Machine

For starters, apache cassandra is a java application, so we will need the java virtual machine (jvm) to get it going. There are different jvm implementations (openjdk, for instance), but using the official Sun / Oracle version is highly recommended. We can find it in this link: http://www.java.com/en/download/linux_manual.jsp?locale=en, or we could install it using the package manager of each distro (it’s packaged at least for the main distros: redhat/centos, debian, ubuntu, etc).

2- Downloading Cassandra

Now we download Cassandra. If it’s our first installation, we probably want the latest version, and we can find it on the front page of their site (it’s 1.1.6 right now). After that, we unpack it where we want it (it should be in /opt, but you might disagree about that) and soft-link this directory to “/opt/cassandra”, to make future upgrades easier.

cd /opt
wget http://apache.rediris.es/cassandra/1.1.6/apache-cassandra-1.1.6-bin.tar.gz
tar zxvf apache-cassandra-1.1.6-bin.tar.gz
ln -s apache-cassandra-1.1.6 cassandra
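
This is where the soft link pays off: a future upgrade is just unpacking the new version next to the old one and re-pointing the link. A sketch of that switch follows; the function name and the newer version number in the usage line are hypothetical.

```shell
# Point <base>/cassandra at <base>/apache-cassandra-<version>.
# Fails if that version has not been unpacked there.
# -n makes ln replace the existing link instead of creating a new one inside it.
switch_cassandra_version() {
    base=$1
    version=$2
    [ -d "$base/apache-cassandra-$version" ] || return 1
    ln -sfn "apache-cassandra-$version" "$base/cassandra"
}
```

So, with cassandra stopped, `switch_cassandra_version /opt 1.1.7` would move us to a (hypothetical) 1.1.7, and running it again with 1.1.6 would roll back.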

3- Managing cassandra user’s permissions

It’s a best practice not to give a service more permissions than it needs. So we will create a “cassandra” user, which will be the user running our process. And we will create some directories that cassandra will use while running (logs, pids, etc):

adduser cassandra
mkdir /var/log/cassandra
chown -R cassandra /var/log/cassandra/
mkdir /var/run/cassandra
chown -R cassandra /var/run/cassandra/
mkdir /var/lib/cassandra
chown -R cassandra /var/lib/cassandra/
chown -R cassandra /opt/cassandra/

4- Cassandra as a system service

Now we need to start it as a system service. The official distribution does not come with an init script, but here we have a standard one:

wget http://www.tomas.cat/blog/sites/default/files/cassandra.initd -O /etc/init.d/cassandra
chmod a+x /etc/init.d/cassandra
chkconfig --add cassandra
chkconfig cassandra on

This lets us start and stop cassandra as a system service, and makes cassandra start when the server boots.

5- Cassandra configuration changes

Right now we have a one-node cassandra cluster. We may want to make some configuration changes, such as cassandra RAM allowance. By default, the java heap size is calculated from the server RAM (half of it at most) and the young generation gets up to 100MB for each CPU, but you can change that in the /opt/cassandra/conf/cassandra-env.sh file. You can also change java cmdline options, JMX port and so on. It’s pretty self-explanatory.

Another file we may be interested in is /opt/cassandra/conf/cassandra.yaml, where we can configure a lot of things, such as where data is saved, or the IP address and port we want to listen on. Examples:

  • rpc_address: address where thrift will be listening. We must put an existing IP address (it may be localhost, if we want to), or 0.0.0.0 if we want to listen on all of them
  • data_file_directories: directory where we want to store cassandra sstables
  • commitlog_directory: directory where we want to store commit_log
  • saved_caches_directory: directory where we want to store caches

In case we want to make a cluster, we will be interested in these parameters as well:

  • cluster_name: The name you want for your cluster (it must be the same for all cluster nodes and, MOST IMPORTANTLY, different between different clusters, so they don’t step on each other)
  • initial_token: this node’s token
  • listen_address: address where gossip will be listening. It can’t be localhost nor 0.0.0.0, because the rest of the nodes will try to connect to this address.
  • seeds: which servers we should ask for the list of cluster nodes
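
The listen_address restriction is easy to forget when cloning a single-node config, so a quick check before starting a new node can save some head scratching. This is just a sketch: the function name is invented, and treating the whole 127.x range as unusable is my own assumption on top of the localhost and 0.0.0.0 cases mentioned above.

```shell
# Fail when listen_address in the given cassandra.yaml is not usable for a
# multi-node cluster (localhost, a 127.x loopback address or 0.0.0.0).
check_listen_address() {
    addr=$(sed -n 's/^listen_address:[[:space:]]*//p' "$1" |
        head -n 1 | tr -d "[:space:]'\"")
    case "$addr" in
        localhost|127.*|0.0.0.0)
            echo "listen_address '$addr' cannot be reached by other nodes" >&2
            return 1 ;;
    esac
    return 0
}
```

It would be called as `check_listen_address /opt/cassandra/conf/cassandra.yaml`, for example from the init script before launching the daemon.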

6 – Fine tune

There are some things that are not enabled by default, but they improve stability and performance. I always enable them! Here they are:

6.1- Fine tune 1 – Batch-certified cassandra init script

I think the former init script has a limitation. As cassandra is a java app, the init script just sends the “start” or “stop” signal, and it doesn’t wait for the real startup or shutdown. This is why we can’t have a REAL confirmation of cassandra being started, and if one of our batch processes depends on that, we’re screwed. This is why I added some lines that check with netcat whether the thrift port is listening (and so, whether cassandra is up and running) before returning to the shell. These are the lines:

First we extract thrift address and port from the config file:

CASS_CLI_ADDRESS=$(grep '^rpc_address:' $CASS_HOME/conf/cassandra.yaml | cut -d":" -f2)
CASS_CLI_PORT=$(grep '^rpc_port:' $CASS_HOME/conf/cassandra.yaml | cut -d":" -f2)

Then we add this check to the “start” function:

nc -z -w 1 $CASS_CLI_ADDRESS $CASS_CLI_PORT
while [ $? -ne 0 ]; do
# Give up if the log file has been idle for more than TIMEOUT_LOG minutes
[ $(date -r $CASS_LOG +%s) -lt $(date -d "$TIMEOUT_LOG minutes ago" +%s) ] && echo "Too much time of inactivity in the log file. Aborting..." && exit 1 ;
# Give up if the whole startup is taking more than TIMEOUT_STARTUP minutes
[ $start_date -lt $(date -d "$TIMEOUT_STARTUP minutes ago" +%s) ] && echo "It's taking too long to start. Aborting..." && exit 1;
sleep 10;nc -z -w 1 $CASS_CLI_ADDRESS $CASS_CLI_PORT; done

It’s a little tatty, but it works. When we come back to the shell we know for sure cassandra is running. If you want the “tuned” version, you just do this:

wget http://www.tomas.cat/blog/sites/default/files/cassandra_tuned.initd -O /etc/init.d/cassandra
chmod a+x /etc/init.d/cassandra
chkconfig --add cassandra
chkconfig cassandra on

Keep in mind that this script depends on netcat, so you’ll need to install it for the script to work (“yum install nc”, “apt-get install netcat”, or whatever it is you need to do).

6.2- Fine tune 2 – Enable Java Native Access (JNA) in cassandra

Disk access, among other things, improves a lot if you use the OS native libraries instead of the pure java implementations (which are slow and resource-consuming). We can do this by installing JNA, which acts as a kind of bridge between Java and the native libraries. Installing it is as easy as putting the .jar files in the /opt/cassandra/lib directory, and cassandra will enable them automatically:

wget "https://github.com/twall/jna/blob/3.5.1/dist/jna.jar?raw=true" -O /opt/cassandra/lib/jna.jar
wget "https://github.com/twall/jna/blob/3.5.1/dist/platform.jar?raw=true" -O /opt/cassandra/lib/platform.jar

We can make sure cassandra is using it if we take a look at cassandra’s log file (/var/log/cassandra/system.log). If it’s not using JNA, we will find this message:

INFO [main] 2012-09-26 16:52:40,051 CLibrary.java (line 62) JNA not found. Native methods will be disabled.

If it has correctly found JNA, we’ll get this other message:

INFO [main] 2012-10-30 16:41:00,970 CLibrary.java (line 109) JNA mlockall successful
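
If you have a bunch of nodes, grepping each log by hand gets old quickly. The following sketch reports the JNA state of the last startup based only on the two messages shown above; the function name is made up, and any other JNA-related message ends up as "unknown".

```shell
# Report whether the most recent Cassandra start found JNA, based on the
# two log messages shown above. Prints "enabled", "disabled" or "unknown".
jna_status() {
    last=$(grep -E 'JNA (not found|mlockall successful)' "$1" | tail -n 1)
    case "$last" in
        *'JNA mlockall successful'*) echo enabled ;;
        *'JNA not found'*)           echo disabled ;;
        *)                           echo unknown ;;
    esac
}
```

With our layout it would be run as `jna_status /var/log/cassandra/system.log`.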

6.3- Fine tune 3 – Raising open files limit for cassandra

If we stress cassandra a little, it will usually need more than 1024 open files (the default maximum). To keep this from causing any problems, we should edit the /etc/security/limits.conf file and raise that limit. An advisable value is 65536:

root soft nofile 65536
root hard nofile 65536
cassandra soft nofile 65536
cassandra hard nofile 65536
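
To double-check that a server really carries these entries (for instance after cloning a machine), a small lookup like this can be handy. It is a sketch with an invented function name, and it only reads the file given to it, so it says nothing about limits set elsewhere.

```shell
# Print the value configured for <user> <soft|hard> <item> in a
# limits.conf-style file (last matching entry wins, comments are skipped).
# Usage: limit_for cassandra soft nofile /etc/security/limits.conf
limit_for() {
    awk -v u="$1" -v t="$2" -v i="$3" '
        $1 ~ /^#/ { next }
        $1 == u && $2 == t && $3 == i { v = $4 }
        END { if (v != "") print v }' "$4"
}
```

Keep in mind the new limits only apply to sessions opened after the change, so cassandra has to be restarted from a fresh session to pick them up.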

6.4- Fine tune 4 – Putting a limit on Cassandra memory use

Finally, we all know java and what its memory consumption is like. You can configure a number, but it will use more than that. Between application memory, heap memory, overhead, etc, we can find a java process configured for 8GB using 16GB or more. To make sure this will not happen to us, it’s highly advisable to put a limit on it. In our case, we’ve found it wise to set this number to half the server memory (memlock values are expressed in KB, so 8388608 means 8GB), in the /etc/security/limits.conf file:

root soft memlock 8388608
root hard memlock 8388608
cassandra soft memlock 8388608
cassandra hard memlock 8388608
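
The 8388608 above is half of a 16GB server, expressed in KB. To get the right number for another machine you could compute it from /proc/meminfo, as in this sketch (the function name is made up; the optional file argument only exists to make it easy to try out).

```shell
# Print half of MemTotal in KB, the unit limits.conf uses for memlock.
# Reads /proc/meminfo unless another meminfo-style file is given.
half_mem_kb() {
    awk '/^MemTotal:/ { print int($2 / 2) }' "${1:-/proc/meminfo}"
}
```

Running `half_mem_kb` on the server gives the value to put in the four memlock lines.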

7 – Checking cassandra is up & running

And we’re done, we just need to start the daemon (“/etc/init.d/cassandra start” or “service cassandra start” or whatever). And we can check it using nodetool (connects to cassandra via JMX) or with cassandra-cli (connects to cassandra via thrift). The standard output will be something like this:

[[email protected] ~]# /opt/cassandra/bin/nodetool -h localhost ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Status State Load Owns Token
127.0.0.1 datacenter1 rack1 Up Normal 11,21 KB 100,00% 66508542233540571552076363838168202092
[[email protected] ~]# /opt/cassandra/bin/cassandra-cli -h localhost
Connected to: "Test Cluster" on localhost/9160
Welcome to Cassandra CLI version 1.1.6

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] describe cluster;
Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
59adb24e-f3cd-3e02-97f0-5b395827453f: [127.0.0.1]

[default@unknown] quit;
[[email protected] ~]#
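
If you want to automate this check (from a monitoring script, for instance), the ring output can be parsed. This is only a sketch under the assumption that the columns look like the 1.1.6 output above (IPv4 address first, Status and State in the fourth and fifth columns); the function name is invented.

```shell
# Read `nodetool ring` output on stdin and succeed only if at least one
# node is listed and every node is "Up" and "Normal".
# Prints "<address> <status> <state>" for each node that is not.
ring_healthy() {
    awk '
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            nodes++
            if ($4 != "Up" || $5 != "Normal") { print $1, $4, $5; bad++ }
        }
        END { exit ((nodes == 0 || bad > 0) ? 1 : 0) }'
}
```

Something like `/opt/cassandra/bin/nodetool -h localhost ring | ring_healthy || echo "ring has problems"` would be the typical use.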

And that’s it!! We have a basic, slightly tuned cassandra installation.

How to solve SMTP-AUTH errors on Postfix (or any other mailserver) behind a Cisco PIX firewall

You have configured authentication on outgoing mail (SMTP-AUTH) on your mailserver (postfix, in this case) and it works great. But when you put it into production, the users complain because they can’t send emails.

What do you do? You try to follow the communication step by step. That is, you telnet to port 25 and follow step by step the authentication. The conversation goes like this (the lines beginning with “->” are written by me, without the “->” part):

[email protected]:~$ telnet smtp.example.com 25
Trying 1.2.3.4...
Connected to smtp.example.com.
Escape character is '^]'.
220 smtp.example.com ESMTP server ready
-> EHLO example.com
250-smtp.example.com
250 AUTH CRAM-MD5 DIGEST-MD5
-> AUTH FOOBAR
504 Unrecognized authentication type.
-> AUTH CRAM-MD5
334 PENCeUxFREJoU0NnbmhNWitOMjNGNndAZWx3b29kLmlubm9zb2Z0LmNvbT4=
-> ZnJlZCA5ZTk1YWVlMDljNDBhZjJiODRhMGMyYjNiYmFlNzg2ZQ==
235 Authentication successful.


Everything is fine… Then you tell the customer to do the same, but he says he can’t see the line “220 smtp.example.com ESMTP server ready”, and he only sees a bunch of asterisks. You try it yourself, and it’s true… Plus, it doesn’t recognise the AUTH command!!


[email protected]:~$ telnet smtp.example.com 25
Trying 1.2.3.4...
Connected to smtp.example.com.
Escape character is '^]'.
220*******************************************************0*2******0***********************
2002*******2***0*00
-> EHLO example.com
250-smtp.example.com
250 AUTH CRAM-MD5 DIGEST-MD5
-> AUTH FOOBAR
500 5.5.2 Error: bad syntax
-> AUTH CRAM-MD5
500 5.5.2 Error: bad syntax

What’s going on? Why this difference? It seems the answer is pretty simple…

Cisco Systems ships every PIX firewall with a feature meant to avoid attacks and increase security. It intercepts every command sent to the server and translates it, acting as a proxy. This feature is called MailGuard, and it only accepts basic SMTP commands, not the extended ESMTP ones, which makes it incompatible with SMTP-AUTH. So the only way to make SMTP-AUTH work is to disable it.

That’s an easy thing to do, because the command is pretty simple. Connected to the PIX via telnet:

no fixup protocol smtp 25
write mem

The difficult part is to realise someone is messing around communications… But once you’ve discovered it, problem solved!
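
To make the realising part a bit quicker next time, the masked greeting is easy to recognise mechanically. This sketch flags a greeting that starts with 220 and contains a run of asterisks; the function name and the threshold of ten asterisks in a row are arbitrary choices of mine, not anything documented by Cisco.

```shell
# Return 0 when an SMTP greeting line looks masked by something in the
# middle: it starts with 220 and contains ten literal asterisks in a row.
banner_is_masked() {
    case "$1" in
        220*'**********'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

You would feed it the first line the server sends after connecting to port 25, for instance `banner_is_masked "$greeting" && echo "something is rewriting SMTP"`, where greeting holds that line.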

Using Kubuntu 9.10, or any linux distribution, from a USB device

A year ago we explained how to use Kubuntu 8.10 from a USB device. Now, a year later, I was in the same situation, but with Kubuntu 9.10 instead. And there has been some progress since then.

Ubuntu itself ships an application to make a USB device boot Ubuntu. In the panel there is the application “K -> Applications -> System -> USB Startup Disk Creator” (also available with the /usr/bin/usb-creator-kde command), which is very simple.

We choose the ISO file, we choose the USB drive, and we click “Make a boot disk”. We wait a while and then we can boot from this pendrive.

But this only works for Ubuntu… What about the other distros? Well, there is the application unetbootin, which does the same with all linux distributions (well, at least with a bunch of them).

Here we choose distribution, the ISO file, drive to write, and it handles everything.

Obviously, this only works from linux. If you want to do the same from windows, you should ask our friends at pendrivelinux for advice.