Varnish basic configuration with http cache and stale-while-revalidate

In high-demand environments we can reach a point where the number of PHP (or CGI) requests we want to serve through apache httpd is higher than our servers can handle. To solve that we can do the simplest thing: add more servers, lowering the load on each one (the requests are spread across more servers). But the simplest way isn't necessarily the most efficient one. Instead of distributing the load, can our servers handle more requests?

Of course. We can speed up PHP (or CGI in general) processing with FastCGI. We can also make our http server faster, exchanging it for a lighter one, nginx for instance. Or we can approach the problem from another perspective, which is what we will discuss here: maintaining a cache where we store content instead of processing it each time, saving CPU time and speeding up responses. We will do that using varnish.

Maintaining a cache is a delicate matter because there are a lot of things to watch out for. You shouldn't cache a page if cookies are involved, for instance, or if the http request is a POST. But all of this is app-related: developers should be able to say what is safe to cache and what is not, and sysadmins should take those decisions to the servers. So we will assume we start from scratch, with nothing in the varnish cache, and we want to begin with a particular URL which we know implies no risk. That is what we will do here: cache just one URL.
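For reference, the kind of rule this usually translates into looks something like the following VCL (just a sketch in Varnish 3 syntax, not something we will use below):

sub vcl_recv {
# never cache anything but GET/HEAD, nor requests that carry cookies
if (req.request != "GET" && req.request != "HEAD") {
return(pass);
}
if (req.http.Cookie) {
return(pass);
}
}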

For our tests we will use a simple PHP file. It takes 10 seconds to return its result, and it sends a header expiring the content after 5 seconds. We will name it sleep.php:
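A minimal sleep.php behaving like that (a 10-second sleep plus the 5-second max-age seen in the headers below) could be something like:

<?php
// take 10 seconds to generate the page and let it expire after 5 seconds
header('Cache-control: max-age=5, must-revalidate');
sleep(10);
echo "done\n";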

If we query it, we can check that it does take 10 seconds to return:

$ curl http://localhost/sleep.php -w %{time_total}
10,001

The first thing to do is install varnish with our package manager (apt-get install varnish, yum install varnish, whatever). After that we want varnish listening on port 80 instead of apache, so we move apache to 8080 for instance (the "Listen" directive), and then varnish to 80 (the VARNISH_LISTEN_PORT= variable, usually in /etc/default/varnish or /etc/sysconfig/varnish, depending on your distro). We also need to tell varnish which servers it has behind it to forward requests to (the backend servers). For that we create the file /etc/varnish/default.vcl with the following contents:


backend default {
.host = "127.0.0.1";
.port = "8080";
}
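The listen-port changes mentioned above are distro-dependent; on a Debian/Ubuntu-like setup they would look roughly like this (exact file names and variables may differ on your system):

# /etc/apache2/ports.conf: move apache out of port 80
Listen 8080

# /etc/default/varnish (or /etc/sysconfig/varnish): put varnish on port 80
VARNISH_LISTEN_PORT=80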

With all this we restart apache and varnish, and we check they are running:


$ curl http://localhost/sleep.php -IXGET
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Fri, 30 Nov 2012 13:56:33 GMT
X-Varnish: 1538615861
Age: 0
Via: 1.1 varnish
Connection: keep-alive

$ curl http://localhost:8080/sleep.php -IXGET
HTTP/1.1 200 OK
Date: Fri, 30 Nov 2012 13:56:59 GMT
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Length: 0
Content-Type: text/html

We can see different headers in each response. When we query through varnish there are "Via: 1.1 varnish" and "Age: 0", among others apache doesn't show. With this in place we have our baseline.

By default varnish caches everything it considers safe to cache, and our sleep.php qualifies:

$ curl http://localhost/sleep.php -w %{time_total}
10,002
$ curl http://localhost/sleep.php -w %{time_total}
0,001

But we don't want to cache everything, just one particular URL, leaving requests with cookies and the like uncached. So we will first change sub vcl_recv to not cache anything, adding this to the file /etc/varnish/default.vcl:

sub vcl_recv {
return(pass);
}

We check it:

$ curl http://localhost/sleep.php -w %{time_total}
10,002
$ curl http://localhost/sleep.php -w %{time_total}
10,001

Now we cache just sleep.php, changing vcl_recv in default.vcl to:

sub vcl_recv {
if (req.url == "/sleep.php")
{
return(lookup);
}
else
{
return(pass);
}
}

We can check it:

$ cp /var/www/sleep.php /var/www/sleep2.php
$ curl http://localhost/sleep.php -w %{time_total}
10,002
$ curl http://localhost/sleep.php -w %{time_total}
0,001
$ curl http://localhost/sleep2.php -w %{time_total}
10,002
$ curl http://localhost/sleep2.php -w %{time_total}
10,001

We also check that the "Age:" header keeps increasing, and that once it goes past the 5 seconds of max-age we set, the request takes 10 seconds again:

$ curl http://localhost/sleep.php -IXGET -w %{time_total}
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 10:53:54 GMT
X-Varnish: 500945303
Age: 0
Via: 1.1 varnish
Connection: keep-alive

10,002
$ curl http://localhost/sleep.php -IXGET -w %{time_total}
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 10:53:56 GMT
X-Varnish: 500945305 500945303
Age: 2
Via: 1.1 varnish
Connection: keep-alive

0,001
$ curl http://localhost/sleep.php -IXGET -w %{time_total}
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 10:53:59 GMT
X-Varnish: 500945309 500945303
Age: 5
Via: 1.1 varnish
Connection: keep-alive

0,001
$ curl http://localhost/sleep.php -IXGET -w %{time_total}
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 10:54:09 GMT
X-Varnish: 500945310
Age: 0
Via: 1.1 varnish
Connection: keep-alive

10,002

We can see that when the content expires, varnish asks the backend for it again and it takes 10 seconds. But what happens during this time? Do the rest of the requests have to wait too? No, they don't. There is a 10-second grace period, and during that period varnish keeps serving the old (stale) content. We can check it by running two curls at the same time: one of them stops while the other keeps getting content fast, with the "Age" header above the 5 seconds we assigned:

$ while :;do curl http://localhost/sleep.php -IXGET;sleep 1;done
(...)
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 11:16:29 GMT
X-Varnish: 500952300 500952287
Age: 8
Via: 1.1 varnish
Connection: keep-alive

We can also check it with siege, with two concurrent users: for a while we will see just one of the threads, while the other is stopped, waiting for the content:

$ siege -t 30s -c 2 -d 1 localhost/sleep.php

If we think 10 seconds is too short, we can change it with beresp.grace, in sub vcl_fetch in the default.vcl file. We can set one minute, for instance:

sub vcl_fetch {
set beresp.grace = 60s;
}

What if the backend server is down? Will varnish keep serving stale content? Not as we have it right now: varnish has no way of knowing whether a backend is healthy, so it considers every backend healthy. So, if the server is down and the content expires, we get a 503 error:

$ sudo /etc/init.d/apache2 stop
[sudo] password:
* Stopping web server apache2 apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName
... waiting [OK]
$ sudo /etc/init.d/apache2 status
Apache2 is NOT running.
$ while :;do curl http://localhost/sleep.php -IXGET;sleep 1;done
(...)
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Fri, 30 Nov 2012 14:19:15 GMT
X-Varnish: 1538616905 1538616860
Age: 5
Via: 1.1 varnish
Connection: keep-alive

HTTP/1.1 503 Service Unavailable
Server: Varnish
Content-Type: text/html; charset=utf-8
Retry-After: 5
Content-Length: 419
Accept-Ranges: bytes
Date: Fri, 30 Nov 2012 14:19:15 GMT
X-Varnish: 1538616906
Age: 0
Via: 1.1 varnish
Connection: close

To make the grace period apply in this situation, we just need to tell varnish how it should check whether apache is up (healthy), by setting the "probe" directive in the backend:

backend default {
.host = "127.0.0.1";
.port = "8080";
.probe = {
.url = "/";
.timeout = 100 ms;
.interval = 1s;
.window = 10;
.threshold = 8;
}
}

This way varnish keeps serving stale content while the backend is down, and it will keep doing so until the backend comes back up and varnish can fetch the content again.

Testing with siege and curl, we can see there is always one thread that gets "sacrificed". The first time varnish finds expired content, it requests it from the backend and waits for the answer. Meanwhile, the rest of the threads get the stale content, but that one thread is stuck. The same thing happens when the server is down. There is a lot of literature about trying to avoid this, and you can read plenty about it, but the bottom line is that there is no way around it. It just happens. One thread must be sacrificed.

So far we have covered two scenarios where we keep serving stale content:
– There is no backend server available, so we serve stale content.
– There are backends available and a thread has asked for new content. While that content comes from the backend, varnish keeps serving stale content to the rest of the threads.

What if we want these two scenarios to have different timeouts? For instance, we might want stale content to stop being served after a certain time (it could be minutes); after that time, we stop and wait for the backend's answer, forcing the content to be fresh. But at the same time we might want to keep serving stale content when the servers are down (so there's no way to get fresh content), because normally that's better than serving a 503 error page. This can be configured in sub vcl_recv in the default.vcl file, this way:

sub vcl_recv {
if (req.backend.healthy) {
set req.grace = 30s;
} else {
set req.grace = 1h;
}
}

sub vcl_fetch {
set beresp.grace = 1h;
}

So our complete default.vcl file will have the following content:

$ cat /etc/varnish/default.vcl
backend default {
.host = "127.0.0.1";
.port = "8080";
.probe = {
.url = "/";
.timeout = 100 ms;
.interval = 1s;
.window = 10;
.threshold = 8;
}
}

sub vcl_recv {
if (req.backend.healthy) {
set req.grace = 30s;
} else {
set req.grace = 1h;
}
if (req.url == "/sleep.php")
{
return(lookup);
}
else
{
return(pass);
}
}
sub vcl_fetch {
set beresp.grace = 1h;
}

Troubleshooting pacemaker: Pacemaker IP doesn’t appear in ifconfig

Those of us who are used to managing network devices through ifconfig run into some trouble when putting a virtual IP in pacemaker, because we can't see it. If we type ip addr there it is, but not with ifconfig. To make it visible we just need to use the iflabel="label" option; then we will see the IP in ifconfig, and with ip addr we will know right away which is the server's IP and which is pacemaker's service IP:

# crm configure show
(...)
primitive IP_VIRTUAL ocf:heartbeat:IPaddr2
params ip="10.0.0.11" cidr_netmask="32" iflabel="IP_VIRTUAL"
op monitor interval="3s"
meta target-role="Started"
(...)

IMPORTANT: device labels accept only 10 characters. If we use more than 10, pacemaker won't be able to start the virtual IP and the resource will fail (this gave me some headaches :D). Make sure you use 10 characters at most.
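For reference, creating such a resource from scratch with the crm shell would look roughly like this (same names as in the output above, with the label kept within the 10-character limit):

crm configure primitive IP_VIRTUAL ocf:heartbeat:IPaddr2 \
params ip="10.0.0.11" cidr_netmask="32" iflabel="IP_VIRTUAL" \
op monitor interval="3s" \
meta target-role="Started"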

Without iflabel it doesn’t appear in ifconfig and isn’t labeled in ip addr:

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:9e:3c:94 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.1/24 brd 10.0.0.255 scope global eth0
inet 10.0.0.11/32 brd 10.0.0.11 scope global eth0
inet 10.0.0.12/32 brd 10.0.0.12 scope global eth0
inet 10.0.0.13/32 brd 10.0.0.13 scope global eth0
inet 10.0.0.14/32 brd 10.0.0.14 scope global eth0
inet6 fe80::250:56ff:fe9e:3c94/64 scope link
valid_lft forever preferred_lft forever
# ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:9E:3C:94
inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:fe9e:3c94/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1825681745 errors:0 dropped:0 overruns:0 frame:0
TX packets:2044189443 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:576307237739 (536.7 GiB) TX bytes:605505888813 (563.9 GiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:924190306 errors:0 dropped:0 overruns:0 frame:0
TX packets:924190306 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:415970933288 (387.4 GiB) TX bytes:415970933288 (387.4 GiB)


However, if we use iflabel, there they are:

# ip addr
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:9e:3c:9c brd ff:ff:ff:ff:ff:ff
inet 10.0.0.1/24 brd 10.0.0.255 scope global eth1
inet 10.0.0.11/32 brd 10.0.0.11 scope global eth1:nginx-ncnp
inet 10.0.0.12/32 brd 10.0.0.12 scope global eth1:nginx-clnp
inet 10.0.0.13/32 brd 10.0.0.13 scope global eth1:hap-ncnp
inet 10.0.0.14/32 brd 10.0.0.14 scope global eth1:hap-clnp
inet6 fe80::250:56ff:fe9e:3c9c/64 scope link
valid_lft forever preferred_lft forever
# ifconfig
eth1 Link encap:Ethernet HWaddr 00:50:56:9E:3C:9C
inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:fe9e:3c9c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:322545491 errors:0 dropped:0 overruns:0 frame:0
TX packets:333825895 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:92667389749 (86.3 GiB) TX bytes:93365772607 (86.9 GiB)

eth1:hap-clnp Link encap:Ethernet HWaddr 00:50:56:9E:3C:9C
inet addr:10.0.0.12 Bcast:10.0.0.12 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth1:hap-ncnp Link encap:Ethernet HWaddr 00:50:56:9E:3C:9C
inet addr:10.0.0.11 Bcast:10.0.0.11 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth1:nginx-clnp Link encap:Ethernet HWaddr 00:50:56:9E:3C:9C
inet addr:10.0.0.13 Bcast:10.0.0.13 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth1:nginx-ncnp Link encap:Ethernet HWaddr 00:50:56:9E:3C:9C
inet addr:10.0.0.14 Bcast:10.0.0.14 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:4073 errors:0 dropped:0 overruns:0 frame:0
TX packets:4073 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1136055 (1.0 MiB) TX bytes:1136055 (1.0 MiB)

Much better this way :)

More info here: http://linux.die.net/man/7/ocf_heartbeat_ipaddr2

Installing graphite 0.9.10 on debian squeeze

The graphing system I like the most is graphite. It's very useful for a lot of things (fast, scalable, low resource consumption, etc.). Today I'll explain how to install graphite 0.9.10 on Debian squeeze.

First of all, we install the requirements:

apt-get install python apache2 python-twisted python-memcache libapache2-mod-python python-django libpixman-1-0 python-cairo python-django-tagging

And then we make sure we don't have whisper installed from debian's repository, which is old and may have incompatibilities with the latest version of graphite:

apt-get remove python-whisper

Then we install the application. I've built some .deb packages that can be used directly:

wget http://www.tomas.cat/blog/sites/default/files/python-carbon_0.9.10_all.deb
wget http://www.tomas.cat/blog/sites/default/files/python-graphite-web_0.9.10_all.deb
wget http://www.tomas.cat/blog/sites/default/files/python-whisper_0.9.10_all.deb
dpkg -i python-carbon_0.9.10_all.deb python-graphite-web_0.9.10_all.deb python-whisper_0.9.10_all.deb

But if you don't like mine, it's easy to build them yourself with fpm (Effing package management), a ruby app that builds packages for different package managers. First we install ruby and fpm:

apt-get install ruby rubygems
gem install fpm

Then we download the graphite tarballs and untar them:

wget http://pypi.python.org/packages/source/c/carbon/carbon-0.9.10.tar.gz#md5=1d85d91fe220ec69c0db3037359b691a
wget http://pypi.python.org/packages/source/w/whisper/whisper-0.9.10.tar.gz#md5=218aadafcc0a606f269b1b91b42bde3f
wget http://pypi.python.org/packages/source/g/graphite-web/graphite-web-0.9.10.tar.gz#md5=b6d743a254d208874ceeff0a53e825c1
tar zxf graphite-web-0.9.10.tar.gz
tar zxf carbon-0.9.10.tar.gz
tar zxf whisper-0.9.10.tar.gz

Finally we build the packages and install them:

/var/lib/gems/1.8/gems/fpm-0.4.22/bin/fpm --python-install-bin /opt/graphite/bin -s python -t deb carbon-0.9.10/setup.py
/var/lib/gems/1.8/gems/fpm-0.4.22/bin/fpm --python-install-bin /opt/graphite/bin -s python -t deb whisper-0.9.10/setup.py
/var/lib/gems/1.8/gems/fpm-0.4.22/bin/fpm --python-install-lib /opt/graphite/webapp -s python -t deb graphite-web-0.9.10/setup.py
dpkg -i python-carbon_0.9.10_all.deb python-graphite-web_0.9.10_all.deb python-whisper_0.9.10_all.deb

We now have the graphite app installed. Whisper doesn't need any configuration. Carbon does, but we can go with the default config files:

cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf
cp /opt/graphite/conf/storage-schemas.conf.example /opt/graphite/conf/storage-schemas.conf

This storage-schemas.conf stores data every minute for a day. As it's very likely we need to keep data for longer (a month, a year…), my storage-schemas.conf looks like this:

[default_1min_for_1month_15min_for_2years]
pattern = .*
retentions = 60s:30d,15m:2y

This way data is stored every minute for 30 days, and every 15 minutes for 2 years. This makes each metric's whisper file about 1.4 MB, which is reasonable. You can play with these numbers if you need more time or want to use less disk space (it's pretty intuitive).
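That figure can be estimated from whisper's on-disk format, which uses roughly 12 bytes per datapoint plus a small header per archive:

60s:30d -> 30 days * 24 h * 60 min = 43200 points
15m:2y -> 2 years * 365 days * 24 h * 4 = 70080 points
total -> 113280 points * 12 bytes ~= 1.4 MB per metric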

After that we need to initialize the database:

cd /opt/graphite/webapp/graphite
sudo python manage.py syncdb

And now we could start carbon to begin to collect data, executing:

cd /opt/graphite/
./bin/carbon-cache.py start

But we also want the service to start with the machine, so we need to add it to init.d. As the application ships with no init file, I downloaded an init.d file for graphite on RedHat and made some small changes so it works on Debian:

#!/bin/bash
#
# Carbon (part of Graphite)
#
# chkconfig: 3 50 50
# description: Carbon init.d

. /lib/lsb/init-functions
prog=carbon
RETVAL=0

start() {
log_progress_msg "Starting $prog: "

PYTHONPATH=/usr/local/lib/python2.6/dist-packages/ /opt/graphite/bin/carbon-cache.py start
status=$?
log_end_msg $status
}

stop() {
log_progress_msg "Stopping $prog: "

PYTHONPATH=/usr/local/lib/python2.6/dist-packages/ /opt/graphite/bin/carbon-cache.py stop > /dev/null 2>&1
status=$?
log_end_msg $status
}

# See how we were called.
case "$1" in
start)
start
;;
stop)
stop
;;
status)
PYTHONPATH=/usr/local/lib/python2.6/dist-packages/ /opt/graphite/bin/carbon-cache.py status
RETVAL=$?
;;
restart)
stop
start
;;
*)
echo $"Usage: $prog {start|stop|restart|status}"
exit 1
esac

exit $RETVAL

To install it we just need to put it where it belongs:

wget http://www.tomas.cat/blog/sites/default/files/carbon.initd -O /etc/init.d/carbon
chmod 0755 /etc/init.d/carbon
chkconfig --add carbon
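chkconfig is not installed by default on Debian; if you prefer to stick with the stock Debian tools, the equivalent would be:

update-rc.d carbon defaults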

Now we can start it from init.d (service carbon start, or /etc/init.d/carbon start). Finally, we configure the webapp to access the data. We create an apache virtualhost with this content:


<VirtualHost *:80>
ServerName YOUR_SERVERNAME_HERE
DocumentRoot "/opt/graphite/webapp"
ErrorLog /opt/graphite/storage/log/webapp/error.log
CustomLog /opt/graphite/storage/log/webapp/access.log common

<Location "/">
SetHandler python-program
PythonPath "['/opt/graphite/webapp'] + sys.path"
PythonHandler django.core.handlers.modpython
SetEnv DJANGO_SETTINGS_MODULE graphite.settings
PythonDebug Off
PythonAutoReload Off
</Location>

Alias /content/ /opt/graphite/webapp/content/
<Location "/content/">
SetHandler None
</Location>
</VirtualHost>


We put the virtualhost in place, enable it, allow the apache user to access the whisper data, and reload apache:

wget http://www.tomas.cat/blog/sites/default/files/graphite-vhost.txt -O /etc/apache2/sites-available/graphite
a2ensite graphite
chown -R www-data:www-data /opt/graphite/storage/
/etc/init.d/apache2 reload

And that's it! One last detail… graphite ships with the Los Angeles timezone. In order to change it, we need to set the "TIME_ZONE" variable in the /opt/graphite/webapp/graphite/local_settings.py file. There is a file with lots of variables in /opt/graphite/webapp/graphite/local_settings.py.example, but as I just need to change the timezone, I run this command:

echo "TIME_ZONE = 'Europe/Madrid'" > /opt/graphite/webapp/graphite/local_settings.py

And with that we have everything. Now we just need to send data to carbon (port 2003) so it gets stored in whisper and the graphite webapp can show it. Have fun!
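A quick way to test it is to push a datapoint over carbon's plaintext protocol (metric name, value and unix timestamp) with netcat; the metric name here is just an arbitrary example:

echo "test.example.metric 42 $(date +%s)" | nc localhost 2003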

Bibliography: I followed the official documentation at http://graphite.wikidot.com/installation and http://graphite.wikidot.com/quickstart-guide, but the debian specifics came from http://slacklabs.be/2012/04/05/Installing-graphite-on-debian/

Troubleshooting pacemaker: Discarding cib_apply_diff message (xxx) from server2: not in our membership

Yesterday, when we rebooted one of our high-availability servers (to update the goddamn vmware-tools, screwing up my precious uptime), we faced a serious problem: pacemaker was not syncing, so we had lost high availability. The active node saw the cluster as if there was no problem:

# crm status
============
Last updated: Wed Nov 7 12:36:01 2012
Last change: Tue Nov 6 18:33:15 2012 via crmd on server2
Stack: openais
Current DC: server2 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
6 Resources configured.
============

Online: [ server2 server1 ]
(...)

The passive node, instead, saw the cluster as if all nodes, including itself, were offline:

# crm status
============
Last updated: Wed Nov 7 12:36:27 2012
Last change: Wed Nov 7 12:35:57 2012 via cibadmin on server1
Stack: openais
Current DC: NONE
2 Nodes configured, 2 expected votes
6 Resources configured.
============

OFFLINE: [ server2 server1 ]

Looking at the logfiles on the passive node we could see that startup was fine, but at a certain point there were errors like:

Nov 6 16:40:29 server1 cib [4607]: warning: cib_peer_callback: Discarding cib_replace message (776) from server2: not in our membership
Nov 6 16:40:29 server1 cib[4607]: warning: cib_peer_callback: Discarding cib_apply_diff message (777) from server2: not in our membership

On the active node there were no strange messages. Looking at corosync, we checked that the nodes were talking to each other without problems. Both of them returned the same output:

# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.200.ip=r(0) ip(10.10.10.10)
runtime.totem.pg.mrp.srp.members.200.join_count=1
runtime.totem.pg.mrp.srp.members.200.status=joined
runtime.totem.pg.mrp.srp.members.201.ip=r(0) ip(10.10.10.11)
runtime.totem.pg.mrp.srp.members.201.join_count=1
runtime.totem.pg.mrp.srp.members.201.status=joined

We used tcpdump listening on the corosync port and checked there was traffic (which was obvious, but at this point we doubted everything), so it was clear that the problem was in pacemaker, and also pretty clear that we had no idea what it was. On further investigation we found some links (for instance, this one: http://comments.gmane.org/gmane.linux.highavailability.pacemaker/13185) pointing at this problem being a bug, fixed with this commit https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f and included in pacemaker 1.1.8. And ours was 1.1.7. Crap.
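For reference, that tcpdump check can be as simple as something like this (assuming eth0 and corosync's default multicast port, 5405):

tcpdump -n -i eth0 udp port 5405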

To play it safe, instead of upgrading the existing machine, we created a new one and installed pacemaker from scratch, following the instructions from http://www.clusterlabs.org/rpm-next/ and http://www.clusterlabs.org/wiki/Install. Basically we did:

yum install -y corosync corosynclib.x86_64 corosynclib-devel.x86_64
wget -O /etc/yum.repos.d/pacemaker.repo http://clusterlabs.org/rpm-next/rhel-6/clusterlabs.repo
yum install -y pacemaker cman

We copied over our corosync.conf, adapted it (just changing the nodeid), started it, and the new node joined the cluster without any problem:

# crm status

Last updated: Wed Nov 7 18:14:08 2012
Last change: Wed Nov 7 18:07:01 2012 via cibadmin on server3
Stack: openais
Current DC: server3 - partition with quorum
Version: 1.1.8-1.el6-394e906
3 Nodes configured, 3 expected votes
6 Resources configured.

Online: [ server3 server2 ]
OFFLINE: [ server1 ]
(...)

We did a smooth migration to the new node and everything went well. We upgraded the rest and got our high availability back.

But the new version has its inconveniences: crm, the tool we use to configure pacemaker, is now distributed separately, as its maintainer decided. It has become a project of its own under the name crmsh, with its own website: http://savannah.nongnu.org/projects/crmsh/. The compiled package is available at http://download.opensuse.org/repositories/network:/ha-clustering/ but it depends on the pssh package, which has dependencies of its own. Bottom line, we did this:


wget http://apt.sw.be/redhat/el6/en/i386/rpmforge/RPMS/pssh-2.0-1.el6.rf.noarch.rpm
rpm -Uvh pssh-2.0-1.el6.rf.noarch.rpm
yum -y install python-dateutil.noarch
yum -y install redhat-rpm-config
wget http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/crmsh-1.2.5-55.3.x86_64.rpm
rpm -Uvh crmsh-1.2.5-55.3.x86_64.rpm

And with that we had everything online again.

Troubleshooting cassandra: Saved cluster name XXXX != configured name YYYY

If at any time you can't start cassandra and the logfile shows this error:

INFO [SSTableBatchOpen:3] 2012-10-31 16:51:35,669 SSTableReader.java (line 153) Opening /cassandra/data/system/LocationInfo-hd-56 (696 bytes)
ERROR [main] 2012-10-31 16:51:35,717 AbstractCassandraDaemon.java (line 173) Fatal exception during initialization
org.apache.cassandra.config.ConfigurationException: Saved cluster name XXXX != configured name YYYY
at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:299)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:169)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)

The problem is a mismatch between the cluster name in the config file ($CASSANDRA_HOME/conf/cassandra.yaml) and the value LocationInfo[utf8('L')][utf8('ClusterName')] stored inside the database. To fix it we must change one or the other.

Changing the config file is obvious, but if we want to change the value in the database, we need cassandra-cli:

$CASSANDRA_HOME/bin/cassandra-cli -h localhost
use system;
set LocationInfo[utf8('L')][utf8('ClusterName')]=utf8('YYYY');
exit;

This can happen when we are moving data from one cluster to another (from a production environment to a non-production one, for instance), or simply if we want to change the cassandra cluster name, for whatever reason.