load balancing – Not So Frequently Asked Questions

In high-traffic environments there comes a point where the PHP requests (or similar; CGI in general) that we want to serve through our Apache httpd outnumber what our server can process. The simplest fix is to add more servers to the load-balancing farm and so reduce the load, since requests are spread across more servers. But the simplest option is not necessarily the most efficient one. Instead of spreading the load across more servers, couldn't we make each server handle more requests?

Of course we can. On the one hand, we can speed up PHP (or, more generally, CGI) processing with FastCGI. On the other, we can make our HTTP server faster by replacing it with a lighter one, such as nginx. Another approach, the one we'll discuss here, is to keep an in-memory cache of the content instead of generating it on every request. This saves CPU and greatly reduces the time it takes to serve the content. We'll do this with Varnish.

Maintaining a cache is quite delicate, because many things have to be taken into account. You shouldn't cache when cookies are involved, for example, or when the HTTP request is a POST. But this has to be assessed for each application: developers must be able to say what can be cached and what can't, and administrators must configure the servers accordingly. So let's assume we start from scratch, with nothing in Varnish's cache, and we want to begin by caching one specific URL that we know is safe to cache. That's what we'll do here: selective caching of a single URL.
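
To make that rule of thumb concrete, here is a minimal Python sketch of a per-request cacheability check. It is a simplified model that follows the spirit of Varnish 3's built-in vcl_recv (only GET and HEAD requests are looked up in the cache; requests carrying a Cookie or Authorization header go straight to the backend). The function and constant names are our own and are not part of Varnish.

```python
from __future__ import annotations

# Simplified model of the "is this safe to cache?" check discussed above.
# Assumption: the rules follow the spirit of Varnish 3's built-in vcl_recv;
# this is illustrative Python, not Varnish code.
CACHEABLE_METHODS = {"GET", "HEAD"}
PRIVATE_HEADERS = {"cookie", "authorization"}


def is_cacheable(method: str, headers: dict[str, str]) -> bool:
    """Return True if the request may be looked up in the cache."""
    if method.upper() not in CACHEABLE_METHODS:
        return False  # e.g. POST: always goes to the backend
    # HTTP header names are case-insensitive, so compare them in lower case.
    names = {name.lower() for name in headers}
    return not (names & PRIVATE_HEADERS)


print(is_cacheable("GET", {"Host": "localhost"}))        # True
print(is_cacheable("POST", {"Host": "localhost"}))       # False
print(is_cacheable("GET", {"Cookie": "PHPSESSID=abc"}))  # False
```

In a real deployment this decision lives in VCL and depends on the application, which is exactly why the developers' input matters.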

To illustrate the demo, we'll use a simple PHP file that takes 10 seconds to return its result and sends a header that makes it expire after 5 seconds. We'll call it sleep.php:

<?php
header("Cache-control: max-age=5, must-revalidate");
sleep(10);
?>
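
Varnish derives how long it keeps the object fresh (its TTL) from this header. As a rough sketch, assuming the Varnish 3 order of precedence (s-maxage, then max-age, otherwise the default_ttl parameter, 120 seconds unless changed) and deliberately ignoring Expires and other headers, the TTL selection could look like this in Python. The function name is hypothetical.

```python
import re

DEFAULT_TTL = 120  # assumption: Varnish's default_ttl parameter, in seconds


def ttl_from_cache_control(value: str, default: int = DEFAULT_TTL) -> int:
    """Return the TTL (seconds) a shared cache would take from Cache-Control.

    s-maxage targets shared caches such as Varnish, so it wins over max-age;
    with neither directive present, fall back to the default TTL.
    Expires and other headers are ignored in this sketch.
    """
    for directive in ("s-maxage", "max-age"):
        # Match the directive at the start of the value or right after a comma.
        match = re.search(r"(?:^|,)\s*" + directive + r"\s*=\s*(\d+)", value, re.IGNORECASE)
        if match:
            return int(match.group(1))
    return default


# The header sent by sleep.php gives a 5-second TTL.
print(ttl_from_cache_control("max-age=5, must-revalidate"))  # 5
```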

If we request it, we can confirm that it does take 10 seconds to be served:

$ curl http://localhost/sleep.php -w %{time_total}
10,001

The first thing to do is install Varnish with our package manager (apt-get install varnish, yum install varnish, whichever applies). Next, we want Varnish, rather than Apache, to listen on port 80. To do that, we move Apache to another port (the Listen directive), say 8080. Then we change the port Varnish listens on (the VARNISH_LISTEN_PORT= setting, usually found in /etc/default/varnish or /etc/sysconfig/varnish, depending on the distro). We also need to tell Varnish which servers sit behind it, that is, where it should forward requests (the backend servers). For that, we create the file /etc/varnish/default.vcl with the following content:

backend default {
.host = "127.0.0.1";
.port = "8080";
}

With all this in place, we restart both Apache and Varnish and check that both are running:

$ curl http://localhost/sleep.php -IXGET
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Fri, 30 Nov 2012 13:56:33 GMT
X-Varnish: 1538615861
Age: 0
Via: 1.1 varnish
Connection: keep-alive

$ curl http://localhost:8080/sleep.php -IXGET
HTTP/1.1 200 OK
Date: Fri, 30 Nov 2012 13:56:59 GMT
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Length: 0
Content-Type: text/html

The headers returned by each one differ. When we go through Varnish, headers such as "Via: 1.1 varnish" and "Age: 0" appear, among others that Apache on its own doesn't send. If that's what you see, the base setup is done.
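
To tell the two cases apart from a script rather than by eye, a small Python helper can parse a header block like the ones above and look for the headers Varnish adds. This is an illustrative sketch: the function names are hypothetical, and the check relies only on the X-Varnish and Via headers shown in this output.

```python
from __future__ import annotations


def parse_headers(raw: str) -> dict[str, str]:
    """Parse a header block as printed by curl -I (status line first)."""
    headers: dict[str, str] = {}
    for line in raw.strip().splitlines()[1:]:  # skip "HTTP/1.1 200 OK"
        name, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            headers[name.strip().lower()] = value.strip()
    return headers


def served_by_varnish(headers: dict[str, str]) -> bool:
    """True if the response carries the headers Varnish adds on delivery."""
    return "x-varnish" in headers and "varnish" in headers.get("via", "").lower()


via_varnish = parse_headers("""HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Varnish: 1538615861
Age: 0
Via: 1.1 varnish""")
direct = parse_headers("""HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
Content-Length: 0""")
print(served_by_varnish(via_varnish), served_by_varnish(direct))  # True False
```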

Right now, Varnish caches everything:

$ curl http://localhost/sleep.php -w %{time_total}
10,002
$ curl http://localhost/sleep.php -w %{time_total}
0,001

But since we want to be selective and cache only some specific URLs, avoiding caching anything that involves cookies and the like, we'll change the vcl_recv subroutine so that it caches nothing, by adding this to /etc/varnish/default.vcl:

sub vcl_recv {
return(pass);
}

Let's check:

$ curl http://localhost/sleep.php -w %{time_total}
10,002
$ curl http://localhost/sleep.php -w %{time_total}
10,001

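Now we make it cache only sleep.php, by replacing the previous vcl_recv in default.vcl with this one (if both definitions were kept, the return(pass) of the first one would run before this code is ever reached):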

sub vcl_recv {
    if (req.url == "/sleep.php")
    {
        return(lookup);
    }
    else
    {
        return(pass);
    }
}
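
The decision this vcl_recv makes can be modelled in a few lines of Python. The sketch below uses hypothetical names, and a set so that more URLs could be listed, as the introduction suggests. It also highlights a detail worth knowing: req.url contains the path plus any query string, so an exact comparison like the one above does not match /sleep.php?x=1.

```python
CACHEABLE_URLS = {"/sleep.php"}  # hypothetical whitelist of cacheable URLs


def vcl_recv_action(url: str) -> str:
    """Return "lookup" for whitelisted URLs and "pass" for everything else.

    Like Varnish's req.url, `url` is the path plus the query string, so the
    exact-match test treats "/sleep.php?x=1" as a different URL.
    """
    return "lookup" if url in CACHEABLE_URLS else "pass"


for url in ("/sleep.php", "/sleep2.php", "/sleep.php?x=1"):
    print(url, vcl_recv_action(url))
```

Only the first URL is looked up in the cache; the other two are passed to Apache.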

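To check it, we make a copy of sleep.php called sleep2.php, which is not in the list and therefore should not be cached: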

$ cp /var/www/sleep.php /var/www/sleep2.php
$ curl http://localhost/sleep.php -w %{time_total}
10,002
$ curl http://localhost/sleep.php -w %{time_total}
0,001
$ curl http://localhost/sleep2.php -w %{time_total}
10,002
$ curl http://localhost/sleep2.php -w %{time_total}
10,001

We can also see that the Age header keeps increasing, and once it reaches 5 (the max-age we configured), the request takes 10 seconds again:

$ curl http://localhost/sleep.php -IXGET -w %{time_total}
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 10:53:54 GMT
X-Varnish: 500945303
Age: 0
Via: 1.1 varnish
Connection: keep-alive

10,002
$ curl http://localhost/sleep.php -IXGET -w %{time_total}
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 10:53:56 GMT
X-Varnish: 500945305 500945303
Age: 2
Via: 1.1 varnish
Connection: keep-alive

0,001
$ curl http://localhost/sleep.php -IXGET -w %{time_total}
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 10:53:59 GMT
X-Varnish: 500945309 500945303
Age: 5
Via: 1.1 varnish
Connection: keep-alive

0,001
$ curl http://localhost/sleep.php -IXGET -w %{time_total}
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 10:54:09 GMT
X-Varnish: 500945310
Age: 0
Via: 1.1 varnish
Connection: keep-alive

10,002
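
Two headers in these responses tell us what happened. On a cache hit, X-Varnish carries two transaction IDs (the current request and the one that stored the object); on a miss or a pass it carries only one. Age, compared with max-age, shows how much freshness is left. A minimal Python sketch that reads them, with hypothetical helper names:

```python
def cache_status(x_varnish: str) -> str:
    """Classify a response from its X-Varnish header: two IDs mean a cache hit."""
    return "hit" if len(x_varnish.split()) == 2 else "miss"  # "miss" also covers pass


def seconds_left(age: int, max_age: int) -> int:
    """Seconds until the cached object stops being fresh (never negative)."""
    return max(max_age - age, 0)


# Values taken from the responses above (max-age=5).
print(cache_status("500945303"), seconds_left(0, 5))            # miss 5
print(cache_status("500945305 500945303"), seconds_left(2, 5))  # hit 3
print(cache_status("500945309 500945303"), seconds_left(5, 5))  # hit 0
```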

We can see that when the content expires, Varnish requests it again and the request takes 10 seconds. But what happens during that interval? Do the other requests reaching Varnish while this backend request is in progress have to wait? No. There is a 10-second grace period during which Varnish keeps serving the old (stale) object. We can check this by running two curl loops at the same time instead of one: one of them stalls while the other keeps getting pages quickly, with an Age header above the 5 seconds we assigned:

$ while :;do curl http://localhost/sleep.php -IXGET;sleep 1;done
(...)
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 03 Dec 2012 11:16:29 GMT
X-Varnish: 500952300 500952287
Age: 8
Via: 1.1 varnish
Connection: keep-alive
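
A simplified Python model of the rule just described, assuming the Varnish 3 default grace of 10 seconds and a single cached object. The class and function names are ours, not Varnish's, and the model ignores everything except age, TTL and grace.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    stored_at: float  # moment the object entered the cache, in seconds
    ttl: float        # freshness lifetime, e.g. max-age=5
    grace: float      # how long past the TTL it may still be served

    def age(self, now: float) -> float:
        return now - self.stored_at


def serve(obj: CachedObject, now: float, refresh_in_progress: bool) -> str:
    """What a request gets: "fresh", "stale" or "wait" (for the backend)."""
    age = obj.age(now)
    if age < obj.ttl:
        return "fresh"
    if age < obj.ttl + obj.grace and refresh_in_progress:
        return "stale"  # another request is already fetching a new copy
    return "wait"       # this request fetches (or waits for) a new copy


obj = CachedObject(stored_at=0, ttl=5, grace=10)
print(serve(obj, now=3, refresh_in_progress=False))   # fresh
print(serve(obj, now=6, refresh_in_progress=False))   # wait: first request after expiry
print(serve(obj, now=8, refresh_in_progress=True))    # stale: Age 8, as in the output above
print(serve(obj, now=16, refresh_in_progress=True))   # wait: grace used up
```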

We can also check it with siege, using two concurrent users: for a while only one of them receives content, but content never stops being served:

$ siege -t 30s -c 2 -d 1 localhost/sleep.php

If 10 seconds of grace seems too short, we can change it with beresp.grace in the vcl_fetch subroutine of default.vcl. For example, to set it to one minute:

sub vcl_fetch {
set beresp.grace = 60s;
}

And what if the backend server goes down? Will Varnish keep serving the old (stale) content? With the current setup, no. So far we haven't given Varnish any way to tell a healthy backend from an unhealthy one, so it considers every backend healthy. As a result, if the backend goes down and the content expires, we get a 503 error:

$ sudo /etc/init.d/apache2 stop
[sudo] password:
* Stopping web server apache2 apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName
... waiting [OK]
$ sudo /etc/init.d/apache2 status
Apache2 is NOT running.
$ while :;do curl http://localhost/sleep.php -IXGET;sleep 1;done
(...)
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.4
Cache-control: max-age=5, must-revalidate
Vary: Accept-Encoding
Content-Type: text/html
Transfer-Encoding: chunked
Date: Fri, 30 Nov 2012 14:19:15 GMT
X-Varnish: 1538616905 1538616860
Age: 5
Via: 1.1 varnish
Connection: keep-alive

HTTP/1.1 503 Service Unavailable
Server: Varnish
Content-Type: text/html; charset=utf-8
Retry-After: 5
Content-Length: 419
Accept-Ranges: bytes
Date: Fri, 30 Nov 2012 14:19:15 GMT
X-Varnish: 1538616906
Age: 0
Via: 1.1 varnish
Connection: close

To make the grace period also apply when the server is down, we first need to tell Varnish how to check whether Apache is up (healthy) or not. This is done with the probe setting in the backend definition:

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .timeout = 100 ms;
        .interval = 1s;
        .window = 10;
        .threshold = 8;
    }
}
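
The .window and .threshold values define a sliding window: the backend is healthy while at least 8 of the last 10 probes succeeded. The Python sketch below models that rule to estimate how quickly Varnish reacts when Apache stops. The class name is hypothetical, and the sketch assumes the window starts full of successful probes.

```python
from collections import deque


class ProbeWindow:
    """Sliding-window health rule: healthy if >= threshold of the last `window` probes passed."""

    def __init__(self, window: int = 10, threshold: int = 8) -> None:
        self.threshold = threshold
        # Assumption: the backend has been up for a while, so the window is full of successes.
        self.results = deque([True] * window, maxlen=window)

    def record(self, ok: bool) -> None:
        self.results.append(ok)  # the oldest result drops out automatically

    @property
    def healthy(self) -> bool:
        return sum(self.results) >= self.threshold


probe = ProbeWindow(window=10, threshold=8)
failed = 0
while probe.healthy:  # Apache is down: every probe fails
    probe.record(False)
    failed += 1
print(failed)  # 3 failed probes, i.e. about 3 s with .interval = 1s
```

In the same model, once the window is full of failures, 8 consecutive successful probes (about 8 seconds) are needed before the backend counts as healthy again.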

This way, the old content keeps being served while the backend is down (as long as the object is still within its grace period), until the backend comes back up and Varnish can fetch the content from it again.

Testing with siege and curl, we can see that, both when content expires and when the server goes down, there is always one thread that "takes the hit". The first time Varnish finds the content expired, it requests it from the backend and waits for the response. Meanwhile the other threads receive the old content, but that one thread takes the hit. The same happens when the server is down. There is a lot of literature about ways to avoid this, and you can read plenty on the subject, but the bottom line is that, with this setup, there is no way around it: it just happens. One of the threads takes the hit.
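
The pattern behind this is easy to reproduce outside Varnish. The following Python sketch is a simplified, single-entry model, not Varnish's implementation, and it leaves out the grace limit for brevity: the first caller to find the value expired fetches it from the slow backend, while callers arriving during that fetch get the stale copy immediately. The class and function names are hypothetical.

```python
import threading
import time


class StaleWhileRefreshCache:
    """Single-entry cache: one caller refreshes an expired value, the rest get the stale copy.

    Simplified model: there is no grace limit, and the very first fill is not
    coalesced (concurrent first callers would all go to the backend).
    """

    def __init__(self, fetch, ttl: float) -> None:
        self._fetch = fetch              # slow backend call (stands in for sleep.php)
        self._ttl = ttl
        self._lock = threading.Lock()
        self._refreshing = False
        self._value = None
        self._stored_at = float("-inf")  # nothing cached yet

    def get(self):
        """Return (value, source), where source is "cache" or "backend"."""
        with self._lock:
            fresh = time.monotonic() - self._stored_at < self._ttl
            if fresh or (self._refreshing and self._value is not None):
                return self._value, "cache"
            self._refreshing = True      # this caller is the one that "takes the hit"
        try:
            value = self._fetch()        # slow call, made without holding the lock
            with self._lock:
                self._value, self._stored_at = value, time.monotonic()
        finally:
            with self._lock:
                self._refreshing = False
        return value, "backend"


def slow_backend() -> str:
    time.sleep(0.2)  # shortened stand-in for the 10-second sleep.php
    return "page"


cache = StaleWhileRefreshCache(slow_backend, ttl=1.0)
cache.get()      # first fill goes to the backend
time.sleep(1.1)  # let the cached copy expire

sources = []
threads = [threading.Thread(target=lambda: sources.append(cache.get()[1])) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(sources.count("backend"), sources.count("cache"))  # 1 4
```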

With this, we cover two cases in which we keep serving old (stale) content:
– No backend is available, so we serve stale content.
– Backends are available and one thread has already requested new content. While that new content arrives from the backend, Varnish keeps serving the old one to the other threads.

What if we want these two cases to have different timeouts? For example, when the backend is available, we may want expired content to be served only for a limited time. After that, we stop serving the old content and make clients wait for the new one. This timeout will usually be a few seconds. At the same time, if the backends are down, and there is therefore no way to get new content, we want Varnish to keep serving the old content for a long time, which is usually better than serving a 503 error page. This is configured in the vcl_recv subroutine of default.vcl, together with a beresp.grace in vcl_fetch that is at least as long as the longest req.grace, like this:

sub vcl_recv {
    if (req.backend.healthy) {
        set req.grace = 30s;
    } else {
        set req.grace = 1h;
    }
}

sub vcl_fetch {
    set beresp.grace = 1h;
}
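
Why both settings are needed: Varnish keeps the object for TTL + beresp.grace, and a request only accepts a stale object up to TTL + req.grace, so in practice the shorter of the two grace values applies. A small Python sketch of that arithmetic with the values from this configuration (the function name is ours):

```python
def max_stale_age(ttl: int, beresp_grace: int, req_grace: int) -> int:
    """Oldest Age (seconds) at which a stale object can still be delivered."""
    return ttl + min(beresp_grace, req_grace)


TTL = 5      # max-age=5 from sleep.php
HOUR = 3600

print(max_stale_age(TTL, beresp_grace=HOUR, req_grace=30))    # 35: backend healthy
print(max_stale_age(TTL, beresp_grace=HOUR, req_grace=HOUR))  # 3605: backend sick
print(max_stale_age(TTL, beresp_grace=30, req_grace=HOUR))    # 35: object already gone
```

The last line shows that if beresp.grace were only 30 seconds, the one-hour req.grace for a sick backend would have no effect, because the object is discarded after 35 seconds.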

So our complete default.vcl file looks like this:

$ cat /etc/varnish/default.vcl
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .timeout = 100 ms;
        .interval = 1s;
        .window = 10;
        .threshold = 8;
    }
}

sub vcl_recv {
    if (req.backend.healthy) {
        set req.grace = 30s;
    } else {
        set req.grace = 1h;
    }
    if (req.url == "/sleep.php")
    {
        return(lookup);
    }
    else
    {
        return(pass);
    }
}

sub vcl_fetch {
    set beresp.grace = 1h;
}
