Failover and loadbalancing using keepalived (LVS) on two machines

In this scenario, we have two machines and try to make the most of the available resources. Each node plays the role of a realserver: it provides a service such as a web or mail server. At the same time, one of the machines loadbalances the requests to itself and to its neighbor. The node responsible for the loadbalancing owns the VIP, and every client connects to it transparently through the VIP. The other node is able to take over the VIP if it detects that the current master has failed, but in the nominal case it only processes requests forwarded by the loadbalancer.

Throughout this post the following IP addresses are used. Do not forget to adapt them to your network settings:

  • hostname1 ip address: 192.168.9.10
  • hostname2 ip address: 192.168.9.20
  • virtual ip address: 192.168.9.100
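
Roughly, the setup looks like this (a sketch; in the nominal case hostname1 owns the VIP and schedules the requests):

                     clients
                        |
                        v
              VIP 192.168.9.100
        (held by the current master)
                        |
        +---------------+---------------+
        |                               |
hostname1 (192.168.9.10)        hostname2 (192.168.9.20)
master: loadbalancer            slave: realserver, takes over
+ realserver                    the VIP if the master fails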

Configuration Files

Install Keepalived and set up configuration files

  • Install Keepalived on both machines:

[tux]# apt-get install keepalived

  • Copy the provided Keepalived configuration files (master and slave) into the /etc/keepalived/ directory:

[hostname1]# cp keepalived_master.conf /etc/keepalived/keepalived.conf
[hostname2]# cp keepalived_slave.conf /etc/keepalived/keepalived.conf

  • Copy the provided bypass_ipvs.sh script, which will be called during master/slave transitions, onto both machines:

[tux]# cp bypass_ipvs.sh /etc/keepalived/

Install and configure services (a mail and a web server in our case) on both machines

  • For our test purposes, each realserver provides a mail and a web server. First install them:

[tux]# apt-get install postfix apache2

  • Configure postfix so that each node can connect to the mail server of its neighbor. During the installation phase, select "local only", then comment out the following line in /etc/postfix/main.cf to make sure the mail server does not listen on the loopback interface only:

# inet_interfaces = loopback-only
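
Alternatively, instead of editing main.cf by hand, the same result can presumably be obtained with postconf, since removing the line makes Postfix fall back to its default of listening on all interfaces:

[tux]# postconf -e 'inet_interfaces = all'
[tux]# /etc/init.d/postfix restart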

  • Then try to connect to the mail server of your neighbor to be sure it works correctly:

[hostname1]# telnet hostname2 25
Connected to hostname2.
Escape character is '^]'.
220 hostname2 ESMTP Postfix (Debian/GNU)

[hostname2]# telnet hostname1 25
Connected to hostname1.
Escape character is '^]'.
220 hostname1 ESMTP Postfix (Debian/GNU)

  • Generate the digest string that Keepalived will use to check the web server, by running genhash on one accessible web page. In our case we compute the digest of /apache2-default/index.html, the default page of apache2:

[hostname1]# genhash -s hostname1 -p 80 -u /apache2-default/index.html
MD5SUM = c7b4690c8c46625ef0f328cd7a24a0a3

[hostname1]# genhash -s hostname2 -p 80 -u /apache2-default/index.html
MD5SUM = c7b4690c8c46625ef0f328cd7a24a0a3

  • Keepalived will check that the web server is up using this digest value. That is why you have to copy it into the Keepalived configuration, specifically into the realserver sections dedicated to the web server (their placement is sketched after this block):
HTTP_GET {
  url {
    path /apache2-default/index.html
    digest c7b4690c8c46625ef0f328cd7a24a0a3
  }
  connect_timeout 3
  nb_get_retry 3
  delay_before_retry 2
}
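
For context, this HTTP_GET block lives inside the real_server entries of the web virtual_server section. A sketch of what that section could look like with the addresses of this post (the provided configuration files may differ slightly):

# describe virtual web server
virtual_server 192.168.9.100 80 {
  delay_loop 15
  lb_algo rr
  lb_kind DR
  persistence_timeout 50
  protocol TCP

  real_server 192.168.9.10 80 {
    HTTP_GET {
      url {
        path /apache2-default/index.html
        digest c7b4690c8c46625ef0f328cd7a24a0a3
      }
      connect_timeout 3
      nb_get_retry 3
      delay_before_retry 2
    }
  }
  real_server 192.168.9.20 80 {
    HTTP_GET {
      url {
        path /apache2-default/index.html
        digest c7b4690c8c46625ef0f328cd7a24a0a3
      }
      connect_timeout 3
      nb_get_retry 3
      delay_before_retry 2
    }
  }
}
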
  • At this point, we have set up a functional mail and web server on each node.

Configure the VIP (Virtual IP address)

  • This IP enables access to the realservers. It is configured entirely from the Keepalived configuration and does not require any other modification. Only one of the two nodes owns the VIP at a given time, so the two nodes have different configurations. In our case, hostname1 is set up as the master and hostname2 as the slave, and the VIP is 192.168.9.100:
  • On the master:
# describe virtual service ip
vrrp_instance VI_1 {
  # initial state
  state MASTER
  interface eth0
  # arbitrary unique number 0..255
  # used to differentiate multiple instances of vrrpd
  virtual_router_id 1
  # for electing MASTER, highest priority wins.
  # to be MASTER, make 50 more than on the other machine.
  priority 100
  authentication {
    auth_type PASS
    auth_pass xxx
  }
  virtual_ipaddress {
    192.168.9.100/24
  }
}
  • On the slave:
# describe virtual service ip
vrrp_instance VI_1 {
  # initial state
  state BACKUP
  interface eth0
  # arbitrary unique number 0..255
  # used to differentiate multiple instances of vrrpd
  virtual_router_id 1
  # for electing MASTER, highest priority wins.
  # to be MASTER, make 50 more than on the other machine.
  priority 50
  authentication {
    auth_type PASS
    auth_pass xxx
  }
  virtual_ipaddress {
    192.168.9.100/24
  }
}
  • Then we can start or reload Keepalived and check that the master really owns the VIP:
[hostname1]# /etc/init.d/keepalived start
[hostname2]# /etc/init.d/keepalived start
[hostname1]# ip addr list dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:e3:e2:40 brd ff:ff:ff:ff:ff:ff
    inet 192.168.9.10/24 brd 192.168.9.255 scope global eth0
    inet 192.168.9.100/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fee3:e240/64 scope link
       valid_lft forever preferred_lft forever
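
If both nodes end up claiming the MASTER state (a problem reported in the comments below), the VRRP advertisements, which travel as IP protocol 112 over multicast, are probably not getting through, for instance because of a firewall. A quick way to check on each node:

[tux]# tcpdump -i eth0 ip proto 112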

Configure loadbalancing

  • The loadbalancing is also configured through Keepalived. At a given time, only one machine owns the VIP, and the requests it receives are forwarded to the realservers according to the chosen rules. Services are accessed through the VIP and are processed indifferently by one machine or the other. In /etc/keepalived/keepalived.conf, realservers are defined like this:
# describe virtual mail server
virtual_server 192.168.9.100 25 {
  delay_loop 15
  lb_algo rr
  lb_kind DR
  persistence_timeout 50
  protocol TCP

  real_server 192.168.9.10 25 {
    TCP_CHECK {
      connect_timeout 3
    }
  }
  real_server 192.168.9.20 25 {
    TCP_CHECK {
      connect_timeout 3
    }
  }
}
  • This example shows how requests for the mail server are distributed (the web server section sketched earlier follows the same pattern). The requests are loadbalanced between the realservers according to a round-robin (rr) algorithm. Direct Routing (DR) mode is preferred: once a realserver is selected to process a request, it responds directly to the client without going back through the loadbalancer. A single loadbalancer can therefore handle a huge number of requests without becoming the bottleneck of the system, since scheduling a connection requires only a small amount of resources.
  • Then enable ip_forward permanently on both machines. In /etc/sysctl.conf:

net.ipv4.ip_forward = 1

  • You can load this option and check that it is set correctly with the following commands:

[tux]# sysctl -p
net.ipv4.ip_forward = 1

[tux]# sysctl -a | grep net.ipv4.ip_forward
net.ipv4.ip_forward = 1

  • We now have a mail and a web server at our disposal. Ensure that the loadbalancer is configured correctly:
[hostname1]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.9.100:25 rr persistent 50
  -> 192.168.9.10:25             Local   1      0          0
  -> 192.168.9.20:25             Route   1      0          0
TCP  192.168.9.100:80 rr persistent 50
  -> 192.168.9.10:80             Local   1      0          0
  -> 192.168.9.20:80             Route   1      0          0
  • Requests sent to the VIP on port 25 or 80 are distributed between 192.168.9.10 and 192.168.9.20. Now we try to connect to the mail server through the VIP from another machine, that is to say neither hostname1 nor hostname2:

[tux]# telnet 192.168.9.100 25
Trying 192.168.9.100...

  • Nothing happens... and that is expected: it occurs every time the loadbalancer assigns the request to the node that does not currently own the VIP, since that node is not supposed to handle such a request. The traditional way to sort out this issue is to configure the VIP on the other node as well, for example on the loopback interface, so that it accepts packets with the VIP as destination address. You then configure the network interfaces to ignore certain ARP requests by playing with the arp_ignore and arp_announce options. This is sufficient in a classical scenario where dedicated machines handle the load distribution, but not in our case!
  • In our architecture, the loadbalancer and the realserver are located on the same machines. If you simply add the VIP on the secondary machine, there will be cases where packets are processed by the loadbalancer of both machines, since it is not deactivated on the slave. And if each loadbalancer selects its neighbor to process the request, we face a ping-pong effect: an infinite loop appears between the two nodes and the request is not handled at all!
  • Fortunately, there is a trick to handle every request efficiently. We use the Keepalived mechanism that calls predefined scripts on master/slave transitions, configured in /etc/keepalived/keepalived.conf:

# Invoked on transition to master
notify_master "/etc/keepalived/bypass_ipvs.sh del 192.168.9.100"
# Invoked on transition to backup
notify_backup "/etc/keepalived/bypass_ipvs.sh add 192.168.9.100"
# Invoked on transition to fault
notify_fault "/etc/keepalived/bypass_ipvs.sh add 192.168.9.100"
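
These notify_* directives belong inside the vrrp_instance block presented earlier; for instance, on the master (a shortened sketch, not the full configuration):

vrrp_instance VI_1 {
  state MASTER
  ...
  notify_master "/etc/keepalived/bypass_ipvs.sh del 192.168.9.100"
  notify_backup "/etc/keepalived/bypass_ipvs.sh add 192.168.9.100"
  notify_fault "/etc/keepalived/bypass_ipvs.sh add 192.168.9.100"
}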

  • The bypass_ipvs.sh script adds a nat rule when the host becomes slave and removes it when the host goes back to the master state, so that requests sent to the VIP are processed correctly even when they reach the slave. This PREROUTING rule is essential for the slave to redirect incoming service packets to localhost; without it, a loop can appear between master and slave. PREROUTING rules alter packets as soon as they come in, before the routing table is consulted for a packet that opens a new connection. The REDIRECT target sends the packet to the machine itself by changing the destination IP to the primary address of the incoming interface (locally generated packets are mapped to 127.0.0.1). Thus packets forwarded by the active loadbalancer are not loadbalanced a second time:

iptables -A PREROUTING -t nat -d 192.168.9.100 -p tcp -j REDIRECT
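
The provided bypass_ipvs.sh is not reproduced in this post. Based on the description above, a minimal sketch of its core logic could look like the following (an assumption, the original script may differ; note the -n flag on iptables -L, which a commenter below reports is needed when the VIP resolves to a reverse DNS name):

#!/bin/bash
# bypass_ipvs.sh -- minimal sketch, not the original script
# usage: bypass_ipvs.sh {add|del} VIP
VIP=$2

case "$1" in
  add)
    # add the REDIRECT rule only if it is not already present
    n=$(iptables -n -t nat -L PREROUTING | grep -c "$VIP")
    if [ "$n" -eq 0 ]; then
      iptables -A PREROUTING -t nat -d "$VIP" -p tcp -j REDIRECT
    fi
    ;;
  del)
    # remove the rule, repeating in case it was added several times
    while iptables -t nat -D PREROUTING -d "$VIP" -p tcp -j REDIRECT 2>/dev/null; do
      :
    done
    ;;
  *)
    echo "usage: $0 {add|del} VIP" >&2
    exit 1
    ;;
esac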

  • Check rule on the slave:
[hostname2]# iptables -t nat --list
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
REDIRECT   tcp  --  anywhere             192.168.9.100       

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Failover

  • Stop Keepalived on the master:

[hostname1]# /etc/init.d/keepalived stop

  • Ensure that the new master owns the VIP:
[hostname2]# ip addr list dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:ab:e7:dd brd ff:ff:ff:ff:ff:ff
    inet 192.168.9.20/24 brd 192.168.9.255 scope global eth0
    inet 192.168.9.100/24 scope global secondary eth0
    inet6 fe80::20c:29ff:feab:e7dd/64 scope link
       valid_lft forever preferred_lft forever
  • Check that the nat rule has disappeared:
[hostname2]# iptables -t nat --list
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination      

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
  • If the original master comes back up, the global architecture adjusts automatically and keeps processing incoming requests.

Service Failure Handling

  • If a service fails, it no longer responds correctly to the basic Keepalived checks and is automatically removed from the ipvs table:
[hostname1]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.9.100:25 rr persistent 50
  -> 192.168.9.10:25             Local   1      0          0
  -> 192.168.9.20:25             Route   1      0          0
TCP  192.168.9.100:80 rr persistent 50
  -> 192.168.9.10:80             Local   1      0          0
  -> 192.168.9.20:80             Route   1      0          0

[hostname1]# /etc/init.d/postfix stop
Stopping Postfix Mail Transport Agent: postfix.

[hostname1]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.9.100:25 rr persistent 50
  -> 192.168.9.20:25             Route   1      0          0
TCP  192.168.9.100:80 rr persistent 50
  -> 192.168.9.10:80             Local   1      0          0
  -> 192.168.9.20:80             Route   1      0          0
  • New requests are no longer forwarded to the service that failed.
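  • Conversely, once the service is restarted, Keepalived should notice it at one of the next checks (delay_loop is 15 seconds here) and put the realserver back into the table:

[hostname1]# /etc/init.d/postfix start
[hostname1]# ipvsadm -L -n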

Other Considerations

Thanks to the nat rule, we successfully set up loadbalancing and automatic failover on a two-node configuration. With such an architecture we take full advantage of the available resources: one of the nodes plays the role of both loadbalancer and realserver, while the other can take over the loadbalancer role as soon as it detects that its neighbor has failed. The slave does not merely monitor the master: it also handles the requests it receives from the loadbalancer.

Keep in mind that the request distribution is not strictly made on a per-connection basis. A client connects to one of the realservers through the VIP; once this is done, and as long as that realserver is available, further requests from the same client are forwarded to it. In a classical scenario many clients connect to the VIP, so the global amount of requests is still distributed roughly equally between the two nodes.
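
This stickiness comes from the persistence_timeout 50 directive in the virtual_server blocks. If you prefer a strictly per-connection round robin, you could presumably just drop that directive, for instance:

virtual_server 192.168.9.100 25 {
  delay_loop 15
  lb_algo rr
  lb_kind DR
  # persistence_timeout removed: schedule every new connection independently
  protocol TCP
  ...
}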

To go further

This section shows in detail the interactions between the different components of my system. It is very useful to understand how things are supposed to work when everything goes well. The following figures were built from the traces I captured with Wireshark, so this is not how it is supposed to behave according to some documentation: it is how it really works under my configuration. The configuration is the same as before, except that I added a client machine:

  • hostname1 ip address: 192.168.9.10
  • hostname2 ip address: 192.168.9.20
  • virtual ip address: 192.168.9.100
  • client ip address: 192.168.9.15

Here are the commands I entered:

client~$ telnet 192.168.9.100 25
Trying 192.168.9.100...
Connected to 192.168.9.100.
Escape character is '^]'.
220 hostname2 ESMTP Postfix (Debian/GNU)
HELO hostname2
250 hostname2
QUIT
221 2.0.0 Bye
Connection closed by foreign host.

As expected, it works like a charm: I reach the hostname2 mail server through the VIP address, currently owned by hostname1. In this case, as you can see, the ipvs mechanism decided to forward the request to hostname2. It could also have decided to process it locally, since the decision is taken on a round-robin basis, but that is not what I wanted to show. Here are the interactions I reconstructed from the traces captured on the client, hostname1 and hostname2.

As you can see, the client always speaks to the VIP address and thus sends its requests to hostname1 in my case. Thanks to keepalived, hostname1 forwards the request to hostname2. The important point to notice is that hostname2 responds directly to the client without going back through the VIP machine. The client therefore does not know that the actual responder is hostname2, since the packet it receives has the VIP address as its source address. The key point to make this work is to ensure that hostname2 accepts and processes packets with the VIP address as destination address; by default, this is not the case. Here it works because of my PREROUTING rule. Another way would be to add the VIP address as a second address on hostname2, but in my configuration only the first option can work since both machines run ipvs: if the VIP address is set up on each machine, infinite loops can appear between hostname1 and hostname2 when each one decides to forward the request to the other.
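
If you want to reproduce these traces without Wireshark, running something like the following on each machine should show the asymmetric path, with the request entering through hostname1 and the response leaving directly from hostname2 (the interface name is an assumption):

[hostname1]# tcpdump -n -i eth0 host 192.168.9.15 and port 25
[hostname2]# tcpdump -n -i eth0 host 192.168.9.15 and port 25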

You have seen the traces when everything works as expected. But what if, for example, hostname2 is not configured properly to accept requests with the VIP as destination address? For test purposes, I manually removed my PREROUTING rule on hostname2 to see what happens.

First of all, you notice that the client does not receive any response. As before, the client sends its first request to the VIP. The VIP owner does its job correctly and forwards the request to hostname2. But here is the problem: hostname2 receives a SYN packet with the VIP as destination address. It has no reason to process such a request, so it simply drops the packet. The client keeps retransmitting the same SYN, which the VIP owner keeps forwarding, until the connection attempt times out. That is why you should take care to correctly configure every machine that is supposed to respond to requests with the VIP as destination address.

  1. Luis
    December 2nd, 2011 at 06:06

    Changed to bash as someone suggested and everything worked fine

  2. Luis
    December 2nd, 2011 at 07:00

    Now... I am getting the following messages:

    Dec 2 17:04:18 AERLRS129 Keepalived_healthcheckers: Removing service [192.168.0.158:80] from VS [192.168.0.148:80]
    Dec 2 17:04:18 AERLRS129 Keepalived_healthcheckers: Remote SMTP server [127.0.0.1:25] connected.
    Dec 2 17:04:18 AERLRS129 Keepalived_healthcheckers: Error processing RCPT cmd on SMTP server [127.0.0.1:25]. SMTP status code = 550
    Dec 2 17:04:18 AERLRS129 Keepalived_healthcheckers: Can not read data from remote SMTP server [127.0.0.1:25].

  3. Paul
    January 5th, 2012 at 17:20

    Hi,

    I took your exact same configuration on 2 different machines connected with a switch, and both keepalived instances stay in the MASTER state... obviously there is something missing

  4. Mike
    January 8th, 2012 at 23:33

    @Paul
    Hi Paul,
    Just had a similar problem; maybe iptables is blocking incoming vrrp advertisements on your machines (in this case your second machine)? In my proof-of-concept solution (I'm not following this document) I simply stopped iptables. I know you cannot do that since this solution uses it, but you should try to confirm that the vrrp advertisements are received correctly!
    good luck!

  5. Michael
    February 22nd, 2012 at 21:34

    We’ve followed this guide step by step in our office and have a very similar configuration, what we’ve noticed though is that if keepalived is stopped on the master, the backup continues to route traffic to both nodes. This is desirable, since the actual application service has not stopped and can still be used. The problem is that the packets going to the previous master get dropped and are never processed. What ends up happening is that the application only works half of the time, whenever traffic is routed to the backup.

  6. Michael
    February 23rd, 2012 at 14:34

    I believe I have figured out the problem, but I need advice on configuring a solution. In the section labeled Failover above, the keepalived service is stopped and the backup becomes master. The routing rule is removed from the backup, but is never added on the old master. This is not so much a situation where the target service fails as one where keepalived itself fails. If the routing rule is added to the old master, it can keep functioning as an application server until keepalived is restarted.

    So essentially in the case of a keepalived service stop on the master, I want to add this routing rule to the old master. The bypass file will remove the rule if and when the service is restarted.

  7. March 13th, 2012 at 22:35

    First of all, thank you for this great post, which shows a very clever way of thinking about HA: simple, clear and obvious. You even have imitators: http://www.keepalived.org/pdf/asimon-jres-paper.pdf.
    One precision concerning your iptables script (bypass_ipvs.sh): if the virtual address resolves to a reverse DNS name, you have to add -n at lines 66 and 74:
    n=$(iptables -n -t nat -L| grep $VIP | wc -l)
    instead of:
    n=$(iptables -t nat -L| grep $VIP | wc -l)
    If you don't, the rule will always be incremented and never replaced.

  8. March 13th, 2012 at 23:06

    @Bruno
    Edit: to be clear, "the rule will always be incremented and never replaced" means "never replaced by anything", i.e. never deleted.

  9. March 26th, 2012 at 16:44

    Thanks, I've just been searching for info about this topic for ages and yours is the greatest I have discovered till now. But what about the conclusion? Are you sure about the source?

  10. March 27th, 2012 at 10:11

    @Jenny Mollen Interview
    What would you like as a conclusion? Please try it with your application behind it and see if it fits your needs. The source was my test lab, the configuration guides of keepalived, google, and a lot of time spent understanding why it did not work at the first shot... I guess this is computer science as usual, don't you think so?

  11. rip
    April 18th, 2012 at 19:59

    I used this tutorial to deploy an LVS with 2 nodes in DR mode.

    From hostname1, a telnet to the VIP only works when it connects locally; when it selects hostname2, it hangs.
    I tried with and without the prerouting rule and it doesn't work.

    -> RemoteAddress:Port Forward Weight ActiveConn InActConn
    TCP 192.168.2.154:22 rr
    -> 192.168.2.150:22 Local 1 0 0
    -> 192.168.2.151:22 Route 1 0 0
    TCP 192.168.2.154:25 rr
    -> 192.168.2.150:25 Local 1 0 0
    -> 192.168.2.151:25 Route 1 0 0

    any ideas?

  12. rip
    April 19th, 2012 at 11:01

    @rip
    There is a bug in Debian Squeeze with kernel 2.6, LVS-NAT and fragmentation.

    There is a workaround: upgrade to kernel 3 or disable all fragmentation options on the interfaces.

    hostname:~# ethtool -k eth0
    Offload parameters for eth0:
    rx-checksumming: on
    tx-checksumming: on
    scatter-gather: on
    tcp-segmentation-offload: off
    udp-fragmentation-offload: off
    generic-segmentation-offload: off
    generic-receive-offload: off
    large-receive-offload: off
    ntuple-filters: off
    receive-hashing: off

  13. Excel
    April 26th, 2012 at 16:54

    Hello,

    thank you for the article and the script. It seems that there is a problem when trying to connect to the VIP from a realserver:

    http://marc.info/?l=keepalived-devel&m=127117320206255&w=2

    Have you encountered this problem? Do you have a solution?

  14. Ste
    May 4th, 2012 at 19:42

    @gcharriere

    First and foremost, thanks for the howto, it has been a great help. We have managed to get everything but the script to function. I have used the exact script provided with no alterations, and when running it we get the following error. I am running Ubuntu 12.04 LTS.

    ./bypass_ipvs.sh: 33: ./bypass_ipvs.sh: Syntax error: "(" unexpected

    Any assistance would be much appreciated.

  15. Ste
    May 8th, 2012 at 13:12

    @Ste

    If you get the following error:

    ./bypass_ipvs.sh: 33: ./bypass_ipvs.sh: Syntax error: "(" unexpected

    Change the following in bypass_ipvs.sh

    from
    #! /bin/sh

    to
    #! /bin/bash

    Worked for me.

  16. June 14th, 2012 at 16:39

    I also had problems with bypass_ipvs.sh. I wrote a replacement in perl which you can find here if you’re interested: http://www.garyrule.net/wp/scripts/bypass_ipvs-pl/

  17. Juan David DIAZ
    July 10th, 2012 at 16:14

    Hello, I would like to know which distribution and version of Linux you used. We made exactly the same configuration with 2 Linux Debian Squeeze 6.0 servers and keepalived 1.1.20 and it doesn't do the load balancing: the MASTER server always answers all HTTP and SMTP requests. Do you have any idea why that is happening? The configuration is the same. Thank you

  18. insubordinate
    August 9th, 2012 at 11:15

    I have a setup like this, but on one of my machines keepalived has 99% CPU load and I cannot find out why. Someone suggested a "packet storm", but I haven't been able to resolve it.

  19. Ruslan
    October 12th, 2012 at 08:02

    Thank you!!! I am really impressed! It works!

  21. Nadj
    October 29th, 2012 at 11:58

    @Michael
    Hello,
    I have the same problem as Michael.
    Does anyone know the solution? What can I do?
    Thank you very much

  23. Alexey
    May 22nd, 2013 at 03:01

    @gcharriere
    I have a couple of questions.

    1) Most important:
    Can those parameters be used with your configuration, or are they unnecessary or even harmful?
    kernel.core_uses_pid = 1
    # Disable filtering of packets by which interface they arrive on
    # vs which interface they go out on.
    net.ipv4.conf.default.rp_filter = 0
    net.ipv4.conf.default.accept_source_route = 1
    # Enable selection of arp_ignore and arp_announce on other interfaces
    net.ipv4.conf.all.arp_ignore = 1
    net.ipv4.conf.all.arp_announce = 2
    # Only respond to an arp request for an address on this interface.
    # repeat this for all interfaces
    net.ipv4.conf.eth0.arp_ignore = 1
    # When sending an arp announce, only use addresses on this interface.
    # repeat for all interfaces.
    net.ipv4.conf.eth0.arp_announce = 2

    2) Why should this parameter be used? Can it be omitted?
    persistence_timeout 50

  24. May 31st, 2013 at 17:51

    Have you solved the loop problem without using bypass_ipvs.sh? Could I specify a parameter so that it only loadbalances when the request comes in on the interface that owns the IP?

    Thanks

    Pablo
