February 2011

Servers running kernel 2.6.32 cannot establish TCP connections as LVS real servers

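I recently built an LVS cluster using Xen virtual machines as the real servers (RS). Once it went live, the real servers could not establish TCP connections with clients. At first I suspected Xen's bridged networking, but I eventually traced the problem to the kernel. We had actually hit something similar before, but we were rushing to go live and I never looked into it. This time I was so busy that even with a test environment set up I had no time to dig in. Today a colleague ran the tests for me, and now it all makes sense.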

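The problem:

With LVS set up in DR mode, requests to the VIP never result in a TCP connection with the real server. tcpdump shows only the SYN packets coming from the client; the RS never answers with a SYN-ACK, so the TCP handshake cannot complete. Testing with curl returns: curl: (7) couldn't connect to host.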

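Here is my configuration.

LVS director:

Uses DR (direct routing) scheduling and forwards requests to the real servers over the internal network.

LVS real server:

Kernel 2.6.32-5-amd64 (2.6.32 is very common now; it is the default in both Debian 6 and RHEL 6), configured not to respond to ARP: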

echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce

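I'll skip the rest of the basic configuration. Starting the service with just this setup produces the problem described above.

The cause is this kernel parameter: reverse path filtering. Here is a description I copied: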

Reverse Path Filtering (RPF) is a technology that is used on Internet Protocol routers to try and prevent source address spoofing, which is often used for Denial of Service attacks. RPF works by checking the source IP of each packet received on an interface against the routing table. If the best route for the source IP address does not use the same interface that the packet was received on, the packet is dropped. There are some situations where this feature will obviously not be the desired behaviour and will need to be disabled. In general if you are not multi-homed then enabling RPF on your router will not be a problem.

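Put simply: if a packet arrives on eth1 but the reply to its source address would leave through eth0, the kernel drops the packet.

That is exactly how our LVS setup works. The director receives the client's request for the VIP, rewrites the packet, and forwards it to the RS over the internal network; the RS then replies to the client directly over the external network. So the incoming packet gets killed by rp_filter.

But why had we never run into this before? Look at the default rp_filter settings: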

net.ipv4.conf.eth1.rp_filter = 1
net.ipv4.conf.eth0.rp_filter = 1
net.ipv4.conf.lo.rp_filter = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 0

The rp_filter values mean the following:

rp_filter - INTEGER
    0 - No source validation.
    1 - Strict mode as defined in RFC3704 Strict Reverse Path
        Each incoming packet is tested against the FIB and if the interface
        is not the best reverse path the packet check will fail.
        By default failed packets are discarded.
    2 - Loose mode as defined in RFC3704 Loose Reverse Path
        Each incoming packet's source address is also tested against the FIB
        and if the source address is not reachable via any interface
        the packet check will fail.

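0 means incoming packets are not checked at all, which leaves the host exposed to DoS attacks that use spoofed source addresses.

1 is the strict check: a packet is dropped unless the interface it arrived on is also the best route back to its source address.

2 is less strict: the packet is accepted as long as its source address is reachable through any interface.

For LVS, 2 works as well.

Setting rp_filter to 0 on eth1 is enough to get the LVS service working.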
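The change described above can be sketched as a small shell helper. This is only a sketch: the function name set_rp_filter and the optional third argument (an alternate sysctl root, so the function can be tried on a scratch directory) are my own assumptions, not something from the original setup.

```shell
# set_rp_filter IFACE VALUE [ROOT]
# Writes VALUE (0, 1 or 2) to IFACE's rp_filter file.
# ROOT defaults to /proc/sys; it is a hypothetical parameter added
# here so the helper can be exercised without root privileges.
set_rp_filter() {
    iface="$1"
    value="$2"
    root="${3:-/proc/sys}"

    # Only the three documented modes are accepted.
    case "$value" in
        0|1|2) ;;
        *) echo "rp_filter must be 0, 1 or 2" >&2; return 1 ;;
    esac

    target="$root/net/ipv4/conf/$iface/rp_filter"
    if [ ! -e "$target" ]; then
        echo "no such sysctl file: $target" >&2
        return 1
    fi

    echo "$value" > "$target"
}
```

On the real server this would be run as root, for example set_rp_filter eth1 0, or set_rp_filter eth1 2 for loose mode. A value written this way does not survive a reboot; a line such as net.ipv4.conf.eth1.rp_filter = 0 in /etc/sysctl.conf is the usual way to make it persistent.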

The answer can be found here: http://www.spinics.net/lists/linux-net/msg17162.html

The first patch changed rp_filter from a boolean to an integer, and the
second patch changed the way the interface-specific value and the "all"
value are combined to produce a functional value from a logical AND to
an arithmetic MAX.

Before patches : functional value = interface AND all
After patches  : functional value = MAX(interface, all)

So now if net.ipv4.conf.all.rp_filter=1, source validation is enabled on
all interfaces as their functional value is at least 1. You may either
set net.ipv4.conf.all.rp_filter to 0 (to disable it) or 2 (to enable
loose mode globally), or set net.ipv4.conf.$interface.rp_filter to 2 (to
enable loose mode on $interface).

I guess that the patch suggested by Dave Miller is related to another
(apparently incomplete) change that occured in 2.6.32 :

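Starting with 2.6.31, the effective rp_filter value is computed differently.

Before the change, setting all.rp_filter to 0 was enough to make the effective value 0, whatever the interface value was.

After the change, the effective value is the maximum of the interface-specific value and the all value.

The defaults are net.ipv4.conf.eth1.rp_filter = 1 and net.ipv4.conf.all.rp_filter = 0, so the effective value on eth1 is MAX(1, 0) = 1, and packets get dropped once the machine is part of an LVS setup.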
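To make the two combination rules concrete, here is a minimal shell sketch of both. The function names are hypothetical, and the optional ROOT argument (defaulting to /proc/sys) is my own addition so the lookup can be tried against a scratch directory; the kernel of course does this internally, not through a script.

```shell
# Rule before the patches: logical AND of the interface value and
# the "all" value, both treated as booleans.
rp_filter_old() {
    if [ "$1" -ne 0 ] && [ "$2" -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Rule after the patches: arithmetic MAX of the two values.
rp_filter_new() {
    if [ "$1" -gt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# effective_rp_filter IFACE [ROOT]
# Reads the interface value and the "all" value from the sysctl tree
# and prints the effective value under the new (MAX) rule.
effective_rp_filter() {
    iface="$1"
    root="${2:-/proc/sys}"
    iface_val="$(cat "$root/net/ipv4/conf/$iface/rp_filter")" || return 1
    all_val="$(cat "$root/net/ipv4/conf/all/rp_filter")" || return 1
    rp_filter_new "$iface_val" "$all_val"
}
```

With the defaults listed earlier (interface 1, all 0), rp_filter_old 1 0 gives 0 while rp_filter_new 1 0 gives 1, which is the difference that broke the real servers. Running effective_rp_filter eth1 on a live host reports the value that actually applies to eth1.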

Problem solved! Next time, tune the kernel parameters first.

PS: you can set this kernel parameter to have the dropped packets logged:

echo 1 >/proc/sys/net/ipv4/conf/<interfacename>/log_martians
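As a rough sketch of how the logging could be used: the helper below turns on log_martians for one interface, and a second helper filters kernel messages for the word "martian". Both function names and the optional ROOT argument are assumptions of mine, and the exact wording of the kernel message may differ between kernel versions, so the filter only matches on that one keyword.

```shell
# enable_log_martians IFACE [ROOT]
# Turns on martian-packet logging for IFACE.
# ROOT defaults to /proc/sys (hypothetical parameter, for dry runs).
enable_log_martians() {
    iface="$1"
    root="${2:-/proc/sys}"
    echo 1 > "$root/net/ipv4/conf/$iface/log_martians"
}

# filter_martians
# Reads log lines on stdin and keeps those mentioning "martian".
filter_martians() {
    grep -i 'martian'
}
```

Typical use, as root, would be enable_log_martians eth1 followed by dmesg | filter_martians while a client tries to reach the VIP.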

Error when installing svn 1.4.x on 64-bit Linux

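It has been a long time since my last update, so here is a quick post. I did not post a single blog entry in January 2011... all because I was busy with odd jobs. Just getting things done, with no technical growth at all!

Our department's svn runs a very old version, 1.4.6. I recently migrated it to a 64-bit machine, where svn failed to compile with the following error: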

cd subversion/libsvn_ra_dav && /bin/sh /home/download/subversion-1.4.6/libtool --tag=CC --silent --mode=link gcc  -g -O2  -g -O2 -pthread    -rpath /usr/local/lib -o libsvn_ra_dav-1.la  commit.lo fetch.lo file_revs.lo log.lo merge.lo options.lo props.lo replay.lo session.lo util.lo ../../subversion/libsvn_delta/libsvn_delta-1.la ../../subversion/libsvn_subr/libsvn_subr-1.la /home/download/subversion-1.4.6/apr-util/libaprutil-0.la -lexpat /home/download/subversion-1.4.6/apr/libapr-0.la -lrt -lm -lcrypt -lnsl  -lpthread -ldl /home/download/subversion-1.4.6/neon/src/libneon.la -lz
/usr/bin/ld: /home/download/subversion-1.4.6/neon/src/.libs/libneon.a(ne_request.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/home/download/subversion-1.4.6/neon/src/.libs/libneon.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [subversion/libsvn_ra_dav/libsvn_ra_dav-1.la] Error 1

The message actually names the fix, recompile with -fPIC, but... I had no idea where to add it.

Google to the rescue: it turns out you have to edit the Makefile by hand...

Change CFLAGS in neon/src/Makefile to -fPIC -g -O2
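The manual edit can also be scripted. The sketch below assumes the Makefile contains a line beginning with "CFLAGS =" (for example CFLAGS = -g -O2); the function name add_fpic is hypothetical. It prepends -fPIC to that line and does nothing if the flag is already there.

```shell
# add_fpic MAKEFILE
# Prepends -fPIC to the CFLAGS assignment in MAKEFILE.
# Assumes a line of the form "CFLAGS = ..." exists in the file.
add_fpic() {
    makefile="$1"
    if [ ! -f "$makefile" ]; then
        echo "not found: $makefile" >&2
        return 1
    fi

    # Already patched: leave the file alone.
    if grep -Eq '^CFLAGS[[:space:]]*=.*-fPIC' "$makefile"; then
        return 0
    fi

    tmp="$(mktemp)" || return 1
    # Insert "-fPIC " right after "CFLAGS = " and write the result
    # back into the original file.
    sed -E 's/^(CFLAGS[[:space:]]*=[[:space:]]*)/\1-fPIC /' "$makefile" > "$tmp" \
        && cat "$tmp" > "$makefile"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

From the top of the source tree this would be called as add_fpic neon/src/Makefile. Objects already compiled without -fPIC need to be rebuilt before running make again, and rerunning configure regenerates the Makefile, so the edit would have to be repeated.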

This puzzled me for a while, so I took the opportunity to update the blog.