Monitoring TCP Accept Queue Overflow with SystemTap

Posted by Kird on 2020-03-21

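These are my getting-started notes on SystemTap. They cover two things: installation, and a demonstration of using a stap script to monitor accept queue overflow.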

Installation

Download the kernel-debuginfo RPM packages that match the running kernel from the official debuginfo download site, then install them locally:
kernel-debuginfo-2.6.32-504.23.4.el6.x86_64.rpm
kernel-debuginfo-common-x86_64-2.6.32-504.23.4.el6.x86_64.rpm

# kernel-debuginfo depends on kernel-debuginfo-common, so install the common package first
rpm -ivh kernel-debuginfo-common-x86_64-2.6.32-504.23.4.el6.x86_64.rpm
rpm -ivh kernel-debuginfo-2.6.32-504.23.4.el6.x86_64.rpm
yum install systemtap systemtap-client systemtap-common systemtap-runtime systemtap-server -y
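
The debuginfo packages have to match the running kernel exactly, so it can help to derive the expected file names from the kernel release string instead of typing them by hand. The following is a minimal sketch; the helper name `debuginfo_rpms` is made up for illustration, and it assumes an x86_64 RHEL/CentOS-style release string like the one used in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of SystemTap): print the two debuginfo RPM
# file names that correspond to a given kernel release string.
# Assumes x86_64 and RHEL/CentOS-style naming, as in the packages above.
debuginfo_rpms() {
    release="$1"
    # The common package is listed first because kernel-debuginfo depends on it.
    printf 'kernel-debuginfo-common-x86_64-%s.rpm\n' "$release"
    printf 'kernel-debuginfo-%s.rpm\n' "$release"
}

# Typical use on the target machine:
#   debuginfo_rpms "$(uname -r)"
```

If the printed names differ from the files you downloaded, the debuginfo does not match the running kernel and stap will most likely fail to resolve kernel symbols.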

Once installation finishes, run the following command. If it completes without errors, the installation is OK:

stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'

Running acceptq.stp

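acceptq.stp detects whether incoming requests overflow the accept queue of a listening socket. It can be used to identify exactly which connections were affected by the overflow: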

/*
 * Prints details on specifically which connections
 * suffered due to accept queue overflow. It can be very
 * useful for identifying periodically hung applications that
 * fail to accept() connections fast enough.
 *
 * Usage: stap acceptq.stp
 */
probe begin {
    printf("time (us) \tacceptq\tqmax\tlocal addr\tremote_addr\n")
}

/* Remote address of the incoming SYN: the source address in its IP header. */
function skb_get_remote_v4addr:string(skb:long)
{
    return format_ipaddr(__ip_skb_saddr(__get_skb_iphdr(skb)), 2 /* AF_INET */)
}

/* Remote port of the incoming SYN: the source port in its TCP header. */
function skb_get_remote_port:long(skb:long)
{
    return __tcp_skb_sport(__get_skb_tcphdr(skb))
}

/* Fires for each connection request arriving at a listening IPv4 socket. */
probe kernel.function("tcp_v4_conn_request") {
    /* Report only when the accept queue is over its limit. */
    if ($sk->sk_ack_backlog > $sk->sk_max_ack_backlog) {
        printf("%d\t%d\t%d\t%s:%d\t%s:%d\n",
            gettimeofday_us(),
            $sk->sk_ack_backlog,
            $sk->sk_max_ack_backlog,
            inet_get_ip_source($sk),
            inet_get_local_port($sk),
            skb_get_remote_v4addr($skb),
            skb_get_remote_port($skb));
    }
}

Simulating concurrent load to overflow the accept queue

Set the accept queue length (backlog) of nginx's port 80 listening socket to 1:

listen 80 default backlog=1;

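Verify the queue size. For a listening socket, the third column of the ss output (Send-Q) is the backlog limit and the second (Recv-Q) is the current accept queue length: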

[root ~]# ss -tnl | grep ":80 "
LISTEN 0 1 *:80 *:*
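
When several listeners are configured, the same check can be made easier to read by reducing the `ss -tnl` output to just the address, current queue length, and backlog limit. This is only a sketch: the function name `listen_queues` is hypothetical, and it assumes the column layout shown above (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port).

```shell
#!/bin/sh
# Hypothetical helper: read `ss -tnl` output on stdin and, for every LISTEN
# row, print "<local address:port> <current queue length> <backlog limit>".
# Assumes the column order State, Recv-Q, Send-Q, Local, Peer.
listen_queues() {
    # The header row is skipped because its first field is not "LISTEN".
    awk '$1 == "LISTEN" { print $4, $2, $3 }'
}

# Typical use on the server:
#   ss -tnl | listen_queues
```

For the nginx socket above, the last field should read 1 once the `backlog=1` setting has taken effect.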

On the server, start the SystemTap monitor:

stap -v acceptq.stp

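On the client, use ab to generate load (10000 requests at a concurrency of 1000, sent through the server address given to -X as a proxy):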

ab -n 10000 -c 1000 -X 10.x.x.x:80 http://0xfe.com.cn/

Observe the stap output
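
As a rough cross-check of what stap reports, the kernel also keeps cumulative, system-wide counters named ListenOverflows and ListenDrops in the TcpExt section of /proc/net/netstat. The sketch below pulls out just those two values; the function name `listen_counters` is hypothetical, and the counters only show that overflows happened, not which connections were involved.

```shell
#!/bin/sh
# Hypothetical helper: print ListenOverflows and ListenDrops from a file in
# /proc/net/netstat format (defaults to /proc/net/netstat itself).
# The TcpExt section is two lines: counter names first, then their values.
listen_counters() {
    awk '
        # First TcpExt line: remember the counter name for each column.
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++) name[i] = $i
            seen = 1
            next
        }
        # Second TcpExt line: print the values of the two counters of interest.
        $1 == "TcpExt:" && seen {
            for (i = 2; i <= NF; i++)
                if (name[i] == "ListenOverflows" || name[i] == "ListenDrops")
                    print name[i], $i
        }
    ' "${1:-/proc/net/netstat}"
}

# Typical use: run once before and once after the ab test and compare.
#   listen_counters
```

If the counters grow during the ab run while stap prints rows for port 80, both views agree that the accept queue overflowed.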
