故障

我的弹幕服务，代码换了个环境运行，虽然服务可用，但是发现出现了如下情况，基本1秒输出来一次：

java.io.IOException: Connection reset by peer
  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
  at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:853)
  at io.netty.buffer.WrappedByteBuf.writeBytes(WrappedByteBuf.java:641)
  at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:240)
  at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:115)
  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:514)
  at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:471)
  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:385)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:351)
  at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  at io.netty.util.internal.chmv8.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1412)
  at io.netty.util.internal.chmv8.ForkJoinTask.doExec(ForkJoinTask.java:280)
  at io.netty.util.internal.chmv8.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:877)
  at io.netty.util.internal.chmv8.ForkJoinPool.scan(ForkJoinPool.java:1706)
  at io.netty.util.internal.chmv8.ForkJoinPool.runWorker(ForkJoinPool.java:1661)
  at io.netty.util.internal.chmv8.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:126)

github.com/netty/netty…
Connection reset by peer is typically a result of your peer sending a TCP RST to you.

tcpdump

既然是网络故障，tcpdump 是否可用呢？

$sudo netstat -lnp|grep 8657
tcp        0      0 0.0.0.0:443                 0.0.0.0:*                   LISTEN      8657/java

没有列出全部的 socket 连接，从当前进程监听的端口入手试试。ifconfig可以看到这台机器有两个网卡（eth0 内网 ip，eth1 公网 ip）。

tcpdump 下的 tcp/ip 协议解析

先看下 tcpdump 输出tcp/ip 数据的大致格式

参考 www.tcpdump.org/tcpdump_man… 中关于 tcp 协议的描述，我整理了下格式

The general format of a TCP protocol line is

src > dst: Flags [tcpflags], seq data-seqno, ack ackno, win window, urg urgent, options [opts], length len

src and dst are the source and destination IP addresses and ports.
tcpflags are some combination of S (SYN), F (FIN), P (PUSH), R (RST), U (URG), W (ECN CWR), E (ECN-Echo) or . (ACK), or none if no flags are set.
data-seqno describes the portion of sequence space covered by the data in this packet.
ackno is sequence number of the next data expected the other direction on this connection.
window is the number of bytes of receive buffer space available the other direction on this connection.
urg indicates there is urgent data in the packet.
opts are TCP options (e.g., mss 1024).
len is the length of payload data.

抓包数据

通过tcpdump查看下443 端口的网络情况。与上面的格式略有不同。因为我有两个网卡，请求是从 eth0 进，从 eth1 出，我尝试使用tcpdump -i any 发现第三次握手输出的 askno 数据不对。（为什么会这样呢，还不知道，谁知道请告知下）

$sudo tcpdump -i any -nn port 443

16:18:41.553460 IP xxx.xxx.238.110.5745 > xxx.xxx.198.40.443: S 806033:806033(0) win 14600 <mss 1460,sackOK,timestamp 2230744217 0,nop,wscale 8>
16:18:41.553483 IP xxx.xxx.198.40.443 > xxx.xxx.238.110.5745: S 1720728675:1720728675(0) ack 806034 win 14480 <mss 1460,sackOK,timestamp 2425389802 2230744217,nop,wscale 7>
16:18:41.553647 IP xxx.xxx.238.110.5745 > xxx.xxx.198.40.443: . ack 1 win 58 <nop,nop,timestamp 2230744217 2425389802>
16:18:41.553677 IP xxx.xxx.238.110.5745 > xxx.xxx.198.40.443: R 1:1(0) ack 1 win 58 <nop,nop,timestamp 2230744217 2425389802>

但是如果你要导出*.pcap文件然后用wireshark来查看，还是得将两个网卡的流量合并到一起才行。

sudo tcpdump -i any port 443 -c 3 -w log.pcap

两个网卡分开抓

# 合并到一起写
$sudo tcpdump -i eth0 -nn port 443
$sudo tcpdump -i eth1 -nn port 443

16:51:16.073956 IP xxx.xxx.238.2.61835 > xxx.xxx.198.40.443: S 2659415794:2659415794(0) win 14600 <mss 1460,sackOK,timestamp 2328745293 0,nop,wscale 8>
16:51:16.073985 IP xxx.xxx.198.40.443 > xxx.xxx.238.2.61835: S 955422999:955422999(0) ack 2659415795 win 14480 <mss 1460,sackOK,timestamp 2427344323 2328745293,nop,wscale 7>
16:51:16.074147 IP xxx.xxx.238.2.61835 > xxx.xxx.198.40.443: . ack 955423000 win 58 <nop,nop,timestamp 2328745293 2427344323>
16:51:16.074192 IP xxx.xxx.238.2.61835 > xxx.xxx.198.40.443: R 0:0(0) ack 1 win 58 <nop,nop,timestamp 2328745293 2427344323>

上面的握手比较简单对应着

时间 协议 src > dst tcpflags data-seqno ack ackno win window <opts>

握手逻辑

一次连接建立的三次握手的步骤：

Caller sends SYN
Recipient responds with SYN, ACK
Caller sends ACK

数据分析

src 发送的 tcpflags = S，也就是说发送的SYN请求，随机产生一个值 data-seqno = J(2659415794)，并将该数据包发送给 dst
dst 发送的 tcpflags = S，实际发送的时候 SYN 和 ACK 标志位都置1了，返回一个 ackno=J + 1，也就是2659415795，然后随机生成一个新的 data-seqno = K(955422999) 发送给 src
src 检查收到的 ackno 是不是等于J+1，然后再发送一个 ackno = K+1，也就是955423000，前面的文档中说道tcpflags为.的时候也表示ACK
src 最后为什么又给 dst 发送一个 tcpflags = R 的请求呢？

为什么会出现 RST

因为具有周期性，我大概猜到了，是 lvs 对我的后端服务的健康检查导致的，健康检查的机制就是建立成功后就 RST 掉，就不占用 session 了。这是公司的规范，问题算是定位了。

既然公司的规范，那么我们这边需要定制修改下，针对 vip 发送的 RST 不处理。

其他参考

github.com/netty/netty…
tools.ietf.org/html/rfc793
my.oschina.net/costaxu/blo…

今天的文章java.io.IOException: Connection reset by peer 原因分析分享到此就结束了，感谢您的阅读。

版权声明：本文内容由互联网用户自发贡献，该文观点仅代表作者本人。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容，请发送邮件至举报，一经查实，本站将立刻删除。
如需转载请保留出处：http://bianchenghao.cn/15507.html

java.io.IOException: Connection reset by peer 原因分析

故障