Problems with proxying DNS through nginx

yuc
2024-07-30 / Last modified: 2024-08-01 03:21
Background

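DNS resolution is currently served by bind behind an nginx layer 4 (stream) proxy. Some problems came up with this setup, and this post is a record of them.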

Configuration and the problem

This was the original configuration. DNS resolution tested fine with it for the most part.

    upstream dns {
        server 192.168.10.120:53;
        server 192.168.10.121:53;
        server 192.168.10.122:53;
    }
    server {
        listen 53;
        listen 53 udp;
        proxy_connect_timeout 2s;
        proxy_timeout 2s;
        proxy_next_upstream on;
        proxy_pass dns;
        error_log /usr/local/nginx/logs/dns.log info;
    }

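The logging, however, was a problem from the start: an entry was only written once the session timed out, that is, after proxy_timeout had elapsed. That makes it hard to troubleshoot together with bind later on, because the timestamps do not line up, and bind can only record the IP address of the nginx machine.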

After some searching I added the following configuration. The key addition is the proxy_responses directive.

    server {
        listen 53;
        listen 53 udp;
        proxy_connect_timeout 2s;
        proxy_responses 1;
        proxy_timeout 2s;
        proxy_next_upstream on;
        proxy_pass dns;
        error_log /usr/local/nginx/logs/dns.log info;
    }

The official nginx documentation describes this directive as follows:

Sets the number of datagrams expected from the proxied server in response to a client datagram if the UDP protocol is used. The number serves as a hint for session termination. By default, the number of datagrams is not limited.

If zero value is specified, no response is expected. However, if a response is received and the session is still not finished, the response will be handled.

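In short, it sets how many response datagrams nginx expects for each client datagram. With a value of 1, nginx treats the session as finished as soon as one response datagram has come back and writes the log entry right away. This did solve the problem of log entries being delayed.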
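
To line these session logs up with bind's query log, a stream access log might help as well, since it can record the real client address next to the upstream that handled the query. The following is a minimal sketch and not part of the original setup: the format name dns_session and the log file path are made up for illustration, while the variables are standard nginx stream variables.

    stream {
        # dns_session is a hypothetical format name
        log_format dns_session '$remote_addr [$time_local] $protocol $status '
                               'rx=$bytes_received tx=$bytes_sent '
                               't=$session_time upstream=$upstream_addr';

        server {
            listen 53 udp;
            proxy_responses 1;
            proxy_timeout 2s;
            proxy_pass dns;    # the upstream "dns" block shown earlier
            # hypothetical log path
            access_log /usr/local/nginx/logs/dns_access.log dns_session;
        }
    }

The entry is written when the session ends, so subtracting $session_time from the logged time gives roughly the moment the query arrived.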

A new problem

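After this configuration had been in use for a few days, a new problem showed up. Browsing the web, logging in to servers, and similar operations were often noticeably slow during those days, and monitoring even started reporting an error that had not been seen for a long time: domain name resolution failure. I suspected the recent change to the DNS proxy configuration.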

Checking the log on the nginx server showed the following:

2024/07/22 11:00:02 [error] 17500#0: *705 upstream timed out (110: Connection timed out) while proxying connection, udp client: 192.168.7.21, server: 0.0.0.0:53, upstream: "192.168.10.120:53", bytes from/to client:86/43, bytes from/to upstream:43/86
2024/07/22 11:00:04 [error] 17500#0: *1173 upstream timed out (110: Connection timed out) while proxying connection, udp client: 192.168.3.103, server: 0.0.0.0:53, upstream: "192.168.10.122:53", bytes from/to client:100/176, bytes from/to upstream:176/100
2024/07/22 11:00:08 [error] 17500#0: *1313 upstream timed out (110: Connection timed out) while proxying connection, udp client: 172.23.126.234, server: 0.0.0.0:53, upstream: "192.168.10.121:53", bytes from/to client:76/232, bytes from/to upstream:232/76
2024/07/22 11:00:08 [error] 17500#0: *38157 no live upstreams while connecting to upstream, udp client: 172.23.127.238, server: 0.0.0.0:53, upstream: "dns", bytes from/to client:37/0, bytes from/to upstream:0/0
2024/07/22 11:00:08 [error] 17500#0: *38158 no live upstreams while connecting to upstream, udp client: 192.168.3.103, server: 0.0.0.0:53, upstream: "dns", bytes from/to client:29/0, bytes from/to upstream:0/0
2024/07/22 11:00:08 [error] 17500#0: *38159 no live upstreams while connecting to upstream, udp client: 172.23.126.242, server: 0.0.0.0:53, upstream: "dns", bytes from/to client:41/0, bytes from/to upstream:0/0
2024/07/22 11:00:08 [error] 17500#0: *38160 no live upstreams while connecting to upstream, udp client: 172.23.126.254, server: 0.0.0.0:53, upstream: "dns", bytes from/to client:33/0, bytes from/to upstream:0/0
2024/07/22 11:00:09 [error] 17500#0: *38161 no live upstreams while connecting to upstream, udp client: 192.168.3.100, server: 0.0.0.0:53, upstream: "dns", bytes from/to client:40/0, bytes from/to upstream:0/0

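Evidently, since proxy_responses 1 was added, nginx frequently fails to get through to the backend servers: sessions to the upstreams time out, and once every upstream has been marked as timed out nginx reports "no live upstreams". This matches the symptoms quite well.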

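I then switched the directive on and off several times to verify this. Every time it was enabled, a large batch of upstream timeout entries appeared in the log after a short while.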

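So the trigger has been identified (this directive leads to upstream timeouts), but I have not yet found the root cause or a fix, and for now the directive has to stay disabled.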
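
One thing that might be worth testing, though it has not been verified here: the upstream server directive accepts max_fails and fail_timeout, and by default a single failed attempt (max_fails=1) takes a server out of rotation for 10 seconds (fail_timeout=10s). If the timed-out UDP sessions above are being counted as failures, which is an assumption on my part, then a few lost replies would be enough to knock out all three backends and produce "no live upstreams". A sketch that turns this accounting off:

    upstream dns {
        # max_fails=0 disables failure accounting
        # (defaults are max_fails=1 fail_timeout=10s)
        server 192.168.10.120:53 max_fails=0;
        server 192.168.10.121:53 max_fails=0;
        server 192.168.10.122:53 max_fails=0;
    }

This would not explain why replies go missing in the first place, and the trade-off is that a backend that is really down would no longer be skipped automatically.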

The real client IP problem for DNS

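UDP proxying does not support proxy_protocol, so the real client IP cannot be passed to the upstream that way. I did learn that proxy_responses 0 combined with DSR (Direct Server Return) can be used to pass the real IP. If I go down that route later, I will first move the DNS servers onto dedicated machines; for now things stay as they are. Reference: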

https://www.nginx-cn.net/blog/ip-transparency-direct-server-return-nginx-plus-transparent-proxy/
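
Going by the referenced article, the nginx side of such a setup could look roughly like the sketch below. It is untested here and rests on several assumptions: the nginx workers run as root (transparent binding needs the privilege), the upstream dns is the one defined earlier, and the DNS servers are separately configured to answer clients directly with the proxy's address as the source, which is the part the article walks through.

    # main context: transparent binding needs privileged workers
    user root;

    stream {
        server {
            listen 53 udp;
            # send datagrams upstream with the client's own address and port as source
            proxy_bind $remote_addr:$remote_port transparent;
            # DSR: replies bypass nginx, so no response is expected here
            proxy_responses 0;
            proxy_pass dns;
        }
    }

With this in place bind would see the original client address in its query log instead of the nginx machine's address.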