
Revolyssup
Contributor

@Revolyssup Revolyssup commented Sep 5, 2025

Cause of defect
ai-proxy-multi calls resolve_endpoint once before creating the health checker:

local node = resolve_endpoint(instance)

It then calls resolve_endpoint again when fetching the health status of the instance:

local node = resolve_endpoint(ins)

When the two DNS resolutions return different results (which is common when a domain name resolves to multiple IPs), the address passed to the health checker is not one of its registered targets, and the following warning is logged:

2025/08/21 09:38:13 [warn] 154#154: *381457 [lua] ai-proxy-multi.lua:345: fetch_health_instances(): failed to get health check target status, addr: 192.168.117.8:80, host: httpbin.local, err: target not found, client: 192.168.117.1, server: _, request: "POST /anything HTTP/1.1", host: "127.0.0.1:9080", request_id: "d2daae4e3eb88ecff5b3419c2cb38c02"
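
The mismatch can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: `fake_resolve`, `new_checker`, and this simplified `resolve_endpoint` are hypothetical stand-ins, assuming a resolver that rotates through two A records and a checker that only knows the target it was created with.

```lua
-- Hypothetical stand-in for a DNS resolver that rotates through several A records.
local records = { "192.168.117.7", "192.168.117.8" }
local cursor = 0
local function fake_resolve(_domain)
    cursor = cursor % #records + 1
    return records[cursor]
end

-- Hypothetical, simplified resolve_endpoint: resolves again on every call.
local function resolve_endpoint(instance)
    return {
        host = fake_resolve(instance.domain),
        port = instance.port,
        domain = instance.domain,
    }
end

-- Hypothetical minimal checker: it only knows the target registered at creation.
local function new_checker(node)
    local targets = { [node.host .. ":" .. node.port] = true }
    return {
        get_target_status = function(_, host, port)
            if not targets[host .. ":" .. port] then
                return nil, "target not found"
            end
            return true
        end,
    }
end

local instance = { domain = "httpbin.local", port = 80 }

-- First resolution: used to create the checker.
local checker = new_checker(resolve_endpoint(instance))

-- Second resolution: used to query the status, but it yields a different IP.
local node = resolve_endpoint(instance)
local ok, err = checker:get_target_status(node.host, node.port)
print(ok, err)
```

In this sketch the second lookup fails with `target not found` because the checker was built from the first answer only, which mirrors the warning above.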

Suggested fix
Follow the approach of the parse_domain_in_route function that already exists in APISIX:

apisix/apisix/init.lua

Lines 242 to 267 in 3260931

local function parse_domain_in_route(route)
    local nodes = route.value.upstream.nodes
    local new_nodes, err = upstream_util.parse_domain_for_nodes(nodes)
    if not new_nodes then
        return nil, err
    end

    local up_conf = route.dns_value and route.dns_value.upstream
    local ok = upstream_util.compare_upstream_node(up_conf, new_nodes)
    if ok then
        return route
    end

    -- don't modify the modifiedIndex to avoid plugin cache miss because of DNS resolve result
    -- has changed

    route.dns_value = core.table.deepcopy(route.value)
    route.dns_value.upstream.nodes = new_nodes
    if not route.dns_value._nodes_ver then
        route.dns_value._nodes_ver = 0
    end
    route.dns_value._nodes_ver = route.dns_value._nodes_ver + 1
    core.log.info("parse route which contain domain: ",
                  core.json.delay_encode(route, true))
    return route
end

Record the result of the DNS resolution in the instance table as _dns_value, and maintain a _nodes_ver counter that is incremented whenever the resolved nodes change.
When _nodes_ver changes, the health checker must be rebuilt so that its targets match the newly resolved address.
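
The idea can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: only `_dns_value` and `_nodes_ver` come from the description above, while `refresh_dns_value`, `fetch_checker`, `_checker`, `_checker_ver`, and the fake resolver and checker factory are hypothetical names introduced for the example.

```lua
-- Hypothetical helper: true when both nodes exist and point at the same address.
local function same_node(a, b)
    return a ~= nil and b ~= nil and a.host == b.host and a.port == b.port
end

-- Resolve once, cache the result on the instance, and bump _nodes_ver
-- only when the resolved node actually changed.
local function refresh_dns_value(instance, resolve)
    local node = resolve(instance)
    if same_node(instance._dns_value, node) then
        return instance._dns_value
    end
    instance._dns_value = node
    instance._nodes_ver = (instance._nodes_ver or 0) + 1
    return node
end

-- Rebuild the checker only when it was built for an older _nodes_ver.
local function fetch_checker(instance, create_checker)
    if instance._checker == nil or instance._checker_ver ~= instance._nodes_ver then
        instance._checker = create_checker(instance._dns_value)
        instance._checker_ver = instance._nodes_ver
    end
    return instance._checker
end

-- Demo with a fake resolver whose third answer differs from the first two.
local answers = { "192.168.117.7", "192.168.117.7", "192.168.117.8" }
local call = 0
local function resolve(instance)
    call = call + 1
    return { host = answers[call], port = instance.port }
end

local built = 0
local function create_checker(node)
    built = built + 1
    return { target = node.host .. ":" .. node.port }
end

local instance = { port = 80 }
for _ = 1, 3 do
    refresh_dns_value(instance, resolve)
    fetch_checker(instance, create_checker)
end
print(instance._nodes_ver, built, instance._checker.target)
```

Because both the checker creation and the status lookup read the same cached `_dns_value`, they can no longer disagree, and the checker is rebuilt only on the request where the resolved address changes.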

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Sep 5, 2025
membphis previously approved these changes Sep 18, 2025
@nic-6443 nic-6443 requested a review from AlinsRan September 22, 2025 02:45
@Revolyssup Revolyssup merged commit 0151d9e into apache:master Sep 22, 2025
33 of 34 checks passed
@Revolyssup Revolyssup deleted the revolyssup/inconsistent-resolved-2 branch September 22, 2025 07:10