
feat: auto-detect proxy and translate peer addresses to contact point#153

Open
fruch wants to merge 2 commits into main from feat/proxy-auto-detect

Conversation

fruch (Collaborator) commented May 4, 2026

Summary

  • Adds ProxyAddressTranslator that redirects all peer connections to the original contact point, enabling connections through proxies/load balancers (AWS NLB, PrivateLink, etc.)
  • Automatically installed on every connection — no user configuration needed
  • Includes 5 unit tests and 3 integration tests (including a socat-based proxy simulation)

Problem

When connecting through a proxy, the driver discovers internal node IPs from system.peers that are unreachable from the client, causing Connection error: No connections in the pool.

Solution

The scylla-rust-driver's AddressTranslator trait translates peer addresses discovered from system.peers. Since known_node addresses (the contact point) are never translated, we can safely always install a translator that redirects all peer addresses to the contact point.

Equivalent to scylladb/python-driver#833 (DynamicWhiteListRoundRobinPolicy) but using the rust driver's native AddressTranslator mechanism.
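
As a rough illustration of the idea described above, here is a minimal, stdlib-only sketch. The `PeerTranslator` trait, `ToContactPoint`, and `translate_all` are hypothetical stand-ins invented for this example; the driver's real `AddressTranslator` trait is async and receives an `UntranslatedPeer`, as shown in the diff in this PR.

```rust
use std::net::SocketAddr;

/// Hypothetical, synchronous stand-in for the driver's `AddressTranslator`.
trait PeerTranslator {
    fn translate(&self, peer: SocketAddr) -> SocketAddr;
}

/// Maps every discovered peer address to the original contact point.
struct ToContactPoint {
    contact_point: SocketAddr,
}

impl PeerTranslator for ToContactPoint {
    fn translate(&self, _peer: SocketAddr) -> SocketAddr {
        // The peer's own (internal) address is ignored on purpose.
        self.contact_point
    }
}

/// Translate a list of peer addresses as read from `system.peers`.
fn translate_all(t: &dyn PeerTranslator, peers: &[SocketAddr]) -> Vec<SocketAddr> {
    peers.iter().map(|p| t.translate(*p)).collect()
}
```

With a contact point of `203.0.113.10:9042` and internal peers such as `10.0.0.1:9042` and `10.0.0.2:9042`, every translated address comes back as the contact point, so the client never dials an unreachable internal IP.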

Testing

  • cargo test --lib proxy_address_translator — 5 unit tests
  • Integration tests with socat TCP proxy
  • Manually verified against proxy endpoint 18.208.144.200

codecov Bot commented May 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.10%. Comparing base (83dde53) to head (76bf605).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #153      +/-   ##
==========================================
+ Coverage   55.93%   56.10%   +0.16%     
==========================================
  Files          21       22       +1     
  Lines        4834     4852      +18     
==========================================
+ Hits         2704     2722      +18     
  Misses       2130     2130              
Flag Coverage Δ
integration-cassandra-4.1 18.86% <100.00%> (+0.09%) ⬆️
integration-cassandra-5.0 18.86% <100.00%> (+0.09%) ⬆️
integration-scylladb-2025.1 21.25% <100.00%> (+0.08%) ⬆️
integration-scylladb-2026.1 21.25% <100.00%> (+0.08%) ⬆️
unittests 43.65% <9.09%> (+0.01%) ⬆️

github-actions Bot (Contributor) commented May 4, 2026

CI Summary

Status: ✅ All jobs passed

Rustfmt — passed No issues.
Clippy — passed No issues.
Tests — passed No issues.
Build — passed No issues.

🤖 Generated by CI Summary • Full logs

@fruch fruch force-pushed the feat/proxy-auto-detect branch from 44508e7 to b2b55d7 Compare May 4, 2026 18:44
@fruch fruch requested review from Lorak-mmk and dkropachev May 4, 2026 20:16
fruch (Collaborator, Author) commented May 4, 2026

@dkropachev @Lorak-mmk is this the right way to use the Rust driver? It is the equivalent of scylladb/python-driver#833.

When connecting through a proxy/load balancer (e.g., AWS NLB, PrivateLink),
the driver discovers internal node IPs from system.peers that are unreachable
from the client. This installs a ProxyAddressTranslator that redirects all
peer connections to the original contact point address.

Since known_node addresses are never translated by the scylla driver (only
peer addresses from system.peers are), this is safe for both direct and
proxy connections.
@fruch fruch force-pushed the feat/proxy-auto-detect branch from b2b55d7 to 2e38ceb Compare May 5, 2026 06:55
Comment thread src/driver/proxy_address_translator.rs Outdated
Comment thread src/driver/scylla_driver.rs
Comment on lines +19 to +51
/// An [`AddressTranslator`] that redirects all peer connections to the original
/// contact point address. Used when the cluster is accessed through a proxy.
///
/// All discovered node addresses are translated to `proxy_address`, ensuring
/// the driver only connects through the proxy endpoint.
#[derive(Debug, Clone)]
pub struct ProxyAddressTranslator {
    /// The proxy/contact point address to route all connections through.
    proxy_address: SocketAddr,
}

impl ProxyAddressTranslator {
    /// Create a new translator that routes all connections to `proxy_address`.
    pub fn new(proxy_address: SocketAddr) -> Self {
        Self { proxy_address }
    }

    /// Returns the proxy address this translator routes to.
    pub fn proxy_address(&self) -> SocketAddr {
        self.proxy_address
    }
}

#[async_trait]
impl AddressTranslator for ProxyAddressTranslator {
    async fn translate_address(
        &self,
        _untranslated_peer: &UntranslatedPeer,
    ) -> Result<SocketAddr, TranslationError> {
        Ok(self.proxy_address)
    }
}

I'm not sure if this achieves what you want. I assume the proxy address leads to one specific node.
Let's say you have 5 nodes in the cluster, each with 32 shards. With your changes, the driver will still see all 5 nodes in system.peers/system.local and try to open connection pools to all 5 of them. The address translation will cause all connections to be opened to the same node (the driver won't even know about it), so you'll get 160 connections to this node.
You could use PoolSize::PerHost(1) (which is a good idea in cqlsh regardless of all other changes) to get this down to 5.
Then the connection count problem is not that bad, but you are still in a very weird state where the driver thinks it opened pools to all nodes, but they all really go to one node. Will this work correctly? It may, I'm not completely sure.
TBH I don't know how to solve that with existing APIs. There isn't really a way to implement a HostFilter that would filter out other nodes, because HostFilter accepts Peer, which has an address fetched from system.peers or system.local, so you can't really say for sure if it's the same peer as the contact point.

What would be nice here is a simplified session, with a separate builder, where the driver only opens a single connection to the given address and uses it both as the control connection (CC) and to execute user requests. This would also work for the maintenance socket. cc @wprzytula - let's discuss this when we meet, there are some non-obvious decisions when implementing such a session.
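
The connection arithmetic in the comment above can be sketched as follows. This is a hypothetical model, not the driver's code: `PoolSizing` and `total_connections` are names invented for this example, and it assumes one connection per shard in the unmodified case, which is what the 160 figure implies.

```rust
/// Hypothetical model of the two pool sizing strategies under discussion.
#[derive(Clone, Copy)]
enum PoolSizing {
    /// `n` connections to every shard of every node.
    PerShard(usize),
    /// `n` connections to every node, regardless of shard count.
    PerHost(usize),
}

/// Number of connections the driver opens across the whole cluster.
/// With the translator installed, all of them end up at the proxy address.
fn total_connections(nodes: usize, shards_per_node: usize, sizing: PoolSizing) -> usize {
    match sizing {
        PoolSizing::PerShard(n) => nodes * shards_per_node * n,
        PoolSizing::PerHost(n) => nodes * n,
    }
}
```

For the 5-node, 32-shard cluster in the example, `PerShard(1)` yields 160 connections and `PerHost(1)` yields 5.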

Collaborator Author

Agreed on the connection count concern. Added PoolSize::PerHost(1) — cqlsh is a single-user interactive tool so one connection per node is plenty. This brings it down to N connections (one per discovered node) all going through the proxy.

Re: the simplified single-connection session — that would be ideal. For now this is a pragmatic workaround. Happy to migrate when that API exists.

Makes sense. This is a weird state for the driver to be in, but I think it should work. @wprzytula will be available tomorrow if you want him to also take a look (maybe I am missing some potential problem).

Comment on lines 467 to 470
        Ok(ScyllaDriver {
            session,
            prepared_cache: Mutex::new(HashMap::new()),
            consistency: Mutex::new(Consistency::One),

Have you considered using our CachingSession instead of implementing cache yourself?

Collaborator Author

This is the first time I'm hearing about CachingSession; I don't think the Python driver has an equivalent. We'll check it out.

Collaborator Author

Opened #164 to track this. The current CqlDriver trait separates prepare/execute-by-id, so it's not a trivial swap — but worth doing as a follow-up refactor.
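
For context, the hand-rolled cache under discussion follows roughly the pattern below. This is a simplified, stdlib-only sketch: `Prepared`, `PreparedCache`, and `get_or_prepare` are hypothetical names, and the `prepare` closure stands in for the round trip to the server. It is neither the project's actual code nor the driver's `CachingSession`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for a prepared statement handle.
#[derive(Debug, Clone, PartialEq)]
struct Prepared {
    id: u64,
}

/// Query text -> prepared statement, guarded by a mutex.
struct PreparedCache {
    entries: Mutex<HashMap<String, Prepared>>,
}

impl PreparedCache {
    fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    /// Return the cached statement for `query`, preparing it on first use.
    /// The lock is held while `prepare` runs, which keeps the sketch simple
    /// but would serialize preparation in real code.
    fn get_or_prepare(&self, query: &str, prepare: impl FnOnce(&str) -> Prepared) -> Prepared {
        let mut entries = self.entries.lock().unwrap();
        entries
            .entry(query.to_string())
            .or_insert_with(|| prepare(query))
            .clone()
    }
}
```

A second lookup with the same query text returns the cached entry without invoking `prepare` again.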

- Resolve DNS hostnames via tokio::net::lookup_host instead of
  addr.parse::<SocketAddr>() which silently fails for domain names
- Add PoolSize::PerHost(1) to limit connections per node (cqlsh is
  single-user; also mitigates connection explosion through proxy)
- Remove unused detect_proxy() function and proxy_address() getter
- Trim module docs to match actual always-install strategy

Refs #164
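
The DNS point in the first bullet of this commit message can be illustrated with the standard library alone. This is a sketch under stated assumptions: `resolve_contact_point` is a hypothetical helper, and it uses the blocking `std::net::ToSocketAddrs` rather than the async `tokio::net::lookup_host` that the commit uses.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolve a `host:port` string to a socket address.
/// IP literals parse directly; anything else falls back to a lookup
/// through the system resolver.
fn resolve_contact_point(addr: &str) -> Option<SocketAddr> {
    // `parse::<SocketAddr>()` only accepts IP literals, so a hostname
    // such as "db.example.com:9042" yields an error here.
    if let Ok(sock) = addr.parse::<SocketAddr>() {
        return Some(sock);
    }
    // Blocking lookup; take the first address the resolver returns.
    addr.to_socket_addrs().ok()?.next()
}
```

Parsing alone rejects any hostname, which is why the old code path silently failed for domain names; the fallback lookup covers that case.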
2 participants