feat: auto-detect proxy and translate peer addresses to contact point #153
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage diff (main vs. #153):
Coverage: 55.93% → 56.10% (+0.16%)
Files: 21 → 22 (+1)
Lines: 4834 → 4852 (+18)
Hits: 2704 → 2722 (+18)
Misses: 2130 → 2130
CI Summary
Status: ✅ All jobs passed
✅ Rustfmt: passed, no issues.
✅ Clippy: passed, no issues.
✅ Tests: passed, no issues.
✅ Build: passed, no issues.
44508e7 to b2b55d7
@dkropachev @Lorak-mmk is this the right way to use the Rust driver? It is equivalent to scylladb/python-driver#833.
When connecting through a proxy/load balancer (e.g., AWS NLB, PrivateLink), the driver discovers internal node IPs from system.peers that are unreachable from the client. This installs a ProxyAddressTranslator that redirects all peer connections to the original contact point address. Since known_node addresses are never translated by the scylla driver (only peer addresses from system.peers are), this is safe for both direct and proxy connections.
b2b55d7 to 2e38ceb
/// An [`AddressTranslator`] that redirects all peer connections to the original
/// contact point address. Used when the cluster is accessed through a proxy.
///
/// All discovered node addresses are translated to `proxy_address`, ensuring
/// the driver only connects through the proxy endpoint.
#[derive(Debug, Clone)]
pub struct ProxyAddressTranslator {
    /// The proxy/contact point address to route all connections through.
    proxy_address: SocketAddr,
}

impl ProxyAddressTranslator {
    /// Create a new translator that routes all connections to `proxy_address`.
    pub fn new(proxy_address: SocketAddr) -> Self {
        Self { proxy_address }
    }

    /// Returns the proxy address this translator routes to.
    pub fn proxy_address(&self) -> SocketAddr {
        self.proxy_address
    }
}

#[async_trait]
impl AddressTranslator for ProxyAddressTranslator {
    async fn translate_address(
        &self,
        _untranslated_peer: &UntranslatedPeer,
    ) -> Result<SocketAddr, TranslationError> {
        Ok(self.proxy_address)
    }
}
I'm not sure this achieves what you want. I assume the proxy address leads to one specific node.
Let's say you have 5 nodes in the cluster, each with 32 shards. With your changes, the driver will still see all 5 nodes in system.peers/local and try to open connection pools to all 5 nodes. The address translation will cause all connections to be opened to the same node (the driver won't even know about it), so you'll get 160 connections to this node.
You could use PoolSize::PerHost(1) (which is a good idea in cqlsh regardless of all other changes) to get this down to 5.
Then the connection count problem is not that bad, but you are still in a very weird state where the driver thinks it opened pools to all nodes, but they are really all to one node. Will this work correctly? It may; I'm not completely sure.
TBH I don't know how to solve that with existing APIs. There isn't really a way to implement a HostFilter that would filter out other nodes, because HostFilter accepts Peer, which has an address fetched from system.peers or system.local, so you can't really say for sure if it's the same peer as the contact point.
What would be nice here is a simplified session, with a separate builder, where the driver only opens a single connection to the given address and uses it both as the control connection (CC) and to execute user requests. This would also work for the maintenance socket. cc @wprzytula: let's discuss this when we meet; there are some non-obvious decisions when implementing such a session.
Agreed on the connection count concern. Added PoolSize::PerHost(1) — cqlsh is a single-user interactive tool so one connection per node is plenty. This brings it down to N connections (one per discovered node) all going through the proxy.
Re: the simplified single-connection session — that would be ideal. For now this is a pragmatic workaround. Happy to migrate when that API exists.
Makes sense. This is a weird state for the driver to be in, but I think it should work. @wprzytula will be available tomorrow if you want him to also take a look (maybe I am missing some potential problem).
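The connection arithmetic in this thread can be sketched with a small model. This is only an illustration under stated assumptions: `PoolSizeModel` and `connections_through_proxy` are hypothetical names, not driver APIs, and the model assumes one pool per discovered node with every pool landing on the single proxy endpoint.

```rust
/// Hypothetical stand-in for the driver's pool-size setting.
/// This is NOT the real `PoolSize` type, only a model for the arithmetic.
#[derive(Debug, Clone, Copy)]
enum PoolSizeModel {
    /// `n` connections for every shard of a node.
    PerShard(usize),
    /// `n` connections for a node as a whole.
    PerHost(usize),
}

/// Number of connections that end up on the proxy endpoint when every
/// discovered node's address is translated to the same proxy address.
fn connections_through_proxy(
    nodes: usize,
    shards_per_node: usize,
    pool: PoolSizeModel,
) -> usize {
    let per_node = match pool {
        PoolSizeModel::PerShard(n) => shards_per_node * n,
        PoolSizeModel::PerHost(n) => n,
    };
    nodes * per_node
}
```

With the numbers from the review (5 nodes, 32 shards each), `PerShard(1)` gives 160 connections and `PerHost(1)` gives 5.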
Ok(ScyllaDriver {
    session,
    prepared_cache: Mutex::new(HashMap::new()),
    consistency: Mutex::new(Consistency::One),
Have you considered using our CachingSession instead of implementing cache yourself?
This is the first time I'm hearing about CachingSession. I don't think the Python driver has an equivalent; we'll check it out.
Opened #164 to track this. The current CqlDriver trait separates prepare/execute-by-id, so it's not a trivial swap — but worth doing as a follow-up refactor.
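For readers unfamiliar with the pattern under discussion, here is a minimal, dependency-free sketch of a hand-rolled prepared-statement cache behind a `Mutex`. It is neither the PR's code nor how `CachingSession` is implemented. `PreparedCacheModel`, `CacheState`, and the integer IDs are hypothetical, and the "server prepare" is simulated with a counter.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical cache contents: query text -> statement id, plus a counter
/// standing in for round trips that would prepare a statement on the server.
#[derive(Default)]
struct CacheState {
    ids_by_query: HashMap<String, u64>,
    server_prepares: u64,
}

/// Toy model of a prepare/execute-by-id cache guarded by a mutex.
#[derive(Default)]
struct PreparedCacheModel {
    state: Mutex<CacheState>,
}

impl PreparedCacheModel {
    /// Return the cached id for `query`, "preparing" it only on a cache miss.
    fn prepare(&self, query: &str) -> u64 {
        let mut state = self.state.lock().expect("cache mutex poisoned");
        if let Some(id) = state.ids_by_query.get(query) {
            return *id; // cache hit: no simulated server round trip
        }
        state.server_prepares += 1;
        let id = state.server_prepares;
        state.ids_by_query.insert(query.to_owned(), id);
        id
    }

    /// How many simulated server-side prepares have happened so far.
    fn server_prepares(&self) -> u64 {
        self.state.lock().expect("cache mutex poisoned").server_prepares
    }
}
```

Preparing the same query text twice returns the same id and triggers only one simulated prepare, which is the bookkeeping a driver-provided caching layer would take over.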
- Resolve DNS hostnames via tokio::net::lookup_host instead of addr.parse::<SocketAddr>(), which silently fails for domain names
- Add PoolSize::PerHost(1) to limit connections per node (cqlsh is single-user; also mitigates connection explosion through proxy)
- Remove unused detect_proxy() function and proxy_address() getter
- Trim module docs to match actual always-install strategy
Refs #164
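The first bullet can be illustrated without any async runtime. The commit uses `tokio::net::lookup_host`; the sketch below swaps in the blocking `std::net::ToSocketAddrs` so it stays dependency-free. `resolve_contact_point` is a hypothetical helper name, not a function from the PR.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolve a `host:port` contact point to a socket address.
/// Literal IPs are parsed directly; anything else goes through the
/// system resolver, which `parse::<SocketAddr>()` alone never does.
fn resolve_contact_point(addr: &str) -> io::Result<SocketAddr> {
    // `parse` only understands literal IP:port pairs and returns Err
    // for hostnames such as "example.com:9042".
    if let Ok(sock) = addr.parse::<SocketAddr>() {
        return Ok(sock);
    }
    // Fall back to name resolution and take the first result.
    addr.to_socket_addrs()?.next().ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::NotFound,
            format!("no addresses found for {addr}"),
        )
    })
}
```

Taking the first resolved address is a simplification made for this sketch; which address the PR picks when a name resolves to several is not stated here.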
Summary
Add ProxyAddressTranslator that redirects all peer connections to the original contact point, enabling connections through proxies/load balancers (AWS NLB, PrivateLink, etc.)
Problem
When connecting through a proxy, the driver discovers internal node IPs from system.peers that are unreachable from the client, causing Connection error: No connections in the pool.
Solution
The scylla-rust-driver's AddressTranslator trait translates peer addresses discovered from system.peers. Since known_node addresses (the contact point) are never translated, we can safely always install a translator that redirects all peer addresses to the contact point.
Equivalent to scylladb/python-driver#833 (DynamicWhiteListRoundRobinPolicy) but using the Rust driver's native AddressTranslator mechanism.
Testing
cargo test --lib proxy_address_translator (5 unit tests)
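The routing rule described in the Solution can be modeled in a few lines of plain Rust. This is a toy model of the behavior as the description states it, not the driver's implementation. `AddressSource`, `connect_target`, and `distinct_endpoints` are hypothetical names, and the addresses used with it are illustrative only.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical tag for where an address came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AddressSource {
    /// Supplied by the user as a contact point (never translated).
    ContactPoint,
    /// Discovered from system.peers (always translated).
    Peer,
}

/// Address the client would actually dial under the always-install strategy.
fn connect_target(
    source: AddressSource,
    discovered: SocketAddr,
    proxy: SocketAddr,
) -> SocketAddr {
    match source {
        AddressSource::ContactPoint => discovered,
        AddressSource::Peer => proxy,
    }
}

/// Number of distinct endpoints dialed for a set of discovered peers.
fn distinct_endpoints(peers: &[SocketAddr], proxy: SocketAddr) -> usize {
    peers
        .iter()
        .map(|peer| connect_target(AddressSource::Peer, *peer, proxy))
        .collect::<HashSet<_>>()
        .len()
}
```

Any non-empty list of peers collapses to a single endpoint, which is why the pools the driver believes are per-node all terminate at the proxy.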