Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.14] Cherry picks #554

Merged
merged 16 commits into from
Apr 5, 2024
Merged

Conversation

jaredledvina
Copy link
Member

@jaredledvina jaredledvina commented Apr 1, 2024

This PR cherry-picks the following commits into our fork of v1.14.9 from upstream:

YutaroHayakawa and others added 7 commits April 1, 2024 16:52
ENI CNI uses the 7th bit (0x80/0x80) of the mark for PBR. Cilium also
uses 7th bit to carry the upper most bit of the security identity in
some cases (e.g. Pod to Pod communication on the same node). ENI CNI
uses CONNMARK iptables target to record the flow came from the external
network and restores it in the reply path.

When ENI "restores" the mark for the new connection (connmark == 0), it
zeros the 7th bit of the mark and changes the identity. This becomes the
problem when Cilium is carrying the identity and the upper most bit is 1.
The upper most bit becomes 1 when the ClusterID of the source endpoint is
128-255 since we are encoding ClusterID into the upper most 8bits of the
identity.

There are two possible iptables rules which causes this error.

1. The rule in the CILIUM_PRE_mangle chain (managed by us, introduced in cilium#12770)

```
-A CILIUM_PRE_mangle -i lxc+ -m comment --comment "cilium: primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
```

2. The rule in the PREROUTING chain of nat table (managed by AWS)

```
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
```

There are three possible setup affected by this issue

1. Cilium is running with ENI mode after uninstalling AWS VPC CNI (e.g. create EKS cluster and install Cilium after that)
2. Cilium is running with AWS VPC CNI with chaining mode
3. Cilium is running with ENI mode without AWS VPC CNI from the beginning (e.g. self-hosted k8s cluster on EC2 hosts)

In setup 3, we can fix the issue by only modifying rule 1. This is what this
commit focuses on. It can be resolved by adding the matching rule that don't
restore the connmark when we are carrying the mark. We can check if the
MARK_MAGIC_IDENTITY is set.

```
-A CILIUM_PRE_mangle -i lxc+ -m comment --comment "cilium: primary ENI" -m mark ! --mark 0x0F00/0x0F00 -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
```

Corresponding IP rule to lookup the main routing table based on mark value of 0x80 has also been updated to account for this exclusion.

Co-developed-by: Hemanth Malla <[email protected]>
Signed-off-by: Hemanth Malla <[email protected]>
Signed-off-by: Yutaro Hayakawa <[email protected]>
Suggested-by: Eric Mountain <[email protected]>
Currently we're not capturing response status and duration for some of
the calls made to Azure. This commit calls ObserveAPICall() with status
and duration for the operations missing them.

Signed-off-by: Hemanth Malla <[email protected]>
…ions

This commit introduces three new configuration flags for the Cilium agent, allowing users to choose the bpf event types they want to expose to Cilium monitor and Hubble.
 - `--bpf-events-drop-enabled`                                   Expose 'drop' events for Cilium monitor and/or Hubble (default true)
 - `--bpf-events-policy-verdict-enabled`                         Expose 'policy verdict' events for Cilium monitor and/or Hubble (default true)
 - `--bpf-events-trace-enabled`                                  Expose 'trace' events for Cilium monitor and/or Hubble (default true)

The default values for these flags remain set to `true`, not changing the current behaviour.

In our case, we found particularly useful to disable the TraceNotification in order to reduce the CPU overhead on some of our nodes when Hubble is enabled as we were mostly interested into dropped packets.

Signed-off-by: Maxime Visonneau <[email protected]>
Move the cilium-envoy image reference to an ARG so that it can be overridden
via build-args. This allows overrides using mirrored images for instance and
is useful in build environments without Internet access as well as avoiding
API throttling on public registries.

Signed-off-by: Eric Mountain <[email protected]>
Provides debug symbols and ensures `release` and `debug` binaries are from the same build - so symbol files always match.

- If the build is invoked with `NOSTRIP=0`, then the `release` image has stripped binaries and the `debug` image has stripped binaries + debug symbol files in `/usr/lib/debug`.
- If the build is invoked with `NOSTRIP=1`, then the `release` image has non-stripped binaries and the `debug` image has non-stripped binaries + debug symbol files in `/usr/lib/debug`.

The latter is somewhat redundant in the `debug` image since the debug symbols are present both in the binaires and in `/usr/lib/debug`. I'm hoping this will be acceptable upstream.

Part of the idea is that we should no longer need to touch `NOSTRIP`. In our current builds, the problem is that both our `release` and `debug` builds are lacking symbols by default. To get builds with symbols, we need to go and hack `.gitlab-ci.yml` and flip `NOSTRIP`, branch, tag and push. We can’t use our default `debug` images because they have no symbols, so delve doesn’t understand anything about the binary it’s running.

With this change, all tagged builds deliver a `release` image that is either stripped or not according to `NOSTRIP` (so stripped with our default `.gitlab-ci.yml`), and a `debug` image that includes the symbol files.
So the `debug` image works: delve understands the binary, you can do remote debugging from your IDE with port-forward etc.

Also, by systematically delivering the debug symbol files in the `debug` image in `/usr/lib/debug`, it makes it easy to split them out for post-processing. If, for instance, I didn’t include the debug files in the non-stripped version of the `debug` image, then it would be harder to identify the files that should be post-processed.

So I think this strikes a good balance in terms of making it easy to know where to find debug symbols and something I hope upstream should be OK with (it improves usability of the `debug` image, since currently if `NOSTRIP=0`, then the `debug` image has no symbols and it’s useless with Delve and Co).
Dockerfile changes split out of previousl change 'Setup Gitlab CI' (#516)
Signed-off-by: Jared Ledvina <[email protected]>
Copy link
Member

@EricMountain EricMountain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mustn't squash this one on merge! :)

squeed added 4 commits April 5, 2024 13:50
This adds a new field, EnableDefaultDeny, to CiliumNetworkPolicy and
CiliumClusterwideNetworkPolicy, that controls whether or not the subject
endpoints of this policy should drop unselected peer traffic.

By default, endpoints are in a default-allow mode. When the first policy
applies to an endpoint, it flips it to a default-deny mode. This option
allows disabling that behavior. If not specified, the existing behavior
remains: for each direction, traffic is allowed unless at least one
policy rule applies to that endpoint. If multiple policies select an
endpoint, then default-deny takes precedence.

It is useful in heterogeneous environments, it may not be desirable to
implicitly drop all non-matching traffic. Consider, for example, the
case where an administrator wishes to ensure a monitoring service can
access all namespaces. If they create a cluster-wide policy allowing
access from the monitoring service, it may create a deny policy where
none was previously; unexpectedly dropping traffic.

See: https://github.com/cilium/design-cfps/blob/main/cilium/CFP-30572-non-default-deny-policies.md

This commit contains only the API changes; a subsequent commit will
introduce the implementation.

Signed-off-by: Casey Callendrello <[email protected]>
Methods called by Sanitize() may alter the underlying structures. For
example. selectors are aggregated and address families are upper-cased.
However, top-level fields can't be written by Sanitize. In the future,
we'd like to do that. So, give it a pointer receiver.

Signed-off-by: Casey Callendrello <[email protected]>
This adjust the policy rule generation to take in to account
non-default-deny policies.

As before, an endpoint is normally in a policy-disabled state. If any
policies select this endpoint, then policy is enabled and all
non-allowed traffic is dropped.

If, however, an endpoint is only selected by default-allow policies,
then policy is enabled, but a special wildcard allow policy is inserted.
Since wildcard polcies have very low precedence, this ensures that any
Deny or L7-proxy rules will still take effect.

This commit also fixes tests that incorrectly failed to sanitize rules
before adding to the policy repository, leading to a nil pointer
exception. Production code *always* sanitizes rules before adding.

Signed-off-by: Casey Callendrello <[email protected]>
Add some tests that create various mixtures of default-allow and
default-deny policies. It is important that default-deny policies always
take precedence over default-allow. It is also important that Deny rules
take precedence over default-allow.

These need to be integration tests, since they rely on specific
interactions between the userspace and bpf policy engines.

Signed-off-by: Casey Callendrello <[email protected]>
@jaredledvina jaredledvina merged commit 8bec70c into v1.14-dd Apr 5, 2024
22 of 42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants