
perf(config): replace issuer_dn() -> String with issuer_contains(&[u8]) -> bool #154

Open
pbeza wants to merge 1 commit into Phala-Network:master from pbeza:perf/issuer-contains

Conversation

@pbeza
Contributor

@pbeza pbeza commented Apr 23, 2026

Summary

Replace ParsedCert::issuer_dn() -> Result<String> on the pluggable-backend trait with issuer_contains(&[u8]) -> bool. The two in-tree callers (intel::pck_ca_with, collateral::extract_fmspc_and_ca_with) only substring-test the DN for the ASCII literals "Processor" / "Platform", so rendering a full RFC 4514 string is wasteful — and on the audited X509CertBackend it drags in x509-cert's Display for AttributeTypeAndValue, which transitively reaches const_oid::db::DB.shortest_name_by_oid(...) and keeps the RFC OID-name database (~1959 lines of static ObjectIdentifier entries in const-oid-0.9.6/src/db/gen.rs) alive on callers that only want a yes/no substring check.

The new impl iterates the parsed RdnSequence directly and does a byte-substring match on each ATV whose tag is one of the RFC 5280 DirectoryString / IA5String byte-identical string choices (PrintableString, UTF8String, IA5String, TeletexString). Never materializes a String; never touches const_oid::db.
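
The tag-filtered scan described above can be sketched without any X.509 crate. This is a minimal model, not the PR's code: `StringTag`, `Atv`, and `contains_bytes` are hypothetical stand-ins for the parsed `RdnSequence` entries, and treating an empty needle as "no match" is a choice made here (note that `slice::windows` panics when asked for windows of size 0, so some guard is needed).

```rust
/// Hypothetical stand-in for the ASN.1 tags an issuer attribute value may carry.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum StringTag {
    Printable,
    Utf8,
    Ia5,
    Teletex,
    Bmp,       // UTF-16BE, skipped by the scan
    Universal, // UTF-32BE, skipped by the scan
    Other,     // non-string values, skipped by the scan
}

/// Hypothetical flattened view of one AttributeTypeAndValue: tag plus raw value bytes.
pub struct Atv<'a> {
    pub tag: StringTag,
    pub value: &'a [u8],
}

/// True when `haystack` contains `needle` as a contiguous byte run.
/// The emptiness check comes first because `windows(0)` would panic.
pub fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Scan only the byte-identical string tags; never allocates a `String`.
pub fn issuer_contains(issuer: &[Atv<'_>], needle: &[u8]) -> bool {
    issuer.iter().any(|atv| {
        matches!(
            atv.tag,
            StringTag::Printable | StringTag::Utf8 | StringTag::Ia5 | StringTag::Teletex
        ) && contains_bytes(atv.value, needle)
    })
}
```

With a `Utf8` entry holding a sample CN such as `Intel SGX PCK Processor CA`, `issuer_contains(&issuer, b"Processor")` is true, while the same text stored under `Bmp` is never inspected.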

Treatment of BMPString / UniversalString

RFC 5280 DirectoryString also permits BMPString (UTF-16BE) and UniversalString (UTF-32BE); these are intentionally skipped. Two reasons:

  1. Semantic fidelity — the previous issuer_dn().to_string().contains(needle) form did not match ASCII needles against those encodings either. x509-cert's Display for AttributeTypeAndValue falls through to CN=#xxxx... hex encoding for non-byte-string ATVs, and an ASCII "Processor" never appears inside hex digits. So the new impl is strictly equivalent to the old for ASCII needles across all tag types.
  2. Production-path irrelevance — Intel PCK issuer CNs are UTF8String, so the skipped wide-char tags are not reachable on the production classifier path.
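
The first point can be checked with plain byte arithmetic. The sketch below is illustrative only: `utf16be` produces the byte layout a BMPString value would have, and `hex_fallback` is a simplified stand-in for a `#`-prefixed hex rendering (it encodes the value bytes only, which is an assumption of this example, not a description of x509-cert's exact output).

```rust
/// Encode text as UTF-16BE, the byte layout of a BMPString value.
pub fn utf16be(s: &str) -> Vec<u8> {
    s.encode_utf16().flat_map(|unit| unit.to_be_bytes()).collect()
}

/// Simplified stand-in for a `#`-prefixed lowercase hex rendering of a value.
pub fn hex_fallback(bytes: &[u8]) -> String {
    let mut out = String::from("#");
    for b in bytes {
        out.push_str(&format!("{:02x}", b));
    }
    out
}

/// Byte-substring test; the emptiness guard avoids the `windows(0)` panic.
pub fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Does an ASCII needle match the wide encoding, either as raw bytes or in hex text?
pub fn ascii_needle_matches_wide(text: &str, needle: &str) -> (bool, bool) {
    let wide = utf16be(text);
    (
        contains_bytes(&wide, needle.as_bytes()),
        hex_fallback(&wide).contains(needle),
    )
}
```

Every ASCII character becomes a `0x00` byte followed by the character byte in UTF-16BE, so the contiguous ASCII run never appears; and a hex rendering contains only `#`, digits, and `a` to `f`, so a needle such as `Processor` cannot occur in it either.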

Trait docs explicitly call this out and explain why, to answer @copilot's feedback on the earlier revision.

Trait surface

issuer_contains returns plain bool, not Result<bool> — parsing has already happened in X509Codec::from_der, so the accessor is infallible and the Result wrapper was advertising a failure mode no impl could trigger. Callers lose the .context(...)? chain; the PCK-CA classifiers collapse from 15 to 10 lines each.
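
To illustrate how an infallible accessor shortens the call site, here is a hedged sketch of a classifier in the `if / else if` shape. `IssuerProbe`, `classify_issuer`, and `FixedIssuer` are hypothetical names invented for this example; the real `pck_ca_with` signature and return type are not shown in this PR description.

```rust
/// Hypothetical, trimmed-down version of the trait accessor.
pub trait IssuerProbe {
    fn issuer_contains(&self, needle: &[u8]) -> bool;
}

/// Hypothetical classifier: the first matching needle wins.
/// No `?` or error-context plumbing is needed because the accessor cannot fail.
pub fn classify_issuer(cert: &impl IssuerProbe) -> Result<&'static str, &'static str> {
    if cert.issuer_contains(b"Processor") {
        Ok("processor")
    } else if cert.issuer_contains(b"Platform") {
        Ok("platform")
    } else {
        Err("unrecognized PCK issuer")
    }
}

/// Test double that stores the issuer CN as raw bytes.
pub struct FixedIssuer(pub &'static [u8]);

impl IssuerProbe for FixedIssuer {
    fn issuer_contains(&self, needle: &[u8]) -> bool {
        // Guard first: `windows(0)` would panic on an empty needle.
        !needle.is_empty() && self.0.windows(needle.len()).any(|w| w == needle)
    }
}
```

The only fallible step left in such a classifier is the "neither needle matched" case, which is a classification outcome rather than a parsing failure.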

As a drive-by in the same backend, X509CertBackend::extension now compares ext.extn_id.as_bytes() == oid against the caller-supplied &[u8] directly, instead of re-parsing the needle via const_oid::ObjectIdentifier::from_bytes. Callers pass the body of a const_oid-constructed OID (see src/oids.rs), so re-validation at runtime was pure overhead. ext.extn_value.as_bytes() also replaces a needless .clone() + .into_bytes() round-trip. The ParsedCert::extension trait doc (src/config.rs) now makes the oid contract explicit — callers supply the DER-encoded OID body (typically via const_oid::ObjectIdentifier::as_bytes()), and a malformed oid falls through to Ok(None) by design (vacuously correct: it cannot match any well-formed cert's extn_id, which was DER-validated during from_der).
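
The lookup contract described above (raw OID-body comparison, `Ok(None)` on no match, an error on duplicates) can be sketched as follows. `Ext` and `find_extension` are hypothetical simplifications; the real backend works on x509-cert's extension type and returns its own error type.

```rust
/// Hypothetical flattened extension entry: DER-encoded OID body plus extnValue bytes.
pub struct Ext<'a> {
    pub oid_body: &'a [u8],
    pub value: &'a [u8],
}

/// Look up an extension by comparing raw OID body bytes.
/// A malformed `oid` simply never equals a validated OID body, so it yields Ok(None).
pub fn find_extension(exts: &[Ext<'_>], oid: &[u8]) -> Result<Option<Vec<u8>>, &'static str> {
    let mut found: Option<&[u8]> = None;
    for ext in exts {
        if ext.oid_body == oid {
            if found.is_some() {
                return Err("extension appears more than once");
            }
            found = Some(ext.value);
        }
    }
    // Copy the value only once, after the whole list has been checked.
    Ok(found.map(|v| v.to_vec()))
}
```

No OID parsing happens on the needle side: the comparison is a slice equality check, and the value is borrowed until the single final copy.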

Breaking change

Breaking for any downstream Config backend. The trait was introduced in v0.4.0, so the blast radius is minimal. Migration:

 impl ParsedCert for MyParsedCert {
-    fn issuer_dn(&self) -> Result<String> {
-        Ok(self.render_rfc4514())
+    fn issuer_contains(&self, needle: &[u8]) -> bool {
+        self.issuer_string_valued_rdn_bytes().any(|b| b.windows(needle.len()).any(|w| w == needle))
     }
 }

Conformance test

tests/config_conformance.rs::assert_config_conforms previously asserted issuer_dn string equality between a custom config and DefaultConfig. Replaced with a parameterized issuer_contains check over four needles covering both truth-table outcomes on the bundled SGX/TDX PCK leaves: Processor, Platform, Intel SGX, NotPresent.
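
A parameterized check of this kind might look like the sketch below. It is not the test from `tests/config_conformance.rs`: `IssuerProbe`, `RawIssuer`, and `first_disagreement` are hypothetical names, and only the four needles are taken from the description above.

```rust
/// Hypothetical, trimmed-down version of the trait accessor.
pub trait IssuerProbe {
    fn issuer_contains(&self, needle: &[u8]) -> bool;
}

/// The four needles named in the PR description.
pub const NEEDLES: [&[u8]; 4] = [b"Processor", b"Platform", b"Intel SGX", b"NotPresent"];

/// Returns the first needle on which the two backends disagree, if any.
pub fn first_disagreement(
    a: &impl IssuerProbe,
    b: &impl IssuerProbe,
) -> Option<&'static [u8]> {
    NEEDLES
        .iter()
        .copied()
        .find(|needle| a.issuer_contains(needle) != b.issuer_contains(needle))
}

/// Test double holding the issuer CN bytes directly.
pub struct RawIssuer(pub Vec<u8>);

impl IssuerProbe for RawIssuer {
    fn issuer_contains(&self, needle: &[u8]) -> bool {
        // Guard first: `windows(0)` would panic on an empty needle.
        !needle.is_empty() && self.0.windows(needle.len()).any(|w| w == needle)
    }
}
```

A conformance harness could assert that `first_disagreement(&custom, &default)` is `None` for each bundled leaf; returning the offending needle keeps the failure message specific.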

Wasm size impact

Measured on the NEAR MPC contract (wasm32-unknown-unknown, lto = "fat", opt-level = "z", codegen-units = 1, panic = "abort"; post-processed with wasm-opt -O -Oz --strip-debug --strip-producers):

| Build | bytes | KiB |
|---|---|---|
| Before | 1,429,634 | 1,396.13 |
| After (this PR head) | 1,426,922 | 1,393.48 |
| delta | −2,712 | −2.65 KiB |

Modest, not dramatic. LLVM with lto=fat + opt-level=z had already inlined Display for RdnSequence / AttributeTypeAndValue::fmt / shortest_name_by_oid into the single call site and DCE'd the unused halves of const_oid::db::DB; ObjectIdentifier::from_bytes + Arcs::try_next are still reachable via the certificate decode path (each cert has multiple OIDs that der::Decode<ObjectIdentifier> parses during from_der), so the extension() drive-by is a runtime/clarity win rather than a size win. What did shrink is the String / Vec<char> / hex::FromHex / iterator-adapter residue that survived the inlining, plus a few small helpers no longer reachable after the clone/into_bytes/filter/map chain became a plain for loop.
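
For readers who want to re-derive the figures in the table, the arithmetic is small enough to sketch. The helper names are hypothetical; the inputs are the two byte counts reported above, and the relative change (roughly 0.19% of the baseline) is derived here rather than stated in the PR.

```rust
/// Signed size change in bytes (negative means the binary shrank).
pub fn delta_bytes(before: u64, after: u64) -> i64 {
    after as i64 - before as i64
}

/// Convert a byte count to KiB (1 KiB = 1024 bytes).
pub fn to_kib(bytes: i64) -> f64 {
    bytes as f64 / 1024.0
}

/// Size change relative to the baseline, in percent.
pub fn percent_change(before: u64, after: u64) -> f64 {
    delta_bytes(before, after) as f64 / before as f64 * 100.0
}
```

Calling `delta_bytes(1_429_634, 1_426_922)` gives `-2712`, and `to_kib(-2712)` is about `-2.65`, matching the delta row.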

Variants evaluated for the duplicate-extension error message

@copilot suggested including the offending OID in the "extension appears more than once" error to aid debugging. Two variants were measured against the current head:

| Error-message variant | stripped wasm (bytes) | Δ vs head |
|---|---|---|
| bail!("extension appears more than once") (current) | 1,426,922 | baseline |
| bail!("extension {} appears more than once", ext.extn_id) (Display, dotted) | 1,427,002 | +80 B |
| bail!("... {} ...", hex::encode(ext.extn_id.as_bytes())) (hex) | 1,427,043 | +121 B |

Twiggy-diff on the dotted variant pins the +80 B inside X509CertParsed::extension itself — extra format_args! setup and argument-packing machinery. <ObjectIdentifier as Display>::fmt is already alive via a fn-pointer table entry (306 B), so the Display body itself doesn't grow — but synthesizing the formatter descriptor at each call site is not free. The hex variant adds footprint inside hex::encode_to_iter on top of that.

Since the duplicate-extension branch is only reachable on a malformed certificate (Intel-signed PCK chains never produce duplicates), the ergonomics weren't worth a measurable regression in a release-contract wasm that exists specifically to fit a NEAR on-chain size budget. The bail! site carries an inline comment documenting the trade-off and both measured variants, so anyone wanting to revisit this has the numbers in hand. Happy to flip to the dotted form if reviewers prefer, but wanted a conscious default rather than silently paying the 80 B.

Test plan

  • cargo test --features std,ring,default-x509 — all pass (including config_conformance::default_config_conforms_to_itself with the four-needle check).
  • cargo clippy --features std,ring,default-x509 -- -D warnings — clean.
  • cargo fmt --check — clean.
  • Measured wasm delta on the NEAR MPC contract baseline (numbers above).
  • Size-regression gate on the review-round amend — stripped wasm byte-identical to pre-amend head (1,426,922 B).

Copilot AI review requested due to automatic review settings April 23, 2026 13:45
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the pluggable X.509 parsed-certificate API to avoid rendering issuer Distinguished Names into RFC 4514 strings when callers only need substring checks, reducing allocation and avoiding pulling in extra display/OID-name machinery.

Changes:

  • Replace ParsedCert::issuer_dn() -> Result<String> with ParsedCert::issuer_contains(&[u8]) -> Result<bool>.
  • Update in-tree issuer classification call sites to use issuer_contains and reuse a single parsed certificate.
  • Update the config conformance test to validate issuer_contains behavior across several needles.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/config.rs | Changes the ParsedCert trait API and updates its documentation. |
| src/x509.rs | Implements issuer_contains for the audited x509-cert/der backend via RDN/ATV traversal. |
| src/intel.rs | Updates PCK CA classification logic to use issuer_contains on a parsed cert. |
| src/collateral.rs | Updates FMSPC/CA extraction logic to use issuer_contains on the reused parsed cert. |
| tests/config_conformance.rs | Replaces issuer DN string equality with parameterized issuer_contains checks. |

Comment thread src/config.rs Outdated
Comment thread src/x509.rs Outdated
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes issuer DN classification and extension lookup in the pluggable X.509 backend by replacing a string-rendering accessor (issuer_dn() -> Result<String>) with a byte-substring predicate (issuer_contains(&[u8]) -> bool), avoiding expensive RFC4514 formatting and related transitive dependencies.

Changes:

  • Replace ParsedCert::issuer_dn() with ParsedCert::issuer_contains(&[u8]) -> bool and update in-tree callers to use byte needles.
  • Implement issuer_contains by iterating the parsed issuer RdnSequence and matching against supported string tag encodings without allocating.
  • Optimize X509CertBackend::extension() to compare raw OID body bytes and avoid unnecessary cloning, and update conformance tests accordingly.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/config.rs | Updates the ParsedCert trait surface and documents the new issuer_contains semantics. |
| src/x509.rs | Implements issuer_contains and streamlines extension() matching and value extraction. |
| src/intel.rs | Switches PCK CA classification to use issuer_contains (no RFC4514 string rendering). |
| src/collateral.rs | Switches CA-type extraction to use issuer_contains on the already-parsed cert. |
| tests/config_conformance.rs | Updates config conformance assertions to compare issuer_contains behavior across several needles. |

Comment thread src/x509.rs
Comment thread src/x509.rs
…[u8]) -> bool` in `ParsedCert`

The sole callers of `issuer_dn()` are `intel::pck_ca_with` and
`collateral::extract_fmspc_and_ca_with`, both of which only substring-test
the DN for the ASCII literals `"Processor"` / `"Platform"`. Returning a full
RFC 4514 rendering is wasteful and, on the audited `X509CertBackend`,
drags in `x509-cert`'s `Display for AttributeTypeAndValue` impl — which
transitively reaches `const_oid::db::DB.shortest_name_by_oid(...)`, keeping
the RFC OID-name database (~1959 lines of static `ObjectIdentifier`
entries in `const-oid-0.9.6/src/db/gen.rs`) alive on builds that only want
a yes/no substring check.

The new accessor iterates the parsed `RdnSequence` directly and does a
byte-substring match on each ATV whose tag is one of the RFC 5280
`DirectoryString` / `IA5String` byte-identical string choices
(`PrintableString`, `UTF8String`, `IA5String`, `TeletexString`). Wide-char
`BMPString` (UTF-16BE) and `UniversalString` (UTF-32BE) are intentionally
skipped — an ASCII needle cannot meaningfully match a UTF-16/-32
encoding, which is also the outcome the former
`issuer_dn().to_string().contains(...)` produced for those tags (x509-cert
hex-encodes non-byte-string ATVs via `CN=#xxxx...`). Non-string components
(e.g. `SET OF OID`) are likewise skipped. Never materializes a `String`;
never touches `const_oid::db`.

## Trait surface

`issuer_contains` returns plain `bool`, not `Result<bool>` — parsing has
already happened in `X509Codec::from_der`, so the accessor is infallible
and the `Result` wrapper was advertising a failure mode no impl could
trigger. Callers lose the `.context(...)?` chain; the PCK-CA classifiers
collapse from 15 to 10 lines each.

As a drive-by in the same backend, `X509CertBackend::extension` now
compares `ext.extn_id.as_bytes() == oid` against the caller-supplied
`&[u8]` directly, instead of re-parsing the needle via
`const_oid::ObjectIdentifier::from_bytes`. Callers pass the body of a
`const_oid`-constructed OID (see `src/oids.rs`), so re-validation at
runtime was pure overhead. `ext.extn_value.as_bytes()` also replaces a
needless `.clone()` + `.into_bytes()` round-trip.

## Breaking change

Breaking for any downstream `Config` backend. The trait was introduced in
v0.4.0, so the blast radius is minimal. Migration:

```diff
 impl ParsedCert for MyParsedCert {
-    fn issuer_dn(&self) -> Result<String> {
-        Ok(self.render_rfc4514())
+    fn issuer_contains(&self, needle: &[u8]) -> bool {
+        self.issuer_string_valued_rdn_bytes().any(|b| b.windows(needle.len()).any(|w| w == needle))
     }
 }
```

## Callers updated

- `intel::pck_ca_with` — two consecutive `issuer_contains` calls matching
  the prior `if / else if` branching. `Config::X509::from_der` result is
  bound once and reused.
- `collateral::extract_fmspc_and_ca_with` — same pattern, reusing the
  already-parsed cert that also drives the FMSPC extension lookup.

## Conformance test

`tests/config_conformance.rs::assert_config_conforms` previously asserted
`issuer_dn` string equality between a custom config and `DefaultConfig`.
Replaced with a parameterized `issuer_contains` check over four needles
covering both truth-table outcomes on the bundled SGX/TDX PCK leaves:
`Processor`, `Platform`, `Intel SGX`, `NotPresent`.

## Wasm size impact

Measured on the [NEAR MPC contract](https://github.com/near/mpc/tree/main/crates/contract)
(`wasm32-unknown-unknown`, `lto = "fat"`, `opt-level = "z"`,
`codegen-units = 1`, `panic = "abort"`; post-processed with
`wasm-opt -O -Oz --strip-debug --strip-producers`):

| Build | bytes | KiB |
|---|---|---|
| Before | 1,429,634 | 1,396.13 |
| After | 1,426,922 | 1,393.48 |
| **delta** | **−2,712** | **−2.65 KiB** |

Modest, not dramatic. LLVM with `lto=fat + opt-level=z` had already
inlined `Display for RdnSequence` / `AttributeTypeAndValue::fmt` /
`shortest_name_by_oid` into the single call site and DCE'd the unused
halves of `const_oid::db::DB`; `ObjectIdentifier::from_bytes` +
`Arcs::try_next` are still reachable via the certificate decode path
(each cert has multiple OIDs that `der::Decode<ObjectIdentifier>` parses
during `from_der`), so the `extension()` drive-by is a runtime/clarity
win rather than a size win. What did shrink is the `String` / `Vec<char>`
/ `hex::FromHex` / iterator-adapter residue that survived the inlining,
plus a few small helpers no longer reachable after the
clone/into_bytes/filter/map chain became a plain for loop.
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the pluggable X.509 backend interface to avoid allocating/rendering full issuer DN strings when callers only need substring checks, and streamlines extension lookup by comparing raw OID body bytes.

Changes:

  • Replace ParsedCert::issuer_dn() -> Result<String> with ParsedCert::issuer_contains(&[u8]) -> bool, and update in-tree callers to use it.
  • Implement issuer_contains in the audited X509CertBackend by scanning the parsed RdnSequence without materializing a String.
  • Optimize X509CertBackend::extension to compare OID body bytes directly and avoid unnecessary cloning/parsing; update conformance tests accordingly.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| src/config.rs | Updates the ParsedCert trait surface and documents the new issuer_contains and extension contracts. |
| src/x509.rs | Implements issuer_contains and streamlines extension lookup to avoid extra parsing/allocations. |
| src/intel.rs | Switches PCK CA classification to use issuer_contains instead of rendering the issuer DN string. |
| src/collateral.rs | Switches CA-type extraction to use issuer_contains instead of rendering the issuer DN string. |
| tests/config_conformance.rs | Updates the conformance harness to compare issuer_contains behavior across several needles. |

@pbeza
Contributor Author

pbeza commented Apr 23, 2026

@kvinwang — this is another attempt to slim down our contract. Long term, it would be ideal to have a minimized compilation profile for dcap-qvl, since it’ll likely be used in other contracts as well.

@kvinwang
Collaborator

kvinwang commented Apr 24, 2026

First off, thanks for the thorough write-up — the const_oid::db removal and the measured wasm numbers are a real win, and the care you took documenting the tag-handling semantics is appreciated.

I'd like to raise a design-level observation on the trait shape itself. The concern is implementer burden and contract clarity. issuer_contains(&self, needle: &[u8]) -> bool is a method whose signature doesn't convey what a correct implementation must do. To know the contract, an implementer has to read the doc comment and internalize answers to questions like:

  • Which RDN components are searched?
  • Which ASN.1 string tags count as "byte-identical" and which are skipped?
  • What encoding is the needle expected to be?
  • What's the behavior for an empty needle?
  • How should this match the Display fallback for non-string ATVs on the default backend?

You've answered all of these carefully — but the length and subtlety of the doc comment needed to pin the contract down is, I think, the real signal. Any future backend author has to faithfully replicate those prose constraints across encodings they may never have thought about, and conformance testing has to enumerate needles to reverse-engineer equivalence (which is exactly what the four-needle check in this PR does).

Proposal for a follow-up: replace issuer_contains with a method whose contract is fully expressed by its signature:

pub enum PckCa { Processor, Platform }

pub trait ParsedCert {
    fn pck_ca(&self) -> Option<PckCa>;
    fn extension(&self, oid: &[u8]) -> Result<Option<Vec<u8>>>;
    // ...
}

What this buys:

  1. The signature is the contract. "Return the issuing CA, or None if unrecognizable." No prose needed to pin down edge cases.
  2. Implementation freedom. Backends classify however they like — RDN walk, precomputed hash, hardcoded lookup, whatever fits their footprint goals — without having to match a specified byte-matching algorithm.
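
As one illustration of the "implementation freedom" point, a backend could satisfy the proposed signature with a plain substring check on bytes it already holds. This is a sketch under assumptions: the trait is trimmed to the single proposed method, and `CnBackedCert` with its `issuer_cn` field is a hypothetical backend, not code from this PR or the follow-up.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PckCa {
    Processor,
    Platform,
}

/// Trimmed-down trait: only the proposed classifier method is shown.
pub trait ParsedCert {
    fn pck_ca(&self) -> Option<PckCa>;
}

/// Hypothetical backend that keeps the issuer CN bytes it extracted while parsing.
pub struct CnBackedCert {
    pub issuer_cn: Vec<u8>,
}

/// Byte-substring test; the emptiness guard avoids the `windows(0)` panic.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

impl ParsedCert for CnBackedCert {
    fn pck_ca(&self) -> Option<PckCa> {
        if contains_bytes(&self.issuer_cn, b"Processor") {
            Some(PckCa::Processor)
        } else if contains_bytes(&self.issuer_cn, b"Platform") {
            Some(PckCa::Platform)
        } else {
            None
        }
    }
}
```

Another backend could return the same `Option<PckCa>` from a precomputed lookup instead; callers and conformance tests would only compare the enum value, not the matching algorithm.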

@pbeza
Contributor Author

pbeza commented Apr 27, 2026

@kvinwang, thank you for the review! I just opened a follow-up PR on top of this one: #158. It was pretty heavily vibe-coded and I only gave it a quick self-review. Don’t have more time to dig into it right now, but if you’ve got some cycles, would appreciate a look. Thanks!
