Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record can't parse (at least some) unknown types #204

Closed
str4d opened this issue Jul 27, 2024 · 6 comments
Closed

Record can't parse (at least some) unknown types #204

str4d opened this issue Jul 27, 2024 · 6 comments

Comments

@str4d
Copy link
Contributor

str4d commented Jul 27, 2024

For an AtpAgent authenticated to my account,

dbg!(agent
    .api
    .com
    .atproto
    .identity
    .get_recommended_did_credentials()
    .await);

returns:

[src/lib.rs:44:19] agent.api.com.atproto.identity.get_recommended_did_credentials().await = Err(
    SerdeJson(
        Error("data did not match any variant of untagged enum Record", line: 1, column: 323),
    ),
)

The lexicon in question is:

{
  "lexicon": 1,
  "id": "com.atproto.identity.getRecommendedDidCredentials",
  "defs": {
    "main": {
      "type": "query",
      "description": "Describe the credentials that should be included in the DID doc of an account that is migrating to this service.",
      "output": {
        "encoding": "application/json",
        "schema": {
          "type": "object",
          "properties": {
            "rotationKeys": {
              "description": "Recommended rotation keys for PLC dids. Should be undefined (or ignored) for did:webs.",
              "type": "array",
              "items": { "type": "string" }
            },
            "alsoKnownAs": {
              "type": "array",
              "items": { "type": "string" }
            },
            "verificationMethods": { "type": "unknown" },
            "services": { "type": "unknown" }
          }
        }
      }
    }
  }
}

In both cases, the unknown type ends up being a map.

Record looks like it is intended to support parsing data with "type": "unknown", but that parsing is failing:

#[derive(serde::Serialize, serde::Deserialize, Debug, Clone, PartialEq, Eq)]
#[serde(untagged)]
pub enum Record {
Known(KnownRecord),
Unknown(crate::types::UnknownData),
}

/// The data of variants represented by a map and include a `$type` field indicating the variant type.
#[derive(serde::Serialize, serde::Deserialize, Debug, Clone, PartialEq, Eq)]
pub struct UnknownData {
#[serde(rename = "$type")]
pub r#type: String,
#[serde(flatten)]
pub data: Ipld,
}

If I make the following change to bypass Record and UnknownData, going straight to Ipld:

diff --git a/atrium-api/src/com/atproto/identity/get_recommended_did_credentials.rs b/atrium-api/src/com/atproto/identity/get_recommended_did_credentials.rs
index dbbef5a..c611dfc 100644
--- a/atrium-api/src/com/atproto/identity/get_recommended_did_credentials.rs
+++ b/atrium-api/src/com/atproto/identity/get_recommended_did_credentials.rs
@@ -10,9 +10,9 @@ pub struct OutputData {
     #[serde(skip_serializing_if = "Option::is_none")]
     pub rotation_keys: Option<Vec<String>>,
     #[serde(skip_serializing_if = "Option::is_none")]
-    pub services: Option<crate::records::Record>,
+    pub services: Option<ipld_core::ipld::Ipld>,
     #[serde(skip_serializing_if = "Option::is_none")]
-    pub verification_methods: Option<crate::records::Record>,
+    pub verification_methods: Option<ipld_core::ipld::Ipld>,
 }
 pub type Output = crate::types::Object<OutputData>;
 #[derive(serde::Serialize, serde::Deserialize, Debug, Clone, PartialEq, Eq)]

then parsing works, and the dbg! instead returns:

[src/lib.rs:44:19] agent.api.com.atproto.identity.get_recommended_did_credentials().await = Ok(
    Object {
        data: OutputData {
            also_known_as: Some(
                [
                    "at://str4d.xyz",
                ],
            ),
            rotation_keys: Some(
                [
                    "did:key:...",
                ],
            ),
            services: Some(
                Map({
                    "atproto_pds": Map({
                        "endpoint": String("..."),
                        "type": String("AtprotoPersonalDataServer"),
                    }),
                }),
            ),
            verification_methods: Some(
                Map({
                    "atproto": String("did:key:..."),
                }),
            ),
        },
        extra_data: Map({}),
    },
)
@str4d
Copy link
Contributor Author

str4d commented Jul 27, 2024

This alternative change also works:

diff --git a/atrium-api/src/types.rs b/atrium-api/src/types.rs
index 73d1d2b..4c04387 100644
--- a/atrium-api/src/types.rs
+++ b/atrium-api/src/types.rs
@@ -131,8 +131,8 @@ pub enum Union<T> {
 /// The data of variants represented by a map and include a `$type` field indicating the variant type.
 #[derive(serde::Serialize, serde::Deserialize, Debug, Clone, PartialEq, Eq)]
 pub struct UnknownData {
-    #[serde(rename = "$type")]
-    pub r#type: String,
+    // #[serde(rename = "$type")]
+    // pub r#type: String,
     #[serde(flatten)]
     pub data: Ipld,
 }

So the problem is that, for the returned data for these particular Lexicon records, there is no $type field, which is causing UnknownData to not match.

@str4d
Copy link
Contributor Author

str4d commented Jul 27, 2024

Aha, I think I see the problem: UnknownData is also used for the unknown data in a union, which is required to have a $type field:

A union schema definition with no refs is allowed and similar to unknown, as long as the closed flag is false (the default). An empty refs list with closed set to true is an invalid schema.

The schema definitions pointed to by a union are generally objects or types with a clear mapping to an object, like a record. All the variants must be represented by a CBOR map (or JSON Object) and include a $type field indicating the variant type.

By contrast, unknown has no such requirement:

Indicates than any data could appear at this location, with no specific validation. Note that the data must still be valid under the data model: it can't contain unsupported things like floats.

No type-specific fields.

So I think the fix is that here in codegen:

fn unknown_type(unknown: &LexUnknown, name: Option<&str>) -> Result<(TokenStream, TokenStream)> {
let description = description(&unknown.description);
if name == Some("didDoc") {
Ok((description, quote!(crate::did_doc::DidDocument)))
} else {
Ok((description, quote!(crate::records::Record)))
}
}

we need to use Ipld as the fallback for an unknown field type, instead of Record (which was picked in a5b7ab8). More specifically, we need to use a type that wraps Ipld and enforces the ATProto data model.

str4d added a commit to str4d/atrium that referenced this issue Jul 27, 2024
str4d added a commit to str4d/atrium that referenced this issue Jul 27, 2024
str4d added a commit to str4d/atrium that referenced this issue Jul 27, 2024
@sugyan
Copy link
Owner

sugyan commented Jul 28, 2024

Thank you for reporting the issue!

In the early stages of implementation, I wanted to treat only the parts that would be considered to have a record as Record anyway, and then I wanted to map didDoc easily, so I made unknown a messy codegen like it is now. So I thought I'd fix it if it caused any problems... But I was only using the app.bsky related parts myself and wasn't aware of the problem.

If possible, I would like to be able to handle the type Unknown correctly without switching by field name.
Let's also wait for responses about guidance in the official repository.

@str4d
Copy link
Contributor Author

str4d commented Jul 28, 2024

If possible, I would like to be able to handle the type Unknown correctly without switching by field name.

The sense I get from their current docs and issue comments is that for unknown fields, what they want is for applications to do no up-front parsing, and instead parse the outer Lexicon that contains the unknown fields, figure out the context from it, and then parse the unknown fields with that context in mind. (I'm poking around their TypeScript SDK to try and confirm this.)

That doesn't really work well with the current atrium-api / atrium-codegen approach; I think what we'd need to support that is:

  • A new Unknown type that wraps Ipld and enforces the ATProto data model (like I've added in Introduce atrium_api::types::Unknown for unknown fields #205).
  • One or more new traits that enable users to easily try to parse an unknown field as a particular kind of record (similar to impl TryFrom<Unknown> for Record but with better code UX than a bunch of .try_into()s scattered all over the place).

The "try to parse Unknown as a given schema" trait would also help with open unions in which we encounter third-party Lexicons: atrium-api wouldn't know what to do, but a downstream crate user could provide an implementation of the trait that converts the union's data into the type they know it to be.

@sugyan
Copy link
Owner

sugyan commented Aug 1, 2024

I tried to implement Unknown as an enum with some variants. Then added some tests.

Basically, an object (with or without $type) or null is assumed, and other primitive values can be accepted as DataModel as Other, considering the fact that the specification has not been finalized yet.

What do you think of this?

main...feature/unknown#diff-06f2b557198de0565c671d779f3873d644f26d182f530bff3ef539caae0bd824R152-R156

@sugyan
Copy link
Owner

sugyan commented Aug 12, 2024

I've merged #209, which inherited your #205 fix, to be able to handle Unknown types.
There are still some issues that need to wait for ipld-core to be updated, but for now, parse failures should be resolved and it should work now.
Please submit an issue if you are still having problems.

@sugyan sugyan closed this as completed Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants