Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

localization support: problem message parser #819

Closed
unional opened this issue Jul 6, 2023 · 21 comments
Closed

localization support: problem message parser #819

unional opened this issue Jul 6, 2023 · 21 comments

Comments

@unional
Copy link

unional commented Jul 6, 2023

Request a feature

The maintainers of ArkType will do our best to provide prompt feedback for any feature requests- especially those that are concise and persuasive!

🤷 Motivation

What problem are you having?

The Problem class only expose the formatted message and not the other properties,
meaning the message cannot be localized into other languages.

Why should we prioritize solving it?

Any application supporting i18n probably need this support.

💡 Solution

How do you think we should solve the problem?

There are a few ways to do it.

One way is to simplify the Problem class containing only the data, while there is a overridable problem parser that parse the problem into message.

There are other ways such as accepting a problem factory, exposing the problem data so that application can create their own message parser using the resulting problem instances, etc.

I think the parser approach is probably the best as it properly separate the concerns between problem reporter and message parsing.

@ssalbdivad
Copy link
Member

Thanks, this is definitely useful to keep in mind!

The existing Problems class is an array subclass that does have a .summary prop, but every part of the error message is actually customizable. You can also iterate over each problem individually for more fine-grained information.

Note that this API is likely to change in the upcoming release, but we will aim to support the same level of customization.

Here are some examples from our unit tests. mustBe is used to describe the primary condition, then there's a couple other customization options (this is the best to use because it integrates with union errors to form valid messages automatically). writeReason is one level up and adds the rest of the error, then addContext is the most general and writes the whole message including a path.

Here are a couple simple examples from unit tests:

    it("type config", () => {
        const t = type("true", { mustBe: "unfalse" })
        attest(t(false).problems?.summary).snap("Must be unfalse (was false)")
    })
    it("anonymous type config at path", () => {
        const unfalse = type("true", { mustBe: "unfalse" })
        const t = type({ myKey: unfalse })
        attest(t({ myKey: "500" }).problems?.summary).snap(
            "myKey must be unfalse (was '500')"
        )
        // config only applies within myKey
        attest(t({ yourKey: "500" }).problems?.summary).snap(
            "myKey must be defined"
        )
    })
    it("customized builtin problem", () => {
        const types = scope(
            { isEven: "number%2" },
            {
                codes: {
                    divisor: {
                        mustBe: (divisor) => `a multiple of ${divisor}`,
                        writeReason: (mustBe, was) => `${was} is not ${mustBe}!`
                    }
                }
            }
        ).compile()
        attest(types.isEven(3).problems?.summary).snap(
            "3 is not a multiple of 2!"
        )
    })

Here's our internal default config for each problem type so you can see how the different message parts are composed:

const defaultProblemConfig: {
    [code in ProblemCode]: ProblemDefinition<code>
} = {
    divisor: {
        mustBe: (divisor) =>
            divisor === 1 ? `an integer` : `a multiple of ${divisor}`
    },
    class: {
        mustBe: (expected) => {
            const possibleObjectKind = getExactConstructorObjectKind(expected)
            return possibleObjectKind
                ? objectKindDescriptions[possibleObjectKind]
                : `an instance of ${expected.name}`
        },
        writeReason: (mustBe, data) =>
            writeDefaultReason(mustBe, data.className)
    },
    domain: {
        mustBe: (domain) => domainDescriptions[domain],
        writeReason: (mustBe, data) => writeDefaultReason(mustBe, data.domain)
    },
    missing: {
        mustBe: () => "defined",
        writeReason: (mustBe) => writeDefaultReason(mustBe, "")
    },
    extraneous: {
        mustBe: () => "removed",
        writeReason: (mustBe) => writeDefaultReason(mustBe, "")
    },
    bound: {
        mustBe: (bound) =>
            `${Scanner.comparatorDescriptions[bound.comparator]} ${
                bound.limit
            }${bound.units ? ` ${bound.units}` : ""}`,
        writeReason: (mustBe, data) =>
            writeDefaultReason(mustBe, `${data.size}`)
    },
    regex: {
        mustBe: (expression) => `a string matching ${expression}`
    },
    value: {
        mustBe: stringify
    },
    branches: {
        mustBe: (branchProblems) =>
            describeBranches(
                branchProblems.map(
                    (problem) =>
                        `${problem.path} must be ${
                            problem.parts
                                ? describeBranches(
                                      problem.parts.map((part) => part.mustBe)
                                  )
                                : problem.mustBe
                        }`
                )
            ),
        writeReason: (mustBe, data) => `${mustBe} (was ${data})`,
        addContext: (reason, path) =>
            path.length ? `At ${path}, ${reason}` : reason
    },
    multi: {
        mustBe: (problems) => "• " + problems.map((_) => _.mustBe).join("\n• "),
        writeReason: (mustBe, data) => `${data} must be...\n${mustBe}`,
        addContext: (reason, path) =>
            path.length ? `At ${path}, ${reason}` : reason
    },
    custom: {
        mustBe: (mustBe) => mustBe
    },
    cases: {
        mustBe: (cases) => describeBranches(cases)
    }
}

Again, this will likely change soon, so if you do have feedback it would be super useful to get now!

@unional
Copy link
Author

unional commented Jul 6, 2023

👍

One concern about i18n support is that the tooling typically behaves in certain way that does not support element composition, similar to tailwindcss.

e.g., formatjs does it the same way as tailwindcss that it parse the source code to extract the messages (in tailwindcs the class name is not composable, you can't do b-${var}, for example).

i.e. it looks at the AST for:

intl.formatMessage({
  defaultMessage: "some message for {name}"
}, {
  name: 'field name'
})

So changing individual element like mustBe would not work for those tools.

@unional
Copy link
Author

unional commented Jul 6, 2023

Also another thing about i18n is that element composition generally does not work, even without the parsing AST issue mentioned above.

The reason is that different language have different grammatical order (I can't remember the exact term for this, as I'm not a linguist 😛).

So it would be the best to only keep the data and let the message parser to compose the message however it wants.

@ssalbdivad
Copy link
Member

Hmm okay, I'll have go dig into this a bit since I haven't worked with these tools.

Currently the input type for those options is:

export type MustBeWriter<code extends ProblemCode> =
    | string
    | ((source: ProblemSources[code]) => string)

export type ReasonWriter<code extends ProblemCode = ProblemCode> = (
    mustBe: string,
    data: DataWrapper<
        code extends keyof ConstrainedRuleTraversalData
            ? ConstrainedRuleTraversalData[code]
            : unknown
    >
) => string

export type ContextWriter = (reason: string, path: Path) => string

If ReasonWriter and ContextWriter also accepted string, could that be used to configure it the way you hope? It is a bit of a footgun for those who define a mustBe then don't use it in addContext or only define an addContext, which would mean that if the error occurred as part of a union it would still have its original unconfigured mustBe message.

@ssalbdivad
Copy link
Member

ssalbdivad commented Jul 6, 2023

So it would be the best to only keep the data and let the message parser to compose the message however it wants.

To be clear, the data is still there. If you iterate over the array of problems, each one will have these props:

  • code: The kind of problem, e.g. "range"
  • path: Path at which the problem occurred
  • data: The exact value associated with the problem
  • source: More detailed information about the problem based on its code

Could you just use this problems object to create the messages you need?

The reason these other options exist is that writing a good error message for a union is very complex, so we want to give people the option to avoid it by relying on our solutions (or integrating with them). It's unfortunate that wouldn't be compatible with some of the tools you mentioned, but if you're just wanting the raw data describing the nature of the problem, that is already there, so you can do whatever you want with it.

@unional
Copy link
Author

unional commented Jul 6, 2023

I think the data and source are private atm: https://github.com/arktypeio/arktype/blob/main/src/traverse/problems.ts#L42-L43

Yes, with that, it will support the exposing the problem data so that application can create their own message parser using the resulting problem instances scenario.

i.e. something like:

export function Component() {
  return <SomeField validate={(v, fields) => {
     const { problems } = someFieldValidator(v)
     if (problems.length > 0) {
       return i18nProblems({ name: 'someField', fields }, problems)
     }
  }}/>
}


function i18nProblems(field: FieldInfo, problems: Problem[]) {
  return problems.map(p => i18nProblem(field, p))
}

function i18nProblem(field: FieldInfo, problem: Problem) {
  switch (problem.code) {
    case 'required':
      return intl.formatMessage(
        { defaultMessage: `{name) is required` },
        field
      )
    case 'min':
      return intl.formatMessage(
        { defaultMessage: `{name) must be more than {min}` },
        { name: field.name, min: problem.data.min }
      )
    ...
  }
}

@ssalbdivad
Copy link
Member

@unional Whoops, my bad! Not so useful if you can't access them😅

I'll be sure those are exposed for the next release, and the rest of this gives me good context to consider for some other details of this problem. I'll keep this issue open until then to ensure I follow through with that!

@unional
Copy link
Author

unional commented Jul 6, 2023

btw, on a separate note, I'm working on some custom parser for my application,
and I am considering this format:

const name = textParser({
  min: [3, (field, value) => format(...)],
})

@ssalbdivad
Copy link
Member

@unional Could you link it to me if it's open source or otherwise provide a bit of additional context in the example, e.g. what the input/output types are?

That would help me use it as a reference for the kind of solution that would work well for you.

@ssalbdivad ssalbdivad moved this from To do to In progress in arktypeio Jul 6, 2023
@Dimava
Copy link
Contributor

Dimava commented Jul 10, 2023

const username = type(
  [ 'string >= 3', 'throws', ({err, data, path, validatorContext, outerContext: { lang }}) => ({ en: 'No!', fr: 'Noes!', ru: 'Нет!' })[lang] ]
)
name(1, { lang: 'en', keysDefault: 'strict', algorithm: 'clone' }) // { sucess: false, error: whatever('No!') }

maybe?

@unional
Copy link
Author

unional commented Jul 10, 2023

Hi, sorry missed that message @ssalbdivad.

Could you link it to me if it's open source or otherwise provide a bit of additional context in the example, e.g. what the input/output types are?

Please don't worry about it. It's not open source and I'm just trying to figure out how to do it myself too.

@ssalbdivad ssalbdivad moved this from In progress to Backlog in arktypeio Dec 12, 2023
@TheOrdinaryWow
Copy link

Any update?

@ssalbdivad
Copy link
Member

ssalbdivad commented Jan 26, 2025

My current recommendation would be to map over ArkErrors, each of which has context on the individual error, and transform those to localized messages directly.

I'm interested in further improvements here but I don't have a clear idea of the right API for that yet and would need to do more research on popular i18n libraries.

@TheOrdinaryWow
Copy link

My current recommendation would be to map over ArkErrors, each of which has context on the individual error, and transform those to localized messages directly.

I'm interested in further improvements here but I don't have a clear idea of the right API for that yet and would need to do more research on popular i18n libraries.

My idea is to maintain locale internally like date-fns, which allow users to specify locale in options.

@unional
Copy link
Author

unional commented Jan 26, 2025

My idea is to maintain locale internally like date-fns, which allow users to specify locale in options

That likely wouldn't work, because validation is a logic and thus is dynamic (i.e. depends on context).

Localization in data-fns can be internalized because it describes a value. When translated, all words related to the value stay together and do not vary "in common form".

I put common form in qoutes because there are variations, which such approach wouldn't support.

For example, you can describe number with different characters in Chinese:

1 (one) can be written as 一, or 壹.

@TheOrdinaryWow
Copy link

TheOrdinaryWow commented Jan 26, 2025

My idea is to maintain locale internally like date-fns, which allow users to specify locale in options

That likely wouldn't work, because validation is a logic and thus is dynamic (i.e. depends on context).

Localization in data-fns can be internalized because it describes a value. When translated, all words related to the value stay together and do not vary "in common form".

I put common form in qoutes because there are variations, which such approach wouldn't support.

For example, you can describe number with different characters in Chinese:

1 (one) can be written as 一, or 壹.

What about using a combination of template strings and localized specific error types? eg.

en: must be a ${type} (was ${type})
zh: 需为 ${type} (得到 ${type})

Or to manipulate it via a function like

(name: string, errors: ArkErrors) => string

@TheOrdinaryWow
Copy link

TheOrdinaryWow commented Jan 26, 2025

My current recommendation would be to map over ArkErrors, each of which has context on the individual error, and transform those to localized messages directly.

I'm interested in further improvements here but I don't have a clear idea of the right API for that yet and would need to do more research on popular i18n libraries.

The problem now is that in zod, we can specify each type’s error message easily like

z.object({
  email: z
    .string({
      required_error: t("tips.field_required"),
      invalid_type_error: t("invalid_value"),
    })
    .email(t("msgs.invalid_email"))
    .min(1, t("msgs.incorrect_length"))
    .max(64, t("msgs.incorrect_length"))
})

But in ArkType, I did now see an easy approach, managing error messages is quite complicated.

For me that’s a major reason that stopped me from using ArkType. I respect it provides both "text" and "fluent" method to manage type, meanwhile the shortage of managing "option" or what say "detail" of each type is a bit confusing. From this perspective I think zod did a better job.

In other words, this might not be a "shortage" of ArkType, it’s that when comparing between ArkType and zod, they actually gives different design.

When it comes to zod, it’s actually acceptable to either specify error messages in type’s option or manage i18n globally (I’m a bit annoyed by how I need to set every localized error messages in EVERY type, even though they are almost the same so I’m desperately wanting for a new alternative).

But in ArkType, your type-specifying design only allow you to manage a global localization thing, since every type, if specified in "text" method, as far as I see there’s no concept of "options". So in this situation, the best solution I can imagine is to figure out a way to manage i18n like date-dns does, the only change is to configure it via global option.

@unional
Copy link
Author

unional commented Jan 26, 2025

Similar situation here. This is the main reason we can't use arktype.

What you show is one way to achieve that. The bottom line is who has control of the content, and who has control of the architecture.

The library (arktype) accepting a value or callback for the localization controls the architecture and gives control of the content to the consumer.

If it returns a validation result with all information needed for the consumer to process, it gives both control of the architecture and content to the consumer.

IMO the latter is the better approach.
The localization of error message is formatting, i.e. transforming data (error state) to some presentable form (localized string).

The presentation doesn't need to be only strings. What if the consumer wants to have different CSS styles for different errors?

This is essentially related to SRP. Don't take additional responsibility when not necessary.
It is beneficial for arktype to not get into the matter of formatting. It is one less thing to maintain, and it is more flexible.

@ssalbdivad
Copy link
Member

@unional If I'm understanding correctly, this is already available.

You can introspect data related to the errors in the final ArkErrors result without needing to customize anything via the builtin error configs and build the localized messages however you want.

@unional
Copy link
Author

unional commented Jan 26, 2025

You can introspect data related to the errors in the final ArkErrors result without needing to customize anything via the builtin error configs and build the localized messages however you want.

Hi there @ssalbdivad 😊

So seems like the needed data is exposed now? Haven't come back to look into arktype since our last conversation. If it is, feel free to close this issue. 🍻

@ssalbdivad
Copy link
Member

@unional I actually think that data was always exposed, although ArkError does need some documentation: #1273

I will close this for now in favor of this discussion for additional feedback on how we can best support i18n internally.

@github-project-automation github-project-automation bot moved this from Backlog to Done (merged or closed) in arktypeio Jan 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done (merged or closed)
Development

No branches or pull requests

4 participants