Define normalization of full-width numeric characters in <input type=number> user input #11616
…number> user input

This patch adds guidance that user agents should accept and normalize full-width characters commonly produced by CJK input methods in <input type="number"> fields. This includes:

- Full-width digits (U+FF10–U+FF19)
- Full-width hyphen-minus (U+FF0D)
- Prolonged sound mark (U+30FC)
- Unicode minus sign (U+2212)
- Full-width full stop (U+FF0E)

These characters must be normalized to their ASCII equivalents before applying floating-point parsing rules.

This normalization applies only to user input (keyboard or IME), not to script-assigned values via the `.value` IDL attribute, which must remain ASCII-only.

See issue: whatwg#11395
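The behavior described above could look roughly like the following sketch. It is not the patch's normative text: the ASCII targets in the table are assumptions (the patch does not yet spell them out), and `normalizeNumberUserInput`, `isAcceptableScriptValue`, and `ASCII_FLOAT` are hypothetical names chosen for illustration.

```javascript
// Sketch only: the ASCII targets below are assumptions, since the patch text
// does not yet state what each character maps to.
const SIGN_AND_SEPARATOR_MAP = new Map([
  ["\uFF0D", "-"], // full-width hyphen-minus
  ["\u30FC", "-"], // prolonged sound mark, typed by IMEs where a minus is meant
  ["\u2212", "-"], // Unicode minus sign
  ["\uFF0E", "."], // full-width full stop
]);

// Hypothetical helper for text entered by the user (keyboard or IME).
function normalizeNumberUserInput(text) {
  let result = "";
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint >= 0xff10 && codePoint <= 0xff19) {
      // Full-width digits U+FF10..U+FF19 map onto ASCII "0".."9".
      result += String.fromCharCode(codePoint - 0xff10 + 0x30);
    } else {
      // Map the listed signs and separators; leave everything else untouched.
      result += SIGN_AND_SEPARATOR_MAP.get(ch) ?? ch;
    }
  }
  return result;
}

// Approximation of an ASCII-only floating-point check for script-assigned
// values, which are NOT normalized in this sketch.
const ASCII_FLOAT = /^-?(?:\d+(?:\.\d+)?|\.\d+)(?:[eE][-+]?\d+)?$/;

function isAcceptableScriptValue(value) {
  return ASCII_FLOAT.test(value);
}
```

With these assumptions, `normalizeNumberUserInput("－１２．５")` returns `"-12.5"`, while `isAcceptableScriptValue("１２")` is `false` because script-assigned values stay ASCII-only.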
<li>the minus sign (U+2212), and</li>
<li>full-width full stop (U+FF0E).</li>
</ul>
These characters must be interpreted according to their Unicode numeric meaning and normalized to their ASCII equivalents
This is a good start, but I think we want to spell out what they end up mapping to.
We can partially explain it with NFKC I think, but some substitutions might be needed as well, for instance for U+30FC.
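To illustrate why NFKC only partially covers the list, the sketch below checks which characters NFKC leaves unmapped. The ASCII targets and the names `NFKC_CANDIDATES` and `charactersNotCoveredByNfkc` are assumptions made for this example; only `String.prototype.normalize` is a standard API.

```javascript
// Each pair is [input character, ASCII target assumed for this sketch].
const NFKC_CANDIDATES = [
  ["\uFF11", "1"], // full-width digit one
  ["\uFF0D", "-"], // full-width hyphen-minus
  ["\uFF0E", "."], // full-width full stop
  ["\u30FC", "-"], // prolonged sound mark
  ["\u2212", "-"], // Unicode minus sign
];

// Returns the code points whose NFKC form differs from the assumed target.
function charactersNotCoveredByNfkc(pairs) {
  return pairs
    .filter(([input, target]) => input.normalize("NFKC") !== target)
    .map(([input]) =>
      "U+" + input.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
    );
}
```

Calling `charactersNotCoveredByNfkc(NFKC_CANDIDATES)` yields `["U+30FC", "U+2212"]`: neither character has a compatibility decomposition, so both would need explicit substitutions.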
@annevk Thanks, I agree — spelling out the exact mappings makes sense, and NFKC alone won’t cover everything.
I’ll be giving a talk at a conference in Japan next Sunday, where I expect to gather input from Japanese developers who actively face these issues.
I’d like to incorporate their feedback before updating the PR, so I plan to follow up in about 2–3 weeks.
Partial NFKC explanation probably makes the situation less clear than giving an explicit mapping for this handful of characters.
(I think it generally makes sense to do a mapping like this.)
@whatwg/i18n is there anything in Unicode we could borrow for this that's better suited than normalization? It's not directly web-exposed, but it still seems unfortunate to have mappings of characters maintained outside of Unicode.
The numeric property does a better job for digits than normalization. The other symbols that can appear in number formats (decimal separators, grouping separators, plus/minus signs, etc.) can be found in CLDR data (noting that the meaning of the symbols depends on the locale). To @hsivonen's point, there are just a handful of wide/narrow equivalents. CLDR doesn't list these, and it would be better to make that list than to introduce NFKC (even though NFKC for the characters in question is identical). I'd probably push on the CLDR folks to address this, so that it percolates downstream into ICU, Intl/JS, etc. (and not just HTML), but I don't believe that there is a ready-made mapping.
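As a rough illustration of the two data sources mentioned here, the sketch below uses what JavaScript exposes today. JavaScript offers no direct access to a character's numeric value, so the `\p{Nd}` property escape (General_Category = Decimal_Number) stands in as a proxy; the locale symbols come from `Intl.NumberFormat`, which is backed by CLDR data in common engines. The function names are hypothetical.

```javascript
// \p{Nd} matches code points whose General_Category is Decimal_Number.
// This is a proxy for the numeric property, not the numeric value itself.
function isDecimalDigit(ch) {
  return /^\p{Nd}$/u.test(ch);
}

// Reads the decimal separator that the platform's locale data reports.
function decimalSeparatorFor(locale) {
  const part = new Intl.NumberFormat(locale)
    .formatToParts(1.5)
    .find((p) => p.type === "decimal");
  return part ? part.value : undefined;
}
```

Here `isDecimalDigit("１")` is `true`, while `isDecimalDigit("ー")` and `isDecimalDigit("−")` are `false`, which matches the point that digits and the other symbols need different handling. `decimalSeparatorFor("en-US")` returns `"."`; other locales may report a different symbol, depending on the locale data shipped with the engine.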
PS> I added this to I18N's agenda for 2025-09-25
Summary

This patch adds guidance that user agents should accept and normalize full-width characters commonly produced by CJK input methods in <input type="number"> fields. This includes:

- Full-width digits (U+FF10–U+FF19)
- Full-width hyphen-minus (U+FF0D)
- Prolonged sound mark (U+30FC)
- Unicode minus sign (U+2212)
- Full-width full stop (U+FF0E)

These characters must be normalized to their ASCII equivalents before applying floating-point parsing rules.

This normalization applies only to user input (keyboard or IME), not to script-assigned values via the `.value` IDL attribute, which must remain ASCII-only.

Motivation

… `Intl.NumberFormat` …

Non-goals

… (`input.value = ...`).

Related issue

whatwg#11395

Checklist

(See WHATWG Working Mode: Changes for more details.)