Formally define the goals of Codex #5

Open
emilyyyylime opened this issue Nov 17, 2024 · 20 comments
Labels
meta Discussion about the structure of this repo

Comments

@emilyyyylime
Collaborator

I think we should have some agreed-upon document (even just as part of the README) directly stating which characters are or are not in scope for inclusion in this project, and how simple a given character should be to access.

This would also include some guiding principles for assigning names to characters, such as when abbreviations are okay and when they aren't, when a character can be accessible through multiple distinct names (and whether one of them should be considered "canonical"), and whether we strive to describe the usage of characters, their origin/formal meaning, or their visual appearance (or rather, when we do which).

In my opinion, this will greatly help prioritise new additions to the repo and help reviewers decide which changes to approve.

(side note: do we have a preference for how to stylise the name; e.g. codex, Codex, or CodeX?)

@emilyyyylime emilyyyylime added the meta Discussion about the structure of this repo label Nov 17, 2024
@dccsillag
Collaborator

dccsillag commented Nov 17, 2024

Strong agree. Maybe we can pull some points from the recent Discord conversations on this?

(Re. name styling: all sound good to me.)

@MDLC01
Collaborator

MDLC01 commented Nov 17, 2024

Regarding the scope, I would say any Unicode character that is not part of a natural writing system may be considered for inclusion. But this may be too broad, and it is also immediately contradicted by most Greek and Hebrew letters being already included (and rightfully so).

> (Re. name styling: all sound good to me.)

Please not CoDeX! I would say Codex (or CODEX in a capitalized context) is fine.

@emilyyyylime
Collaborator Author

Are there any specific Unicode (or other) categorisations we could use to help refine this definition? I think it should firstly focus on the needs of Typst users (⇒ symbols used in academic writing and other typesetting contexts that are often not easily accessible in the keyboard layouts of users wishing to type them), and expand from there with more specific examples.

@mkorje
Collaborator

mkorje commented Nov 18, 2024

I'm of the opinion that Codex's scope should be fairly broad: to assign names to most Unicode characters. Whilst I agree that we should focus on the needs of Typst users first, I wouldn't want this to discourage adding names for symbols that fall outside of this. For example, I think #3 is a fine addition. Die face symbols aren't a part of any natural writing system (afaik...), and the name is clear-cut.

Regardless, I think any scope we set is destined to end up being too broad/restrictive. So I share @MDLC01's view that anything not part of a natural writing system may be considered, and things part of a natural writing system may be allowed if their inclusion makes sense. I'd suggest that the criterion for this inclusion is something along the lines of substantial usage and accessibility in users' keyboard layouts. (And this would rightly justify the inclusion of most Greek and Hebrew characters, I believe.)

With regards to naming guidelines, I think writing out the existing implicit "rules" would be a great start. For example, .rev and .not being the standard modifiers for the reversed variant of a character and the variant of a character with a forward slash through it.
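To make the modifier convention concrete, here is a minimal sketch of how a dotted name such as `in.rev.not` could resolve to a character. The table layout and the `resolve` helper are hypothetical illustrations, not Codex's actual data format or implementation, and the sketch assumes modifier order does not matter.

```python
# Hypothetical miniature symbol table: (base name, set of modifiers) -> character.
# Only the four "in" variants are listed; this is not Codex's real storage format.
SYMBOLS = {
    ("in", frozenset()): "\u2208",                # ∈ ELEMENT OF
    ("in", frozenset({"rev"})): "\u220B",         # ∋ CONTAINS AS MEMBER
    ("in", frozenset({"not"})): "\u2209",         # ∉ NOT AN ELEMENT OF
    ("in", frozenset({"rev", "not"})): "\u220C",  # ∌ DOES NOT CONTAIN AS MEMBER
}


def resolve(name: str) -> str:
    """Resolve a dotted name like 'in.rev.not' to its character.

    Modifiers are treated as an unordered set (an assumption of this sketch),
    so 'in.rev.not' and 'in.not.rev' give the same result.
    Raises KeyError for unknown names.
    """
    base, *modifiers = name.split(".")
    return SYMBOLS[(base, frozenset(modifiers))]
```

For example, `resolve("in.rev")` looks up the reversed variant and `resolve("in.not")` the slashed one; an unknown combination raises `KeyError`.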

@MDLC01
Collaborator

MDLC01 commented Nov 18, 2024

> With regards to naming guidelines, I think writing out the existing implicit "rules" would be a great start. For example, .rev and .not being the standard modifiers for the reversed variant of a character and the variant of a character with a forward slash through it.

I don't have any issue with something that would present itself as a sort of cheat-sheet, or as a general guideline. However, laying out precise normative rules is not a good idea, because most implicit rules have legitimate counter-examples.

@emilyyyylime
Collaborator Author

+1 to @MDLC01

It seems that there's consensus that any non-deprecated Unicode character is theoretically in scope for Codex. Is coverage of all Unicode characters (that fit our criteria) a goal? That is, should we aim to map every character that could be useful to a convenient name?

One other question to consider that came to my mind is: should there be more namespaces than sym and emoji? I remember @laurmaedje has shown interest in keeping mathematical symbols in their own module (though of course that would be a very major breaking change and impractical to simply implement). A few new namespaces have already been suggested in The Symbols Document, but they've for the most part been contained under sym (e.g. see #2).

@MDLC01
Collaborator

MDLC01 commented Nov 19, 2024

> Is coverage of all Unicode characters (that fit our criteria) a goal?

I wouldn't say so. Some characters don't make sense to include. For example, control characters, or characters that are meant to be used as part of greater clusters.
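As a rough illustration of how such exclusions could be checked mechanically, the sketch below filters on Unicode general categories using Python's standard `unicodedata` module. The `is_candidate` helper and the exact list of excluded categories are assumptions made for this example, not an agreed project rule.

```python
import unicodedata

# Assumed exclusion list (not a Codex rule): control (Cc), format (Cf),
# surrogate (Cs), private use (Co), unassigned (Cn), and the combining
# mark categories (Mn, Mc, Me), which are meant to attach to other characters.
EXCLUDED_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn", "Mn", "Mc", "Me"}


def is_candidate(char: str) -> bool:
    """Return True if `char` passes this rough general-category filter."""
    return unicodedata.category(char) not in EXCLUDED_CATEGORIES
```

Under these assumptions, ∈ (U+2208) and the die face ⚀ (U+2680) pass, while the BEL control character (U+0007) and the combining acute accent (U+0301) do not. A filter like this would only narrow the field; whether a character "could be useful" remains a human judgment.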

> should we aim to map every character that could be useful to a convenient name?

I may agree with this more. Specifically, the "that could be useful" part is important.

@emilyyyylime
Collaborator Author

Alright, "could be useful" is part of our criteria for including characters then.

What I was trying to get at with "full-coverage" is, should we keep looking for new characters that fit our criteria throughout the Unicode planes, and then call the project "complete" until a new version of Unicode is released? Or perhaps we could take more of an "on-demand" approach to adding new characters, only adding characters when the need for them presents itself (which could still include someone finding a character and figuring we should include it).

@MDLC01
Collaborator

MDLC01 commented Nov 19, 2024

I would say a long-term goal is to eventually consider every character. Short term, it of course makes sense to start with blocks that contain more useful characters, and take people's suggestions into account. Also, deciding not to add a character at some point should not prevent us from adding it later based on demand.

If I understand correctly, this means: short term, take an "on-demand" approach; long term, take the "full coverage" approach.

@emilyyyylime
Collaborator Author

Alright. Would anyone like to begin work on writing down the goals? Possibly we could draft it in a new file in the Proposals document and, once everyone is happy with it, create a PR to add it to the README or a specific guidelines.md file.

@MDLC01
Collaborator

MDLC01 commented Nov 19, 2024

This should probably be in a separate document. I can create a new document in the Codex team on the webapp (formerly Symbols team, I just renamed it) if you want.

@MDLC01
Collaborator

MDLC01 commented Nov 19, 2024

Alright, I created a document anyone can write to: https://typst.app/project/wfsdJgobtek11i1cZVXqIe. We should also be able to use the webapp's comment feature.

@mkorje
Collaborator

mkorje commented Nov 20, 2024

> > With regards to naming guidelines, I think writing out the existing implicit "rules" would be a great start. For example, .rev and .not being the standard modifiers for the reversed variant of a character and the variant of a character with a forward slash through it.

> I don't have any issue with something that would present itself as a sort of cheat-sheet, or as a general guideline. However, laying out precise normative rules is not a good idea, because most implicit rules have legitimate counter-examples.

Fair point. I agree, then, that a general guideline would be the way to go (where we make clear that these are not normative).

@MDLC01
Collaborator

MDLC01 commented Nov 20, 2024

There is also another question, which is whether Codex should be considered independent from Typst, or whether Typst is the only use case we should have in mind.

For example, we do not define names for mathematical calligraphic letters, because they are already accessible in Typst using other means. This is not compatible with the idea that Codex should be an independent library.

If Codex wants to be independent, some functions that are currently implemented in the Typst codebase should be moved here. Otherwise, we may prevent other use cases.

The fact that it is maintained separately from Typst, and has a unique name, makes me lean toward Codex being usable outside of Typst. But in the end this is probably the Typst team's decision.

@emilyyyylime
Collaborator Author

Yeah, I'd love to hear input from one of them.

@laurmaedje
Member

> If Codex wants to be independent, some functions that are currently implemented in the Typst codebase should be moved here. Otherwise, we may prevent other use cases.

How would that look for calligraphic etc.?

@dccsillag
Collaborator

dccsillag commented Dec 8, 2024

I strongly think that Codex should not be independent from Typst. It's a cute idea, but it would probably cause a lot of problems down the road. By keeping it "tied" to Typst, we have clear end-users in mind, which should help us make decisions, especially the more subjective ones.

That said, I think we should strive to make it easily usable from outside the Typst codebase. But I believe that having a clear notion of our end-users is essential.

P.S.: also, calligraphic letters are another font, right? Feels like there are things that a calligraphic font enables that just the Unicode symbols don't, but I'm not sure.

@MDLC01
Collaborator

MDLC01 commented Dec 9, 2024

> P.S.: also, calligraphic letters are another font, right? Feels like there are things that a calligraphic font enables that just the Unicode symbols don't, but I'm not sure.

For use in maths, Unicode defines a set of calligraphic counterparts of Latin letters: https://unicode.org/charts/PDF/U1D400.pdf.
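As a small illustration of what that block provides, the sketch below looks up the code point for a script capital letter by its Unicode character name, using Python's standard `unicodedata` module. Unicode names these letters "script" rather than "calligraphic", and a few of them (such as ℬ) were encoded earlier in the Letterlike Symbols block, so the lookup falls back to the unprefixed name. The `script_capital` helper is a hypothetical name for this example and is not part of Codex or Typst.

```python
import unicodedata


def script_capital(letter: str) -> str:
    """Return the Unicode script capital for a Latin letter A-Z.

    Tries the Mathematical Alphanumeric Symbols name first, then falls back
    to the Letterlike Symbols name for letters encoded there instead.
    """
    name = f"SCRIPT CAPITAL {letter.upper()}"
    try:
        return unicodedata.lookup(f"MATHEMATICAL {name}")
    except KeyError:
        # e.g. "SCRIPT CAPITAL B" (U+212C) lives in Letterlike Symbols.
        return unicodedata.lookup(name)
```

For instance, `script_capital("A")` yields U+1D49C, while `script_capital("B")` yields U+212C through the fallback.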

@MDLC01
Collaborator

MDLC01 commented Dec 9, 2024

> How would that look for calligraphic etc.?

I'm not sure. I also realize that I was probably too assertive in my message. It seems reasonable to consider moving some functions here, but I no longer believe we should do it.

@laurmaedje
Member

Okay. I think I'm in agreement with @dccsillag that the main focus should be on Typst, at least for now. We can always expand things here later, especially should there be interest in usage outside of Typst.
