Migrate to ICU4X #436
Draft
conor-93 wants to merge 71 commits into linebender:main from conor-93:icu4x
+3,936
−1,097
Changes from 58 commits
Commits
586f39f
side-by-side analysis with Swash (bidi levels, boundaries, scripts) +…
conor-93 f03116d
- resolve Mandatory boundaries
conor-93 4054508
- boundary analysis clean-up / condense logic / todos for optimisations
conor-93 0ec33d4
- avoid consuming and re-creating iterator over word boundary data
conor-93 4ab3adc
- remove `previous_substring_end`, made redundant by `building_range_…
conor-93 ca6b03a
avoid consuming iterator when getting first/last char lens
conor-93 8de6f53
avoid unnecessarily consuming iterators for script/line break data
conor-93 b2a0fc2
.
conor-93 9895c38
.
conor-93 326fd09
- dont reallocate string for fast path
conor-93 2d9a397
.
conor-93 8ce1898
avoid allocating vecs for boundaries/bidi levels
conor-93 663f79f
.
conor-93 0971ca3
.
conor-93 d3303ae
establish an iterator for `contiguous_word_break_substrings` instead …
conor-93 fb20378
just store index, not char too
conor-93 26a8591
.
conor-93 2581a69
.
conor-93 ac31353
.
conor-93 f3a53a4
address TODOs
conor-93 4406abd
add Swash-equivalent Cluster types to Parley, WIP select_font reimple…
conor-93 00609fb
icu-backed select_font equivalent impl
conor-93 36ed996
select_font working, minus force_normalize
conor-93 0621f5d
force_normalize pseudocode/groundwork
conor-93 fb24f87
- frontload/simplify analysis info access
conor-93 1ea1c3a
fix crash on empty style ranges
conor-93 09bac72
use icu for everything except script
conor-93 c77d830
use icu for script/locale/language
conor-93 77c2a36
- simplify and fix bidi level retrieval + add tests
conor-93 1269dcc
optimise is_emoji_grapheme
conor-93 4865e52
- remove 'UserData' concept, name as style_index within Parley, make …
conor-93 269d15d
.
conor-93 b6ef959
frontload analysis of remaining flags, clean-up
conor-93 ee98e60
.
conor-93 9305809
.
conor-93 678bc14
.
conor-93 5120ba7
avoid unnecessary cluster vec allocation in shape_item
conor-93 9343970
.
conor-93 cac1bfd
CharInfo comments
conor-93 364e37b
.
conor-93 7f62eec
move analysis + tests into analysis module
conor-93 c28f356
.
conor-93 c4d79ef
.
conor-93 99866c5
populate analysis mod
conor-93 8d41e9e
migrate from swash Whitespace
conor-93 a01e3a2
migrate from swash Boundary
conor-93 3f984d0
migrate from swash WordBreakStrength
conor-93 1168372
linting
conor-93 cf307f4
swash_convert -> icu_convert w/ fixes
conor-93 119a30b
test conversion (intermediate)
conor-93 43f69c0
.
conor-93 046d7e5
remove redundant/excessive assertions
conor-93 5197277
- baked data source build configuration
conor-93 7dc55ab
remove bidi.rs and remaining swash info stores
conor-93 31515ec
.
conor-93 0718714
.
conor-93 395b52f
Merge remote-tracking branch 'origin/main' into icu4x
conor-93 62578aa
- remove print statements
conor-93 d59c170
- use modular icu dependencies
conor-93 04bf451
move deps to workspace
conor-93 858b9c4
move Setting into Parley, remove swash dep entirely from the main crate
conor-93 e9e107c
.
conor-93 e7612cd
.
conor-93 ba430a3
stop compiling all locale data, just use en for now
conor-93 166a2a8
use short name in place of hard-coded fontique script tag
conor-93 f505e7c
generic loop -> while let
conor-93 3c52a47
.
conor-93 26a710b
.
conor-93 4637d82
icu_locale_core::LanguageIdentifier
conor-93 d2ee5ed
.
conor-93 50ecbc8
.
conor-93
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| //! Defines baked ICU4X Unicode data providers. | ||
| //! | ||
| //! This narrows data compiled from all Unicode data sets, to only that which we use. | ||
| | ||
| use icu_provider_export::baked_exporter::*; | ||
| use icu_provider_export::prelude::*; | ||
| use std::path::PathBuf; | ||
| | ||
| fn main() { | ||
| println!("cargo:rerun-if-changed=build.rs"); | ||
| | ||
| let mod_directory = PathBuf::from(std::env::var_os("OUT_DIR").unwrap()) | ||
| .join("baked_data"); | ||
| | ||
| let source = icu_provider_source::SourceDataProvider::new(); | ||
| | ||
| ExportDriver::new( | ||
| [DataLocaleFamily::FULL], | ||
| DeduplicationStrategy::Maximal.into(), | ||
| LocaleFallbacker::new_without_data(), | ||
| ) | ||
| .with_markers([ | ||
| // Properties - Map data | ||
| icu::properties::provider::PropertyEnumScriptV1::INFO, | ||
| icu::properties::provider::PropertyEnumGeneralCategoryV1::INFO, | ||
| icu::properties::provider::PropertyEnumBidiClassV1::INFO, | ||
| icu::properties::provider::PropertyEnumLineBreakV1::INFO, | ||
| icu::properties::provider::PropertyEnumGraphemeClusterBreakV1::INFO, | ||
| | ||
| // Properties - Set data | ||
| icu::properties::provider::PropertyBinaryVariationSelectorV1::INFO, | ||
| icu::properties::provider::PropertyBinaryBasicEmojiV1::INFO, | ||
| icu::properties::provider::PropertyBinaryEmojiV1::INFO, | ||
| icu::properties::provider::PropertyBinaryExtendedPictographicV1::INFO, | ||
| icu::properties::provider::PropertyBinaryRegionalIndicatorV1::INFO, | ||
| | ||
| // Segmenters | ||
| icu::segmenter::provider::SegmenterBreakGraphemeClusterV1::INFO, | ||
| icu::segmenter::provider::SegmenterBreakWordOverrideV1::INFO, | ||
| icu::segmenter::provider::SegmenterDictionaryAutoV1::INFO, | ||
| icu::segmenter::provider::SegmenterLstmAutoV1::INFO, | ||
| icu::segmenter::provider::SegmenterBreakWordV1::INFO, | ||
| icu::segmenter::provider::SegmenterBreakLineV1::INFO, | ||
| ]) | ||
| .export( | ||
| &source, | ||
| BakedExporter::new(mod_directory.clone(), { | ||
| let mut options = Options::default(); | ||
| options.overwrite = true; | ||
| options | ||
| }) | ||
| .unwrap(), | ||
| ) | ||
| .expect("Datagen should be successful"); | ||
| } |
Depending on the `icu` crate with default features enabled doesn't seem right to me. That's going to pull in a lot of stuff which we surely aren't using?
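
For illustration only, one way the concern above could be addressed is to depend on the individual component crates with default features switched off, instead of the `icu` meta-crate. This is a hypothetical sketch, not the manifest from this PR: the version numbers and the `datagen` feature name are assumptions that should be checked against the ICU4X documentation, and further features may be needed for some of the segmenter markers.

```toml
# Hypothetical build-dependencies for the build script; not taken from the PR.
# Versions and feature names are assumptions to verify against the ICU4X docs.
[build-dependencies]
# Only the component crates whose data markers the build script lists.
icu_properties = { version = "2", default-features = false, features = ["datagen"] }
icu_segmenter = { version = "2", default-features = false, features = ["datagen"] }
```

With a layout like this, the marker paths in the build script would be written as `icu_properties::provider::…` and `icu_segmenter::provider::…` rather than `icu::properties::provider::…` and `icu::segmenter::provider::…`.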