Commit 9598c37

Mingun authored and dralley committed
Add warning about unsupported encodings
1 parent 60dc37f commit 9598c37

2 files changed, +48 -3 lines changed

Cargo.toml

+47 -2
@@ -53,10 +53,55 @@ async-tokio = ["tokio"]
 ## Currently, only ASCII-compatible encodings are supported, so, for example,
 ## UTF-16 will not work (therefore, `quick-xml` is not [standard compliant]).
 ##
-## List of supported encodings includes all encodings supported by [`encoding_rs`]
-## crate, that satisfied the restriction above.
+## Thus, quick-xml supports all encodings of [`encoding_rs`] except these:
+## - [UTF-16BE]
+## - [UTF-16LE]
+## - [ISO-2022-JP]
+##
+## You should stop processing the document when one of these encodings is detected,
+## because the generated events can be wrong and may not reflect the real document structure!
+##
+## Because these are the only encodings of [`encoding_rs`] that are not ASCII compatible,
+## you can check for that to detect them:
+##
## ```
68+
## use quick_xml::events::Event;
69+
## use quick_xml::reader::Reader;
70+
##
71+
## # fn to_utf16le_with_bom(string: &str) -> Vec<u8> {
72+
## # let mut bytes = Vec::new();
73+
## # bytes.extend_from_slice(&[0xFF, 0xFE]); // UTF-16 LE BOM
74+
## # for ch in string.encode_utf16() {
75+
## # bytes.extend_from_slice(&ch.to_le_bytes());
76+
## # }
77+
## # bytes
78+
## # }
79+
## let xml = to_utf16le_with_bom(r#"<?xml encoding='UTF-16'><element/>"#);
80+
## let mut reader = Reader::from_reader(xml.as_ref());
81+
## reader.trim_text(true);
82+
##
83+
## let mut buf = Vec::new();
84+
## let mut unsupported = false;
85+
## loop {
86+
## if !reader.decoder().encoding().is_ascii_compatible() {
87+
## unsupported = true;
88+
## break;
89+
## }
90+
## buf.clear();
91+
## match reader.read_event_into(&mut buf).unwrap() {
92+
## Event::Eof => break,
93+
## _ => {}
94+
## }
95+
## }
96+
## assert_eq!(unsupported, true);
97+
## ```
98+
## That restriction will be eliminated once issue [#158] is resolved.
5899
##
59100
## [standard compliant]: https://www.w3.org/TR/xml11/#charencoding
101+
## [UTF-16BE]: encoding_rs::UTF_16BE
102+
## [UTF-16LE]: encoding_rs::UTF_16LE
103+
## [ISO-2022-JP]: encoding_rs::ISO_2022_JP
104+
## [#158]: https://github.com/tafia/quick-xml/issues/158
60105
encoding = ["encoding_rs"]
61106

62107
## Enables support for recognizing all [HTML 5 entities] in [`unescape`] and
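
Note on the warning above: the following is a minimal, std-only sketch of why a UTF-16 document breaks a parser that scans for ASCII bytes. It is not code from `quick-xml`; the helpers `to_utf16le` and `contains_bytes` are hypothetical names introduced only for illustration. In UTF-16LE every ASCII character is followed by a `0x00` byte, so a two-byte ASCII delimiter such as `/>` never appears as adjacent bytes.

```rust
/// Encodes `s` as UTF-16LE without a BOM.
/// Hypothetical helper for illustration, not part of quick-xml.
fn to_utf16le(s: &str) -> Vec<u8> {
    s.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}

/// Returns true if `needle` occurs as a contiguous byte sequence in `haystack`.
/// This mimics, very roughly, a byte-oriented search for an ASCII delimiter.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}
```

For example, `contains_bytes("<a/>".as_bytes(), b"/>")` is true for the UTF-8 bytes, while `contains_bytes(&to_utf16le("<a/>"), b"/>")` is false, because the UTF-16LE bytes are `3C 00 61 00 2F 00 3E 00`.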

src/encoding.rs

+1 -1
@@ -28,7 +28,7 @@ pub(crate) const UTF16_BE_BOM: &[u8] = &[0xFE, 0xFF];
 /// key is not defined or contains unknown encoding.
 ///
 /// The library supports any UTF-8 compatible encodings that crate `encoding_rs`
-/// is supported. [*UTF-16 is not supported at the present*][utf16].
+/// is supported. [*UTF-16 and ISO-2022-JP are not supported at the present*][utf16].
 ///
 /// If feature `encoding` is disabled, the decoder is always UTF-8 decoder:
 /// any XML declarations are ignored.
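
Note on the hunk above: its header shows the `UTF16_BE_BOM` constant (`[0xFE, 0xFF]`), and the doc example in `Cargo.toml` uses the UTF-16 LE BOM (`[0xFF, 0xFE]`). A caller that wants to reject UTF-16 input before parsing could sniff the byte order mark itself. The sketch below is std-only and does not reproduce the crate's internal detection; the `Bom` enum and both function names are assumptions made for this example. It does not cover ISO-2022-JP, which has no BOM.

```rust
/// Byte order marks recognised by this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Inspects the first bytes of `bytes` and reports a known BOM, if any.
fn sniff_bom(bytes: &[u8]) -> Option<Bom> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(Bom::Utf8)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some(Bom::Utf16Le)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some(Bom::Utf16Be)
    } else {
        None
    }
}

/// Returns true when the BOM indicates an encoding that is not ASCII compatible.
fn bom_is_unsupported(bom: Option<Bom>) -> bool {
    matches!(bom, Some(Bom::Utf16Le) | Some(Bom::Utf16Be))
}
```

A document without a BOM yields `None` here, so this check complements, rather than replaces, the `is_ascii_compatible()` check shown in the `Cargo.toml` documentation.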
