
Add de::from_str_with_whitespace and de::from_reader_with_whitespace #855

Draft
wants to merge 1 commit into
base: master

@jamwil jamwil commented Apr 19, 2025

My goal was to mimic the pattern from the procedural Reader in the declarative Deserializer. This would involve creating a configuration struct that is owned by Deserializer and mutating it before calling from_str or from_reader, similar to below.

// Test helper: build a Deserializer that borrows from the given string slice.
fn make_de<'de>(source: &'de str) -> Deserializer<'de, SliceReader<'de>> {
    dbg!(source);
    Deserializer::from_str(source)
}

mod config {
    use super::*;
    use pretty_assertions::assert_eq;

    #[test]
    fn preserve_whitespace() {
        let mut de = make_de(r#"<tag>   Some text with extra   whitespace   </tag>"#);
        de.config_mut().trim_text(false);
        assert_eq!(de.next().unwrap(), DeEvent::Start(BytesStart::new("tag")));
        assert_eq!(de.next().unwrap(), DeEvent::Text("   Some text with extra   whitespace   ".into()));
        assert_eq!(de.next().unwrap(), DeEvent::End(BytesEnd::new("tag")));
    }
}

Since from_str and from_reader form the entire public interface for the de module, we need to create two new functions to maintain backward compatibility:

  1. from_str_with_whitespace
  2. from_reader_with_whitespace
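
One way to keep the existing signatures intact is to have the old and new entry points delegate to a shared private helper that takes the configuration. The sketch below is a toy and not quick-xml code: `DeConfig`, `from_str_impl`, and the string-slicing "parser" are assumptions made for illustration, and the real functions deserialize into a `T: Deserialize` rather than returning a `String`.

```rust
/// Hypothetical configuration; the field name mirrors `trim_text` from the test above.
#[derive(Clone, Copy)]
struct DeConfig {
    trim_text: bool,
}

/// Shared private helper: returns the text content of a single `<tag>...</tag>`
/// element. The slicing here is only a stand-in for real XML parsing.
fn from_str_impl(source: &str, config: DeConfig) -> Option<String> {
    let start = source.find('>')? + 1; // first byte after the opening tag
    let end = source.rfind('<')?; // first byte of the closing tag
    if end < start {
        return None; // no separate closing tag, e.g. `<a/>`
    }
    let text = &source[start..end];
    Some(if config.trim_text {
        text.trim().to_owned()
    } else {
        text.to_owned()
    })
}

/// Existing entry point: signature and trimming behavior are unchanged.
fn from_str(source: &str) -> Option<String> {
    from_str_impl(source, DeConfig { trim_text: true })
}

/// New entry point: same input, but whitespace is preserved.
fn from_str_with_whitespace(source: &str) -> Option<String> {
    from_str_impl(source, DeConfig { trim_text: false })
}
```

Existing callers of `from_str` see no change, while new callers opt in to preserved whitespace by name.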

The configuration needs to be interpreted in two places, since trimming the start of the text happens in a separate procedure from trimming the end of the text. When we construct a Deserializer, it creates an XmlReader, which contains either a SliceReader or an IoReader. Both of those readers take a StartTrimmer, which has two jobs for each XML Event:

  1. it converts raw xml Events into PayloadEvents, and
  2. it trims the start of the text.

For each Event, the SliceReader or IoReader trims and processes it, handing a PayloadEvent to XmlReader. The XmlReader takes the PayloadEvent, trims the end before decoding, and hands back DeEvents. At this point, all trimming is complete. The Deserializer then decodes the trimmed DeEvents.
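
The two-stage flow described above can be sketched with toy types. This is a simplified model, not the quick-xml implementation: the three enums only share names with the real `Event`, `PayloadEvent`, and `DeEvent`, the functions `start_trim`, `end_trim`, and `pipeline` are hypothetical, and `str::trim_start`/`str::trim_end` are used as an approximation of XML whitespace trimming.

```rust
/// Toy stand-ins for the three event layers; only the text payload matters here.
#[derive(Debug, PartialEq)]
enum Event<'a> {
    Start(&'a str),
    Text(&'a str),
    End(&'a str),
}

#[derive(Debug, PartialEq)]
enum PayloadEvent<'a> {
    Start(&'a str),
    Text(&'a str),
    End(&'a str),
}

#[derive(Debug, PartialEq)]
enum DeEvent<'a> {
    Start(&'a str),
    Text(&'a str),
    End(&'a str),
}

/// Stage 1 (the role of StartTrimmer inside SliceReader/IoReader):
/// convert Event into PayloadEvent and trim the start of text.
fn start_trim(event: Event<'_>) -> PayloadEvent<'_> {
    match event {
        Event::Start(name) => PayloadEvent::Start(name),
        Event::Text(text) => PayloadEvent::Text(text.trim_start()),
        Event::End(name) => PayloadEvent::End(name),
    }
}

/// Stage 2 (the role of XmlReader): trim the end of text and emit a DeEvent.
fn end_trim(event: PayloadEvent<'_>) -> DeEvent<'_> {
    match event {
        PayloadEvent::Start(name) => DeEvent::Start(name),
        PayloadEvent::Text(text) => DeEvent::Text(text.trim_end()),
        PayloadEvent::End(name) => DeEvent::End(name),
    }
}

/// Run every event through both stages, in order.
fn pipeline<'a>(events: Vec<Event<'a>>) -> Vec<DeEvent<'a>> {
    events.into_iter().map(start_trim).map(end_trim).collect()
}
```

Because the two trims live in different stages, a single on/off switch has to be visible to both of them, which is what motivates the ownership question below.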

XmlReader will own the DeConfig object, since it is the lowest-level object common to both the start-of-text and end-of-text trimming. Recall that XmlReader, which trims the end, owns either an IoReader or a SliceReader, and whichever one it holds trims the start.


jamwil commented Apr 19, 2025

My first instinct was to make the start_trimmer field of IoReader/SliceReader an Option and, when it is None, simply create PayloadEvents without trimming. The problem is that the raw readers are initialized as part of the construction chain, so there is no opportunity to consult DeConfig before they are built. The trim switch therefore has to be applied at read time (when next() is called), not at initialization time.
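
A minimal sketch of the read-time approach, under stated assumptions: the types below are toy stand-ins that only borrow the names `DeConfig`, `SliceReader`, and `XmlReader`, the `next_text` method and its `trim_start` parameter are hypothetical, and text chunks replace real XML events. The point is only that the inner reader is built before the configuration is touched, yet still honors it, because the flag is passed down on every read.

```rust
/// Hypothetical configuration, owned by the outer reader.
struct DeConfig {
    trim_text: bool,
}

impl DeConfig {
    /// Setter shaped like the `config_mut().trim_text(false)` call in the test above.
    fn trim_text(&mut self, trim: bool) -> &mut Self {
        self.trim_text = trim;
        self
    }
}

/// Stand-in for SliceReader/IoReader: constructed before any config is known.
struct SliceReader<'a> {
    chunks: std::vec::IntoIter<&'a str>,
}

impl<'a> SliceReader<'a> {
    /// The trim decision arrives as an argument on every read.
    fn next_text(&mut self, trim_start: bool) -> Option<&'a str> {
        let text = self.chunks.next()?;
        Some(if trim_start { text.trim_start() } else { text })
    }
}

/// Stand-in for XmlReader: owns both the inner reader and the config.
struct XmlReader<'a> {
    reader: SliceReader<'a>,
    config: DeConfig,
}

impl<'a> XmlReader<'a> {
    fn new(chunks: Vec<&'a str>) -> Self {
        XmlReader {
            reader: SliceReader { chunks: chunks.into_iter() },
            config: DeConfig { trim_text: true }, // trimming on by default
        }
    }

    fn config_mut(&mut self) -> &mut DeConfig {
        &mut self.config
    }

    /// Consult the config at read time and apply it to both trimming stages.
    fn next(&mut self) -> Option<&'a str> {
        let trim = self.config.trim_text;
        let text = self.reader.next_text(trim)?;
        Some(if trim { text.trim_end() } else { text })
    }
}
```

With this shape, changing the config after construction affects only the reads that follow it, so no reader has to be rebuilt.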

On to the next iteration...

@jamwil jamwil changed the title Add option to preserve whitespace to de::from_str and de::from_reader Add de::from_str_with_whitespace and de::from_reader_with_whitespace Apr 19, 2025