I think that sounds like a really interesting technical challenge, and something you should work on if it excites you. I don't think it would be sensible for the project to adopt such a system, though, as I suspect the extra code and maintenance burden would outweigh the potential performance benefits over using […]. I'm not much involved with the project, though, so others might come to a different conclusion. Sorry to be a bit of a downer!
-
I started this fully aware that the maintainers might not want to burden themselves with any C code, so I won't be too disappointed if it is rejected. In case anyone thought otherwise, I should mention that the code generator itself is written in Python. As an alternative, the generator could be made to output equivalent Python code. I expect that would still be a significant improvement over the current parser, because it would use the Expat parser directly and build the output as it parses, with all the generated classes using […]
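As a rough illustration of "using the Expat parser directly, building the output as it parses", here is a minimal hand-written sketch using Python's standard `xml.parsers.expat` module. It assumes Doxygen's `index.xml` layout (`compound`, `member` and `name` elements); the `Compound`/`Member` classes and `parse_index` are hypothetical, not code from the fork or output of its generator.

```python
import xml.parsers.expat
from typing import Dict, List, Union


class Member:
    """Hypothetical node class; __slots__ avoids a per-instance __dict__."""
    __slots__ = ("refid", "kind", "name")

    def __init__(self, refid: str, kind: str) -> None:
        self.refid = refid
        self.kind = kind
        self.name = ""


class Compound:
    """Hypothetical node class for a <compound> element."""
    __slots__ = ("refid", "kind", "name", "members")

    def __init__(self, refid: str, kind: str) -> None:
        self.refid = refid
        self.kind = kind
        self.name = ""
        self.members: List[Member] = []


def parse_index(data: bytes) -> List[Compound]:
    """Build result objects straight from Expat callbacks (no intermediate DOM)."""
    compounds: List[Compound] = []
    stack: List[Union[Compound, Member]] = []  # currently open objects
    text: List[str] = []  # character data of the current <name> element
    in_name = False

    def start(tag: str, attrs: Dict[str, str]) -> None:
        nonlocal in_name
        if tag == "compound":
            compound = Compound(attrs["refid"], attrs["kind"])
            compounds.append(compound)
            stack.append(compound)
        elif tag == "member":
            # In index.xml, <member> only appears inside <compound>.
            member = Member(attrs["refid"], attrs["kind"])
            compounds[-1].members.append(member)
            stack.append(member)
        elif tag == "name":
            in_name = True
            text.clear()

    def end(tag: str) -> None:
        nonlocal in_name
        if tag in ("compound", "member"):
            stack.pop()
        elif tag == "name":
            in_name = False
            stack[-1].name = "".join(text)

    def chars(chunk: str) -> None:
        # Expat may split text into several chunks, so collect them.
        if in_name:
            text.append(chunk)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return compounds
```

A generated parser would presumably have one such set of handlers per schema type; the point of the sketch is only that objects are created as events arrive.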
-
My fork is at https://github.com/Rouslan/breathe/tree/c_parser. I still have a long way to go before it is usable, but the part that generates the C module is working. You can run […]. As I'm replacing all references to the old parser classes, I'm also updating the existing type annotations and adding more. Any comments or criticism are appreciated.
-
The link is now https://github.com/Rouslan/breathe (I merged the c_parser branch into the main branch of my fork). It's mostly working now. Right now I need to fix some filters (all class members are being emitted regardless of the given options). On that topic: would anyone object to removing the high-level filter objects (subclasses of Selector, Accessor and Filter) and replacing the filters with simple callback functions? I can probably come up with a way to make the high-level objects type-safe, but they really don't seem necessary in the first place. Even with the provided domain-specific language, they aren't noticeably more readable or more concise than regular functions, and functions would be faster anyway. Even their apparent functional style doesn't hold up: one of the filters is impure (it has side effects), which caught me by surprise.
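To make the proposal concrete, here is a minimal sketch of what filters as plain callback functions might look like. `MemberInfo`, `protection_filter`, `all_of` and `select` are hypothetical names, and the option strings are only modeled on Breathe-style directive options; none of this is code from the fork.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class MemberInfo:
    """Hypothetical stand-in for a parsed class member."""
    name: str
    kind: str  # e.g. "function", "variable"
    prot: str  # "public", "protected" or "private"


# A filter is just a predicate; no Selector/Accessor/Filter object tree.
MemberFilter = Callable[[MemberInfo], bool]


def protection_filter(options: Set[str]) -> MemberFilter:
    """Build a pure (side-effect-free) predicate from option names (hypothetical)."""
    allowed = {"public"}
    if "protected-members" in options:
        allowed.add("protected")
    if "private-members" in options:
        allowed.add("private")

    def keep(member: MemberInfo) -> bool:
        return member.prot in allowed

    return keep


def all_of(*filters: MemberFilter) -> MemberFilter:
    """Combine filters with ordinary function composition."""
    return lambda member: all(f(member) for f in filters)


def select(members: Iterable[MemberInfo], keep: MemberFilter) -> List[MemberInfo]:
    return [m for m in members if keep(m)]
```

For example, `select(members, all_of(protection_filter({"protected-members"}), lambda m: m.kind == "variable"))` keeps only public and protected variables. Because each predicate is a plain function, it is straightforward to type-check and test in isolation.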
-
The fork can now generate identical HTML output (with one trivial exception*) for the Pigweed project, a large collection of libraries that makes extensive use of Sphinx and Breathe. All the tests currently pass, including the ones I added. Later, I'm going to run the tests with Coverage.py to find their blind spots and add more tests if needed. I'll also look into building the module to the manylinux standard so that binary wheels can be provided for Linux (in addition to Windows).

* The exception is one method documented with the "param" command. The method has an unnamed argument, and the "param" command is incorrectly given the parameter's type in place of its name. For some reason, this causes the original Breathe to omit the "[in]" qualifier in the output.
-
We have tried using the fork to build the documentation of the ESP-IDF project, and the build time went down from 50 to 30 minutes. This is a very nice improvement, thanks @Rouslan for your work! I've noticed a few parser warnings ([…]
-
I have just removed an enormous bottleneck in the original code. Projects with a lot of code should be significantly faster now.
-
I have replaced the parser written in C with an equivalent one written in Python. It turns out the parser was never the bottleneck; my fork was faster because I rewrote the "filters" and "finders". Even so, I tried to make the new parser as fast as I could. Its code is generated by the same system that generated the C parser. Interestingly, it's only about 5 times slower than the C one: on my machine, the new parser can still parse 2000 XML files, totaling 33 MB, in 1.2 seconds. Memory usage should be about the same: the C parser's output stored every value as a Python object, and the values produced by the new parser all either use __slots__ or are named tuples.
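The memory argument rests on one property: neither slotted classes nor named tuples carry a per-instance `__dict__`. A minimal sketch of the two shapes, assuming immutable leaf values become named tuples and nodes that are filled in during parsing become slotted classes (`Location` and `MemberDef` are hypothetical names, not the fork's generated classes):

```python
from typing import List, NamedTuple, Optional


class Location(NamedTuple):
    """Hypothetical immutable leaf value: stored as a tuple, no __dict__."""
    file: str
    line: int


class MemberDef:
    """Hypothetical mutable node: fields are assigned while parsing, so a
    slotted class is used instead of a tuple."""
    __slots__ = ("name", "kind", "location", "params")

    def __init__(self, name: str, kind: str) -> None:
        self.name = name
        self.kind = kind
        self.location: Optional[Location] = None
        self.params: List[str] = []


member = MemberDef("bar", "function")
member.location = Location("foo.h", 42)
member.params.append("x")
```

One side effect of `__slots__` is that assigning an attribute that isn't declared (say, a typo such as `member.nmae`) raises `AttributeError` instead of silently creating a new attribute.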
-
I know someone was working on replacing the XML parser with one based on lxml, but how about one written in C? I started working on an XML parser generator that takes a simplified schema file written in JSON and outputs an extension module written in C.
The module uses Expat and Python's stable ABI. Every complex type gets its own class, and perfect hash lookup functions are generated for element names, attribute names and enumeration values. The generated code is meant to be easy to read, and a fully typed stub file will be generated as well.
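For readers unfamiliar with perfect hashing, here is a minimal sketch of the idea in Python: search offline for hash parameters that map a fixed set of names to distinct slots, so a lookup needs one hash and one string comparison. The hash formula (length plus weighted first and last characters) and the names `find_perfect_hash`/`build_lookup` are assumptions for illustration; the generator may well use a different scheme.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def _hash(name: str, a: int, b: int, size: int) -> int:
    # Cheap hash using only the length and the first and last characters.
    return (len(name) + a * ord(name[0]) + b * ord(name[-1])) % size


def find_perfect_hash(keys: Sequence[str], max_coef: int = 64) -> Tuple[int, int, int]:
    """Search for (a, b, size) such that _hash gives every key its own slot."""
    for size in range(len(keys), 4 * len(keys) + 1):
        for a in range(1, max_coef):
            for b in range(max_coef):
                if len({_hash(k, a, b, size) for k in keys}) == len(keys):
                    return a, b, size
    raise ValueError("no perfect hash found; widen the search")


def build_lookup(keys: Sequence[str]) -> Callable[[str], Optional[int]]:
    """Return a function mapping each key to its index, and anything else to None."""
    a, b, size = find_perfect_hash(keys)
    table: List[Optional[Tuple[str, int]]] = [None] * size
    for index, key in enumerate(keys):
        table[_hash(key, a, b, size)] = (key, index)

    def lookup(name: str) -> Optional[int]:
        if not name:
            return None
        entry = table[_hash(name, a, b, size)]
        # A single comparison rejects names that are not in the key set.
        if entry is not None and entry[0] == name:
            return entry[1]
        return None

    return lookup
```

In a code generator, the search runs once at generation time and only `a`, `b`, `size` and the table end up in the emitted C code. This particular hash can only separate keys that differ in length, first character or last character, which is why it is a sketch rather than a general solution.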
I have already converted "compound.xsd" to the JSON schema format, and the generator can produce most of the needed code. Before I finish and start incorporating it into Breathe, I'd like to know what people think and whether there are any objections to such a solution.
I'm also aware that the original author of Breathe is working on a Rust-based rewrite, but I think having most of the code in Python is preferable.