-
Notifications
You must be signed in to change notification settings - Fork 283
Add definition splitting to transpiler #1477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
16f6f85 to
4ffc28d
Compare
|
Not against this approach per se, just wanted to bring up the alternative approach of using the existing |
d2cf708 to
5b517cb
Compare
I discussed this with @thedataking in a call. One problem is that we only have the The other problem is that even given said annotations, we only know the start of the C definition, not its entire bounds (nor bounds of any relevant preceding comments). Most of the logic here is in getting precise bounds from the information in the Clang AST, which is otherwise lost. So we would need to add that logic somewhere even if we were going to use the |
e751b54 to
e51268a
Compare
|
We can't practically integrate the Rust splitting/merging tools into our |
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't practically integrate the Rust splitting/merging tools into our
Cargo.tomlbecause they use Rust 2024 and have dependencies that also do. But I don't think it's a problem to just have them live in their own subdirectories for now; they just have to be built before the c2rust-postprocess tool can invoke them.
That seems fine for now. When we get around to #1227, we can add it back.
85b2bc4 to
4fe4d8d
Compare
e51268a to
edcb0bf
Compare
485b683 to
6f349c1
Compare
|
We now seem to extract correct source ranges for all the definitions in libxml2 and lua in the testsuite. |
…tem to the corresponding C definition
this is being used for local definitions instead of the `static` keyword, so it should encompass the newer helpers we've defined recently
Clang's SourceRange ends at the beginning of the final token, but this is not useful if we want to look at every character of a declaration, as we may when considering the spelling of the corresponding C definition
otherwise, we treat them as one character, which means the next definition's source range will expand to include them as if they were leading comments for it
otherwise, debug builds panic here
this can occur if a macro is defined inside a top-level function
otherwise, inner decls like local variables confuse our tracking of definition bounds
c56e84d to
9d65be7
Compare
|
|
||
| ## Building | ||
|
|
||
| These tools rely on rust-analyzer's libraries, so they need a relatively recent version of Rust. Run `cargo build --release` in their respective directories to build them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, that picks up the c2rust nightly toolchain and fails because it is too old. I had to run cargo +stable build --release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's because I was running actual cargo, not rustup aliased to cargo. In the latter case (which is probably more common) your command is the right one; I'll clarify. Or alternately, rustup might pick up a rust-toolchain.toml if we placed one somewhere.
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I test this? Also, do we need to merge the --emit-c-decl-map changes to c2rust-transpile/c2rust-ast-exporter at the same time as the new tools/? The former looks mostly good to me, but I haven't reviewed the latter. Also, is it worth it to review split_ffi_entry_points, as FWIU, it's already in crisp?
| decls_sorted.sort_by(|v1, v2| { | ||
| self.c_decls[v1] | ||
| .begin_loc() | ||
| .cmp(&self.c_decls[v2].begin_loc()) | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| decls_sorted.sort_by(|v1, v2| { | |
| self.c_decls[v1] | |
| .begin_loc() | |
| .cmp(&self.c_decls[v2].begin_loc()) | |
| }); | |
| decls_sorted.sort_by_key(|v| self.c_decls[v].begin_loc()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also move the sorting to before name_loc_map and prev_src_loc, which are only used in the loop?
| ), | ||
| }; | ||
|
|
||
| match serde_json::ser::to_writer(file, &decl_map) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably best to just write to a Vec<u8>/String and then write the file all at once. You could also write to a buffered writer, but that's easy to forget, like here.
| "Unable to write C declaration map to file {}: {}", | ||
| output_path.display(), | ||
| e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "Unable to write C declaration map to file {}: {}", | |
| output_path.display(), | |
| e | |
| "Unable to write C declaration map to file {}: {e}", | |
| output_path.display(), |
| } | ||
|
|
||
| let file_content = | ||
| std::fs::read(&t.ast_context.get_file_path(t.main_file).unwrap()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use a .expect() here or fs_err? A file missing is a fairly common error, so better context is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good idea, will do.
| let mut begin_offset = src_loc_to_byte_offset(&line_end_offsets, begin); | ||
| let mut end_offset = src_loc_to_byte_offset(&line_end_offsets, end); | ||
| assert!(begin_offset <= end_offset); | ||
| const VT: u8 = 11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's VT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vertical tab?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, ASCII VT, which has gone out of fashion so Rust lacks an escape for it.
| }) | ||
| .collect::<HashMap<_, _>>(); | ||
|
|
||
| // Generate a map from Rust items to the source code of their C declarations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we put this in its own function? I think it'd be easier to follow that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, an emit_c_decl_map function would probably be good to avoid bloating fn translate.
| .collect::<HashMap<_, _>>(); | ||
|
|
||
| // Generate a map from Rust items to the source code of their C declarations. | ||
| let decl_map = if let Some(decl_source_ranges) = decl_source_ranges { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decl_source_ranges.map(|decl_sources_ranges| {})?
|
|
||
| let file_content = | ||
| std::fs::read(&t.ast_context.get_file_path(t.main_file).unwrap()).unwrap(); | ||
| let line_end_offsets = //memchr::memchr_iter(file_content, '\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the memchr for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a more efficient equivalent, but I don't think we have the dependency and this hasn't been a bottleneck.
| assert!(begin_offset <= end_offset); | ||
| const VT: u8 = 11; | ||
| /* Skip whitespace and any trailing semicolons after the previous decl. */ | ||
| while let Some(b'\t' | b'\n' | &VT | b'\r' | b' ' | b';') = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use u8::is_ascii_whitespace + ;?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't include \v, but this could be expressed using that function if we wanted. I don't know if it's clearer, given the specific set of characters we care about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. is_ascii_whitespace also handles FF but not VT. Is VT worth handling at all? I guess it doesn't matter that much. I don't imagine anything will actually produce VTs or FFs.
The flag is independent from In addition to the testsuite, I've been using the following test case which I'll probably add as a snapshot test: #include <stddef.h>
#include <stdint.h>
int a;int bb;/*comment for cc*/int ccc;
/*
123456789012345678901234567890
*/
typedef uintptr_t ngx_uint_t;
void f(void){}void g(void){}/*comment for h*/void h(void){}int d;int e;;;;;;;;int another;
//comment for ngx_hash_init
void
ngx_hash_init(void)
{
size_t len;
ngx_uint_t align = len & ~sizeof(void *);
}
int
anotherfunc(void)
{
int var = 0;
return 4 + var;
}
int
funcholdingdefine(void)
{
#define FOO 4000+50
return FOO;
}
void forwarddeclbeforeotherdef(int x);
void funcafterdecl(int n) {}
/*real defn*/
void forwarddeclbeforeotherdef(int x) { } |
Currently this is just the C-side implementation; I'll add commits moving the impl of Rust splitting and merging from the crisp repository tomorrow.