-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a debugging form for car files. #341
Conversation
This change adds two new sub-commands to the car CLI car debug file.car creates a patch-file-compatible representation of the content of the car file. Blocks will be represented in dag-json pretty-printed form. car compile file.patch will do the inverse process of building a car file from a debug patch file. CIDs will be re-compiled based on the contents of blocks, with links in parent blocks updated to point to the compiled values.
Fully support having this! |
windows failures, probably crlf related? also this is weird on macos https://github.com/ipld/go-car/actions/runs/3410523283/jobs/5673582480, maybe just a flake since it's passing on the other 3 macos runners but it seems cmd related |
There's some sort of flakiness with testscript in general. @rvagg are you okay with design / the proposed debug patch format modulo the test? |
|
||
outStream.WriteString("car compile ") | ||
if rd.Version == 2 { | ||
outStream.WriteString("--v2 ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is the only hint that it once was a v2 and there's not space to say anything else about it? I guess mostly we care about stability of v1 forms and the v2 is just a convenience wrapper, but if we went ahead with additional features, such as the messaging capability in #322, then where would we put these things? Would it be hard to extend this format to include those things, and could we do it in a non-breaking way perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is mirroring git patch
` where the first line mirrors the command used to generate the patch.
Here we use v2 to indicate if the original car was a v2 or not, and can use that as a default to re-build the same car format if it is not specified explicitly on the command line when re-compiling
if err != nil { | ||
return err | ||
} | ||
if strings.HasPrefix(string(rootLine), "root ") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so this is is not exhaustive, if it doesn't match root
then it continues to loop and drop rootLine
. So is this where we get potential backward compatibility of additional v2 features that we can insert with ---
, and also the --foobar
arguments in the header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(continuing my thought below about v2 features, where I started my comments on debug first before moving up here to compile)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, other meta info would go as lines in this section.
|
||
//fmt.Printf("structuring as tree...\n") | ||
// structure as a tree | ||
childMap := make(map[cid.Cid][]cid.Cid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoa, so I think this whole next section (x2) exists because we're not confident that the resulting CIDs will match the input CIDs, so we're doing a search, reconstruct, replace operation on them all, is that right?
- why do we not have confidence they're going to reconstruct byte-perfect, shouldn't that be reasonable? There might be a CIDv0 CIDv1 difference but we could easily do that check -- if expected CID is v0 then downcast actual CID and compare.
- why don't we just error if the resulting CID of the reconstructed block doesn't match the expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about this after writing and I realise that it's probably because we anticipate some input to come in badly encoded forms so round-tripping is going to result in mismatched CIDs. So I guess that's why this is here.
This does seem like a lot of effort to go to though; and it's also a little error-prone just replacing the CIDs as strings. That won't necessarily always find the actual links, they could just be included as text and it assumes that they want them to be changed. A find and replace should probably be at least looking for "/": "...."\n
, but even that's not necessarily accurate either. A more complete approach would be to walk the instantiated data model form and change the links out, but I guess that's even more code complication! Then there's the CIDv0 vs CIDv1 thing, what if the original wants to be in CIDv0 but we up-convert them to CIDv1?
Lots of effort to do it the right way, but it makes me question whether doing it at all is worth it.
Your call I guess, but maybe add some comments in here about what it's doing so the first person to encounter a bug with this knows why and can choose whether to fix it. It wasn't clear to me what it was doing until I walked through the whole two blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for the 'i want to mutate the graph and re-build a valid tree' case.
if i change the json in the patch, then that block will have a new cid when it's hashed.
that hash change needs to propagate back up the DAG.
data model still won't get all edge cases - what if it's a different codec for the same MH? what if the link is encoded in a block that's raw or that we can't parse?
this is the first pass that works reasonably well for the json/cbor cases that i've attempted to manually edit.
i am unconvinced it's worth the time to do much more complex work and still not do something perfect vs where it is currently, which supports a pretty valuable use case already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right, tinker with the content and regenerate the graph; fair enough, I can see that being useful - it does seem like something that needs clear caveats in comments though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably OK, I'd like to know what itch was being scratched here? it's certainly a nice utility to peek inside a CAR and perhaps even a nice way to peek inside IPLD blocks in general without having to code something up (you could even ipfs export .. | car debug -
to get pretty-printed output that you can't get from ipfs dag get --output-codec=dagjson
, but that's quite a hack!).
Aside from comments inline, the main concern I have is the ability to corrupt the format if you include a raw
block that is diff-like. You could even end up doing that with this tool - make a .patch from a CAR, bundle that up as unixfs and include it in a CAR and make a .patch from that and 💥 it will break because ---
. But I think you could easily address that by changing isPrintable()
to also check for ^---
and false
out if there's one in there.
for example: |
* add check for bytes not containing end-of-patch sequence
made the raw blocks a bit more cautious per the edge case you pointed out |
}, outStream); err != nil { | ||
return err | ||
} | ||
for c, blk := range outBlocks { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the failing tests are because of this, and the same block below for v1 -- the blocks are ending up shuffled thanks to iterating over the go map[]
.. we're going to need to keep a slice of CIDs to iterate over and then rewrite those when you do the reconstruction.
good catch, @rvagg - made the output match the order of initial blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice green ticks
I still think it needs notes in the DAG rewrite bit, so I've added my suggestions for what that might look like.
Co-authored-by: Rod Vagg <[email protected]>
This change adds two new sub-commands to the car CLI
car debug file.car
creates a patch-file-compatible representation of the content of the car file. Blocks will be represented in dag-json pretty-printed form.
car compile file.patch
will do the inverse process of building a car file from a debug patch file. CIDs will be re-compiled based on the contents of blocks, with links in parent blocks updated to point to the compiled values.
an example debug patch of the car used in the testscript test fixture would be: