Fix DAG size / usage calculation #56
Comments
@walkah, some thoughts on this issue, perhaps for our 1:1 tomorrow...
What is the goal of this issue?
Given a CID representing a UnixFS directory, return the number of bytes consumed by all files nested anywhere underneath the directory, after deduping? If so, are we referring to block-level deduping, not file-level?
Are we talking about space consumed after encoding/compression, or raw data before encoding?
Is UnixFS the wrong layer and we need something more general?
As it relates to WNFS, is the idea that we only want to know about the disk consumption for the contents of the latest version of the WNFS drive, for a metric that is easy for users to understand, not past content that is still being stored and might need to be recovered?
Has anyone tried
Is there a simple test that demonstrates the problem with the current DAG size calculation? My hope was that
$ mkdir test1
$ echo hello > test1/file.txt
$ du -s test1
8 test1
$ CID=$(ipfs add -r -Q test1)
$ ipfs dag stat $CID
Size: 68, NumBlocks: 2
# What does this size represent? Bytes, but of what?
# My own modified dag stat that uses nodestats.DataSize.
$ ../go-ipfs-dag-size/cmd/ipfs/ipfs dag stat $CID
Size: 14, NumBlocks: 2
# 2x what I get from the shell. What is DataSize?
As for general size-calculation issues in go-ipfs, I did see ipfs#5690, which is still broken on master as of today. I'll continue exploring, but I would welcome any background knowledge you might have that I'm missing. |
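To make the block-level deduping question above concrete, here is a minimal sketch in Python. It is a toy model, not the go-ipfs implementation: the DAG is a plain dictionary, and the CIDs (`root`, `v1`, `v2`, `file`) and block sizes are made up for illustration. It contrasts a naive recursive sum, which counts a shared block once per path that reaches it, with a traversal that counts each unique CID once.

```python
from typing import Dict, List, Set, Tuple

# Toy DAG: CID -> (block size in bytes, list of child CIDs).
# All CIDs and sizes here are hypothetical.
Dag = Dict[str, Tuple[int, List[str]]]


def naive_size(dag: Dag, cid: str) -> int:
    """Sum block sizes along every path; shared blocks are counted repeatedly."""
    size, links = dag[cid]
    return size + sum(naive_size(dag, child) for child in links)


def deduped_size(dag: Dag, root: str) -> Tuple[int, int]:
    """Return (total bytes, number of blocks), counting each unique CID once."""
    seen: Set[str] = set()
    stack = [root]
    total = 0
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # block already counted via another path
        seen.add(cid)
        size, links = dag[cid]
        total += size
        stack.extend(links)
    return total, len(seen)


# Two "versions" (v1, v2) that both link to the same file block.
example: Dag = {
    "root": (60, ["v1", "v2"]),
    "v1": (50, ["file"]),
    "v2": (50, ["file"]),
    "file": (14, []),
}

print(naive_size(example, "root"))    # shared "file" block counted twice
print(deduped_size(example, "root"))  # each block counted once
```

In this toy example the two numbers differ only by the one shared block, but the gap grows with every additional version that links to unchanged content. Whether that effect explains any specific number reported in this thread is an open question, not something this sketch establishes.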
@justincjohnson here’s my example file system, which shows as a 1.4PB file system in the public tree: https://boris.files.fission.name/ So this is specifically for WNFS-formatted IPFS blocks: return how much is stored. We do store versions, but this is deduped at the block level. |
Thanks for the feedback @bmann. According to
|
That sounds closer to correct @justincjohnson! That’ll be a bug in the gateway code, so it would be good to file. The other half of this is running this command regularly, or in some other way, to store the result in our DB for the user. That’ll need some work to scope; James linked to the Talk post with background. |
Sounds good. This specific issue is about making sure go-ipfs gives us the correct numbers, so any other work can be scoped in a separate issue. |
First off: this is super valuable digging. 🙏 thank you Justin! It sounds like I would say let's treat this issue as:
As for this:
I did a little bit of digging: we don't need to run a command at all. We already do this in the fission-server code; I think we're just not doing it quite right. When we receive updates (new CIDs) we call IPFS stat (for example, on an app publish here: https://github.com/fission-suite/fission/blob/a0b46415e1e8b858aa6c6e503a7f2943a14e8218/fission-web-server/library/Fission/Web/Server/Types.hs#L753 ) and then we store the resulting size value in the DB for the app. (Metabase, e.g., is just pulling values directly from a read-only follower of the production database.) If I'm reading the code right (always a big "if"), that code calls this HTTP endpoint: https://docs.ipfs.io/reference/http/api/#api-v0-object-stat The docs recommend
@justincjohnson it might be worth a little investigation into the difference between files stat vs dag stat? As soon as I hit "comment" here, I'll open an issue against the fission-server to update which endpoint we're pulling stat data from. |
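For comparing the stat endpoints mentioned above side by side, a small sketch like the following may help. It is not the fission-server code (which is Haskell) and makes several assumptions: a local IPFS daemon exposing its HTTP RPC API at `http://127.0.0.1:5001`, endpoints that accept `POST` with the CID passed as `arg`, and responses that are either one JSON object or several newline-delimited ones. The helper names `build_stat_url` and `fetch_stat` are hypothetical, and the fields each endpoint returns vary by go-ipfs version, so the sketch simply returns the decoded JSON.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default address of a local daemon's RPC API.
API_BASE = "http://127.0.0.1:5001"


def build_stat_url(endpoint: str, cid: str, base: str = API_BASE) -> str:
    """Build the URL for a stat-style endpoint such as "object/stat" or "dag/stat"."""
    query = urllib.parse.urlencode({"arg": cid})
    return f"{base}/api/v0/{endpoint}?{query}"


def fetch_stat(endpoint: str, cid: str, base: str = API_BASE) -> dict:
    """POST to the endpoint and decode the last JSON object in the response body."""
    req = urllib.request.Request(build_stat_url(endpoint, cid, base), method="POST")
    with urllib.request.urlopen(req) as resp:
        body = resp.read().decode("utf-8")
    # Some endpoints may stream several JSON objects, one per line; keep the last.
    lines = [line for line in body.splitlines() if line.strip()]
    return json.loads(lines[-1])


def compare_stats(cid: str) -> dict:
    """Collect the raw responses from both endpoints for manual comparison."""
    return {ep: fetch_stat(ep, cid) for ep in ("object/stat", "dag/stat")}
```

With a daemon running, `compare_stats(cid)` on the test directory's CID from the transcript earlier in the thread would show the raw fields each endpoint reports for the same root.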
An update on the status of this issue... The work to update the Fission server to use the /dag/stat API is being tracked in #603. This current issue is staying open to track the following.
|
I have no current plans to come back to this. Unassigning for now. |
Currently, the directory usage calculation is incorrect. This means that we cannot, for example, implement file usage quotas or even get an accurate accounting of the disk space required for go-ipfs nodes.
See https://talk.fission.codes/t/file-metrics-and-quotas-part-2/1303 for context