(first pass, will take it for a spin on Friday)
backend/handlers.go (Outdated)

```go
// Setup header for the output car
err = car.WriteHeader(&car.CarHeader{
	Roots: []cid.Cid{rootCid},
```
💭 Is `ipfspath` always `/ipfs/cid` and never `/ipfs/cid/sub/path`?

Producing a CAR that lists `/ipfs/cid` as the root, and not the CID of `/ipfs/cid/sub/path`, means it will over-fetch when someone tries to import it via `ipfs dag import`.

If subpaths are supported here, we either need to list the CID of `/path` and not `/cid`, or list no roots.
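For illustration, a minimal sketch of the first option, assuming the same go-car `WriteHeader` call as the snippet above (`resolvePath`, `contentPath`, and `w` are hypothetical names, not this PR's code):

```go
// Hypothetical: resolve the full content path first, then list the
// terminus CID as the CAR root instead of the CID the path starts with.
terminusCid, err := resolvePath(ctx, contentPath) // e.g. "/ipfs/<cid>/sub/path"
if err != nil {
	return err
}
err = car.WriteHeader(&car.CarHeader{
	Roots:   []cid.Cid{terminusCid}, // matches what `ipfs dag import` would pin
	Version: 1,
}, w)
```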
It's definitely also `/ipfs/cid/path/to/stuff`. It's unclear to me what is supposed to go in the roots here, though. IIUC some of the CAR splitting tools that exist in dag-house/filecoin/etc., for when they want to take a DAG and split it across CAR files, put the root of the incomplete graph at the front of each CAR file.

Maybe we could just remove the roots entirely, but I'm not sure if some tooling expects them to exist. This (perhaps out of date?) document basically leaves all this as 🤷 https://ipld.io/specs/transport/car/carv1/#unresolved-items.
@rvagg what roots are you emitting from Lassie for partial DAGs?
Always the requested CID, based on the assumption that since we're working with "verifiable" CARs, you want to start from the root and navigate into your content, and that the root is the user's trust anchor that we have to revolve around.

It also gives us a nice workflow for these partial DAGs where we can pipe them into plain UnixFS tooling that assumes the root is where everything starts. So right now you can do things like `lassie fetch -o - <cid>/path/to/thing | car extract <cid>/path/to/thing -` and you have verified download and extraction without needing to do anything else! (Latest `car` can also skip the whole `<cid>/path/to/thing` argument; it'll just fish around in your incomplete DAG underneath the root of the CAR and find whatever's in there to extract, and lassie will only give you the minimum, which is still verified.)

Without additional metadata to embed "/path/to/thing" in the CAR header, I can kind of see an argument for wanting to put an additional "root" for each step of the DAG along that path (which could get quite large!), but that seems like a bad fix for something that's not really essential: we only ever give you just enough to reconstitute what you need to get from the root to your content, so I think our tooling should be able to deal with these cases.
Oh, and I forgot to mention, wrt multiple roots for each step on the path: even if we only wrote the root at the path terminus, we don't have that information when we start writing the CAR, and it'd be ideal not to have to buffer and delay until we're deep enough to get it, because again these path DAGs could get quite large!

As an aside, there still remains the possibility of abusing CARv2 to squish in some kind of metadata if we really want it: ipld/go-car#322, ipld/js-car#89. But I think first we should explore what we're really solving for: if it's just for `ipfs dag import`, then let's either (a) fix that, or (b) provide adjacent tooling to make the problems go away (or, more radically, (c) consider that a legacy problem?).
Yes, my worry is that the `ipfs dag import` UX becomes a footgun when we introduce partial CARs.

@rvagg my vote would be (a): fix the partial CAR use case by changing the default to `--pin-roots=false`.

Filed ipfs/kubo#9765, would appreciate support/feedback there.
Co-authored-by: Marcin Rataj <[email protected]>
backend/handlers.go (Outdated)

```go
if !isCar {
	http.Error(w, "only car format supported", http.StatusBadRequest)
	return
}
```
TODO: Also support single block requests (without paths)
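A minimal sketch of what that TODO could look like, assuming the handler can reach the PR's `blockFetcher exchange.Fetcher` and the request's `rootCid` (names borrowed from elsewhere in this PR; the Accept-header handling is simplified):

```go
// Hypothetical single-block branch: serve application/vnd.ipld.raw for
// requests without subpaths instead of rejecting everything non-CAR.
if !isCar {
	if r.Header.Get("Accept") == "application/vnd.ipld.raw" {
		blk, err := blockFetcher.GetBlock(r.Context(), rootCid)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "application/vnd.ipld.raw")
		_, _ = w.Write(blk.RawData())
		return
	}
	http.Error(w, "only car format supported", http.StatusBadRequest)
	return
}
```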
```go
/*
Implementation iteration plan:

1. Fetch CAR into per-request memory blockstore and serve response
2. Fetch CAR into shared memory blockstore and serve response along with a blockservice that does block requests for missing data
3. Start doing the walk locally and then if a path segment is incomplete send a request for blocks and upon every received block try to continue
4. Start doing the walk locally and keep a list of "plausible" blocks, if after issuing a request we get a non-plausible block then report them and attempt to recover by redoing the last segment
5. Don't redo the last segment fully if it's part of a UnixFS file and we can do range requests
*/
```
Calling out that this is the basic plan. At the moment, 1 is done and 2 is mostly done. We don't need to land all of this before merging or doing testing with mirror traffic, but we do need 4 for DoS-safety purposes when this is used with actually untrusted backends.
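For reference, a minimal sketch of what iteration 1 amounts to, assuming caboose's `DataCallback` is `func(resource string, reader io.Reader) error` (the package, helper name, and error handling are illustrative, not the PR's actual code):

```go
package lib

import (
	"context"
	"io"

	datastore "github.com/ipfs/go-datastore"
	dssync "github.com/ipfs/go-datastore/sync"
	blockstore "github.com/ipfs/go-ipfs-blockstore"
	car "github.com/ipld/go-car"
)

// fetchIntoMemory drains one CAR response into a per-request in-memory
// blockstore; the caller then serves the gateway response from it.
func fetchIntoMemory(ctx context.Context, f CarFetcher, path string) (blockstore.Blockstore, error) {
	bs := blockstore.NewBlockstore(dssync.MutexWrap(datastore.NewMapDatastore()))
	err := f.Fetch(ctx, path, func(_ string, r io.Reader) error {
		cr, err := car.NewCarReader(r)
		if err != nil {
			return err
		}
		for {
			blk, err := cr.Next()
			if err == io.EOF {
				return nil // CAR fully consumed
			}
			if err != nil {
				return err
			}
			if err := bs.Put(ctx, blk); err != nil {
				return err
			}
		}
	})
	return bs, err
}
```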
💭 @aschmahmann when we start doing the walk locally and only fetch missing parts of the graph, caboose will be able to see that the requested path is different from the original Content Path requested by the user that we pass in context.

My thinking is that for such "internal" block/CAR requests, we will want to forward the parent content path to the backend HTTP server in an `Ipfs-Path-Affinity` header.

Forwarding this hint does not cost us much, but will allow for caching/content-routing optimizations (e.g. if the root of the CAR I requested is not on the DHT, Lassie will still be able to find it, because it was able to find the root CID from `Ipfs-Path-Affinity`, if present). This is not specific to Rhea; RAPIDE, or any other block+CAR client like IPFS in Chromium, would benefit from this. (I plan to IPIP the `Ipfs-Path-Affinity` header at some point and add it as optional to https://specs.ipfs.tech/http-gateways/trustless-gateway/.)
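A minimal sketch of the forwarding, assuming the proposed (not yet IPIP'd) header name; `blockURL` and `parentContentPath` are hypothetical variables:

```go
// Internal block request for a missing part of the graph; forward the
// user's original content path as an affinity hint for the backend.
req, err := http.NewRequestWithContext(ctx, http.MethodGet, blockURL, nil)
if err != nil {
	return nil, err
}
req.Header.Set("Accept", "application/vnd.ipld.raw")
req.Header.Set("Ipfs-Path-Affinity", parentContentPath) // e.g. "/ipfs/bafy.../wiki/Science"
resp, err := http.DefaultClient.Do(req) // ...then read the block from resp.Body
```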
```go
	bsrv blockservice.BlockService
}

func NewGraphGatewayBackend(f CarFetcher, blockFetcher exchange.Fetcher, opts ...GraphGatewayOption) (*GraphGateway, error) {
```
Using this as a thread anchor for any misc questions, but overall this code is very much WIP (with a bunch of copy-paste) to try and get this all to work reasonably.
There's a lot of cleaning up and improvements to do here. Here are a few, but feel free to recommend more.
- Incorrect internal abstraction usage
  - Caboose is using a blockstore, which means we have a proxy blockstore, but these are both the wrong abstraction. At best they are `exchange.Fetcher`s. This starts bleeding through the codebase here. Let's wrap them up nicely and isolate some of the crazy (see the sketch below).
- Copy-pasted code from the BlocksGateway
  - We should separate out some of the blocks gateway components (e.g. the mutability pieces have nothing to do with blocks and should be separately usable).

Some TODO items to consider in other refactors that were noticed here:

- go-path's reliance on go-fetcher to do sessions is... bad. It effectively forces you to break up the session into "path" + other.
- go-fetcher's requirement on using a blockservice is overkill; I basically had to copy-paste it so I could give it a NodeGetter, which is fine for now. Although even that should probably be an `exchange.Fetcher` rather than a NodeGetter.
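To illustrate the "at best they are `exchange.Fetcher`s" point, a sketch of hiding the existing proxy/caboose blockstore behind that narrower interface (the adapter name is hypothetical):

```go
import (
	"context"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
	blockstore "github.com/ipfs/go-ipfs-blockstore"
)

// fetcherAdapter exposes a read-only blockstore as an exchange.Fetcher,
// hiding the write/stat surface the gateway code never needs.
type fetcherAdapter struct {
	bs blockstore.Blockstore
}

func (f *fetcherAdapter) GetBlock(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	return f.bs.Get(ctx, c)
}

func (f *fetcherAdapter) GetBlocks(ctx context.Context, cids []cid.Cid) (<-chan blocks.Block, error) {
	out := make(chan blocks.Block)
	go func() {
		defer close(out)
		for _, c := range cids {
			blk, err := f.bs.Get(ctx, c)
			if err != nil {
				return // real code should log/propagate per-block errors
			}
			select {
			case out <- blk:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out, nil
}
```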
redundant / caused 404 on some gateways
Makes it easy to test both locally and on staging.
- Pushed a small fix (details below), added some debug logs, and switched to the env variable `GRAPH_BACKEND=true`.
- It is possible to test this against Saturn without caboose by leveraging the DNS router at https://l1s.strn.pl.
  - This one-liner works pretty well for local testing:
    `GOLOG_LOG_LEVEL="info,bifrost-gateway=debug,caboose=debug" PROXY_GATEWAY_URL="https://l1s.strn.pl" GRAPH_BACKEND=true go run .`
- The code delta is too big to assume we are still protected by Kubo sharness tests. Until we have decent test coverage in the gateway-conformance test suite (being added in #63, Run new gateway-conformance tests in CI) we will be running blind, with potential bugs.

Planning test on Rhea staging

While this is by no means safe to run on production, I feel the `GRAPH_BACKEND=true` experience with https://l1s.strn.pl is good enough to give this a try on Rhea staging next week. For that, we need to double-check the wiring up in caboose. My understanding of that work is:

- decide if we need to clean up the `CarFetcher` API, or leave it as-is
- inspect how caboose implements and exposes the `CarFetcher` interface from `lib/graph_gateway.go` (PoC in `blockstore_proxy.go`)
- add/inspect CAR fetch metrics for `CarFetcher`
- add/inspect those metrics in Grafana so we can observe things like size/TTFB/TTLB of CAR fetches
```go
type DataCallback = caboose.DataCallback

type CarFetcher interface {
	Fetch(ctx context.Context, path string, cb DataCallback) error
```
💭 `path` is a concatenation of the IPFS Content Path with CAR format params. This smells funny and may lead to issues, but is maybe OK for now; we can revisit if gateway tests ever start failing for percent-encoded things.

A potential way to clean this up: remove URL params like `?format=..` from `path` and have a separate `carParams url.Values` with optional `depth` and `bytes` (and any future ones). This way `path` is always a Content Path, `carParams` is unambiguous, and the `format` param becomes implicit.
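A sketch of that suggested shape (the interface change and URL assembly are illustrative; `backendURL` is hypothetical):

```go
// Proposed split: contentPath is always a pure content path, carParams
// carries optional depth/bytes, and format=car becomes implicit.
type CarFetcher interface {
	Fetch(ctx context.Context, contentPath string, carParams url.Values, cb DataCallback) error
}

// On the HTTP side the fetcher would reassemble the URL itself:
fetchURL := backendURL + contentPath + "?format=car"
if len(carParams) > 0 {
	fetchURL += "&" + carParams.Encode()
}
```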
Yeah, this is pretty messy. Also, I got it wrong here (not that it matters, because there's no behavior change) since caboose's implementation of `Fetch` uses the CAR Accept header anyway.

As described above, we should clean up and define the abstractions we want here and then pass them on to caboose or our HTTP proxy calls. We can use `url.Values` or be more specific with types; it likely doesn't matter much 🤷.
I think `url.Values` is good enough, because `Values.Encode()` produces a deterministic serialization (keys are emitted in sorted order).

It's also fine to have our own type, but the serialization is what matters: requests for the same thing will have parameters in the same order, and caches that use the URL as a cache key will work as expected.
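To be precise about the guarantee, a tiny runnable example: `Encode()` serializes keys in sorted order, which is what makes URL-keyed caches behave:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	params := url.Values{}
	params.Set("depth", "1")
	params.Set("bytes", "0:1023")
	// Keys come out sorted regardless of insertion order, so equivalent
	// requests always produce the same query string (and cache key).
	fmt.Println(params.Encode()) // bytes=0%3A1023&depth=1
}
```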
Found a memory-related hiccup during load testing:

- Using caboose (`STRN_ORCHESTRATOR_URL="https://orchestrator.strn.pl/nodes/nearby?count=1000" GRAPH_BACKEND=true BLOCK_CACHE_SIZE=2`) and running `ab -k -n 20000 -c 1000 -w "http://en.wikipedia-on-ipfs.org.ipns.localhost:8081/wiki/Science"` shows that there is a memory issue; after ~18k requests `ab` hangs and dies:

```
Completed 18000 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 19949 requests completed
0.34s user 0.71s system 0% cpu 4:53.76 total
```

If `bifrost-gateway` is not killed by OOM, it continues to run and new requests work fine, but ~10GiB of memory is still used by something.
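Not part of this PR, but a standard way to attribute the retained ~10GiB would be a heap profile; a sketch of wiring up pprof while reproducing the load test:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers
)

func init() {
	// While the leak is live, inspect it with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}
```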
19s was way too low: filecoin-saturn/caboose@aaf9ff9
EOD update:
TODO before we merge and release to staging
Plan for Tue:
lib/graph_gateway.go (Outdated)

```go
//TODO: interfaces here aren't good enough so we're getting around the problem this way
runtime.SetFinalizer(gr, func(_ *gateway.GetResponse) { closeFn() })
```
@lidel some of this is due to the implementation concerns, but others are from the gateway API. Will raise a PR in boxo to discuss
- IPFSBackend metrics from ipfs/boxo#245
- GraphGateway metrics from #61
- Version for tracking rollouts
EOD update:
Next
Range requests are fixed, metrics are added.
This can be merged on Monday if we want to unblock deployment to staging, but I would like someone to look at the upstream ipfs/boxo#245 if possible.
My plan for today (after meetings):
This is the updated tag after an undesired binary got merged.
Based on ipfs/boxo#176, and ipfs/boxo#245, part of #62
Update 3-27-23:
The general implementation iteration plan is the five-step list above; we have completed the first step and most of the second.
Blockers for merging + testing this PR more widely are:
Finishing the basics of phase 2 will likely also happen and will help with memory pressure.