Data written to Vespa passes through document processing, of which indexing is one example. Applications can add custom processing, normally run before indexing, by adding a Document Processor. Such processing is synchronous, which is a problem when it depends on other resources with high latency: threads blocked while waiting can saturate the thread pool.
This application demonstrates how to use Progress.LATER and the asynchronous Document API. Summary:
- Document Processors: modify / enrich data in the feed pipeline
- Multiple Schemas: store different kinds of data, like different database tables
- Enrich data from multiple sources: here, look up data in one schema and add to another
- Document API: write asynchronous code to fetch data
Flow:
- Feed album document with the music schema
- Look up in the lyrics schema if album with given ID has lyrics stored
- Store album with lyrics in the music schema
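The flow above can be sketched with plain Java maps standing in for the two schemas. This is a minimal illustration only: the class `EnrichFlowSketch`, its methods, and the `album` field name are hypothetical, and nothing here uses the Vespa Document API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the two schemas; NOT the Vespa Document API.
class EnrichFlowSketch {
    // Stand-in for the lyrics schema: document ID -> lyrics text.
    static final Map<String, String> lyricsStore = new HashMap<>();
    // Stand-in for the music schema: document ID -> album fields.
    static final Map<String, Map<String, String>> musicStore = new HashMap<>();

    // Look up in the lyrics store whether the album with the given ID has lyrics.
    static Optional<String> lookupLyrics(String albumId) {
        return Optional.ofNullable(lyricsStore.get(albumId));
    }

    // Take the fed album, add lyrics if any were found, then store the result.
    static Map<String, String> feedAlbum(String albumId, Map<String, String> album) {
        Map<String, String> enriched = new HashMap<>(album); // leave the input untouched
        lookupLyrics(albumId).ifPresent(lyrics -> enriched.put("lyrics", lyrics));
        musicStore.put(albumId, enriched);
        return enriched;
    }

    public static void main(String[] args) {
        lyricsStore.put("a-head-full-of-dreams", "placeholder lyrics text");
        Map<String, String> album = new HashMap<>();
        album.put("album", "A Head Full of Dreams"); // assumed field name
        Map<String, String> stored = feedAlbum("a-head-full-of-dreams", album);
        System.out.println(stored.containsKey("lyrics")); // prints "true"
    }
}
```

In the real application the lookup is an asynchronous request against the lyrics schema, so the enrichment cannot happen in a single blocking call as it does in this sketch.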
Make sure Docker (or Podman) reports at least 4 GB of memory. Refer to Docker memory for details and troubleshooting:
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
Install the Vespa CLI:
Using Homebrew:
$ brew install vespa-cli
You can also download Vespa CLI for Windows, Linux and macOS.
$ vespa config set target local
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa
$ vespa status deploy --wait 300
$ vespa clone examples/document-processing myapp && cd myapp
$ mvn -U clean package
$ vespa deploy --wait 300
Feed a lyrics document, and get the document after the feed as well:
$ vespa document src/test/resources/A-Head-Full-of-Dreams-lyrics.json
$ vespa document get id:mynamespace:lyrics::a-head-full-of-dreams
$ vespa document src/test/resources/A-Head-Full-of-Dreams.json
Get the document to validate - see lyrics in music document:
$ vespa document get id:mynamespace:music::a-head-full-of-dreams
Compare with the original document, which did not have lyrics; they were added by the LyricsDocumentProcessor:
$ cat src/test/resources/A-Head-Full-of-Dreams.json
Inspect what happened:
$ docker exec vespa sh -c '/opt/vespa/bin/vespa-logfmt | grep LyricsDocumentProcessor'
...LyricsDocumentProcessor info In process
...LyricsDocumentProcessor info Added to requests pending: 1
...LyricsDocumentProcessor info Request pending ID: 1, Progress.LATER
...LyricsDocumentProcessor info In process
...LyricsDocumentProcessor info Request pending ID: 1, Progress.LATER
...LyricsDocumentProcessor info In handleResponse
...LyricsDocumentProcessor info Async response to put or get, requestID: 1
...LyricsDocumentProcessor info Found lyrics for : document 'id:mynamespace:lyrics::1' of type 'lyrics'
...LyricsDocumentProcessor info In process
...LyricsDocumentProcessor info Set lyrics, Progress.DONE
- In the first invocation of process, an async request is made, and process returns Progress.LATER.
- In the second invocation of process, the async request has not yet completed (there can be many such invocations), so process returns Progress.LATER again.
- Then the handler for the async operation is invoked, as the call has completed.
- In the subsequent process invocation, the async operation has completed, so process returns Progress.DONE.
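The invocation sequence above can be imitated in a few lines of standard Java. This is a sketch of the control flow only, under stated assumptions: `ProgressLaterSketch` and its nested `Progress` enum are hypothetical stand-ins rather than Vespa classes, a `CompletableFuture` plays the role of the asynchronous lookup, and everything runs on one thread, whereas in the real application the handler is invoked by the asynchronous Document API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins that mimic the control flow only; these are NOT Vespa classes.
class ProgressLaterSketch {
    enum Progress { LATER, DONE }

    final List<String> log = new ArrayList<>();     // not thread-safe; single-threaded demo
    private final CompletableFuture<String> lookup; // stands in for the async request
    private boolean requested = false;
    private volatile String lyrics;                 // written by the response handler

    ProgressLaterSketch(CompletableFuture<String> lookup) {
        this.lookup = lookup;
    }

    // Called repeatedly by the (simulated) framework until it returns DONE.
    Progress process() {
        log.add("In process");
        if (!requested) {
            // First invocation: register the handler for the async request.
            requested = true;
            lookup.thenAccept(this::handleResponse);
        }
        if (lyrics == null) {
            // No response yet: give the thread back and ask to be called again.
            log.add("Progress.LATER");
            return Progress.LATER;
        }
        log.add("Progress.DONE");
        return Progress.DONE;
    }

    // Invoked when the async request completes.
    private void handleResponse(String result) {
        log.add("In handleResponse");
        lyrics = result;
    }

    String lyrics() {
        return lyrics;
    }

    public static void main(String[] args) {
        CompletableFuture<String> lookup = new CompletableFuture<>();
        ProgressLaterSketch processor = new ProgressLaterSketch(lookup);
        System.out.println(processor.process()); // LATER: request just started
        System.out.println(processor.process()); // LATER: still pending
        lookup.complete("placeholder lyrics");   // handler runs here
        System.out.println(processor.process()); // DONE: lyrics are available
        processor.log.forEach(System.out::println);
    }
}
```

The point of the pattern is that process never blocks while waiting for the lookup, so the calling thread is free to handle other documents between invocations.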
$ docker rm -f vespa