This repository has been archived by the owner on Mar 28, 2022. It is now read-only.

echoprint-inverted-index gives error with piping large dump file to it #10

Open
abb4s opened this issue Mar 1, 2017 · 2 comments

Comments

@abb4s

abb4s commented Mar 1, 2017

I wanted to test echoprint with its published database dump: http://echoprint-data.s3.amazonaws.com/echoprint-dump-1.json
I tried this:
cat echoprint-dump-1.json|jq -r '.[].code' | echoprint-inverted-index index.bin
and it gives this error:
Traceback (most recent call last):
  File "/usr/local/bin/echoprint-inverted-index", line 19, in <module>
    create_inverted_index(streamer(sys.stdin), args.indexfile)
  File "/usr/local/lib/python2.7/dist-packages/echoprint_server/lib.py", line 57, in create_inverted_index
    for batch_index, batch in enumerate(split_seq(songs, 65535)):
  File "/usr/local/lib/python2.7/dist-packages/echoprint_server/lib.py", line 30, in split_seq
    item = list(itertools.islice(it, size))
  File "/usr/local/lib/python2.7/dist-packages/echoprint_server/lib.py", line 78, in parsing_code_streamer
    yield decode_echoprint(line.strip())[1]
  File "/usr/local/lib/python2.7/dist-packages/echoprint_server/lib.py", line 42, in decode_echoprint
    unzipped = zlib.decompress(zipped)
zlib.error: Error -5 while decompressing data: incomplete or truncated stream

I think it only happens when the file is large; I tested it with small JSON files and it works.
Has anyone else encountered this error?
Is this a bug, or is the problem on my end?

@fascinated

This is likely happening because there is a null or a "" in the "code" field of one of the objects.
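
To illustrate that explanation: the traceback shows decode_echoprint calling zlib.decompress on each input line. The sketch below assumes the line is url-safe base64 decoded first (an assumption, since the traceback does not show that step) and uses a hypothetical helper, try_decode, to show how a valid-looking code, an empty string, and the literal text "null" (what jq -r prints for a null field) behave. The "good" value is a compressed dummy payload, not a real fingerprint.

```python
import base64
import zlib


def try_decode(code):
    """Hypothetical helper: base64-decode then zlib-decompress one line.

    Returns None on success, or the zlib error message on failure.
    This mimics the assumed first steps of decode_echoprint only.
    """
    zipped = base64.urlsafe_b64decode(code)
    try:
        zlib.decompress(zipped)
    except zlib.error as exc:
        return str(exc)
    return None


# Stand-in for a real code: a zlib-compressed, base64-encoded dummy payload.
good = base64.urlsafe_b64encode(zlib.compress(b"dummy payload")).decode("ascii")

for sample in (good, "", "null"):
    print(repr(sample), "->", try_decode(sample))
```

In this sketch the empty string fails with the same "Error -5 ... incomplete or truncated stream" message as in the report, and "null" also fails to decompress. Whether the dump really contains such records would still need to be checked against the data.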

@jci

jci commented Jul 1, 2018

Hi,
I stumbled into this as well. I just added an egrep to filter out the nulls. This is what I did:

$ cat fingerprint.json | jq -r '.[].code' | egrep -v null$ | echoprint-inverted-index index.bin
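
A stricter variant of the same idea, as a rough sketch: the egrep above drops any line ending in "null" but lets empty lines through. The Python 3 snippet below filters on the JSON itself, keeping only records whose "code" is a non-empty string. The function name valid_codes and the sample records are hypothetical; the only thing taken from the thread is that the dump is an array of objects with a "code" field.

```python
import json


def valid_codes(records):
    """Yield only non-empty string codes, skipping null, empty, or missing ones."""
    for record in records:
        code = record.get("code")
        if isinstance(code, str) and code.strip():
            yield code.strip()


# Placeholder records, not real fingerprints.
sample = json.loads(
    '[{"code": "abc123"}, {"code": null}, {"code": ""}, {"title": "no code"}]'
)

for code in valid_codes(sample):
    print(code)
```

For the real dump, the records could be loaded with json.load(sys.stdin) instead of the inline sample, and the printed codes piped into echoprint-inverted-index as in the command above.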
