xloader still always pending #240

Open
riccardoricciagid opened this issue Dec 31, 2024 · 6 comments

@riccardoricciagid

CKAN version

Describe the bug

After uploading a CSV file, the datastore tables are built and populated correctly, but the CKAN log shows a 403 for /api/3/action/xloader_hook and the resources remain in the "datastore pending" state.

Steps to reproduce

Try to upload a CSV file

@wardi
Contributor

wardi commented Jan 7, 2025

Hello @riccardoricciagid can you provide some information about how you're running ckan? Is there any additional information in your logs?

If you have a reverse proxy or other more complicated networking environment you might need something like the ckanext.xloader.site_url configuration option from #234
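For readers in a similar setup, here is a minimal sketch of setting that option with ckan config-tool. The helper name set_xloader_site_url and the example URL are hypothetical; the right value is whatever address your worker can actually reach CKAN on.

```shell
# Hypothetical helper: write ckanext.xloader.site_url into a CKAN ini file.
#   $1: path to ckan.ini
#   $2: URL on which the worker can reach CKAN (assumed to differ from the public URL)
set_xloader_site_url() {
  ckan config-tool "$1" "ckanext.xloader.site_url=$2"
}

# Example call (placeholder URL, adjust to your network):
#   set_xloader_site_url "$CKAN_INI" "http://ckan:5000"
```

Restart both the web process and the job worker afterwards so that both read the updated ini file.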

wardi transferred this issue from ckan/ckan on Jan 7, 2025
wardi removed their assignment on Jan 7, 2025
@duttonw
Collaborator

duttonw commented Jan 7, 2025

Hi @riccardoricciagid

Which version of CKAN are you running, and which other plugins are installed?

Are background jobs enabled?
Have all jobs been processed?
You can check with: ckan jobs list
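A minimal sketch of those checks, wrapped in a hypothetical helper (the function name is illustrative only). It lists what is queued and then enqueues a test job, which a running worker should pick up.

```shell
# Hypothetical helper: show queued background jobs, then enqueue a test job.
#   $1: path to ckan.ini
check_background_jobs() {
  echo "Jobs currently queued:"
  ckan -c "$1" jobs list
  echo "Enqueuing a test job (a running worker should pick it up):"
  ckan -c "$1" jobs test
}

# Example call:
#   check_background_jobs "$CKAN_INI"
```

If the list keeps growing and the test job never completes, the worker is probably not running or is reading a different ini file.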

@fpichardom

I think I have the same or a very similar issue. I have a fresh install of CKAN 2.11.1 from source with the activity, scheming_datasets and xloader plugins, plus my custom plugin that adds a couple of dataset types with scheming. I checked that the datastore is working properly by using the API to read and write. My instance is running with NGINX as a proxy from http://127.0.0.1:8080/ to my VPS IP address (for now). The service is running using uwsgi and supervisor according to the default instructions. I have background workers running simply as ckan -c $CKAN_INI jobs worker, and the worker can successfully complete a test job using ckan -c $CKAN_INI jobs test.

So my issue is the one originally described in this thread: when I upload a new CSV file or click Upload to Datastore in the UI, the resource just says pending and stays stuck there. With debug mode on, I can see this in my uwsgi.ERR file when I click the Upload to Datastore button:

2025-01-08 00:00:33,675 INFO  [ckanext.xloader.action] A pending task was found '748d2665-5137-40be-b398-64f1a1c1e0d8', but its not found in the queue [] and is 0:05:26.018433 hours old
2025-01-08 00:00:33,682 DEBUG [ckanext.xloader.action] Timeout for XLoading resource f54f63bc-540a-41d5-9c3f-c3ad4158923b is 3600
2025-01-08 00:00:33,683 INFO  [ckan.lib.jobs] Added background job c64e6bec-1abd-423d-807c-9aef45e6cccf ("xloader_submit: package: 39fd2cf6-366e-4cb5-8be6-be42e1ee026e resource: f54f63bc-540a-41d5-9c3f-c3ad4158923b") to queue "default"
2025-01-08 00:00:33,683 DEBUG [ckanext.xloader.action] Enqueued xloader job=c64e6bec-1abd-423d-807c-9aef45e6cccf res_id=f54f63bc-540a-41d5-9c3f-c3ad4158923b
...

When I try to force the process using ckan -c $CKAN_INI xloader submit --sync <PACKAGE_ID>, I originally got the following error and nothing else happened: 2025-01-08 00:10:19,095 ERROR [ckan.lib.api_token] Cannot decode JWT token: Not enough segments

After I added ckanext.xloader.site_url (set to my VPS IP address) to ckan.ini, as @wardi suggested, the error still appears but the datastore is updated. The resource still shows as pending in the UI. The output now looks something like this:

2025-01-08 00:22:42,454 ERROR [ckan.lib.api_token] Cannot decode JWT token: Not enough segments
      ckanext.xloader.cli INFO  Express Load starting: /dataset/test-new-uploads/resource/8279ab2c-1c80-4059-b479-9afb61f56079
      ckanext.xloader.cli INFO  Fetching from: http://127.0.0.1:8080/dataset/39fd2cf6-366e-4cb5-8be6-be42e1ee026e/resource/8279ab2c-1c80-4059-b479-9afb61f56079/download/task_clusters.csv
      ckanext.xloader.cli INFO  Downloaded ok - 300.0 bytes
...

It still throws the Cannot decode JWT token error, but it loads the data into the datastore. In the UI the resource still appears as pending, and loading still doesn't happen on its own when I upload a file or click the Upload to Datastore button.
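As background on that message: a compact JWT consists of three dot-separated segments (header, payload, signature), so "Not enough segments" suggests that the string being decoded was not shaped like a JWT at all. The sketch below checks only that shape and does not validate signatures. Both function names are hypothetical, and which configured value ends up being sent as the token is not established in this thread.

```shell
# Count the dot-separated segments of a token string.
jwt_segment_count() {
  # Keep only the dots, count them, and add one for the number of segments.
  dots=$(printf '%s' "$1" | tr -cd '.' | wc -c)
  echo $((dots + 1))
}

# Succeed only if the string has the three-segment shape of a compact JWT.
looks_like_jwt() {
  [ "$(jwt_segment_count "$1")" -eq 3 ]
}

# Example call (TOKEN is a placeholder for whatever value is being sent):
#   looks_like_jwt "$TOKEN" || echo "not a three-segment JWT"
```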

@duttonw
Collaborator

duttonw commented Jan 8, 2025

Hmm,

It seems 2.11 now needs extra 'secrets'. Can you give this a go on the CKAN primary node, and make sure the worker node has the same ckan.ini config (if they are on different instances)?

echo "Setting beaker.session.secret in ini file"
ckan config-tool $CKAN_INI "beaker.session.secret=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
ckan config-tool $CKAN_INI "SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
ckan config-tool $CKAN_INI "WTF_CSRF_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
JWT_SECRET=$(python3 -c 'import secrets; print("string:" + secrets.token_urlsafe())')
ckan config-tool $CKAN_INI "api_token.jwt.encode.secret=${JWT_SECRET}"
ckan config-tool $CKAN_INI "api_token.jwt.decode.secret=${JWT_SECRET}"

https://github.com/duttonw/ckan-docker-base/blob/main/ckan-2.11/setup/start_ckan.sh#L16

@fpichardom

I tried updating ckan.ini with the code above and I'm still getting the same error, 2025-01-08 02:42:22,455 ERROR [ckan.lib.api_token] Cannot decode JWT token: Not enough segments, and the resource still shows as pending in the UI. It seems that in the generated ckan.ini the other 'secrets' take the value of the main SECRET_KEY by default. I believe I generated the main SECRET_KEY with the code you shared.

WTF_CSRF_SECRET_KEY = string:%(SECRET_KEY)s
api_token.jwt.encode.secret = string:%(SECRET_KEY)s
api_token.jwt.decode.secret = string:%(SECRET_KEY)s

I'm also running the worker on the same instance, using the same ckan.ini file.
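One way to double-check the ini file is to read both JWT settings back and compare them. This is a minimal sketch under two assumptions: the token algorithm is symmetric, so the encode and decode secrets must be identical, and a literal comparison of the raw ini text is good enough (values such as %(SECRET_KEY)s are compared before interpolation). The helper names ini_value and jwt_secrets_match are hypothetical.

```shell
# Print the value of a key from an INI-style file (first match wins).
#   $1: path to the ini file, $2: key name
ini_value() {
  awk -F '=' -v key="$2" '
    { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }          # trim the key
    k == key {
      v = substr($0, index($0, "=") + 1)                 # everything after the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", v)                     # trim the value
      print v
      exit
    }
  ' "$1"
}

# Succeed only if both JWT secrets are set and textually identical.
jwt_secrets_match() {
  enc=$(ini_value "$1" "api_token.jwt.encode.secret")
  dec=$(ini_value "$1" "api_token.jwt.decode.secret")
  [ -n "$enc" ] && [ "$enc" = "$dec" ]
}

# Example call:
#   jwt_secrets_match "$CKAN_INI" && echo "secrets match" || echo "secrets differ or missing"
```

With the three default lines shown above, both values are the same string:%(SECRET_KEY)s text, so this check passes, which suggests the secrets themselves are not what differs between the web process and the worker.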

@riccardoricciagid
Author

riccardoricciagid commented Jan 8, 2025

My CKAN runs in a custom Docker container.

Currently the URL is set as follows:
ckan.site_url = http://192.168.2.20:5000
It will be changed when I expose CKAN publicly.
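Because the worker fetches the uploaded file over HTTP, it can help to confirm that this URL answers from wherever the worker runs (for example, from inside the worker container). This is a minimal sketch, assuming curl is installed; the helper name is hypothetical, and status_show is CKAN's standard status action.

```shell
# Hypothetical helper: print the HTTP status code returned by CKAN's status_show action.
#   $1: base URL to test, e.g. the value of ckan.site_url (a trailing slash is tolerated)
ckan_http_status() {
  curl -s -o /dev/null -w '%{http_code}' "${1%/}/api/3/action/status_show"
}

# Example call, run from the worker's environment:
#   ckan_http_status "http://192.168.2.20:5000"
```

A 200 means the URL is reachable from that environment. A printed 000 means curl could not connect at all.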

All workers are running.

Datasets that are not in CSV format are uploaded via a curl script (every dataset has >100 datasets). The datasets in CSV format are uploaded manually from the UI. These datasets were converted and written to the DB by xloader correctly. There are no errors in the logs, only INFO messages about chunks and an ANALYZE of the table at the end.

There is a strange warning about the webassets.yml in ckanext-xloader, but it does not seem to be an important error.

CKAN Version: 2.11.1
Xloader Version: the latest from https://github.com/ckan/ckanext-xloader.git (7 days ago).
