Description
Originally reported here:
transientskp/trap-ng#1 (comment)
Hi @gijzelaerr I solved it myself, but thank you for the quick response! Really appreciated.
You're welcome.
I'm in the process of taking over the tkp pipeline and banana webapp, and making them more usable for the astrophysics groups here at the University of Sydney.
That sounds amazing. Radio only or also other freqs? Do you want to fork it or contribute to the code? Are you in contact with any other astronomers who have been involved?
Please understand that I don't have a deep understanding of all the trap internals, but I have a general understanding of the whole pipeline and the steps involved (e.g. source extraction, association (one-to-one, etc.), forced extraction, and so on). Also consider that we are dealing with very big images (minimum size of 700 MB, approximately 30k x 30k pixels).
Take into account that I am familiar with Dask parallelization (I have built a pipeline before) and with web development in Django (front end, back end, ORM, etc.).
Great. That is what is needed.
So here are the questions:
1 - It seems that the tkp database schema (ORM) has been designed with a Django app in mind, is that right?
Yes and no. The Django app came after, and is just made to visualise the results.
2 - Wouldn't it be better to separate the pipeline from the web app?
It is already.
3 - Why not run the pipeline and store the end result in a flat table? What about a pandas-dataframe-friendly storage solution (e.g. Parquet files)?
If you store everything in a flat table you will get a lot of data duplication.
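As a hedged illustration of the Parquet route the question mentions (the column names below are assumed, not the actual TKP schema), a single flat per-measurement table repeats the per-image and per-source metadata on every row, which is the duplication referred to above:

```python
import pandas as pd

# Hypothetical flat light-curve table: image and source metadata is repeated on
# every measurement row, which a normalised schema would avoid.
measurements = pd.DataFrame({
    "source_id":   [1, 1, 2],
    "image_id":    [10, 11, 10],
    "freq_hz":     [150e6, 150e6, 150e6],
    "flux_jy":     [0.12, 0.11, 0.40],
    "flux_err_jy": [0.01, 0.01, 0.03],
})

measurements.to_parquet("lightcurves.parquet", index=False)  # needs pyarrow or fastparquet
df = pd.read_parquet("lightcurves.parquet")                  # pandas-friendly round trip
```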
4 - How are user permissions on the database handled? Trap uses a different database per project; why is that? Don't you want to keep everything in one database and let Django administer the user permissions? (e.g. Django has a user table with per-project permissions, set in the admin page)
This architecture is the result of evolution. I think the biggest reason is speed. A database becomes extremely slow after processing 10K images or so, so you need to make a new database. For these reasons and others, I'm of the opinion that a new version of trap should not use the database for source association, but keep an in-memory model for this, and only keep the final light curves in permanent storage (the database).
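To make that suggestion a bit more concrete, here is a minimal sketch (hypothetical names and data structures, not TKP code) of keeping the source catalogue in memory during a run and only persisting the finished light curves at the end:

```python
from astropy.coordinates import SkyCoord
import astropy.units as u

ASSOC_RADIUS = 10 * u.arcsec   # assumed association radius

ras, decs = [], []             # running in-memory catalogue positions (degrees)
lightcurves = {}               # catalogue index -> list of (image_id, flux_jy)

def associate(image_id, ra_deg, dec_deg, flux_jy):
    """Nearest-neighbour match of one image's extractions (equal-length sequences)
    against the in-memory catalogue, appending to light curves as we go."""
    if ras:
        catalogue = SkyCoord(ras * u.deg, decs * u.deg)
        idx, sep, _ = SkyCoord(ra_deg * u.deg, dec_deg * u.deg).match_to_catalog_sky(catalogue)
    else:
        idx = sep = None       # first image: everything is a new source
    for i in range(len(ra_deg)):
        if sep is not None and sep[i] < ASSOC_RADIUS:
            lightcurves[int(idx[i])].append((image_id, flux_jy[i]))   # known source
        else:
            ras.append(ra_deg[i]); decs.append(dec_deg[i])            # new source
            lightcurves[len(lightcurves)] = [(image_id, flux_jy[i])]

# After the last image, only `lightcurves` needs to go to permanent storage.
```

This glosses over one-to-many associations and proper association radii; it is only meant to show the shape of doing association outside the database.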
5 - Can the source association mechanism be simplified, based on the trap-ng (next-gen trap) Jupyter notebook using Astropy?
Maybe. I've been playing around with that during the last week I worked for the University of Amsterdam. That source finder is optimised for optical data, though, while pyse is oriented towards radio data.
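For context, the "optical" style of source finding being referred to typically looks like the photutils pattern below (a generic illustration, not the trap-ng notebook itself); pyse, by contrast, is built around radio images:

```python
import numpy as np
from astropy.stats import sigma_clipped_stats
from photutils.detection import DAOStarFinder

data = np.random.normal(0.0, 1.0, (256, 256))           # stand-in for an image array
mean, median, std = sigma_clipped_stats(data, sigma=3.0)

finder = DAOStarFinder(fwhm=3.0, threshold=5.0 * std)    # PSF width and detection threshold
sources = finder(data - median)                          # astropy Table of detections, or None
```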
6 - Do you have any suggestions about the pipeline architecture for a re-implementation aimed at batch processing of images?
Did you read my TODO document?
https://github.com/transientskp/tkp/blob/master/TODO.md
In short what I think needs to happen is:
- Move the source association outside the database
- Parallelise independent tasks with something like Dask (see the sketch at the end of this answer)
- Restructure database setup since it is too complex now.
If I were you I would start over with a fresh Python 3 project and cherry-pick the elements that look usable. There is a lot of evolutionary code in the TKP repo that is not required anymore.
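As a rough sketch of the second bullet above (the extract_sources helper is hypothetical, not an existing TKP function): per-image source extraction is embarrassingly parallel under Dask, while association stays a sequential, in-memory step:

```python
import dask
from dask import delayed

@delayed
def extract_sources(path):
    """Placeholder for running a source finder (e.g. pyse) on one image."""
    return {"image": path, "sources": []}

image_paths = [f"image_{i:04d}.fits" for i in range(100)]               # assumed input list
extractions = dask.compute(*[extract_sources(p) for p in image_paths])  # parallel extraction

for result in extractions:   # association is order-dependent, so keep it serial
    pass                     # feed result["sources"] into the in-memory association step
```

The same shape also works with dask.bag or the distributed scheduler on a cluster.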