Data Loader #1

Open
andrealoddo opened this issue Jun 10, 2023 · 6 comments

Comments

@andrealoddo

Hi everyone, and thanks for sharing your outstanding contribution!

I would like to know how the datasets are supposed to be organized.
We are having trouble with the init function of the DataLoader because we do not understand how the images should be managed. Can you help us with this? I really appreciate any help you can provide.

@RahelehSalehi
Collaborator

Dear Andrealoddo,
Thanks for reaching out. I would be happy to know how our method performs on your data.
Your data loader class should have two main methods, get_item and len. If you have 1000 cells, "len" should return 1000, and "get_item" should return the following for a given index between 0 and 999:

1- The feature vector obtained from Mask R-CNN (feats in the code), with a shape of 256x14x14
2- The cropped RGB image of the cell, cut out using the bounding box obtained from Mask R-CNN (roi_cropped), with a shape of 3x128x128
3- The label of the given cell, indicating which class it belongs to (label)
4- The dataset the cell comes from (ds). This should be a one-hot encoded vector of size 1 x the number of datasets you have; in our case, 1 x 3
5- The key (key), which is the unique identifier of the cell, returned for later retrieval. It is not necessary for training.
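
As an illustration of item 4, the one-hot dataset indicator can be built as in the minimal sketch below. The function name `encode_dataset` and the dataset names are placeholders chosen for this example; they are not taken from the repository.

```python
from typing import List, Sequence


def encode_dataset(name: str, dataset_names: Sequence[str]) -> List[List[float]]:
    """Return a 1 x len(dataset_names) one-hot row marking `name`."""
    if name not in dataset_names:
        raise ValueError(f"unknown dataset: {name!r}")
    row = [0.0] * len(dataset_names)
    row[list(dataset_names).index(name)] = 1.0
    return [row]  # the outer list provides the leading dimension of size 1


# Placeholder names (assumption): replace them with your own datasets.
DATASETS = ["dataset_a", "dataset_b", "dataset_c"]
ds = encode_dataset("dataset_b", DATASETS)
print(ds)  # [[0.0, 1.0, 0.0]]
```

With three datasets this gives the 1 x 3 shape mentioned above; converting the nested list to an array or tensor is left to the caller.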

Please check the get_item code for details about the normalization of feature vectors and bounding boxes.
Depending on how your data is stored, you need to rewrite the init function so that it can serve get_item calls. I wrote the init function so that it first loads everything into memory and stores it in a dictionary.
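
A minimal sketch of that idea, assuming "get_item" and "len" correspond to Python's `__getitem__` and `__len__` protocol. The class name `InMemoryCellDataset`, the record layout, and the placeholder values are assumptions made for this example; the normalization performed in the repository's get_item is not reproduced here.

```python
from typing import Any, Dict, Tuple


class InMemoryCellDataset:
    """Hypothetical dataset that keeps every cell record in a dictionary."""

    def __init__(self, records: Dict[str, Dict[str, Any]]) -> None:
        # records maps a unique cell key to its already-loaded data.
        self.records = dict(records)
        # A fixed key order lets an integer index address a cell.
        self.keys = sorted(self.records)

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, index: int) -> Tuple[Any, Any, Any, Any, str]:
        key = self.keys[index]
        rec = self.records[key]
        return rec["feats"], rec["roi_cropped"], rec["label"], rec["ds"], key


# Placeholder strings stand in for the real 256x14x14 and 3x128x128 arrays.
records = {
    "cell_0001": {"feats": "<256x14x14>", "roi_cropped": "<3x128x128>",
                  "label": 2, "ds": [[1.0, 0.0, 0.0]]},
    "cell_0002": {"feats": "<256x14x14>", "roi_cropped": "<3x128x128>",
                  "label": 0, "ds": [[0.0, 0.0, 1.0]]},
}
dataset = InMemoryCellDataset(records)
feats, roi_cropped, label, ds, key = dataset[1]
```

Loading everything up front trades memory for simple, fast lookups; if the data does not fit in memory, the dictionary could store file paths instead and load each cell inside `__getitem__`.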

Best,
Raheleh

@costantin0

Dear Raheleh,

Thank you for the prompt response. We will certainly let you know how your method performs on our data!

The instructions you provided are clear, and we thank you for that.

However, we are stuck on the following point. The readme says: "To train the model, please run train.py, then to extract the features, you can use FeatureExtraction.py code and finally to evaluate the quantitatively of the extracted features by AE-CFE, please run RandomForest.py.".
Therefore, we tried to train the model by running train.py. However, train.py calls the DataLoader, and we are having trouble with lines 84-111 of DataLoader.py. In particular, what do the first two for loops do? In lines 91-101, it seems that you load feature data, but that data cannot be extracted using FeatureExtraction.py because it also needs the DataLoader. How can we address this issue?

Many thanks again for your help and response.
Best,
Costantino and Andrea

@RahelehSalehi
Collaborator

Dear Andrea,
Sorry for the misunderstanding about the extracted features' names.
Here is an explanation in a little more detail:

  • We trained Mask R-CNN on almost 1000 images of the Matek-19 dataset. The feature vector, with a shape of 256x14x14, was then obtained from Mask R-CNN. Unfortunately, the trained model and the code for extracting the Mask R-CNN features are not on GitHub.
  • The feature extraction code in the GitHub repo extracts the feature representations from the latent space once the model has been trained in an unsupervised way. So these are two different sets of feature vectors, with different sizes and sources. The random forest model then classifies these latent representations.

I hope this helps. Please don't hesitate to ask if you have any further questions.

@costantin0

costantin0 commented Jul 7, 2023

Hi Raheleh,

Thanks for uploading the feature code here; it will be of great help to us.
There is another thing we would like to ask you. We are having some trouble with the program dependencies, since the latest versions of TensorFlow and of other modules give us errors (for example, the latest version of TensorFlow gives the error "no module named keras.engine", while version 2.12 does not).
Would it be possible to also have a requirements.txt file and to know the Python version that you are using, in order to avoid any ambiguity?
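
For reference, the kind of information we are asking for could be collected on a working machine with a short script like the sketch below. It uses only the Python standard library; the function name `describe_environment` is our own, and the package list is only an example.

```python
import sys
from importlib import metadata
from typing import List, Sequence


def describe_environment(packages: Sequence[str]) -> List[str]:
    """List the Python version and the installed version of each package."""
    v = sys.version_info
    lines = [f"python=={v.major}.{v.minor}.{v.micro}"]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible note instead of silently skipping the package.
            lines.append(f"# {name} is not installed")
    return lines


# Example package names; extend the list with whatever the code imports.
print("\n".join(describe_environment(["tensorflow", "keras"])))
```

The `name==version` lines follow the format pip accepts in a requirements.txt file, so the output (minus the Python line) could be pasted into one.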

Many thanks again

@RahelehSalehi
Collaborator

RahelehSalehi commented Jul 10, 2023

Hi,
I updated the README file. Please find it there.
Best,
Raheleh

@RahelehSalehi
Collaborator

RahelehSalehi commented Jul 10, 2023 via email
