Added Complete_plants Dataset #100

Open
wants to merge 6 commits into
base: main
from

Conversation

Mohitkumar6122
Contributor

Added the U.S. Department of Agriculture's PLANTS Database dataset, "The Complete PLANTS" (http://www.plants.usda.gov/dl_all.html).
In short, this addresses the issue "Add Agriculture datasets from awesome-public-datasets".
@henrykironde Could you have a look at this PR?
Thanks.

@henrykironde
Contributor

henrykironde commented Mar 17, 2021

@Mohitkumar6122 could you give me the command or set of commands you used to test this PR, along with the results?

For example:

DeepTest(fix-module) $ retriever install sqlite iris
=> Installing iris
Downloading bezdekIris.data: 3.00B [00:00, 7.63B/s]                                                                                               
Installing iris_Iris
Progress: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 151/151 [00:00<00:00, 36192.92rows/s]
Done!

@Mohitkumar6122
Contributor Author

@henrykironde How do I test a specific PR?
Also, aren't PR tests run automatically before merging? Or are those tests different from the ones you are asking for?

@henrykironde
Contributor

@Mohitkumar6122, when you create a script, you should test that it actually works.
Some commands that you can run are:
Check that the new script name shows up in the list:
retriever ls
Install the data into any engine:
retriever install csv name-of-new-script
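
The two checks above could also be wrapped in a small helper so they are easy to repeat after each change. This is only a sketch: the function names `smoke_test_commands` and `run_smoke_test` are hypothetical, and it assumes the `retriever` command-line tool is installed and on the PATH.

```python
import shutil
import subprocess


def smoke_test_commands(script_name, engine="csv"):
    """Build the two commands from the comment above as argument lists."""
    return [
        ["retriever", "ls"],
        ["retriever", "install", engine, script_name],
    ]


def run_smoke_test(script_name, engine="csv"):
    """Run each command in turn and stop at the first failure."""
    # Assumption: the retriever CLI is available on PATH.
    if shutil.which("retriever") is None:
        raise RuntimeError("retriever CLI not found on PATH")
    for cmd in smoke_test_commands(script_name, engine):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_smoke_test("name-of-new-script")` would then list the datasets and attempt a CSV install; the output still needs to be inspected by hand.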

@Mohitkumar6122
Contributor Author

@henrykironde, OK, I tested this and got the following output:
Image

@henrykironde
Contributor

Nice update.
This script is a good way to learn the process of creating a script/data package for this data.

Now I will point you to an existing version of the same script.
I want you to compare the two scripts and fix areas such as the naming convention.

File names use underscores (_).
For example, new_script.json.

Inside the script, the name matches the file name but uses hyphens (-) instead:

"name": "new-script",
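
The naming convention described here can be checked mechanically. The following is a minimal sketch, not part of the retriever itself; `expected_name` and `check_script_name` are hypothetical helper names, and the only assumption about the script is that it is JSON with a top-level "name" key, as in the example above.

```python
import json
from pathlib import Path


def expected_name(file_name):
    """Derive the script name from the file name: new_script.json -> new-script."""
    return Path(file_name).stem.replace("_", "-")


def check_script_name(file_name, script_text):
    """Return True when the script's "name" matches the file-name convention."""
    script = json.loads(script_text)
    return script.get("name") == expected_name(file_name)
```

For example, `check_script_name("new_script.json", '{"name": "new-script"}')` passes, while a script whose name keeps the underscore would fail the check.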

The script you are working on is already in the retriever:
https://github.com/weecology/retriever-recipes/blob/main/scripts/plant_taxonomy_us.json
Compare it with yours and find the areas you want to improve.

Note: the existing script does not work because the URL needs updating.
The overall goal is to show you how we go from raw data to a script template, and then from populating the script to testing it.

Once you are done, rename your script to plant_taxonomy_us.json and push the changes. This is effectively the same as repairing the existing script.

Please also read the docs on creating a script.

@henrykironde
Contributor

Looks good. There are a few things we need to take care of.

  1. Rename the file to plant_taxonomy_us.json.
    This will overwrite the old plant_taxonomy_us.json script.
  2. Since we are basically repairing the old script, change the version number from "version": "1.1.3" to "version": "1.1.4".
  3. Then run python version.py. This will update version.txt.
  4. Add all the changed files: git add -u
  5. Commit and push.
  6. Once this is all ready to merge, we shall copy the same script to weecology/retriever/scripts and update it there too. Run steps 3 to 5 again.
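
Step 2 is a patch-level version bump. As an illustration only, it could be sketched as below; `bump_patch` and `bump_script_version` are hypothetical helpers, and they assume the script is JSON with a three-part "version" string like the one quoted above. Editing the file by hand is equally fine, and re-serializing with `json.dumps` may change the file's formatting.

```python
import json


def bump_patch(version):
    """Increment the last component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def bump_script_version(script_text):
    """Return the script JSON text with its "version" field bumped by one patch."""
    script = json.loads(script_text)
    script["version"] = bump_patch(script["version"])
    return json.dumps(script, indent=4)
```

With the numbers from step 2, `bump_patch("1.1.3")` gives "1.1.4". Running version.py afterwards, as in step 3, is still required.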

@Mohitkumar6122
Contributor Author

Sure, I will do it :-).

@henrykironde
Contributor

You have a misspelling of "taxonomy", hence we now have two files.

@Mohitkumar6122
Contributor Author

@henrykironde, what are your suggestions?

@henrykironde
Contributor

I have yet to review this. I have some other work that needs to be done, but I will get back to you soon. I am planning for Monday afternoon.

@Mohitkumar6122
Contributor Author

Sure @henrykironde, whenever you like!

@henrykironde
Contributor

@Mohitkumar6122 could you run retriever install postgres plant-taxonomy-us and open the data in the Postgres database server? Please take a screenshot of the data.

@Mohitkumar6122
Contributor Author

@henrykironde, how should I access the data stored in the Postgres server?
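
One way to look at installed data is to query a few rows through a DB-API connection. The sketch below is an illustration, not the maintainers' procedure: `preview_rows` is a hypothetical helper, and the demo uses an in-memory SQLite table with made-up placeholder rows as a stand-in. For Postgres, a connection from a driver such as psycopg2 could be passed instead; the host, user, database, and the exact table name created by the install are all assumptions to check locally.

```python
import sqlite3


def preview_rows(conn, table, limit=5):
    """Return up to `limit` rows from `table` using a DB-API 2.0 connection."""
    # Accept only plain identifiers (optionally schema-qualified) because the
    # table name is interpolated into the SQL text.
    parts = table.split(".")
    if not all(part.isidentifier() for part in parts):
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table} LIMIT {int(limit)}")
    rows = cur.fetchall()
    cur.close()
    return rows


def demo():
    """Stand-in demo: an in-memory SQLite table with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (id INTEGER, label TEXT)")
    conn.executemany(
        "INSERT INTO demo VALUES (?, ?)",
        [(1, "placeholder-a"), (2, "placeholder-b"), (3, "placeholder-c")],
    )
    return preview_rows(conn, "demo", limit=2)

# For Postgres (assumed connection details, requires the psycopg2 package):
#   import psycopg2
#   conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres")
#   preview_rows(conn, "some_schema.some_table")
```

A screenshot of the rows returned, or of the same query run in a database client, would serve the purpose requested above.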

@Mohitkumar6122
Contributor Author

This is the output I am getting at the terminal: IMG

@henrykironde
Contributor

I am going to fix this in about 2 hours.

@Mohitkumar6122
Contributor Author

Any updates on this, @henrykironde?

@henrykironde
Contributor

@Mohitkumar6122
Contributor Author

I fixed this. Update your config file https://retriever.readthedocs.io/en/latest/developer.html#passwordless-configuration

I mean, what about this PR?

@henrykironde
Contributor

henrykironde commented Mar 30, 2021

@Mohitkumar6122 the data looks fine. Always include a screenshot of the installed data.
What we now have to do is learn how to clean up the commit messages.

If you have set up git as recommended in the retriever repository, and you have upstream set up in your .git/config file as below:

[remote "upstream"]
	url = https://github.com/weecology/retriever-recipes.git
	fetch = +refs/heads/*:refs/remotes/upstream/*
	fetch = +refs/pull/*/head:refs/remotes/origin/pr/*

You should be able to clean up this PR using the commands:

git fetch upstream
git reset --soft upstream/main  # move the branch to the latest upstream main commit while keeping your changes staged
python version.py
git add -u
# check that you have only two added files
git commit
git push origin Changes -f  # force the push

Screen Shot 2021-03-30 at 12 06 26 PM
