From 2022237400b8ea4b7d21eb68f0ea40b31cc1bb4d Mon Sep 17 00:00:00 2001
From: Amy Wooding
Date: Tue, 1 Jun 2021 17:46:25 -0400
Subject: [PATCH] remove workshops based instructions from README

---
 {{ cookiecutter.repo_name }}/README.md | 59 --------------------------
 1 file changed, 59 deletions(-)

diff --git a/{{ cookiecutter.repo_name }}/README.md b/{{ cookiecutter.repo_name }}/README.md
index 2a3daea..a004a1a 100644
--- a/{{ cookiecutter.repo_name }}/README.md
+++ b/{{ cookiecutter.repo_name }}/README.md
@@ -23,27 +23,6 @@ EASYDATA REQUIREMENTS

 GETTING STARTED
 ---------------
-### Initial Git Configuration and Checking Out the Repo
-
-If you haven't yet done so, please follow the instrucitons
-in [Setting up git and Checking Out the Repo](reference/easydata/git-configuration.md) in
-order to check-out the code and set-up your remote branches
-
-Note: These instructions assume you are using SSH keys (and not HTTPS authentication) with {{ cookiecutter.upstream_location }}.
-If you haven't set up SSH access to {{ cookiecutter.upstream_location }}, see [Configuring SSH Access to {{cookiecutter.upstream_location}}](https://github.com/hackalog/easydata/wiki/Configuring-SSH-Access-to-Github). This also includes instuctions for using more than one account with SSH keys.
-
-Once you've got your local, `origin`, and `upstream` branches configured, you can follow the instructions in this handy [Git Workflow Cheat Sheet](reference/easydata/git-workflow.md) to keep your working copy of the repo in sync with the others.
-
-### Setting up your environment
-**WARNING**: If you have conda-forge listed as a channel in your `.condarc` (or any other channels other than defaults), you may experience great difficulty generating reproducible conda environments.
-
-We recommend you remove conda-forge (and all other non-default channels) from your `.condarc` file and [set your channel priority to 'strict'](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html). Alternate channels can be specified explicitly in your your `environment.yml` by prefixing your package name with `channel-name::`; e.g.
-```
- - wheel # install from the default (anaconda) channel
- - pytorch::pytorch # install this from the `pytorch` channel
- - conda-forge::tokenizers # install this from conda-forge
-
-
 ### Initial setup

 * Make note of the path to your conda binary:
@@ -66,44 +45,6 @@ Now you're ready to run `jupyter notebook` (or jupyterlab) and explore the noteb

 For more instructions on setting up and maintaining your environment (including how to point your environment at your custom forks and work in progress) see [Setting up and Maintaining your Conda Environment Reproducibly](reference/easydata/conda-environments.md).

-### Loading Datasets
-
-At this point you will be able to load any of the pre-built datasets by the following set of commands:
-```python
-from {{ cookiecutter.module_name }}.data import Dataset
-ds = Dataset.load("")
-```
-Because of licenses and other distribution restrictions, some of the datasets will require a manual dowload step. If so, you will prompted at this point and given instructions for what to do. Some datasets will require local pre-processing. If so, the first time your run the command, you will be executing all of the processing scripts (which can be quite slow).
-
-After the first time, data will loaded from cache on disk which should be fast.
-
-To see which datasets are currently available:
-```python
-from {{ cookiecutter.module_name }} import workflow
-workflow.available_datasets(keys_only=True)
-```
-
-Note: sometimes datasets can be quite large. If you want to store your data externally, we recommend symlinking your data directory (that is `{{cookiecutter.repo_name}}/data`) to somewhere with more room.
-
-For more on Datasets, see [Getting and Using Datasets](reference/easydata/datasets.md).
-
-### Using Notebooks and Sharing your Work
-This repo has been set up in such a way as to make:
-
-* environment management easy and reproducible
-* sharing analyses via notebooks easy and reproducible
-
-There are some tricks, hacks, and built in utilities that you'll want to check out: [Using Notebooks for Analysis](reference/easydata/notebooks.md).
-
-Here are some best practices for sharing using this repo:
-
-* Notebooks go in the...you guessed it...`notebooks` directory. The naming convention is a number (for ordering), the creator’s initials, and a short - delimited description, e.g. `01-jqp-initial-data-exploration`. Please increment the starting number when creating a new notebook.
-* When checking in a notebook, run **Kernel->Restart & Run All** or **Kernel->Restart & Clear Output** and then **Save** before checking it in.
-* Put any scripts or other code in the `{{ cookiecutter.module_name }}` module. We suggest you create a directory using the same initials you put in your notebook titles (e.g. `{{ cookiecutter.module_name }}/xyz`) You will be able to import it into your notebooks via `from {{ cookiecutter.module_name }}.xyz import ...`.
-* See the Project Organization section below to see where other materials should go, such as reports, figures, and references.
-
-For more on sharing your work, including using git, submitting PRs and the like, see [Sharing your Work](reference/easydata/sharing-your-work.md).
-
 ### Quick References
 * [Setting up and Maintaining your Conda Environment Reproducibly](reference/easydata/conda-environments.md)
 * [Getting and Using Datasets](reference/easydata/datasets.md)