Commit 04dc035

Merge pull request #147 from DistributedScience/metadata_clarification
Clarify how to pass metadata in various DCP modes
2 parents 851e295 + 9c7634c commit 04dc035

1 file changed: +12 -5 lines changed
documentation/DCP-documentation/passing_files_to_DCP.md

Lines changed: 12 additions & 5 deletions
@@ -2,6 +2,15 @@

 Distributed-CellProfiler can be told what files to use through LoadData.csv, Batch Files, or file lists.

+## Metadata use in DCP
+
+Distributed-CellProfiler requires metadata and grouping in order to split jobs.
+This means that, unlike a generic CellProfiler workflow, metadata and grouping are NOT optional for pipelines you wish to use in Distributed-CellProfiler.
+- If using LoadData, this means ensuring that your input CSV has some metadata to use for grouping and "Group images by metadata?" is set to "Yes".
+- If using batch files or file lists, this means ensuring that the Metadata and Groups modules are enabled, and that you are extracting metadata from file and folder names _that will also be present in your remote system_ in the Metadata module in your CellProfiler pipeline.
+You can pass additional metadata to CellProfiler by selecting `Add another extraction method`, setting the method to `Import from file`, and setting Metadata file location to `Default Input Folder`.
+Metadata of either type can be used for grouping.
+
 ## Load Data

 ![LoadData.csv](images/LoadDataCSV.png)
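Reviewer note: the LoadData requirement added in this hunk (the CSV must carry metadata that can be used for grouping) can be checked before submitting jobs. The sketch below is illustrative only and is not Distributed-CellProfiler's own code. It assumes the usual CellProfiler convention that metadata columns are prefixed with `Metadata_`; the column names `Metadata_Plate` and `Metadata_Well`, the channel name `OrigDNA`, and the paths are hypothetical.

```python
import csv
import io
from collections import defaultdict


def metadata_columns(fieldnames):
    """Return the CSV columns that follow the `Metadata_` naming convention."""
    return [name for name in fieldnames if name.startswith("Metadata_")]


def group_rows(csv_text, group_by):
    """Group CSV rows by the given metadata columns.

    Raises ValueError if any requested grouping column is not a
    metadata column present in the CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    available = metadata_columns(reader.fieldnames or [])
    missing = [col for col in group_by if col not in available]
    if missing:
        raise ValueError(f"CSV lacks grouping metadata columns: {missing}")
    groups = defaultdict(list)
    for row in reader:
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)
    return dict(groups)


# Hypothetical LoadData-style CSV: three images across two wells.
EXAMPLE_CSV = """FileName_OrigDNA,PathName_OrigDNA,Metadata_Plate,Metadata_Well
A01_s1.tif,/bucket/project/images/Plate1,Plate1,A01
A01_s2.tif,/bucket/project/images/Plate1,Plate1,A01
B02_s1.tif,/bucket/project/images/Plate1,Plate1,B02
"""

groups = group_rows(EXAMPLE_CSV, ["Metadata_Plate", "Metadata_Well"])
for key, rows in groups.items():
    print(key, len(rows))
```

Each distinct key here stands in for one unit of work; how Distributed-CellProfiler actually maps groups to jobs is determined by its own configuration, not by this sketch.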
@@ -58,15 +67,13 @@ Note that if you do not follow our standard file organization, under **#not proj

 ## File lists

-You can also simply pass a list of absolute file paths (not relative paths) with one file per row in `.txt` format.
-Note that file lists themselves do not associate metadata with file paths (in contrast to LoadData.csv files where you can enter any metadata columns you desire.)
-Therefore, you need to extract metadata for Distributed-CellProfiler to use for grouping by extracting metadata from file and folder names in the Metadata module in your CellProfiler pipeline.
-You can pass additional metadata to CellProfiler by `Add another extraction method`, setting the method to `Import from file` and setting Metadata file location to `Default Input Folder`.
+You can also simply pass a list of absolute file paths (not relative paths) with one file per row in `.txt` format.
+These must be the absolute paths that Distributed-CellProfiler will see, that is, paths starting from the root of your bucket (which will be mounted as `/bucket`).

 ### Creating File Lists

 Use any text editing software to create a `.txt` file where each line of the file is a path to a single image that you want to process.

 ### Using File Lists

-To use a file list with submitJobs, put the path to the `.txt` file in **data_file:**.
+To use a file list with submitJobs, put the path to the `.txt` file in **data_file:**.
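
Reviewer note: the file list rules in this diff (one absolute path per line, rooted at the `/bucket` mount, with grouping metadata extracted from file and folder names) can be sketched as follows. This is a minimal illustration, not part of Distributed-CellProfiler. The object keys, the folder layout, and the regular expression with its `Plate`, `Well`, and `Site` groups are hypothetical; only the `/bucket` mount point comes from the documentation text above.

```python
import posixpath
import re

BUCKET_MOUNT = "/bucket"  # mount point named in the documentation text

# Hypothetical naming scheme: <Plate>/<Well>_s<Site>.tif
# Named groups mirror how a Metadata-module regular expression names fields.
PATH_PATTERN = re.compile(
    r"(?P<Plate>[^/]+)/(?P<Well>[A-P][0-9]{2})_s(?P<Site>[0-9]+)\.tif$"
)


def to_mounted_path(object_key, mount=BUCKET_MOUNT):
    """Convert a bucket-relative key into the absolute path seen under the mount."""
    return posixpath.join(mount, object_key.lstrip("/"))


def build_file_list(object_keys):
    """Return file list text with one absolute path per line."""
    return "\n".join(to_mounted_path(key) for key in object_keys) + "\n"


def invalid_lines(file_list_text, mount=BUCKET_MOUNT):
    """Return non-empty lines that do not start at the bucket mount."""
    prefix = mount.rstrip("/") + "/"
    return [
        line
        for line in file_list_text.splitlines()
        if line.strip() and not line.startswith(prefix)
    ]


def extract_metadata(path):
    """Extract metadata from a path, or return None if the pattern does not match."""
    match = PATH_PATTERN.search(path)
    return match.groupdict() if match else None


# Hypothetical bucket-relative keys.
keys = [
    "project/images/Plate1/A01_s1.tif",
    "project/images/Plate1/A01_s2.tif",
]
file_list_text = build_file_list(keys)
print(file_list_text, end="")
print(extract_metadata(to_mounted_path(keys[0])))
```

The resulting text can be saved as a `.txt` file and referenced from **data_file:**. Because the metadata is parsed from the mounted path rather than a local path, the same folder and file names must exist in the bucket, which is the point the added documentation emphasizes.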