Merge pull request #7 from cellgeni/refactor_metadata_pulling
Refactored metadata collection and fixed an issue with newly loaded datasets
apredeus authored Jan 27, 2025
2 parents 41f1977 + 5a40dd2 commit b80355c
Showing 60 changed files with 2,179 additions and 207 deletions.
57 changes: 57 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,57 @@
name: Test Collect Metadata Script

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          - series_id: GSE191067
            subset_list: ""
            comment: "No ENA metadata"
          - series_id: GSE264508
            subset_list: ""
            comment: ".fastq and .sra files in ENA metadata"
          - series_id: GSE274955
            subset_list: ""
            comment: "Broken .sra files. Files in .bam format available"
          - series_id: GSE250130
            subset_list: ""
            comment: "No Project or SubProjects in soft_family file"
          - series_id: E-MTAB-9221
            subset_list: ""
            comment: "Regular ENA dataset"
          - series_id: GSE111360
            subset_list: test_data/GSE111360/GSE111360.subset.list
            comment: "Subset list provided"
          - series_id: GSE117988
            subset_list: ""
            comment: "Crap .fastq files, but .sra files are available in ENA metadata"
          - series_id: GSE160513
            subset_list: ""
            comment: "Regular GEO dataset"
          - series_id: PRJNA511433
            subset_list: ""
            comment: "Regular GEO dataset but using BioProject"
    name: "Test ${{ matrix.series_id }}: ${{ matrix.comment }}"

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Set up environment
        run: |
          sudo apt-get update
          sudo apt-get install -y wget perl curl jq
      - name: Run metadata collection tests
        run: |
          chmod +x ./scripts/*
          chmod +x ./tests/test_metadata.sh
          ./tests/test_metadata.sh ${{ matrix.series_id }} ${{ matrix.subset_list }}
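In the last step of the workflow, an empty `subset_list` expands to nothing before the shell sees the command, so `tests/test_metadata.sh` receives only the series ID for most matrix entries. A minimal sketch of replaying one matrix entry locally under that assumption follows; `run_matrix_entry` and the `TEST_SCRIPT` override are hypothetical names introduced here for illustration and are not part of the commit.

```shell
# Hypothetical helper (not in the commit): replay one CI matrix entry locally.
# TEST_SCRIPT defaults to the path used in the workflow; override it,
# for example with "echo", to do a dry run without touching the network.
run_matrix_entry() {
  # $1 = series_id, $2 = subset_list (may be empty or omitted)
  if [ -n "${2:-}" ]; then
    "${TEST_SCRIPT:-./tests/test_metadata.sh}" "$1" "$2"
  else
    # Mirror the workflow: an empty subset_list contributes no argument,
    # so the script is called with the series ID only.
    "${TEST_SCRIPT:-./tests/test_metadata.sh}" "$1"
  fi
}
```

For example, `run_matrix_entry GSE111360 test_data/GSE111360/GSE111360.subset.list` would correspond to the "Subset list provided" entry, while `run_matrix_entry GSE160513` would correspond to the "Regular GEO dataset" entry.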
1 change: 1 addition & 0 deletions reprocess_public_10x.sh
@@ -40,6 +40,7 @@ if [[ $SUBSET != "" ]]
then
  >&2 echo "WARNING: Using file $SUBSET to only process select samples!"
  SUBSET=`readlink -f $SUBSET`
  cp $SUBSET $SERIES.subset.list
  if [[ `grep "^GSM" $SUBSET` == "" && `grep "^SRS" $SUBSET` == "" && `grep "^ERS" $SUBSET` == "" ]]
  then
    >&2 echo "ERROR: The subset file $SUBSET can only contain GSM, SRS, or ERS IDs!"
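The condition in this hunk raises the error only when the subset file has no line starting with `GSM`, `SRS`, or `ERS`; a file that mixes valid IDs with other lines still passes. The sketch below restates that check as a function and adds a stricter variant. Both function names are hypothetical, and the stricter behaviour is an assumption about intent drawn from the wording of the error message, not something the commit implements.

```shell
# Hypothetical helper (not in the commit): same condition as the hunk.
# Returns 0 when at least one line starts with GSM, SRS, or ERS.
subset_has_known_ids() {
  grep -qE '^(GSM|SRS|ERS)' "$1"
}

# Hypothetical stricter variant (assumed intent, based on the error message).
# Returns 0 only when no non-blank line lacks one of the three prefixes.
# An empty file passes this check, so pair it with subset_has_known_ids.
subset_only_known_ids() {
  ! grep -vE '^(GSM|SRS|ERS)' "$1" | grep -q '[^[:space:]]'
}
```

A caller could combine the two, for instance `subset_has_known_ids "$SUBSET" && subset_only_known_ids "$SUBSET"`, quoting the path so that file names containing spaces survive.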