Skip to content

Conversation

@MaximeBICMTL
Copy link
Contributor

@MaximeBICMTL MaximeBICMTL commented Oct 30, 2025

Replaces #1211

BIDS dataset import script rewrite

List of new features

  • Fully incremental BIDS import. That is, a BIDS dataset can be imported in several parts, and only the files that have not been imported yet will be added to LORIS.
  • Better logging and error reporting. Prints and errors are now logged in the LORIS-MRI logging system. Moreover, additional information is given during the script run, and previously silent errors (such as unidentified scan types) are now signaled to the user.
  • Better support for sessionful and sessionless BIDS datasets. Before, some code paths were only implemented for one of those case, now the system is generic over the presence or absence of BIDS session directories, ensuring better support for both these cases.
  • The --create_candidate or --create_session options are now transactional. Either all candidates and sessions are created without error, or none is created if an error occurs (due to incorrect input data).
  • The --create_candidate option now looks for sites and projects using both names and aliases, in that order.

List of breaking changes

  • The script has been renamed from bids_import.py to import_bids_dataset.py. This should better follow the LORIS-MRI scripts naming conventions.
  • An error is now returned for MRI files if the --type=derivatives or --no_copy options are used. It seems to me that the current code was not tested and probably not working, so it is better to have a hard error rather than an untested and potentially wrong implementation IMO.
  • The --idsvalidation option has been removed. This option allowed to validate the BIDS directory name according to a pscid_candid_blablabla format. This is obviously esoteric and overly-specific as BIDS dataset can contain several participants, and their identifiers should be retrieved from the subject directory names or the participants.tsv file instead.
  • The --create_candidate option no longer tries to find the site by matching site aliases against participants IDs. This approach can give false positives if a similar (and not even necessarily equal) alias as a project. The site participants.tsv column is now mandatory to create candidates.

List of major architectural changes

Note that these changes only apply to the generic BIDS import code (subjects and sessions), and the MRI import code. It does not apply to EEG import code.

  • The code has been fully rewritten.
  • The code is now fully typed 😌.
  • The code now uses the new ORM database abstraction instead of the old database abstraction or inline queries.
  • The code now uses the LORIS-MRI log abstraction instead of print.
  • The BIDS dataset import code has been moved and isolated in the new lib.import_bids_dataset module.
  • Created the new lib.imaging_lib module to replace the old lib.imaging module. Unlike lib.imaging, the new module is fully typed, uses the new ORM database abstraction, and is divided into submodules related to different kinds of imaging data (BIDS, DICOM, file parameters...).
  • Candidates and sessions are obtained at the subject and session levels. This was previously done at the data type level like before and introduced significant duplication between MRI and EEG code and nonsensical dependencies.

Future works

  • Migrate the EEG code.
  • Add an overwrite option to replace the existing files by the new ones if present.
  • Remove the default visit label configuration option and make it an argument (along with default site, default cohort...) ?

@MaximeBICMTL MaximeBICMTL force-pushed the refactor_bids_importer branch 5 times, most recently from e387a28 to 0850687 Compare October 30, 2025 21:00
@MaximeBICMTL
Copy link
Contributor Author

MaximeBICMTL commented Nov 6, 2025

From @cmadjar in #1324

Two questions coming from testing the BIDS import on CBIGR today, not sure if that has implications on the PRs sent to this repo:

  1. even if the bids validation is turned off, it requires dataset_description.json in the BIDS dataset to import. Is that expected?
  2. got an error that -b option does not exist when using -b short cut option. We had to use --nobidsvalidation long option instead
  3. the creation of session via the BIDS importer did not work (at jeffersoncasimir did you try the creation of visits and candidates when testing the PR?)

Maybe those are just happening on CBIGR but just in case, figured I would mention it here.

  1. Yes, this is expected. Even in the current importer, the dataset_description.json file is used to extract information about the BIDS dataset to do the import, independently of BIDS validation, see the relevant code of the current importer here:
    dataset_json = bids_dir + "/dataset_description.json"
    if not os.path.isfile(dataset_json) and not type:
    print('No dataset_description.json found. Please run with the --type option.')
    print(usage)
    sys.exit(lib.exitcode.MISSING_ARG)
  2. Whoops, this is indeed a small mistake on my part, thanks for catching it! This PR moves the importer from manual argument parsing to the LorisGetOpt object. In the process, it seems that I had renamed some arguments without updating the documentation. I had kind of forgotten about it TBH. Anyway, I undid all renamings in 6e25d28, so -b and short options work again.
  3. Whoops again. The code was missing some fields in the database insertion queries (the only ones that are not type checked yet...). This is now fixed. I just tested --create_candidate and --create_session and it now works for me. I also updated the code in C-BIG dev and C-BIG test if you want to test it there.

@maximemulder maximemulder changed the base branch from maxime_changes to main November 6, 2025 14:49
@MaximeBICMTL MaximeBICMTL marked this pull request as ready for review November 6, 2025 14:58
@MaximeBICMTL MaximeBICMTL force-pushed the refactor_bids_importer branch 2 times, most recently from 484a4a2 to de1d73b Compare November 6, 2025 15:30
@MaximeBICMTL MaximeBICMTL force-pushed the refactor_bids_importer branch from de1d73b to d2390c6 Compare November 15, 2025 10:45
@MaximeBICMTL MaximeBICMTL force-pushed the refactor_bids_importer branch from d2390c6 to 55d4303 Compare November 16, 2025 05:53
@github-actions github-actions bot added the Language: Python Issue or PR related to the Python codebase label Nov 16, 2025
@MaximeBICMTL MaximeBICMTL force-pushed the refactor_bids_importer branch from 2af7b5b to 55d4303 Compare November 16, 2025 06:07
@MaximeBICMTL MaximeBICMTL added Complexity: Complex Issue or PR that requires a great effort to implement, review, or test Pipeline: BIDS importer PR or issue related to the BIDS importer labels Nov 16, 2025
maximemulder and others added 7 commits November 21, 2025 13:23
bids session dataclass

bids participants dataclass

factorize combination iteration

fix layout ignore

skip files already inserted

wip

fix mri path join

rebase commit

migrate to new database abstraction

rewrite

optional dataset_description.json

return an error on unknown scan types

fix wrong counter and memory use
@MaximeBICMTL MaximeBICMTL force-pushed the refactor_bids_importer branch from f959945 to f708365 Compare November 21, 2025 13:24
@MaximeBICMTL MaximeBICMTL force-pushed the refactor_bids_importer branch 2 times, most recently from ea87fee to 97307c8 Compare November 22, 2025 05:34
@MaximeBICMTL MaximeBICMTL force-pushed the refactor_bids_importer branch from 97307c8 to 915f2b4 Compare November 22, 2025 06:05
@MaximeBICMTL
Copy link
Contributor Author

Current status: awaiting discussion on the modularization of LORIS Python (#1339) to decide how to merge this PR. I would personally like to merge the modularization first and then the BIDS importer refactor.

@MaximeBICMTL MaximeBICMTL marked this pull request as draft December 5, 2025 05:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Complexity: Complex Issue or PR that requires a great effort to implement, review, or test Language: Python Issue or PR related to the Python codebase Pipeline: BIDS importer PR or issue related to the BIDS importer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants