Merge pull request #131 from djarecka/update_readme
updating README files
djarecka authored Jan 15, 2025
2 parents 87f4b32 + 68e9382 commit 64d421d
Showing 6 changed files with 152 additions and 48 deletions.
79 changes: 39 additions & 40 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,42 +1,21 @@
# BICAN Knowledgebase Data Models

This repo contains data models generated using [LinkML](https://linkml.io/linkml/) for BICAN knowledgebase. LinkML is a linked data modeling language that supports YAML, RDF, and JSON.
This repo contains data models generated using [LinkML](https://linkml.io/linkml/) for the [BICAN knowledgebase project](https://www.portal.brain-bican.org/teams/bican-knowledgebase) funded by the National Institute of Mental Health.

In `/linkml-schema`, there are linkML `yaml` schema files that adhere to the linkML version of `1.5.0`.
* The __Library Generation Model__ is designed to represent types and relationships of samples and digital data assets generated during processes that generate multimodal genomic data.
* The __Genome Annotation Model__ is designed to represent types and relationships of an organism's annotated genome i.e. gene annotations, genome annotations, genome assemblies, organisms.
* The __Anatomical Structure Model__ is designed to represent types and relationships of anatomical brain structures.
* The existing models such as __Biolink__ and __CCN__ are imported.

In `/json-schema` and `/models_py`, there are `json` and `py` files generated using linkML schema for e.g.:
* `gen-json-schema linkml-schema/genome_annotation.yaml > json-schema/genome_annotation.json`
* `gen-pydantic linkml-schema/genome_annotation.yaml > models_py/genome_annotation.py`

In `/data-examples`, there are source data files:
* `figure1exampledata.yaml` is representing relational data in figure 1 of `A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain`
* https://www.biorxiv.org/content/10.1101/2023.03.06.531121v1.full.pdf

Notes:
Initialize the packages using:
* `python -m pip install .`
* `python -m pip install .[test]`

Run `pytest` to run all tests in `/tests`

## Status Board

Here are the BICAN LinkML knowledgebase schemas and their statuses.

| Model | Version | Release | Status |
|:--|:--|:--|:--|
| [Assertion Evidence Model] | [] | [] | under development |
| [Library Generation Model] | [] | [] | under development |
| [Anatomical Structure Model] | [] | [] | under development |
| [Genome Annotation Model] | [] | [] | under development |
| [BICAN BioLink] | [] | [] | under development |
| [CCN2] | [] | [] | deprecated |
| [Figure1] | [] | [] | deprecated |
| | | | |
| Model | Short Description | Release with Latest Updates | Status |
|:---------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------|:----------------------------|:--|
| [Assertion Evidence Model] | Types and relationships of assertions and evidence. | 0.2.0 | under development |
| [Library Generation Model] | Types and relationships of samples and digital data assets generated during processes that generate multimodal genomic data. | 0.2.0 | under development |
| [Anatomical Structure Model] | Types and relationships of anatomical brain structures. | 0.1.0 | under development |
| [Genome Annotation Model] | Types and relationships of an organism's annotated genome. | 0.2.0 | under development |
| [BICAN BioLink] | BICAN subset of classes from the Biolink model. | 0.2.0 | under development |
| [CCN2] | | 0.1.0 | deprecated |
| [Figure1] | | 0.1.0 | deprecated |

[Assertion Evidence Model]: linkml-schema/assertion_evidence.yaml

@@ -52,23 +31,43 @@ Here are the BICAN LinkML knowledgebase schemas and their statuses.

[Figure1]: linkml-schema/figure1.yaml

## Contact
## Structure of the Repository

### LinkML Schema

In this project we use [LinkML](https://linkml.io/linkml/), the Linked Data Modeling Language, to define the data models.
LinkML is a flexible modeling language that allows you to author schemas in YAML format, and it is designed to be both human-readable and machine-readable.
The LinkML framework also provides a set of tools to generate artifacts in other formats, such as Python classes, JSON Schema, and RDF, from the schema files.

Satra Ghosh (PI -- MIT)
All the LinkML schema files are stored in the `linkml-schema` directory.
Some of the models are written directly in the YAML format, while others are automatically generated from Google sheets using the LinkML tool [schemasheets](https://linkml.io/schemasheets/),
and the `schema2model` tool from the [`bkbit` package](https://github.com/brain-bican/bkbit).
You can find the specific information in [linkml-schema/README.md](linkml-schema/README.md).
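
To give a rough idea of what a LinkML schema contains, here is a minimal sketch that mirrors the top-level layout of a schema file (`id`, `name`, `prefixes`, `imports`, `classes`) as a Python dictionary. The class `BrainRegion` and its attributes are hypothetical and are not taken from the BICAN models; the real schemas live as YAML files in `linkml-schema`.

```python
import json

# Hypothetical, minimal LinkML-style schema expressed as a Python dict.
# The class and attribute names are illustrative only and do not come
# from the BICAN models.
schema = {
    "id": "https://example.org/demo-schema",
    "name": "demo_schema",
    "prefixes": {"linkml": "https://w3id.org/linkml/"},
    "imports": ["linkml:types"],
    "default_range": "string",
    "classes": {
        "BrainRegion": {
            "description": "An illustrative class with two attributes.",
            "attributes": {
                "id": {"identifier": True, "required": True},
                "name": {"range": "string"},
            },
        }
    },
}


def list_attributes(schema_dict, class_name):
    """Return the sorted attribute names declared directly on a class."""
    return sorted(schema_dict["classes"][class_name].get("attributes", {}))


# Print the schema structure and the attributes of the example class.
print(json.dumps(schema, indent=2))
print(list_attributes(schema, "BrainRegion"))
```

In an actual schema file the same structure is written in YAML, which is what keeps it readable for people and parseable by the LinkML generators.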

Lydia Ng (PI -- Allen Institute for Brain Science)
### Additional Formats

Puja Trivedi (MIT)
The LinkML schema files are used to generate additional formats, such as JSON Schema and Pydantic models.
All files are generated automatically by a GitHub Actions workflow whenever the LinkML schema files are updated.
You can see the specific workflow in the [reusable workflow](.github/workflows/reusable-generate_other_formats.yaml) that is reused for all models.

Dorota Jarecki (MIT)
Currently, we are supporting the following formats:
- [Pydantic models](models_py-autogen): these models are used in the [Brain Knowledge Base Interaction Toolkit (bkbit)](https://github.com/brain-bican/bkbit)
- [JSON Schema](json-schema-autogen)
- [JSON-LD Context](jsonld-context-autogen)
- [ER Diagrams](erdiagram-autogen)
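
As a rough illustration of what generating these formats involves, the sketch below builds one LinkML generator command per output directory for a single schema file. It is not the repository's workflow: the output file suffixes and the assumption that each generator writes to standard output are illustrative, and the commands are only printed, never executed.

```python
from pathlib import Path

# Generator name -> (output directory, output file suffix).
# The directory names come from this README; the suffixes are assumptions.
GENERATORS = {
    "gen-pydantic": ("models_py-autogen", ".py"),
    "gen-json-schema": ("json-schema-autogen", ".json"),
    "gen-jsonld-context": ("jsonld-context-autogen", ".context.jsonld"),
    "gen-erdiagram": ("erdiagram-autogen", ".md"),
}


def build_commands(schema_path):
    """Return (argv, output_path) pairs for one LinkML schema file."""
    schema = Path(schema_path)
    commands = []
    for generator, (out_dir, suffix) in GENERATORS.items():
        output = Path(out_dir) / (schema.stem + suffix)
        commands.append(([generator, schema.as_posix()], output.as_posix()))
    return commands


for argv, output in build_commands("linkml-schema/genome_annotation.yaml"):
    # Assumes the generator prints to stdout, so the output is a redirect.
    print(" ".join(argv), ">", output)
```

The reusable workflow linked above remains the authoritative description of how the files are produced.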

Prajal Bishkawarma (Allen Institute for Brain Science)

Tim Fliss (Allen Institute for Brain Science)

Pamela Baker (Allen Institute for Brain Science)
### Validation
All the schemas are automatically tested in GitHub Actions workflows
using LinkML validation tools ([see tests_lint.yaml for details](.github/workflows/tests_lint.yaml))
and the Python API with pytest ([see tests_models.yaml for details](.github/workflows/tests_models.yaml)).

Patrick Ray (Allen Institute for Brain Science)
To run the pytest tests locally, use the following commands:
```bash
pip install -e ".[test]"
pytest -vs
```

## Terms of Use

3 changes: 3 additions & 0 deletions erdiagram-autogen/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
## Autogenerated ER diagrams

This directory contains the autogenerated ER diagrams from the [LinkML schemas](../linkml-schema).
3 changes: 3 additions & 0 deletions json-schema-autogen/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
## Autogenerated JSON schemas

This directory contains the autogenerated JSON schemas from the [LinkML schemas](../linkml-schema).
3 changes: 3 additions & 0 deletions jsonld-context-autogen/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
## Autogenerated JSON-LD contexts

This directory contains the autogenerated JSON-LD contexts from the [LinkML schemas](../linkml-schema).
106 changes: 99 additions & 7 deletions linkml-schema/README.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,107 @@
## `LinkML` schemas for the BICAN project:
# `LinkML` schemas for the BICAN project

### bican_biolink.yaml
The model contains a subset of classes from the [Biolink model](https://biolink.github.io/biolink-model/)
with some small modification to fit the needs of the BICAN project (currently only the `category` slot is modified).
This folder contains all the original LinkML schemas written in YAML format.
You can learn more about the LinkML language [here](https://linkml.io/linkml/).

The yaml file can be recreated by running the [LinkML trimmer](https://github.com/brain-bican/bkbit/blob/main/bkbit/model_editors/README.md)
from the `bkbit` package:
Some of the models are written directly in the YAML format, while others are automatically generated from Google sheets using the LinkML tool [schemasheets](https://linkml.io/schemasheets/),
and the `schema2model` tool from the [`bkbit` package](https://github.com/brain-bican/bkbit).


## Main models

The list below contains the main models that are exported to different formats, such as JSON Schema, Pydantic models, and JSON-LD context.
We also have some additional auxiliary models that hold the core types used by the main models.

### [anatomical_structure](anatomical_structure.yaml)
The Anatomical Structure schema is designed to represent types and relationships between anatomical brain structures.

##### Updates
The model has been created directly in the YAML format and all the updates can be done by editing the file directly.


### [assertion_evidence](assertion_evidence.yaml)
The Assertion Evidence schema is designed to represent types and relationships between assertions and evidence items.

##### Updates
The model has been created from a Google sheet; the Google sheet ID and the IDs of the specific tabs
are stored in the [settings file](source_assertion_evidence/gsheet.yaml).
The [source_assertion_evidence/gsheet_output](source_assertion_evidence/gsheet_output) folder contains the _csv_ files generated from the Google sheet
at the time of the model creation.

In order to update the model, the Google sheet has to be edited,
and the [generate_yaml_model workflow](../.github/workflows/generate_yaml_model.yaml) has to be triggered manually.


### [genome_annotation](genome_annotation.yaml)
The Genome Annotation schema is designed to represent types and relationships between entities that constitute an organism's annotated genome.

##### Updates
The model has been created directly in the YAML format, and all the updates can be done by editing the file directly.

### [library_generation](library_generation.yaml)
The Library Generation schema is designed to represent types and relationships between samples and
digital data assets generated during processes that generate multimodal genomic data.

##### Updates
The model has been created from a Google sheet; the Google sheet ID and the IDs of the specific tabs
are stored in the [settings file](source_library_generation/gsheet.yaml).
The [source_library_generation/gsheet_output](source_library_generation/gsheet_output) folder contains the _csv_ files generated from the Google sheet
at the time of the model creation.

In order to update the model, the Google sheet has to be edited,
and the [generate_yaml_model workflow](../.github/workflows/generate_yaml_model.yaml) has to be triggered manually.


## Auxiliary models

These models hold the core types that were factored out of the main models and are imported by them; see the `imports` section of each main model.
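
Because imports can chain, a main model may depend on more schemas than it lists directly. The sketch below shows one way to compute the full set of imported schemas from an import graph. Only the `anatomical_structure` to `anatomical_structure_core` edge is stated in this README; the other edges are illustrative assumptions, not a description of the actual files.

```python
# Hypothetical import graph: schema name -> schemas listed under `imports`.
# Only the anatomical_structure -> anatomical_structure_core edge is stated
# in this README; the remaining edges are illustrative assumptions.
IMPORTS = {
    "anatomical_structure": ["anatomical_structure_core"],
    "anatomical_structure_core": ["bican_core"],
    "bican_core": [],
}


def transitive_imports(schema, graph):
    """Return every schema reachable from `schema` through `imports`, sorted."""
    seen = set()
    stack = list(graph.get(schema, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited, also guards against import cycles
        seen.add(current)
        stack.extend(graph.get(current, []))
    return sorted(seen)


print(transitive_imports("anatomical_structure", IMPORTS))
```

With the hypothetical graph above, the call prints `['anatomical_structure_core', 'bican_core']`.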

### [anatomical_structure_core](anatomical_structure_core.yaml)

Contains the core types used in the [Anatomical Structure Schema](anatomical_structure.yaml).

##### Updates
The model has been created directly in the YAML format, and all the updates can be done by editing the file directly.


### [bican_biolink](bican_biolink.yaml)
The model contains a subset of classes from the [Biolink Model](https://biolink.github.io/biolink-model/)
with some modifications to fit the needs of the BICAN project (currently only the `category` slot is modified). The model
is created using the [LinkML Schema Trimmer](https://brain-bican.github.io/bkbit/linkml_trimmer.html) from the bkbit package. The Biolink Model was
trimmed to contain these classes: 'gene', 'genome', 'organism taxon', 'thing with taxon', 'material sample', 'procedure', 'entity', 'activity', 'named thing';
as well as respective dependency classes, slots, and enums to create BICAN Biolink.

##### Updates

The yaml file can be recreated by running the [LinkML Schema Trimmer](https://brain-bican.github.io/bkbit/linkml_trimmer.html)
from the `bkbit` package:
```bash
TODO
$ bkbit linkml-trimmer --classes "gene, genome, organism taxon, thing with taxon, material sample, procedure, entity, activity, named thing" biolink.yaml > bican_biolink.yaml
```
To adjust the `category` slot, you can run the following:
```commandline
python ../utils/bican_biolink_edit.py bican_biolink.yaml
```

### [bican_core](bican_core.yaml)
The BICAN Core schema is designed to represent classes, slots, and enums that are frequently used in BICAN schemas.

##### Updates
The model has been created directly in the YAML format, and all the updates can be done by editing the file directly.

### [bican_prov](bican_prov.yaml)
The BICAN Prov schema contains a subset of classes from the Prov Data Model (PROV-DM) that are frequently used in BICAN schemas.

##### Updates
The model has been created directly in the YAML format, and all the updates can be done by editing the file directly.


## Deprecated models

These are models that are no longer used, but are kept for reference.

### [ccn2](ccn2.yaml)
A deprecated model: an initial attempt to convert a CCN2 model to LinkML.

### [figure1](figure1.yaml)
A deprecated model: an initial attempt to provide a schema for the data presented in Figure 1 of [Yao, Z. et al., _Nature_ 624 (2023)](https://www.nature.com/articles/s41586-023-06812-z#citeas).
6 changes: 5 additions & 1 deletion models_py-autogen/README.md
Original file line number Diff line number Diff line change
@@ -1 +1,5 @@
Autogenerated Pydantic models
## Autogenerated Pydantic models

This directory contains the autogenerated Pydantic models from the [LinkML schemas](../linkml-schema).

The Pydantic models are used in the [Brain Knowledge Base Interaction Toolkit (bkbit)](https://github.com/brain-bican/bkbit).
