add function to read and format diann pg files #22

Open
wants to merge 3 commits into
base: main
13 changes: 9 additions & 4 deletions .github/workflows/black.yml
@@ -6,11 +6,16 @@ jobs:
   lint:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
-      - name: Setup Python 3.8
-        uses: actions/setup-python@v2
+      - uses: actions/checkout@v3
+      - name: Setup Python 3.9
+        uses: actions/setup-python@v4
         with:
-          python-version: "3.8"
+          python-version: "3.9"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install black[colorama]==24.10.0
       - name: Run black
         uses: psf/black@stable
1 change: 1 addition & 0 deletions gopher/__init__.py
@@ -1,4 +1,5 @@
 """See the README for detailed documentation and examples."""
+
 try:
     from importlib.metadata import PackageNotFoundError, version
 
1 change: 1 addition & 0 deletions gopher/annotations.py
@@ -1,4 +1,5 @@
 """Get GO annotations."""
+
 import uuid
 from pathlib import Path
 
1 change: 1 addition & 0 deletions gopher/config.py
@@ -1,4 +1,5 @@
 """This module contains the configuration details for ppx"""
+
 import logging
 import os
 from pathlib import Path
1 change: 1 addition & 0 deletions gopher/enrichment.py
@@ -1,4 +1,5 @@
 """Calculate the enrichments for a collection of experiments."""
+
 import logging
 
 import numpy as np
1 change: 1 addition & 0 deletions gopher/gopher.py
@@ -1,4 +1,5 @@
 """The command line entry point for gopher-enrich"""
+
 import logging
 from argparse import ArgumentParser
 
1 change: 1 addition & 0 deletions gopher/ontologies.py
@@ -1,4 +1,5 @@
 """Download the GO ontologies"""
+
 from collections import defaultdict
 
 from . import config, utils
1 change: 1 addition & 0 deletions gopher/parsers/__init__.py
@@ -1,2 +1,3 @@
 """The parsers"""
+
 from .tabular import read_encyclopedia, read_metamorpheus
35 changes: 35 additions & 0 deletions gopher/parsers/tabular.py
@@ -1,4 +1,5 @@
 """Parse tabular result files from common tools"""
+
 import pandas as pd
 
 
@@ -52,3 +53,37 @@ def read_metamorpheus(proteins_txt: str) -> pd.DataFrame:
         .fillna(0)
     )
     return proteins
+
+
+def read_diann(proteins_tsv: str) -> pd.DataFrame:
+    """
+    Reads a DIANN-generated TSV file containing protein information, processes it,
+    and returns a cleaned Pandas DataFrame with relevant data.
+
+    The function:
+    - Extracts the first protein accession from the "Protein.Ids" column to use as the DataFrame index.
+    - Renames the index axis to "Protein".
+    - Drops unnecessary metadata columns.
+
+    Args:

Review comment (Member): TODO: Please use NumPy-style docstrings https://numpydoc.readthedocs.io/en/latest/format.html

+        proteins_tsv (str): Path to the DIANN-generated TSV file.
+
+    Returns:
+        pd.DataFrame: A DataFrame with the processed protein data, indexed by the first protein accession.
+        The returned DataFrame excludes the following columns:
+        ["Protein.Group", "Protein.Ids", "Protein.Names", "Genes", "First.Protein.Description"].
+    """
+    proteins = pd.read_table(proteins_tsv)
+    accessions = proteins["Protein.Ids"].str.split(";").str[0]
+
+    proteins = proteins.set_index(accessions)
+    proteins = proteins.rename_axis("Protein", axis="index")
+    return proteins.drop(
+        columns=[
+            "Protein.Group",
+            "Protein.Ids",
+            "Protein.Names",
+            "Genes",
+            "First.Protein.Description",
+        ]
+    )
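
A minimal sketch of how `read_diann` might look with the NumPy-style docstring requested in the review comment. The function body is copied unchanged from the diff; the docstring wording is only one possible phrasing, and the `EXAMPLE_TSV` contents (accessions, names, and the `run_1`/`run_2` columns) are made up for illustration and are not part of the PR.

```python
import io

import pandas as pd


def read_diann(proteins_tsv: str) -> pd.DataFrame:
    """Read and format a DIA-NN protein group TSV file.

    The first accession in the "Protein.Ids" column becomes the index,
    and the protein metadata columns are dropped.

    Parameters
    ----------
    proteins_tsv : str
        Path to the DIA-NN-generated TSV file.

    Returns
    -------
    pandas.DataFrame
        The remaining columns, indexed by the first protein accession.
        The index is named "Protein".
    """
    proteins = pd.read_table(proteins_tsv)
    # "Protein.Ids" is semicolon-delimited; keep only the first accession.
    accessions = proteins["Protein.Ids"].str.split(";").str[0]

    proteins = proteins.set_index(accessions)
    proteins = proteins.rename_axis("Protein", axis="index")
    return proteins.drop(
        columns=[
            "Protein.Group",
            "Protein.Ids",
            "Protein.Names",
            "Genes",
            "First.Protein.Description",
        ]
    )


# Hypothetical two-row input. All values below are invented for this example.
EXAMPLE_TSV = (
    "Protein.Group\tProtein.Ids\tProtein.Names\tGenes\t"
    "First.Protein.Description\trun_1\trun_2\n"
    "P00001\tP00001\tPROT_A\tGENEA\tMade-up protein A\t10.0\t20.0\n"
    "P00002\tP00002;P00003\tPROT_B\tGENEB\tMade-up protein B\t30.0\t40.0\n"
)

# pandas.read_table also accepts file-like objects, so an in-memory buffer
# stands in for a file path here.
example = read_diann(io.StringIO(EXAMPLE_TSV))
print(example)
```

In the package itself the function would presumably be imported from `gopher.parsers.tabular`, since the diff does not add `read_diann` to the imports in `gopher/parsers/__init__.py`.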
1 change: 1 addition & 0 deletions gopher/stats.py
@@ -1,4 +1,5 @@
 """Numba Mann-Whitney U test"""
+
 import numba as nb
 import numpy as np
 from scipy import stats
1 change: 1 addition & 0 deletions gopher/utils.py
@@ -1,4 +1,5 @@
 """Utility functions"""
+
 import socket
 from pathlib import Path
 
1 change: 1 addition & 0 deletions setup.py
@@ -1,4 +1,5 @@
 """Setup ppx"""
+
 import setuptools
 
 setuptools.setup()
1 change: 1 addition & 0 deletions tests/unit_tests/annotations_test.py
@@ -1,4 +1,5 @@
 """Test that the annotations functions are working correctly"""
+
 import re
 
 import pandas as pd
1 change: 1 addition & 0 deletions tests/unit_tests/enrichment_test.py
@@ -1,4 +1,5 @@
 """Test that the enrichment functions are working correctly"""
+
 import random
 
 import numpy as np
1 change: 1 addition & 0 deletions tests/unit_tests/test_version.py
@@ -1,4 +1,5 @@
 """Test that setuptools-scm is working correctly"""
+
 import gopher
 
 