FeatureSelectionTensorFlow

This project focuses on developing a machine learning model that accurately classifies microbiome communities into their respective taxonomic ranks (e.g., Kingdom, Phylum, Class, Order, Family, Genus, Species) based on sequencing data. The goal is to predict the correct taxonomic classification for each community in a given dataset.

Data Inputs:

Features: Microbiome data represented by sequencing reads, operational taxonomic units (OTUs), or Amplicon Sequence Variants (ASVs). Labels: Taxonomic ranks for each sample, such as Kingdom, Phylum, Class, etc. Objectives: Accuracy: The model should correctly classify as many samples as possible. Precision: The model should minimize false positives for each taxonomic rank. Recall: The model should minimize false negatives for each taxonomic rank.

Challenges:

Class Imbalance: Certain taxonomic ranks might be overrepresented or underrepresented in the data, leading to class imbalance. Data Complexity: Microbiome data can be highly dimensional and sparse, making feature selection and model training challenging. Hierarchical Nature: The taxonomic ranks are hierarchical, meaning that errors at higher levels (e.g., Kingdom) may cascade down to lower levels (e.g., Species).

Data Collection and Preparation Using QIIME2

Install and Set Up QIIME2: Ensure you have QIIME2 installed on your system. You can install it using conda:

source ~/anaconda3/bin/activate
conda activate qiime2-amplicon-2023.9

Data Acquisition: Assign Taxonomy Using SILVA: Use the SILVA sequences and taxonomy files (silva-138-99-seqs.qza and silva-138-99-tax.qza) to assign taxonomy to your representative sequences:

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-seqs.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

Own Data: If you have your own sequencing data (e.g., FASTQ files), you can import it into QIIME2 following a similar process.

Train Naive Bayes Classifier: Assign taxonomy to your sequences using a Naive Bayes classifier trained on the SILVA database:

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-99-seqs.qza \
  --i-reference-taxonomy silva-138-99-tax.qza \
  --o-classifier silva-138-99-classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

Export Data for Machine Learning: Export the feature table and taxonomy data for use in TensorFlow:

qiime tools export \
  --input-path table.qza \
  --output-path exported-feature-table

qiime tools export \
  --input-path taxonomy.qza \
  --output-path exported-taxonomy

Convert to CSV: Convert the QIIME2 outputs to a CSV format that TensorFlow can ingest:

biom convert \
  -i exported-feature-table/feature-table.biom \
  -o feature-table.csv \
  --to-tsv

qiime tools export \
  --input-path taxonomy.qza \
  --output-path taxonomy.tsv

Data Labeling: The taxonomy.tsv file will contain the taxonomic classifications for each sequence, which will serve as the labels for your model.
Data Splitting: Split your dataset into training, validation, and test sets using pandas in Python:

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('feature-table.csv', sep='\t', index_col=0)
labels = pd.read_csv('taxonomy.tsv', sep='\t', index_col=0)

X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)

Plans for TensorFlow

After evaluating the performance of the Naive Bayes classifier, the next step is to develop a custom deep learning model using TensorFlow. The objectives for this model are:

Model Design: Create a neural network architecture tailored to the hierarchical nature of taxonomic classification. Training: Train the TensorFlow model on the same dataset, optimizing for accuracy, precision, and recall. Comparison: Compare the performance of the TensorFlow model with the Naive Bayes classifier to determine which method provides better classification results. Optimization: Fine-tune the TensorFlow model using techniques such as hyperparameter tuning, dropout, and batch normalization to improve performance. This process will involve experimenting with different neural network architectures and layers to best capture the complexity and hierarchical structure of the microbiome data.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

FeatureSelectionTensorFlow

Data Inputs:

Challenges:

Data Collection and Preparation Using QIIME2

Plans for TensorFlow

About

Uh oh!

Releases

Packages

Metu-O/FeatureSelectionTensorFlow

Folders and files

Latest commit

History

Repository files navigation

FeatureSelectionTensorFlow

Data Inputs:

Challenges:

Data Collection and Preparation Using QIIME2

Plans for TensorFlow

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages