BioLM/silica

 
 


silica

Silica team repo for the Evolved-2024 BioML hackathon

Projects

Projects were split between lipase enzyme design and EGFR-binding nanobody design. Both projects had two main components, with the ultimate goal of synthesizing the top sequences and providing more wet-lab data for various ML methods.

  • Candidate generation: techniques included constrained diffusion, inverse folding, and genetic algorithms, including the use of ESM3. The generated sequences can be found here.
  • Candidate ranking/scoring: log probabilities computed by several pretrained models, some structure-based metrics over predicted binding complexes, and embedding-based distance measures.

Lipase

Methods: ProstT5 generation and variant stability prediction.

EGFR Nanobody Binders

Methods:

  • ESM3 example and ESM3 for generation, plus a combined ESM3 generation and filtering method using nanoBERT embeddings
  • AntiFold generation and scoring
  • MPNN inverse folding
  • Motif-scaffolded diffusion generation
  • Genetic algorithm (GA) candidate generation
  • Nanobody structure prediction for self-consistency
  • NanoBERT log probability for scoring

Scoring for Both Projects

ESM2 log probability (LP) and ESM3 LP.
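As a rough illustration of what a sequence LP score is, the sketch below sums the log probability assigned to the observed residue at each position and then divides by sequence length. It does not call ESM2 or ESM3: the per-position log probabilities come from a toy uniform distribution, and the function names, the `ALPHABET` constant, and the example sequence are all hypothetical. How the repo's notebooks obtain per-position log probabilities from the models is not shown here.

```python
import math

# Hypothetical 20-letter amino acid alphabet used only for this sketch.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def sequence_log_prob(per_position_log_probs, sequence, alphabet=ALPHABET):
    """Sum the log probability of the observed residue at each position.

    per_position_log_probs: one row per position, one entry per alphabet letter.
    """
    if len(per_position_log_probs) != len(sequence):
        raise ValueError("need exactly one row of log probabilities per residue")
    index = {aa: i for i, aa in enumerate(alphabet)}
    return sum(row[index[aa]] for row, aa in zip(per_position_log_probs, sequence))


def normalized_log_prob(per_position_log_probs, sequence, alphabet=ALPHABET):
    """Length-normalized sequence log probability (mean per-residue LP)."""
    return sequence_log_prob(per_position_log_probs, sequence, alphabet) / len(sequence)


# Toy stand-in for model output: a uniform distribution at every position.
seq = "QVQLVESGG"
uniform = [[math.log(1 / len(ALPHABET))] * len(ALPHABET) for _ in seq]

total_lp = sequence_log_prob(uniform, seq)
mean_lp = normalized_log_prob(uniform, seq)
```

Dividing by length makes scores comparable across sequences of different lengths, since the raw sum grows more negative as sequences get longer.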

Data

EGFR Affinity Data

source_data/adaptyv_biolm_egfr_data.csv contains data from Adaptyv Bio's EGFR Competition Round 1 (202 sequences), plus 36 additional nanobody sequences ordered by BioLM.

  • 1 scFv control (Cetuximab)
  • 3 nanobody control therapeutics
  • Approximately 10 antibody binders
  • 2 peptide binders

Columns are described as:

  • sequence: the AA sequence ordered
  • replicate: the number of replicate affinity assays performed
  • expression: Adaptyv's assessment of protein expression (use combined_expression)
  • binding: Adaptyv's call on antibody-antigen binding (use combined_binding)
  • confidence: Adaptyv's assessment of confidence in calls
  • kd: measured dissociation constant (Kd) for antibody-antigen binding
  • kon/koff: association (kon) and dissociation (koff) rate constants for the same interaction
  • normalized_sequence_lp: ESM2 sequence log probability normalized by length
  • source: Adaptyv Bio competition or BioLM order
  • metadata: temperature, recovery, scores, other data from BioLM sequence design
  • parent: closest known therapeutic antibody for BioLM design
  • sequence_lp: ESM2 sequence LP
  • tm_prediction: predicted melting temperature (Tm) of the sequence
  • aliphatic_index through mz: computed physicochemical properties from sequence
  • combined_expression and combined_binding: BioLM's assessment of sequence across replicates
  • iptm_tm through pi_score: computed multimer confidence, docking, other scores
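
A minimal sketch of working with these columns, using only the standard library: it ranks rows by `kd` (lower means tighter binding) and skips rows without a usable value. The helper name `rank_by_kd` and the toy rows are made up for illustration. The sketch also assumes that `kd` is stored as a plain number and left blank when no measurement exists, which may not match the actual file.

```python
import csv
import io
import math


def rank_by_kd(rows):
    """Return (kd, sequence) pairs sorted from tightest to weakest binder.

    Rows with a missing or non-numeric kd are skipped (an assumption about
    how unmeasured sequences are encoded).
    """
    ranked = []
    for row in rows:
        try:
            kd = float(row["kd"])
        except (KeyError, TypeError, ValueError):
            continue
        if math.isnan(kd):
            continue
        ranked.append((kd, row["sequence"]))
    return sorted(ranked)


# Toy data with the same column names; the values are invented.
TOY_CSV = """sequence,kd,source
AAAA,1e-8,toy
CCCC,,toy
DDDD,5e-9,toy
"""

ranked = rank_by_kd(csv.DictReader(io.StringIO(TOY_CSV)))
```

To run this on the real data, pass `csv.DictReader(open("source_data/adaptyv_biolm_egfr_data.csv", newline=""))` instead of the toy reader.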
