Hi, thank you for putting together FLAb. It is a great resource. While working with the Phillips et al. 2021
binding affinity data (phillips2021binding_*.csv), I had a few questions about how the data was processed
and wanted to check my understanding.
- Genotype filtering
The four Phillips CSVs appear to contain only genotypes where the first position is '1' (i.e., the first
mutation site carries the somatic allele). For example, phillips2021binding_cr9114_h3_kd.csv has 32,768 rows, all
starting with '1', which is exactly 2^15 (half of the full 2^16 = 65,536 combinatorial library). The same
pattern holds for the other three files.
Was this filtering intentional? The original data from the paper
contains the full genotype space including the germline sequence (all-0 genotype). I could not find
documentation for this in the README or metadata files, so I wanted to confirm.
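For reference, here is a minimal sketch of the check described above. It assumes genotypes are available as 16-character binary strings; the helper name first_position_counts is my own and purely illustrative, and the sketch makes no assumption about how genotypes are stored in (or derived from) the FLAb CSVs. It builds the full 16-site library and counts genotypes by their first position, which shows why a file holding only '1'-prefixed genotypes would have exactly 2^15 rows.

```python
from collections import Counter
from itertools import product


def first_position_counts(genotypes):
    """Count genotypes by the character at their first position (hypothetical helper)."""
    return Counter(g[0] for g in genotypes if g)


# Full combinatorial library over 16 binary mutation sites (2^16 genotypes).
full_library = ["".join(bits) for bits in product("01", repeat=16)]
counts = first_position_counts(full_library)

# Half of the library starts with '1', half with '0'.
print(len(full_library), counts["1"], counts["0"])
```

Running the same helper on the genotype strings of a FLAb file would show whether any '0'-prefixed genotypes (including the all-0 germline) are present.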
- Flu B antigen
Phillips et al. measured CR9114 binding against three antigens: H1, H3, and Flu B. The Flu B data does not
appear to be included in FLAb. Was this excluded deliberately (perhaps because only 198/65,536 variants show
measurable binding)?
- Metadata labels
In flab_metadata.csv, the Phillips entries list the assay as SPR Kd and the units as -log( Kd [nM]) Fab.
My reading of the paper is that the measurement method is Tite-Seq (flow cytometry + deep sequencing)
rather than SPR, and that the antibody format is scFv (single-chain variable fragment on yeast display)
rather than Fab. Could you confirm whether these labels are correct, or if they should be updated?
Thanks again for maintaining this resource. Happy to discuss further.