Questions about structure files of customized databases #68

Open
rileyjiang opened this issue Jul 30, 2024 · 6 comments

Comments

@rileyjiang

Hi, I have a question about the structure files of customized databases. According to the README, the first column of the structure file should be the sequence ID. Does this need to be unique? How should we handle a sequence that has several different types? For example, should I construct the structure file like this:

level1 level2 level3
seq1 subtype1 type1(Ni)
seq1 subtype1 type2(Co)

or

level1 level2 level3
seq1 subtype1 type1(Ni),type2(Co)

Looking forward to your reply, Thank you!

@xinehc
Owner

xinehc commented Jul 30, 2024

level1 needs to be unique, so two seq1 rows are not allowed. Your second construction seems fine.
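A quick way to check the uniqueness requirement before building a database is sketched below. It is only an illustration, not part of the tool: the helper name `find_duplicate_ids` is hypothetical, and the file is assumed to be tab-separated with a header row, which the thread does not confirm.

```python
import csv
import io
from collections import Counter


def find_duplicate_ids(lines, delimiter="\t"):
    """Return first-column IDs that occur more than once (header row skipped).

    Assumption: the structure file is delimiter-separated with one header line.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the "level1 level2 level3" header
    counts = Counter(row[0] for row in reader if row)
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


# First construction from the question: seq1 appears twice.
repeated = io.StringIO(
    "level1\tlevel2\tlevel3\n"
    "seq1\tsubtype1\ttype1(Ni)\n"
    "seq1\tsubtype1\ttype2(Co)\n"
)
# Second construction: one row, both types in the level3 field.
merged = io.StringIO(
    "level1\tlevel2\tlevel3\n"
    "seq1\tsubtype1\ttype1(Ni),type2(Co)\n"
)

print(find_duplicate_ids(repeated))  # ['seq1']
print(find_duplicate_ids(merged))    # []
```

For a real file, pass an open file handle instead of the `io.StringIO` objects.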

@rileyjiang
Author

Thank you very much for your quick reply! Another question: what is the difference between '--structure1' (single-component), '--structure2' (two-component), and '--structure3' (multi-component)? The help page does not explain this.

@xinehc
Owner

xinehc commented Aug 13, 2024

--structure2 is for two-component systems, so each component is weighted by 0.5; with --structure3, each component is weighted by 1/3.

@rileyjiang
Author

What do you mean by two-component systems? For example, I cannot see the difference between 'two-component_structure.txt' and 'multi-component_structure.txt' for the default database.

two-component:
[Screenshot, 2024-08-13 14:25:38]
multi-component:
[Screenshot, 2024-08-13 14:28:57]

Does it refer to the situation where a gene has two types or subtypes? And how does using --structure2/--structure3 affect the result?

@xinehc
Owner

xinehc commented Aug 13, 2024

All genes listed in the two-component.txt file will be weighted by a factor of 0.5. The structure of the three files (single, two, multi) is identical; the only difference is the weight (1, 1/2, 1/3) applied when calculating the abundance.
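As a rough illustration of what those weights do, here is a minimal sketch. It is not the tool's actual abundance calculation, which this thread does not describe; the gene names, raw counts, and the `weighted_abundance` helper are all hypothetical.

```python
# Number of components implied by each structure file; the weight is 1/n.
COMPONENTS = {"single": 1, "two": 2, "multi": 3}


def weighted_abundance(raw_counts, structure_of):
    """Scale each gene's raw count by 1, 1/2, or 1/3.

    raw_counts:   gene -> unweighted count (hypothetical values)
    structure_of: gene -> "single", "two", or "multi", i.e. which
                  structure file the gene is listed in
    """
    return {
        gene: count / COMPONENTS[structure_of[gene]]
        for gene, count in raw_counts.items()
    }


raw = {"geneA": 6, "geneB": 6, "geneC": 6}
listed_in = {"geneA": "single", "geneB": "two", "geneC": "multi"}

print(weighted_abundance(raw, listed_in))
# {'geneA': 6.0, 'geneB': 3.0, 'geneC': 2.0}
```

The same raw count therefore contributes half as much when the gene is listed in the two-component file, and one third as much when it is listed in the multi-component file.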

@rileyjiang
Author

Thanks, that's clear!

2 participants