MatthewThe edited this page Nov 26, 2024 · 4 revisions
The indexing in the second column of the clustering output file (i.e. <sample>.clusters_p<pval_threshold>.tsv) depends on the file format of the original spectrum files:
- ms2 input: the scan number
- mgf input with SCANS field: the SCANS field
- mgf input without SCANS field: a zero-based index of the spectrum within its original file
- mzML input: the scan number, read from the spectrum title if it is given as a formatted field such as 'scan=1234'; otherwise, a zero-based index
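The rules above can be summarized in a small lookup function. This is a minimal sketch, not MaRaCluster code: the helper name `cluster_file_identifier`, its parameters, the lowercase format strings, and the `\bscan=(\d+)` regex used for mzML titles are all illustrative assumptions, and MaRaCluster's own title parsing may differ in detail.

```python
import re
from typing import Optional

# Hypothetical pattern for a formatted field such as 'scan=1234' in a title.
_SCAN_FIELD = re.compile(r"\bscan=(\d+)")


def cluster_file_identifier(
    input_format: str,
    zero_based_index: int,
    scan_number: Optional[int] = None,
    scans_field: Optional[int] = None,
    title: Optional[str] = None,
) -> int:
    """Value expected in column 2 of <sample>.clusters_p<pval_threshold>.tsv.

    Hypothetical helper mirroring the rules listed above; not part of MaRaCluster.
    """
    if input_format == "ms2":
        # ms2 spectra are always identified by their scan number.
        if scan_number is None:
            raise ValueError("ms2 spectra need a scan number")
        return scan_number
    if input_format == "mgf":
        # Prefer the SCANS field; fall back to the position in the file.
        return scans_field if scans_field is not None else zero_based_index
    if input_format == "mzml":
        # Use 'scan=<n>' from the title if present, else the position in the file.
        match = _SCAN_FIELD.search(title or "")
        return int(match.group(1)) if match else zero_based_index
    raise ValueError(f"unsupported input format: {input_format!r}")
```

For example, an mzML spectrum titled `controllerType=0 controllerNumber=1 scan=1234` maps to 1234, while one whose title has no `scan=` field falls back to its zero-based position in the file.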
For the resulting consensus files, the scan number in the output spectrum file depends on the chosen output format, as well as the -S/--splitMassChargeStates flag:
- if the -S flag is not set and the ms2 or mzML output format is used, the output scan number is a one-based index, counting from the first cluster in the clustering output file.
- if the -S flag is set and/or the mgf output format is used, the output scan number is this same one-based cluster index multiplied by 100, plus a one-based precursor identifier. E.g. SCANS=2303 means it is the 3rd precursor of the 23rd cluster.
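The cluster × 100 + precursor scheme is easy to encode and decode with integer division. The sketch below uses hypothetical helper names and assumes that a cluster never has more than 99 precursors; that limit is an inference from the arithmetic (a 100th precursor would spill into the cluster part), not something this page states.

```python
from typing import Tuple


def encode_consensus_scan(cluster: int, precursor: int) -> int:
    """Scan number written when -S is set and/or mgf output is used.

    Both arguments are one-based. Assumes at most 99 precursors per cluster.
    """
    if cluster < 1 or not 1 <= precursor <= 99:
        raise ValueError("cluster must be >= 1 and precursor in 1..99")
    return cluster * 100 + precursor


def decode_consensus_scan(scan: int) -> Tuple[int, int]:
    """Split a consensus scan number into (cluster, precursor), both one-based."""
    cluster, precursor = divmod(scan, 100)
    if cluster < 1 or precursor == 0:
        raise ValueError(f"{scan} does not follow the cluster*100+precursor scheme")
    return cluster, precursor
```

`decode_consensus_scan(2303)` yields `(23, 3)`, matching the SCANS=2303 example. Without -S and with ms2 or mzML output, the scan number is the cluster index itself and needs no decoding.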
Each row represents one link in the hierarchical clustering tree, with the columns:
<file_idx_1> <scannr_1> <file_idx_2> <scannr_2> <p-value>
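A row in this format can be read into a typed record with a few lines of Python. This is a sketch under assumptions: the `TreeLink` and `parse_tree_links` names are hypothetical, columns are split on any whitespace (tabs or spaces) because the delimiter is not stated here, and the p-value is kept as a plain float because the page does not say whether it is stored raw or on a transformed scale.

```python
from typing import Iterable, List, NamedTuple


class TreeLink(NamedTuple):
    """One link of the hierarchical clustering tree (one row of the file)."""
    file_idx_1: int
    scannr_1: int
    file_idx_2: int
    scannr_2: int
    p_value: float


def parse_tree_links(lines: Iterable[str]) -> List[TreeLink]:
    """Parse rows of the form <file_idx_1> <scannr_1> <file_idx_2> <scannr_2> <p-value>."""
    links = []
    for line in lines:
        fields = line.split()  # splits on tabs or runs of spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        f1, s1, f2, s2, p = fields
        links.append(TreeLink(int(f1), int(s1), int(f2), int(s2), float(p)))
    return links
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, e.g. `with open(path) as fh: links = parse_tree_links(fh)`.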
By default, MaRaCluster uses all available threads on the system. This can cause out-of-memory errors if your system has many threads relative to its available memory.
To limit the number of threads, prefix the maracluster command with OMP_NUM_THREADS=<num_threads>, e.g.:
OMP_NUM_THREADS=2 maracluster ...
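When MaRaCluster is launched from a Python script rather than a shell, the same effect can be had by setting `OMP_NUM_THREADS` in the child process's environment. This is a minimal sketch with hypothetical helper names; the maracluster arguments are left to the caller, just as they are elided in the command above, and it assumes the `maracluster` executable is on the `PATH`.

```python
import os
import subprocess
from typing import Dict, List, Optional


def maracluster_env(num_threads: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with OMP_NUM_THREADS set to num_threads."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ if base is None else base)  # never mutate os.environ itself
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def run_maracluster(args: List[str], num_threads: int) -> subprocess.CompletedProcess:
    """Run maracluster with a limited OpenMP thread count.

    `args` are the usual maracluster command-line arguments, supplied by the caller.
    """
    return subprocess.run(["maracluster", *args], env=maracluster_env(num_threads), check=True)
```

Copying the environment instead of modifying `os.environ` keeps the thread limit scoped to the MaRaCluster process, leaving the calling script and any other subprocesses unaffected.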