MatthewThe edited this page Nov 26, 2024 · 4 revisions

How can I map the scan indices from the clustering output file back to the original spectra?

The indexing in the second column of the clustering output file (i.e. <sample>.clusters_p<pval_threshold>.tsv) depends on the file format of the original spectrum files:

  • ms2 input: the scan number
  • mgf input with SCANS field: the SCANS field
  • mgf input without SCANS field: a zero-based index for the spectrum in its original file.
  • mzML input: the scan number read from the title, if the title contains a formatted field like 'scan=1234'; otherwise a zero-based index for the spectrum in its original file.
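
The fallback rule for mzML input can be sketched as follows. This is only an illustration of the rule described above, not MaRaCluster's own parsing code: the function name `scan_index`, the regular expression, and the example titles are assumptions made for this sketch.

```python
import re

# Assumed pattern for a 'scan=1234' style field in a spectrum title.
SCAN_FIELD = re.compile(r"scan=(\d+)")


def scan_index(title, position):
    """Return the scan number found in `title`, else the zero-based `position`."""
    match = SCAN_FIELD.search(title or "")
    if match:
        return int(match.group(1))
    # No 'scan=' field: fall back to the spectrum's zero-based position in its file.
    return position


# Hypothetical titles, in file order.
titles = [
    "controllerType=0 controllerNumber=1 scan=1234",
    "spectrum without a scan field",
    None,
]
indices = [scan_index(title, i) for i, title in enumerate(titles)]
print(indices)  # [1234, 1, 2]
```

When mapping back to the original spectra, the same rule has to be applied to the original file, so that the index computed here matches the second column of the clustering output.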

How can I map the scan indices of the consensus spectra to the clusters in the clustering output?

For the resulting consensus files, the scan number in the output spectrum file depends on the chosen output format, as well as the -S/--splitMassChargeStates flag:

  • if the -S flag is not set, and the ms2 or mzML output format is used, the output scan number is a one-based index starting from the first cluster in the clustering output file.
  • if the -S flag is set and/or the mgf output format is used, the output scan number is the same one-based cluster index multiplied by 100, plus a one-based precursor identifier. E.g. SCANS=2303 means it is the 3rd precursor of the 23rd cluster.
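
The arithmetic for the combined scan number can be sketched like this. The helper names are hypothetical, and the decoding assumes the precursor identifier stays below 100, since otherwise the two parts cannot be separated unambiguously.

```python
def encode_consensus_scan(cluster_idx, precursor_idx):
    """Combine a one-based cluster index and a one-based precursor identifier."""
    return cluster_idx * 100 + precursor_idx


def decode_consensus_scan(scan_number):
    """Split a combined scan number into (cluster index, precursor identifier).

    Assumes the precursor identifier is smaller than 100.
    """
    cluster_idx, precursor_idx = divmod(scan_number, 100)
    return cluster_idx, precursor_idx


print(decode_consensus_scan(2303))   # (23, 3)
print(encode_consensus_scan(23, 3))  # 2303
```

The decoded cluster index is one-based, so it counts clusters from the top of the clustering output file, starting at 1.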

What are the columns in the output files *.pvalue_tree.tsv?

Each of the rows represents a link in the hierarchical clustering tree:

<file_idx_1> <scannr_1> <file_idx_2> <scannr_2> <p-value>
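
A row of this file could be read along these lines. The sketch assumes that the columns are tab-separated (suggested by the .tsv extension), that the file and scan indices are integers, and that the last column parses as a floating-point number. The `TreeLink` type, the function name, and the example row are made up for illustration.

```python
from typing import NamedTuple


class TreeLink(NamedTuple):
    """One link in the hierarchical clustering tree (hypothetical container)."""
    file_idx_1: int
    scannr_1: int
    file_idx_2: int
    scannr_2: int
    p_value: float


def parse_tree_line(line):
    """Parse one row, assuming five tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    file_1, scan_1, file_2, scan_2, p_value = fields
    return TreeLink(int(file_1), int(scan_1), int(file_2), int(scan_2), float(p_value))


# Made-up example row, not taken from real MaRaCluster output.
link = parse_tree_line("0\t1234\t1\t5678\t-12.5\n")
print(link.file_idx_1, link.scannr_1, link.file_idx_2, link.scannr_2, link.p_value)
```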

How do I change the number of threads that MaRaCluster uses?

By default, MaRaCluster uses all available threads on the system. This can cause out-of-memory problems if your system has many threads relative to the available memory.

To change this, prepend the maracluster command with OMP_NUM_THREADS=<num_threads>, e.g.:

OMP_NUM_THREADS=2 maracluster ...