Added more explanations

svarona · svarona · commit a91de21faf6d · 2024-07-10T16:44:23.000+02:00
diff --git a/docs/IRMA-code-explanation.md b/docs/IRMA-code-explanation.md
@@ -37,6 +37,26 @@ After checking dependencies, creating paths with the variables, setting options
 
 1. _Disk check_: Checks if there is enough space for the analysis. This step can be skipped with the parameter `ALLOW_DISK_CHECK` set to 0.
 2. _FastQconverter_: It is a custom Perl script. It handles quality and size filtering, trimming, and adapter masking.
+
+    - **Possible parameters**:
+      - `-T, --read-quality <threshold>`: Specify the read quality threshold (geometric mean or median).
+      - `-M, --use-median`: Interprets the threshold (-T) as the median instead of the average.
+      - `-Q, --fastQ-output`: Outputs in FASTQ format instead of FASTA.
+      - `-L, --min-length <threshold>`: Minimum length of sequence data, default = 0.
+      - `-C, --complement-add`: Take the reverse complement and add to the data.
+      - `-O, --ordinal-headers`: Replace header with strict ordinals.
+      - `-F, --file-id <STR>`: File ID for ordinals.
+      - `-S, --save-quality <STR>`: Save quality file for back-mapping.
+      - `-A, --save-stats <STR>`: Save quality vs. length statistics file for analysis.
+      - `-K, --skip-remaining`: Do not output FASTA/FASTQ data (assumes -A).
+      - `-H, --keep-header`: Keep header as usual.
+      - `-c, --clip-adapter <STR>`: Clip the given adapter sequence from the reads.
+      - `-m, --mask-adapter <STR>`: Mask the given adapter sequence in the reads.
+      - `-Z, --fuzzy-adapter`: Allow one mismatch when matching the adapter.
+      - `-U, --uracil-to-thymine`: Convert uracil to thymine.
+      - `-E, --enforce-clipped-length`: Enforce the minimum length threshold (-L) after adapter clipping (-c).
+      - `-R, --read-side <INT>`: If the FASTQ header is in SRA format and is missing a read identifier, alter the header to add it.
+
     - It uses **options** from the configuration at this step such as:
       - `ADAPTER`: Transposase adapter sequence. To disable, set it as an empty string (ADAPTER=""). It trims 5′ on the forward adapter and 3′ on the reverse complement adapter. It can be applied to NextTera paired-end reads.
       - `FUZZY_ADAPTER`: If ADAPTER is set and FUZZY_ADAPTER is enabled, it also trims adapters with up to 1 mismatch.
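
The quality and length filtering controlled by `-T`, `-M` and `-L` can be illustrated with a short Python sketch. This is not the Perl implementation: the function names, the Phred+33 encoding, the use of the arithmetic mean of Phred scores as the "average" (which is equivalent to the geometric mean of the per-base error probabilities), and the inclusive comparison against the threshold are all assumptions made for illustration.

```python
from statistics import mean, median


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_filters(sequence, quality_string, threshold,
                   use_median=False, min_length=0):
    """Hypothetical read filter mirroring the roles of -L, -T and -M.

    A read is kept when it is at least `min_length` bases long and its
    summary quality (mean by default, median when `use_median` is True)
    reaches `threshold`. The inclusive comparison is an assumption.
    """
    if len(sequence) < min_length:
        return False
    scores = phred_scores(quality_string)
    if not scores:
        return False
    summary = median(scores) if use_median else mean(scores)
    return summary >= threshold


# One low-quality base lowers the mean but leaves the median untouched.
read_seq = "ACGTA"
read_qual = "IIII#"  # Phred scores 40, 40, 40, 40, 2
kept_by_mean = passes_filters(read_seq, read_qual, threshold=35)
kept_by_median = passes_filters(read_seq, read_qual, threshold=35,
                                use_median=True)
```

With a threshold of 35, this read is rejected when the mean is used (32.4) and kept when the median is used (40), which shows why the choice between the two matters for reads with a few poor bases.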
@@ -230,6 +250,13 @@ As the default `SORT_PROG` is the same as the `MATCH_PROG`, it assumes that this
 
 6. _parseSORTresults_: Perl script. Analyzes the results generated by SORT (Sequence Occupancy Read Trace) to determine the occurrence count of each target sequence and the number of reads contributing to each occurrence (score). Additionally, the script processes information about the number of reads used to generate each occurrence, which appears to be encoded in sequence identifiers (ID) in the SORT_results.tab file. Certain filters are applied to determine which sequences are considered valid for further analysis. For example, you can specify a minimum read count (-C) and a minimum read pattern count (-D). You can also choose to ignore annotations in sequence identifiers (-G). If a list of patterns is provided (-P), the script divides sequences into groups based on these patterns and selects the best sequence from each group, as well as any secondary sequence that meets the filter criteria. Patterns can also be provided to ban sequences (-B), meaning these sequences will be excluded from analysis.
 
+    - **Possible parameters**:
+      - `-P, --pattern-list <STRING>`: Comma-separated list of patterns to group genes. The special case `__ALL__` selects the top gene.
+      - `-G, --ignore-annotations`: Ignore annotations in target identifiers.
+      - `-C, --min-read-count <INTEGER>`: Minimum read count threshold for a target to be considered valid (default = 1).
+      - `-D, --min-read-patterns <INTEGER>`: Minimum read patterns threshold for a target to be considered valid (default = 1).
+      - `-B, --ban-list <STRING>`: Comma-separated list of patterns to ban specific genes.
+
     - Uses **options** from the configuration in this step such as:
       - `SORT_GROUPS`: Determines sorting groups for primary and secondary data.
       - `BAN_GROUPS`: Patterns not allowed.
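
The selection logic described for `parseSORTresults` (banning, minimum counts, grouping by pattern, primary and secondary targets) can be sketched in Python as follows. This is an illustration only, not the Perl script: the function name, the input layout, substring matching for patterns, ranking by read count, and applying the thresholds to every target before grouping are all assumptions.

```python
def select_targets(stats, patterns, banned=(), min_reads=1, min_patterns=1):
    """Hypothetical sketch of grouping and filtering targets.

    `stats` maps a target name to (read_count, read_pattern_count).
    Returns, per pattern, the best target and the remaining valid ones.
    """
    # Drop banned targets (role of -B) and apply thresholds (roles of -C, -D).
    valid = {
        name: counts
        for name, counts in stats.items()
        if not any(b in name for b in banned)
        and counts[0] >= min_reads
        and counts[1] >= min_patterns
    }

    # Assumption: __ALL__ treats every valid target as a single group.
    if list(patterns) == ["__ALL__"]:
        groups = {"__ALL__": list(valid)}
    else:
        groups = {p: [n for n in valid if p in n] for p in patterns}

    selection = {}
    for pattern, names in groups.items():
        # Rank by read count, highest first (the ranking key is an assumption).
        ranked = sorted(names, key=lambda n: valid[n][0], reverse=True)
        if ranked:
            selection[pattern] = {"primary": ranked[0],
                                  "secondary": ranked[1:]}
    return selection


# Made-up target names and counts, for illustration only.
example_stats = {
    "A_HA_H3": (500, 120),
    "A_HA_H1": (40, 10),
    "A_NA_N2": (300, 80),
    "B_HA": (900, 200),
    "A_NA_N1": (3, 1),
}
result = select_targets(example_stats, patterns=["HA", "NA"],
                        banned=["B_"], min_reads=10)
```

In this made-up example, `B_HA` is excluded by the ban pattern despite having the most reads, and `A_NA_N1` is dropped for falling below the minimum read count, so each group keeps only targets that pass both checks.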
@@ -718,7 +745,8 @@ This is done because it will store the score from the .sam file into a variable,
         ATGGAATCCAACACCATGTCAAGCTTTCAGGTAGACTGTTTTCTTTGGCATATTCGCAAGCGATTTGCAGACAATGGATTGGGTGATGCCCCATTCCTCGATCGGCTACGCCGAGATCAAAAGTCCTTAAAAGGAAGAGGCAACACCCTTGGCCTCGACATCAAAACAGCCACTCTTGTTGGGAAACAAATTGTGGAATGGATTTTGAAAGAGAAATCCAGCGAGACACTTAGAATGGCAATTGCATCTATACCTACTTCGCGTTACATTTCTGACATGACCCTCGAGGAAATGTCACGAGACTGGTTCATGCTTATGCCTAGGCAAAAGATAATAGGCCCTCTTTGCGTGCGATTGGACCAGGCGGTCATGGATAAGAACATAGTACTGGAAGCAAACTTCAGTGTAATCTTCAACCGATTAGAGACCTTGATACTACTAAGGGCTTTCACTGAGGAGGGAACAATAGTTGGAGAAATTTCACCATTACCTTCTCTTCCAGGACATACTTATGAGGATGTCAAAAATGCAATTGGGGTCCTCATCGGAGGACTTGAGTGGAATGGTAACACGGTTCGAGTCTCTGAAAATATACAGAGATTCGCTTGGAGAAGCTGTGATGAGAATGGGAGACCTTCACTACCTCCAGAGCAGAAATGAGAAATGGCGGGAACAATTGGGACAGAAATTTGAGGAAATAAGATGGTTAATTGAAGAAATACGACACAGATTGAAAGCGACAGAAAATAGTTTCGAGCAAATAACATTTATGCAAGCCTTACAACTACTGCTTGAAGTAGAGCAAGAGATAAGAGCTTTCTCGTTTCAGCTTATTTAA
         ```
 
-Steps 17 to 23 are repeated up to `MAX_ITER_ASSEM`.
+> [!NOTE]
+> Steps 17 to 23 are repeated up to `MAX_ITER_ASSEM`.
 
 ##### 2.2.2. Call variants: