buildLoci
A utility to build gene loci (i.e., sets of overlapping transcripts) out of a transcript set.
Usage example:
bedtools intersect -s -wao -a inGTF -b inGTF | buildLoci.pl - > test.loci.gff
The input file is provided as the first argument to the script (use - to read from standard input, as in the usage example). It consists of two ("left" and "right") GTF records per line, separated by a tab. This is typically the standard output produced by bedtools intersect -wao -a inGTF -b inGTF (inGTF being a single GTF file).
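To make the input layout concrete, here is a minimal Python sketch of how one such line could be split into its "left" and "right" records. It is not taken from buildLoci.pl; the function names split_pair and transcript_id are hypothetical. It assumes 9 tab-separated GTF fields per record, optionally followed by the overlap length column that bedtools intersect adds with -wao.

```python
import re

GTF_FIELDS = 9  # a GTF record has 9 tab-separated fields


def split_pair(line):
    """Split one input line into (left_record, right_record, overlap_bp).

    overlap_bp is None when the trailing overlap column is absent.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 * GTF_FIELDS:
        raise ValueError(f"expected at least 18 fields, got {len(fields)}")
    left = fields[:GTF_FIELDS]
    right = fields[GTF_FIELDS:2 * GTF_FIELDS]
    overlap_bp = int(fields[2 * GTF_FIELDS]) if len(fields) > 2 * GTF_FIELDS else None
    return left, right, overlap_bp


def transcript_id(attributes):
    """Return the transcript_id value from a GTF attribute field, or None."""
    match = re.search(r'\btranscript_id "([^"]+)"', attributes)
    return match.group(1) if match else None
```

The 9th field of each record (index 8 in the returned lists) holds the attributes, so transcript_id(left[8]) and transcript_id(right[8]) give the two transcripts that the line declares as overlapping.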
The flexibility of bedtools allows the user to build gene loci based on whatever definition they need, e.g., with or without respect to genomic strand (see the -s option of bedtools).
Options:
- keepGeneid = If set, any gene_id values present in the input will be kept in the output, under the attribute gene_id_bkp.
- locPrefix (string) = When set, this parameter's value will be prepended to all gene_id values in the output (in the form of <locPrefix>LOC_XXXXXXXXXX).
The output contains one GTF line per unique "left" GTF record in the input, with a supplementary gene_id attribute (in the form of LOC_XXXXXXXXXX) appended to its 9th field.
Any gene_id value present in the input will be overwritten, unless the --keepGeneid option is used.
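The attribute handling described above can be illustrated with a short Python sketch. It is an assumption about the behaviour, not the script's actual code: set_gene_id is a hypothetical helper, "overwritten" is interpreted as removing the old gene_id before appending the new one, and keep_gene_id mimics the --keepGeneid option by renaming the old attribute to gene_id_bkp.

```python
import re

# Matches a whole 'gene_id "value";' attribute plus trailing whitespace.
# The \b prevents matches inside names such as old_gene_id.
GENE_ID_RE = re.compile(r'\bgene_id "([^"]*)";\s*')


def set_gene_id(attributes, new_gene_id, keep_gene_id=False):
    """Return the 9th GTF field with new_gene_id appended as gene_id.

    Any existing gene_id is dropped, or renamed to gene_id_bkp when
    keep_gene_id is True (assumed equivalent of --keepGeneid).
    """
    if keep_gene_id:
        attributes = GENE_ID_RE.sub(r'gene_id_bkp "\1"; ', attributes)
    else:
        attributes = GENE_ID_RE.sub("", attributes)
    attributes = attributes.strip()
    if attributes and not attributes.endswith(";"):
        attributes += ";"
    return f'{attributes} gene_id "{new_gene_id}";'.strip()


# Hypothetical locus identifier; a locPrefix value would simply be prepended.
loc_prefix = "myRun_"
example = set_gene_id('gene_id "G1"; transcript_id "T1";',
                      loc_prefix + "LOC_0000000001",
                      keep_gene_id=True)
```

Passing the identifier in as a plain string keeps the sketch independent of how buildLoci.pl actually numbers its loci, which the text does not specify beyond the LOC_XXXXXXXXXX pattern.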
Any pair of GTF records present within a line of input is assumed to represent overlapping features. Gene loci are then built based on these overlaps, i.e. the <transcript_id>s of both records are assigned the same arbitrary <gene_id> value in the output.
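One common way to turn pairwise overlaps into loci is a union-find (disjoint-set) structure, sketched below in Python. This is an illustration of the idea only, not necessarily how buildLoci.pl is implemented. The function name build_loci is hypothetical, overlaps are assumed to chain transitively (if A overlaps B and B overlaps C, all three share a locus), and the XXXXXXXXXX part of the identifier is assumed to be a zero-padded counter.

```python
def build_loci(pairs, loc_prefix=""):
    """Map each transcript_id to a locus id, given overlapping id pairs.

    pairs: iterable of (left_transcript_id, right_transcript_id).
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root,
        # halving the path on the way to keep later lookups short.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two clusters

    locus_of_root = {}
    loci = {}
    for tid in list(parent):  # ids in order of first appearance
        root = find(tid)
        if root not in locus_of_root:
            # Assumed identifier format: 10-digit zero-padded counter.
            locus_of_root[root] = f"{loc_prefix}LOC_{len(locus_of_root) + 1:010d}"
        loci[tid] = locus_of_root[root]
    return loci


# t1 and t2 overlap; t3 overlaps only itself; t4 overlaps t3.
overlaps = [("t1", "t1"), ("t1", "t2"), ("t2", "t1"), ("t2", "t2"),
            ("t3", "t3"), ("t4", "t3"), ("t4", "t4")]
loci = build_loci(overlaps)
```

Because a self-intersection of a single GTF file reports every record as overlapping itself, transcripts without any neighbour still appear in the pairs and end up in a locus of their own.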
Although not strictly necessary, BEDTools is recommended, solely to provide input to buildLoci.pl.
Julien Lagarde, CRG, Barcelona, contact [email protected]