This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA. Duplicates can arise during sample preparation. See also EstimateLibraryComplexity for additional notes on PCR duplication artifacts. Duplicate reads can also result from a single amplification cluster, incorrectly detected as multiple clusters by the optical sensor of the sequencing instrument. These duplication artifacts are referred to as optical duplicates.

The MarkDuplicates tool works by comparing sequences in the 5 prime positions of both reads and read-pairs in a SAM/BAM file. A BARCODE_TAG option is available to facilitate duplicate marking using molecular barcodes. After duplicate reads are collected, the tool differentiates the primary and duplicate reads using an algorithm that ranks reads by the sums of their base-quality scores (default method).

The tool's main output is a new SAM or BAM file, in which duplicates have been identified in the SAM flags field for each read. Duplicates are marked with the hexadecimal value of 0x0400, which corresponds to a decimal value of 1024. If you are not familiar with this type of annotation, please see the following blog post for additional information.

Although the bitwise flag annotation indicates whether a read was marked as a duplicate, it does not identify the type of duplicate. To do this, a new tag called the duplicate type (DT) tag was recently added as an optional output in the 'optional field' section of a SAM/BAM file. By invoking the TAGGING_POLICY option, you can instruct the program to mark all the duplicates (All), only the optical duplicates (OpticalOnly), or no duplicates (DontTag).
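
As a small illustration of the flag and tag annotations described above, the following Python sketch checks the 0x0400 duplicate bit in a SAM flag and, for reads marked as duplicates, looks for a DT tag among the optional fields. It is a minimal sketch that assumes plain tab-separated SAM text with the eleven mandatory columns followed by optional fields; the helper names (`is_duplicate`, `duplicate_type`), the example read, and the `SQ` tag value shown are illustrative assumptions, not taken from the tool's output.

```python
from typing import Optional

# Bit set in the SAM FLAG field for reads marked as duplicates
# (hexadecimal 0x0400, decimal 1024).
DUPLICATE_FLAG = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM flag value."""
    return bool(flag & DUPLICATE_FLAG)


def duplicate_type(sam_line: str) -> Optional[str]:
    """Return the DT tag value for a duplicate read.

    Returns None when the read is not flagged as a duplicate or when
    no DT tag is present (for example if tagging was disabled).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second mandatory column
    if not is_duplicate(flag):
        return None
    # Optional fields start after the 11 mandatory columns and
    # have the form TAG:TYPE:VALUE.
    for optional in fields[11:]:
        tag, _type, value = optional.split(":", 2)
        if tag == "DT":
            return value
    return None


# Hypothetical read: flag 1107 = 1024 + 64 + 16 + 2 + 1, so the duplicate bit is set.
example = "read1\t1107\tchr1\t100\t60\t4M\t=\t50\t-54\tACGT\tIIII\tDT:Z:SQ"
print(is_duplicate(1107), duplicate_type(example))
```

The bitwise AND isolates the duplicate bit regardless of which other flag bits are set, which is why a combined value such as 1107 is still recognized as a duplicate.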