| pileLettersAt {GenomicAlignments} | R Documentation |
pileLettersAt extracts the letters/nucleotides of a set of
reads that align to a set of individual genomic positions of interest.
The extracted letters are returned as "piles of letters" (one per
genomic position of interest) stored in an XStringSet
(typically DNAStringSet) object.
pileLettersAt(x, seqnames, pos, cigar, at)
x |
An XStringSet (typically DNAStringSet) object containing N unaligned read sequences (a.k.a. the query sequences) reported with respect to the + strand. |
seqnames |
A factor-Rle parallel to |
pos |
An integer vector parallel to |
cigar |
A character vector parallel to |
at |
A GRanges object containing the individual genomic
positions of interest. |
x, seqnames, pos, cigar must be 4 parallel
vectors describing N aligned reads.
An XStringSet (typically DNAStringSet)
object parallel to at (i.e. with 1 string per individual
genomic position).
H. Pages
DNAStringSet objects in the Biostrings package.
GRanges objects in the GenomicRanges package.
The stackStringsFromBam function
for stacking the read sequences (or their quality strings)
stored in a BAM file on a region of interest.
GAlignments objects.
cigar-utils for the CIGAR utility functions used internally
by pileLettersAt.
The SAMtools mpileup command available at http://samtools.sourceforge.net/ as part of the SAMtools project.
## Input
## - A BAM file:
bamfile <- BamFile(system.file("extdata", "ex1.bam", package="Rsamtools"))
seqinfo(bamfile) # to see the seqlevels and seqlengths
stackStringsFromBam(bamfile, param="seq1:1-21") # a quick look at
# the reads
## - A GRanges object containing Individual Genomic Positions Of
## Interest:
my_IGPOI <- GRanges(Rle(c("seq1", "seq2"), c(7, 2)),
IRanges(c(1:5, 21, 1575, 1513:1514), width=1))
## Some preliminary massage on 'my_IGPOI'
seqinfo(my_IGPOI) <- merge(seqinfo(my_IGPOI), seqinfo(bamfile))
seqlevels(my_IGPOI) <- seqlevelsInUse(my_IGPOI)
## Load the BAM file in a GAlignments object. We load only the reads
## aligned to the sequences in 'seqlevels(my_IGPOI)' and we filter out
## reads not passing quality controls (flag bit 0x200) and PCR or
## optical duplicates (flag bit 0x400). See ?ScanBamParam and the SAM
## Spec for more information.
which <- as(seqinfo(my_IGPOI), "GRanges")
flag <- scanBamFlag(isNotPassingQualityControls=FALSE,
isDuplicate=FALSE)
what <- c("seq", "qual")
param <- ScanBamParam(flag=flag, what=c("seq", "qual"), which=which)
gal <- readGAlignments(bamfile, param=param)
seqlevels(gal) <- seqlevels(my_IGPOI)
## Extract the read sequences (a.k.a. query sequences) and quality
## strings. Both are reported with respect to the + strand.
qseq <- mcols(gal)$seq
qual <- mcols(gal)$qual
nucl_piles <- pileLettersAt(qseq, seqnames(gal), start(gal), cigar(gal),
my_IGPOI)
qual_piles <- pileLettersAt(qual, seqnames(gal), start(gal), cigar(gal),
my_IGPOI)
mcols(my_IGPOI)$nucl_piles <- nucl_piles
mcols(my_IGPOI)$qual_piles <- qual_piles
my_IGPOI
## Finally, to summarize A/C/G/T frequencies at each position:
alphabetFrequency(nucl_piles, baseOnly=TRUE)