| SNPlocs-class {BSgenome} | R Documentation |
The SNPlocs class is a container for storing known SNP locations for a
given organism. SNPlocs objects are usually prepared in advance by
a volunteer and distributed to the Bioconductor community as
"SNPlocs data packages".
See ?available.SNPs for how to get the list of
"SNPlocs data packages" currently available.
This man page focuses on how to extract information from a SNPlocs object.
snpcount(x)
snpsBySeqname(x, seqnames, ...)
## S4 method for signature 'SNPlocs'
snpsBySeqname(x, seqnames, drop.rs.prefix=FALSE)
snpsByOverlaps(x, ranges, maxgap=0L, minoverlap=0L,
type=c("any", "start", "end", "within", "equal"), ...)
## S4 method for signature 'SNPlocs'
snpsByOverlaps(x, ranges, maxgap=0L, minoverlap=0L,
type=c("any", "start", "end", "within", "equal"),
drop.rs.prefix=FALSE, ...)
snpsById(x, ids, ...)
## S4 method for signature 'SNPlocs'
snpsById(x, ids, ifnotfound=c("error", "warning", "drop"))
## Old API
## ------------------------------------
snplocs(x, seqname, ...)
## S4 method for signature 'SNPlocs'
snplocs(x, seqname, as.GRanges=FALSE, caching=TRUE)
snpid2loc(x, snpid, ...)
## S4 method for signature 'SNPlocs'
snpid2loc(x, snpid, caching=TRUE)
snpid2alleles(x, snpid, ...)
## S4 method for signature 'SNPlocs'
snpid2alleles(x, snpid, caching=TRUE)
snpid2grange(x, snpid, ...)
## S4 method for signature 'SNPlocs'
snpid2grange(x, snpid, caching=TRUE)
x |
A SNPlocs object. |
seqnames |
The names of the sequences for which to get SNPs. Must be a subset of
|
... |
Additional arguments, for use in specific methods. Arguments passed to the |
drop.rs.prefix |
Should the |
ranges |
One or more regions of interest specified as a GRanges
object. A single region of interest can be specified as a character string
of the form |
maxgap, minoverlap, type |
These arguments are passed to |
ids, snpid |
The RefSNP ids to look up (a.k.a. rs ids). Can be integer or character
vector, with or without the |
ifnotfound |
What to do if SNP ids are not found. |
seqname |
The name of the sequence for which to get the SNP locations and alleles. If |
as.GRanges |
|
caching |
Should the loaded SNPs be cached in memory? Caching speeds up subsequent retrieval at the cost of increased memory usage. |
snpcount returns a named integer vector containing the number
of SNPs for each sequence in the reference genome.
snpsBySeqname, snpsByOverlaps, and snpsById return
a GRanges object with 1 element (genomic range)
per SNP and the following metadata columns:
RefSNP_id: RefSNP ID (aka "rs id"). Character vector
with no NAs and no duplicates.
alleles_as_ambig: A character vector with no NAs
containing the alleles for each SNP represented by an IUPAC
nucleotide ambiguity code.
See ?IUPAC_CODE_MAP in the
Biostrings package for more information.
Note that all the elements (genomic ranges) in this
GRanges object have their strand set to "+".
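To make the alleles_as_ambig column more concrete, here is a minimal base-R sketch that expands IUPAC ambiguity codes into the individual alleles they stand for. The iupac_map vector is a hand-written copy of the standard IUPAC nucleotide codes (with Biostrings installed, IUPAC_CODE_MAP provides the same mapping), decode_alleles is a hypothetical helper that is not part of BSgenome, and the input vector is toy data standing in for a real metadata column.

```r
## Hand-written copy of the standard IUPAC nucleotide ambiguity codes.
## (Assumption: used here only to avoid depending on Biostrings.)
iupac_map <- c(A="A", C="C", G="G", T="T",
               M="AC", R="AG", W="AT", S="CG", Y="CT", K="GT",
               V="ACG", H="ACT", D="AGT", B="CGT", N="ACGT")

## Hypothetical helper: expand each ambiguity code into its alleles.
decode_alleles <- function(codes) {
    stopifnot(is.character(codes), all(codes %in% names(iupac_map)))
    ## Look up the allele string for each code, then split it into letters.
    strsplit(unname(iupac_map[codes]), "")
}

## Toy input standing in for an 'alleles_as_ambig' metadata column.
alleles_as_ambig <- c("R", "Y", "K")
decoded <- decode_alleles(alleles_as_ambig)
decoded
```

Each list element holds the alleles for one SNP, so lengths(decoded) gives the number of alleles per SNP.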
If ifnotfound="error", the object returned by snpsById
is guaranteed to be parallel to ids, that is, the i-th
element in the GRanges object corresponds to the
i-th element in ids.
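The notion of a result being "parallel to ids" can be illustrated with a small base-R sketch. This is not how snpsById is implemented: the known table, its made-up ids and positions, and the lookup_ids helper are all hypothetical and only mimic the "error" and "drop" behaviors on toy data.

```r
## Toy lookup table standing in for a SNP collection; the ids and
## positions are made up for illustration only.
known <- data.frame(id  = c("rsA", "rsB", "rsC", "rsD"),
                    pos = c(100L, 250L, 400L, 975L),
                    stringsAsFactors = FALSE)

## Hypothetical helper mimicking ifnotfound="error" and ifnotfound="drop".
lookup_ids <- function(ids, ifnotfound=c("error", "drop")) {
    ifnotfound <- match.arg(ifnotfound)
    idx <- match(ids, known$id)      # one index per requested id, NA if absent
    notfound <- is.na(idx)
    if (any(notfound)) {
        if (ifnotfound == "error")
            stop("ids not found: ", paste(ids[notfound], collapse=", "))
        idx <- idx[!notfound]        # "drop": silently skip unknown ids
    }
    known[idx, , drop=FALSE]
}

## With "error", a successful call returns rows in the order of 'ids'.
ids <- c("rsC", "rsA")
res <- lookup_ids(ids)
identical(res$id, ids)

## With "drop", unknown ids are removed, so the result can be shorter
## than 'ids' and is no longer parallel to it.
res2 <- lookup_ids(c("rsC", "rsZ", "rsA"), ifnotfound="drop")
nrow(res2)
```

When ids may be missing, matching the returned RefSNP ids back against the requested ones is a safer way to align results than relying on position.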
Note that snplocs is superseded by snpsBySeqname, and
snpid2loc, snpid2alleles, and snpid2grange are
superseded by snpsById.
By default (i.e. when as.GRanges=FALSE), snplocs returns a
data frame with 1 row per SNP and the following columns:
RefSNP_id: Same as above but with "rs" prefix
always removed.
alleles_as_ambig: Same as above.
loc: The 1-based location of the SNP relative to the
first base at the 5' end of the plus strand of the reference
sequence.
Otherwise (i.e. when as.GRanges=TRUE), it returns a
GRanges object with metadata columns
"RefSNP_id" and "alleles_as_ambig".
snpid2loc and snpid2alleles both return a named vector
(integer vector for the former, character vector for the latter)
where each (name, value) pair corresponds to a supplied SNP id.
For both functions the name in (name, value) is the chromosome
of the SNP id. The value in (name, value) is the position of the
SNP id on the chromosome for snpid2loc, and a single IUPAC
code representing the associated alleles for snpid2alleles.
snpid2grange returns a GRanges object
similar to the one returned by snplocs (when used with
as.GRanges=TRUE) and where each element corresponds to a
supplied SNP id.
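The shape of the named vectors returned by the old API can be sketched in base R. The snp_df data frame below is toy data with made-up values; it uses a subset of the columns that as.data.frame() produces for a GRanges object with the metadata columns described above, and it is only an assumption about how one might rebuild the old return values from the newer ones.

```r
## Toy table with made-up values, standing in for as.data.frame() applied
## to a GRanges object returned by the newer API.
snp_df <- data.frame(seqnames = c("ch1", "ch1", "ch22"),
                     start = c(1000L, 2500L, 400L),
                     RefSNP_id = c("rsA", "rsB", "rsC"),
                     alleles_as_ambig = c("R", "Y", "K"),
                     stringsAsFactors = FALSE)

## snpid2loc-style result: integer positions named by chromosome.
loc <- setNames(snp_df$start, snp_df$seqnames)

## snpid2alleles-style result: IUPAC codes named by chromosome.
alleles <- setNames(snp_df$alleles_as_ambig, snp_df$seqnames)

loc
alleles
```

Because the names are chromosomes rather than SNP ids, they can repeat; the vectors are meant to be read positionally against the supplied ids.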
H. Pages
IUPAC_CODE_MAP in the Biostrings
package.
library(SNPlocs.Hsapiens.dbSNP141.GRCh38)
snps <- SNPlocs.Hsapiens.dbSNP141.GRCh38
snpcount(snps)
## ---------------------------------------------------------------------
## snpsBySeqname()
## ---------------------------------------------------------------------
## Get all SNPs located on chromosome 22 and MT:
snpsBySeqname(snps, c("ch22", "chMT"))
## ---------------------------------------------------------------------
## snpsByOverlaps()
## ---------------------------------------------------------------------
## Get all SNPs overlapping some regions of interest:
snpsByOverlaps(snps, "ch22:33.63e6-33.64e6")
## With the regions of interest being all the known CDS for hg38
## located on chr22 or chrM (except for the chromosome naming
## convention, hg38 is the same as GRCh38):
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
my_cds <- cds(txdb)
## UCSC names the mitochondrial chromosome "chrM", not "chrMT":
seqlevels(my_cds, force=TRUE) <- c("chr22", "chrM")
seqlevelsStyle(my_cds) # UCSC
seqlevelsStyle(snps) # dbSNP
seqlevelsStyle(my_cds) <- seqlevelsStyle(snps)
genome(my_cds) <- genome(snps)
snpsByOverlaps(snps, my_cds)
## ---------------------------------------------------------------------
## snpsById()
## ---------------------------------------------------------------------
## Look up some RefSNP ids:
my_rsids <- c("rs10458597", "rs12565286", "rs7553394")
## Not run:
snpsById(snps, my_rsids) # error, rs7553394 not found
## End(Not run)
snpsById(snps, my_rsids, ifnotfound="drop")