| filterVcf {VariantAnnotation} | R Documentation |
Filter Variant Call Format (VCF) files from one file to another
## S4 method for signature 'character'
filterVcf(file, genome, destination, ..., verbose = TRUE,
index = FALSE, prefilters = FilterRules(), filters = FilterRules(),
param = ScanVcfParam())
## S4 method for signature 'TabixFile'
filterVcf(file, genome, destination, ..., verbose = TRUE,
index = FALSE, prefilters = FilterRules(), filters = FilterRules(),
param = ScanVcfParam())
file |
A |
genome |
A |
destination |
A |
... |
Additional arguments, possibly used by future methods. |
verbose |
A |
index |
A |
prefilters |
A |
filters |
A |
param |
A |
This function transfers content of one VCF file to another, removing
records that fail to satisfy prefilters and
filters. Filtering is done in a memory-efficient manner,
iterating over the input VCF file in chunks of default size 100,000
(when invoked with character(1) for file) or as
specified by the yieldSize argument of TabixFile (when
invoked with TabixFile).
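As a minimal sketch of controlling the chunk size, the input can be opened as a TabixFile with an explicit yieldSize before it is passed to filterVcf. The value 5000 is an arbitrary choice for illustration, and the sketch assumes the example file shipped with the package has its tabix index alongside it.

```r
library(VariantAnnotation)  # Rsamtools (which provides TabixFile) is attached as a dependency

fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")

## Open the bgzip-compressed, tabix-indexed file; 5000 records per chunk
## is an illustrative value, not a recommendation
tbx <- TabixFile(fl, yieldSize=5000)

## Keep only records annotated as SNPs in the INFO field
filt <- FilterRules(list(isSNP = function(x) info(x)$VT == "SNP"))

## The TabixFile method iterates in chunks of yieldSize records
out <- filterVcf(tbx, "hg19", tempfile(), filters=filt)
```

A smaller yieldSize lowers peak memory use at the cost of more iterations over the file.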
There are up to two passes. In the first pass, unparsed lines are
passed to prefilters for filtering, e.g., searching for a fixed
character string. In the second pass lines successfully passing
prefilters are parsed into VCF instances and made
available for further filtering. One or both of prefilters and
filters can be present.
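To illustrate what a prefilter sees in the first pass, the sketch below applies a fixed-string test to a few unparsed lines using base R only. The three records are made up for illustration and are not taken from any real file; the function is the same kind of predicate that would be placed in prefilters.

```r
## Toy unparsed VCF records (tab-separated), standing in for one chunk of lines.
## These values are invented for illustration.
lines <- c(
  "22\t100\trs1\tA\tG\t100\tPASS\tVT=SNP;SNPSOURCE=LOWCOV,EXOME",
  "22\t200\trs2\tC\tT\t100\tPASS\tVT=SNP;SNPSOURCE=LOWCOV",
  "22\t300\trs3\tG\tGA\t100\tPASS\tVT=INDEL;SNPSOURCE=LOWCOV,EXOME"
)

## A prefilter receives a character vector of raw lines and returns
## one logical value per line
isLowCoverageExomeSnp <- function(x) grepl("LOWCOV,EXOME", x, fixed=TRUE)

keep <- isLowCoverageExomeSnp(lines)
kept <- lines[keep]
```

The string match alone also keeps the INDEL record, which is why a test on a parsed field (such as VT) belongs in the second pass.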
Filtering works by removing the rows (variants) that do not meet a criterion. Because this is a row-based approach and samples are column-based, most genotype filters are only meaningful for single-sample files. If a single sample fails the criterion, the entire row (all samples) is removed. The case where genotype filtering is effective for multiple samples is when the criterion is applied across samples and not to the individual (e.g., keep rows where all samples have DP > 10).
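The across-sample case can be sketched with a plain matrix in base R. The helper name allSamplesAbove and the DP values are hypothetical; the sketch assumes the genotype field is a variants-by-samples numeric matrix and treats a missing value as failing the test.

```r
## Toy DP matrix: rows are variants, columns are samples (values are made up)
dp <- matrix(c(12L, 30L, 15L,
                8L, 25L, 40L,
               11L,  NA, 20L),
             nrow=3, byrow=TRUE,
             dimnames=list(c("var1", "var2", "var3"), c("s1", "s2", "s3")))

## Hypothetical helper: TRUE for rows where every sample exceeds `min`;
## an NA counts as a failure because it is dropped from the row sum
allSamplesAbove <- function(dp, min=10) {
  rowSums(dp > min, na.rm=TRUE) == ncol(dp)
}

keep <- allSamplesAbove(dp)
```

For a file that actually carries a DP FORMAT field, a filter of this shape could be supplied as FilterRules(list(allDP = function(x) allSamplesAbove(geno(x)$DP))).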
The destination file path as a character(1).
Martin Morgan <mtmorgan@fhcrc.org> and Paul Shannon <pshannon@fhcrc.org>.
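library(VariantAnnotation)
fl <- system.file(package="VariantAnnotation", "extdata",
                  "chr22.vcf.gz")
destination <- tempfile()
## Prefilter: applied to unparsed lines; keep those containing the fixed string
pre <- FilterRules(list(isLowCoverageExomeSnp = function(x) {
    grepl("LOWCOV,EXOME", x, fixed=TRUE)
}))
## Filter: applied to parsed VCF chunks; keep records annotated as SNPs
filt <- FilterRules(list(isSNP = function(x) info(x)$VT == "SNP"))
## filterVcf returns the destination file path
filtered <- filterVcf(fl, "hg19", destination, prefilters=pre, filters=filt)
vcf <- readVcf(filtered, "hg19")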