mapVariants.Rd
Method to map the variants in a gdb
to a set of ranges or features.
The input can be a set of ranges (CHROM, start, end), a bed-file or a gff/gtf-file.
Variants in the gdb will be mapped onto those ranges and annotated with the features/columns
included in the input file.
For example, variants can be mapped onto genomic features downloaded in GFF format from Ensembl.
The output can be written to disk (output
parameter) or directly uploaded to the gdb
(uploadName
parameter).
mapVariants(
object,
ranges = NULL,
gff = NULL,
bed = NULL,
bedCols = character(),
fields = NULL,
uploadName = NULL,
output = NULL,
sep = "\t",
skipIndexes = FALSE,
overWrite = FALSE,
verbose = TRUE
)
a gdb
object
Can be 1) a data.frame, including at least 'CHROM','start', and 'end' columns.
2) a GenomicRanges::GRanges
object. 3) a filepath to a ranges file containing at least 'CHROM','start', and 'end' columns.
Separator can be specified using the sep
parameter (defaults to \t
).
Path to a gff- or gtf-file.
Path to a bed-file. Specify extra columns using the bedCols
parameter.
A character vector of names of the extra columns to read from the BED-file. Optionally, the vector can be named to indicate the classes of the columns (e.g. c("gene_id" = "character", "gene_name" = "character")). If not named, all extra columns will be read as character columns (see examples).
Feature fields to keep. Defaults to NULL
in which case all fields are kept.
Name of table to upload to the gdb.
If not specified, either specify output
to write the results to disk, or the results will be returned in the R session.
Optionally, an output file path. Can be used instead of uploadName
to write the results to disk.
Field separator, relevant if ranges
is a filepath. Defaults to \t
.
Flag indicating whether to skip indexing of the imported table.
Relevant if uploadName
is specified, and thus the output table is imported into the gdb.
Defaults to FALSE
.
If uploadName
is specified, should an existing table in the gdb with the same name be overwritten?
Defaults to FALSE
.
Should the method be verbose? Defaults to TRUE
.
library(rvatData)
library(rtracklayer)
library(GenomicRanges)
gdb <- create_example_gdb()
# map variants to gene models
ranges <- GRanges(
seqnames = c("chr21", "chr4"),
ranges = IRanges(
start = c(31659666, 169369704),
end = c(31668931, 169612632)
),
gene_name = c("SOD1", "NEK1")
)
mapVariants(gdb,
ranges = ranges,
uploadName = "gene",
verbose = FALSE)
# similarly, ranges can be a data.frame
ranges <- data.frame(
CHROM = c("chr21", "chr4"),
start = c(31659666, 169369704),
end = c(31668931, 169612632),
gene_name = c("SOD1", "NEK1")
)
mapVariants(gdb,
ranges = ranges,
uploadName = "gene",
verbose = FALSE,
overWrite = TRUE)
#> Table 'gene' already exists, it will be overwritten (as `overWrite=TRUE`)
#> Table 'gene' removed from gdb
# often you'd want to map variants to a large set of ranges, such as Ensembl gene models
# mapVariants supports several file formats, including gff/gtf, bed and ranges
# map variants using a gtf file
gtffile <- tempfile(fileext = ".gtf")
rtracklayer::export(makeGRangesFromDataFrame(ranges),
con = gtffile,
format = "gtf")
mapVariants(gdb,
gff = gtffile,
uploadName = "gene",
verbose = FALSE,
overWrite = TRUE)
#> Table 'gene' already exists, it will be overwritten (as `overWrite=TRUE`)
#> Table 'gene' removed from gdb
# map variants using a bed file
bedfile <- tempfile(fileext = ".bed")
rtracklayer::export(makeGRangesFromDataFrame(ranges),
con = bedfile,
format = "bed")
mapVariants(gdb,
bed = bedfile,
uploadName = "gene",
verbose = FALSE,
overWrite = TRUE)
#> Table 'gene' already exists, it will be overwritten (as `overWrite=TRUE`)
#> Table 'gene' removed from gdb
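The bedCols parameter is not exercised by the BED example above, so the following is a minimal sketch of how a named bedCols vector might be used. It assumes that the BED file is header-less and tab-separated, and that the extra columns are read in order after the three standard BED columns. The names bed_df, bedfile_extra and bed_cols, and the gene_id values, are placeholders introduced for illustration.

```r
# Two ranges with two extra annotation columns (gene_id values are placeholders).
# BED start coordinates are 0-based by convention, hence the "- 1L".
bed_df <- data.frame(
  chrom = c("chr21", "chr4"),
  start = c(31659666L, 169369704L) - 1L,
  end = c(31668931L, 169612632L),
  gene_id = c("id1", "id2"),
  gene_name = c("SOD1", "NEK1")
)

# Write a header-less, tab-separated BED file.
bedfile_extra <- tempfile(fileext = ".bed")
write.table(bed_df,
            file = bedfile_extra,
            sep = "\t",
            quote = FALSE,
            row.names = FALSE,
            col.names = FALSE)

# Named vector: names are the extra column names, values are their classes.
bed_cols <- c("gene_id" = "character", "gene_name" = "character")

# The call runs only if the `gdb` object from the examples above is available.
if (exists("gdb")) {
  mapVariants(gdb,
              bed = bedfile_extra,
              bedCols = bed_cols,
              uploadName = "gene",
              verbose = FALSE,
              overWrite = TRUE)
}
```

Passing an unnamed vector such as c("gene_id", "gene_name") instead would, per the bedCols description, read both extra columns as character columns.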
# see the variant annotation tutorial on the rvat website for more details
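The examples above cover the gff/gtf and bed inputs but not a ranges file. The sketch below illustrates that case with a comma-separated file, under the assumption that the file carries a header row naming the 'CHROM', 'start' and 'end' columns. Because neither uploadName nor output is given, the results should be returned in the R session, as described for the uploadName parameter. The names ranges_df, rangesfile and mapped are introduced for illustration only.

```r
# Ranges with the required 'CHROM', 'start' and 'end' columns plus one annotation column.
ranges_df <- data.frame(
  CHROM = c("chr21", "chr4"),
  start = c(31659666L, 169369704L),
  end = c(31668931L, 169612632L),
  gene_name = c("SOD1", "NEK1")
)

# Write the ranges to a comma-separated file with a header row.
rangesfile <- tempfile(fileext = ".csv")
write.table(ranges_df,
            file = rangesfile,
            sep = ",",
            quote = FALSE,
            row.names = FALSE)

# The call runs only if the `gdb` object from the examples above is available.
# `sep` must match the separator used in the file.
if (exists("gdb")) {
  mapped <- mapVariants(gdb,
                        ranges = rangesfile,
                        sep = ",",
                        verbose = FALSE)
}
```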