Maps the variants in a gdb onto a set of ranges or features. The input can be a set of ranges (CHROM, start, end), a BED file, or a GFF/GTF file. Variants in the gdb are mapped onto those ranges and annotated with the features/columns included in the input. For example, variants can be mapped onto genomic features downloaded in GFF format from Ensembl. The output can be written to disk (output parameter), uploaded directly to the gdb (uploadName parameter), or, if neither is specified, returned in the R session.
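Conceptually, the mapping is an interval-overlap join: each variant is matched to every range that contains it, and the extra columns of that range are carried along as annotations. The base-R sketch below illustrates the idea on toy data. It is not how mapVariants is implemented; the `variants` table, its `VAR_id` and `POS` columns, and the helper `map_to_ranges()` are hypothetical, and variants are simplified to single positions.

```r
# Toy variant table (hypothetical columns, for illustration only)
variants <- data.frame(
  VAR_id = 1:4,
  CHROM  = c("chr21", "chr21", "chr4", "chr4"),
  POS    = c(31660000, 40000000, 169400000, 1000)
)

# Ranges with one feature column, mirroring the `ranges` input format
ranges <- data.frame(
  CHROM = c("chr21", "chr4"),
  start = c(31659666, 169369704),
  end   = c(31668931, 169612632),
  gene_name = c("SOD1", "NEK1")
)

# Hypothetical helper: keep variant/range pairs where POS lies in [start, end]
map_to_ranges <- function(variants, ranges) {
  # Pair every variant with every range on the same chromosome
  pairs <- merge(variants, ranges, by = "CHROM")
  # Keep only the pairs where the position falls inside the range
  hits <- pairs[pairs$POS >= pairs$start & pairs$POS <= pairs$end, ]
  hits <- hits[order(hits$VAR_id), c("VAR_id", "gene_name")]
  rownames(hits) <- NULL
  hits
}

mapped <- map_to_ranges(variants, ranges)
print(mapped)
```

Variants that fall outside every range (here `VAR_id` 2 and 4) simply do not appear in the result of this sketch.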

mapVariants(
  object,
  ranges = NULL,
  gff = NULL,
  bed = NULL,
  bedCols = character(),
  fields = NULL,
  uploadName = NULL,
  output = NULL,
  sep = "\t",
  skipIndexes = FALSE,
  overWrite = FALSE,
  verbose = TRUE
)

Arguments

object

a gdb object

ranges

Can be 1) a data.frame including at least the columns 'CHROM', 'start', and 'end'; 2) a GenomicRanges::GRanges object; or 3) a file path to a ranges file containing at least the columns 'CHROM', 'start', and 'end'. The separator can be specified using the sep parameter (defaults to \t).

gff

Path to a gff- or gtf-file.

bed

Path to a bed-file. Specify extra columns using the bedCols parameter.

bedCols

A character vector with the names of the extra columns to read from the BED file. Optionally, the vector can be named to indicate the classes of the columns (e.g. c("gene_id" = "character", "gene_name" = "character")). If the vector is not named, all extra columns are read as character columns (see examples).

fields

Feature fields to keep. Defaults to NULL in which case all fields are kept.

uploadName

Name of the table to upload to the gdb. If not specified, either specify output to write the results to disk; otherwise the results are returned in the R session.

output

Optionally, an output file path. Can be used instead of uploadName to write the results to disk.

sep

Field separator, relevant if ranges is a file path. Defaults to \t.

skipIndexes

Flag indicating whether to skip indexing of the imported table. Relevant only if uploadName is specified, in which case the output table is imported into the gdb. Defaults to FALSE.

overWrite

If uploadName is specified, should an existing table in the gdb with the same name be overwritten? Defaults to FALSE.

verbose

Should the method be verbose? Defaults to TRUE.
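The bedCols argument can be illustrated with a small hand-written BED file. This is a sketch under stated assumptions: the file layout (three standard BED columns followed by the extra columns, without a header), the placeholder identifiers in `gene_id`, and the table name `gene_bed` are illustrative only. The mapVariants call is run only when rvat and rvatData are installed.

```r
# Write a small BED file: chrom, start, end, then two extra columns
bedfile <- tempfile(fileext = ".bed")
bed <- data.frame(
  chrom = c("chr21", "chr4"),
  start = c(31659665L, 169369703L),      # BED start coordinates are 0-based
  end   = c(31668931L, 169612632L),
  gene_id   = c("ID_SOD1", "ID_NEK1"),   # placeholder identifiers (assumption)
  gene_name = c("SOD1", "NEK1")
)
write.table(bed, bedfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Named vector: names are the extra column names, values their classes
bedCols <- c("gene_id" = "character", "gene_name" = "character")

if (requireNamespace("rvat", quietly = TRUE) &&
    requireNamespace("rvatData", quietly = TRUE)) {
  library(rvat)
  library(rvatData)
  gdb <- create_example_gdb()
  # "gene_bed" is an illustrative table name
  mapVariants(gdb,
              bed = bedfile,
              bedCols = bedCols,
              uploadName = "gene_bed",
              verbose = FALSE)
}
```

Passing an unnamed vector such as c("gene_id", "gene_name") instead would, per the argument description, read both extra columns as character.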

Examples


library(rvat)
library(rvatData)
library(rtracklayer)
library(GenomicRanges)
gdb <- create_example_gdb()

# map variants to gene models
ranges <- GRanges(
  seqnames = c("chr21", "chr4"),
  ranges = IRanges(
    start = c(31659666, 169369704),
    end = c(31668931, 169612632)
  ),
  gene_name = c("SOD1", "NEK1")
)

mapVariants(gdb,
            ranges = ranges,
            uploadName = "gene",
            verbose = FALSE)

# similarly, ranges can be a data.frame
ranges <- data.frame(
  CHROM = c("chr21", "chr4"),
  start = c(31659666, 169369704),
  end = c(31668931, 169612632),
  gene_name = c("SOD1", "NEK1")
)

mapVariants(gdb,
            ranges = ranges,
            uploadName = "gene",
            verbose = FALSE,
            overWrite = TRUE)
#> Table 'gene' already exists, it will be overwritten (as `overWrite=TRUE`)
#> Table 'gene' removed from gdb

# often you'd want to map variants to a large set of ranges, such as Ensembl gene models
# mapVariants supports several file formats, including gff/gtf, bed and ranges

# map variants using a gtf file
gtffile <- tempfile(fileext = ".gtf")
rtracklayer::export(makeGRangesFromDataFrame(ranges),
                    con = gtffile, 
                    format = "gtf")

mapVariants(gdb,
            gff = gtffile,
            uploadName = "gene",
            verbose = FALSE,
            overWrite = TRUE)
#> Table 'gene' already exists, it will be overwritten (as `overWrite=TRUE`)
#> Table 'gene' removed from gdb

# map variants using a bed file

bedfile <- tempfile(fileext = ".bed")
rtracklayer::export(makeGRangesFromDataFrame(ranges),
                    con = bedfile, 
                    format = "bed")
mapVariants(gdb,
            bed = bedfile,
            uploadName = "gene",
            verbose = FALSE,
            overWrite = TRUE)
#> Table 'gene' already exists, it will be overwritten (as `overWrite=TRUE`)
#> Table 'gene' removed from gdb

# see the variant annotation tutorial on the rvat website for more details
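
The remaining input and output modes described under Arguments can be sketched in the same way: a delimited ranges file passed through ranges, results written to disk through output, and results returned in the R session when neither uploadName nor output is given. The file names are temporary placeholders, and the header row in the ranges file is an assumption based on the required column names. The rvat calls are run only when rvat and rvatData are installed.

```r
ranges <- data.frame(
  CHROM = c("chr21", "chr4"),
  start = c(31659666L, 169369704L),
  end   = c(31668931L, 169612632L),
  gene_name = c("SOD1", "NEK1")
)

# Tab-delimited ranges file with a header row (assumed layout)
rangesfile <- tempfile(fileext = ".txt")
write.table(ranges, rangesfile, sep = "\t", quote = FALSE, row.names = FALSE)

if (requireNamespace("rvat", quietly = TRUE) &&
    requireNamespace("rvatData", quietly = TRUE)) {
  library(rvat)
  library(rvatData)
  gdb <- create_example_gdb()

  # ranges given as a file path; results written to disk instead of the gdb
  outfile <- tempfile(fileext = ".txt")
  mapVariants(gdb,
              ranges = rangesfile,
              sep = "\t",
              output = outfile,
              verbose = FALSE)

  # neither uploadName nor output: results are returned in the R session
  mapped <- mapVariants(gdb,
                        ranges = ranges,
                        verbose = FALSE)
  print(head(mapped))
}
```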