Method to retrieve a genoMatrix for the variants specified by a varSet or a vector of VAR_ids, and for the samples specified by the cohort parameter. The checkPloidy parameter can be set to GRCh37 (or hg19) or GRCh38 (or hg38) to assign variant ploidy (diploid, XnonPAR, YnonPAR). If it is not specified, the genome build stored in the gdb is used, if available (the build is stored when the genomeBuild parameter was set in buildGdb).

getGT(
  object,
  varSet = NULL,
  VAR_id = NULL,
  ranges = NULL,
  cohort = NULL,
  anno = NULL,
  annoFields = NULL,
  includeVarInfo = FALSE,
  checkPloidy = NULL,
  varSetName = "unnamed",
  unit = "unnamed",
  padding = 250,
  verbose = TRUE,
  strict = TRUE
)

Arguments

object

an object of class gdb

varSet

varSet object. If specified, the VAR_id, varSetName and unit parameters will be ignored.

VAR_id

Character vector containing target VAR_id.

ranges

Extract variants within specified ranges. Ranges can be specified as a data.frame, including at least 'CHROM','start', and 'end' columns, or can be a GenomicRanges::GRanges object.
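
As an illustration of the data.frame form of `ranges`, the following base R sketch assembles a table with the three required columns and checks it before use. The helper `check_ranges` is hypothetical and not part of rvat, and the `start <= end` check is an assumption about what a sensible interval looks like rather than a documented rule.

```r
# Hypothetical helper (not part of rvat): checks the documented column requirements.
check_ranges <- function(ranges) {
  required <- c("CHROM", "start", "end")
  missing_cols <- setdiff(required, colnames(ranges))
  if (length(missing_cols) > 0) {
    stop("ranges is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Assumption: an interval whose start lies after its end is treated as a mistake.
  if (any(ranges$start > ranges$end)) {
    stop("each range must have start <= end")
  }
  invisible(ranges)
}

# Interval taken from the Examples section of this page.
ranges <- data.frame(
  CHROM = "chr21",
  start = 31659666,
  end = 31668931,
  stringsAsFactors = FALSE
)
check_ranges(ranges)

# With rvat loaded and a gdb connection open, the table could then be passed on:
# GT <- getGT(gdb, ranges = ranges, cohort = "pheno")
```

Additional columns in the data.frame are left untouched by this check, in line with the "at least" wording above.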

cohort

Optional use of cohort data previously uploaded to the gdb (see uploadCohort). If a valid cohort name is provided, the uploaded data for this cohort is used to filter and annotate the returned genoMatrix object. If a data.frame is provided, it is assumed to conform to the SM table constraints required for genoMatrix objects (see genoMatrix).

anno

Optional use of variant annotation data previously uploaded to the gdb (see uploadAnno). If a valid annotation table name is provided, the variant annotations will be included in the rowData of the genoMatrix. Note that currently only annotation tables that include one row per variant can be included. The annoFields parameter can be used to retain only specified fields from the annotation table.

annoFields

A vector of field names to retain if the anno parameter is set.

includeVarInfo

Include variant info ('var' table from the gdb) in the genoMatrix? Defaults to FALSE. Note that setting this parameter to TRUE will override the anno/annoFields parameters.

checkPloidy

Version of the human genome to use when assigning variant ploidy (diploid, XnonPAR, YnonPAR). Accepted inputs are GRCh37, hg19, GRCh38, hg38. If not specified, the genome build stored in the gdb is used, if available (the build is stored when the genomeBuild parameter was set in buildGdb). If no value is provided and the genome build is not included in the gdb metadata, all variants are assigned the default ploidy of "diploid".
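
The order of precedence described for `checkPloidy` can be sketched in base R as follows. This is only a restatement of the documented behaviour, not rvat's implementation: the function `resolve_ploidy_build` and its `gdb_build` argument are hypothetical, and how the build is actually stored in the gdb metadata is not shown on this page.

```r
# Hypothetical helper (not part of rvat): mirrors the documented precedence only.
resolve_ploidy_build <- function(checkPloidy = NULL, gdb_build = NULL) {
  accepted <- c("GRCh37", "hg19", "GRCh38", "hg38")

  # An explicit checkPloidy value is used first, then the build stored in the gdb.
  build <- if (!is.null(checkPloidy)) checkPloidy else gdb_build

  # No build known at all: every variant keeps the default ploidy "diploid".
  if (is.null(build)) {
    return(NA_character_)
  }
  if (!build %in% accepted) {
    stop("Unsupported genome build: ", build)
  }
  build
}

resolve_ploidy_build("hg38", gdb_build = "GRCh37")  # explicit value is used
resolve_ploidy_build(NULL, gdb_build = "GRCh38")    # falls back to the gdb build
resolve_ploidy_build(NULL, NULL)                    # NA: all variants "diploid"
```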

varSetName

Optional name for the set of variants, for example: 'missense' or 'LOF' (ignored if varSet is specified).

unit

Optional 'unit' name, for example: 'SOD1' or 'ENSG00000142168' (ignored if varSet is specified).

padding

Number of basepairs to extend the search region beyond the specified genomic ranges to capture variants where the reference allele (REF) overlaps the input ranges, but the POS of the variant falls outside the ranges. This accounts for variants where the REF allele spans multiple base pairs.
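
To see why `padding` is needed, consider the multi-base deletion at chr21:31659819 (REF = GCAT) from the Examples on this page, queried with an interval that starts two bases after its POS. The base R sketch below is an illustration of the overlap logic only and is not rvat's internal code. The query interval is made up for the example, and widening the window on the upstream side is an assumption about how the padding is applied.

```r
# Made-up query interval that starts just after the variant's POS.
query <- data.frame(CHROM = "chr21", start = 31659821, end = 31659900,
                    stringsAsFactors = FALSE)
variant <- data.frame(CHROM = "chr21", POS = 31659819, REF = "GCAT",
                      stringsAsFactors = FALSE)

# The REF allele occupies POS .. POS + nchar(REF) - 1.
variant$ref_end <- variant$POS + nchar(variant$REF) - 1

# A test on POS alone misses the variant.
pos_in_range <- variant$POS >= query$start & variant$POS <= query$end

# An interval-overlap test on the full REF span finds it.
ref_overlaps <- variant$CHROM == query$CHROM &
  variant$POS <= query$end &
  variant$ref_end >= query$start

# Widening the search window by `padding` brings POS inside the window,
# so the variant can be retrieved and then kept because its REF overlaps.
padding <- 250
pos_in_padded <- variant$POS >= (query$start - padding) &
  variant$POS <= query$end
```

Here `pos_in_range` is FALSE while `ref_overlaps` and `pos_in_padded` are both TRUE, which is the case the `padding` argument is meant to cover.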

verbose

Should the method be verbose? Defaults to TRUE.

strict

Should strict checks be performed? Defaults to TRUE. Strict checks currently include checking whether the supplied varSetFile/varSetList/varSet was generated from the same gdb as specified in object.

Value

A genoMatrix object.

Examples

library(rvatData)
gdb <- gdb(rvat_example("rvatData.gdb"))

# retrieve genotypes of a set of variants based on their VAR_ids
varinfo <- getAnno(gdb, table = "varinfo", where = "gene_name = 'SOD1' and ModerateImpact = 1")
GT <- getGT(
  gdb,
  VAR_id = varinfo$VAR_id,
  cohort = "pheno")
#> Retrieved genotypes for 38 variants

# retrieve genotypes of a set of variants in a varSet
varsetfile <- varSetFile(rvat_example("rvatData_varsetfile.txt.gz"))
varset <- getVarSet(varsetfile, unit = "NEK1", varSetName = "High")
GT <- getGT(
  gdb,
  varSet = varset,
  cohort = "pheno")
#> Retrieved genotypes for 33 variants
# see ?varSetFile and ?getVarSet for more details

# retrieve genotypes for a genomic interval
GT <- getGT(
  gdb,
  ranges = data.frame(CHROM = "chr21", start = 31659666, end = 31668931),
  cohort = "pheno")
#> Retrieved genotypes for 49 variants

# the `anno` parameter can be specified to include variant annotations in the rowData of the genoMatrix
GT <- getGT(
  gdb,
  ranges = data.frame(CHROM = "chr21", start = 31659666, end = 31668931),
  cohort = "pheno",
  anno = "varInfo",
  annoFields = c("VAR_id", "CHROM", "POS", "REF", "ALT", "HighImpact", "ModerateImpact", "Synonymous")
)
#> Retrieved genotypes for 49 variants
head(rowData(GT))
#> DataFrame with 6 rows and 9 columns
#>           ploidy         w       CHROM         POS         REF         ALT
#>      <character> <numeric> <character> <character> <character> <character>
#> 1268     diploid         1       chr21    31659783           C           T
#> 1269     diploid         1       chr21    31659796           G           A
#> 1270     diploid         1       chr21    31659799           G           C
#> 1271     diploid         1       chr21    31659819        GCAT           G
#> 1272     diploid         1       chr21    31659820           C           A
#> 1273     diploid         1       chr21    31659828           A           G
#>       HighImpact ModerateImpact  Synonymous
#>      <character>    <character> <character>
#> 1268           0              1           0
#> 1269           0              0           1
#> 1270           0              1           0
#> 1271           0              1           0
#> 1272           0              0           1
#> 1273           0              1           0

# the `includeVarInfo` parameter is shorthand for including the "var" table
GT <- getGT(
  gdb,
  ranges = data.frame(CHROM = "chr21", start = 31659666, end = 31668931),
  cohort = "pheno",
  includeVarInfo = TRUE
)
#> Retrieved genotypes for 49 variants
head(rowData(GT))
#> DataFrame with 6 rows and 11 columns
#>           ploidy         w       CHROM       POS          ID         REF
#>      <character> <numeric> <character> <integer> <character> <character>
#> 1268     diploid         1       chr21  31659783 rs121912442           C
#> 1269     diploid         1       chr21  31659796 rs772764888           G
#> 1270     diploid         1       chr21  31659799           .           G
#> 1271     diploid         1       chr21  31659819           .        GCAT
#> 1272     diploid         1       chr21  31659820 rs200454724           C
#> 1273     diploid         1       chr21  31659828 rs768029813           A
#>              ALT        QUAL      FILTER                   INFO      FORMAT
#>      <character> <character> <character>            <character> <character>
#> 1268           T     15317.2           . AC=6;AN=47612;AF=0.0..          GT
#> 1269           A     14053.5           . AC=1;AN=45320;AF=2.2..          GT
#> 1270           C      732.18           . AC=1;AN=49466;AF=2.0..          GT
#> 1271           G      241.15           . AC=1;AN=44404;AF=2.2..          GT
#> 1272           A     1218.18           . AC=1;AN=48150;AF=2.0..          GT
#> 1273           G     15989.6           . AC=5;AN=49638;AF=0.0..          GT

# The `checkPloidy` parameter can be set to the version of the human genome to use
# to assign variant ploidy (diploid, XnonPAR, YnonPAR). Accepted inputs are GRCh37, hg19, GRCh38, hg38.
# We recommend, however, setting the genome build when building the gdb: the genome build will then
# be included in the gdb metadata and used automatically. See ?buildGdb for details.
varinfo <- getAnno(gdb, table = "varinfo", where = "gene_name = 'UBQLN2' and ModerateImpact = 1")
GT <- getGT(
  gdb,
  VAR_id = varinfo$VAR_id,
  cohort = "pheno",
  includeVarInfo = TRUE,
  checkPloidy = "GRCh38"
)
#> Ploidy of non-pseudoautosomal regions of the sex chromosomes are being set based on build GRCh38
#> Retrieved genotypes for 24 variants

# see ?genoMatrix for more details on the genoMatrix class.