getGT.Rd
Method to retrieve a genoMatrix for variants specified by a varSet, a vector of VAR_ids or a set of genomic ranges, and for samples as specified by the cohort parameter.
The checkPloidy parameter can be set to GRCh37 (or hg19) or GRCh38 (or hg38) to assign variant ploidy (diploid, XnonPAR, YnonPAR). If not specified, the genome build stored in the gdb will be used, if available (it is included if the genomeBuild parameter was set in buildGdb).
getGT(
object,
varSet = NULL,
VAR_id = NULL,
ranges = NULL,
cohort = NULL,
anno = NULL,
annoFields = NULL,
includeVarInfo = FALSE,
checkPloidy = NULL,
varSetName = "unnamed",
unit = "unnamed",
padding = 250,
verbose = TRUE,
strict = TRUE
)
object: An object of class gdb.
varSet: A varSet object. If specified, the VAR_id, varSetName and unit parameters will be ignored.
VAR_id: Character vector containing the target VAR_ids.
ranges: Extract variants within the specified ranges. Ranges can be specified as a data.frame including at least 'CHROM', 'start' and 'end' columns, or as a GenomicRanges::GRanges object.
cohort: Optional use of cohort data previously uploaded to the gdb (see uploadCohort).
If a valid cohort name is provided, the uploaded data for this cohort is used to filter and annotate the returned genoMatrix object.
If a data.frame is provided, it is assumed to conform to the SM table constraints required for genoMatrix objects (see genoMatrix).
anno: Optional use of variant annotation data previously uploaded to the gdb (see uploadAnno).
If a valid annotation table name is provided, the variant annotations will be included in the rowData of the genoMatrix.
Note that currently only annotation tables with one row per variant can be included.
The annoFields parameter can be used to retain only specified fields from the annotation table.
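Because only annotation tables with one row per variant can currently be included, it may help to check an annotation data.frame for duplicated VAR_ids before uploading it with uploadAnno. The sketch below uses base R only; the table contents and the helper name has_one_row_per_variant are hypothetical, and keeping the first row per VAR_id is just one possible way to resolve duplicates.

```r
# Hypothetical annotation table (not part of rvatData); VAR_id 2 appears twice
anno <- data.frame(
  VAR_id = c(1, 2, 2, 3),
  gene_name = c("SOD1", "SOD1", "SOD1", "NEK1"),
  ModerateImpact = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when every VAR_id occurs exactly once.
# anyDuplicated() returns 0 when there are no duplicates, so negating it
# gives TRUE for a table with one row per variant.
has_one_row_per_variant <- function(df, id_col = "VAR_id") {
  !anyDuplicated(df[[id_col]])
}

has_one_row_per_variant(anno)         # FALSE

# One possible fix: keep only the first row for each VAR_id
anno_unique <- anno[!duplicated(anno$VAR_id), ]
has_one_row_per_variant(anno_unique)  # TRUE
```

Whether keeping the first row is appropriate depends on the annotation; aggregating duplicated rows (for example, taking the maximum of an impact flag per variant) may be more suitable.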
annoFields: A vector of field names to retain if the anno parameter is set.
includeVarInfo: Include variant info (the 'var' table from the gdb) in the genoMatrix? Defaults to FALSE.
Note that setting this parameter to TRUE will override the anno/annoFields parameters.
checkPloidy: Version of the human genome to use when assigning variant ploidy (diploid, XnonPAR, YnonPAR).
Accepted inputs are GRCh37, hg19, GRCh38 and hg38.
If not specified, the genome build stored in the gdb will be used, if available (it is included if the genomeBuild parameter was set in buildGdb).
Otherwise, if the genome build is not included in the gdb metadata and no value is provided, all variants are assigned the default ploidy of "diploid".
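To make the ploidy categories concrete, the following base R sketch classifies variants as diploid, XnonPAR or YnonPAR from their chromosome and position, using the GRCh38 pseudoautosomal region (PAR) boundaries. This is not rvat's internal implementation: the helper assign_ploidy and the par_grch38 table are hypothetical, and treating variants inside the Y-chromosome PARs as diploid is an assumption of this sketch.

```r
# GRCh38 pseudoautosomal regions (1-based, inclusive), as published by the
# Genome Reference Consortium. Hypothetical table name, for illustration only.
par_grch38 <- data.frame(
  CHROM = c("X", "X", "Y", "Y"),
  start = c(10001, 155701383, 10001, 56887903),
  end   = c(2781479, 156030895, 2781479, 57217415),
  stringsAsFactors = FALSE
)

# Hypothetical helper: classify variants as diploid, XnonPAR or YnonPAR.
# Assumption: variants inside a PAR (on X or Y) are treated as diploid.
assign_ploidy <- function(CHROM, POS, par = par_grch38) {
  chrom <- sub("^chr", "", CHROM)  # accept both "chrX" and "X"
  in_par <- vapply(seq_along(POS), function(i) {
    hits <- par[par$CHROM == chrom[i], ]
    any(POS[i] >= hits$start & POS[i] <= hits$end)
  }, logical(1))
  ploidy <- rep("diploid", length(POS))
  ploidy[chrom == "X" & !in_par] <- "XnonPAR"
  ploidy[chrom == "Y" & !in_par] <- "YnonPAR"
  ploidy
}

assign_ploidy(
  CHROM = c("chr21", "chrX", "chrX", "chrY", "chrY"),
  POS   = c(31659783, 1000000, 70000000, 2000000, 10000000)
)
# "diploid" "diploid" "XnonPAR" "diploid" "YnonPAR"
```

For GRCh37/hg19 the PAR boundaries differ, so a separate boundary table would be needed for that build.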
varSetName: Optional name for the set of variants, for example 'missense' or 'LOF' (ignored if varSet is specified).
unit: Optional 'unit' name, for example 'SOD1' or 'ENSG00000142168' (ignored if varSet is specified).
padding: Number of base pairs by which to extend the search region beyond the specified genomic ranges, so that variants whose reference allele (REF) overlaps the input ranges but whose POS falls outside them are still captured. This accounts for variants where the REF allele spans multiple base pairs.
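The effect of padding can be illustrated with a two-step filter in base R: first select variants whose POS falls within the padded window, then keep those whose REF span overlaps the original range. This is a hypothetical sketch of the idea, not rvat's actual query; the variant table and variable names are made up for illustration.

```r
# Hypothetical variant table (not part of rvatData)
vars <- data.frame(
  VAR_id = 1:4,
  POS = c(990, 995, 1500, 2100),
  REF = c("A", "ACGTACGTACGT", "C", "G"),
  stringsAsFactors = FALSE
)

range_start <- 1000
range_end <- 2000
padding <- 250

# Step 1: coarse filter on POS using the padded window
cand <- vars[vars$POS >= range_start - padding & vars$POS <= range_end + padding, ]

# Step 2: keep candidates whose REF span [POS, POS + nchar(REF) - 1]
# overlaps the unpadded range [range_start, range_end]
ref_end <- cand$POS + nchar(cand$REF) - 1
hits <- cand[ref_end >= range_start & cand$POS <= range_end, ]
hits$VAR_id  # 2 3
```

A POS-only filter on [1000, 2000] would return only variant 3 and miss variant 2, whose 12 bp REF starts at 995 but extends into the range. With this approach, a REF that starts more than padding base pairs upstream of the range would still be missed.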
verbose: Should the method be verbose? Defaults to TRUE.
strict: Should strict checks be performed? Defaults to TRUE. Strict checks currently include checking whether the supplied varSetFile/varSetList/varSet was generated from the same gdb as specified in object.
Returns a genoMatrix object.
library(rvat)
library(rvatData)
gdb <- gdb(rvat_example("rvatData.gdb"))
# retrieve genotypes of a set of variants based on their VAR_ids
varinfo <- getAnno(gdb, table = "varinfo", where = "gene_name = 'SOD1' and ModerateImpact = 1")
GT <- getGT(
gdb,
VAR_id = varinfo$VAR_id,
cohort = "pheno")
#> Retrieved genotypes for 38 variants
# retrieve genotypes of a set of variants in a varSet
varsetfile <- varSetFile(rvat_example("rvatData_varsetfile.txt.gz"))
varset <- getVarSet(varsetfile, unit = "NEK1", varSetName = "High")
GT <- getGT(
gdb,
varSet = varset,
cohort = "pheno")
#> Retrieved genotypes for 33 variants
# see ?varSetFile and ?getVarSet for more details
# retrieve genotypes for a genomic interval
GT <- getGT(
gdb,
ranges = data.frame(CHROM = "chr21", start = 31659666, end = 31668931),
cohort = "pheno")
#> Retrieved genotypes for 49 variants
# the `anno` parameter can be specified to include variant annotations in the rowData of the genoMatrix
GT <- getGT(
gdb,
ranges = data.frame(CHROM = "chr21", start = 31659666, end = 31668931),
cohort = "pheno",
anno = "varInfo",
annoFields = c("VAR_id", "CHROM", "POS", "REF", "ALT", "HighImpact", "ModerateImpact", "Synonymous")
)
#> Retrieved genotypes for 49 variants
head(rowData(GT))
#> DataFrame with 6 rows and 9 columns
#>           ploidy         w       CHROM         POS         REF         ALT
#>      <character> <numeric> <character> <character> <character> <character>
#> 1268     diploid         1       chr21    31659783           C           T
#> 1269     diploid         1       chr21    31659796           G           A
#> 1270     diploid         1       chr21    31659799           G           C
#> 1271     diploid         1       chr21    31659819        GCAT           G
#> 1272     diploid         1       chr21    31659820           C           A
#> 1273     diploid         1       chr21    31659828           A           G
#>       HighImpact ModerateImpact  Synonymous
#>      <character>    <character> <character>
#> 1268           0              1           0
#> 1269           0              0           1
#> 1270           0              1           0
#> 1271           0              1           0
#> 1272           0              0           1
#> 1273           0              1           0
# the includeVarInfo parameter is a shorthand for including the "var" table
GT <- getGT(
gdb,
ranges = data.frame(CHROM = "chr21", start = 31659666, end = 31668931),
cohort = "pheno",
includeVarInfo = TRUE
)
#> Retrieved genotypes for 49 variants
head(rowData(GT))
#> DataFrame with 6 rows and 11 columns
#>           ploidy         w       CHROM       POS          ID         REF
#>      <character> <numeric> <character> <integer> <character> <character>
#> 1268     diploid         1       chr21  31659783 rs121912442           C
#> 1269     diploid         1       chr21  31659796 rs772764888           G
#> 1270     diploid         1       chr21  31659799           .           G
#> 1271     diploid         1       chr21  31659819           .        GCAT
#> 1272     diploid         1       chr21  31659820 rs200454724           C
#> 1273     diploid         1       chr21  31659828 rs768029813           A
#>              ALT        QUAL      FILTER                   INFO      FORMAT
#>      <character> <character> <character>            <character> <character>
#> 1268           T     15317.2           . AC=6;AN=47612;AF=0.0..          GT
#> 1269           A     14053.5           . AC=1;AN=45320;AF=2.2..          GT
#> 1270           C      732.18           . AC=1;AN=49466;AF=2.0..          GT
#> 1271           G      241.15           . AC=1;AN=44404;AF=2.2..          GT
#> 1272           A     1218.18           . AC=1;AN=48150;AF=2.0..          GT
#> 1273           G     15989.6           . AC=5;AN=49638;AF=0.0..          GT
# The `checkPloidy` parameter can be set to the version of the human genome used
# to assign variant ploidy (diploid, XnonPAR, YnonPAR). Accepted inputs are GRCh37, hg19, GRCh38, hg38.
# We recommend, however, setting the genome build when building the gdb: the genome build will then
# be included in the gdb metadata and used automatically. See ?buildGdb for details.
varinfo <- getAnno(gdb, table = "varinfo", where = "gene_name = 'UBQLN2' and ModerateImpact = 1")
GT <- getGT(
gdb,
VAR_id = varinfo$VAR_id,
cohort = "pheno",
includeVarInfo = TRUE,
checkPloidy = "GRCh38"
)
#> Ploidy of non-pseudoautosomal regions of the sex chromosomes are being set based on build GRCh38
#> Retrieved genotypes for 24 variants
# see ?genoMatrix for more details on the genoMatrix class.