genoMatrix.Rd
The genoMatrix class is specifically designed to represent genotype data and associated sample and variant info.
It extends the Bioconductor SummarizedExperiment
class to accommodate genotype data.
The accessors are fully described in SummarizedExperiment::SummarizedExperiment
, and
include assays(x)
, rowData(x)
, colData(x)
, metadata(x)
, dim(x)
, nrow(x)
, ncol(x)
, colnames(x)
and rownames(x)
.
In the following code snippets, x is a genoMatrix object
getAF(x)
: Returns alternate allele frequencies. These will be equal to the minor allele frequences after applying flipToMinor()
.
getMAF(x)
: Returns variant minor allele frequencies.
getAC(x)
: Returns alternate allele counts. These will be equal to the minor allele counts after applying flipToMinor()
.
getMAC(x)
: Returns minor allele counts.
getCR(x, var=TRUE)
: Returns call-rates for variants or samples.
var=TRUE
by default, set to FALSE
to return sample call-rates.
getNCarriers(x)
: Returns number of carriers of the alternate allele for each variant.
Note that when geneticModel = 'recessive' it will return the number of homozygous carriers.
summariseGeno(x)
: Returns a per variant summary of genotype counts and hwe testing.
getCarriers(x, VAR_id=NULL,colDataFields=NULL,rowDataFields=NULL,groupBy=NULL)
:
Return sample IDs for carriers of each of the variants in the genoMatrix.
VAR_id
can be specified to return for the specified subset of variants.
colDataFields
and rowDataFields
can be specified include additional fields from colData(x)
or rowData(x)
in the output.
The groupBy
parameter can be set to calculate carrier frequency among groups (such as cohort or phenotype), multiple groups can be set.
The aggregate
parameter can be set to TRUE to return mean burden scores among the groupings.
note: Variant ploidy (diploid, XnonPAR, YnonPAR) are handled according to sample sex. Samples for which sex are not provided are excluded during AF calculation at non-diploid variants.
Subsetting a genoMatrix object is equivalent to subsetting a SummarizedExperiment object,
and is fully described in fully described in SummarizedExperiment::SummarizedExperiment
.
Please see the example section for examples.
In the following code snippets, x is a genoMatrix object
flipToMinor(x)
: Function to flip genotype dosages such that all GT values represent minor allele counts.
recode(x, geneticModel,imputeMethod,weights,MAFweights)
: Returns a recoded genoMatrix object, genetic model, imputation, and weights can be recoded. See recode()
for details.
updateGT(x, SM = NULL, anno = NULL)
: Safe replacement of the colData or rowData table within a genoMatrix with a new table.
In the following code snippets, x is a genoMatrix object
aggregate()
: Returns per sample aggregate dosage for a genoMatrix object.
The genoMatrix shouldn't contain missing values, use recode()
to impute missing values.
Aggregation is dependent on geneticModel
, MAFweights
and weights
, all can be set using the recode()
method.
By default, an updated genoMatrix is returned with an aggregate
field in colData
. Set returnGT
to FALSE
to return a vector of aggregates.
assocTest()
: Perform aggregate (burden) and single variant assocation test. See assocTest()
for details.
library(rvatData)
data(GT)
# Basic operations -------------------------------------------------------------
# retrieve rowData (i.e. variant info)
rowData(GT)
#> DataFrame with 42 rows and 4 columns
#> ploidy w REF ALT
#> <character> <numeric> <character> <character>
#> 1268 diploid 1 C T
#> 1270 diploid 1 G C
#> 1271 diploid 1 GCAT G
#> 1273 diploid 1 A G
#> 1275 diploid 1 A T
#> ... ... ... ... ...
#> 1312 diploid 1 G C
#> 1313 diploid 1 TG T
#> 1314 diploid 1 T G
#> 1315 diploid 1 A G
#> 1316 diploid 1 T C
# retrieve colData (i.e. sample info)
colData(GT)
#> DataFrame with 25000 rows and 10 columns
#> IID sex pheno pop superPop PC1
#> <character> <numeric> <integer> <character> <character> <numeric>
#> ALS1 ALS1 2 1 PJL SAS 22.5875
#> ALS2 ALS2 2 1 BEB SAS 26.3737
#> ALS3 ALS3 1 1 TSI EUR 30.7550
#> ALS4 ALS4 2 1 GWD AFR -87.4804
#> ALS5 ALS5 1 1 MSL AFR -87.7931
#> ... ... ... ... ... ... ...
#> CTRL19996 CTRL19996 1 0 ACB AFR -82.9449
#> CTRL19997 CTRL19997 1 0 FIN EUR 28.4759
#> CTRL19998 CTRL19998 1 0 ACB AFR -75.3378
#> CTRL19999 CTRL19999 2 0 PEL AMR 38.2611
#> CTRL20000 CTRL20000 1 0 STU SAS 25.1909
#> PC2 PC3 PC4 age
#> <numeric> <numeric> <numeric> <numeric>
#> ALS1 4.73803 38.915756 -10.166734 66.0174
#> ALS2 -1.44590 31.971299 -6.022256 66.8431
#> ALS3 48.86684 -5.409240 14.947440 54.3575
#> ALS4 -6.19363 -0.950413 -1.202993 65.0978
#> ALS5 -5.32393 -1.035330 -0.740323 58.9466
#> ... ... ... ... ...
#> CTRL19996 -6.99645 -0.145528 0.646073 52.7380
#> CTRL19997 42.83731 -13.434231 8.139021 52.3501
#> CTRL19998 -2.48278 -2.023737 0.389070 66.6893
#> CTRL19999 -16.33764 -41.853517 -76.090642 60.5823
#> CTRL20000 7.86327 38.511045 -13.762434 62.7233
# retrieve sample IDs
samples <- colnames(GT)
head(samples)
#> [1] "ALS1" "ALS2" "ALS3" "ALS4" "ALS5" "ALS6"
# retrieve VAR ids
vars <- rownames(GT)
head(vars)
#> [1] "1268" "1270" "1271" "1273" "1275" "1276"
# check dimensions
nrow(GT)
#> [1] 42
ncol(GT)
#> [1] 25000
dim(GT)
#> [1] 42 25000
# A genoMatrix object can be subsetted similarly as a data.frame:
GT[1:5, 1:5]
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 5 5
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(5): 1268 1270 1271 1273 1275
#> rowData names(4): ploidy w REF ALT
#> colnames(5): ALS1 ALS2 ALS3 ALS4 ALS5
#> colData names(10): IID sex ... PC4 age
# Subset samples based on sample info
GT[ ,GT$pheno == 1]
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 42 5000
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(4): ploidy w REF ALT
#> colnames(5000): ALS1 ALS2 ... ALS4999 ALS5000
#> colData names(10): IID sex ... PC4 age
# Subset first two variants
GT[1:2,]
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 2 25000
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(2): 1268 1270
#> rowData names(4): ploidy w REF ALT
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
# Extract variant/sample summaries --------------------------------------------------
# calculate allele frequencies, allele counts etc.
af <- getAF(GT)
maf <- getMAF(GT)
ac <- getAC(GT)
mac <- getMAC(GT)
carriers <- getNCarriers(GT)
# generate call-rates
var_cr <- getCR(GT) # variant call-rates
sample_cr <- getCR(GT, var = FALSE) # sample call-rates
# generate variant summaries
varsummary <- summariseGeno(GT)
# Recode genotypes --------------------------------------------------
# flip variants with AF > 0.5 to the minor allele
GT <- flipToMinor(GT)
# recode genotypes to domiant/recessive models
recode(GT, geneticModel = "dominant")
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 42 25000
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
recode(GT, geneticModel = "recessive")
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 42 25000
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
# see ?recode for details
# recode missing genotypes
recode(GT, imputeMethod = "meanImpute")
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 42 25000
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
recode(GT, imputeMethod = "missingToRef")
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 42 25000
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
# see ?recode for details
# generate aggregate (burden) scores
# by default, scores will be added to the colData
aggregate(recode(GT, imputeMethod = "meanImpute"))
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 42 25000
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(11): IID sex ... age aggregate
# set `returnGT = FALSE` to return aggregates as a vector instead
aggregate <- aggregate(recode(GT, imputeMethod = "meanImpute"), returnGT = FALSE)
head(aggregate)
#> ALS1 ALS2 ALS3 ALS4 ALS5 ALS6
#> 4.267668e-05 0.000000e+00 1.248082e-04 0.000000e+00 8.505593e-05 9.297136e-05
# Update cohort and variant info in genoMatrix --------------------------
gdb <- gdb(rvat_example("rvatData.gdb"))
anno <- getAnno(gdb, "varInfo", fields = c("VAR_id", "CADDphred", "PolyPhen"))
updateGT(GT, anno = anno )
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 42 25000
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(4): ploidy w CADDphred PolyPhen
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
pheno <- colData(GT)
updateGT(GT, SM = colData(GT)[,1:4])
#> rvat genoMatrix Object
#> class: genoMatrix
#> dim: 42 25000
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(4): IID sex pheno pop
# Get variant carriers
# retrieve a data.frame that lists the samples carrying each respective
# variant in the genoMatrix. Additional variant and sample info can be included
# using the `rowDataFields` and `colDataFields` respectively.
carriers <- getCarriers(
GT,
rowDataFields = c("REF", "ALT"),
colDataFields = c("superPop")
)
head(carriers)
#> VAR_id IID genotype superPop REF ALT
#> 1 1268 ALS1811 1 AFR C T
#> 2 1268 ALS2109 1 AFR C T
#> 3 1268 ALS2155 1 SAS C T
#> 4 1268 ALS3076 1 AFR C T
#> 5 1268 ALS4396 1 AFR C T
#> 6 1268 ALS4476 1 EAS C T
# Perform rare variant tests:
# burden test (firth)
rvb <- assocTest(
GT,
pheno = "pheno",
covar = c("sex", "PC1", "PC2", "PC3", "PC4"),
test = "firth"
)
#> Keeping 25000/25000 available samples for analysis.
#> 42/42 variants are retained for analysis
# single variant tests
sv <- assocTest(
GT,
pheno = "pheno",
covar = c("sex", "PC1", "PC2", "PC3", "PC4"),
test = "scoreSPA",
singlevar = TRUE
)
#> Keeping 25000/25000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> Warning: 33/42 variants have less than 2 carriers, tests will be skipped for these variants.
# see ?assocTest for details