The genoMatrix class is specifically designed to represent genotype data and associated sample and variant info. It extends the Bioconductor SummarizedExperiment class to accommodate genotype data.

Construct a genoMatrix

getGT(): Construct a genoMatrix, see getGT() for details.

Accessors

The accessors are fully described in SummarizedExperiment::SummarizedExperiment, and include assays(x), rowData(x), colData(x), metadata(x), dim(x), nrow(x), ncol(x), colnames(x) and rownames(x).

Getters

In the following code snippets, x is a genoMatrix object

  • getAF(x): Returns alternate allele frequencies. These will be equal to the minor allele frequences after applying flipToMinor().

  • getMAF(x): Returns variant minor allele frequencies.

  • getAC(x): Returns alternate allele counts. These will be equal to the minor allele counts after applying flipToMinor().

  • getMAC(x): Returns minor allele counts.

  • getCR(x, var=TRUE): Returns call-rates for variants or samples. var=TRUE by default, set to FALSE to return sample call-rates.

  • getNCarriers(x): Returns number of carriers of the alternate allele for each variant. Note that when geneticModel = 'recessive' it will return the number of homozygous carriers.

  • summariseGeno(x): Returns a per variant summary of genotype counts and hwe testing.

  • getCarriers(x, VAR_id=NULL,colDataFields=NULL,rowDataFields=NULL,groupBy=NULL): Return sample IDs for carriers of each of the variants in the genoMatrix. VAR_id can be specified to return for the specified subset of variants. colDataFields and rowDataFields can be specified include additional fields from colData(x) or rowData(x) in the output. The groupBy parameter can be set to calculate carrier frequency among groups (such as cohort or phenotype), multiple groups can be set. The aggregate parameter can be set to TRUE to return mean burden scores among the groupings.

note: Variant ploidy (diploid, XnonPAR, YnonPAR) are handled according to sample sex. Samples for which sex are not provided are excluded during AF calculation at non-diploid variants.

Subsetting

Subsetting a genoMatrix object is equivalent to subsetting a SummarizedExperiment object, and is fully described in fully described in SummarizedExperiment::SummarizedExperiment. Please see the example section for examples.

Recode

In the following code snippets, x is a genoMatrix object

  • flipToMinor(x): Function to flip genotype dosages such that all GT values represent minor allele counts.

  • recode(x, geneticModel,imputeMethod,weights,MAFweights): Returns a recoded genoMatrix object, genetic model, imputation, and weights can be recoded. See recode() for details.

  • updateGT(x, SM = NULL, anno = NULL): Safe replacement of the colData or rowData table within a genoMatrix with a new table.

Rare variant testing

In the following code snippets, x is a genoMatrix object

  • aggregate(): Returns per sample aggregate dosage for a genoMatrix object. The genoMatrix shouldn't contain missing values, use recode() to impute missing values. Aggregation is dependent on geneticModel, MAFweights and weights, all can be set using the recode() method. By default, an updated genoMatrix is returned with an aggregate field in colData. Set returnGT to FALSE to return a vector of aggregates.

  • assocTest(): Perform aggregate (burden) and single variant assocation test. See assocTest() for details.

See also

Examples



library(rvatData)
data(GT)

# Basic operations -------------------------------------------------------------

# retrieve rowData (i.e. variant info)
rowData(GT)
#> DataFrame with 42 rows and 4 columns
#>           ploidy         w         REF         ALT
#>      <character> <numeric> <character> <character>
#> 1268     diploid         1           C           T
#> 1270     diploid         1           G           C
#> 1271     diploid         1        GCAT           G
#> 1273     diploid         1           A           G
#> 1275     diploid         1           A           T
#> ...          ...       ...         ...         ...
#> 1312     diploid         1           G           C
#> 1313     diploid         1          TG           T
#> 1314     diploid         1           T           G
#> 1315     diploid         1           A           G
#> 1316     diploid         1           T           C

# retrieve colData (i.e. sample info)
colData(GT)
#> DataFrame with 25000 rows and 10 columns
#>                   IID       sex     pheno         pop    superPop       PC1
#>           <character> <numeric> <integer> <character> <character> <numeric>
#> ALS1             ALS1         2         1         PJL         SAS   22.5875
#> ALS2             ALS2         2         1         BEB         SAS   26.3737
#> ALS3             ALS3         1         1         TSI         EUR   30.7550
#> ALS4             ALS4         2         1         GWD         AFR  -87.4804
#> ALS5             ALS5         1         1         MSL         AFR  -87.7931
#> ...               ...       ...       ...         ...         ...       ...
#> CTRL19996   CTRL19996         1         0         ACB         AFR  -82.9449
#> CTRL19997   CTRL19997         1         0         FIN         EUR   28.4759
#> CTRL19998   CTRL19998         1         0         ACB         AFR  -75.3378
#> CTRL19999   CTRL19999         2         0         PEL         AMR   38.2611
#> CTRL20000   CTRL20000         1         0         STU         SAS   25.1909
#>                 PC2        PC3        PC4       age
#>           <numeric>  <numeric>  <numeric> <numeric>
#> ALS1        4.73803  38.915756 -10.166734   66.0174
#> ALS2       -1.44590  31.971299  -6.022256   66.8431
#> ALS3       48.86684  -5.409240  14.947440   54.3575
#> ALS4       -6.19363  -0.950413  -1.202993   65.0978
#> ALS5       -5.32393  -1.035330  -0.740323   58.9466
#> ...             ...        ...        ...       ...
#> CTRL19996  -6.99645  -0.145528   0.646073   52.7380
#> CTRL19997  42.83731 -13.434231   8.139021   52.3501
#> CTRL19998  -2.48278  -2.023737   0.389070   66.6893
#> CTRL19999 -16.33764 -41.853517 -76.090642   60.5823
#> CTRL20000   7.86327  38.511045 -13.762434   62.7233

# retrieve sample IDs
samples <- colnames(GT)
head(samples)
#> [1] "ALS1" "ALS2" "ALS3" "ALS4" "ALS5" "ALS6"

# retrieve VAR ids
vars <- rownames(GT)
head(vars)
#> [1] "1268" "1270" "1271" "1273" "1275" "1276"

# check dimensions
nrow(GT)
#> [1] 42
ncol(GT)
#> [1] 25000
dim(GT)
#> [1]    42 25000

# A genoMatrix object can be subsetted similarly as a data.frame:
GT[1:5, 1:5]
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 5 5 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(5): 1268 1270 1271 1273 1275
#> rowData names(4): ploidy w REF ALT
#> colnames(5): ALS1 ALS2 ALS3 ALS4 ALS5
#> colData names(10): IID sex ... PC4 age

# Subset samples based on sample info
GT[ ,GT$pheno == 1]
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 42 5000 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(4): ploidy w REF ALT
#> colnames(5000): ALS1 ALS2 ... ALS4999 ALS5000
#> colData names(10): IID sex ... PC4 age

# Subset first two variants
GT[1:2,]
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 2 25000 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(2): 1268 1270
#> rowData names(4): ploidy w REF ALT
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age

# Extract variant/sample summaries --------------------------------------------------

# calculate allele frequencies, allele counts etc.
af <- getAF(GT)
maf <- getMAF(GT)
ac <- getAC(GT)
mac <- getMAC(GT)
carriers <- getNCarriers(GT)

# generate call-rates
var_cr <- getCR(GT) # variant call-rates
sample_cr <- getCR(GT, var = FALSE) # sample call-rates

# generate variant summaries
varsummary <- summariseGeno(GT)

# Recode genotypes  --------------------------------------------------

# flip variants with AF > 0.5 to the minor allele
GT <- flipToMinor(GT)

# recode genotypes to domiant/recessive models
recode(GT, geneticModel = "dominant")
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 42 25000 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
recode(GT, geneticModel = "recessive")
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 42 25000 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
# see ?recode for details

# recode missing genotypes 
recode(GT, imputeMethod = "meanImpute")
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 42 25000 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
recode(GT, imputeMethod = "missingToRef")
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 42 25000 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
# see ?recode for details

# generate aggregate (burden) scores

# by default, scores will be added to the colData
aggregate(recode(GT, imputeMethod = "meanImpute"))
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 42 25000 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(11): IID sex ... age aggregate

# set `returnGT = FALSE` to return aggregates as a vector instead
aggregate <- aggregate(recode(GT, imputeMethod = "meanImpute"), returnGT = FALSE)
head(aggregate)
#>         ALS1         ALS2         ALS3         ALS4         ALS5         ALS6 
#> 4.267668e-05 0.000000e+00 1.248082e-04 0.000000e+00 8.505593e-05 9.297136e-05 

# Update cohort and variant info in genoMatrix --------------------------
gdb <- gdb(rvat_example("rvatData.gdb"))
anno <- getAnno(gdb, "varInfo", fields = c("VAR_id", "CADDphred", "PolyPhen"))
updateGT(GT, anno = anno )
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 42 25000 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(4): ploidy w CADDphred PolyPhen
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(10): IID sex ... PC4 age
pheno <- colData(GT)
updateGT(GT, SM = colData(GT)[,1:4])
#> rvat genoMatrix Object
#> class: genoMatrix 
#> dim: 42 25000 
#> metadata(11): gdb gdbId ... geneticModel imputeMethod
#> assays(1): GT
#> rownames(42): 1268 1270 ... 1315 1316
#> rowData names(6): ploidy w ... effectAllele otherAllele
#> colnames(25000): ALS1 ALS2 ... CTRL19999 CTRL20000
#> colData names(4): IID sex pheno pop

# Get variant carriers

# retrieve a data.frame that lists the samples carrying each respective 
# variant in the genoMatrix. Additional variant and sample info can be included
# using the `rowDataFields` and `colDataFields` respectively.
carriers <- getCarriers(
  GT,
  rowDataFields = c("REF", "ALT"),
  colDataFields = c("superPop")
)
head(carriers)
#>   VAR_id     IID genotype superPop REF ALT
#> 1   1268 ALS1811        1      AFR   C   T
#> 2   1268 ALS2109        1      AFR   C   T
#> 3   1268 ALS2155        1      SAS   C   T
#> 4   1268 ALS3076        1      AFR   C   T
#> 5   1268 ALS4396        1      AFR   C   T
#> 6   1268 ALS4476        1      EAS   C   T

# Perform rare variant tests:

# burden test (firth)
rvb <- assocTest(
  GT,
  pheno = "pheno",
  covar = c("sex", "PC1", "PC2", "PC3", "PC4"),
  test = "firth"
)
#> Keeping 25000/25000 available samples for analysis.
#> 42/42 variants are retained for analysis

# single variant tests
sv <- assocTest(
  GT,
  pheno = "pheno",
  covar = c("sex", "PC1", "PC2", "PC3", "PC4"),
  test = "scoreSPA",
  singlevar = TRUE
)
#> Keeping 25000/25000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> Warning: 33/42 variants have less than 2 carriers, tests will be skipped for these variants.
# see ?assocTest for details