Run assocTest on a genoMatrix object. See the main assocTest page for details.

# S4 method for genoMatrix
assocTest(
  object,
  pheno,
  test,
  name = "none",
  continuous = FALSE,
  singlevar = FALSE,
  covar = NULL,
  offset = NULL,
  overwriteAggregate = TRUE,
  geneticModel = c("allelic", "recessive", "dominant"),
  imputeMethod = NULL,
  MAFweights = "none",
  maxitFirth = 1000,
  keep = NULL,
  output = NULL,
  append = FALSE,
  returnDF = FALSE,
  methodResampling = NULL,
  resamplingMatrix = NULL,
  nResampling = 1000,
  outputResampling = FALSE,
  memlimitResampling = NULL,
  minCallrateVar = 0,
  maxCallrateVar = Inf,
  minCallrateSM = 0,
  maxCallrateSM = Inf,
  minMAF = 0,
  maxMAF = 1,
  minMAC = 0,
  maxMAC = Inf,
  minCarriers = 0,
  maxCarriers = Inf,
  minCarrierFreq = 0,
  maxCarrierFreq = Inf,
  verbose = TRUE
)

Arguments

object

a genoMatrix object

pheno

colData field to test as response variable, the response variable can either be binary (0/1) or continuous. If the response variable is continuous set continuous to TRUE.

test

Vector of statistical tests to run, options include firth,glm,lm,scoreSPA,skat,skat_burden,skato,skat_fwe,skat_burden_fwe, skato_fwe,skat_robust,skato_robust,skat_burden_robust, acatv, acatvSPA. See assocTest for details.

name

Optional name for the analysis, defaults to "none".

continuous

Is the response variable continuous? (TRUE/FALSE). Defaults to FALSE.

singlevar

Run single variant tests? (TRUE/FALSE). Defaults to FALSE, in which case collapsing tests are ran.

covar

Character vector of covariates. These should be present in the colData slot of the genoMatrix.

offset

Optional model offset, can be used to account for regenie LOCO predictions.

overwriteAggregate

In case there is already an aggregate column in the colData of the genoMatrix (i.e. aggregate has been run on the genoMatrix), should it be overwitten? Defaults to TRUE.

geneticModel

Which genetic model to apply? ('allelic', 'recessive' or 'dominant'). Defaults to allelic.

imputeMethod

Which imputation method to apply? ('meanImpute' or 'missingToRef'). Defaults to meanImpute

MAFweights

Apply MAF weighting? Currently Madsen-Browning ('mb') is implemented. Defaults to 'none'.

maxitFirth

Maximum number of iterations to use for estimating firth confidence intervals.

keep

Vector of sample IDs to keep, defaults to NULL, in which case all samples are kept.

output

Output file path for results. Defaults to NULL, in which case results are not written to disk, but returned as an rvatResult object.

append

Relevant if the output parameter is not NULL. Should results be appended to output?

returnDF

Return a data.frame rather than a rvatResult. Defaults to FALSE.

methodResampling

Which method to use for resampling? ('permutation' currently implemented) Defaults to NULL, in which case no resampling is performed.

resamplingMatrix

Pre-calculated resampling matrix (n x p), where n = number of samples, and p number of resamplings. Can be generated using buildResamplingFile.

nResampling

Number of resamplings to perform if methodResampling is specified.

outputResampling

If TRUE or a filepath, results for each resampling are returned (or saved to the filepath). This can be useful if permutations are used to calculated to estimate correlations among genes for example. Defaults to FALSE in which case resampling is used to calculate resampled P-values, results for individual resamplings are not returned.

memlimitResampling

Maximum number of resamplings to perform at a time. Resampling generates a matrix of n x p, where n is the number of samples and p the number of resamplings thus, for large number of resamplings it can be more efficient to split the permutations in chunks of size memlimitResampling. Defaults to NULL in which case all permutations are performed.

minCallrateVar

Minimum genotype rate for variant retention.

maxCallrateVar

Maximum genotype rate for variant retention.

minCallrateSM

Minimum genotype rate for sample retention.

maxCallrateSM

Maximum genotype rate for sample retention.

minMAF

Minimum minor allele frequency for variant retention.

maxMAF

Maximum minor allele frequency for variant retention.

minMAC

Minimum minor allele count for variant retention.

maxMAC

Maximum minor allele count for variant retention.

minCarriers

Minimum carrier count for variant retention.

maxCarriers

Maximum carrier count for variant retention.

minCarrierFreq

Minimum carrier frequency for variant retention.

maxCarrierFreq

Maximum carrier frequency for variant retention.

verbose

Should the function be verbose? (TRUE/FALSE), defaults to TRUE.

Examples


library(rvatData)
data(GTsmall)

# run a firth burden test on a binary phenotype
rvb <- assocTest(GTsmall,
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = "firth",
                 name = "example")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.

# run ACAT-v and SKAT tests
rvb <- assocTest(GTsmall,
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("acatvfirth", "skat_burden_robust", "skato_robust"),
                 name = "example")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.

# run a burden test on a continuous phenotype
rvb <- assocTest(GTsmall,
                 pheno = "age",
                 continuous = TRUE,
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("lm", "skat", "acatv"),
                 name = "example")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.

# run single variant tests on a binary phenotype 
sv <- assocTest(GTsmall,
                pheno = "pheno",
                singlevar = TRUE,
                covar = c("PC1", "PC2", "PC3", "PC4"),
                test = c("firth", "glm", "scoreSPA"),
                name = "example",
                minCarriers = 1
                )
#> Keeping 5000/5000 available samples for analysis.
#> 12/42 variants are retained for analysis
#> Warning: 10/12 variants have less than 2 carriers, tests will be skipped for these variants.

# apply variant filters
rvb <- assocTest(GTsmall,
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("firth", "skat_robust", "acatv"),
                 name = "example", 
                 maxMAF = 0.001,
                 minCarriers = 2,
                 minCallrateVar = 0.9,
                 minCallrateSM = 0.95)
#> 854 samples don't pass callRate filters.
#> Keeping 4146/5000 available samples for analysis.
#> 1/42 variants are retained for analysis

# Perform MAF-weighted burden tests (madsen-browning)
rvb <- assocTest(GTsmall,
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("firth", "skat_robust", "acatv"),
                 MAFweights = "mb")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.

# Perform weighted burden tests with custom weights
gdb <- gdb(rvat_example("rvatData.gdb"))
# add cadd scores
caddscores <- getAnno(gdb, "varInfo", VAR_id = rownames(GTsmall))
caddscores <- caddscores[match(rownames(GTsmall), caddscores$VAR_id),]
# CADD-weighted burden test
rvb <- assocTest(recode(GTsmall, weights = caddscores$CADDphred),
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("firth", "skat_robust", "acatv"))
#> Warning: NAs introduced by coercion
#> Keeping 5000/5000 available samples for analysis.
#> Warning: For 4 variants weights are missing, these variants are excluded.
#> 38/42 variants are retained for analysis
#> 26/38 variants have zero carriers, these are dropped.

# Perform recessive burden test
rvb <- assocTest(GTsmall,
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("firth", "skat_robust", "acatv"),
                 geneticModel = "recessive")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 41/42 variants have zero carriers, these are dropped.

# Resampled burden test 
rvb <- assocTest(GTsmall,
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("skat", "skat_burden", "acatv"),
                 name = "example",
                 methodResampling = "permutation",
                 nResampling = 100)
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.