assocTest-genoMatrix — assocTest-genoMatrix • rvat

Run assocTest on a genoMatrix object. See the main assocTest page for details.

# S4 method for class 'genoMatrix'
assocTest(
  object,
  pheno,
  test,
  name = "none",
  continuous = FALSE,
  singlevar = FALSE,
  covar = NULL,
  offset = NULL,
  geneticModel = "allelic",
  imputeMethod = NULL,
  MAFweights = "none",
  maxitFirth = 1000L,
  keep = NULL,
  output = NULL,
  append = FALSE,
  returnDF = FALSE,
  methodResampling = NULL,
  resamplingMatrix = NULL,
  resamplingFile = NULL,
  nResampling = 1000L,
  outputResampling = FALSE,
  memlimitResampling = NULL,
  minCallrateVar = 0,
  maxCallrateVar = 1,
  minCallrateSM = 0,
  maxCallrateSM = 1,
  minMAF = 0,
  maxMAF = 1,
  minMAC = 0,
  maxMAC = Inf,
  minCarriers = 0,
  maxCarriers = Inf,
  minCarrierFreq = 0,
  maxCarrierFreq = 1,
  verbose = TRUE
)

Arguments

object: a genoMatrix object
pheno: colData field to test as response variable, the response variable can either be binary (0/1) or continuous. If the response variable is continuous set continuous to TRUE.
test: Vector of statistical tests to run, options include firth,glm,lm,scoreSPA,skat,skat_burden,skato,skat_robust,skato_robust,skat_burden_robust, acatv, acatvSPA, acatvfirth. See assocTest for details.
name: Optional name for the analysis, defaults to "none".
continuous: Is the response variable continuous? (TRUE/FALSE). Defaults to FALSE.
singlevar: Run single variant tests? (TRUE/FALSE). Defaults to FALSE, in which case collapsing tests are run
covar: Character vector of covariates. These should be present in the colData slot of the genoMatrix.
offset: Optional model offset, can be used to account for regenie LOCO predictions.
geneticModel: Which genetic model to apply? ('allelic', 'recessive' or 'dominant'). Defaults to allelic.
imputeMethod: Which imputation method to apply? ('meanImpute' or 'missingToRef'). Defaults to meanImpute
MAFweights: Apply MAF weighting? Currently Madsen-Browning ('mb') is implemented. Defaults to 'none'.
maxitFirth: Maximum number of iterations to use for estimating firth confidence intervals.
keep: Vector of sample IDs to keep, defaults to NULL, in which case all samples are kept.
output: Output file path for results. Defaults to NULL, in which case results are not written to disk, but returned as an rvatResult object.
append: Relevant if the output parameter is not NULL. Should results be appended to output?
returnDF: Return a data.frame rather than a rvatResult. Defaults to FALSE.
methodResampling: Which method to use for resampling? ('permutation' currently implemented) Defaults to NULL, in which case no resampling is performed.
resamplingMatrix: Pre-calculated resampling matrix (n x p), where n = number of samples, and p number of resamplings. Can be generated using buildResamplingFile.
resamplingFile: A resamplingFile object.
nResampling: Number of resamplings to perform if methodResampling is specified.
outputResampling: If TRUE or a filepath, results for each resampling are returned (or saved to the filepath). This can be useful if permutations are used to calculated to estimate correlations among genes for example. Defaults to FALSE in which case resampling is used to calculate resampled P-values, results for individual resamplings are not returned.
memlimitResampling: Maximum number of resamplings to perform at a time. Resampling generates a matrix of n x p, where n is the number of samples and p the number of resamplings thus, for large number of resamplings it can be more efficient to split the permutations in chunks of size memlimitResampling. Defaults to NULL in which case all permutations are performed.
minCallrateVar: Minimum genotype rate for variant retention.
maxCallrateVar: Maximum genotype rate for variant retention.
minCallrateSM: Minimum genotype rate for sample retention.
maxCallrateSM: Maximum genotype rate for sample retention.
minMAF: Minimum minor allele frequency for variant retention.
maxMAF: Maximum minor allele frequency for variant retention.
minMAC: Minimum minor allele count for variant retention.
maxMAC: Maximum minor allele count for variant retention.
minCarriers: Minimum carrier count for variant retention.
maxCarriers: Maximum carrier count for variant retention.
minCarrierFreq: Minimum carrier frequency for variant retention.
maxCarrierFreq: Maximum carrier frequency for variant retention.
verbose: Should the function be verbose? (TRUE/FALSE), defaults to TRUE.

Examples

library(rvatData)
data(GTsmall)

# run a firth burden test on a binary phenotype
rvb <- assocTest(
  GTsmall,
  pheno = "pheno",
  covar = c("PC1", "PC2", "PC3", "PC4"),
  test = "firth",
  name = "example"
)
#>   30/42 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 12 variants) pheno | PC1,PC2,PC3,PC4 | allelic

# run ACAT-v and SKAT tests
rvb <- assocTest(
  GTsmall,
  pheno = "pheno",
  covar = c("PC1", "PC2", "PC3", "PC4"),
  test = c("acatvfirth", "skat_burden_robust", "skato_robust"),
  name = "example"
)
#>   30/42 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 12 variants) pheno | PC1,PC2,PC3,PC4 | allelic

# run a burden test on a continuous phenotype
rvb <- assocTest(
  GTsmall,
  pheno = "age",
  continuous = TRUE,
  covar = c("PC1", "PC2", "PC3", "PC4"),
  test = c("lm", "skat", "acatv"),
  name = "example"
)
#>   30/42 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 12 variants) age | PC1,PC2,PC3,PC4 | allelic

# run single variant tests on a binary phenotype
sv <- assocTest(
  GTsmall,
  pheno = "pheno",
  singlevar = TRUE,
  covar = c("PC1", "PC2", "PC3", "PC4"),
  test = c("firth", "glm", "scoreSPA"),
  name = "example",
  minCarriers = 1
)
#> [1/1] (5000 samples, 12 variants) pheno | PC1,PC2,PC3,PC4 | allelic
#> Warning: 10/12 variants have less than 2 carriers, tests will be skipped for these variants.

# apply variant filters
rvb <- assocTest(
  GTsmall,
  pheno = "pheno",
  covar = c("PC1", "PC2", "PC3", "PC4"),
  test = c("firth", "skat_robust", "acatv"),
  name = "example",
  maxMAF = 0.001,
  minCarriers = 2,
  minCallrateVar = 0.9,
  minCallrateSM = 0.95
)
#> [1/1] (4146 samples, 1 variants) pheno | PC1,PC2,PC3,PC4 | allelic

# Perform MAF-weighted burden tests (madsen-browning)
rvb <- assocTest(
  GTsmall,
  pheno = "pheno",
  covar = c("PC1", "PC2", "PC3", "PC4"),
  test = c("firth", "skat_robust", "acatv"),
  MAFweights = "mb"
)
#>   30/42 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 12 variants) pheno | PC1,PC2,PC3,PC4 | allelic (mb-weighted)

# Perform weighted burden tests with custom weights
gdb <- create_example_gdb()
# add cadd scores
caddscores <- getAnno(gdb, "varInfo", VAR_id = rownames(GTsmall))
caddscores <- caddscores[match(rownames(GTsmall), caddscores$VAR_id), ]
# CADD-weighted burden test
rvb <- assocTest(
  recode(GTsmall, weights = caddscores$CADDphred),
  pheno = "pheno",
  covar = c("PC1", "PC2", "PC3", "PC4"),
  test = c("firth", "skat_robust", "acatv")
)
#> Warning: NAs introduced by coercion
#> Warning: For 4 variants weights are missing, these variants are excluded.
#>   26/38 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 12 variants) pheno | PC1,PC2,PC3,PC4 | allelic

# Perform recessive burden test
rvb <- assocTest(
  GTsmall,
  pheno = "pheno",
  covar = c("PC1", "PC2", "PC3", "PC4"),
  test = c("firth", "skat_robust", "acatv"),
  geneticModel = "recessive"
)
#>   41/42 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 1 variants) pheno | PC1,PC2,PC3,PC4 | recessive

# Resampled burden test
rvb <- assocTest(
  GTsmall,
  pheno = "pheno",
  covar = c("PC1", "PC2", "PC3", "PC4"),
  test = c("skat", "skat_burden", "acatv"),
  name = "example",
  methodResampling = "permutation",
  nResampling = 100
)
#>   30/42 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 12 variants) pheno | PC1,PC2,PC3,PC4 | allelic