assocTest-genoMatrix.Rd
Run assocTest
on a genoMatrix
object. See the main assocTest
page for details.
# S4 method for genoMatrix
assocTest(
object,
pheno,
test,
name = "none",
continuous = FALSE,
singlevar = FALSE,
covar = NULL,
offset = NULL,
overwriteAggregate = TRUE,
geneticModel = c("allelic", "recessive", "dominant"),
imputeMethod = NULL,
MAFweights = "none",
maxitFirth = 1000,
keep = NULL,
output = NULL,
append = FALSE,
returnDF = FALSE,
methodResampling = NULL,
resamplingMatrix = NULL,
nResampling = 1000,
outputResampling = FALSE,
memlimitResampling = NULL,
minCallrateVar = 0,
maxCallrateVar = Inf,
minCallrateSM = 0,
maxCallrateSM = Inf,
minMAF = 0,
maxMAF = 1,
minMAC = 0,
maxMAC = Inf,
minCarriers = 0,
maxCarriers = Inf,
minCarrierFreq = 0,
maxCarrierFreq = Inf,
verbose = TRUE
)
a genoMatrix
object
colData field to test as response variable, the response variable
can either be binary (0/1) or continuous. If the response variable is continuous set
continuous
to TRUE
.
Vector of statistical tests to run,
options include firth,glm,lm,scoreSPA,skat,skat_burden,skato,skat_fwe,skat_burden_fwe,
skato_fwe,skat_robust,skato_robust,skat_burden_robust, acatv, acatvSPA. See assocTest
for details.
Optional name for the analysis, defaults to "none".
Is the response variable continuous? (TRUE/FALSE). Defaults to FALSE
.
Run single variant tests? (TRUE/FALSE).
Defaults to FALSE
, in which case collapsing tests are ran.
Character vector of covariates. These should be present in the colData slot of the genoMatrix.
Optional model offset, can be used to account for regenie LOCO predictions.
In case there is already an aggregate
column in the colData
of the genoMatrix
(i.e. aggregate
has been run on the genoMatrix), should it be overwitten? Defaults to TRUE
.
Which genetic model to apply? ('allelic', 'recessive' or 'dominant').
Defaults to allelic
.
Which imputation method to apply? ('meanImpute' or 'missingToRef').
Defaults to meanImpute
Apply MAF weighting? Currently Madsen-Browning ('mb') is implemented. Defaults to 'none'.
Maximum number of iterations to use for estimating firth confidence intervals.
Vector of sample IDs to keep, defaults to NULL
, in which case all samples are kept.
Output file path for results.
Defaults to NULL
, in which case results are not written to disk, but returned as an rvatResult
object.
Relevant if the output
parameter is not NULL
. Should results be appended to output
?
Return a data.frame rather than a rvatResult. Defaults to FALSE
.
Which method to use for resampling? ('permutation' currently implemented)
Defaults to NULL
, in which case no resampling is performed.
Pre-calculated resampling matrix (n x p), where n = number of samples, and p number of resamplings.
Can be generated using buildResamplingFile
.
Number of resamplings to perform if methodResampling is specified.
If TRUE
or a filepath, results for each resampling are returned (or saved to the filepath).
This can be useful if permutations are used to calculated to estimate correlations among genes for example.
Defaults to FALSE
in which case resampling is used to calculate resampled P-values,
results for individual resamplings are not returned.
Maximum number of resamplings to perform at a time.
Resampling generates a matrix of n x p, where n is the number of samples and p the number of resamplings
thus, for large number of resamplings it can be more efficient to split the permutations in chunks of size memlimitResampling
.
Defaults to NULL
in which case all permutations are performed.
Minimum genotype rate for variant retention.
Maximum genotype rate for variant retention.
Minimum genotype rate for sample retention.
Maximum genotype rate for sample retention.
Minimum minor allele frequency for variant retention.
Maximum minor allele frequency for variant retention.
Minimum minor allele count for variant retention.
Maximum minor allele count for variant retention.
Minimum carrier count for variant retention.
Maximum carrier count for variant retention.
Minimum carrier frequency for variant retention.
Maximum carrier frequency for variant retention.
Should the function be verbose? (TRUE/FALSE), defaults to TRUE
.
library(rvatData)
data(GTsmall)
# run a firth burden test on a binary phenotype
rvb <- assocTest(GTsmall,
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = "firth",
name = "example")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.
# run ACAT-v and SKAT tests
rvb <- assocTest(GTsmall,
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("acatvfirth", "skat_burden_robust", "skato_robust"),
name = "example")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.
# run a burden test on a continuous phenotype
rvb <- assocTest(GTsmall,
pheno = "age",
continuous = TRUE,
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("lm", "skat", "acatv"),
name = "example")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.
# run single variant tests on a binary phenotype
sv <- assocTest(GTsmall,
pheno = "pheno",
singlevar = TRUE,
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "glm", "scoreSPA"),
name = "example",
minCarriers = 1
)
#> Keeping 5000/5000 available samples for analysis.
#> 12/42 variants are retained for analysis
#> Warning: 10/12 variants have less than 2 carriers, tests will be skipped for these variants.
# apply variant filters
rvb <- assocTest(GTsmall,
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "skat_robust", "acatv"),
name = "example",
maxMAF = 0.001,
minCarriers = 2,
minCallrateVar = 0.9,
minCallrateSM = 0.95)
#> 854 samples don't pass callRate filters.
#> Keeping 4146/5000 available samples for analysis.
#> 1/42 variants are retained for analysis
# Perform MAF-weighted burden tests (madsen-browning)
rvb <- assocTest(GTsmall,
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "skat_robust", "acatv"),
MAFweights = "mb")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.
# Perform weighted burden tests with custom weights
gdb <- gdb(rvat_example("rvatData.gdb"))
# add cadd scores
caddscores <- getAnno(gdb, "varInfo", VAR_id = rownames(GTsmall))
caddscores <- caddscores[match(rownames(GTsmall), caddscores$VAR_id),]
# CADD-weighted burden test
rvb <- assocTest(recode(GTsmall, weights = caddscores$CADDphred),
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "skat_robust", "acatv"))
#> Warning: NAs introduced by coercion
#> Keeping 5000/5000 available samples for analysis.
#> Warning: For 4 variants weights are missing, these variants are excluded.
#> 38/42 variants are retained for analysis
#> 26/38 variants have zero carriers, these are dropped.
# Perform recessive burden test
rvb <- assocTest(GTsmall,
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "skat_robust", "acatv"),
geneticModel = "recessive")
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 41/42 variants have zero carriers, these are dropped.
# Resampled burden test
rvb <- assocTest(GTsmall,
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("skat", "skat_burden", "acatv"),
name = "example",
methodResampling = "permutation",
nResampling = 100)
#> Keeping 5000/5000 available samples for analysis.
#> 42/42 variants are retained for analysis
#> 30/42 variants have zero carriers, these are dropped.