assocTest-gdb.Rd# S4 method for class 'gdb'
assocTest(
object,
pheno,
test,
cohort = "SM",
varSet = NULL,
VAR_id = NULL,
name = "none",
continuous = FALSE,
singlevar = FALSE,
covar = NULL,
offset = NULL,
geneticModel = "allelic",
imputeMethod = NULL,
MAFweights = "none",
maxitFirth = 1000L,
checkPloidy = NULL,
keep = NULL,
output = NULL,
append = FALSE,
returnDF = FALSE,
methodResampling = NULL,
resamplingMatrix = NULL,
resamplingFile = NULL,
nResampling = 1000L,
outputResampling = FALSE,
memlimitResampling = NULL,
minCallrateVar = 0,
maxCallrateVar = 1,
minCallrateSM = 0,
maxCallrateSM = 1,
minMAF = 0,
maxMAF = 1,
minMAC = 0,
maxMAC = Inf,
minCarriers = 0,
maxCarriers = Inf,
minCarrierFreq = 0,
maxCarrierFreq = 1,
memlimit = 1000L,
verbose = TRUE,
strict = TRUE
)A gdb object
colData field to test as response variable, the response variable
can either be binary (0/1) or continuous.
If the response variable is continuous set continuous to TRUE.
Multiple phenotypes can be specified, which will then be tested separately.
Vector of statistical tests to run,
options include firth,glm,lm,nbinom,skat,skat_burden,skato,skat_robust,skato_robust,skat_burden_robust, acatv, acatvSPA, acatvfirth.
See assocTest for details.
If a valid cohort name is provided, then the uploaded data for this cohort is used to filter and annotate the genoMatrix object. If not specified, all samples in the gdb will be loaded.
A varSetFile or varSetList object.
Alternatively a vector of VAR_ids can be specified using the VAR_id parameter.
A vector of VAR_ids, alternatively the varSet parameter can be specified.
If single variant tests are run, the memlimit argument controls how many variants to analyze at a time.
Optional name for the analysis, defaults to "none".
Is the response variable continuous? (TRUE/FALSE). Defaults to FALSE.
Run single variant tests? (TRUE/FALSE).
Defaults to FALSE, in which case aggregate tests are run.
Character vector of covariates, or a list of character vectors of covariates in which case each covariate set will be tested separately.
Optional model offset, can be used to account for regenie LOCO predictions.
Which genetic model to apply? ('allelic', 'recessive' or 'dominant').
Defaults to allelic.
Multiple geneticModels can be specified, in which case each will be analyzed separately.
Which imputation method to apply? ('meanImpute' or 'missingToRef').
Defaults to meanImpute.
MAF weighting method. Currently Madsen-Browning ('mb') is implemented, by default no MAF weighting is applied. Multiple MAFweights can be specified, in which case each will be analyzed separately.
Maximum number of iterations to use for estimating firth confidence intervals. Defaults to 1000.
Version of the human genome to use when assigning variant ploidy (diploid, XnonPAR, YnonPAR).
Accepted inputs are 'GRCh37', 'hg19', 'GRCh38', 'hg38'.
If not specified, the genome build in the gdb will be used, if available.
(included if the genomeBuild parameter was set in buildGdb).
Otherwise, if the genome build is not included in the gdb metadata,
and no value is provided, then all variants are assigned the default ploidy of "diploid".
Vector of sample IDs to keep, defaults to NULL,
in which case all samples are kept.
Output file path for results.
Defaults to NULL, in which case results are not written.
Relevant if the output parameter is not NULL.
Should results be appended to output?
Return a data.frame rather than a rvatResult. Defaults to FALSE.
Which method to use for resampling? ('permutation' currently implemented).
Defaults to NULL, in which case no resampling is performed.
Pre-calculated resampling matrix (n x p),
where n = number of samples, and p number of resamplings.
Can be generated using buildResamplingFile.
A resamplingFile object.
Number of resamplings to perform if methodResampling is specified.
If TRUE or a filepath,
results for each resampling are returned (or saved to the filepath).
This can be useful if permutations are used to estimate correlations among genes for example.
Defaults to FALSE in which case resampling is used to calculate resampled P-values,
results for individual resamplings are not returned.
Maximum number of resamplings to perform at a time.
Resampling generates a matrix of n x p, where n is the number of samples and p the number of resamplings
thus, for a large number of resamplings it can be more efficient to split the permutations in chunks of size memlimitResampling.
Defaults to NULL in which case all permutations are performed.
Minimum genotype rate for variant retention.
Maximum genotype rate for variant retention.
Minimum genotype rate for sample retention.
Maximum genotype rate for sample retention.
Minimum minor allele frequency for variant retention.
Maximum minor allele frequency for variant retention.
Minimum minor allele count for variant retention.
Maximum minor allele count for variant retention.
Minimum carrier count for variant retention.
Maximum carrier count for variant retention.
Minimum carrier frequency for variant retention.
Maximum carrier frequency for variant retention.
Maximum number of variants to load at once (if VAR_id is specified).
Should the function be verbose? (TRUE/FALSE), defaults to TRUE.
Should strict checks be performed? Defaults to TRUE. Strict tests currently includes
checking whether supplied varSetFile/varSetList was generated from the same gdb as specified in object.
library(rvatData)
gdb <- create_example_gdb()
varsetfile <- varSetFile(rvat_example("rvatData_varsetfile.txt.gz"))
# for example purposes, upload a small cohort
cohort <- getCohort(gdb, "pheno")
cohort <- cohort[cohort$IID %in% colnames(GTsmall), ]
uploadCohort(gdb, name = "phenosmall", value = cohort)
#> Loading cohort 'phenosmall' from interactive R session
#> 10 fields detected (IID,sex,pheno,pop,superPop,PC1,PC2,PC3,PC4,age)
#> 2602 males, 2398 females and 0 unknown sex
# run a firth burden test on a binary phenotype
varsetlist <- getVarSet(
varsetfile,
unit = c("SOD1", "NEK1"),
varSetName = "High"
)
rvb <- assocTest(
gdb,
varSet = varsetlist,
cohort = "phenosmall",
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = "firth",
name = "example"
)
#> Analysing NEK1 -------------------
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 20/33 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 13 variants) pheno | PC1,PC2,PC3,PC4 | allelic
#> Analysing SOD1 -------------------
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 2/4 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 2 variants) pheno | PC1,PC2,PC3,PC4 | allelic
# run ACAT-v and SKAT tests
rvb <- assocTest(
gdb,
varSet = varsetlist,
cohort = "phenosmall",
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("acatvfirth", "skat_burden_robust", "skato_robust"),
name = "example"
)
#> Analysing NEK1 -------------------
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 20/33 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 13 variants) pheno | PC1,PC2,PC3,PC4 | allelic
#> Analysing SOD1 -------------------
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 2/4 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 2 variants) pheno | PC1,PC2,PC3,PC4 | allelic
# run a burden test on a continuous phenotype
rvb <- assocTest(
gdb,
varSet = varsetlist,
cohort = "phenosmall",
pheno = "age",
continuous = TRUE,
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("lm", "skat", "acatv"),
name = "example"
)
#> Analysing NEK1 -------------------
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 20/33 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 13 variants) age | PC1,PC2,PC3,PC4 | allelic
#> Analysing SOD1 -------------------
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 2/4 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 2 variants) age | PC1,PC2,PC3,PC4 | allelic
# run single variant tests on a binary phenotype
sv <- assocTest(
gdb,
varSet = varsetlist,
cohort = "phenosmall",
pheno = "pheno",
singlevar = TRUE,
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "glm", "scoreSPA"),
name = "example",
minCarriers = 1
)
#> Analysing NEK1 -------------------
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> [1/1] (5000 samples, 13 variants) pheno | PC1,PC2,PC3,PC4 | allelic
#> Warning: 13/13 variants have less than 2 carriers, tests will be skipped for these variants.
#> Analysing SOD1 -------------------
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> [1/1] (5000 samples, 2 variants) pheno | PC1,PC2,PC3,PC4 | allelic
#> Warning: 2/2 variants have less than 2 carriers, tests will be skipped for these variants.
# similarly a list of VAR_ids can be specified instead of a varSetList/varSetFile
sv <- assocTest(
gdb,
VAR_id = 1:20,
cohort = "phenosmall",
pheno = "pheno",
singlevar = TRUE,
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "glm", "scoreSPA"),
name = "example",
minCarriers = 1
)
#> Analysing chunk1 -------------------
#> Retrieved genotypes for 20 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: none -------------------
#> [1/1] (5000 samples, 8 variants) pheno | PC1,PC2,PC3,PC4 | allelic
#> Warning: 4/8 variants have less than 2 carriers, tests will be skipped for these variants.
# apply variant filters
rvb <- assocTest(
gdb,
varSet = varsetlist,
cohort = "phenosmall",
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "skat_robust", "acatv"),
name = "example",
maxMAF = 0.05,
minCarriers = 1,
minCallrateVar = 0.9,
minCallrateSM = 0.95
)
#> Analysing NEK1 -------------------
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> [1/1] (3391 samples, 10 variants) pheno | PC1,PC2,PC3,PC4 | allelic
#> Analysing SOD1 -------------------
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> [1/1] (4745 samples, 2 variants) pheno | PC1,PC2,PC3,PC4 | allelic
# Perform MAF-weighted burden tests (madsen-browning)
rvb <- assocTest(
gdb,
varSet = varsetlist,
cohort = "phenosmall",
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "skat_robust", "acatv"),
MAFweights = "mb"
)
#> Analysing NEK1 -------------------
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 20/33 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 13 variants) pheno | PC1,PC2,PC3,PC4 | allelic (mb-weighted)
#> Analysing SOD1 -------------------
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 2/4 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 2 variants) pheno | PC1,PC2,PC3,PC4 | allelic (mb-weighted)
# CADD-weighted burden test
varsetlist_cadd <- getVarSet(
varsetfile,
unit = c("SOD1", "NEK1"),
varSetName = "CADD"
)
rvb <- assocTest(
gdb,
varSet = varsetlist_cadd,
cohort = "phenosmall",
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "skat_robust", "acatv")
)
#> Analysing NEK1 -------------------
#> Retrieved genotypes for 190 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: CADD -------------------
#> Warning: For 71 variants weights are missing, these variants are excluded.
#> 75/119 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 44 variants) pheno | PC1,PC2,PC3,PC4 | allelic
#> Analysing SOD1 -------------------
#> Retrieved genotypes for 49 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: CADD -------------------
#> Warning: For 11 variants weights are missing, these variants are excluded.
#> 26/38 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 12 variants) pheno | PC1,PC2,PC3,PC4 | allelic
# Perform recessive burden test
rvb <- assocTest(
gdb,
varSet = varsetlist,
cohort = "phenosmall",
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("firth", "skat_robust", "acatv"),
geneticModel = "recessive"
)
#> Analysing NEK1 -------------------
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 33/33 variants have zero carriers, these are dropped.
#> Analysing SOD1 -------------------
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 4/4 variants have zero carriers, these are dropped.
# Resampled burden test
rvb <- assocTest(
gdb,
varSet = varsetlist[1],
cohort = "phenosmall",
pheno = "pheno",
covar = c("PC1", "PC2", "PC3", "PC4"),
test = c("skat", "skat_burden", "acatv"),
name = "example",
methodResampling = "permutation",
nResampling = 100
)
#> Analysing NEK1 -------------------
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> varSet: High -------------------
#> 20/33 variants have zero carriers, these are dropped.
#> [1/1] (5000 samples, 13 variants) pheno | PC1,PC2,PC3,PC4 | allelic