assocTest-gdb — assocTest-gdb • rvat

Run assocTest on a gdb object. See the main assocTest page for details.

# S4 method for gdb
assocTest(
  object,
  pheno,
  test,
  cohort = "SM",
  varSet = NULL,
  VAR_id = NULL,
  name = "none",
  continuous = FALSE,
  singlevar = FALSE,
  covar = NULL,
  offset = NULL,
  geneticModel = "allelic",
  imputeMethod = NULL,
  MAFweights = "none",
  maxitFirth = 1000,
  checkPloidy = NULL,
  keep = NULL,
  output = NULL,
  methodResampling = NULL,
  resamplingFile = NULL,
  nResampling = 1000,
  outputResampling = FALSE,
  memlimitResampling = NULL,
  minCallrateVar = 0,
  maxCallrateVar = Inf,
  minCallrateSM = 0,
  maxCallrateSM = Inf,
  minMAF = 0,
  maxMAF = 1,
  minMAC = 0,
  maxMAC = Inf,
  minCarriers = 0,
  maxCarriers = Inf,
  minCarrierFreq = 0,
  maxCarrierFreq = Inf,
  memlimit = 1000,
  verbose = TRUE,
  strict = TRUE
)

Arguments

object: a gdb object
pheno: colData field to test as response variable, the response variable can either be binary (0/1) or continuous. If the response variable is continuous set continuous to TRUE. Multiple phenotypes can be specified, which will then be tested separately.
test: Vector of statistical tests to run, options include firth,glm,lm,nbinom,skat,skat_burden,skato,skat_fwe,skat_burden_fwe, skato_fwe,skat_robust,skato_robust,skat_burden_robust, acatv, acatvSPA. See assocTest for details.
cohort: If a valid cohort name is provided, then the uploaded data for this cohort is used to filter and annotate the genoMatrix object. If not specified, all samples in the gdb will be loaded.
varSet: a varSetFile or varSetList object, alternatively a vector of VAR_ids can be specified using the VAR_id parameter.
VAR_id: a vector of VAR_ids, alternatively the varSet parameter can be specified. If single variant tests are ran, the memlimit argument controls how many variants to analyze at a time.
name: Optional name for the analysis, defaults to "none".
continuous: If the response variable continuous? (TRUE/FALSE). Defaults to FALSE.
singlevar: Run single variant tests? (TRUE/FALSE). Defaults to FALSE, in which case aggregate tests are ran.
covar: Character vector of covariates, or a list of character vectors of covariates in which case each covariate set will be tested separately.
offset: Optional model offset, can be used to account for regenie LOCO predictions.
geneticModel: Which genetic model to apply? ('allelic', 'recessive' or 'dominant'). Defaults to allelic. Multiple geneticModels can be specified, in which case each will be analyzed separately.
imputeMethod: Which imputation method to apply? ('meanImpute' or 'missingToRef'). Defaults to meanImpute.
MAFweights: MAF weighting method. Currently Madsen-Browning ('mb') is implemented, by default no MAF weighting is applied. Multiple MAFweights can be specified, in which case each will be analyzed separately.
maxitFirth: Maximum number of iterations to use for estimating firth confidence intervals. Defaults to 1000.
checkPloidy: Version of the human genome to use when assigning variant ploidy (diploid, XnonPAR, YnonPAR). Accepted inputs are GRCh37, hg19, GRCh38, hg38. If not specified, the genome build in the gdb will be used, if available (included if the genomeBuild parameter was set in buildGdb). Otherwise, if the genome build is not included in the gdb metadata, and no value is provided, then all variants are assigned the default ploidy of "diploid"
keep: Vector of sample IDs to keep, defaults to NULL, in which case all samples are kept.
output: Output file path for results. Defaults to NULL, in which case results are not written.
methodResampling: Which method to use for resampling? ('permutation' currently implemented) Defaults to NULL, in which case no resampling is performed.
resamplingFile: A resamplingFile object.
nResampling: Number of resamplings to perform if methodResampling is specified.
outputResampling: If TRUE or a filepath, results for each resampling are returned (or saved to the filepath). This can be useful if permutations are used to calculated to estimate correlations among genes for example. Defaults to FALSE in which case resampling is used to calculate resampled P-values, results for individual resamplings are not returned.
memlimitResampling: Maximum number of resamplings to perform at a time. Resampling generates a matrix of n x p, where n is the number of samples and p the number of resamplings thus, for large number of resamplings it can be more efficient to split the permutations in chunks of size memlimitResampling. Defaults to NULL in which case all permutations are performed.
minCallrateVar: Minimum genotype rate for variant retention.
maxCallrateVar: Maximum genotype rate for variant retention.
minCallrateSM: Minimum genotype rate for sample retention.
maxCallrateSM: Maximum genotype rate for sample retention.
minMAF: Minimum minor allele frequency for variant retention.
maxMAF: Maximum minor allele frequency for variant retention.
minMAC: Minimum minor allele count for variant retention.
maxMAC: Maximum minor allele count for variant retention.
minCarriers: Minimum carrier count for variant retention.
maxCarriers: Maximum carrier count for variant retention.
minCarrierFreq: Minimum carrier frequency for variant retention.
maxCarrierFreq: Maximum carrier frequency for variant retention.
memlimit: Maximum number of variants to load at once (if VAR_id is specified).
verbose: Should the function be verbose? (TRUE/FALSE), defaults to TRUE.
strict: Should strict checks be performed? Defaults to TRUE. Strict tests currently includes checking whether supplied varSetFile/varSetList was generated from the same gdb as specified in object.

Examples


library(rvatData)
gdb <- create_example_gdb()
varsetfile <- varSetFile(rvat_example("rvatData_varsetfile.txt.gz"))

# for example purposes, upload a small cohort
cohort <- getCohort(gdb, "pheno")
cohort <- cohort[cohort$IID %in% colnames(GTsmall),]
uploadCohort(gdb, name = "phenosmall", value = cohort)
#> Loading cohort 'phenosmall' from interactive R session
#> 10 fields detected (IID,sex,pheno,pop,superPop,PC1,PC2,PC3,PC4,age)
#> 2602 males, 2398 females and 0 unknown gender
#> Retained 5000 of 25000 uploaded samples that could be mapped to dosage matrix
#> [1] 1

# run a firth burden test on a binary phenotype
varsetlist <- getVarSet(varsetfile, unit = c("SOD1", "NEK1"), varSetName = "High")
rvb <- assocTest(gdb,
                 varSet = varsetlist,
                 cohort = "phenosmall",
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = "firth",
                 name = "example")
#> Analysing NEK1
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit NEK1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 33/33 variants are retained for analysis
#> 20/33 variants have zero carriers, these are dropped.
#> Analysing SOD1
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit SOD1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 4/4 variants are retained for analysis
#> 2/4 variants have zero carriers, these are dropped.


# run ACAT-v and SKAT tests
rvb <- assocTest(gdb,
                 varSet = varsetlist,
                 cohort = "phenosmall",
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("acatvfirth", "skat_burden_robust", "skato_robust"),
                 name = "example")
#> Analysing NEK1
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit NEK1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 33/33 variants are retained for analysis
#> 20/33 variants have zero carriers, these are dropped.
#> Analysing SOD1
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit SOD1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 4/4 variants are retained for analysis
#> 2/4 variants have zero carriers, these are dropped.

# run a burden test on a continuous phenotype
rvb <- assocTest(gdb,
                 varSet = varsetlist,
                 cohort = "phenosmall",
                 pheno = "age",
                 continuous = TRUE,
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("lm", "skat", "acatv"),
                 name = "example")
#> Analysing NEK1
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit NEK1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 33/33 variants are retained for analysis
#> 20/33 variants have zero carriers, these are dropped.
#> Analysing SOD1
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit SOD1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 4/4 variants are retained for analysis
#> 2/4 variants have zero carriers, these are dropped.

# run single variant tests on a binary phenotype 
sv <- assocTest(gdb,
                varSet = varsetlist,
                cohort = "phenosmall",
                pheno = "pheno",
                singlevar = TRUE,
                covar = c("PC1", "PC2", "PC3", "PC4"),
                test = c("firth", "glm", "scoreSPA"),
                name = "example",
                minCarriers = 1)
#> Analysing NEK1
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit NEK1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 13/13 variants are retained for analysis
#> Warning: 13/13 variants have less than 2 carriers, tests will be skipped for these variants.
#> Analysing SOD1
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit SOD1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 2/2 variants are retained for analysis
#> Warning: 2/2 variants have less than 2 carriers, tests will be skipped for these variants.

# similarly a list of VAR_ids can be specified instead of a varSetList/varSetFile
sv <- assocTest(gdb,
                VAR_id = 1:20,
                cohort = "phenosmall",
                pheno = "pheno",
                singlevar = TRUE,
                covar = c("PC1", "PC2", "PC3", "PC4"),
                test = c("firth", "glm", "scoreSPA"),
                name = "example",
                minCarriers = 1)
#> Analysing chunk1
#> Retrieved genotypes for 20 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit chunk1; varSet none
#> Keeping 5000/5000 available samples for analysis.
#> 8/8 variants are retained for analysis
#> Warning: 4/8 variants have less than 2 carriers, tests will be skipped for these variants.

# apply variant filters
rvb <- assocTest(gdb,
                 varSet = varsetlist,
                 cohort = "phenosmall",
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("firth", "skat_robust", "acatv"),
                 name = "example", 
                 maxMAF = 0.05,
                 minCarriers = 1,
                 minCallrateVar = 0.9,
                 minCallrateSM = 0.95)
#> Analysing NEK1
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit NEK1; varSet High
#> 1609 samples don't pass callRate filters.
#> Keeping 3391/3391 available samples for analysis.
#> 10/10 variants are retained for analysis
#> Analysing SOD1
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit SOD1; varSet High
#> 255 samples don't pass callRate filters.
#> Keeping 4745/4745 available samples for analysis.
#> 2/2 variants are retained for analysis

# Perform MAF-weighted burden tests (madsen-browning)
rvb <- assocTest(gdb,
                 varSet = varsetlist,
                 cohort = "phenosmall",
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("firth", "skat_robust", "acatv"),
                 MAFweights = "mb")
#> Analysing NEK1
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit NEK1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 13/13 variants are retained for analysis
#> Analysing SOD1
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit SOD1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 2/2 variants are retained for analysis

# CADD-weighted burden test
varsetlist_cadd <- getVarSet(varsetfile, unit = c("SOD1", "NEK1"), varSetName = "CADD")
rvb <- assocTest(gdb,
                 varSet = varsetlist_cadd,
                 cohort = "phenosmall",
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("firth", "skat_robust", "acatv"))
#> Analysing NEK1
#> Warning: NAs introduced by coercion
#> Retrieved genotypes for 190 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Warning: NAs introduced by coercion
#> Analysing unit NEK1; varSet CADD
#> Keeping 5000/5000 available samples for analysis.
#> 119/119 variants are retained for analysis
#> 75/119 variants have zero carriers, these are dropped.
#> Analysing SOD1
#> Warning: NAs introduced by coercion
#> Retrieved genotypes for 49 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Warning: NAs introduced by coercion
#> Analysing unit SOD1; varSet CADD
#> Keeping 5000/5000 available samples for analysis.
#> 38/38 variants are retained for analysis
#> 26/38 variants have zero carriers, these are dropped.

# Perform recessive burden test
rvb <- assocTest(gdb,
                 varSet = varsetlist,
                 cohort = "phenosmall",
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("firth", "skat_robust", "acatv"),
                 geneticModel = "recessive")
#> Analysing NEK1
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit NEK1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 33/33 variants are retained for analysis
#> 33/33 variants have zero carriers, these are dropped.
#> Warning: Less than two samples have a non-zero burden score, skipping tests.
#> Analysing SOD1
#> Retrieved genotypes for 4 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit SOD1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 4/4 variants are retained for analysis
#> 4/4 variants have zero carriers, these are dropped.
#> Warning: Less than two samples have a non-zero burden score, skipping tests.

# Resampled burden test 
rvb <- assocTest(gdb,
                 varSet = varsetlist[1],
                 cohort = "phenosmall",
                 pheno = "pheno",
                 covar = c("PC1", "PC2", "PC3", "PC4"),
                 test = c("skat", "skat_burden", "acatv"),
                 name = "example",
                 methodResampling = "permutation",
                 nResampling = 100)
#> Analysing NEK1
#> Retrieved genotypes for 33 variants
#> 5000/25000 samples in the gdb are present in cohort 'phenosmall'
#> Analysing unit NEK1; varSet High
#> Keeping 5000/5000 available samples for analysis.
#> 33/33 variants are retained for analysis
#> 20/33 variants have zero carriers, these are dropped.