Perform self-contained or competitive gene set analyses.
geneSetAssoc(
object,
geneSet = NULL,
scoreMatrix = NULL,
cormatrix = NULL,
condition = NULL,
covar = NULL,
test = c("lm", "glm", "mlm", "fisher", "ttest", "ztest", "ACAT"),
threshold = NULL,
Zcutoffs = NULL,
INT = FALSE,
scoreCutoffs = NULL,
minSetSize = 1,
maxSetSize = Inf,
oneSided = TRUE,
memlimit = 1000,
ID = "unit",
output = NULL,
verbose = TRUE
)
an rvbResult object.
a geneSetList or geneSetFile object.
A matrix (rows = genes, columns = features) can be provided to perform enrichment analyses on continuous values. This can be used to perform, e.g., cell-type enrichment analyses.
a correlation matrix with row and column names corresponding to the units in the rvbResult.
Needs to be specified in order to run the 'mlm' (mixed linear model) test.
A burden score correlation matrix can be generated using the buildCorMatrix method.
The mixed linear model is run using the GENESIS R package.
Perform conditional analyses. Input can be either 1) a geneSetList or geneSetFile, in which case
the gene sets specified in the geneSet parameter will all be conditioned on the gene sets provided here,
or 2) a vector of names of gene sets present in geneSet, in which case all gene sets specified in the geneSetList/geneSetFile
will be conditioned on the gene sets named here.
a vector of covariates
Vector of tests to perform. Currently implemented tests are the competitive tests lm, mlm and fisher, and the self-contained tests ttest, ztest and ACAT.
A vector of thresholds for cutoff-based tests (Fisher's exact test / glm).
A vector (length = 2, minimum and maximum) of cutoffs to apply to the Z-scores. Z-scores below/above these cutoffs will be set equal to the respective cutoff.
Apply inverse normal transformation to Z-scores? Defaults to FALSE.
If scoreMatrix is specified, this parameter can be set to cap the scores in the scoreMatrix.
It should be a vector of length 2 (minimum and maximum sd). Defaults to NULL, in which case no score cutoffs are applied.
Exclude genesets with size < minSetSize
Exclude genesets with size > maxSetSize
Calculate a one-sided P-value? Defaults to TRUE.
Maximum number of genesets to process in one go.
ID column in the rvbResult that corresponds with the IDs used in the geneSetList. Defaults to 'unit'.
Optional: save results to specified path
Should the function be verbose? Defaults to TRUE.
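The ACAT option listed among the tests refers to the aggregated Cauchy association test, which combines a vector of P-values into one. A minimal base-R sketch of the Cauchy combination formula follows; the function name `acat` and the equal default weights are assumptions made for illustration, and this is not the package's own implementation.

```r
# Cauchy combination of P-values (illustrative helper, not part of the package)
acat <- function(p, w = rep(1 / length(p), length(p))) {
  stopifnot(all(p > 0 & p < 1), abs(sum(w) - 1) < 1e-8)
  # Each P-value is mapped to a standard Cauchy variate; the weighted sum
  # is again standard Cauchy under the null hypothesis
  t_stat <- sum(w * tan((0.5 - p) * pi))
  # Upper-tail probability of the standard Cauchy distribution
  0.5 - atan(t_stat) / pi
}

p_combined <- acat(c(0.01, 0.2, 0.5, 0.8))
```

With a single P-value the combination returns that P-value unchanged, which is a quick sanity check on the formula.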
Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, Thornton TA, Rice KM, Conomos MP. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019 Dec 15;35(24):5346-5348
library(rvat)      # provides geneSetAssoc() and buildGeneSet()
library(rvatData)  # provides the example data
data(rvbresults)
res <- rvbresults[rvbresults$test == "firth" &
rvbresults$varSetName == "ModerateImpact", ]
# Example geneSetList used in the examples below (see ?buildGeneSet for how to build geneSetLists/geneSetFiles)
genesetlist <- buildGeneSet(
list("geneset1" = c("SOD1", "NEK1"),
"geneset2" = c("ABCA4", "SOD1", "NEK1"),
"geneset3" = c("FUS", "NEK1")
))
# Perform competitive gene set analysis using a linear model
GSAresults <- geneSetAssoc(
res,
genesetlist,
covar = c("nvar"),
test = c("lm")
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
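To make the idea behind a competitive linear-model test concrete, the base-R sketch below regresses simulated gene-level Z-scores on a gene set membership indicator plus one covariate, then derives a one-sided P-value for the membership term. All data and names (`z`, `in_set`, `nvar`) are simulated or hypothetical, and the sketch is not the package's internal code.

```r
set.seed(1)
n_genes <- 1000
nvar    <- rpois(n_genes, lambda = 20)        # hypothetical covariate: variants per gene
in_set  <- rep(c(1, 0), times = c(50, 950))   # 1 = gene belongs to the gene set
z       <- rnorm(n_genes) + 0.5 * in_set      # simulated Z-scores, shifted upward in the set

# Competitive test: are Z-scores inside the set higher than those outside it,
# after adjusting for the covariate?
fit   <- lm(z ~ in_set + nvar)
coefs <- summary(fit)$coefficients
t_val <- coefs["in_set", "t value"]

# One-sided P-value from the upper tail of the t distribution
p_one_sided <- pt(t_val, df = fit$df.residual, lower.tail = FALSE)
```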
# Outlying gene association scores can be remedied by either setting Z-score cutoffs (i.e. all Z-scores exceeding these values will be set to the respective cutoff),
# or inverse normal transforming the Z-scores:
GSAresults <- geneSetAssoc(
res,
genesetlist,
covar = c("nvar"),
test = c("lm"),
maxSetSize = 500,
Zcutoffs = c(-4, 4) # lower and upper bounds
)
#> 0 Z-scores <-4 are set to -4
#> 3 Z-scores >4 are set to 4
#> 3 out of 3 sets are kept.
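The effect of Z-score cutoffs can be sketched in base R as a simple clamping step. The Z-scores below are made up, and how the package handles edge cases internally is not shown here.

```r
z       <- c(-5.2, -1.3, 0.4, 2.8, 4.6, Inf)  # made-up Z-scores, including an infinite value
cutoffs <- c(-4, 4)                           # lower and upper bounds

# Values below the lower bound are raised to it, values above the upper bound are lowered to it
z_capped <- pmin(pmax(z, cutoffs[1]), cutoffs[2])
z_capped
#> [1] -4.0 -1.3  0.4  2.8  4.0  4.0
```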
GSAresults <- geneSetAssoc(
res,
genesetlist,
covar = c("nvar"),
test = c("lm"),
maxSetSize = 500,
INT = TRUE # perform inverse normal transformation
)
#> 3 out of 3 sets are kept.
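A rank-based inverse normal transformation can be sketched in base R as follows. The helper `int_transform` and the 0.5 rank offset are assumptions chosen for illustration; the package may use a different offset or tie handling.

```r
# Rank-based inverse normal transformation (illustrative helper)
int_transform <- function(x) {
  r <- rank(x, ties.method = "average")
  # Map ranks to quantiles strictly inside (0, 1), then to normal scores
  qnorm((r - 0.5) / length(x))
}

z     <- c(-0.7, 0.2, 1.1, 9.5, 0.9)  # made-up Z-scores with one outlier
z_int <- int_transform(z)
```

The outlying value 9.5 keeps its rank but is pulled in to the largest normal quantile for a sample of five, so it no longer dominates a downstream regression.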
# Conditional gene set analyses can be performed to test whether gene sets are associated independently with the phenotype of interest.
# In the example below we test whether gene sets are independent of geneset1
GSAresults <- geneSetAssoc(
res,
condition = getGeneSet(genesetlist, "geneset1"),
geneSet = genesetlist,
covar = c("nvar"),
test = c("lm"),
maxSetSize = 500
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
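One common way to think about a conditional gene set analysis is to add the membership indicator of the conditioning set as an extra term in the regression. The base-R sketch below illustrates that idea on simulated data; the set definitions are hypothetical, and whether the package implements conditioning in exactly this way is an assumption.

```r
set.seed(3)
n_genes <- 1000
set_a <- rep(c(1, 0), times = c(60, 940))         # conditioning set: genes 1-60
set_b <- as.numeric(seq_len(n_genes) %in% 31:90)  # tested set, overlapping set_a on genes 31-60
z     <- rnorm(n_genes) + 0.8 * set_a             # simulated signal driven by set_a only

# Marginal model: set_b alone
marginal <- summary(lm(z ~ set_b))$coefficients["set_b", ]
# Conditional model: set_b adjusted for membership in set_a
conditional <- summary(lm(z ~ set_b + set_a))$coefficients["set_b", ]

rbind(marginal, conditional)
```

Comparing the two rows shows how much of the apparent set_b signal is explained by its overlap with set_a.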
# perform two-sided tests by setting `oneSided = FALSE`
GSAresults <- geneSetAssoc(
res,
genesetlist,
covar = c("nvar"),
test = c("lm"),
maxSetSize = 500,
oneSided = FALSE
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
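The relation between one-sided and two-sided P-values can be sketched in base R for a given t statistic. The values of `t_val` and `df` are arbitrary, and the assumption here is that the one-sided test looks at the upper tail (enrichment).

```r
t_val <- 2.1   # arbitrary t statistic for the gene set term
df    <- 996   # arbitrary residual degrees of freedom

p_one_sided <- pt(t_val, df = df, lower.tail = FALSE)           # upper tail only
p_two_sided <- 2 * pt(abs(t_val), df = df, lower.tail = FALSE)  # both tails

# For a positive statistic the two-sided P-value is twice the one-sided one
c(one_sided = p_one_sided, two_sided = p_two_sided)
```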
# Test whether the proportion of genes with P-values below a specified threshold is greater within the gene set than outside of it,
# using Fisher's exact test.
# The `threshold` parameter specifies the P-value cutoff used to define significant genes:
GSAresults <- geneSetAssoc(
res,
genesetlist,
test = c("fisher"),
threshold = 1e-4,
maxSetSize = 500
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
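The cutoff-based comparison can be sketched in base R as a 2x2 table of set membership against significance, tested with a one-sided Fisher's exact test. The gene-level P-values and set membership below are simulated for illustration, and the sketch is not the package's internal code.

```r
set.seed(2)
n_genes <- 1000
p_gene  <- runif(n_genes)                            # simulated gene-level P-values
in_set  <- rep(c(TRUE, FALSE), times = c(40, 960))   # first 40 genes form the gene set
p_gene[1:8] <- 1e-6                                  # make 8 set members significant

threshold   <- 1e-4
significant <- p_gene < threshold

# Rows: in set / not in set; columns: significant / not significant
tab <- table(
  in_set      = factor(in_set, levels = c(TRUE, FALSE)),
  significant = factor(significant, levels = c(TRUE, FALSE))
)

# alternative = "greater" tests for an odds ratio above 1, i.e. enrichment in the set
ft <- fisher.test(tab, alternative = "greater")
ft$p.value
```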