Perform self-contained or competitive gene set analyses.
geneSetAssoc(
  object,
  geneSet = NULL,
  scoreMatrix = NULL,
  cormatrix = NULL,
  condition = NULL,
  covar = NULL,
  test = c("lm", "mlm", "fisher", "ttest", "ztest", "ACAT"),
  threshold = NULL,
  Zcutoffs = NULL,
  INT = FALSE,
  scoreCutoffs = NULL,
  minSetSize = 1L,
  maxSetSize = Inf,
  oneSided = TRUE,
  memlimit = 1000L,
  ID = "unit",
  output = NULL,
  verbose = TRUE
)
object: An rvbResult object.
geneSet: A geneSetList or geneSetFile object.
scoreMatrix: A matrix (rows = genes, columns = features) of continuous values on which to perform enrichment analyses, e.g. cell-type enrichment analyses.
cormatrix: A correlation matrix with row and column names corresponding to the units in the rvbResult.
Needs to be specified in order to run the 'mlm' (mixed linear model) test.
A burden score correlation matrix can be generated using the buildCorMatrix method.
The mixed linear model is run using the GENESIS R package.
condition: Perform conditional analyses.
Input can be either 1) a geneSetList or geneSetFile, in which case
all gene sets specified in the geneSet parameter are conditioned
on the gene sets provided here, or
2) a vector of names of gene sets present in geneSet, in which case
all gene sets specified in the geneSetList/geneSetFile are conditioned
on the gene sets named here.
covar: A vector of covariates.
test: Vector of tests to perform. Currently implemented tests are the competitive tests 'lm', 'mlm' and 'fisher', and the self-contained tests 'ttest', 'ztest' and 'ACAT'.
threshold: A vector of thresholds for cutoff-based tests (Fisher's exact test).
Zcutoffs: A vector of length 2 (minimum and maximum) of cutoffs to apply to the Z-scores. Z-scores below the minimum or above the maximum are set equal to the respective cutoff.
INT: Apply inverse normal transformation to Z-scores? Defaults to FALSE.
scoreCutoffs: If scoreMatrix is specified,
this parameter can be set to cap scores in the scoreMatrix.
It should be a vector of length 2 (minimum and maximum sd).
Defaults to NULL, in which case no score cutoffs are applied.
minSetSize: Exclude gene sets with size < minSetSize. Defaults to 1.
maxSetSize: Exclude gene sets with size > maxSetSize. Defaults to Inf.
oneSided: Calculate a one-sided P-value? Defaults to TRUE.
memlimit: Maximum number of gene sets to process in one go. Defaults to 1000.
ID: ID column in the rvbResult that corresponds with the IDs used in the geneSetList. Defaults to 'unit'.
output: Optional: file path to save the results to.
verbose: Should the function be verbose? Defaults to TRUE.
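The effect of Zcutoffs and INT on gene-level Z-scores can be illustrated in base R. This is a minimal sketch on made-up Z-scores, not the package's internal code: the helper names `capZ` and `invNormal` are hypothetical, and the rank offset of 0.5 used in the inverse normal transformation is an assumption.

```r
# Toy gene-level Z-scores (made up for illustration), with two outliers.
z <- c(-5.2, -1.3, 0.4, 0.9, 2.1, 6.8)

# Cap Z-scores at a lower and an upper cutoff, as described for `Zcutoffs`.
# `capZ` is a hypothetical helper, not an rvat function.
capZ <- function(z, cutoffs) {
  stopifnot(length(cutoffs) == 2, cutoffs[1] <= cutoffs[2])
  pmin(pmax(z, cutoffs[1]), cutoffs[2])
}

# Rank-based inverse normal transformation.
# The offset of 0.5 is an assumption; other offsets are also in common use.
invNormal <- function(z) {
  qnorm((rank(z) - 0.5) / length(z))
}

zCapped <- capZ(z, c(-4, 4))
zINT <- invNormal(z)

print(zCapped)
print(round(zINT, 3))
```

Capping only changes the values outside the cutoffs, whereas the inverse normal transformation replaces every Z-score by a normal quantile based on its rank, so the ordering of genes is preserved in both cases.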
An object of class gsaResult or a data.frame if scoreMatrix is used.
The columns are described below:
geneSetName: Name of the gene set.
test: Statistical test used.
covar: Covariates included.
threshold: Threshold used for cutoff-based tests.
geneSetSize: Number of genes in the gene set.
genesObs: Number of genes observed in the input results.
effect: Effect size estimate.
effectSE: Standard error of the effect size estimate.
effectCIlower: Lower bound of the confidence interval of the effect size estimate.
effectCIupper: Upper bound of the confidence interval of the effect size estimate.
P: P-value.
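When several gene sets are tested, the P column is typically adjusted for multiple testing. The sketch below uses a mock data.frame that has a subset of the columns listed above; all values are made up, and whether a gsaResult can be handled exactly like this data.frame is an assumption.

```r
# Mock results with a subset of the documented columns.
# All values are made up for illustration and do not come from a real analysis.
gsa <- data.frame(
  geneSetName = c("geneset1", "geneset2", "geneset3"),
  test = "lm",
  geneSetSize = c(2L, 3L, 2L),
  P = c(0.004, 0.03, 0.20)
)

# Adjust the P-values for the number of gene sets tested.
gsa$P_bonferroni <- p.adjust(gsa$P, method = "bonferroni")
gsa$P_fdr <- p.adjust(gsa$P, method = "BH")

# Order gene sets from most to least significant.
gsa <- gsa[order(gsa$P), ]
print(gsa)
```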
Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, Thornton TA, Rice KM, Conomos MP. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019 Dec 15;35(24):5346-5348
library(rvatData)
data(rvbresults)
res <- rvbresults[
  rvbresults$test == "firth" &
    rvbresults$varSetName == "ModerateImpact",
]
# Example geneSetList used in the examples below
# (see ?buildGeneSet on how to build geneSetLists/geneSetFiles)
genesetlist <- buildGeneSet(
  list(
    "geneset1" = c("SOD1", "NEK1"),
    "geneset2" = c("ABCA4", "SOD1", "NEK1"),
    "geneset3" = c("FUS", "NEK1")
  )
)
# Perform competitive gene set analysis using a linear model
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm")
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
# Outlying gene association scores can be remedied by either setting Z-score cutoffs
# (i.e. all Z-scores exceeding these values will be set to the respective cutoff),
# or inverse normal transforming the Z-scores:
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm"),
  maxSetSize = 500,
  Zcutoffs = c(-4, 4) # lower and upper bounds
)
#> 0 Z-scores <-4 are set to -4
#> 3 Z-scores >4 are set to 4
#> 3 out of 3 sets are kept.
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm"),
  maxSetSize = 500,
  INT = TRUE # perform inverse normal transformation
)
#> 3 out of 3 sets are kept.
# Conditional gene set analyses can be performed to test whether gene sets are associated independently with the phenotype of interest.
# In the example below we test whether gene sets are independent of geneset1
GSAresults <- geneSetAssoc(
  res,
  condition = getGeneSet(genesetlist, "geneset1"),
  geneSet = genesetlist,
  covar = c("nvar"),
  test = c("lm"),
  maxSetSize = 500
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
# Perform two-sided tests by setting `oneSided = FALSE`
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm"),
  maxSetSize = 500,
  oneSided = FALSE
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
# Test, using Fisher's exact test, whether the proportion of genes with a P-value below
# a specified threshold is greater within the gene set than outside of it.
# The `threshold` parameter specifies the P-value cutoff used to define significant genes:
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  test = c("fisher"),
  threshold = 1e-4,
  maxSetSize = 500
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
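The idea behind the competitive tests used in the examples above can be sketched in base R on simulated data. This is a conceptual illustration only, not the implementation used by geneSetAssoc: the simulated gene table, the shifted gene set and the 0.05 threshold are all assumptions made for the example.

```r
set.seed(1)

# Simulated gene-level results (hypothetical data, not taken from rvatData).
nGenes <- 200
genes <- data.frame(
  unit = paste0("gene", seq_len(nGenes)),
  nvar = rpois(nGenes, lambda = 20),
  Z = rnorm(nGenes)
)

# A hypothetical gene set of 20 genes whose Z-scores are shifted upwards.
inSet <- genes$unit %in% paste0("gene", 1:20)
genes$Z[inSet] <- genes$Z[inSet] + 1.5

# Linear-model version: regress Z on set membership, adjusting for `nvar`.
fit <- lm(Z ~ inSet + nvar, data = genes)
est <- summary(fit)$coefficients["inSetTRUE", ]

# One-sided P-value for the alternative that genes in the set have higher Z-scores.
pOneSided <- pt(est[["t value"]], df = fit$df.residual, lower.tail = FALSE)

# Cutoff-based version: are genes passing a P-value threshold
# over-represented in the set? (one-sided Fisher's exact test)
genes$P <- pnorm(genes$Z, lower.tail = FALSE)
threshold <- 0.05
tab <- table(
  inSet = factor(inSet, levels = c(FALSE, TRUE)),
  significant = factor(genes$P < threshold, levels = c(FALSE, TRUE))
)
pFisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(lm = pOneSided, fisher = pFisher))
```

Both versions compare genes inside the set with genes outside it, which is what makes them competitive tests; the linear model uses the full Z-score distribution, while Fisher's exact test only uses whether each gene passes the threshold.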