geneSetAssoc.Rd
Perform self-contained or competitive gene set analyses.
geneSetAssoc(
  object,
  geneSet = NULL,
  scoreMatrix = NULL,
  cormatrix = NULL,
  condition = NULL,
  covar = NULL,
  test = c("lm", "glm", "mlm", "fisher", "ttest", "ztest", "ACAT"),
  threshold = NULL,
  Zcutoffs = NULL,
  INT = FALSE,
  scoreCutoffs = NULL,
  minSetSize = 1,
  maxSetSize = Inf,
  oneSided = TRUE,
  memlimit = 1000,
  ID = "unit",
  output = NULL,
  verbose = TRUE
)
object: An rvbResult object.
geneSet: A geneSetList or geneSetFile object.
scoreMatrix: A matrix (rows = genes, columns = features) that can be provided to perform enrichment analyses on continuous values, e.g. cell-type enrichment analyses.
cormatrix: A correlation matrix with row and column names corresponding to the units in the rvbResult. Must be specified in order to run the 'mlm' (mixed linear model) test. A burden score correlation matrix can be generated using the buildCorMatrix method. The mixed linear model is run using the GENESIS R package.
condition: Perform conditional analyses. Input can be either 1) a geneSetList or geneSetFile, in which case all gene sets specified in the geneSet parameter are conditioned on the gene sets provided here; or 2) a vector of gene set names present in geneSet, in which case all gene sets specified in geneSet are conditioned on the named gene sets.
covar: A vector of covariates (e.g. "nvar", as used in the examples).
test: Vector of tests to perform. Currently implemented tests are the competitive tests lm, glm, mlm and fisher, and the self-contained tests ttest, ztest and ACAT.
threshold: A vector of P-value thresholds for cutoff-based tests (Fisher's exact test / glm).
Zcutoffs: A vector of length 2 (minimum and maximum) of cutoffs to apply to the Z-scores. Z-scores below the minimum or above the maximum are set equal to the respective cutoff.
INT: Apply inverse normal transformation to Z-scores? Defaults to FALSE.
scoreCutoffs: If scoreMatrix is specified, this parameter can be set to cap scores in the score matrix. It should be a vector of length 2 (minimum and maximum sd). Defaults to NULL, in which case no score cutoffs are applied.
minSetSize: Exclude gene sets with size < minSetSize. Defaults to 1.
maxSetSize: Exclude gene sets with size > maxSetSize. Defaults to Inf.
oneSided: Calculate a one-sided P-value? Defaults to TRUE.
memlimit: Maximum number of gene sets to process in one go. Defaults to 1000.
ID: ID column in the rvbResult that corresponds to the IDs used in the geneSetList. Defaults to 'unit'.
output: Optional: path to save the results to. Defaults to NULL.
verbose: Should the function be verbose? Defaults to TRUE.
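ACAT is listed above among the self-contained tests. As a rough illustration of the general idea behind the aggregated Cauchy association test, the base-R sketch below combines a handful of made-up gene-level P-values into one P-value. The helper `acat_combine`, the equal weights, and the input values are assumptions made for this illustration; how geneSetAssoc itself weights genes or handles P-values of exactly 0 or 1 is not covered here.

```r
# Illustrative Cauchy combination of P-values (hypothetical helper,
# not part of the package).
acat_combine <- function(p, w = rep(1, length(p))) {
  stopifnot(length(p) == length(w), all(p > 0 & p < 1))
  w <- w / sum(w)                        # normalise weights so they sum to 1
  stat <- sum(w * tan((0.5 - p) * pi))   # weighted sum of Cauchy-transformed P-values
  0.5 - atan(stat) / pi                  # upper tail of a standard Cauchy distribution
}

p_genes <- c(0.01, 0.20, 0.50, 0.80)     # made-up gene-level P-values
p_set <- acat_combine(p_genes)
p_set
```

A single P-value is returned unchanged by this helper, which is a quick sanity check on the transformation.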
Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, Thornton TA, Rice KM, Conomos MP. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019 Dec 15;35(24):5346-5348.
library(rvatData)
data(rvbresults)
res <- rvbresults[rvbresults$test == "firth" &
                    rvbresults$varSetName == "ModerateImpact", ]
# example geneSetList used in the examples below (see ?buildGeneSet on how to build geneSetLists/geneSetFiles)
genesetlist <- buildGeneSet(
  list("geneset1" = c("SOD1", "NEK1"),
       "geneset2" = c("ABCA4", "SOD1", "NEK1"),
       "geneset3" = c("FUS", "NEK1")
  ))
# Perform competitive gene set analysis using a linear model
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm")
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
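To make the idea of a competitive linear-model test more concrete, here is a minimal base-R sketch on simulated data. It regresses gene-level Z-scores on a gene set membership indicator while adjusting for a covariate, then derives a one-sided P-value for enrichment. The simulated data frame, the column name `inSet`, and the gene names are hypothetical; this is an assumption about the general approach, not the package's actual implementation.

```r
set.seed(1)
n_genes <- 200

# Simulated gene-level results: one Z-score and one covariate per gene
genes <- data.frame(
  unit = paste0("gene", seq_len(n_genes)),
  Z    = rnorm(n_genes),
  nvar = rpois(n_genes, lambda = 20)
)

# Hypothetical gene set, coded as a 0/1 membership indicator
genes$inSet <- as.integer(genes$unit %in% c("gene1", "gene2", "gene3"))

# Competitive test: do genes in the set have higher Z-scores than the rest?
fit <- lm(Z ~ inSet + nvar, data = genes)
coef_tab <- summary(fit)$coefficients
t_stat <- coef_tab["inSet", "t value"]

# One-sided P-value for a positive membership effect
p_one_sided <- pt(t_stat, df = fit$df.residual, lower.tail = FALSE)
p_one_sided
```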
# Outlying gene association scores can be remedied by either setting Z-score cutoffs (i.e. all Z-scores exceeding these values will be set to the respective cutoff),
# or inverse normal transforming the Z-scores:
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm"),
  maxSetSize = 500,
  Zcutoffs = c(-4, 4) # lower and upper bounds
)
#> 0 Z-scores <-4 are set to -4
#> 3 Z-scores >4 are set to 4
#> 3 out of 3 sets are kept.
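The effect of Z-score cutoffs can be sketched in a few lines of base R: values below the lower bound or above the upper bound are replaced by the respective bound. The vector `z` holds made-up values, and the clamping with `pmin`/`pmax` is an illustration of the described behaviour rather than the package's own code.

```r
z <- c(-5.2, -1.0, 0.3, 4.219, Inf)   # made-up Z-scores, including an infinite one
cutoffs <- c(-4, 4)                   # lower and upper bounds

# Clamp every value to the interval [cutoffs[1], cutoffs[2]]
z_capped <- pmin(pmax(z, cutoffs[1]), cutoffs[2])
z_capped
```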
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm"),
  maxSetSize = 500,
  INT = TRUE # perform inverse normal transformation
)
#> 3 out of 3 sets are kept.
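A rank-based inverse normal transformation maps values to normal quantiles according to their ranks, which pulls in outliers while preserving order. The sketch below shows one common variant; the helper `inverse_normal` and its rank offset of 0.5 are assumptions made for this illustration, and the exact formula used by the package may differ.

```r
# Hypothetical helper: rank-based inverse normal transformation
inverse_normal <- function(x, offset = 0.5) {
  r <- rank(x, ties.method = "average")   # ranks 1..n, ties share the mean rank
  qnorm((r - offset) / length(x))         # map ranks to standard normal quantiles
}

z <- c(-1.2, 0.4, 8.5, 0.1, 2.3)          # made-up Z-scores with one outlier
z_int <- inverse_normal(z)
z_int
```

The ordering of the values is unchanged, but the outlier no longer dominates the scale.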
# Conditional gene set analyses can be performed to test whether gene sets are associated independently with the phenotype of interest.
# In the example below we test whether gene sets are independent of geneset1
GSAresults <- geneSetAssoc(
  res,
  condition = getGeneSet(genesetlist, "geneset1"),
  geneSet = genesetlist,
  covar = c("nvar"),
  test = c("lm"),
  maxSetSize = 500
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
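One way to think about a conditional analysis is as a regression that includes the membership indicator of the conditioning set as an extra term, so the effect of the set of interest is estimated independently of it. The base-R sketch below uses simulated data with two overlapping, hypothetical sets (`set_a`, `set_b`); it illustrates the concept only and is an assumption about the approach, not the package's code.

```r
set.seed(2)
n_genes <- 300
Z <- rnorm(n_genes)

# Two overlapping hypothetical gene sets, coded as 0/1 indicators
set_a <- as.integer(seq_len(n_genes) <= 30)        # set to condition on
set_b <- as.integer(seq_len(n_genes) %in% 16:60)   # set of interest

# Simulate a signal that is present in set_a only
Z[set_a == 1] <- Z[set_a == 1] + 1

# Marginal model: set_b alone; conditional model: set_b adjusted for set_a
marginal    <- lm(Z ~ set_b)
conditional <- lm(Z ~ set_b + set_a)

coef(summary(marginal))["set_b", ]
coef(summary(conditional))["set_b", ]
```

Comparing the two `set_b` rows shows how much of the marginal signal is attributable to the overlap with `set_a`.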
# perform two-sided tests by setting `oneSided = FALSE`
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm"),
  maxSetSize = 500,
  oneSided = FALSE
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
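The difference between one-sided and two-sided P-values can be illustrated with a single test statistic. The values of `t_stat` and `df` below are made up, and the sketch assumes that the one-sided alternative is a positive effect (enrichment); whether the package uses exactly this convention is an assumption.

```r
t_stat <- 2.1   # made-up t statistic for a gene set effect
df <- 120       # made-up residual degrees of freedom

# One-sided: probability of a statistic at least this large in the positive direction
p_one <- pt(t_stat, df, lower.tail = FALSE)

# Two-sided: probability of a statistic at least this extreme in either direction
p_two <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

c(one_sided = p_one, two_sided = p_two)
```

For a positive statistic the two-sided P-value is twice the one-sided one; for a negative statistic the one-sided P-value exceeds 0.5.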
# Test whether the proportion of genes with P-values below a specified threshold is greater within the gene set than outside of it,
# using Fisher's exact test.
# The `threshold` parameter specifies the P-value cutoff used to define significant genes:
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  test = c("fisher"),
  threshold = 1e-4,
  maxSetSize = 500
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.
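The cutoff-based enrichment test can be pictured as a 2x2 table of gene set membership against significance, analysed with Fisher's exact test. The sketch below builds such a table from simulated P-values in base R; the data, the set definition, and the use of `alternative = "greater"` for a one-sided enrichment test are assumptions made for illustration and are not taken from the package source.

```r
set.seed(3)
n_genes <- 1000
p_values <- runif(n_genes)     # simulated gene-level P-values
p_values[1:5] <- 1e-6          # plant five strongly associated genes

# Hypothetical gene set containing three of the planted genes
in_set <- seq_len(n_genes) %in% c(1:3, 500:520)
significant <- p_values < 1e-4 # threshold defining significant genes

# 2x2 table with TRUE first, so the odds ratio measures enrichment in the set
tab <- table(
  inSet       = factor(in_set, levels = c(TRUE, FALSE)),
  significant = factor(significant, levels = c(TRUE, FALSE))
)

# One-sided test: are significant genes over-represented in the set?
ft <- fisher.test(tab, alternative = "greater")
tab
ft$p.value
```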