Perform self-contained or competitive gene set analyses.

geneSetAssoc(
  object,
  geneSet = NULL,
  scoreMatrix = NULL,
  cormatrix = NULL,
  condition = NULL,
  covar = NULL,
  test = c("lm", "glm", "mlm", "fisher", "ttest", "ztest", "ACAT"),
  threshold = NULL,
  Zcutoffs = NULL,
  INT = FALSE,
  scoreCutoffs = NULL,
  minSetSize = 1,
  maxSetSize = Inf,
  oneSided = TRUE,
  memlimit = 1000,
  ID = "unit",
  output = NULL,
  verbose = TRUE
)

Arguments

object

an rvbResult object.

geneSet

a geneSetList or geneSetFile object.

scoreMatrix

A matrix (rows = genes, columns = features) can be provided to perform enrichment analyses on continuous values. These can be used to perform e.g. cell-type enrichment analyses.

cormatrix

a correlation matrix with row and column names corresponding to the units in the rvbResult. Needs to be specified in order to run the 'mlm' (mixed linear model) test. A burden score correlation matrix can be generated using the buildCorMatrix method. The mixed mixed linear model is run using the GENESIS R package.

condition

Perform conditional analyses. Input can be a 1) a geneSetList or geneSetFile, in which case the genesets specified in the geneSet parameter will all be conditioned on the gene sets provided here. 2) a vector of gene set names present in geneSet, all genesets specified in the geneSetList/geneSetFile will be conditioned on the genesets specified here.

covar

a vector of covariates

test

Vector of tests to perform. Currently implemented tests are the competitive tests lm,mlm,fisher and self-contained tests ttest,ztest and ACAT.

threshold

A vector of thresholds for cutoff-based tests (fisher's exact test / glm).

Zcutoffs

A vector (length=2, minimum and maximum) of cutoffs to apply to the Z-scores. Z scores below/above these cutoffs will be set equal to the cutoff.

INT

Apply inverse normal transformation to Z-scores? Defaults to FALSE.

scoreCutoffs

If scoreMatrix is specified, this parameter can be set to cap scores in the scorematrix. It should be a vector of length 2 (minimum and maximum sd). Defaults to NULL, in which case no score cutoffs are applied.

minSetSize

Exclude genesets with size < minSetSize

maxSetSize

Exclude genesets with size > maxSetSize

oneSided

Calculate a one-sided P-value? Defaults to TRUE.

memlimit

Maximum number of genesets to process in one go.

ID

ID column in the rvbResult that corresponds with the IDs used in the geneSetList. Defaults to 'unit'.

output

Optional: save results to specified path

verbose

Should the function be verbose? Defaults to TRUE.

References

Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, Thornton TA, Rice KM, Conomos MP. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019 Dec 15;35(24):5346-5348

Examples

library(rvatData)
data(rvbresults)
res <- rvbresults[rvbresults$test == "firth" & 
                    rvbresults$varSetName == "ModerateImpact", ]

# example genesetlist used in examples below (see ?buildGeneSet on build geneSetLists/geneSetFiles)
genesetlist <- buildGeneSet(
  list("geneset1" = c("SOD1", "NEK1"),
       "geneset2" = c("ABCA4", "SOD1", "NEK1"),
       "geneset3" = c("FUS", "NEK1")
       ))

# Perform competitive gene set analysis using a linear model
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm")
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.

# Outlying gene association scores can be remedied by either setting Z-score cutoffs (i.e. all Z-scores exceeding these values will be set to the respective cutoff), 
# or inverse normal transforming the Z-scores:
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm"), 
  maxSetSize = 500,
  Zcutoffs = c(-4, 4) # lower and upper bounds
)
#> 0 Z-scores <-4 are set to -4
#> 3 Z-scores >4 are set to 4
#> 3 out of 3 sets are kept.

GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm"), 
  maxSetSize = 500,
  INT = TRUE # perform inverse normal transformation
)
#> 3 out of 3 sets are kept.


# Conditional gene set analyses can be performed to test whether gene sets are associated independently with the phenotype of interest. 
# In the example below we test whether gene sets are independent of geneset1
GSAresults <- geneSetAssoc(
  res,
  condition = getGeneSet(genesetlist, "geneset1"),
  geneSet = genesetlist,
  covar = c("nvar"),
  test = c("lm"), 
  maxSetSize = 500
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.

# perform two-sided tests by setting `oneSided = FALSE`
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  covar = c("nvar"),
  test = c("lm"), 
  maxSetSize = 500,
  oneSided = TRUE
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.

# Test whether the proportion of P-values below a specified threshold is greater than the proportion outside of it.
# Using Fisher's exact test
# The `threshold` parameter specifies the P-value cutoff to define significant genes:
GSAresults <- geneSetAssoc(
  res,
  genesetlist,
  test = c("fisher"), 
  threshold = 1e-4,
  maxSetSize = 500
)
#> 1 Z-scores are +Inf, these are set to the maximum observed Z-score: 4.219.
#> 3 out of 3 sets are kept.