Class to facilitate merging aggdbs

Class to facilitate merging aggdbs. Given a vector of aggdb file paths, aggdbList checks whether the aggdbs include identical samples and whether any units are duplicated across them. The aggdbList can then be used either to merge the aggdbs into a new aggdb (mergeAggDbs), or to collapse all included aggregates into a single aggregate score per sample (collapseAggDbs).

aggdbList(filelist, checkDups = TRUE)

Arguments

filelist: Character vector of file paths pointing to aggdb files.
checkDups: Logical. If TRUE (the default), an error is raised when unit names are duplicated across the provided aggdbs. If FALSE, duplicated unit names are replaced with unique identifiers.

Value

An aggdbList object.

Initialize an aggdbList object

aggdbList(filelist, checkDups = TRUE): Here, filelist is a vector of aggdb file paths. checkDups is TRUE by default, in which case an error is raised if unit names are duplicated across aggdbs.
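The effect of checkDups can be illustrated with a minimal sketch that builds two aggdbs sharing the unit SOD1, using the same example data as the Examples section. The helper write_aggdb is hypothetical and defined here only to avoid repeating the aggregate() call; the sketch assumes that the error raised with checkDups = TRUE can be caught with tryCatch().

```r
library(rvat)
library(rvatData)

gdb <- create_example_gdb()
varsetfile <- varSetFile(rvat_example("rvatData_varsetfile.txt.gz"))

# Hypothetical helper: write an aggdb for the given units and return its path
write_aggdb <- function(units) {
  out <- tempfile()
  aggregate(
    x = gdb,
    varSet = getVarSet(varsetfile, unit = units, varSetName = "High"),
    maxMAF = 0.001,
    output = out,
    verbose = FALSE
  )
  out
}

# SOD1 is included in both aggdbs, so its unit name is duplicated
files <- c(write_aggdb(c("SOD1", "FUS")), write_aggdb(c("SOD1", "NEK1")))

# With the default checkDups = TRUE, construction is expected to fail
res <- tryCatch(aggdbList(files), error = function(e) e)
inherits(res, "error")

# With checkDups = FALSE, duplicated unit names are replaced with unique identifiers
agglist <- aggdbList(files, checkDups = FALSE)
length(agglist)
```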

Getters

In the following code snippets, x is an aggdbList object.

listUnits(x): Return a vector of all units included across the aggdbs in the aggdbList.
listSamples(x): Return a vector of all samples included across the aggdbs in the aggdbList.
listParams(x): Return a list of parameters.
metadata(x): Return metadata.
length(x): Return the number of aggdbs included.
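The getters can be exercised as in the following sketch, which builds a two-aggdb list from the example data used in the Examples section. The helper write_aggdb is hypothetical; the exact units and samples returned depend on the example data, so only their presence is checked here.

```r
library(rvat)
library(rvatData)

gdb <- create_example_gdb()
varsetfile <- varSetFile(rvat_example("rvatData_varsetfile.txt.gz"))

# Hypothetical helper: write an aggdb for the given units and return its path
write_aggdb <- function(units) {
  out <- tempfile()
  aggregate(
    x = gdb,
    varSet = getVarSet(varsetfile, unit = units, varSetName = "High"),
    maxMAF = 0.001,
    output = out,
    verbose = FALSE
  )
  out
}

files <- c(write_aggdb(c("SOD1", "FUS")), write_aggdb("NEK1"))
agglist <- aggdbList(files)

length(agglist)           # number of aggdbs in the list
head(listUnits(agglist))  # units across all aggdbs
head(listSamples(agglist))# samples included in the aggdbs
listParams(agglist)       # list of parameters
```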

Merge or collapse aggdbs

mergeAggDbs(object, output = NULL, verbose = TRUE): Merge aggdbs. This generates a new aggdb that includes all aggregates across the provided aggdbs. See mergeAggDbs for details.
collapseAggDbs(object, output = NULL, verbose = TRUE): Collapse aggdbs by aggregating values across aggdbs. This results in one aggregate score per sample, representing the aggregate value across all aggdbs. The output is a two-column matrix containing sample IDs and aggregate scores, respectively. See collapseAggDbs for details.
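The Examples section only demonstrates mergeAggDbs; collapsing can be sketched in the same way. This assumes, without confirmation from this page, that collapseAggDbs returns the two-column result when output = NULL rather than only writing it to disk. The helper write_aggdb is hypothetical.

```r
library(rvat)
library(rvatData)

gdb <- create_example_gdb()
varsetfile <- varSetFile(rvat_example("rvatData_varsetfile.txt.gz"))

# Hypothetical helper: write an aggdb for the given units and return its path
write_aggdb <- function(units) {
  out <- tempfile()
  aggregate(
    x = gdb,
    varSet = getVarSet(varsetfile, unit = units, varSetName = "High"),
    maxMAF = 0.001,
    output = out,
    verbose = FALSE
  )
  out
}

files <- c(write_aggdb(c("SOD1", "FUS")), write_aggdb("NEK1"))
agglist <- aggdbList(files)

# Assumption: with output = NULL the collapsed scores are returned,
# one row per sample with columns for sample ID and aggregate score
scores <- collapseAggDbs(agglist, verbose = FALSE)
head(scores)
```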

Examples

library(rvatData)
gdb <- create_example_gdb()

# generate two aggregate files
varsetfile <- varSetFile(rvat_example("rvatData_varsetfile.txt.gz"))
aggdb1 <- tempfile()
aggregate(
  x = gdb,
  varSet = getVarSet(varsetfile, unit = c("SOD1", "FUS"), varSetName = "High"),
  maxMAF = 0.001,
  output = aggdb1,
  verbose = FALSE
)

aggdb2 <- tempfile()
aggregate(
  x = gdb,
  varSet = getVarSet(varsetfile, unit = c("NEK1"), varSetName = "High"),
  maxMAF = 0.001,
  output = aggdb2,
  verbose = FALSE
)

# merge using mergeAggDbs
aggdb <- tempfile()
agglist <- aggdbList(c(aggdb1, aggdb2))
mergeAggDbs(
  agglist,
  output = aggdb
)
#> Initializing new aggregate database at: /tmp/RtmpvjdKWE/file1539342599ea7
#>   Writing metadata...
#>   Writing analysis parameters...
#>   Writing sample manifest (SM)...
#> Aggregate database initialized successfully.
#> Merging 'file153934ed89b28'
#> Merging 'file1539373dbdc5a'
#> Merge complete. New aggregate database created at: /tmp/RtmpvjdKWE/file1539342599ea7