This release is primarily an internal refactoring focused on improved robustness:
- Large source files are split into focused modules.
- Almost all methods and functions have been refactored significantly for clarity, maintainability and robustness.
- Test coverage is expanded to >90% for most code.
- There are also several small bug fixes and performance improvements.
The main user-facing change is the replacement of
aggregateFile with a new SQLite-backed aggdb
class.
The aggregateFile class is replaced by
aggdb, an SQLite-backed class (extending
SQLiteConnection, matching the gdb pattern).
This provides fast queries via SQL instead of sequential file scanning.
Related methods:
- collapseAggregateFiles -> collapseAggDbs
- mergeAggregateFiles -> mergeAggDbs
- aggregateFileList -> aggdbList
The aggdb stores results in structured tables
(meta, params, SM,
aggregates) and supports efficient merging.
- writeVcf, spatialClust: new memlimit parameter for batch processing, improving performance on large datasets.
- concatGdb, uploadAnno, uploadCohort: new overWrite parameter (defaults to FALSE). Existing code that relied on implicit overwrites now needs to pass overWrite = TRUE.
- concatGdb: targets now accepts a character vector in addition to a file path. New sample-identity validation across input gdbs.
- assocTest.
- getAnno
method: if the ranges parameter was specified but none of the variants in the specified table overlapped those ranges, all rows in the specified table were returned.
- varSetList and varSetFile that retrieves genomic ranges for variant sets. Can be used to retrieve either genomic coordinates or other user-defined coordinates, such as CDS coordinates generated with the mapCDS method.
Large update! In particular, changes have been made to:
- OR field indicating the odds-ratio for logistic tests (firth and glm). Importantly: for firth and glm tests, effect/effectSE/effectCIlower/effectCIupper are now on the log-scale!
- getGenomeBuild and
getGdbId). The genome build can be set when building the gdb (genomeBuild parameter), and will then be used to correctly assign ploidies on the sex chromosomes in downstream analyses (so there is no need to set the checkPloidy parameter anymore, although this is still an option). The gdb identifier can be useful to track which gdb was used to generate results, as the id will be included in the metadata of downstream files such as results files and varSetFiles (see below). It is also used in methods such as assocTest and summariseGeno to check whether supplied varSetFile/varSetList files were generated from the specified gdb.
- rvbResult(<file>) and
varSetFile(<file>)). Note that if you read in RVAT results using non-RVAT functions (e.g. read.table), you will have to skip lines that start with ‘#’.
- --gdb <path to gdb> rather than --gdb=<path to gdb>
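The note above about reading RVAT results with non-RVAT functions can be sketched in base R as follows. The file written here is a hypothetical stand-in: the metadata lines and column names are made up for illustration and are not the actual RVAT file layout. The only point shown is that read.table drops '#'-prefixed lines through its comment.char argument.

```r
# Hypothetical stand-in for a results file: '#'-prefixed metadata lines
# followed by a tab-delimited table (contents are illustrative only).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "# exampleKey=hypothetical",
  "# anotherKey=hypothetical",
  "unit\ttest\tP",
  "geneA\tfirth\t0.01",
  "geneB\tglm\t0.20"
), path)

# comment.char = "#" is the read.table default; it is spelled out here
# to make explicit that text from '#' to the end of a line is ignored.
res <- read.table(path, header = TRUE, sep = "\t", comment.char = "#",
                  stringsAsFactors = FALSE)
print(res)
```

One caveat of this approach: any '#' inside a data field would also be treated as the start of a comment, so it is only suitable when '#' appears in metadata lines alone.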
- ranges parameter in getGT and getAnno).
- getGT() by setting the anno parameter, or setting includeVarInfo = TRUE to include the ‘var’ table. If ‘REF’ and ‘ALT’ alleles are included in the annotations, ‘effectAllele’ and ‘otherAllele’ are assigned in the genoMatrix and automatically updated when alleles are flipped. effectAllele and otherAllele are passed on as fields in single variant results.
- subsetGdb: now has a VAR_id parameter to
directly subset VAR_ids and a tables parameter to specify which tables to keep.
- getCohort now returns records only for the samples included in the respective cohort, rather than returning NA fields for all excluded samples. The keepAll parameter can be set to TRUE to return all records, as in previous versions.
- uploadAnno: now has a keepUnmapped flag,
if set to TRUE (default), variants that do not map to the gdb are discarded.
- writeVcf now includes an includeVarId parameter that controls whether VAR_ids should be included in the ‘ID’ field. This parameter defaults to FALSE, in which case the ‘ID’ field from the ‘var’ table is included. Note: this is a change relative to the previous version, in which VAR_ids were written to the ‘ID’ field by default. Also, the FORMAT field is no longer included when generating a sites-only vcf (includeGeno = FALSE).
- mergeAggregateFiles is now split into two methods:
mergeAggregateFiles and collapseAggregateFiles. mergeAggregateFiles behaves identically to the former mergeAggregateFiles method with collapse = FALSE, whereas collapseAggregateFiles behaves identically to the former mergeAggregateFiles method with collapse = TRUE.
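For the note above that effect, effectCIlower and effectCIupper are on the log-scale for firth and glm tests, the following base R sketch shows how such values relate to an odds ratio. It assumes the log-scale means the natural log of the odds ratio; the data frame and its numbers are illustrative and not output from a real analysis, and the new column names are hypothetical.

```r
# Illustrative values only; not taken from a real results file.
res <- data.frame(
  effect        = c(0.50, -0.20),
  effectCIlower = c(0.10, -0.60),
  effectCIupper = c(0.90,  0.20)
)

# Assuming natural-log odds ratios: exponentiating the estimate and its
# confidence limits gives the odds ratio and its confidence interval.
res$OR_from_effect <- exp(res$effect)
res$OR_CIlower     <- exp(res$effectCIlower)
res$OR_CIupper     <- exp(res$effectCIupper)
print(res)
```

The standard error is deliberately left untouched: a standard error on the log-scale does not convert to the odds-ratio scale by exponentiation, which is why the interval is obtained by transforming the confidence limits instead.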