subsetGdb
Generates a child gdb from a parent gdb, with the option to filter the retained variants through table intersection operations and SQL WHERE clauses.
subsetGdb(
  object,
  output,
  intersection = NULL,
  where = NULL,
  VAR_id = NULL,
  tables = NULL,
  skipIndexes = FALSE,
  overWrite = FALSE,
  verbose = TRUE
)
object: A gdb object.
output: Path of the output gdb (a new gdb file is written).
intersection: Additional tables to filter through intersection, i.e. variants absent from the intersection tables will not appear in the output. Can be a character vector of table names or a single comma-delimited string.
where: An SQL-compliant WHERE clause to filter the output, e.g. "CHROM=2 AND POS between 5000 AND 50000 AND AF<0.01 AND (cadd.caddPhred>15 OR snpEff.SIFT='D')".
VAR_id: Character vector of VAR_ids to retain.
tables: Optional character vector of tables to retain from the gdb. By default, all tables are included in the output gdb.
skipIndexes: Flag to skip generation of indexes for the var and dosage tables (VAR_id;CHROM,POS,REF,ALT). Typically only required if you plan to use concatGdb to concatenate a series of separately generated gdb files before use. Defaults to FALSE.
overWrite: Flag indicating whether the output should be overwritten if it already exists. Defaults to FALSE.
verbose: Should the method be verbose? Defaults to TRUE.
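Because the where argument is a plain SQL string, it can be convenient to assemble it from separate conditions rather than writing one long literal. The following is a minimal base R sketch of that idea; `build_where` is a hypothetical helper and not part of rvat, and the table names `cadd` and `snpEff` are simply taken from the example clause above.

```r
# Hypothetical helper (not part of rvat): join individual SQL conditions
# into a single WHERE clause, dropping any empty strings.
build_where <- function(..., collapse = " AND ") {
  conditions <- c(...)
  conditions <- conditions[nzchar(conditions)]
  # NULL matches the default `where = NULL`, i.e. no filtering
  if (length(conditions) == 0) return(NULL)
  paste(conditions, collapse = collapse)
}

where <- build_where(
  "CHROM=2",
  "POS BETWEEN 5000 AND 50000",
  "AF<0.01",
  "(cadd.caddPhred>15 OR snpEff.SIFT='D')"
)

# The intersection argument is described as accepting either form below;
# the table names are illustrative only.
intersection_vec <- c("cadd", "snpEff")
intersection_str <- paste(intersection_vec, collapse = ",")
```

The resulting `where` string and either form of the intersection value could then be passed to subsetGdb in place of hand-written literals.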
library(rvatData)
gdb <- create_example_gdb()
# Make a gdb subset that includes only variants annotated to SOD1
output <- tempfile()
subsetGdb(
  gdb,
  intersection = "varInfo",
  where = "gene_name = 'SOD1'",
  output = output
)
#> 2026-03-30 18:38:38 Complete
gdb_subset <- gdb(output)
# Specific tables can be selected for inclusion;
# all other user-uploaded annotation and cohort tables will be excluded.
subsetGdb(
  gdb,
  intersection = "varInfo",
  where = "gene_name = 'SOD1'",
  tables = "varInfo",
  output = output,
  overWrite = TRUE
)
#> Output file '/tmp/RtmpvjdKWE/file153931d83a032' already exists and is overwritten (`overWrite = TRUE`)
#> 2026-03-30 18:38:39 Complete
gdb_subset <- gdb(output)
# Subset a gdb based on a list of VAR_ids
anno <- getAnno(
  gdb,
  "var",
  range = data.frame(CHROM = "chr16", start = 31191399, end = 31191605)
)
subsetGdb(
  gdb,
  VAR_id = anno$VAR_id,
  output = output,
  overWrite = TRUE
)
#> Output file '/tmp/RtmpvjdKWE/file153931d83a032' already exists and is overwritten (`overWrite = TRUE`)
#> 2026-03-30 18:38:39 Complete
gdb_subset <- gdb(output)
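As a sanity check on the VAR_id-based subset, one could read the var table back from the child gdb and confirm it contains only the requested variants. This is a sketch under assumptions: it presumes that `library(rvat)` provides `gdb()`, `getAnno()` and `subsetGdb()`, and that `getAnno()` called without a range returns the full var table as a data frame.

```r
library(rvat)      # assumed to provide gdb(), getAnno() and subsetGdb()
library(rvatData)  # provides create_example_gdb(), as in the examples above

gdb <- create_example_gdb()
anno <- getAnno(
  gdb,
  "var",
  range = data.frame(CHROM = "chr16", start = 31191399, end = 31191605)
)

# Write the subset to a fresh temporary file
output <- tempfile()
subsetGdb(gdb, VAR_id = anno$VAR_id, output = output, verbose = FALSE)
gdb_subset <- gdb(output)

# Assumption: without a range, getAnno() returns every row of the var table
var_subset <- getAnno(gdb_subset, "var")

# Every variant in the child gdb should be one of the requested VAR_ids
stopifnot(all(var_subset$VAR_id %in% anno$VAR_id))
nrow(var_subset)
```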