Generate a child gdb from a parent gdb, optionally filtering the retained variants through table intersections and SQL WHERE clauses.

subsetGdb(
  object,
  output,
  intersection = NULL,
  where = NULL,
  VAR_id = NULL,
  tables = NULL,
  skipIndexes = FALSE,
  overWrite = FALSE,
  verbose = TRUE
)

Arguments

object

A gdb object.

output

Output gdb path (output will be a new gdb file).

intersection

Additional tables to filter through intersection (i.e. variants absent from intersection tables will not appear in output). Can be a character vector of table names or a single comma-delimited string.

where

An SQL-compliant WHERE clause used to filter the output, e.g.: "CHROM=2 AND POS between 5000 AND 50000 AND AF<0.01 AND (cadd.caddPhred>15 OR snpEff.SIFT='D')".

VAR_id

Character vector of VAR_ids to retain.

tables

Optional, character vector of tables to retain from the gdb. By default all tables will be included in the output gdb.

skipIndexes

Flag to skip generation of indexes for the var and dosage tables (VAR_id;CHROM,POS,REF,ALT). Typically only required if you plan to use concatGdb to concatenate a series of separately generated gdb files before use. Defaults to FALSE.

overWrite

Flag indicating whether output should be overwritten if it already exists. Defaults to FALSE.

verbose

Should the method be verbose? Defaults to TRUE.

Examples

library(rvatData)
gdb <- create_example_gdb()

# Make a gdb subset that includes only variants annotated to SOD1
output <- tempfile()
subsetGdb(
  gdb,
  intersection = "varInfo",
  where = "gene_name = 'SOD1'",
  output = output
)
#> 2026-03-30 18:38:38	Complete
gdb_subset <- gdb(output)

# Specific tables can be selected for inclusion in the output;
# all other user-uploaded annotation and cohort tables will be excluded
subsetGdb(
  gdb,
  intersection = "varInfo",
  where = "gene_name = 'SOD1'",
  tables = "varInfo",
  output = output,
  overWrite = TRUE
)
#> Output file '/tmp/RtmpvjdKWE/file153931d83a032' already exists and is overwritten (`overWrite = TRUE`)
#> 2026-03-30 18:38:39	Complete
gdb_subset <- gdb(output)

# Subset a gdb based on a list of VAR_ids
anno <- getAnno(
  gdb,
  "var",
  range = data.frame(CHROM = "chr16", start = 31191399, end = 31191605)
)

subsetGdb(
  gdb,
  VAR_id = anno$VAR_id,
  output = output,
  overWrite = TRUE
)
#> Output file '/tmp/RtmpvjdKWE/file153931d83a032' already exists and is overwritten (`overWrite = TRUE`)
#> 2026-03-30 18:38:39	Complete
gdb_subset <- gdb(output)
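
The where argument takes a plain SQL string, so when the filter values live in an R vector the clause has to be assembled before calling subsetGdb. The following is a minimal sketch of one way to do that in base R. The helper sql_in is hypothetical and not part of rvat, and the gene list is illustrative only (it is an assumption that these genes are present in a given gdb).

```r
# Hypothetical helper (NOT part of rvat): build an SQL "column IN (...)"
# condition from a character vector, doubling embedded single quotes
# so that the values remain valid SQL string literals.
sql_in <- function(column, values) {
  quoted <- paste0("'", gsub("'", "''", values, fixed = TRUE), "'")
  paste0(column, " IN (", paste(quoted, collapse = ", "), ")")
}

# Illustrative gene list; replace with the genes of interest
genes <- c("SOD1", "FUS")
where <- sql_in("gene_name", genes)
print(where)

# The resulting string could then be passed to subsetGdb, mirroring the
# examples above (assumes `gdb` and `output` are defined as in those examples):
# subsetGdb(
#   gdb,
#   intersection = "varInfo",
#   where = where,
#   output = output,
#   overWrite = TRUE
# )
```

Building the clause in a small function keeps the quoting in one place. It only handles character values, so numeric conditions such as a position range would still be written out by hand.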