Generate a subset of a gdb, retaining all tables by default.

Generates a child gdb from a parent gdb, with the option to filter the retained variants through table intersection operations and SQL WHERE clauses.

subsetGdb(
  object,
  output,
  intersection = NULL,
  where = NULL,
  VAR_id = NULL,
  tables = NULL,
  skipIndexes = FALSE,
  overWrite = FALSE,
  verbose = TRUE
)

Arguments

object: A gdb object.
output: Output gdb path (output will be a new gdb file).
intersection: Additional tables to filter through intersection (i.e. variants absent from intersection tables will not appear in output). Multiple tables should be ',' delimited.
where: An SQL-compliant WHERE clause used to filter the output, e.g. "CHROM=2 AND POS between 5000 AND 50000 AND AF<0.01 AND (cadd.caddPhred>15 OR snpEff.SIFT='D')".
VAR_id: Retain only variants with matching VAR_id.
tables: Optional vector of tables to retain from the gdb. By default, all tables are included in the output gdb.
skipIndexes: Flag to skip generation of indexes for the var and dosage tables (VAR_id; CHROM, POS, REF, ALT). Typically only required if you plan to use gdbConcat to concatenate a series of separately generated gdb files before use. Defaults to FALSE.
overWrite: Flag indicating whether output should be overwritten if it already exists. Defaults to FALSE.
verbose: Should the method be verbose? Defaults to TRUE.
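
The `intersection` and `where` arguments are both plain character strings, so they can be assembled programmatically before calling subsetGdb. The following is a minimal sketch using base R only. The table names and thresholds are illustrative values borrowed from the argument descriptions above and are assumptions, not tables guaranteed to exist in your gdb.

```r
# Hypothetical inputs: table names and thresholds are illustrative only.
intersection_tables <- c("varInfo", "cadd")
chrom <- 2L
pos_start <- 5000L
pos_end <- 50000L
max_af <- 0.01

# Multiple intersection tables are passed as a single ','-delimited string.
intersection <- paste(intersection_tables, collapse = ",")

# Join the individual conditions into one SQL WHERE clause.
where <- paste(
  sprintf("CHROM=%d", chrom),
  sprintf("POS between %d AND %d", pos_start, pos_end),
  sprintf("AF<%s", format(max_af)),
  sep = " AND "
)

print(intersection)
print(where)

# The strings could then be passed on, assuming `gdb` is an open gdb object
# that actually contains these tables and columns:
# subsetGdb(gdb, output = tempfile(), intersection = intersection, where = where)
```

Building the clause from separate conditions keeps each filter easy to toggle. Values interpolated this way are not escaped, so this approach suits trusted, analyst-supplied inputs only.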

Examples


library(rvatData)
gdb <- create_example_gdb()

# Make a gdb subset that includes only variants annotated to SOD1
output <- tempfile()
subsetGdb(
  gdb,
  intersection = "varInfo",
  where = "gene_name = 'SOD1'",
  output = output
)
#> 2025-02-12 12:29:04	Complete
gdb_subset <- gdb(output)

# Specific tables can be selected for inclusion;
# all other user-uploaded annotation and cohort tables will be excluded.
subsetGdb(
  gdb,
  intersection = "varInfo",
  where = "gene_name = 'SOD1'",
  tables = "varInfo",
  output = output,
  overWrite = TRUE
)
#> 2025-02-12 12:29:04	Complete
gdb_subset <- gdb(output)

# Subset a gdb based on a list of VAR_ids
anno <- getAnno(
  gdb,
  "var",
  range = data.frame(CHROM = "chr16", start = 31191399, end = 31191605)
)

subsetGdb(
  gdb,
  VAR_id = anno$VAR_id,
  output = output,
  overWrite = TRUE
)
#> 2025-02-12 12:29:04	Complete
gdb_subset <- gdb(output)