buildGdb
Creates a new gdb file.
The gdb can be structured and populated using a provided vcf file.
buildGdb(
vcf,
output,
skipIndexes = FALSE,
skipVarRanges = FALSE,
overWrite = FALSE,
genomeBuild = NULL,
memlimit = 1000L,
verbose = TRUE
)
vcf: Input vcf file used to structure and populate the gdb. Warning: this function makes the following assumptions:
strict adherence to vcf format (GT subfield first element in genotype fields),
desired genotype QC has already been applied (DP,GQ filters),
GT values conform to the set 0/0,0/1,1/0,1/1,./.,0|0,0|1,1|0,1|1,.|..
Multiallelic parsing and genotype QC can be performed using vcftools and/or accompanying parser scripts included in the rvat repository.
output: Path for output gdb file.
skipIndexes: Flag to skip generation of indexes for var and dosage table (VAR_id;CHROM,POS,REF,ALT). Typically only required if you plan to use concatGdb to concatenate a series of separately generated gdb files.
skipVarRanges: Flag to skip generation of ranged var table. Typically only useful (i.e., faster) if you plan to use concatGdb to concatenate a series of separately generated gdb files.
overWrite: Overwrite if output already exists? Defaults to FALSE, in which case an error is raised.
genomeBuild: Optional genome build to include in the gdb metadata. If specified, it will be used to set ploidies (diploid, XnonPAR, YnonPAR) if the genome build is implemented in RVAT (currently: GRCh37, hg19, GRCh38, hg38).
memlimit: Maximum number of vcf records to parse at a time, defaults to 1000.
verbose: Should the function be verbose? (TRUE/FALSE), defaults to TRUE.
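The assumptions about GT values listed above can be checked before building a gdb. The following is a minimal base R sketch, not part of rvat: the helpers extract_gt and find_invalid_gt are hypothetical names introduced here, and the genotype fields are made-up examples with an assumed FORMAT of GT:DP:GQ.

```r
# Allowed GT values, as listed in the assumptions above
allowed_gt <- c("0/0", "0/1", "1/0", "1/1", "./.",
                "0|0", "0|1", "1|0", "1|1", ".|.")

# Hypothetical helper (not part of rvat): extract the GT subfield, assumed
# to be the first colon-separated element of each genotype field
extract_gt <- function(genotype_fields) {
  sub(":.*$", "", genotype_fields)
}

# Hypothetical helper: return the distinct GT values outside the allowed set
find_invalid_gt <- function(genotype_fields) {
  gt <- extract_gt(genotype_fields)
  unique(gt[!gt %in% allowed_gt])
}

# Made-up genotype fields for illustration (FORMAT assumed to be GT:DP:GQ)
example_fields <- c("0/1:35:99", "1|1:20:60", "./.:0:0", "1/2:30:99", "0/0")
find_invalid_gt(example_fields)
#> [1] "1/2"
```

A value such as 1/2 indicates a multiallelic site that has not yet been split, which is the case the multiallelic parsing step mentioned above is meant to handle.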
library(rvatData)
vcfpath <- rvat_example("rvatData.vcf.gz")
gdbpath <- tempfile()
# build a gdb from vcf.
# the genomeBuild parameter stores the genome build in the gdb metadata
# this will be used to assign ploidies on sex chromosomes (diploid, XnonPAR, YnonPAR)
buildGdb(
vcf = vcfpath,
output = gdbpath,
genomeBuild = "GRCh38"
)
#> 2026-03-30 18:36:43 Creating gdb tables
#> 25000 sample IDs detected
#> 2026-03-30 18:36:43 Parsing vcf records
#> 2026-03-30 18:36:58 Processing completed for 1000 records. Committing to db.
#> 2026-03-30 18:37:09 Processing completed for 1802 records. Committing to db.
#> 2026-03-30 18:37:09 Creating var table indexes
#> 2026-03-30 18:37:09 Creating SM table index
#> 2026-03-30 18:37:09 Creating dosage table index
#> 2026-03-30 18:37:09 Creating ranged var table
#> 2026-03-30 18:37:09 Complete
# for large vcfs, the memlimit parameter can be lowered
buildGdb(
vcf = vcfpath,
output = gdbpath,
genomeBuild = "GRCh38",
memlimit = 100,
overWrite = TRUE
)
#> Output file '/tmp/RtmpvjdKWE/file1539369511fb2' already exists and is overwritten (`overWrite = TRUE`)
#> 2026-03-30 18:37:09 Creating gdb tables
#> 25000 sample IDs detected
#> 2026-03-30 18:37:09 Parsing vcf records
#> 2026-03-30 18:37:10 Processing completed for 100 records. Committing to db.
#> 2026-03-30 18:37:11 Processing completed for 200 records. Committing to db.
#> 2026-03-30 18:37:12 Processing completed for 300 records. Committing to db.
#> 2026-03-30 18:37:13 Processing completed for 400 records. Committing to db.
#> 2026-03-30 18:37:15 Processing completed for 500 records. Committing to db.
#> 2026-03-30 18:37:16 Processing completed for 600 records. Committing to db.
#> 2026-03-30 18:37:17 Processing completed for 700 records. Committing to db.
#> 2026-03-30 18:37:18 Processing completed for 800 records. Committing to db.
#> 2026-03-30 18:37:19 Processing completed for 900 records. Committing to db.
#> 2026-03-30 18:37:21 Processing completed for 1000 records. Committing to db.
#> 2026-03-30 18:37:22 Processing completed for 1100 records. Committing to db.
#> 2026-03-30 18:37:23 Processing completed for 1200 records. Committing to db.
#> 2026-03-30 18:37:25 Processing completed for 1300 records. Committing to db.
#> 2026-03-30 18:37:26 Processing completed for 1400 records. Committing to db.
#> 2026-03-30 18:37:27 Processing completed for 1500 records. Committing to db.
#> 2026-03-30 18:37:28 Processing completed for 1600 records. Committing to db.
#> 2026-03-30 18:37:29 Processing completed for 1700 records. Committing to db.
#> 2026-03-30 18:37:31 Processing completed for 1800 records. Committing to db.
#> 2026-03-30 18:37:31 Processing completed for 1802 records. Committing to db.
#> 2026-03-30 18:37:31 Creating var table indexes
#> 2026-03-30 18:37:31 Creating SM table index
#> 2026-03-30 18:37:31 Creating dosage table index
#> 2026-03-30 18:37:31 Creating ranged var table
#> 2026-03-30 18:37:31 Complete
# see ?gdb for more information on gdb-files,
# see ?concatGdb for concatenating gdb databases
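The "Committing to db" messages in the two logs above are consistent with records being committed in batches of memlimit, followed by a final commit for the remainder. This is an inference from the log output, not a description of the rvat implementation. A minimal base R sketch of that arithmetic, using a hypothetical helper commit_points:

```r
# Hypothetical helper (not part of rvat): record counts at which a commit
# would be logged, assuming one commit per batch of `memlimit` records
commit_points <- function(n_records, memlimit) {
  n_batches <- ceiling(n_records / memlimit)
  pmin(seq_len(n_batches) * memlimit, n_records)
}

# 1802 records with the default memlimit of 1000
commit_points(1802, 1000)
#> [1] 1000 1802

# 1802 records with memlimit = 100: 18 full batches plus one of 2 records
length(commit_points(1802, 100))
#> [1] 19
```

Under this assumption, lowering memlimit trades a smaller number of records held in memory at once for a larger number of commits.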