buildGdb.Rd
Creates a new gdb file and returns a connection object of type gdb-class.
The gdb can be structured and populated using a provided vcf file.
If no input variant file is provided then only an empty gdb is created.
buildGdb(
vcf,
output,
skipIndexes = FALSE,
skipVarRanges = FALSE,
overWrite = FALSE,
genomeBuild = NULL,
memlimit = 1000,
verbose = TRUE
)
Input vcf file used to structure and populate the gdb. Warning: this function makes the following assumptions: 1) strict adherence to the vcf format (GT subfield is the first element in the genotype fields), 2) multiallelic records have been split, 3) desired genotype QC has already been applied (DP, GQ filters), 4) GT values conform to the set {0/0, 0/1, 1/0, 1/1, ./., 0|0, 0|1, 1|0, 1|1, .|.}. Multiallelic parsing and genotype QC can be performed using vcftools and/or the accompanying parser scripts included on the rvat GitHub.
Path for the output gdb file.
Flag to skip generation of indexes for the var and dosage tables (VAR_id; CHROM, POS, REF, ALT). Typically only required if you plan to use concatGdb to concatenate a series of separately generated gdb files before use.
Flag to skip generation of the ranged var table. Typically only required if you plan to use concatGdb to concatenate a series of separately generated gdb files before use.
Overwrite if output already exists? Defaults to FALSE, in which case an error is raised.
Optional genome build to include in the gdb metadata. If specified, it will be used to set ploidies (diploid, XnonPAR, YnonPAR) if the genome build is implemented in RVAT (currently: GRCh37, hg19, GRCh38, hg38).
Maximum number of vcf records to parse at a time, defaults to 1000.
Should the function be verbose? (TRUE/FALSE), defaults to TRUE.
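The vcf argument assumes that the GT subfield comes first in each genotype field and that its value belongs to a fixed set. The following base R sketch shows one way to screen genotype strings against that set before building a gdb. The helpers extract_gt and is_valid_gt are hypothetical names introduced for illustration; they are not part of rvat, and the example genotype strings are made up.

```r
# Allowed GT values, copied from the assumptions listed for the vcf argument.
allowed_gt <- c("0/0", "0/1", "1/0", "1/1", "./.",
                "0|0", "0|1", "1|0", "1|1", ".|.")

# Hypothetical helper (not part of rvat): take the GT subfield from genotype
# strings such as "0/1:35:99", assuming GT is the first colon-separated subfield.
extract_gt <- function(genotypes) {
  sub(":.*$", "", genotypes)
}

# Hypothetical helper (not part of rvat): TRUE where GT is in the allowed set.
is_valid_gt <- function(genotypes) {
  extract_gt(genotypes) %in% allowed_gt
}

# Made-up genotype fields; "1/2" comes from an unsplit multiallelic record.
example_genotypes <- c("0/1:35:99", "1|1:20:60", "1/2:18:45", "./.")
valid <- is_valid_gt(example_genotypes)

# Genotype fields that violate the assumption
example_genotypes[!valid]
#> [1] "1/2:18:45"
```

A genotype such as 1/2 indicates that multiallelic records have not yet been split, so the vcf should be preprocessed before it is passed to buildGdb.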
library(rvatData)
vcfpath <- rvat_example("rvatData.vcf.gz")
gdbpath <- tempfile()
# build a gdb from vcf.
# the genomeBuild parameter stores the genome build in the gdb metadata
# this will be used to assign ploidies on sex chromosomes (diploid, XnonPAR, YnonPAR)
buildGdb(
vcf = vcfpath,
output = gdbpath,
genomeBuild = "GRCh38"
)
#> 2025-02-12 12:22:43 Creating gdb tables
#> 25000 sample IDs detected
#> 2025-02-12 12:22:43 Parsing vcf records
#> 2025-02-12 12:23:10 Processing completed for 1000 records. Committing to db.
#> 2025-02-12 12:23:38 Processing completed for 1802 records. Committing to db.
#> 2025-02-12 12:23:38 Creating var table indexes
#> 2025-02-12 12:23:38 Creating SM table index
#> 2025-02-12 12:23:38 Creating dosage table index
#> 2025-02-12 12:23:38 Creating ranged var table
#> 2025-02-12 12:23:38 Complete
#> rvat gdb object
#> Path: /var/folders/cl/wvc0rvjx4vd5rzt2_fhpmfth0000gp/T//RtmpRC6myU/file1825c6b1e96db
# for large vcfs, the memlimit parameter can be lowered
buildGdb(
vcf = vcfpath,
output = gdbpath,
genomeBuild = "GRCh38",
memlimit = 100,
overWrite = TRUE
)
#> 2025-02-12 12:23:38 Creating gdb tables
#> 25000 sample IDs detected
#> 2025-02-12 12:23:38 Parsing vcf records
#> 2025-02-12 12:23:40 Processing completed for 100 records. Committing to db.
#> 2025-02-12 12:23:42 Processing completed for 200 records. Committing to db.
#> 2025-02-12 12:23:45 Processing completed for 300 records. Committing to db.
#> 2025-02-12 12:23:47 Processing completed for 400 records. Committing to db.
#> 2025-02-12 12:23:50 Processing completed for 500 records. Committing to db.
#> 2025-02-12 12:23:53 Processing completed for 600 records. Committing to db.
#> 2025-02-12 12:23:55 Processing completed for 700 records. Committing to db.
#> 2025-02-12 12:23:58 Processing completed for 800 records. Committing to db.
#> 2025-02-12 12:24:00 Processing completed for 900 records. Committing to db.
#> 2025-02-12 12:24:04 Processing completed for 1000 records. Committing to db.
#> 2025-02-12 12:24:07 Processing completed for 1100 records. Committing to db.
#> 2025-02-12 12:24:11 Processing completed for 1200 records. Committing to db.
#> 2025-02-12 12:24:15 Processing completed for 1300 records. Committing to db.
#> 2025-02-12 12:24:18 Processing completed for 1400 records. Committing to db.
#> 2025-02-12 12:24:23 Processing completed for 1500 records. Committing to db.
#> 2025-02-12 12:24:27 Processing completed for 1600 records. Committing to db.
#> 2025-02-12 12:24:31 Processing completed for 1700 records. Committing to db.
#> 2025-02-12 12:24:34 Processing completed for 1800 records. Committing to db.
#> 2025-02-12 12:24:34 Processing completed for 1802 records. Committing to db.
#> 2025-02-12 12:24:34 Creating var table indexes
#> 2025-02-12 12:24:34 Creating SM table index
#> 2025-02-12 12:24:34 Creating dosage table index
#> 2025-02-12 12:24:34 Creating ranged var table
#> 2025-02-12 12:24:35 Complete
#> rvat gdb object
#> Path: /var/folders/cl/wvc0rvjx4vd5rzt2_fhpmfth0000gp/T//RtmpRC6myU/file1825c6b1e96db
# see ?gdb for more information on gdb files and ?concatGdb for concatenating gdb files
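The memlimit argument caps the number of vcf records parsed at a time, which is why the log reports a commit after every 100 records when memlimit = 100. The base R sketch below illustrates the general idea of reading a file in fixed-size batches. It is a hypothetical illustration only, not rvat's implementation; read_in_batches and the temporary file of dummy records are assumptions introduced for this example.

```r
# Hypothetical illustration of batched reading; this is NOT how rvat is implemented.
read_in_batches <- function(path, batch_size) {
  con <- file(path, open = "r")
  on.exit(close(con))
  batch_sizes <- integer(0)
  repeat {
    # Read at most `batch_size` lines, so memory use is bounded per batch.
    lines <- readLines(con, n = batch_size)
    if (length(lines) == 0) break
    # A real parser would process `lines` and commit them to the database here.
    batch_sizes <- c(batch_sizes, length(lines))
  }
  batch_sizes
}

# Write 250 dummy records to a temporary file and read them 100 at a time.
tmp <- tempfile()
writeLines(sprintf("record_%d", 1:250), tmp)
batches <- read_in_batches(tmp, batch_size = 100)
length(batches)  # number of batches needed for 250 records
```

A smaller batch size lowers peak memory at the cost of more commits, which matches the longer run time in the memlimit = 100 example log.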