Generate weighted variant sets for use in association testing, partitioned by genomic distance as described by Loehlein Fier et al. (Genet. Epidemiol., 2017).

spatialClust(
  object,
  output,
  varSetName,
  unitTable,
  unitName,
  windowSize,
  overlap,
  intersection = NULL,
  where = NULL,
  weightName = "1",
  posField = "POS",
  minTry = 5,
  memlimit = 1000L,
  warning = TRUE
)

Arguments

object

A gdb object.

output

Output file name (output will be gz compressed text).

varSetName

Name to assign to the varSet grouping. This identifier column allows multiple varSet files to be merged later for coordinated analysis of multiple variant filtering/weighting strategies.

unitTable

Table containing aggregation unit mappings.

unitName

Field to use for aggregation unit names.

windowSize

Numeric vector of starting fixed window sizes (number of variants).

overlap

Numeric vector of starting fixed window overlaps (number of variants). Its length must match that of windowSize.
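To make windowSize and overlap concrete, the sketch below lays out fixed windows over an ordered set of variants, where both parameters are counted in variants. It assumes that consecutive windows start (windowSize - overlap) variants apart; this is an illustration of the parameters, not a reproduction of the internals of spatialClust, and fixedWindows is a hypothetical helper.

```r
# Hypothetical helper: fixed windows defined by variant count.
# Assumption: consecutive windows start (windowSize - overlap) variants apart.
fixedWindows <- function(nVar, windowSize, overlap) {
  stopifnot(nVar >= 1, overlap >= 0, windowSize > overlap)
  step <- windowSize - overlap
  # Stop early enough that no window lies entirely inside the previous one
  starts <- seq(1, max(1, nVar - overlap), by = step)
  # The last window is truncated at the final variant
  ends <- pmin(starts + windowSize - 1, nVar)
  data.frame(start = starts, end = ends)
}

# One layout per (windowSize, overlap) pair, which is why the lengths must match
layouts <- Map(fixedWindows, nVar = 10, windowSize = c(4, 6), overlap = c(2, 3))
layouts[[1]]
```

With 10 variants, a window size of 4 and an overlap of 2, this yields windows covering variants 1-4, 3-6, 5-8 and 7-10.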

intersection

Additional tables to filter through intersection (i.e., variants absent from the intersection tables will not appear in the output). Multiple tables should be comma-delimited.

where

An SQL-compliant WHERE clause to filter output, e.g. "CHROM=2 AND POS between 5000 AND 50000 AND AF<0.01 AND (cadd.caddPhred>15 OR snpEff.SIFT='D')".
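Both intersection and where are plain character strings, so they can be assembled from smaller pieces. The sketch below rebuilds the example clause above in base R. The table names cadd and snpEff are taken from that example, and listing them in intersection is an assumption about how such annotation tables would be made available; whether they exist depends on what has been loaded into your gdb.

```r
# Table names taken from the example clause; they are assumptions about your gdb.
intersection <- paste(c("cadd", "snpEff"), collapse = ",")

# Each condition is kept separate so filters are easy to add or drop
conditions <- c(
  "CHROM=2",
  "POS between 5000 AND 50000",
  "AF<0.01",
  "(cadd.caddPhred>15 OR snpEff.SIFT='D')"
)
where <- paste(conditions, collapse = " AND ")

intersection
where
```

Keeping the OR group inside its own parentheses ensures it is evaluated as a unit once the conditions are joined with AND.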

weightName

Field name for the desired variant weighting. Must be a column within unitTable or another intersection table. The default, "1", is equivalent to no weighting.

posField

Column name to use as the variant position. The default is "POS", which typically corresponds to genomic position. Can be set to use CDS or other coordinates. "HGVSc" is a recognized identifier, and CDS coordinates will be extracted from it automatically.

minTry

Minimum number of variants a varSet must contain for clustering to be performed. If the number of variants is less than minTry, all variants are returned as a single cluster. Defaults to 5.
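The minTry rule can be pictured with a short sketch. It mirrors only the documented fallback (fewer than minTry variants means one cluster) and does not attempt the clustering itself; singleClusterFallback is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: mirrors only the documented minTry rule.
singleClusterFallback <- function(positions, minTry = 5) {
  if (length(positions) < minTry) {
    # Fewer variants than minTry: label every variant as cluster 1
    return(rep(1L, length(positions)))
  }
  # Enough variants: clustering would be attempted instead (not sketched)
  NULL
}

singleClusterFallback(c(101, 250, 900))               # three variants, one cluster
singleClusterFallback(c(101, 250, 900, 1200, 1500))   # NULL, not a fallback case
```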

memlimit

Chunk size used for processing rows. Defaults to 1000.
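As a rough picture of what a chunk size means, the sketch below splits row indices into consecutive blocks of at most memlimit rows. It assumes rows are processed in consecutive blocks, which may differ from the actual internals; rowChunks is a hypothetical helper.

```r
# Hypothetical helper: split row indices into blocks of at most memlimit rows.
# Assumption: rows are handled in consecutive blocks.
rowChunks <- function(nRows, memlimit = 1000L) {
  idx <- seq_len(nRows)
  # Rows 1..memlimit get label 1, the next memlimit rows label 2, and so on
  unname(split(idx, ceiling(idx / memlimit)))
}

chunks <- rowChunks(2500L)
lengths(chunks)
```

For 2500 rows and the default of 1000, this gives two full chunks and one final chunk of 500 rows.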

warning

Whether to raise a warning when clusters cannot be generated. Defaults to TRUE.

References

Loehlein Fier, H. et al. On the association analysis of genome-sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows. Genet. Epidemiol. 41, 332–340 (2017).