Generate weighted variant sets for use in association testing, with partitioning by genomic distances as described (Fier, GenetEpidemiol, 2017).

spatialClust(
  object,
  output,
  varSetName,
  unitTable,
  unitName,
  windowSize,
  overlap,
  intersection = NULL,
  where = NULL,
  weightName = "1",
  posField = "POS",
  minTry = 5,
  warning = TRUE
)

Arguments

object

a gdb object

output

Output file name (output will be gz compressed text).

varSetName

Name to assign varSet grouping. This identifier column is used to allow for subsequent mergeing of multiple varSet files for coordinated analysis of multiple variant filtering/weighting strategies)

unitTable

Table containing aggregation unit mappings.

unitName

Field to utilize for aggregation unit names.

windowSize

Numeric vector to indicate starting fixed window size (number of variants)

overlap

Numeric vector to indicate starting fixed window overlap (number of variants, length must match windowSize)

intersection

Additional tables to filter through intersection (ie variants absent from intersection tables will not appear in output). Multiple tables should be ',' delimited.

where

An SQL compliant where clause to filter output; eg: "CHROM=2 AND POS between 5000 AND 50000 AND AF<0.01 AND (cadd.caddPhred>15 OR snpEff.SIFT='D')".

weightName

Field name for desired variant weighting, must be a column within unitTable or other intersection table. Default value of 1 is equivalent to no weighting.

posField

Column name to take as variants position. Default is 'POS' which typically corresponds to genomics position. Can be reset to use CDS or other coordinates. "HGVSc" is a recognized identifier and CDS coordinates will be extracted automatically.

minTry

Minimum number of variants in varset to perform clustering on. If number of variants < minTry, all variants will be returned as a single cluster.

warning

Raise a warning when clusters can't be generated? Defaults to TRUE. Defaults to 5.

References

Loehlein Fier, H. et al. On the association analysis of genome-sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows: F ier et al . Genet. Epidemiol. 41, 332–340 (2017).