Set-based Test

This feature allows you to perform a simple set-based test on a GWAS dataset. Results will include the P-value from the test, as well as whether it passed or failed based on the P-value threshold.

You may provide your own LD matrix, so long as it is the same length as your provided dataset. If no LD matrix is provided, LocusFocus will generate one for you given your dataset and a selected population from the 1000 Genomes datasets.

The set-based test performed can be found under “Equation 2” in the LocusFocus Paper.

Specifying Multiple Regions

The set-based test form allows you to specify multiple regions of interest within your dataset to focus on. Regions need to be provided in 1-start, fully-closed position format <chromosome>:<start>-<end> (eg. 1:205,500,000-206,000,000 which includes all positions, including start position 205,500,000 and end position 206,000,000). Commas are optional, but help for readability.

Regions are used to subset the dataset (and LD, if provided). Any SNPs that fall outside of any provided regions will be removed.

If no regions are provided, then the entire dataset will be used.

Generating LD with multiple regions

If you do not provide an LD matrix, this tool will generate one for you using your provided dataset, one of the selected 1000 Genomes populations, and your provided list of regions.

To generate LD from multiple regions, we use the following set of steps:

  1. Combine regions into “near regions”.

    We call two regions “near” if they are on the same chromosome, and if either the gap between the regions is less than 2 Mbp (<2,000,000) or the regions are overlapping. If two regions are “near”, then they are combined into one larger region where

    • The start position is the lesser of the two regions’ start positions, and

    • The end position is the greater of the two regions’ end positions.

    This process is performed until all remaining regions are not “near”.

  2. Calculate one LD matrix for each “near region”.

    We use PLINK to generate an LD matrix for each near region. SNPs from your dataset are subsetted to each near region, then provided to PLINK with your selected 1000 Genomes population. Each LD matrix is saved.

  3. Combine multiple LDs into a sparse block-diagonal matrix.

    Before performing the set-based test, all LD matrices generated for this session are combined into a sparse, block-diagonal matrix. The SNPs in your provided dataset are re-ordered as necessary for this step.

Running multiple tests at once

If you would like to run multiple tests in one go using the same dataset, check the box labelled “Run separate tests for each region?”. This will perform a separate set-based test for each region specified. No checks for overlapping regions or “near” regions are performed when this box is checked.

If you provide your own LD matrix, then both your dataset and LD will be subsetted to each region before performing each test.

If you do not provide an LD matrix, then we generate an LD matrix for each region using your dataset and the selected 1000 Genomes population.