Estimating local genetic covariance and correlation (\(\rho\)-HESS)

This page describes the steps to estimate local genetic covariance and correlation from GWAS summary association data. In the first step, HESS computes the eigenvalues of the LD matrices, the squared projections of each trait's GWAS effect size vector onto the eigenvectors of the LD matrices, and the products of the two traits' projections onto those eigenvectors. In the second step, HESS uses the output from step 1 to estimate the local SNP-heritability of each trait. In the third step, HESS uses the output from steps 1 and 2 to obtain local genetic covariance estimates and their standard errors.

Step 1 - compute eigenvalues, squared projections, and product of projections

Running the tool

In this step, HESS computes the eigenvalues of the LD matrices, the squared projections of each trait's GWAS effect size vector onto the eigenvectors of the LD matrices, and the products of the two traits' projections. The following code provides an example of how to perform this step.

for chrom in $(seq 22)
do
    python hess.py \
        --local-rhog <summary stats for trait 1> <summary stats for trait 2> \
        --chrom $chrom \
        --bfile <reference panel in PLINK format for the specific chromosome> \
        --partition <genome partition file for the specific chromosome> \
        --out step1
done

In the command above, --local-rhog tells HESS to estimate local genetic covariance and takes the GWAS summary statistics files of the two traits; --chrom specifies the chromosome number; --bfile specifies the reference panel for the corresponding chromosome; --partition specifies the genome partition file; --out specifies the prefix of the output files. For input file format, please refer to Input Format.

( Note: The above for loop can be parallelized by chromosome if a computer cluster is available. In addition, users can specify the minimum minor allele frequency (MAF) of the SNPs used for estimation through the "--min-maf" flag. The default MAF threshold is 0.05. )
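
As a rough illustration of the parallelization mentioned in the note above, the following Python sketch builds one step 1 command per chromosome and runs several of them at a time on a single multi-core machine. The input paths and the helper names `build_step1_command` and `run_step1_all` are hypothetical placeholders, not part of HESS; only the flags described on this page are used. On a cluster, one would instead submit each command through the local job scheduler.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input locations; replace with your own files.
SUMSTATS_1 = "trait1.sumstats"
SUMSTATS_2 = "trait2.sumstats"
BFILE_TEMPLATE = "refpanel/chr{chrom}"
PARTITION_TEMPLATE = "partition/chr{chrom}.bed"


def build_step1_command(chrom, min_maf=None):
    """Return the step 1 command for one chromosome as an argument list."""
    cmd = [
        "python", "hess.py",
        "--local-rhog", SUMSTATS_1, SUMSTATS_2,
        "--chrom", str(chrom),
        "--bfile", BFILE_TEMPLATE.format(chrom=chrom),
        "--partition", PARTITION_TEMPLATE.format(chrom=chrom),
        "--out", "step1",
    ]
    if min_maf is not None:
        # Optional MAF threshold; the page states a default of 0.05.
        cmd += ["--min-maf", str(min_maf)]
    return cmd


def run_step1_all(max_workers=4):
    """Run step 1 for chromosomes 1-22, at most max_workers at a time."""
    chroms = list(range(1, 23))
    commands = [build_step1_command(chrom) for chrom in chroms]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker thread waits on one external hess.py process.
        results = list(pool.map(lambda c: subprocess.run(c, check=False), commands))
    # Return the chromosomes whose command exited with a non-zero status.
    return [chrom for chrom, r in zip(chroms, results) if r.returncode != 0]


# Dry run: print two of the commands without executing anything.
for chrom in (1, 22):
    print(" ".join(build_step1_command(chrom)))
```

Calling `run_step1_all()` would launch the actual analyses; the dry run above only prints the commands so they can be checked first.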

Interpreting the output

After executing the command above, 9 files will be created for each chromosome (i.e. 198 files for all 22 chromosomes in total). The following is an example output obtained for chromosome 22. The format of the output files is identical to that of the output of local SNP-heritability analysis.

The following two files will be used in the 3rd step to estimate local genetic covariance.

The following three files will be used in the 2nd step to estimate local SNP-heritability of trait 1

The following three files are the same type of files as the previous three files, and will be used in the 2nd step to estimate local SNP-heritability of trait 2

In addition, a log file step1_trait2_chr22.log will be created to document the details of each step.

Step 2 - estimate local SNP-heritability of each trait

Running the tool

In this step, we estimate local SNP-heritability for trait 1 and trait 2 using the output from step 1. The following script provides an example of how to perform this step. Please see local SNP-heritability analysis for more detail.

# estimate local SNP-heritability for trait 1
python hess.py --prefix step1_trait1 --out step2_trait1

# estimate local SNP-heritability for trait 2
python hess.py --prefix step1_trait2 --out step2_trait2

Note on re-inflating \(\lambda_{GC}\)

Most GWAS summary stats are corrected by the genomic control factor \(\lambda_{GC}\). This could result in a downward bias in the estimated SNP-heritability. If the GWAS summary stats have been corrected for \(\lambda_{GC}\), it is recommended to use the following code to perform step 2.

# estimate local SNP-heritability for trait 1
python hess.py --prefix step1_trait1 --reinflate-lambda-gc <lambda gc to reinflate for trait 1> \
               --out step2_trait1

# estimate local SNP-heritability for trait 2
python hess.py --prefix step1_trait2 --reinflate-lambda-gc <lambda gc to reinflate for trait 2> \
               --out step2_trait2
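
To see why genomic control correction pushes the estimates downward, recall that the correction divides each chi-square association statistic by \(\lambda_{GC}\), which is equivalent to dividing each z-score by \(\sqrt{\lambda_{GC}}\). The sketch below illustrates this arithmetic and its inverse. It is a conceptual illustration with made-up z-scores and hypothetical function names, not a description of how HESS implements "--reinflate-lambda-gc" internally.

```python
import math


def gc_correct(z_scores, lambda_gc):
    """Apply genomic control: chi-square / lambda_gc, i.e. z / sqrt(lambda_gc)."""
    return [z / math.sqrt(lambda_gc) for z in z_scores]


def reinflate(z_scores, lambda_gc):
    """Undo genomic control by multiplying z-scores by sqrt(lambda_gc)."""
    return [z * math.sqrt(lambda_gc) for z in z_scores]


# Made-up z-scores and a made-up lambda_gc, for illustration only.
original = [2.5, -1.2, 0.4]
lambda_gc = 1.1

corrected = gc_correct(original, lambda_gc)
restored = reinflate(corrected, lambda_gc)

# After correction every chi-square (z squared) shrinks by a factor of lambda_gc.
print([round(z * z, 4) for z in original])
print([round(z * z, 4) for z in corrected])
print([round(z, 4) for z in restored])
```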

Interpreting the output

The above commands will result in 4 files, 2 for each trait, containing local SNP-heritability estimates at each locus.

In addition, 2 log files will also be created.

Please see local SNP-heritability analysis for more detail.

Step 3 - estimate local genetic covariance and standard error

Estimate phenotypic correlation

\(\rho\)-HESS requires the phenotypic correlation between a pair of traits to obtain unbiased estimates of local genetic covariance. If the phenotype data of the GWAS are available, we recommend obtaining the phenotypic correlation by taking the Pearson correlation between the phenotype values of the two traits.
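
A minimal sketch of that calculation, assuming the phenotype values of both traits are available for the same individuals as two equal-length lists (the values below are made up and the function name `pearson_correlation` is illustrative):

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length lists of phenotype values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up phenotype values measured on the same five individuals.
trait1 = [1.2, 0.7, 2.9, 1.8, 2.2]
trait2 = [0.9, 1.1, 2.4, 1.5, 2.6]
print(round(pearson_correlation(trait1, trait2), 3))
```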

If individual-level phenotype data are not available, one can obtain an estimate through cross-trait LDSC. The intercept term corresponding to the genetic covariance estimate provides an approximation of the phenotypic correlation. More precisely, the estimated phenotypic correlation \(r_{pheno}\) is \[ r_{pheno} = \delta \times \sqrt{N_1 N_2} / N_s, \] where \(\delta\) is the intercept term, \(N_1\) and \(N_2\) are the sample sizes of the two GWASs, and \(N_s\) is the number of samples shared between the two GWASs.
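
As a worked example of the formula above, the following sketch computes \(r_{pheno}\) from a cross-trait LDSC intercept and the sample sizes. The input numbers are made up for illustration, and the function name `estimate_phenotypic_correlation` is hypothetical; the code is not taken from HESS itself.

```python
import math


def estimate_phenotypic_correlation(intercept, n1, n2, ns):
    """r_pheno = intercept * sqrt(n1 * n2) / ns."""
    if ns <= 0:
        raise ValueError("the formula requires at least one shared sample")
    return intercept * math.sqrt(n1 * n2) / ns


# Made-up inputs: intercept 0.1, GWAS sizes 10,000 and 40,000, 5,000 shared samples.
r_pheno = estimate_phenotypic_correlation(0.1, 10_000, 40_000, 5_000)
print(round(r_pheno, 3))  # 0.1 * 20000 / 5000 = 0.4
```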

We provide a simple script (misc/estimate_phenocor.py) for obtaining phenotypic correlation from cross-trait LDSC log files.

python misc/estimate_phenocor.py \
    --ldsc-log <cross-trait LDSC log file> \
    --n1 <sample size for GWAS 1> --n2 <sample size for GWAS 2> \
    --ns <number of shared samples>
( Note: If there is no sample overlap between the two GWASs, one does not need to estimate the phenotypic correlation. The bias in the local genetic covariance estimate is caused by environmental covariance arising from overlapping GWAS samples, and the phenotypic correlation is needed only to infer that environmental covariance. When there is no sample overlap, there is no such bias to correct. )
( Note: The cross-trait LDSC intercept here should correspond to genetic covariance and not SNP-heritability. )

Running the tool

The following script combines output from step 1 and step 2 to obtain local genetic covariance estimates.

python hess.py \
    --prefix step1 \
    --local-hsqg-est step2_trait1.txt step2_trait2.txt \
    --num-shared <number of overlapping samples in the two GWASs> \
    --pheno-cor <phenotypic correlation between the two traits> \
    --out step3
( Note: When "--num-shared" is set to zero, "--pheno-cor" can be set to any value (e.g. 0.0) and the result will not be affected. Also, note that no for loop is required here. ρ-HESS will automatically look for output from all chromosomes.)

Note on re-inflating \(\lambda_{GC}\)

Most GWAS summary stats are corrected by the genomic control factor \(\lambda_{GC}\). This could result in a downward bias in the estimated SNP-heritability. If the GWAS summary stats have been corrected for \(\lambda_{GC}\), it is recommended to use the following code to perform step 3.

python hess.py \
    --prefix step1 \
    --local-hsqg-est step2_trait1.txt step2_trait2.txt \
    --reinflate-lambda-gc <lambda gc to reinflate for trait 1> <lambda gc to reinflate for trait 2> \
    --num-shared <number of overlapping samples in the two GWASs> \
    --pheno-cor <phenotypic correlation between the two traits> \
    --out step3

Other available flags

Interpreting the output

After step 3, 2 files will be created: a table of local genetic covariance estimates and a log file, shown in that order below.

chr  start     end         num_snp k    local_rhog    var          se           z            p
1    10583     1892607     1286    50   1.3906e-06    2.1507e-09   4.6375e-05   0.029985     0.97608
1    1892607   3582736     3045    50   -3.2351e-05   4.0829e-09   6.3898e-05   -0.50629     0.61265
1    3582736   4380811     1622    50   0.00011446    2.9594e-09   5.44e-05     2.104        0.035379
1    4380811   5913893     3790    50   -1.898e-06    3.3276e-09   5.7685e-05   -0.032903    0.97375
...  ...       ...         ...     ...  ...           ...          ...          ...          ...
22   46470495  47596318    2444    50   -1.8303e-05   2.7353e-09   5.23e-05     -0.34997     0.72636
22   47596318  48903703    2997    50   -4.6613e-06   3.1558e-09   5.6176e-05   -0.082977    0.93387
22   48903703  49824534    3773    50   -1.261e-06    3.2769e-09   5.7245e-05   -0.022028    0.98243
22   49824534  51243298    2789    50   6.7939e-05    3.8538e-09   6.2079e-05   1.0944       0.27378
[INFO] Command started at: Wed, 04 Oct 2017 00:29:13
[INFO] Command issued:
[INFO] Total SNP-heritability of trait 1: ...
[INFO] Total SNP-heritability of trait 2: ...
                    ...
[INFO] Genome-wide genetic covariance estimate: ...
[INFO] Genome-wide genetic correlation estimate: ...
[INFO] Command finished at: ...
( Note: We estimate the standard error of the genome-wide genetic correlation using the jackknife. )
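
The values in the example table above are consistent with `se` being the square root of `var`, `z` being `local_rhog / se`, and `p` being the two-sided p-value of `z` under a standard normal distribution. The following sketch recomputes these columns for the third row of the table. The relationships are inferred from the example numbers rather than from the HESS source code, and the function name `summarize_locus` is illustrative.

```python
import math


def summarize_locus(local_rhog, var):
    """Return (se, z, p) for one locus from its estimate and variance."""
    se = math.sqrt(var)
    z = local_rhog / se
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Third row of the example table: chr 1, 3582736-4380811.
se, z, p = summarize_locus(0.00011446, 2.9594e-09)
print(f"se={se:.4e} z={z:.4f} p={p:.4f}")
```

The printed values can be compared against the `se`, `z`, and `p` columns of that row.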