Estimating enrichment of stratified squared trans-ethnic genetic correlation
This page describes the steps to estimate enrichment of stratified squared trans-ethnic genetic correlation, \(\lambda^2(C) ={ {r^2_g(C)} \over {r^2_g} }\), the ratio between squared trans-ethnic genetic correlation of annotation \( C \) and genome-wide squared trans-ethnic genetic correlation.
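As a small illustration of the ratio defined above, the following sketch computes \(\lambda^2(C)\) from two numbers. The function name and the input values are made up for this example and are not part of S-LDXR.

```python
def gcorsq_enrichment(gcorsq_annot: float, gcorsq_genome: float) -> float:
    """Return lambda^2(C) = r_g^2(C) / r_g^2.

    gcorsq_annot  : squared trans-ethnic genetic correlation of annotation C
    gcorsq_genome : genome-wide squared trans-ethnic genetic correlation
    """
    if gcorsq_genome <= 0:
        raise ValueError("genome-wide squared genetic correlation must be positive")
    return gcorsq_annot / gcorsq_genome


# Illustrative values only: r_g^2(C) = 0.64 and r_g^2 = 0.80.
lam2 = gcorsq_enrichment(0.64, 0.80)
print(f"lambda^2(C) is approximately {lam2:.2f}")
```

A value below 1 means the squared trans-ethnic genetic correlation inside the annotation is lower than the genome-wide value; a value above 1 means it is higher.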
Typical command
S-LDXR estimates \(\lambda^2(C)\) with the following command.
python <software directory>/s-ldxr.py \
    --gcor <summary stats directory for EAS>/EAS_sumstats.gz \
           <summary stats directory for EUR>/EUR_sumstats.gz \
    --ref-ld-chr <baseline LD score directory>/EAS_EUR_baseline_chr \
                 <AVGLLD LD score directory>/EAS_EUR_avglld_chr \
                 <BSTAT LD score directory>/EAS_EUR_bstat_chr \
                 <ALLELEAGE LD score directory>/EAS_EUR_alleleage_chr \
    --w-ld-chr <regression weight directory>/EAS_EUR_weight_chr \
    --frqfile <EAS MAF directory>/1000G.EAS. \
              <EUR MAF directory>/1000G.EUR. \
    --annot <baseline annotation directory>/baseline. \
            <AVGLLD annotation directory>/avglld. \
            <BSTAT annotation directory>/bstat. \
            <ALLELEAGE annotation directory>/alleleage. \
    --apply-shrinkage 0.5 \
    --save-pseudo-coef \
    --out TRAIT_EAS_EUR.txt
This command typically takes 10 to 15 minutes to run on a standalone computer.
Here are the meanings of the flags:
- --gcor specifies the summary statistics files. This flag takes two arguments: the summary statistics for population 1 and the summary statistics for population 2.
- --ref-ld-chr specifies the prefixes of the LD score files. This flag takes one or more arguments, so any number of LD score files may be supplied.
- --w-ld-chr specifies the prefix of the regression weights. These are standardized LD scores calculated from regression SNPs.
- --frqfile specifies the prefixes of the minor allele frequency files.
- --annot specifies the prefixes of the annotation files. This flag also takes one or more arguments.
- --apply-shrinkage adjusts the level of shrinkage (the \(\alpha\) tuning parameter in the paper). This should be a number between 0 and 1.
- --save-pseudo-coef saves the jackknife pseudo-values of the coefficients. This flag is optional.
- --out specifies the output file name.
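When the same analysis is repeated for many traits, it can help to assemble the command programmatically. The following is a minimal sketch that builds the argument list using only the flags described above. The helper name `build_sldxr_command` and all paths are hypothetical placeholders, not part of S-LDXR.

```python
from typing import List, Sequence


def build_sldxr_command(
    script: str,
    sumstats: Sequence[str],
    ref_ld: Sequence[str],
    w_ld: str,
    frq: Sequence[str],
    annot: Sequence[str],
    out: str,
    shrinkage: float = 0.5,
    save_pseudo_coef: bool = True,
) -> List[str]:
    """Assemble the s-ldxr.py argument list (hypothetical helper)."""
    if len(sumstats) != 2 or len(frq) != 2:
        raise ValueError("--gcor and --frqfile each need one entry per population")
    if not ref_ld or not annot:
        raise ValueError("--ref-ld-chr and --annot need at least one prefix")
    if not 0.0 <= shrinkage <= 1.0:
        raise ValueError("--apply-shrinkage should be between 0 and 1")

    cmd = ["python", script]
    cmd += ["--gcor", *sumstats]
    cmd += ["--ref-ld-chr", *ref_ld]
    cmd += ["--w-ld-chr", w_ld]
    cmd += ["--frqfile", *frq]
    cmd += ["--annot", *annot]
    cmd += ["--apply-shrinkage", str(shrinkage)]
    if save_pseudo_coef:
        cmd.append("--save-pseudo-coef")
    cmd += ["--out", out]
    return cmd


# Placeholder paths for illustration; substitute your own directories.
cmd = build_sldxr_command(
    script="software/s-ldxr.py",
    sumstats=["sumstats/EAS_sumstats.gz", "sumstats/EUR_sumstats.gz"],
    ref_ld=["ldscore/EAS_EUR_baseline_chr", "ldscore/EAS_EUR_avglld_chr"],
    w_ld="weights/EAS_EUR_weight_chr",
    frq=["maf/1000G.EAS.", "maf/1000G.EUR."],
    annot=["annot/baseline.", "annot/avglld."],
    out="TRAIT_EAS_EUR.txt",
)
print(" ".join(cmd))
# To execute, one could pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems when directory names contain spaces.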
Output
After executing the above command, 5 files will be generated.
- TRAIT_EAS_EUR.txt: output file containing the estimates.
- TRAIT_EAS_EUR.txt.log: log file containing information for debugging.
- TRAIT_EAS_EUR.txt.pseudo_tau1.gz: jackknife pseudo-values for the \(\tau_C\) coefficients of population 1.
- TRAIT_EAS_EUR.txt.pseudo_tau2.gz: jackknife pseudo-values for the \(\tau_C\) coefficients of population 2.
- TRAIT_EAS_EUR.txt.pseudo_theta.gz: jackknife pseudo-values for the \(\theta_C\) genetic covariance coefficients.
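To show how jackknife pseudo-values are typically turned into a point estimate and a standard error, here is a minimal sketch of the textbook delete-one-block formulas. It works on a plain list of numbers; the layout of the saved .gz files is not described on this page, so no file parsing is attempted, and the function name is an assumption for this example.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def jackknife_from_pseudovalues(pseudo: Sequence[float]) -> Tuple[float, float]:
    """Return (estimate, standard error) from jackknife pseudo-values.

    The estimate is the mean of the pseudo-values, and the standard error is
    the square root of their sample variance divided by the number of blocks.
    """
    n_blocks = len(pseudo)
    if n_blocks < 2:
        raise ValueError("need at least two jackknife blocks")
    estimate = mean(pseudo)
    std_err = math.sqrt(variance(pseudo) / n_blocks)
    return estimate, std_err


# Made-up pseudo-values for one coefficient across five blocks.
est, se = jackknife_from_pseudovalues([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"estimate = {est:.3f}, standard error = {se:.3f}")
```

With n jackknife blocks, a t test based on such a standard error would use n minus one degrees of freedom.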
Estimating \( \lambda^2(C) \) for continuous-valued annotations
The following command estimates enrichment of stratified squared trans-ethnic genetic correlation for quintiles of continuous-valued annotations.
python <software directory>/cont_annot_gcor.py \
    --coef TRAIT_EAS_EUR.txt \
    --frqfile <EAS MAF directory>/1000G.EAS. \
              <EUR MAF directory>/1000G.EUR. \
    --annot <baseline annotation directory>/baseline. \
            <AVGLLD annotation directory>/avglld. \
            <BSTAT annotation directory>/bstat. \
            <ALLELEAGE annotation directory>/alleleage. \
    --names AVGLLD BSTAT ALLELEAGE \
    --nbins 5 \
    --out TRAIT_EAS_EUR_contannot.txt
This step typically takes 2 to 5 minutes to run on a standalone computer.
Here are the meanings of the flags.
- --coef specifies the output from the previous step. The jackknife pseudo-values of the coefficients will be loaded automatically.
- --frqfile specifies the prefixes of the minor allele frequency files.
- --annot specifies the prefixes of the annotation files. This flag also takes one or more arguments.
- --names specifies the names of the continuous annotations for which one wishes to compute enrichment at quintiles.
- --nbins specifies the number of bins into which SNPs are grouped based on the values of their continuous annotation. The default is 5 (i.e. quintiles).
- --out specifies the output file name.
Additionally, users may use the --apply-shrinkage flag to adjust the level
of shrinkage.
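The binning controlled by --nbins can be pictured with the following sketch, which assigns SNPs to equally sized bins by the rank of their annotation value. This is only one plausible scheme; how cont_annot_gcor.py handles ties and bin boundaries is not stated here, and the function name is hypothetical.

```python
from typing import List, Sequence


def quantile_bins(values: Sequence[float], nbins: int = 5) -> List[int]:
    """Assign each value a bin index in 0..nbins-1 based on its rank.

    Bin 0 holds the smallest values and bin nbins-1 the largest.
    """
    n = len(values)
    if nbins < 1 or n < nbins:
        raise ValueError("need at least one value per bin")
    # Indices of the values, sorted from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * nbins // n
    return bins


# Made-up annotation values for ten SNPs, split into quintiles.
annot_values = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]
print(quantile_bins(annot_values, nbins=5))
```

Each bin can then be treated like a binary annotation when computing enrichment for that quantile.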
After executing the above command, 2 files will be created.
- TRAIT_EAS_EUR_contannot.txt: contains the estimates.
- TRAIT_EAS_EUR_contannot.txt.log: the log file, for debugging purposes.
Expected \(\lambda^2(C)\) from continuous-valued annotations
Estimating expected \(r^2_g(C)\) and \(\lambda^2(C)\) from continuous-valued annotations requires two steps.
The first step obtains the coefficients (\(\tau_{1C}\), \(\tau_{2C}\), and \(\theta_{C}\)) of each continuous-valued annotation.
python <software directory>/s-ldxr.py \
    --gcor <summary stats directory for EAS>/EAS_sumstats.gz \
           <summary stats directory for EUR>/EUR_sumstats.gz \
    --ref-ld-chr <base LD score directory>/EAS_EUR_allelic_chr \
                 <AVGLLD LD score directory>/EAS_EUR_allelic_chr \
                 <BSTAT LD score directory>/EAS_EUR_allelic_chr \
                 <ALLELEAGE LD score directory>/EAS_EUR_allelic_chr \
    --w-ld-chr <regression weight directory>/EAS_EUR_weight_chr \
    --frqfile <EAS MAF directory>/1000G.EAS. \
              <EUR MAF directory>/1000G.EUR. \
    --annot <base annotation directory>/base. \
            <AVGLLD annotation directory>/avglld. \
            <BSTAT annotation directory>/bstat. \
            <ALLELEAGE annotation directory>/alleleage. \
    --save-pseudo-coef \
    --out ./TRAIT_EAS_EUR_step1.txt
This command typically takes 2 to 5 minutes to run on a standalone computer.
The output is the same as that of a typical command.
The second step uses the coefficients from the first step to obtain expected \(r^2_g(C)\) and \(\lambda^2(C)\) from continuous-valued annotations.
python <software directory>/pred_binannot_from_contannot.py \
    --coef ./TRAIT_EAS_EUR_step1.txt \
    --frqfile <EAS MAF directory>/1000G.EAS. \
              <EUR MAF directory>/1000G.EUR. \
    --cont-annot <base annotation directory>/base. \
                 <AVGLLD annotation directory>/avglld. \
                 <BSTAT annotation directory>/bstat. \
                 <ALLELEAGE annotation directory>/alleleage. \
    --bin-annot <base annotation directory>/base. \
                <binary annotation directory>/annot_name. \
    --apply-shrinkage 0.5 \
    --out ./TRAIT_EAS_EUR_step2.txt
This command typically takes 2 to 5 minutes to run on a standalone computer.
The output is the same as that of the command for continuous-valued annotations.
Interpreting the output
The output files of S-LDXR contain the following columns.
- ANNOT: name of the annotation
- NSNP: number of SNPs for binary annotations (sum of annotation values for continuous-valued annotations)
- STD: standard deviation of the annotation across SNPs
- TAU1: heritability annotation coefficient of population 1
- TAU1_SE: standard error of the heritability annotation coefficient of population 1
- TAU2: heritability annotation coefficient of population 2
- TAU2_SE: standard error of the heritability annotation coefficient of population 2
- THETA: trans-ethnic genetic covariance annotation coefficient
- THETA_SE: standard error of the trans-ethnic genetic covariance annotation coefficient
- HSQ1: stratified heritability in population 1
- HSQ1_SE: standard error of stratified heritability in population 1
- HSQ2: stratified heritability in population 2
- HSQ2_SE: standard error of stratified heritability in population 2
- GCOV: stratified trans-ethnic genetic covariance
- GCOV_SE: standard error of stratified trans-ethnic genetic covariance
- GCOR: stratified trans-ethnic genetic correlation
- GCOR_SE: standard error of stratified trans-ethnic genetic correlation
- GCORSQ: stratified squared trans-ethnic genetic correlation
- GCORSQ_SE: standard error of stratified squared trans-ethnic genetic correlation
- HSQ1_ENRICHMENT: heritability enrichment in population 1
- HSQ1_ENRICHMENT_SE: standard error of heritability enrichment in population 1
- HSQ2_ENRICHMENT: heritability enrichment in population 2
- HSQ2_ENRICHMENT_SE: standard error of heritability enrichment in population 2
- GCOV_ENRICHMENT: genetic covariance enrichment
- GCOV_ENRICHMENT_SE: standard error of genetic covariance enrichment
- GCORSQ_ENRICHMENT: estimated enrichment of stratified squared trans-ethnic genetic correlation
- GCORSQ_ENRICHMENT_SE: standard error of the estimated enrichment of stratified squared trans-ethnic genetic correlation
- GCORSQ_ENRICHMENT_P: p-value for testing whether the enrichment of stratified squared trans-ethnic genetic correlation is different from 1. The p-value is obtained from a t distribution with degrees of freedom equal to the number of jackknife blocks minus one, where the test statistic is \( { {\hat{\lambda}^2(C) - 1} \over {s.e.(\hat{\lambda}^2(C)) } }\).
- GCOVSQ_DIFF: estimated \( \hat{D}^2(C) = \hat{\rho}^2_g(C) - \hat{r}^2_g \hat{h}^2_{g1}(C) \hat{h}^2_{g2}(C) \), the difference between the stratified squared trans-ethnic genetic covariance of annotation \( C \) and \( \hat{r}^2_g \hat{h}^2_{g1}(C) \hat{h}^2_{g2}(C) \), the squared trans-ethnic genetic covariance expected from the genome-wide squared trans-ethnic genetic correlation and the stratified heritabilities in populations 1 and 2.
- GCOVSQ_DIFF_SE: standard error of the estimated \( \hat{D}^2(C) \)
- GCOVSQ_DIFF_P: p-value for testing whether \( D^2(C) \) is different from 0, obtained from a t distribution with degrees of freedom equal to the number of jackknife blocks minus one, where the test statistic is \( { {\hat{D}^2(C)} \over {s.e.(\hat{D}^2(C)) } }\). This test is equivalent to testing whether \( \lambda^2(C) \) is different from 1, but its p-value is better calibrated.
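The relationships among several of these columns can be sketched from the definitions alone. The code below assumes the standard definition of genetic correlation (covariance divided by the square root of the product of the two heritabilities) and takes the two heritability terms in \( D^2(C) \) to be the stratified heritabilities of populations 1 and 2. S-LDXR additionally applies shrinkage and jackknife-based estimation, so values in a real output file need not satisfy these identities exactly. All function names and input numbers are made up for illustration.

```python
import math
from typing import Dict


def stratified_summary(
    hsq1: float, hsq2: float, gcov: float, gcorsq_genome: float
) -> Dict[str, float]:
    """Derive correlation-type quantities from HSQ1, HSQ2 and GCOV."""
    if hsq1 <= 0 or hsq2 <= 0 or gcorsq_genome <= 0:
        raise ValueError("heritabilities and genome-wide r_g^2 must be positive")
    gcor = gcov / math.sqrt(hsq1 * hsq2)
    gcorsq = gcor ** 2
    return {
        "GCOR": gcor,
        "GCORSQ": gcorsq,
        "GCORSQ_ENRICHMENT": gcorsq / gcorsq_genome,
        # D^2(C): observed minus expected squared genetic covariance.
        "GCOVSQ_DIFF": gcov ** 2 - gcorsq_genome * hsq1 * hsq2,
    }


def t_statistic(estimate: float, std_err: float, null_value: float = 0.0) -> float:
    """Test statistic (estimate - null) / s.e.; the null is 0 for D^2(C)."""
    return (estimate - null_value) / std_err


# Illustrative inputs: h^2_g1(C) = 0.04, h^2_g2(C) = 0.09, rho_g(C) = 0.03, r_g^2 = 0.5.
res = stratified_summary(hsq1=0.04, hsq2=0.09, gcov=0.03, gcorsq_genome=0.5)
for name, value in res.items():
    print(f"{name}: {value:.4f}")
# A two-sided p-value would come from a t distribution with
# (number of jackknife blocks - 1) degrees of freedom,
# for example via scipy.stats.t.sf if SciPy is available.
```

In this example the enrichment is below 1 and \( D^2(C) \) is negative, which illustrates why testing \( D^2(C) = 0 \) and testing \( \lambda^2(C) = 1 \) address the same hypothesis: dividing \( D^2(C) \) by \( r^2_g h^2_{g1}(C) h^2_{g2}(C) \) gives \( \lambda^2(C) - 1 \).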