vignettes/bmassIntro.1.SimulatedData.Rmd
bmassIntro.1.SimulatedData.Rmd
This is a very simple, introductory vignette on how to run bmass
. In this introductory example, we will use a small, simulated dataset to run bmass
and explore some basic output. The dataset is provided with the package.
The purpose of this vignette is to quickly allow users to get up and running with bmass
. For a more advanced introductory vignette, where we download and use the GlobalLipids 2013 dataset as an example, please see here.
bmass
For this introductory vignette, we will be using bmass_SimulatedData1
, bmass_SimulatedData2
, and bmass_SimulatedSigSNPs
. These are simulated datasets created for the purposes of unit testing and this vignette. To load up ‘bmass’ and the datasets, run the following:
library("bmass");
data(bmass_SimulatedData1, bmass_SimulatedData2, bmass_SimulatedSigSNPs)
Phenotypes <- c("bmass_SimulatedData1", "bmass_SimulatedData2")
First, we’ll take a look at the format of one of our simulated datasets:
## Chr BP Marker MAF A1 A2 Direction pValue N
## 1 1 1000 rs1 0.20 A G - 6.0e-13 2500
## 2 1 2000 rs2 0.10 G T - 7.6e-02 2467
## 3 2 3000 rs3 0.06 T G + 3.0e-09 2761
## 4 3 4000 rs4 0.40 C A + 5.6e-03 2310
## 5 3 15000 rs5 0.40 C T + 1.0e-07 2410
## 6 3 21000 rs6 0.37 G A + 5.0e-08 2582
Here you should see nine columns: Chr, BP, Marker, MAF, A1, A2, Direction, pValue, N. bmass
expects each input dataset (representing a single phenotype GWAS) to contain these 9 columns and with these specific headers. Their descriptions are as follows –
Now to run bmass
, all we use is the following code:
Note that at its most basic level, we only need to provide two inputs for bmass()
to properly run: a vector of strings containing the variable names of each of our univariate summary GWAS datasets and a file containing the list of previously associated univariate genome-wide significant SNPs. The former input, the vector of strings, dictates to bmass()
where to find each dataset and what to call each phenotype being analyzed (ie the name of each variable containg the GWAS summary data is expected to correspond to the phenotype represented by that data). The latter input is just a list of each previous GWAS SNP with the following two columns of information, Chr
and BP
:
## Chr BP
## 1 1 1000
## 2 2 3000
Now to begin to get a sense of what bmass()
outputs, we can run the following:
## Length Class Mode
## MarginalSNPs 3 -none- list
## PreviousSNPs 4 -none- list
## NewSNPs 3 -none- list
## LogFile 19 -none- character
## ZScoresCorMatrix 4 -none- numeric
## Models 18 -none- numeric
## ModelPriors 126 -none- numeric
## GWASlogBFMinThreshold 1 -none- numeric
As you can see, there are multiple output variables corresponding to different components of the results. For the purposes of this vignette, we will only touch on the first three variables shown from this summary()
output: PreviousSNPs
, MarginalSNPS
, and NewSNPs
. PreviousSNPs
refers to information on the input GWAS SNPs we used for the analysis, MarginalSNPs
refers to information on the the marginally significant SNPs that were used in the final steps of the analysis, and NewSNPs
refers to information on any SNPs that reach multivariate genome-wide levels of significance.
The main result most users will be immediately interested in is whether there are any new variants identified as genome-wide multivariate significant, and this can be determined by accessing the NewSNPs
sublist:
## Length Class Mode
## SNPs 22 data.frame list
## logBFs 27 -none- numeric
## Posteriors 27 -none- numeric
## ChrBP Chr BP Marker MAF A1 bmass_SimulatedData1_A2
## 9 4_7000 4 7000 rs8 0.15 G C
## bmass_SimulatedData1_Direction bmass_SimulatedData1_pValue
## 9 + 7e-08
## bmass_SimulatedData1_N bmass_SimulatedData1_ZScore
## 9 2514 5.391171
## bmass_SimulatedData2_Direction bmass_SimulatedData2_pValue
## 9 - 6e-08
## bmass_SimulatedData2_N bmass_SimulatedData2_ZScore GWASannot mvstat
## 9 2514 -5.418801 0 145.0667
## mvstat_log10pVal unistat unistat_log10pVal Nmin logBFWeightedAvg
## 9 31.50084 29.36341 7.221849 2514 29.107
As you can see in this example, we have 1 new SNP that reaches multivariate genome-wide significance. This is determined by using the smallest log BF weighted average value among the previous univariate significanct GWAS SNPs (PreviousSNPS
):
## [1] 7.41173
Any MarginalSNPs
with logBFWeightedAvg
greater than this value are designated as new multivariate associations (see eq. 1 in Turchin and Stephens bioRxiv 2019).
For further exploration and details of bmass
output, please see the second introductory vignette.