# MiSTi: set-based mixed effects score test for GxE interactions

#### 2018-12-13

This is an illustration for using the package MiSTi to implement set-based GxE interaction test under mixed effects models. To form the input data, make sure that the columns of the data are arranged in the following order [Y E X G], where Y is an outcome variable, E is a matrix of environmental risk factors, X is a matrix of confounder(s), and G is a matrix of genotype(s). The current version of MiSTi allows multiple environmental risk factors (dimension is specified by option m), multiple confounders (dimension is specified by option d), and multiple variants (dimension is specified by option p).

Several R packages are called in MiSTi and should be installed beforehand. These include:

• penalized

• Matrix

• psych

• CompQuadForm (note: built under R version 3.3.2)

• parallel

The main function to implement the integrative GxE analysis is MiSTi. The required inputs include

• data: a nxm dataframe [Y,E,X,G] with rows represent subjects and columns are Y (outcome), E (environmental risk factor), X (confounders), and G (genotypes).

• outcome_type: either “Continuous” for the values of a quantitative trait, or “Binary” for a dichotomous trait.

• p: number of genetic variants.

• m: number of environmental risk factors.

• d: number of confounders. If d is 0, there is no confounder, and the input data should be arranged as [Y E G].

• R: number of burden scores, including both unweighted and weighted ones. It is always greater than or equal to 0. When R is assigned as 0, only variance component test is performed (more than one GxE terms are required in this situation.)

Other options to customize the analysis are available, including

• glm_use: a logical value indicating if generalized linear regression is implemented to fit the null models. A ridge regression is implemented if GLMUse is assigned as FALSE (default).

• weight_method: one of “User”, “No weight”, “ScrnStat1”, and “ScrnStat2”. This specifies weights for calculating burden scores. “User”(default) allows for an equal weight across varaints in the set or user-defined weights for each variants (e.g., functional annotations). The weights are specified by the input option “user_weight”. “No weight” sets all weights as 0 and tests only the variance component. “ScrnStat1” and “ScrnStat2” use screening test statistics as weights, where “ScrnStat1” use the maximum of G-E correlation and marginal association screening test statistics (Jiao et al. 2015) and “ScrnStat2” uses the sum of squares of marginal association and correlation screening test statistics(Gauderman et al. 2013).

• user_weight: a vector or a matrix specifying an equal weight across variants or user-defined weights for calculating weighted burden score. This option only works when weight_method is set as “User”. If weight_method is set as âUserâ and no user_weight is specified, the weight is a vector of 1âs.

• ind_burden_return: a logical value (default: TRUE) indicating if testing results on individual interaction terms between each E and burden score are returned. This option only works when there are more than one ExB terms (i.e. multiple E’s or multiple weights for each varaint).

• combinations_return: a logical value (default: TRUE) indicating if the combination methods, including the data-adpative weighted combination and Fisher’s combination method, are implemented. By assigning FALSE, only the burden and variance component tests are conducted.

• combination_preference Either of “All” (default), or a vector containing “OptMin”, “AdptWt”, or “Fisher” to specify the combination method(s) to be implemented.

• mac_lower_bound The lower bound of minor allele count (MAC) for individual SNPs to be included in the analysis. Default is 6. SNPs with MAC 5 are excluded from the analysis. A warning message is given when any SNPs are removed from the set.

• chisq_app Either “3M” (default) or “4M” for the moment matching (Liu’s) method in quantile approximation in optimal linear combination method. “3M” matches the 3rd moment and “4M” matches the 4th moment of the target and approximate distributions.

• acc A numerical value indicating the precision of Davies method for p-value calculation. Default is 5e-10.

• acc_auto A logical value (default: TRUE) indicating if data adaptive precision is used in optimal linear combination. We recommend to set this as TRUE for computational efficiency.

• accurate_app_threshold A numerical value specifying the threshold to determine when Liu’s and Davies methods are used in quantile approximation in the optimal linear combination method. Default is -log10(0.05).

• max_core An integer specifying the maximum number of cores can be recruited in parallel package. Default is 4 cores.

The references for the methods are:

• Su, Y., Di, C. and Hsu, L. (2016) A unified powerful set-based test for sequencing data analysis of GxE interactions. Biostatistics, 18(1):119-131.

• Jiao, S., Peters, U., Berndt, S., BÃ©zieau, S., Brenner, H., Campbell, P.T., Chan, A.T., ChangâClaude, J., Lemire, M., Newcomb, P.A. and Potter, J.D., 2015. Powerful SetâBased GeneâEnvironment Interaction Testing Framework for Complex Diseases. Genetic epidemiology, 39(8), pp.609-618.

• Gauderman, W.J., Zhang, P., Morrison, J.L. and Lewinger, J.P., 2013. Finding Novel Genes by Testing GÃ E Interactions in a GenomeâWide Association Study. Genetic epidemiology, 37(6), pp.603-613.

The following is an example of having simple burden score (weight being 1’s for all variants) and variance component test in MiSTi.

library(MiSTi)

# Y: binary outcome variable. a vector of length n.
# E: environmental risk factors. a vector of length n or a matrix of n rows.
# X: d confounders
#    Either a vector of length n or a n*d matrix if d is greather than 1.
# G: genotypes of varaints. a n*p matrix.

# A special case with 1 environmental factor.
# Data generation
n = 2000
X = rbinom(n,size=1,prob=0.5)
E = rbinom(n,size=1,prob=0.5)
MAF = runif(10,min=0.001,max=0.01)
G = sapply(MAF,function(maf) rbinom(n,size=1,prob=maf))
eta = E*0.5 + G%*%rnorm(10,mean=0,sd=0.5) + (G*E)%*%rnorm(10,mean=0.5,sd=5)
Y = rbinom(n,size=1,prob=exp(eta)/(1+exp(eta)))
d = 1
p = 10

data = data.frame(Y=Y,E=E,X=X,G=G)
misti = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = 1,
R = 1,
weight_method = "User"
)
misti$pvalue misti$stat
misti$rho # If only the burden and variance component tests are of interest: misti.NoCombination = MiSTi(data = data, outcome_type = "Binary", d = d, p = p, m = 1, R = 1, weight_method = "User", combination_return = FALSE ) misti.NoCombination$pvalue["BurdenComp"]
misti.NoCombination$pvalue["VarComp"] # If only the Fisher's combination method is of interests: misti.FisherCombination = MiSTi(data = data, outcome_type = "Binary", d = d, p = p, m = 1, R = 1, weight_method = "User", combination_return = TRUE, combination_preference = "Fisher" ) misti.FisherCombination$pvalue["Fisher"]

If a functional annotation for the genetic variants are available, one can incorporate the functional annotation into the GxE test. Below is the sample code.

# FA: a vector of length p containing functional annotation for variants in consideration.

FA = runif(p)
misti.FAOnly = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = 1,
R = 1,
weight_method = "User",
user_weight = FA
)

If the user would like to jointly test the effects of two weighted burden scores (e.g., a vector of 1’s and functional annotation), the sample code is shown below. In this case, the p-value of the joint BxE effects of the two burdens and their individual p-values after adjusting for the presence of each other will be returned by default.

misti.2burdens = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = 1,
R = 2,
weight_method = "User",
user_weight = cbind(rep(1,p),FA)
)

Below we demonstrate sample code for situations under which there are multiple environmental factors to be tested, and their joint interaction effect with the genetic varaints are of interest. Extension to situations with functional annotations is straightforward by assigning values to the option ‘user_weight’.

E = sapply(c(0.5,0.5,0.5),function(p_e) rbinom(n,size=1,prob=p_e))
GE = do.call(cbind,sapply(1:ncol(E),simplify=FALSE,function(j) G*E[,j]))
eta = E%*%c(0,0,0) + G%*%rnorm(10,mean=0,sd=0.5) + GE%*%rnorm(ncol(GE),mean=0.5,sd=5)
Y = rbinom(n,size=1,prob=exp(eta)/(1+exp(eta)))
m = 3
data = data.frame(Y=Y,E=E,X=X,G=G)
misti.3Es = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = m,
R = 1,
weight_method = "User"
)
misti.3Es$pvalue misti.3Es$stat
misti.3Es\$rho