We propose a novel sparse regression method for brain-wide and genome-wide association studies. Specifically, we impose a low-rank constraint on the weight coefficient matrix and decompose it into two low-rank matrices, which capture relationships among the genetic features and among the brain imaging features, respectively. We also introduce a sparse acyclic digraph with a sparsity-inducing penalty to further take into account the correlations among the genetic variables, which makes it possible to identify the representative SNPs that are highly associated with the brain imaging features. We optimize our objective function by jointly tackling low-rank regression and variable selection in a unified framework. In our method, the low-rank constraint allows us to conduct variable selection on low-rank representations of the data, and the learned sparse weight coefficients allow us to discard unimportant variables at the end. Experimental results on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset showed that the proposed method selects important SNPs and estimates the brain imaging features more accurately than the state-of-the-art methods.
In the literature, pairwise univariate analysis measures the correlation between an individual phenotype and an isolated genotype without considering the potential correlations among the phenotypes or among the genotypes. Regularized ridge regression conducts the imaging-genetic association study via ordinary least squares estimation, which considers the correlations among the variables but ignores the correlations among the corresponding responses. Other work proposed considering the interlinked structures among Single Nucleotide Polymorphisms (SNPs) to produce interpretable results. The previous studies were limited in the sense that they did not jointly consider the relational information in a unified framework.
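As a minimal sketch of the pairwise univariate baseline described above, the following Python snippet correlates each SNP with each phenotype in isolation. The function names, the 0/1/2 minor-allele coding, and the toy data are assumptions made for illustration; they are not taken from the studies discussed here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no association signal
    return cov / math.sqrt(var_x * var_y)


def pairwise_univariate(genotypes, phenotypes):
    """Correlate every SNP column with every phenotype column independently.

    genotypes:  per-subject rows of minor-allele counts (assumed 0/1/2 coding)
    phenotypes: per-subject rows of ROI measurements
    Returns r[j][k], the correlation between SNP j and phenotype k.
    """
    snp_columns = list(zip(*genotypes))
    roi_columns = list(zip(*phenotypes))
    return [[pearson(s, p) for p in roi_columns] for s in snp_columns]


# Hypothetical toy data: 4 subjects, 2 SNPs, 1 ROI volume.
toy_genotypes = [[0, 2], [1, 1], [2, 1], [1, 0]]
toy_phenotypes = [[1.0], [2.0], [3.0], [2.0]]
toy_r = pairwise_univariate(toy_genotypes, toy_phenotypes)
```

Each entry of `toy_r` is computed from one SNP and one phenotype only, so any correlation among the SNPs or among the phenotypes is ignored. This is the limitation noted above.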
The main challenge in current imaging-genetic association studies comes from the large number of variables in both the brain imaging data and the genetic data, which requires appropriate statistical techniques such as regression, variable selection, and sparsity constraints. We propose a novel low-rank variable selection method in a regularization-based linear regression framework that takes the correlations inherent in phenotypes and genotypes into account while avoiding the adverse effects of noise and redundancy. We use the genotype data to regress the phenotype data with least squares regression to consider the correlations between the variables and the responses.
We obtained the SNP and structural Magnetic Resonance Imaging (MRI) data of 737 non-Hispanic Caucasian participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. We used the MIPAV software on all images to conduct anterior commissure-posterior commissure correction, and then corrected the intensity inhomogeneity using the N3 algorithm. A robust skull-stripping method was applied to all structural MR images to extract only the brain. Manual editing and intensity inhomogeneity correction followed for better quality. After removing the cerebellum based on registration and repeating the N3 algorithm three times for intensity inhomogeneity correction, we used the FAST algorithm to segment the structural MR images into three tissue types, i.e., gray matter, white matter, and cerebrospinal fluid. We used HAMMER to conduct registration and obtained the ROI-labeled images, for which we used the Jacob template to dissect the brain into 93 ROIs. For each of the 93 ROIs in a labeled image, we computed the gray matter tissue volume. Thus, for each MR image, we extracted a feature vector of 93 gray matter tissue volumes.
We describe the proposed method for the imaging-genetic analysis between the SNPs and the neuroimaging phenotypes.
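The text does not state the objective function explicitly, so the following is only a generic sketch of a low-rank, row-sparse least squares regression that is consistent with the description. The symbols are assumptions introduced here: X (n x d) holds the SNP data, Y (n x c) the ROI volumes, the weight matrix is factored as W = AB with rank at most r, and lambda is a tuning parameter. The graph regularization term over the SNPs is omitted because its form is not given.

```latex
\min_{A,B}\; \lVert Y - XAB \rVert_F^2 + \lambda \lVert AB \rVert_{2,1},
\qquad A \in \mathbb{R}^{d \times r},\quad B \in \mathbb{R}^{r \times c},\quad r \le \min(d, c),
\qquad \text{where } \lVert W \rVert_{2,1} = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{c} w_{ij}^{2}} .
```

In this form the factorization W = AB enforces the low-rank constraint, and the row-wise penalty drives entire rows of W to zero, so that an SNP whose row vanishes is dropped for all ROIs at once.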
The proposed low-rank constraint and sparse graph representation regularization on the SNPs, together with a structured sparsity constraint in a linear regression framework, helped to effectively utilize the inherent information in the genetic data and the brain imaging data, and thus to find informative associations.
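To illustrate how a structured (row-wise) sparsity constraint leads to variable selection, the sketch below ranks SNPs by the Euclidean norm of their rows in a learned weight matrix and discards rows that are numerically zero. The function names, the tolerance, and the toy weight matrix are hypothetical; the actual learned weights are not reproduced here.

```python
import math


def row_l2_norms(weights):
    """Euclidean norm of each row (one row per SNP, one column per ROI)."""
    return [math.sqrt(sum(w * w for w in row)) for row in weights]


def l21_norm(weights):
    """Sum of row norms, the usual form of a row-sparsity-inducing penalty."""
    return sum(row_l2_norms(weights))


def select_snps(weights, tol=1e-6):
    """Indices of SNPs whose row norm exceeds tol, most important first.

    tol is an illustrative threshold for treating a row as zero.
    """
    norms = row_l2_norms(weights)
    kept = [i for i, value in enumerate(norms) if value > tol]
    return sorted(kept, key=lambda i: norms[i], reverse=True)


# Hypothetical learned weights: 4 SNPs x 2 ROIs, two rows driven to zero.
toy_weights = [[0.0, 0.0], [3.0, 4.0], [0.6, 0.8], [0.0, 0.0]]
toy_selected = select_snps(toy_weights)
```

Because the penalty acts on whole rows, an SNP is either kept for all ROIs or discarded for all of them. That is why a row norm works as a single importance score per SNP.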
We regard the gray matter tissue volumes of the Regions Of Interest (ROIs) as phenotypes, assuming that they are highly related to Alzheimer’s disease (AD).
The preprocessing steps are as follows:
1. Image resampling (resize the image to a lower resolution)
2. Image segmentation
First, we assume a linear relationship between SNPs and ROIs, although the data are often found to have complex nonlinear relationships. Even though a number of studies have reported that sparsity-inducing regularization may implicitly capture nonlinear relationships, an explicit nonlinear model (e.g., mapping the original data into a kernel space with kernel functions) could be tried in our future work. Second, we selected only 16 ROIs related to AD to conduct SNP selection in this work. Other ROIs are likely to be related to SNPs in the imaging-genetic analysis as well.
3. Feature extraction
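The feature extraction step can be sketched as follows: given a per-voxel ROI label and a per-voxel tissue label, the gray matter volume of each ROI is the number of gray matter voxels in that ROI multiplied by the voxel volume. The tissue code, the flat-list image representation, the function name, and the toy values are assumptions for illustration only.

```python
GRAY_MATTER = 1  # hypothetical tissue code produced by the segmentation step


def roi_gray_matter_volumes(roi_labels, tissue_labels, num_rois=93, voxel_volume=1.0):
    """Build one feature vector of gray matter volumes, one entry per ROI.

    roi_labels:    per-voxel ROI index in 1..num_rois (0 means background)
    tissue_labels: per-voxel tissue code, same length and order as roi_labels
    voxel_volume:  volume of a single voxel (e.g. in cubic millimetres)
    """
    if len(roi_labels) != len(tissue_labels):
        raise ValueError("label images must have the same number of voxels")
    volumes = [0.0] * num_rois
    for roi, tissue in zip(roi_labels, tissue_labels):
        if 1 <= roi <= num_rois and tissue == GRAY_MATTER:
            volumes[roi - 1] += voxel_volume
    return volumes


# Hypothetical flattened label images with 8 voxels.
toy_roi = [0, 1, 1, 2, 2, 2, 93, 93]
toy_tissue = [0, 1, 2, 1, 1, 3, 1, 1]
toy_features = roi_gray_matter_volumes(toy_roi, toy_tissue, voxel_volume=0.5)
```

The result always has 93 entries, so every MR image maps to a feature vector of the same length regardless of its resolution.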
The quality control (QC) criteria for the SNP data include: 1) call rate check per subject and per SNP marker; 2) gender check; 3) sibling pair identification; 4) the Hardy-Weinberg equilibrium test; 5) marker removal by the minor allele frequency; and 6) population stratification. In this paper, we used the MaCH software to impute the missing genotypes of the SNPs that satisfied the QC criteria.
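Three of these criteria (per-SNP call rate, minor allele frequency, and the Hardy-Weinberg equilibrium test) can be sketched for a single SNP as below. The thresholds, the 0/1/2 genotype coding with `None` for a missing call, and the function names are illustrative assumptions and not the values used in this study. The gender check, sibling pair identification, and population stratification steps are not covered.

```python
def call_rate(genotypes):
    """Fraction of subjects with a non-missing call (missing is None)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among the called genotypes (0/1/2 coding)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_chi_square(genotypes):
    """Chi-square statistic comparing observed and expected genotype counts."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 0.0
    p = sum(called) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    observed = [called.count(0), called.count(1), called.count(2)]
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.05, max_hwe_chi2=3.84):
    """Apply the three checks; all thresholds here are illustrative only."""
    if call_rate(genotypes) < min_call_rate:
        return False
    if minor_allele_frequency(genotypes) < min_maf:
        return False
    return hwe_chi_square(genotypes) <= max_hwe_chi2


# Hypothetical SNPs over 20 subjects.
snp_ok = [0] * 5 + [1] * 10 + [2] * 5       # matches Hardy-Weinberg proportions
snp_rare = [0] * 19 + [1]                   # minor allele is too rare
snp_missing = [0, 1, 2, None, None] * 4     # too many missing calls
```

The default `max_hwe_chi2=3.84` is the 5% critical value of a chi-square distribution with one degree of freedom. Genome-wide studies usually apply a much stricter threshold, so treat it as a placeholder.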
A “Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression, although it is mostly used for classification problems. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyperplane that separates the two classes with the largest margin.
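The idea of a separating hyperplane can be illustrated with a small linear SVM trained by subgradient descent on the regularized hinge loss. This is a simplified teaching sketch with assumed hyperparameters and toy data; it is not the solver used in this project, and a library implementation would normally be preferred.

```python
def train_linear_svm(points, labels, lr=0.1, reg=0.01, epochs=100):
    """Fit w and b by subgradient descent on the L2-regularized hinge loss.

    points: list of feature vectors; labels: +1 or -1 for each point.
    lr, reg and epochs are illustrative defaults.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty shrinks the weights at every step.
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:
                # The point lies inside the margin: move the hyperplane.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Class label given by the side of the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data with two features.
toy_points = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
toy_labels = [1, 1, 1, -1, -1, -1]
svm_w, svm_b = train_linear_svm(toy_points, toy_labels)
```

After training, `predict(svm_w, svm_b, x)` returns the side of the learned hyperplane on which a new point `x` falls.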
This section describes the hardware and software requirements of the project.
Processor Type : Pentium IV
Speed : 2.4 GHz
RAM : 128 MB
Hard disk : 20 GB
Operating System : Windows 7
Software Programming Package : MATLAB R2014a