Machine learning helps elucidate chronic kidney disease progression

Science Spotlight

From the Ma Group, Division of Public Health Sciences

May 20, 2019

Technological advances in recent years allow researchers to easily measure thousands of molecules in individual biological samples, in what are commonly referred to as ‘omics’-based experiments. However, the use of these powerful techniques in the lab has been met with challenges in data analysis, as the development of analytical methods that can reliably harness information from these large-scale datasets has not kept pace. A common approach to understanding changes in the levels of molecules measured in these experiments, including DNA, RNA, and cellular metabolites, involves pathway mapping based on known cellular or metabolic networks. This method is sufficiently robust for the study of many types of molecules, but the coverage of lipid metabolic pathways in commonly used databases is typically incomplete. In a recent paper published in the journal Bioinformatics, Dr. Jing Ma in the Division of Public Health Sciences and collaborators describe a new method for the analysis of lipidomics data. While their work focuses on the relationship between lipids and chronic kidney disease (CKD), this method may be used more broadly to help investigators overcome challenges that accompany the study of various omics datasets.

The small number of biological samples typically available for omics analyses adds to the challenges of omics-based research, particularly in lipid metabolism research. Dr. Ma explained, “One major challenge in studying lipid interaction networks is the limited number of patients (samples) compared to the number of lipids, which is a common problem in many omics studies. Statistically, one biological sample helps infer one pair of interactions, and how can one learn all pairwise interactions among lipids if there are not enough biological samples?” The development of new statistical methods has helped address this problem. “It turns out that one can learn a sparse interaction network given limited number of biological samples, which is actually more interpretable. In addition, my collaborators and I developed novel statistical methodologies for joint estimation of multiple interaction networks by leveraging the fact that lipid networks before and after CKD progression share common interactions. These methodological developments made the current study possible.”
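To put the sample-size problem in concrete terms, the short calculation below counts the possible pairwise lipid interactions using figures reported elsewhere in this article (285 lipids; 79 early-stage plus 135 advanced-stage CPROBE participants). It is only back-of-the-envelope arithmetic, not part of the authors' analysis, and the variable names are chosen here for illustration.

```python
from math import comb

n_lipids = 285         # lipids represented in both cohorts (from the article)
n_samples = 79 + 135   # CPROBE participants: early-stage + advanced-stage CKD

# Every unordered pair of lipids is a candidate interaction (network edge).
n_pairs = comb(n_lipids, 2)

print(f"CPROBE samples:            {n_samples}")
print(f"candidate lipid pairs:     {n_pairs}")
print(f"candidate pairs per sample: {n_pairs / n_samples:.0f}")
```

With far more candidate pairs than samples, estimating every interaction freely is not feasible, which is why assuming a sparse network (most pairs have no direct interaction) makes the problem tractable.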

The authors relied on data from two independent study cohorts, the Clinical Phenotyping Resource and Biobank Core (CPROBE) and the Chronic Renal Insufficiency Cohort (CRIC), to conduct their study. Both cohorts include patients across the range of CKD severity and have previously analyzed and published lipidomics results. Data from over 200 CPROBE participants, 79 with early-stage and 135 with advanced-stage CKD, were included in the study. The CRIC study followed patients for a mean of six years, which allowed the authors to assess lipidomic signatures in an additional 200 individuals with information on whether they progressed to end-stage CKD. The authors included 285 lipids that were represented in both study datasets and performed a differential network-based enrichment analysis (DNEA) of the lipidomics data in three steps. First, network estimation across the different disease stages creates stage-specific partial correlation networks. Second, consensus clustering extracts stable subnetworks from the consolidated network. Finally, an enrichment analysis of the consensus subnetworks is performed based on the differential expression levels of the lipids and the network structure.
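For readers unfamiliar with partial correlation, the quantity behind the stage-specific networks, the toy example below may help. It uses the textbook three-variable formula on made-up concentrations and is not the authors' estimator, which jointly fits sparse networks over hundreds of lipids. The names lipid_a, lipid_b, and lipid_c and all values are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def partial_corr(x, y, z):
    """Correlation between x and y after adjusting for one third variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up data: lipid_c drives both lipid_a and lipid_b, which otherwise
# differ only by small, unrelated fluctuations.
lipid_c = [1, 2, 3, 4, 5, 6, 7, 8]
lipid_a = [c + e for c, e in zip(lipid_c, [1, -1, 1, -1, 1, -1, 1, -1])]
lipid_b = [c + e for c, e in zip(lipid_c, [1, 1, -1, -1, 1, 1, -1, -1])]

marginal = pearson(lipid_a, lipid_b)
adjusted = partial_corr(lipid_a, lipid_b, lipid_c)

print(f"marginal correlation a-b:        {marginal:.2f}")
print(f"partial correlation a-b given c: {adjusted:.2f}")
```

In this toy data the two lipids look strongly correlated on their own, but the association nearly vanishes once the shared driver is accounted for. A partial correlation network draws an edge only where an association survives this kind of adjustment, so edges point to direct rather than indirect relationships.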

Triacylglyceride (TAG) pathway altered in chronic kidney disease progression. (A) Stages 2 or 3 and (B) Stages 4 or 5 in the CPROBE study. Each node represents a lipid, and node color indicates the lipid’s average concentration level. Blue edges are more likely to be present in early-stage disease. Pink edges are more likely to be present in late-stage CKD. Black edges are equally likely to be present in both conditions.

Image provided by Dr. Jing Ma

The DNEA method was applied to the CPROBE and CRIC datasets to identify differential networks between CKD patients at various disease stages. Triacylglycerides (TAGs) are composed of a glycerol backbone and three fatty acids and are highly abundant in plasma. The TAG lipid cluster network was significantly different in both studies and could distinguish between early-stage and advanced-stage CKD patients (CPROBE cohort, see figure) and between end-stage progressors and non-progressors (CRIC cohort). Closer investigation of the network revealed that TAGs with long-chain polyunsaturated fatty acids were more abundant in advanced CKD, but the TAG network in early-stage CKD had a higher number of edges (see figure). A second differential network composed of cardiolipins and phosphatidylethanolamines was also identified through the DNEA method. Early-stage CKD patients and non-progressors had a greater number of edges in this second network.

Dr. Ma summarized the major findings: “We use a novel machine learning tool in network analysis to study the interactions among lipids based on data. We can also assess which part of the interaction network is most likely affected by the underlying biological processes, such as CKD progression studied in the paper. Our method thus provides lipid regulatory pathways and their interactions as potential biomarkers for disease diagnostics.” Furthermore, this new statistical tool will allow scientists across various disciplines to overcome common challenges that accompany omics research and will enable more informative interpretation of results.

When asked about the next steps, Dr. Ma emphasized that the results from this study reveal only associations between lipids and CKD progression and that additional “careful mechanistic model system studies are warranted to test the validity of these hypotheses to confirm biological relevance of these new lipid regulatory networks.” In addition, these methods could be employed more broadly across various diseases, types of experimental design, and datasets. “More generally, I’m interested in studying biological processes using a systems perspective, which involves joint analysis of multiple types of omics data. For example, I’m extending the method to simultaneously analyze metabolomic and microbiome data to understand host-microbe interactions,” said Dr. Ma.


This work was supported by the National Institutes of Health.

Fred Hutch/UW Cancer Consortium member Dr. Jing Ma contributed to this research.

Ma J, Karnovsky A, Afshinnia F, Wigginton J, Rader DJ, Natarajan L, Sharma K, Porter AC, Rahman M, He J, Hamm L, Shafi T, Gipson D, Gadegbeku C, Feldman H, Michailidis G, Pennathur S. 2019. Differential network enrichment analysis reveals novel lipid pathways in chronic kidney disease. Bioinformatics. doi: 10.1093/bioinformatics/btz114.

Additional references:

Guo J, Levina E, Michailidis G, Zhu J. 2011. Joint estimation of multiple graphical models. Biometrika, 98(1), 1-15.

Ma J, Michailidis G. 2016. Joint structural estimation of multiple graphical models. The Journal of Machine Learning Research, 17(1), 5777-5824.

Ma J, Shojaie A, Michailidis G. 2016. Network-based pathway enrichment analysis with incomplete network information. Bioinformatics, 32(20), 3165-3174.