Navigate the Crossroad of Statistics, ML/AI and Genomic and Health Science
Abstract: Scalable and robust statistical and ML/AI methods and tools play a pivotal role in trustworthy science by accounting for uncertainty, empowering scientific discovery, and improving interpretability. In this talk, I will discuss the challenges and opportunities as we navigate the crossroad of statistics and ML/AI to empower genomic and health science. Examples include leveraging AI/ML-generated synthetic data to enhance statistical analysis of large biobank data in the presence of missing data, and scalable analysis of large whole-genome sequencing studies and biobanks by leveraging variant functional annotation and ensemble methods. We will discuss the analysis of the UK Biobank data on 500,000 subjects on the cloud platform RAP and of the All of Us data on 400,000 subjects on the NIH cloud platform AnVIL. This talk aims to ignite proactive and thought-provoking discussions, foster cross-disciplinary collaboration, and cultivate open-minded approaches to advance scientific discovery.