*Special Date, Time, and Location:
December 10, 2018, 3:00 - 4:00 pm, M3-A805
Donghui Yan, University of Massachusetts Dartmouth
Random Projection Forests
In this talk, I will introduce a new tool for data mining and inference: Random Projection Forests (rpForests). rpForests is an ensemble of random projection trees constructed recursively through a series of carefully chosen random projections. rpForests combines the power of ensemble methods with the flexibility of trees; it is simple to implement, highly scalable, and readily adapts to the geometry of the underlying data. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers, with running time nearly inversely proportional to the number of cores or computers used in the computation. This complements the previous development of an unsupervised extension to Random Forests, Cluster Forests, which aims at clustering by random feature pursuits. One potential use of rpForests is to leverage the locality of data points that it captures: the probability of neighboring points being separated decays exponentially fast as the ensemble size increases. We discuss two applications along this line: fast k-nearest neighbor (kNN) search, and deep representation learning in the scoring of tissue microarray images.
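
To make the idea concrete, the following is a minimal sketch of an ensemble of random projection trees used for approximate kNN search. It is an illustration only, not the speaker's implementation. The function names (`build_tree`, `build_forest`, `find_leaf`, `knn_query`), the median split rule, and the `leaf_size` and `n_trees` parameters are all assumptions made for this example. In particular, each direction here is drawn uniformly at random, whereas the abstract refers to carefully chosen random projections.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_tree(points, idx, rng, leaf_size):
    """Recursively split the points indexed by idx along random directions."""
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    direction = random_unit_vector(len(points[0]), rng)
    proj = {i: dot(points[i], direction) for i in idx}
    # Assumption: split at the median projection; other split rules are possible.
    split = sorted(proj.values())[len(idx) // 2]
    left = [i for i in idx if proj[i] < split]
    right = [i for i in idx if proj[i] >= split]
    if not left or not right:  # degenerate split, e.g. duplicated points
        return {"leaf": idx}
    return {
        "direction": direction,
        "split": split,
        "left": build_tree(points, left, rng, leaf_size),
        "right": build_tree(points, right, rng, leaf_size),
    }


def build_forest(points, n_trees=20, leaf_size=10, seed=0):
    """Build n_trees independent trees; each one could run on its own core."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    return [build_tree(points, idx, rng, leaf_size) for _ in range(n_trees)]


def find_leaf(tree, q):
    """Route a query point down one tree to the leaf it falls in."""
    while "leaf" not in tree:
        side = "left" if dot(q, tree["direction"]) < tree["split"] else "right"
        tree = tree[side]
    return tree["leaf"]


def knn_query(forest, points, q, k=5):
    """Approximate kNN: pool the query's leaf from every tree, then rank by distance."""
    candidates = set()
    for tree in forest:
        candidates.update(find_leaf(tree, q))
    ranked = sorted(candidates, key=lambda i: (math.dist(points[i], q), i))
    return ranked[:k]
```

Pooling leaves across trees is what uses the locality property mentioned in the abstract: a true neighbor is missed only if every tree separates it from the query, so adding trees makes a miss less likely. Because the trees are built independently, the loop in `build_forest` is the natural place to parallelize.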