The Curse of Popularity: How High Degree Nodes in Graphs Distort the Relevance Estimated by Gaussian Random Projection 

Tvrtko Tadic (Microsoft)
-
CLK 219

Random projections (RP) provide a simple and elegant approach to perform dimensionality reduction. In the context of large graphs, they can be applied to generate low-dimensional encodings of a node’s connectivity to other nodes in the graph. In this work, we provide an in-depth analysis of the behavior of three popular ways to measure relevance between the nodes estimated by Random Projections. We conclude that in two cases relevance estimation for high degree nodes will often be a significant miss. Such fail modes are important in practical applications, since high degree nodes often exert high-influence in the graph. We develop results for the specific use of random projection in ranking, and illustrate its consequences with a real graph dataset from Wikipedia. This work builds on many experiments the Graph Intelligence Sciences team at Microsoft did to compute relevance between entities (email, documents, people, events, ...) in Office 365. Joint work with Cassiano Becker.

Event Type
Event Subcalendar