Precision as a measure of predictability of missing links in real networks

Predicting missing links in real networks is an important open problem in network science to which considerable efforts have been devoted, giving as a result a vast plethora of link prediction methods in the literature. In this work, we take a different point of view on the problem and focus on predictability instead of prediction. By considering ensembles defined by well-known network models, we prove analytically that even the best possible link prediction method, given by the ensemble connection probabilities, yields a limited precision that depends quantitatively on the topological properties—such as degree heterogeneity, clustering, and community structure—of the ensemble. This suggests an absolute limitation to the predictability of missing links in real networks, due to the irreducible uncertainty arising from the random nature of link formation processes. We show that a predictability limit can be estimated in real networks, and we propose a method to approximate such a bound from real-world networks with missing links. The predictability limit gives a benchmark to gauge the quality of link prediction methods in real networks.