Majid Yazdani was awarded the EPFL PhD degree for his work on Similarity Learning Over Large Collaborative Networks.
Abstract
In this thesis, we propose novel solutions to similarity learning problems on collaborative networks. Similarity learning is essential for modeling and predicting the evolution of collaborative networks. In addition, similarity learning is used to perform ranking, which is the main component of recommender systems. Due to the low cost of developing such collaborative networks, they grow very quickly, and therefore our objective is to develop models that scale well to large networks.
The similarity measures proposed in this thesis make use of the global link structure of the network and of the attributes of the nodes in a complementary way. We first define a random walk model, named Visiting Probability (VP), to measure proximity between two nodes in a graph. VP considers all the paths between two nodes collectively and thus reduces the effect of potentially unreliable individual links. Moreover, using VP and the structural characteristics of small-world networks (a frequent type of network), we design scalable algorithms based on VP similarity. We then model the link structure of a graph within a similarity learning framework, in which the transformation of nodes to a latent space is trained using a discriminative model. When trained over VP scores, the model is able to better predict the relations in a graph in comparison to models learned directly from the network’s links.
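The abstract does not give the formal definition of VP, so the following is only a minimal sketch under one plausible reading: the probability that a random walk starting at a source node reaches a target node within a fixed number of steps. The function name `visiting_probability`, the `max_steps` truncation, and the dense adjacency matrix are assumptions made for illustration, not the thesis's actual formulation or its scalable algorithms.

```python
import numpy as np

def visiting_probability(adj, source, target, max_steps=10):
    """Probability that a random walk from `source` reaches `target`
    within `max_steps` steps (illustrative reading of VP, an assumption)."""
    adj = np.asarray(adj, dtype=float)
    row_sums = adj.sum(axis=1, keepdims=True)
    # Row-normalise link weights into transition probabilities.
    # Nodes without outgoing links keep an all-zero row (the walk stops there).
    P = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)

    if source == target:
        return 1.0

    # mass[i]: probability of being at node i without having visited target yet.
    mass = np.zeros(adj.shape[0])
    mass[source] = 1.0
    visited = 0.0
    for _ in range(max_steps):
        mass = mass @ P            # advance the walk by one step
        visited += mass[target]    # walks arriving at target for the first time
        mass[target] = 0.0         # absorb them so they are not counted twice
    return visited

# Toy path graph 0 - 1 - 2 with unit link weights.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
print(visiting_probability(path_graph, 0, 2, max_steps=2))
```

On this toy graph the two-step value is 0.5, since the walk must pass through node 1 and then picks node 2 with probability one half. Because every path of bounded length contributes to the total, a single unreliable link changes the score less than it would change a shortest-path measure, which is the intuition the abstract describes.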
Using the VP approach, we explain how to transfer knowledge from a hypertext encyclopedia to text analysis tasks. We consider the graph of Wikipedia articles with two types of links between them: hyperlinks and content similarity links. To transfer the knowledge learned from the Wikipedia network to text analysis tasks, we propose and test two shared representation methods. In the first one, a given text is mapped to the corresponding concepts in the network. Then, to compute similarity between two texts, VP similarity is applied to compute the distance between the two sets of nodes. The second method uses the latent space model for representation, by training a transformation from words to the latent space over VP scores. We test our proposals on several benchmark tasks: word similarity, document similarity / clustering / classification, information retrieval, and learning to rank. The results are most often competitive with state-of-the-art task-specific methods, thus demonstrating the generality of our proposal. These results also support the hypothesis that both types of links over Wikipedia are useful, as the improvement is higher when both are used.
In many collaborative networks, different link types can be used in a complementary way.
Congratulations to him.
For more details, please download the thesis here: Similarity Learning Over Large Collaborative Networks