README Node similarity measures refers to computing similarity between two objects when represented using graphs. The objects are represented as nodes and an edge represents a relation between the objects. This package implements node similarity algorithms described in the following papers 1. G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD'02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 538-543. ACM Press, 2002. 2. David Harel , Yehuda Koren, Clustering spatial data using random walks, Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, p.281-286, August 26-29, 2001, San Francisco, California INSTALLATION We have included a Makefile along with the package and the code does not have any external dependencies. Therefore the code can be compiled using the make command at the terminal prompt. EXECUTION You can run the code to compute random walk based similarity as follows ./random_walk_similarity input_graph_file k The SimRank similarity can be computed as follows ./simrank input_graph_file num_iterations INPUT FORMAT The first argument to both programs is the input graph and it should be formatted as follows Number of vertices should be mentioned on the first line, followed by edges on each line. Each edge should be formatted as Node_label1\tNode_label2\tWeight The second parameter to the random walk similarity measure, K specifies the length of the random walk to be performed. The second parameter to the SimRank similarity measure, num_iterations specifies the number of iterations to run the simrank equations. OUTPUT FORMAT The code outputs the similarity values between the nodes as follows Node_label1tNode_label2tSimilarity_score BRIEF EXPLANATION SimRank SimRank is a general similarity measure, based on a simple and intuitive graph-theoretic model. SimRank is applicable in any domain with object-to-object relationships, that measures similarity of the structural context in which objects occur, based on their relationships with other objects. Effectively, SimRank is a measure that says "two objects are similar if they are related to similar objects." Random Walk Similarity neighborhoods The Harel and Koren's similarity measure works using a fairly simple idea using random walks. Starting from node v, we perform a random walk for k steps and probability of reaching node i during the random walk. Similarly, we start a random walk from node u, and obtain the vector of probabilities p_u[i]. Now the similarity between u and v is computed as the similarity between these two vectors. To compute the vector similarity, this is the suggested function in their paper. sim(u,v) = e^(2k- sum(abs(p_v[i] - p_u[i]))) - 1