String Similarity Measures


This C++ package computes the similarity between two strings. The following similarity measures are included:
  • Cosine Similarity
  • Jaccard Similarity
  • Length-weighted Kernel
  • P-Spectrum Kernel
  • Levenshtein Similarity
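As an illustration of two of the measures listed above, here is a minimal C++ sketch of a Levenshtein similarity and a p-spectrum kernel. It is not taken from the package: the function names are hypothetical, the normalization of the edit distance to a similarity in [0, 1] is an assumption (the package may normalize differently), and the p-spectrum kernel follows the standard definition as the sum, over all substrings of length p, of the product of their occurrence counts in the two strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Levenshtein edit distance with unit costs for insertion, deletion and
// substitution, computed with two rolling rows of the DP table.
std::size_t levenshtein_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, subst});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Assumed normalization (hypothetical, not confirmed by the package):
// similarity = 1 - distance / max(|a|, |b|), so identical strings score 1.
double levenshtein_similarity(const std::string& a, const std::string& b) {
    std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;  // two empty strings are identical
    return 1.0 - static_cast<double>(levenshtein_distance(a, b)) / longest;
}

// p-spectrum kernel: sum over every length-p substring u of
// (occurrences of u in a) * (occurrences of u in b).
unsigned long long p_spectrum_kernel(const std::string& a,
                                     const std::string& b,
                                     std::size_t p) {
    if (p == 0 || a.size() < p || b.size() < p) return 0;
    std::unordered_map<std::string, unsigned long long> counts;
    for (std::size_t i = 0; i + p <= a.size(); ++i) ++counts[a.substr(i, p)];
    unsigned long long kernel = 0;
    // Each occurrence in b contributes the number of matching occurrences in a.
    for (std::size_t i = 0; i + p <= b.size(); ++i) {
        auto it = counts.find(b.substr(i, p));
        if (it != counts.end()) kernel += it->second;
    }
    return kernel;
}
```

For example, `levenshtein_distance("kitten", "sitting")` is 3, and `p_spectrum_kernel("abab", "bab", 2)` counts the shared 2-grams "ab" and "ba". This hash-map version is the straightforward approach; the suffix-tree techniques in reference 2 compute string kernels more efficiently.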
The package also includes example scripts that compute the similarity between strings stored in two different text files.
The README explains what is included and gives usage instructions.
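To give a rough idea of what comparing strings from two text files can look like, here is a minimal C++ sketch. It is not the package's example script: all function names are hypothetical, and it assumes that each line is one string, that strings are split into whitespace-separated tokens, and that line i of the first input is compared with line i of the second. The package's own scripts may tokenize and pair strings differently; consult the README for the actual usage.

```cpp
#include <cmath>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Count whitespace-separated tokens in a line (assumed tokenization).
std::map<std::string, int> token_counts(const std::string& line) {
    std::map<std::string, int> counts;
    std::istringstream in(line);
    std::string token;
    while (in >> token) ++counts[token];
    return counts;
}

// Jaccard similarity: shared distinct tokens / all distinct tokens.
double jaccard_similarity(const std::string& a, const std::string& b) {
    auto ca = token_counts(a), cb = token_counts(b);
    std::size_t shared = 0;
    for (const auto& kv : ca)
        if (cb.count(kv.first)) ++shared;
    std::size_t total = ca.size() + cb.size() - shared;
    return total == 0 ? 1.0 : static_cast<double>(shared) / total;
}

// Cosine similarity between the token-count vectors of the two lines.
double cosine_similarity(const std::string& a, const std::string& b) {
    auto ca = token_counts(a), cb = token_counts(b);
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (const auto& kv : ca) {
        norm_a += static_cast<double>(kv.second) * kv.second;
        auto it = cb.find(kv.first);
        if (it != cb.end()) dot += static_cast<double>(kv.second) * it->second;
    }
    for (const auto& kv : cb)
        norm_b += static_cast<double>(kv.second) * kv.second;
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;  // empty line
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Compare line i of `first` with line i of `second` (assumed pairing),
// stopping when either input runs out of lines.
std::vector<double> compare_lines(std::istream& first, std::istream& second) {
    std::vector<double> scores;
    std::string x, y;
    while (std::getline(first, x) && std::getline(second, y))
        scores.push_back(jaccard_similarity(x, y));
    return scores;
}
```

Because `compare_lines` takes `std::istream&`, it can be called with two `std::ifstream` objects opened on the text files, or with `std::istringstream` objects for quick experiments.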

Click here to download the package

References
  1. Leslie, C. S., E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In Pacific Symposium on Biocomputing 2002, pp. 566-575.
  2. Vishwanathan, S. V. N., and A. Smola. Fast kernels for string and tree matching. In NIPS 2004.