Clairlib |
MEAD |
MEAD Evaluation add-on |
AAN: The AAN corpus includes three networks: paper citation, author citation, and author collaboration. The paper citation network (paper-citation-network.txt) is a directed network whose nodes are labeled with paper ids that correspond to individual papers (acl-metadata.txt). The author citation network (author-citation-network.txt), a directed network, is compiled from the paper network and the metadata file: for each citation in the paper network, where paper A cites paper B, and for each author of paper A, an edge is created from that author to each author of paper B. The author collaboration network (author-collaboration-network.txt), an undirected network, is composed of authors where, for each paper in the paper citation network, an edge is created between each pair of collaborators on that paper. Download |
CSTBank: Cross-document Structure Theory Bank Download |
Surveyor: paper collection Download |
Cartoons: data set Download |
CreateDebate: data set Download |
Similarity: data set Download |
FRAUD: CLAIR collection of fraud email Download |
SUMMBank: a collection of summaries used in the JHU workshop in 2001 Download |
String Similarity Measures A C++ package for computing similarity between strings. The package supports the following similarity measures
|
Node Similarity Measures A C++ library for computing similarity between nodes in a graph. The library supports the following similarity measures
|
Relational Classification Dataset
|
Publication Classification
|
Near Duplicate Detection A C++ package for detecting near-duplicate documents in a large corpus |
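
The construction of the two AAN author networks described in the AAN entry above can be sketched as follows. This is a minimal illustration, not the code used to build the corpus. It assumes the paper citation network has already been loaded as (citing, cited) pairs and the metadata as a mapping from paper id to author names. The function names, the in-memory representation, and the choice to collapse repeated edges into a set are all assumptions made for this example; the on-disk formats of the .txt files are not modeled here.

```python
from itertools import combinations


def author_citation_edges(paper_citations, paper_authors):
    """Directed author edges: every author of the citing paper
    points to every author of the cited paper.

    paper_citations: iterable of (citing_id, cited_id) pairs (assumed format).
    paper_authors:   dict mapping paper id -> list of author names (assumed format).
    """
    edges = set()  # assumption: repeated edges are collapsed, not counted
    for citing, cited in paper_citations:
        for a in paper_authors.get(citing, []):
            for b in paper_authors.get(cited, []):
                edges.add((a, b))
    return edges


def author_collaboration_edges(paper_citations, paper_authors):
    """Undirected author edges: one edge per pair of co-authors of
    each paper that appears in the paper citation network."""
    papers = {p for edge in paper_citations for p in edge}
    edges = set()
    for p in papers:
        # Sorting gives each undirected edge a single canonical form.
        for a, b in combinations(sorted(set(paper_authors.get(p, []))), 2):
            edges.add((a, b))
    return edges


if __name__ == "__main__":
    # Hypothetical toy data: paper P1 (Ann, Bob) cites paper P2 (Cat).
    citations = [("P1", "P2")]
    authors = {"P1": ["Ann", "Bob"], "P2": ["Cat"]}
    print(sorted(author_citation_edges(citations, authors)))
    print(sorted(author_collaboration_edges(citations, authors)))
```

Note that under this sketch an author who cites their own earlier paper produces a self-loop in the author citation network. Whether the released files keep or drop such edges is not stated in the description above.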