
Downloads and Demos from Michigan

Downloads

AAN previous version
The AAN corpus includes three networks: paper citation, author citation, and author collaboration. The paper citation network (paper-citation-network.txt) is a directed network whose nodes are labeled with paper ids that correspond to individual papers (acl-metadata.txt). The author citation network (author-citation-network.txt), also directed, is compiled from the paper network and the metadata file: for each citation in the paper network, where paper A cites paper B, an edge is created from each author of paper A to each author of paper B. The author collaboration network (author-collaboration-network.txt), an undirected network, is composed of authors where, for each paper in the paper citation network, an edge is created between every pair of collaborators on that paper.
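The construction of the two author networks described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used to build the corpus. The function name, the in-memory input shapes (a list of citing/cited pairs and a paper-to-authors dictionary), and the paper ids are assumptions, since the file formats are not described here. Collapsing repeated edges into a set and keeping self-edges are also assumptions.

```python
from itertools import combinations, product


def build_author_networks(paper_citations, paper_authors):
    """Derive author networks from a paper citation network.

    paper_citations: iterable of (citing_paper_id, cited_paper_id) pairs.
    paper_authors:   dict mapping paper id -> list of author names.
    Both input shapes are hypothetical, chosen for illustration only.
    """
    # Directed author citation network: when paper A cites paper B,
    # add an edge from every author of A to every author of B.
    author_citation = set()
    papers_in_network = set()
    for citing, cited in paper_citations:
        papers_in_network.update((citing, cited))
        for a, b in product(paper_authors.get(citing, ()),
                            paper_authors.get(cited, ())):
            author_citation.add((a, b))

    # Undirected collaboration network: for each paper in the citation
    # network, connect every pair of its authors. Each edge is stored as
    # a sorted tuple so that (x, y) and (y, x) are the same edge.
    collaboration = set()
    for paper in papers_in_network:
        authors = sorted(set(paper_authors.get(paper, ())))
        collaboration.update(combinations(authors, 2))

    return author_citation, collaboration


# Hypothetical toy data: P1 cites P2; P3 is not in the citation network.
citations = [("P1", "P2")]
authors = {"P1": ["A", "B"], "P2": ["B", "C"], "P3": ["X", "Y"]}
author_citation, collaboration = build_author_networks(citations, authors)
```

In this toy example author B appears on both papers, so the sketch produces a self-edge from B to B. Whether the distributed network keeps such edges is not stated in the description.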
Cartoons
Clairlib
The Clair library (Clairlib) is a suite of open-source Perl modules intended to simplify a number of generic tasks in natural language processing (NLP), information retrieval (IR), and network analysis (NA). Its architecture also allows external software to be plugged in with very little effort. (temporarily unavailable)
CreateDebate
Download
Cross-document Structure Theory Bank
FRAUD
CLAIR collection of fraud email.
MEAD
A prerequisite for all Clairlib versions (temporarily unavailable)
MEAD Evaluation add-on
An Evaluation Framework for Extractive Summarization (temporarily unavailable)
Near Duplicate Detection
A C++ package for detecting near-duplicate documents in a large corpus
Node Similarity Measures
A C++ library for computing similarity between nodes in a graph. The library supports the following similarity measures:
  • SimRank
  • Random walk based similarity measure
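For readers unfamiliar with SimRank, the following Python sketch shows the standard iterative formulation, in which two nodes are similar if they are pointed to by similar nodes. It is not the code of the C++ library listed above and does not reflect its API. The function name, the `decay` and `iterations` parameters, and the input format (a dictionary from each node to its in-neighbors, with every node present as a key) are assumptions made for illustration.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank.

    in_neighbors: dict mapping each node to a list of its in-neighbors.
                  Every node, including those with no in-neighbors,
                  must appear as a key (assumed input format).
    Returns a nested dict sim[a][b] of similarity scores.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}

    for _ in range(iterations):
        new_sim = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node with no in-neighbors is similar to nothing else.
                    new_sim[a][b] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors,
                # scaled by the decay factor.
                total = sum(sim[x][y] for x in ia for y in ib)
                new_sim[a][b] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Toy graph with edges u -> a and u -> b.
scores = simrank({"u": [], "a": ["u"], "b": ["u"]})
```

This version costs quadratic time in the number of nodes per iteration, so it is only suitable for small graphs.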

Publication Classification
Contains 383 papers manually classified into 31 research areas using session information.
Reference Scope Identification in Citing Sentences
Download
Relational Classification Dataset
  • Contains 380 papers manually classified into the three research areas of Machine Translation, Dependency Parsing and Summarization.
  • Contains authorship information, venue information, title, and citation information for all the papers.

Similarity
Download
String Similarity Measures
A C++ package for computing similarity between strings. The package supports the following similarity measures:
  • Cosine Similarity
  • Jaccard Similarity
  • Similarity based on Levenshtein Distance
  • P-Spectrum Kernel
  • Length-Weighted Kernel
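As a rough guide to what some of these measures compute, here is a small Python sketch using common textbook definitions. It is not the C++ package and says nothing about its interface. All function names are hypothetical, whitespace tokenization for the cosine and Jaccard measures is an assumption, and normalizing the Levenshtein distance by the longer string's length is only one of several possible conventions. The length-weighted kernel is omitted because its definition is not given here.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between whitespace-token count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """Size of the token-set intersection divided by the union."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Distance rescaled to [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein_distance(a, b) / longest if longest else 1.0


def p_spectrum_kernel(a, b, p=2):
    """Inner product of the counts of all length-p substrings."""
    ca = Counter(a[i:i + p] for i in range(len(a) - p + 1))
    cb = Counter(b[i:i + p] for i in range(len(b) - p + 1))
    return sum(ca[u] * cb[u] for u in ca)
```

For example, `levenshtein_distance("kitten", "sitting")` is 3, and `jaccard_similarity("a b c", "b c d")` is 0.5 because two of the four distinct tokens are shared.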

SUMMBank
A collection of summaries used in the JHU workshop in 2001
Surveyor
Paper collection. (coming soon)

Demos

Lexical networks and lexical centrality
Graph-based semi-supervised learning