Computational Linguistics and Information Retrieval (CLAIR)
University of Michigan
CSTBank Phase I
Characteristics of families
Family | Source(s) | No. of clusters | Clustering method | Publicly available? |
duc01 | DUC01 data | 60 | automatic | No |
duc01trial | DUC01 sample data | 4 | automatic | No |
duc02 | DUC02 data | 60 | automatic | No |
duc03 | DUC03 data | 60 | automatic | No |
hknews | HKNews corpus | 40 | automatic | No |
manual | various online news agencies | 10 | manual | manual.tar.gz |
manual2 | Usenet groups | 2 | semi-manual | manual2.tar.gz |
mds | online news agencies | 6 | manual | mds.tar.gz |
nie | NewsInEssence | 50 | automatic | nie.tar.gz |
novelty02 | TREC2002 Novelty Track | 53 | automatic | No |
other | misc. | 1 | automatic | No |
tdt-pilot | Topic Detection and Tracking pilot data | 25 | automatic | No |
tdt2 | Topic Detection and Tracking 2 | 100 | automatic | No |