CSTBank
Computational Linguistics and Information Retrieval (CLAIR)
University of Michigan
CSTBank Phase I
Characteristics of families
| Family | Source(s) | No. of clusters | Clustering method | Publicly available? (archive) |
| --- | --- | --- | --- | --- |
| duc01 | DUC01 data | 60 | automatic | No |
| duc01trial | DUC01 sample data | 4 | automatic | No |
| duc02 | DUC02 data | 60 | automatic | No |
| duc03 | DUC03 data | 60 | automatic | No |
| hknews | HKNews corpus | 40 | automatic | No |
| manual | various online news agencies | 10 | manual | manual.tar.gz |
| manual2 | Usenet groups | 2 | semi-manual | manual2.tar.gz |
| mds | online news agencies | 6 | manual | mds.tar.gz |
| nie | NewsInEssence | 50 | automatic | nie.tar.gz |
| novelty02 | TREC2002 Novelty Track | 53 | automatic | No |
| other | miscellaneous | 1 | automatic | No |
| tdt-pilot | Topic Detection and Tracking pilot data | 25 | automatic | No |
| tdt2 | Topic Detection and Tracking 2 | 100 | automatic | No |