README

We have manually annotated all the publications in ACL 2005-2008 based on session information.

DATASET CREATION

We compiled session information from three different conferences (COLING, ACL and EMNLP) from 2005 to 2008 and merged semantically similar sessions. For example, we merged "Text Categorization" with "Text Classification". Below is the list of the 31 sessions we chose.

Applications
Coreference
Corpus Annotation
Discourse and Dialogue
Generation
Grammar Induction
Grammars
Inference and Entailment
Information Extraction
Information Retrieval
Lexical Acquisition from Corpora
Lexical Issues
Machine Learning and Statistical Methods
Machine Translation
Morphology
Multimodality and Situated Language Processing
Named Entity
Parsing
Question Answering
Segmentation
Semantic Role Labeling
Semantics
Sentiment and Opinion
Speech and Language Modeling
Speech Processing
Summarization
Tagging
Text Classification
Topic Modeling
Web Corpora
Word Sense Disambiguation

We manually annotated 383 publications from ACL 2005-2008 with the 31 classes above.

FILES INCLUDED

acl_topics.txt: Each line of this file contains three fields. The first field is the class name (session name), the second field is the ACL id, and the last field is the title of the paper.

text_content: This directory contains the text files of all the papers included in this data set.
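The following minimal Python sketch reads acl_topics.txt into (class, ACL id, title) records and counts papers per class. It assumes the three fields are tab-separated and the file is UTF-8 encoded; this README specifies neither, so adjust `sep` and the encoding if needed. The names `Annotation`, `parse_line`, `load_topics` and `count_by_session` are hypothetical helpers for illustration, not part of the data set.

```python
from collections import Counter
from typing import List, NamedTuple


class Annotation(NamedTuple):
    session: str  # class (session) name, e.g. "Machine Translation"
    acl_id: str   # ACL id of the paper
    title: str    # paper title


def parse_line(line: str, sep: str = "\t") -> Annotation:
    """Split one line of acl_topics.txt into its three fields.

    ASSUMPTION: fields are tab-separated. The README does not name the
    delimiter; pass a different `sep` if the file uses something else.
    maxsplit=2 keeps any separator that appears inside the title intact.
    Raises ValueError if a line has fewer than three fields.
    """
    session, acl_id, title = line.rstrip("\r\n").split(sep, 2)
    return Annotation(session.strip(), acl_id.strip(), title.strip())


def load_topics(path: str, sep: str = "\t") -> List[Annotation]:
    """Parse every non-blank line of the file (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line, sep) for line in f if line.strip()]


def count_by_session(records: List[Annotation]) -> Counter:
    """Return the number of annotated papers per class."""
    return Counter(r.session for r in records)


# Example usage (path relative to the data set root):
#   records = load_topics("acl_topics.txt")
#   for session, n in count_by_session(records).most_common():
#       print(n, session)
```

Splitting on a tab rather than on arbitrary whitespace matters here because many class names contain spaces (for example "Word Sense Disambiguation"). If the file matches the description above, the per-class counts should sum to 383.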