Relational Classification Dataset

This classification dataset contains 380 scientific publications from the ACL Anthology Network (AAN), manually classified into three research areas ("Machine Translation", "Dependency Parsing" and "Summarization"). It is a relational dataset because we have included metadata for the papers: citation information, authorship information, venue information and year of publication.
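
The citation information is what makes link-based similarity measures possible. As a rough illustration only, the sketch below computes two common ones, cocitation (how many papers cite both members of a pair) and bibliographic coupling (how many references a pair shares), from a list of (citing, cited) pairs. The paper ids are made up, and the edge list is assumed to be already parsed; the sketch does not assume any particular file format.

```python
from collections import defaultdict
from itertools import combinations


def _shared_neighbour_counts(groups):
    """Count how often each unordered pair appears together in a group."""
    counts = defaultdict(int)
    for members in groups:
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    return dict(counts)


def cocitation_counts(edges):
    """For each pair of papers, the number of papers that cite both."""
    references = defaultdict(set)  # citing paper -> papers it cites
    for citing, cited in edges:
        references[citing].add(cited)
    return _shared_neighbour_counts(references.values())


def coupling_counts(edges):
    """For each pair of papers, the number of references they share."""
    citers = defaultdict(set)  # cited paper -> papers that cite it
    for citing, cited in edges:
        citers[cited].add(citing)
    return _shared_neighbour_counts(citers.values())


# Hypothetical ids: A and B stand for labeled papers, X, Y and Z for
# other papers that cite them.
edges = [("X", "A"), ("X", "B"), ("Y", "A"), ("Y", "B"), ("Z", "A")]

cocitation = cocitation_counts(edges)
coupling = coupling_counts(edges)
```

In this toy example the similarity between A and B comes entirely from X and Y, so papers outside a labeled subset can still contribute to the score of a pair inside it.
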

Here is a description of the files included.

aan_mds
|
|-----metadata.txt     Contains the id, title, authorship, venue and the class information for all the papers.
|
|-----papers_text      This directory contains the full text of the 380 papers. We obtained this text by converting
|                      the PDF of the paper to text using PDFBox.
|
|-----citations.txt    The file contains citations between ALL the papers in the AAN data set, not just the
                       citations between the 380 papers in the dataset. This is because many link/citation
                       similarity measures, like cocitation or coupling, compute similarity between two papers
                       using citations between other papers.

Here is a complete README which explains the selection process for the publications, the annotation process and the format of the different files.

Click here to download this data set.

Papers that have used this dataset