This is the demo page for the RoSE 🌹 benchmark introduced in our paper “Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation”.

We provide the following interfaces for browsing the dataset.

  • ACU Explorer
  • ACU Matching Annotations
  • Human Annotations with Different Evaluation Protocols

Please visit here for the GitHub repository of this project.

RoSE 🌹 Benchmark

RoSE can be downloaded with Hugging Face Datasets under Salesforce/rose.

ACU Annotations

The RoSE benchmark contains system outputs annotated with our ACU protocol, divided into four parts:

  • CNNDM, test set annotations
  • CNNDM, validation set annotations
  • XSum, test set annotations
  • SamSum, test set annotations

We summarize the statistics below.

Dataset  Split       #Doc.  #Sys.  #Total Summ.  HF Name
CNNDM    Test        500    12     6000          cnndm_test
CNNDM    Validation  1000   8      8000          cnndm_validation
XSum     Test        500    8      4000          xsum
SamSum   Test        500    8      4000          samsum
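For convenience, here is a minimal sketch of loading one of these parts with the Hugging Face Datasets library. Whether the HF names above map to splits or to configurations depends on the dataset layout, so consult the dataset card if the call below fails.

```python
from datasets import load_dataset

# Load the CNNDM test set ACU annotations.
# "cnndm_test" is the HF name from the table above; if the dataset
# exposes these names as configurations rather than splits, use
# load_dataset("Salesforce/rose", "cnndm_test") instead.
acu_data = load_dataset("Salesforce/rose", split="cnndm_test")

print(acu_data[0])  # inspect one annotated example
```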

Human Annotations with Different Evaluation Protocols

We provide system outputs annotated with four different human evaluation protocols, summarized below.

Protocol   w/ Input Document  w/ Reference Summary  Fine-grained
Prior      ✗                  ✗                     ✗
Ref-free   ✓                  ✗                     ✗
Ref-based  ✗                  ✓                     ✗
ACU        ✗                  ✓                     ✓
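As defined in the paper, the (unnormalized) ACU score of a system summary is the fraction of reference ACUs it matches. The sketch below illustrates the computation; the inputs are hypothetical stand-ins, not the dataset's actual schema.

```python
from typing import List

def acu_score(reference_acus: List[str], matched_acus: List[str]) -> float:
    """Fraction of reference ACUs matched by the system summary.

    Hypothetical inputs: `reference_acus` is the list of ACUs written for
    the reference summary, and `matched_acus` is the subset annotators
    judged as present in the system summary.
    """
    if not reference_acus:
        return 0.0
    return len(matched_acus) / len(reference_acus)

# Example: 3 of 5 reference ACUs matched -> score 0.6
print(acu_score(["a", "b", "c", "d", "e"], ["a", "c", "e"]))
```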

We annotated two sets of system summaries.

  1. Summaries of 12 fine-tuned systems. The Hugging Face data split name is cnndm_protocol.
  2. Zero-shot summaries from large language models (GPT-3, T0), together with summaries from BRIO and BART. The Hugging Face data split name is cnndm_protocol_gpt3. Both splits can be loaded as sketched below.
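A minimal sketch of loading both protocol splits and aggregating per-system scores, assuming hypothetical field names ("system" and "score"); consult the dataset card for the actual schema.

```python
from collections import defaultdict

from datasets import load_dataset

# Load the protocol annotations for the fine-tuned systems and for the
# zero-shot LLMs (split names from the list above).
fine_tuned = load_dataset("Salesforce/rose", split="cnndm_protocol")
zero_shot = load_dataset("Salesforce/rose", split="cnndm_protocol_gpt3")

# Hypothetical schema: each example carries a system identifier and an
# annotation score under one of the four protocols.
per_system = defaultdict(list)
for example in fine_tuned:
    per_system[example["system"]].append(example["score"])

# Print the mean annotation score per system.
for system, scores in per_system.items():
    print(system, sum(scores) / len(scores))
```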