CoSQL 1.0

A Conversational Text-to-SQL Challenge
Towards Cross-Domain Natural Language Interfaces to Databases

What is CoSQL?

CoSQL is a corpus for building cross-domain conversational text-to-SQL systems. It is the dialogue version of the Spider and SParC tasks. CoSQL consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz collection of 3k dialogues querying 200 complex databases spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the database and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing the user of unanswerable questions.
CoSQL Paper (EMNLP'19) CoSQL Post
Related challenges: single-turn Spider and multi-turn SParC text-to-SQL tasks. Spider Challenge (EMNLP'18), SParC Challenge (ACL'19)

Why CoSQL?

CoSQL introduces new challenges compared to existing task-oriented dialogue tasks:
  • the dialogue states are grounded in domain-independent SQL programs instead of domain-specific slot-value pairs.
  • because testing is done on unseen databases, success requires generalizing to new domains.

Compared to other semantic parsing/text-to-SQL tasks, CoSQL presents new challenges:
  • user questions are not necessarily answerable.
  • it involves system responses to clarify ambiguous questions, verify returned results, and notify users of unanswerable or unrelated questions.
  • each dialogue is obtained via the Wizard-of-Oz setting between a crowd worker and a SQL expert.

CoSQL includes three tasks:
  • SQL-grounded dialogue state tracking to map user utterances into SQL queries if possible given the interaction history
  • natural language response generation based on an executed SQL and its results for user verification
  • user dialogue act prediction to detect and resolve ambiguous and unanswerable questions

Getting Started

The data is split into training, development, and unreleased test sets. Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

CoSQL Dataset

Details of the baseline models and the evaluation script can be found on the following GitHub site: CoSQL GitHub Page

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we ask you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through the official evaluation of your model:

Submission Tutorial

Data Examples

Some examples look like the following:

[Image: CoSQL data examples]

Have Questions or Want to Contribute?

Ask us questions at our GitHub issues page or contact Tao Yu, Rui Zhang, or Xi Victoria Lin.

We expect the dataset to evolve. We would greatly appreciate it if you could donate your non-private databases or SQL queries to the project.

Acknowledgement

We thank Pranav Rajpurkar for giving us permission to build this website based on SQuAD.

Part of our CoSQL team at YINS:

[Image: CoSQL team photo]

Leaderboard - SQL-grounded Dialogue State Tracking

In CoSQL, user dialogue states are grounded in SQL queries. Dialogue state tracking (DST) in this case means predicting the correct SQL query for each user utterance with the INFORM_SQL label, given the interaction context and the DB schema. Compared to other context-dependent text-to-SQL tasks such as SParC, the DST task in CoSQL also includes ambiguous questions if the user affirms the system's clarification of them. In this case, the system clarification is also given as part of the interaction context used to predict the SQL query corresponding to the question. As in the Spider and SParC tasks, we report results of Exact Set Match without Values here.

Rank | Date | Model | Affiliation | Reference | Question Match | Interaction Match
1 | Aug 30, 2019 | EditSQL | Yale University & Salesforce Research | (Zhang et al. EMNLP '19) code | 40.8 | 13.7
2 | Aug 30, 2019 | CD-Seq2Seq | Yale University & Salesforce Research | (Yu et al. EMNLP '19) code | 13.9 | 2.6
3 | Aug 30, 2019 | SyntaxSQL-con | Yale University | (Yu et al. EMNLP '18) code | 14.1 | 2.2
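
This page does not define the two score columns, so the following is only a sketch under an assumption borrowed from SParC-style evaluation: Question Match is the percentage of individual questions whose predicted SQL matches the gold query, and Interaction Match is the percentage of dialogues in which every such question is matched. The function name and the boolean input format are hypothetical; the official evaluation script on the CoSQL GitHub page is the authority.

```python
from typing import Dict, List


def question_and_interaction_match(per_turn_correct: List[List[bool]]) -> Dict[str, float]:
    """Aggregate per-question match flags into two percentages.

    per_turn_correct[i][j] is True when the predicted SQL for question j of
    interaction i matched the gold SQL (hypothetical input format; the match
    test itself, e.g. exact set match without values, is done elsewhere).
    """
    total_questions = sum(len(turns) for turns in per_turn_correct)
    correct_questions = sum(sum(turns) for turns in per_turn_correct)

    # Assumption: interactions with no scored questions are ignored.
    nonempty = [turns for turns in per_turn_correct if turns]
    fully_correct = sum(all(turns) for turns in nonempty)

    return {
        "question_match": 100.0 * correct_questions / total_questions if total_questions else 0.0,
        "interaction_match": 100.0 * fully_correct / len(nonempty) if nonempty else 0.0,
    }


# Three toy interactions: only the first has every question correct.
flags = [[True, True], [True, False, True], [False]]
scores = question_and_interaction_match(flags)
print(scores)
```

Under this reading, Interaction Match is much stricter than Question Match, which is consistent with the gap between the two columns in the table above.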

Leaderboard - Response Generation from SQL and Query Results

This task requires generating a natural language description of the SQL query and the result for each system response labeled as INFORM_SQL. The input consists of a SQL query, its execution result, and the DB schema. In addition to naturalness and syntactic correctness, preserving logical consistency between the SQL query and the NL response, measured by the Logic Correctness Rate (LCR), is crucial in this task.

Rank | Date | Model | BLEU | Grammar | LCR (%)
1 | Aug 30, 2019 | Template baseline | 9.3 | 4.0 | 41.0
2 | Aug 30, 2019 | Pointer-generator baseline | 15.1 | 3.6 | 35.0
3 | Aug 30, 2019 | Seq2Seq baseline | 14.1 | 3.5 | 27.0

Leaderboard - User Dialogue Act Prediction

A real-world DB querying dialogue system has to decide whether the user question can be mapped to a SQL query or whether special actions are needed. We define a series of dialogue acts for the DB user and the SQL expert (refer to the paper for more details). For example, if the user question can be answered by a SQL query, the dialogue act of the question is INFORM_SQL.

Rank | Date | Model | Accuracy
1 | Aug 30, 2019 | TBCNN-pair baseline | 83.9
2 | Aug 30, 2019 | Majority baseline | 62.8
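
To make the Majority baseline row concrete, here is a minimal sketch of how such a baseline is typically computed: always predict the most frequent dialogue act seen in training and measure accuracy on held-out utterances. It assumes one act label per utterance, which may not match the official setup. The function name and the toy data are hypothetical, and apart from INFORM_SQL the label strings are placeholders rather than a statement of the dataset's label set.

```python
from collections import Counter
from typing import List


def majority_baseline_accuracy(train_labels: List[str], test_labels: List[str]) -> float:
    """Accuracy (%) of always predicting the most frequent training label."""
    if not train_labels or not test_labels:
        raise ValueError("both label lists must be non-empty")
    # most_common(1) returns a one-element list of (label, count).
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return 100.0 * hits / len(test_labels)


# Toy data: INFORM_SQL is the most frequent training label.
train = ["INFORM_SQL", "INFORM_SQL", "THANK_YOU", "AMBIGUOUS"]
test = ["INFORM_SQL", "AMBIGUOUS", "INFORM_SQL", "INFORM_SQL"]
accuracy = majority_baseline_accuracy(train, test)
print(accuracy)
```

A baseline like this gives the floor that a learned classifier needs to clear for its accuracy to be meaningful.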