SParC 1.0

Yale & Salesforce Semantic Parsing and Text-to-SQL in Context Challenge

What is SParC?

SParC is a dataset for cross-domain Semantic Parsing in Context. It is the context-dependent, multi-turn version of the Spider task, a complex and cross-domain text-to-SQL challenge. SParC consists of 4,298 coherent question sequences (12k+ unique individual questions annotated with SQL queries by 14 Yale students), obtained from user interactions with 200 complex databases over 138 domains.

XLANG Lab for Building LLM/VLM Agents | SParC Paper (ACL'19) | SParC Post

Related Works from XLANG Lab: Spider 2.0 Text-to-SQL ('24), Spider2-V ('24), OSWorld ('24), DS-1000 Challenge (ICML'23), Binder Framework (ICLR '23), UnifiedSKG Framework (EMNLP'22), Spider Challenge (EMNLP'18), CoSQL Challenge (EMNLP'19)

News

Nov 12, 2024: We have released the full Spider 2.0 paper, data, and code. Follow the guideline to submit your scores to the leaderboard!

Aug 28, 2024: The early access version of Spider 2.0 (a more realistic and challenging text-to-SQL task) is now available! We expect to release the whole dataset in 1-2 weeks. As this is a preliminary release, there may be errors. Your feedback would be invaluable in refining the dataset!

Why SParC?

SParC is built upon the Spider dataset. Compared to other existing context-dependent semantic parsing/text-to-SQL datasets such as ATIS, SParC:
  • has complex contextual dependencies (annotated by 15 Yale computer science students);
  • has greater semantic diversity due to its coverage of complex SQL logic patterns from the Spider dataset;
  • requires generalization to new domains due to its cross-domain nature and the unseen databases at test time.

Getting Started

The data is split into training, development, and unreleased test sets. Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

SParC Dataset

Details of the baseline models and the evaluation script can be found on the SParC GitHub Page.

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we request you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through the official evaluation of your model:

Submission Tutorial

Data Examples

Some examples look like the following:

[Example figure]

Another example:

[Example figure]
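Each SParC example is a sequence of questions about one database, where later questions can refer back to earlier ones. The Python sketch below models such an interaction and shows one simple way to give a model context: concatenating the current question with all previous questions. The field names (`database_id`, `utterance`, `query`), the `<s>` separator, and the example interaction are illustrative assumptions, not the official data format; check the SParC GitHub Page for the actual JSON layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    utterance: str  # natural-language question asked at this turn
    query: str      # gold SQL annotated for this turn


@dataclass
class Interaction:
    database_id: str                                  # which database the user is querying
    turns: List[Turn] = field(default_factory=list)   # questions in the order they were asked


def build_contextual_input(interaction: Interaction, turn_index: int, sep: str = " <s> ") -> str:
    """Join every utterance up to and including turn_index, oldest first.

    The separator token is a hypothetical choice; a real model would use
    whatever special token its tokenizer expects.
    """
    if not 0 <= turn_index < len(interaction.turns):
        raise IndexError(f"turn_index {turn_index} out of range")
    history = [turn.utterance for turn in interaction.turns[: turn_index + 1]]
    return sep.join(history)


# Hypothetical interaction: "them" in turns 2 and 3 only makes sense given earlier turns.
example = Interaction(
    database_id="concert_singer",
    turns=[
        Turn("Show all singers.", "SELECT name FROM singer"),
        Turn("Which of them are from France?", "SELECT name FROM singer WHERE country = 'France'"),
        Turn("Order them by age.", "SELECT name FROM singer WHERE country = 'France' ORDER BY age"),
    ],
)

print(build_contextual_input(example, 2))
# Show all singers. <s> Which of them are from France? <s> Order them by age.
```

Concatenation is only the most basic way to use context; models on the leaderboard below use a variety of more elaborate mechanisms.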

Have Questions or Want to Contribute?

Ask us questions at our GitHub issues page or contact Tao Yu, Rui Zhang, or Xi Victoria Lin.

We expect the dataset to evolve. We would greatly appreciate it if you could donate your non-private databases or SQL queries to the project.

Acknowledgement

We thank Tianze Shi and the anonymous reviewers for their valuable comments on this project, and Melvin Gruesbeck for designing the example illustrations. We also thank Pranav Rajpurkar for giving us permission to build this website based on SQuAD.

Part of our SParC team at YINS:

[Team photo]

Leaderboard - Execution with Values

Our current models do not predict any values in SQL conditions, so we do not provide execution accuracies for them. However, we encourage you to report execution accuracy in future submissions. For value prediction, your model should be able to 1) copy values from the question input, 2) retrieve them from the database content (which is available), or 3) generate numbers (e.g. 3 in "LIMIT 3").
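To make the three value sources concrete, here is a deliberately naive Python sketch. The function name `extract_candidate_values`, the quoted-span heuristic for copying, the whole-word matching against database cells, and the small number-word table are all hypothetical choices for illustration, not the method of any model on this leaderboard.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mapping from number words to integers (e.g. for LIMIT).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def extract_candidate_values(question: str, db_cells: Dict[str, List[str]]) -> Dict[str, list]:
    """Collect candidate SQL values from the three sources listed above (naive heuristics)."""
    q_lower = question.lower()

    # 1) Copy from the question: here, any span inside double quotes.
    copied: List[str] = re.findall(r'"([^"]+)"', question)

    # 2) Retrieve from database content: cell values that occur as whole words in the question.
    retrieved: List[Tuple[str, str]] = []
    for column, cells in db_cells.items():
        for cell in cells:
            if re.search(r"\b" + re.escape(cell.lower()) + r"\b", q_lower):
                retrieved.append((column, cell))

    # 3) Generate numbers: digits plus a few spelled-out number words.
    numbers: List[int] = [int(tok) for tok in re.findall(r"\b\d+\b", question)]
    numbers += [NUMBER_WORDS[w] for w in re.findall(r"[a-z]+", q_lower) if w in NUMBER_WORDS]

    return {"copied": copied, "retrieved": retrieved, "numbers": numbers}


# Hypothetical database content, keyed by "table.column".
cells = {"singer.country": ["France", "Netherlands"], "singer.name": ["Joe Sharp"]}
print(extract_candidate_values("Show the three oldest singers from France.", cells))
# {'copied': [], 'retrieved': [('singer.country', 'France')], 'numbers': [3]}
```

Real systems typically rank these candidates and decide which condition each value belongs to; this sketch only gathers them.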

Rank | Date | Model | Institution | Reference | Question Match | Interaction Match
---|---|---|---|---|---|---
1 | Jun 4, 2022 | RASAT + PICARD | SJTU LUMIA & Netmind.AI | (Qi et al., EMNLP'22) code | 74.0 | 52.6
2 | May 24, 2020 | TreeSQL V2 + BERT | Anonymous | | 48.5 | 21.6
3 | May 21, 2020 | GAZP + BERT | University of Washington & Facebook AI Research | (Zhong et al., EMNLP '20) | 44.6 | 19.7
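Both leaderboards report two scores per model. Question Match is the percentage of individual questions predicted correctly; Interaction Match is the percentage of interactions in which every question is predicted correctly, a stricter criterion. The Python sketch below computes both from per-turn correctness flags; the function name and input layout are hypothetical, and official numbers come from the evaluation script on the SParC GitHub Page.

```python
from typing import List, Tuple


def question_and_interaction_match(results: List[List[bool]]) -> Tuple[float, float]:
    """results[i][j] is True when turn j of interaction i was predicted correctly.

    Returns (question_match, interaction_match) as percentages.
    """
    total_questions = sum(len(turns) for turns in results)
    if not results or total_questions == 0:
        raise ValueError("need at least one interaction with at least one question")
    correct_questions = sum(sum(turns) for turns in results)          # True counts as 1
    correct_interactions = sum(1 for turns in results if all(turns))  # every turn must be right
    question_match = 100.0 * correct_questions / total_questions
    interaction_match = 100.0 * correct_interactions / len(results)
    return question_match, interaction_match


# Three hypothetical interactions: one error in the second one fails that whole interaction.
qm, im = question_and_interaction_match([[True, True, True], [True, False], [True, True]])
print(f"Question Match {qm:.1f}, Interaction Match {im:.1f}")
# Question Match 85.7, Interaction Match 66.7
```

A single wrong turn costs one question but an entire interaction, which is consistent with every entry in these tables scoring lower on Interaction Match.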

Leaderboard - Exact Set Match without Values

For exact matching evaluation, instead of simply comparing the predicted and gold SQL queries as strings, we decompose each SQL query into several clauses and conduct a set comparison within each clause. Please refer to the paper and the GitHub page for more details.
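As a toy illustration of clause-wise set comparison, the Python sketch below masks literal values, splits a query at clause keywords, and compares each clause as an unordered set of items. It is not the official evaluation script, which is considerably more thorough; `split_clauses` and `clause_set_match` are hypothetical names, and queries with subqueries, `OR`, or `BETWEEN` fall outside what this sketch handles.

```python
import re
from typing import Dict, FrozenSet

# Clause keywords this toy splitter recognizes (no nesting support).
CLAUSE_PATTERN = re.compile(r"\b(select|from|where|group by|having|order by|limit)\b")


def split_clauses(sql: str) -> Dict[str, FrozenSet[str]]:
    """Map each clause keyword to the unordered set of items in that clause."""
    # Mask literal values ("without values"): quoted strings first, then bare numbers.
    masked = re.sub(r"'[^']*'|\"[^\"]*\"", "value", sql)
    masked = re.sub(r"\b\d+(?:\.\d+)?\b", "value", masked)
    # Lowercase and collapse whitespace so "GROUP   BY" becomes "group by".
    normalized = " ".join(masked.lower().replace(";", " ").split())
    # With a capturing group, re.split returns [prefix, kw1, body1, kw2, body2, ...].
    parts = CLAUSE_PATTERN.split(normalized)
    clauses: Dict[str, FrozenSet[str]] = {}
    for keyword, body in zip(parts[1::2], parts[2::2]):
        # Items are comma-separated (SELECT, GROUP BY, ...) or AND-joined (WHERE, HAVING).
        items = (item.strip() for item in re.split(r",|\band\b", body))
        clauses[keyword] = frozenset(item for item in items if item)
    return clauses


def clause_set_match(pred: str, gold: str) -> bool:
    """True when every clause has the same set of items, ignoring order and values."""
    return split_clauses(pred) == split_clauses(gold)


pred = "SELECT name, age FROM singer WHERE age > 30 AND country = 'France'"
gold = "SELECT age, name FROM singer WHERE country = 'Spain' AND age > 20"
print(pred == gold)                    # False: plain string comparison fails
print(clause_set_match(pred, gold))    # True: same clauses, order and values ignored
```

Comparing sets per clause makes the metric insensitive to harmless reorderings such as `name, age` versus `age, name`, which plain string comparison would wrongly penalize.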

Rank | Date | Model | Institution | Reference | Question Match | Interaction Match
---|---|---|---|---|---|---
1 | Feb 14, 2022 | STAR | Alibaba DAMO & SIAT | (Cai and Li et al., EMNLP-Findings '22) code demo | 67.4 | 46.6
2 | Jun 4, 2022 | RASAT + PICARD | SJTU LUMIA & Netmind.AI | (Qi et al., EMNLP'22) code | 67.7 | 45.2
3 | Apr 27, 2022 | CQR-SQL | Tencent Cloud Xiaowei | (Xiao et al.,'22) | 68.2 | 44.4
4 | Oct 8, 2021 | RAT-SQL-TC + GAP | Meituan & PKU | (Li et al.,'21) | 65.7 | 43.2
5 | Oct 18, 2021 | HIE-SQL + GraPPa | Alibaba DAMO | (Zheng et al. ACL-Findings '22) | 64.6 | 42.9
6 | Sep 21, 2020 | RAT-SQL + SCoRe | Yale & Microsoft Research & PSU | (Yu et al. ICLR '21) | 62.4 | 38.1
7 | Oct 21, 2020 | WaveSQL + BERT | Anonymous | | 58.7 | 33.3
8 | Jul 8, 2020 | R²SQL + BERT | Alibaba DAMO | (Hui et al. AAAI '21) code | 55.8 | 30.8
9 | May 26, 2020 | IGSQL + BERT | Peking University | (Cai et al. EMNLP '20) code | 51.2 | 29.5
10 | Jun 2, 2020 | MIE + BERT | Anonymous | | 49.6 | 27.1
11 | May 4, 2020 | SubTreeSQL + BERT | Anonymous | | 47.4 | 25.5
12 | Sep 1, 2019 | EditSQL + BERT | Yale University & Salesforce Research | (Zhang et al. EMNLP '19) code | 47.9 | 25.3
13 | May 3, 2020 | TreeSQL V2 + BERT | Anonymous | | 48.1 | 25.0
14 | May 22, 2020 | MH-LTA + BERT | Anonymous | | 48.5 | 24.7
15 | Jan 15, 2020 | TreeSQL + BERT | Anonymous | | 46.3 | 24.3
16 | May 21, 2020 | GAZP + BERT | University of Washington & Facebook AI Research | (Zhong et al., EMNLP '20) | 45.9 | 23.5
17 | Feb 13, 2020 | ConcatSQL + BERT | Anonymous | | 46.3 | 22.4
18 | Apr 21, 2021 | MemCE | UoE | (Jain et al., TACL '21) | 40.3 | 16.7
19 | Feb 13, 2020 | ConcatSQL | Anonymous | | 39.0 | 16.3
20 | Dec 13, 2019 | GuideSQL | Anonymous | | 34.4 | 13.1
21 | May 17, 2019 | CD-Seq2Seq | Yale University & Salesforce Research | (Yu et al. ACL '19) code | 23.2 | 7.5
22 | May 17, 2019 | SyntaxSQL-con | Yale University | (Yu et al. EMNLP '18) code | 20.2 | 5.2