Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL

NAACL 2025 Submission

Project Overview

Track-SQL is a novel framework designed to boost multi-turn Text-to-SQL performance by integrating dual-extractive modules for precise schema and context tracking. Built on state-of-the-art generative language models, this work addresses key challenges in multi-turn dialogue scenarios—including schema confusion and context drift—through structured module design and optimized training pipelines.

Environment Setup

Hardware Requirements

GPU: ≥ 2 × NVIDIA RTX 3090/4090 (24GB VRAM) for training; ≥ 1 × NVIDIA RTX 3090/4090 for inference
CUDA Version: 11.8 (compatible with PyTorch 2.1.0)
CPU: Intel Xeon Platinum 83xx series or equivalent; ≥ 64GB RAM
OS: Ubuntu 20.04 LTS / CentOS Linux 7

Software Installation

Create a dedicated Conda environment and install dependencies:

# Create and activate environment
conda create -n track-sql python=3.10
conda activate track-sql

# Install PyTorch with CUDA 11.8 support
pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --index-url https://download.pytorch.org/whl/cu118

# Install additional dependencies
python -m pip install -r requirements.txt

# Optional: Install Flash-Attention for memory efficiency (requires CUDA ≥ 11.7)
pip install flash-attn==2.0.0.post1

Dataset Preparation

We evaluate Track-SQL on two standard multi-turn Text-to-SQL benchmarks: SparC and CoSQL. Follow the steps below to download and preprocess the data.

1. Create Data Directories

mkdir -p data raw_data preprocessed_data metrics/ts_database

2. Download Benchmark Datasets

Dataset	Download Link	Notes
SparC	Google Drive	Multi-turn cross-domain SQL dataset
CoSQL	Google Drive	Conversational SQL dataset with execution feedback
Test-Suite Databases	Google Drive	Evaluation databases for SQL execution testing

3. Extract Datasets

# Move downloaded files to raw_data directory
mv sparc.zip cosql_dataset.zip testsuitedatabases.zip raw_data/

# Extract SparC and CoSQL
unzip raw_data/sparc.zip -d data/
unzip raw_data/cosql_dataset.zip -d data/

# Extract test-suite databases
unzip raw_data/testsuitedatabases.zip -d metrics/ts_database/

Usage Instructions

Follow the sequential steps below to reproduce the results of Track-SQL. All scripts are provided in the preprocess/, train/, and inference/ directories.

Step 1: Data Preprocessing

Preprocess raw datasets (schema parsing, dialogue segmentation, and format conversion) for model training:

# Preprocess data for Schema Tracking Module 
sh preprocess/scripts/preprocess_sic/sparc.sh
sh preprocess/scripts/preprocess_sic/cosql.sh

# Preprocess data for SQL generation model
sh preprocess/scripts/preprocess_text2sql/sparc.sh
sh preprocess/scripts/preprocess_text2sql/cosql.sh

Step 2: Train Schema Tracking Module

Train the extractive schema filter to identify relevant table/column items for each dialogue turn:

# Create checkpoint directory
mkdir -p ckpts/stm

# Train STM on SparC and CoSQL
sh preprocess/scripts/train_sic/sparc.sh
sh preprocess/scripts/train_sic/cosql.sh

Trained checkpoints will be saved to ckpts/stm/.

Step 3: Schema Filtering Inference

Apply the trained STM to filter schemas for the SQL generation model:

sh preprocess/scripts/infer_sic/sparc.sh
sh preprocess/scripts/infer_sic/cosql.sh

Filtered data will be saved to preprocessed_data/.

Step 4: Train Track-SQL (SQL Generation Model)

Train the full Track-SQL framework with dual-extractive modules:

# Create directory for model checkpoints
mkdir -p ckpts/track-sql

# Train on SparC and CoSQL
sh train/scripts/sparc.sh
sh train/scripts/cosql.sh

Step 5: Inference and Evaluation

Generate SQL predictions and evaluate performance using official metrics (Exact Match, Execution Accuracy):

# Inference on test sets
sh inference/scripts/sparc.sh
sh inference/scripts/cosql.sh

License

This project is licensed under the Apache-2.0 License. See the LICENSE file for more details.

Acknowledgments

We thank the authors of the following projects for their open-source contributions:

RUCKBReasoning/codes (schema filtering and training pipelines)
Yale-Lily/SparC and Yale-Lily/CoSQL (benchmark datasets)
Hugging Face Transformers and DeepSpeed (training frameworks)

Contact Information

For questions or issues, please open a GitHub issue or contact the corresponding author:

Author: [Shaobin Shi]
Email: [[email protected]]
Project Repository: [GitHub Link]

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
img		img
inference		inference
metrics		metrics
preprocess		preprocess
train		train
utils		utils
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL

Project Overview

Environment Setup

Hardware Requirements

Software Installation

Dataset Preparation

1. Create Data Directories

2. Download Benchmark Datasets

3. Extract Datasets

Usage Instructions

Step 1: Data Preprocessing

Step 2: Train Schema Tracking Module

Step 3: Schema Filtering Inference

Step 4: Train Track-SQL (SQL Generation Model)

Step 5: Inference and Evaluation

License

Acknowledgments

Contact Information

About

Uh oh!

Releases

Packages

Uh oh!

Languages

DMIRLAB-Group/Track-SQL

Folders and files

Latest commit

History

Repository files navigation

Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL

Project Overview

Environment Setup

Hardware Requirements

Software Installation

Dataset Preparation

1. Create Data Directories

2. Download Benchmark Datasets

3. Extract Datasets

Usage Instructions

Step 1: Data Preprocessing

Step 2: Train Schema Tracking Module

Step 3: Schema Filtering Inference

Step 4: Train Track-SQL (SQL Generation Model)

Step 5: Inference and Evaluation

License

Acknowledgments

Contact Information

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages