Commit 6e2799c

Merge branch 'main' into vijay_create_daily_ci
2 parents acc64eb + d4cd23d commit 6e2799c

23 files changed, +1025 -878 lines changed

.gitignore

+2 -1
@@ -22,8 +22,9 @@ huggingface_data/huggingface_datasets/huggingface_datasets_datafinder_index
 huggingface_data/huggingface_datasets/reranking_dataset_index.json
 huggingface_data/huggingface_models/
 retrieved_dataset_dict/
+result/
+checkpoint/
 status.yaml
-
 # Outputs generated by the colab demo
 trained_model/
 trained_tokenizer/

README.md

+16 -1
@@ -95,7 +95,9 @@ If you're interested in contributing to the `prompt2model` project, please
 
 We have [written a paper describing Prompt2Model in detail](https://arxiv.org/abs/2308.12261).
 
-If you use Prompt2Model in your research, please cite our paper:
+If you use Prompt2Model in your research, please cite us!
+
+If you discuss or use the overall prompt2model framework, please reference
 
 ```bibtex
 @misc{prompt2model,
@@ -107,3 +109,16 @@ If you use Prompt2Model in your research, please cite our paper:
 primaryClass={cs.CL}
 }
 ```
+
+If you discuss or use our dataset retrieval and transformation tools, please reference
+
+```bibtex
+@misc{prompt2modeldatatune,
+title={Better Synthetic Data by Retrieving and Transforming Existing Datasets},
+author={Saumya Gandhi and Ritu Gala and Vijay Viswanathan and Tongshuang Wu and Graham Neubig},
+year={2024},
+eprint={2404.14361},
+archivePrefix={arXiv},
+primaryClass={cs.CL}
+}
+```

examples/create_transform_data_example.py

+5 -4
@@ -33,12 +33,13 @@
 
 # run this pipeline to retrieve relevant datasets, rerank them,
 # and transform them based on the prompt
-retriever = DescriptionDatasetRetriever()
-num_points_to_transform = 20
+total_num_points_to_transform = 20
+retriever = DescriptionDatasetRetriever(
+    auto_transform_data=True,
+    total_num_points_to_transform=total_num_points_to_transform,
+)
 retrieved_dataset_dict = retriever.retrieve_dataset_dict(
     prompt_spec,
-    auto_transform_data=True,
-    num_points_to_transform=num_points_to_transform,
 )
 
 # save the final dataset to disk
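
The hunk above moves `auto_transform_data` and the renamed `total_num_points_to_transform` from the `retrieve_dataset_dict` call to the retriever's constructor. The following is a minimal, self-contained sketch of that call pattern only. `SketchRetriever` and its return value are hypothetical stand-ins written for illustration; they are not the real `DescriptionDatasetRetriever` and make no claim about its internals.

```python
from dataclasses import dataclass


@dataclass
class SketchRetriever:
    """Hypothetical stand-in, NOT the real DescriptionDatasetRetriever."""

    # Options that the diff moves from the method call to the constructor.
    auto_transform_data: bool = False
    total_num_points_to_transform: int = 0

    def retrieve_dataset_dict(self, prompt_spec: str) -> dict:
        # The call site passes only the prompt; the transformation
        # settings are read from the configuration stored on the instance.
        num_points = (
            self.total_num_points_to_transform
            if self.auto_transform_data
            else 0
        )
        return {
            "prompt": prompt_spec,
            "transformed": self.auto_transform_data,
            "num_points": num_points,
        }


# Configure once at construction time, as in the updated example script.
retriever = SketchRetriever(
    auto_transform_data=True,
    total_num_points_to_transform=20,
)
result = retriever.retrieve_dataset_dict("example prompt")
print(result)
```

With this shape, every later call on the same retriever reuses one configuration, so callers written against the old per-call keyword arguments would need to be updated.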

examples/huggingface_data/huggingface_datasets/dataset_index.json

-1
This file was deleted.

examples/huggingface_data/huggingface_datasets/reranking_dataset_index.json

-1
This file was deleted.
Binary file not shown.
