Commit f2eabc1

ritugala and neubig authored
Merging Dataset Reranking changes (#383)
* dataset reranking changes, no verify
* added reranking.csv
* latest changes for reranking
* added tests
* minor changes
* seprating changes for dataset index file creation
* minor fixes to the prompt
* changes for using dataset index file
* Remove redundant file
* undo minor testing change
* PR changes
* final changes
* fixing CI test
* Added reranking dataset index to gitignore
* removed print stmts
* updated gitignore
* Update prompt2model/dataset_retriever/description_dataset_retriever.py
  Co-authored-by: Graham Neubig <[email protected]>
* Update prompt2model/utils/parse_responses.py
  Co-authored-by: Graham Neubig <[email protected]>
* Update scripts/dataset_index/retrieve_dataset_info.py
  Co-authored-by: Graham Neubig <[email protected]>
* Update scripts/dataset_index/retrieve_dataset_info.py
  Co-authored-by: Graham Neubig <[email protected]>
* requested review changes
* lint changes

---------

Co-authored-by: Graham Neubig <[email protected]>
1 parent 947b636 commit f2eabc1

12 files changed: +1337 -236 lines changed

Diff for: .gitignore

+1
@@ -19,6 +19,7 @@ cached_generated_dataset/
 generated_dataset/
 huggingface_data/huggingface_datasets/dataset_index.json
 huggingface_data/huggingface_datasets/huggingface_datasets_datafinder_index
+huggingface_data/huggingface_datasets/reranking_dataset_index.json
 huggingface_data/huggingface_models/
 retrieved_dataset_dict/
 status.yaml

Diff for: prompt2model/dataset_retriever/column_selection_prompt.py

+1 -16
@@ -2,8 +2,6 @@
 
 from __future__ import annotations  # noqa FI58
 
-import json
-
 METAPROMPT_BASE = """Your objective is to carefully analyze the task and the dataset mentioned, and decide whether the columns are relevant input, relevant output, irrelevant for the given task, or if it is ambiguous. There should be at most one output column. It is possible to have no relevant columns, in which case return the input and output column as empty lists. Answer in a json format, with the following keys: input, output, irrelevant, ambiguous"""  # noqa: E501
 METAPROMPT_EXAMPLES = [
     (
@@ -90,19 +88,6 @@
 ENDING_LINE = "After seeing these examples with the required columns, please provide the relevant columns for this context:"  # noqa: E501
 
 
-def truncate_row(example_row: dict, max_length=50) -> str:
-    """Truncate the row before displaying if it is too long."""
-    truncated_row = {}
-    for key in example_row.keys():
-        curr_row = json.dumps(example_row[key])
-        truncated_row[key] = (
-            curr_row
-            if len(curr_row) <= max_length - 3
-            else curr_row[:max_length] + "..."
-        )
-    return json.dumps(truncated_row)
-
-
 def build_input(
     instruction: str,
     dataset_name: str,
@@ -116,7 +101,7 @@ def build_input(
         dataset_name=dataset_name,
         dataset_description=dataset_description,
         dataset_columns=dataset_columns,
-        sample_row=truncate_row(sample_row),
+        sample_row=sample_row,
     )
     input_prompt = SINGLE_DEMONSTRATION_TEMPLATE.format(
         prompt=input_prompt, columns=""
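
Note on the removed helper: after this change, `build_input` passes `sample_row` through unchanged, so any truncation has to happen before the call. Where `truncate_row` lives after this commit is not visible in this part of the diff. As a minimal sketch for reviewers, the standalone copy below reproduces the removed function (only a type hint on `max_length` is added) and runs it on a hypothetical sample row that is not taken from the repository:

```python
from __future__ import annotations

import json


def truncate_row(example_row: dict, max_length: int = 50) -> str:
    """Truncate each value of a row before displaying it, as in the removed helper."""
    truncated_row = {}
    for key in example_row.keys():
        # Each value is JSON-encoded first, so strings carry their quotes.
        curr_row = json.dumps(example_row[key])
        truncated_row[key] = (
            curr_row
            if len(curr_row) <= max_length - 3
            else curr_row[:max_length] + "..."
        )
    # The dict of already-encoded strings is JSON-encoded a second time.
    return json.dumps(truncated_row)


# Hypothetical sample row for illustration only.
sample_row = {"question": "What is 2 + 2?", "context": "x" * 100}
print(truncate_row(sample_row))
```

Two properties of the helper as written are worth keeping in mind when relocating it: values are compared against `max_length - 3` but sliced at `max_length`, so a truncated value can be up to `max_length + 3` characters long, and each value is passed through `json.dumps` twice, so string values appear with escaped quotes in the final output.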
