Commit

fix: fix some bugs in feedback.py and refine the prompt (#292)
* fix some bugs in feedback.py and refine the prompt

* fix a ci error
WinstonLiyt authored Sep 22, 2024
1 parent da752ec commit d834052
Showing 3 changed files with 12 additions and 5 deletions.
13 changes: 10 additions & 3 deletions rdagent/scenarios/kaggle/developer/feedback.py
@@ -103,12 +103,19 @@ def generate_feedback(self, exp: Experiment, hypothesis: Hypothesis, trace: Trac
             .render(scenario=self.scen.get_scenario_all_desc())
         )
 
+        last_task_and_code = None
+        if trace.hist:
+            last_task_and_code = (
+                trace.hist[-1][1].experiment_workspace.data_description
+                if trace.hist[-1][0].action == "Feature engineering" or trace.hist[-1][0].action == "Feature processing"
+                else trace.hist[-1][1].experiment_workspace.model_description
+            )
+
         # Prepare render dictionary
         render_dict = {
             "context": self.scen.get_scenario_all_desc(),
-            "last_hypothesis": trace.hist[-1][0] if trace.hist else None,
-            "last_task": trace.hist[-1][1] if trace.hist else None,
-            "last_code": self.get_model_code(trace.hist[-1][1]) if trace.hist else None,
+            "last_hypothesis": trace.hist[-1][0].hypothesis if trace.hist else None,
+            "last_task_and_code": last_task_and_code,
             "last_result": trace.hist[-1][1].result if trace.hist else None,
             "hypothesis": hypothesis,
             "exp": exp,
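
Review note: the selection added in this hunk can be read as a small pure function. The sketch below is a minimal illustration, not the project's code. `Workspace`, `Hyp`, `Exp`, `FEATURE_ACTIONS`, and `pick_last_task_and_code` are hypothetical stand-ins that model only the attributes the diff touches, and the membership test is an assumed equivalent of the chained `or` comparison.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical stand-ins: only the attributes used in the diff are modeled.
@dataclass
class Workspace:
    data_description: str = ""
    model_description: str = ""


@dataclass
class Hyp:
    hypothesis: str
    action: str


@dataclass
class Exp:
    experiment_workspace: Workspace
    result: Optional[str] = None


# Actions for which the diff picks the data description.
FEATURE_ACTIONS = ("Feature engineering", "Feature processing")


def pick_last_task_and_code(hist: List[Tuple[Hyp, Exp]]) -> Optional[str]:
    """Return the description of the last round, or None on the first round."""
    if not hist:
        return None
    last_hyp, last_exp = hist[-1]
    workspace = last_exp.experiment_workspace
    if last_hyp.action in FEATURE_ACTIONS:
        return workspace.data_description
    return workspace.model_description
```

Unpacking `hist[-1]` once keeps the repeated `trace.hist[-1][...]` indexing out of the conditional, which may make the branch easier to scan.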
1 change: 1 addition & 0 deletions rdagent/scenarios/kaggle/experiment/prompts.yaml
@@ -114,6 +114,7 @@ kg_feature_interface: |-
   4. Ensure that the generation of new features does not drastically increase the number of columns, which can slow down data processing. For example, avoid creating pairwise interactions for all features, as this would lead to a quadratic increase in the number of columns.
   5. Avoids raising a `ValueError` or any other exceptions that could interrupt the main program's flow. The code should not include checks that could potentially lead to a `ValueError`. Instead, focus on writing robust and fault-tolerant feature engineering functions that handle edge cases and missing data gracefully, without stopping the program.
   6. Specific categories of features can be filtered, and processing can be applied to those categories. For example, normalization can be applied to float-type features, but such processing should not be done on one-hot encoded features.
+  7. You are participating in a Kaggle competition and need data engineering ideas that are small, efficient, and quick to execute. Your suggestions should avoid unnecessary complexity or excessive processing time. Focus on delivering concise, impactful transformations or preprocessing steps that improve model performance with minimal resource usage. Please suggest clear, targeted approaches that can be implemented and tested rapidly.
 kg_model_interface: |-
   Your code should contain several parts:
3 changes: 1 addition & 2 deletions rdagent/scenarios/kaggle/prompts.yaml
@@ -229,8 +229,7 @@ feature_selection_feedback_generation:
     {% if last_hypothesis %}
     Last Round Information:
     Hypothesis: {{last_hypothesis.hypothesis}}
-    Task: {{last_task}}
-    Code Implemented: {{last_code}}
+    Last Task and Code: {{last_task_and_code}}
     Result: {{last_result}}
     {% else %}
     This is the first round. No previous information available. As long as the performance is not too negative (e.g., ICIR is greater than 0), treat it as successful. Do not set the threshold too high.
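
Review note: the template still reads `{{last_hypothesis.hypothesis}}`, while `feedback.py` in this commit passes `trace.hist[-1][0].hypothesis` as `last_hypothesis`. If that attribute is a plain string, and if this template is the one rendered with that dictionary (neither is confirmed by the diff), the nested lookup no longer finds anything. The sketch below only imitates a lenient dotted lookup in plain Python to show the effect. `lenient_lookup` and `Hyp` are hypothetical names, and real Jinja behavior depends on how the environment is configured.

```python
from typing import Any, Mapping


def lenient_lookup(context: Mapping[str, Any], dotted: str) -> str:
    """Resolve 'a.b' against context; return '' when any step is missing."""
    head, *rest = dotted.split(".")
    value = context.get(head)
    for name in rest:
        # Missing attributes fall through to None instead of raising.
        value = getattr(value, name, None)
    return "" if value is None else str(value)


class Hyp:
    """Hypothetical stand-in exposing only the `hypothesis` attribute."""

    def __init__(self, hypothesis: str) -> None:
        self.hypothesis = hypothesis


# Before the commit: the whole object is passed, so the nested lookup resolves.
before = {"last_hypothesis": Hyp("Add lag features")}
# After the commit: a string is passed, so '.hypothesis' has nothing to find.
after = {"last_hypothesis": "Add lag features"}

print(repr(lenient_lookup(before, "last_hypothesis.hypothesis")))
print(repr(lenient_lookup(after, "last_hypothesis.hypothesis")))
```

Under these assumptions, either passing the object again or changing the template to `{{last_hypothesis}}` would keep the two sides consistent.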