fix: fix some bugs in feedback.py and refine the prompt (#292)

* fix some bugs in feedback.py and refine the prompt
* fix a ci error
microsoft · Sep 22, 2024 · d834052
1 parent da752ec

Showing 3 changed files with 12 additions and 5 deletions.
diff --git a/rdagent/scenarios/kaggle/developer/feedback.py b/rdagent/scenarios/kaggle/developer/feedback.py
@@ -103,12 +103,19 @@ def generate_feedback(self, exp: Experiment, hypothesis: Hypothesis, trace: Trac
             .render(scenario=self.scen.get_scenario_all_desc())
         )
 
+        last_task_and_code = None
+        if trace.hist:
+            last_task_and_code = (
+                trace.hist[-1][1].experiment_workspace.data_description
+                if trace.hist[-1][0].action == "Feature engineering" or trace.hist[-1][0].action == "Feature processing"
+                else trace.hist[-1][1].experiment_workspace.model_description
+            )
+
         # Prepare render dictionary
         render_dict = {
             "context": self.scen.get_scenario_all_desc(),
-            "last_hypothesis": trace.hist[-1][0] if trace.hist else None,
-            "last_task": trace.hist[-1][1] if trace.hist else None,
-            "last_code": self.get_model_code(trace.hist[-1][1]) if trace.hist else None,
+            "last_hypothesis": trace.hist[-1][0].hypothesis if trace.hist else None,
+            "last_task_and_code": last_task_and_code,
             "last_result": trace.hist[-1][1].result if trace.hist else None,
             "hypothesis": hypothesis,
             "exp": exp,
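
The new branch in `generate_feedback` picks the data description when the last round was a feature action and the model description otherwise. The sketch below reproduces that selection outside RD-Agent. `Hypothesis`, `Workspace` and `Experiment` here are hypothetical stand-ins that model only the attributes the diff reads, `FEATURE_ACTIONS`, `select_last_task_and_code` and `build_render_dict` are illustrative names, and `result` is assumed to be a single score; the real classes and the type of `result` may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical stand-ins for the RD-Agent objects touched by the diff;
# only the attributes that feedback.py reads are modelled.
@dataclass
class Hypothesis:
    hypothesis: str  # the hypothesis text
    action: str      # e.g. "Feature engineering", "Model tuning"

@dataclass
class Workspace:
    data_description: str
    model_description: str

@dataclass
class Experiment:
    experiment_workspace: Workspace
    result: Optional[float] = None  # assumption: a single score

# Actions whose last round is described by the data (feature) workspace,
# copied from the condition in the diff.
FEATURE_ACTIONS = ("Feature engineering", "Feature processing")


def select_last_task_and_code(hist: Sequence[tuple]) -> Optional[str]:
    """Return the description of the most recent round, or None before round one."""
    if not hist:
        return None
    # Each history entry is indexed as (hypothesis, experiment, ...).
    last_hypothesis, last_exp = hist[-1][0], hist[-1][1]
    workspace = last_exp.experiment_workspace
    if last_hypothesis.action in FEATURE_ACTIONS:
        return workspace.data_description
    return workspace.model_description


def build_render_dict(hist: Sequence[tuple]) -> dict:
    """Mirror the three history-derived keys of the new render_dict."""
    last = hist[-1] if hist else None
    return {
        "last_hypothesis": last[0].hypothesis if last else None,
        "last_task_and_code": select_last_task_and_code(hist),
        "last_result": last[1].result if last else None,
    }
```

Testing membership in a tuple behaves the same as the chained `==` / `or` comparison in the diff, but keeps the list of feature actions in one place if more are added later.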

diff --git a/rdagent/scenarios/kaggle/experiment/prompts.yaml b/rdagent/scenarios/kaggle/experiment/prompts.yaml
@@ -114,6 +114,7 @@ kg_feature_interface: |-
   4. Ensure that the generation of new features does not drastically increase the number of columns, which can slow down data processing. For example, avoid creating pairwise interactions for all features, as this would lead to a quadratic increase in the number of columns.
   5. Avoids raising a `ValueError` or any other exceptions that could interrupt the main program's flow. The code should not include checks that could potentially lead to a `ValueError`. Instead, focus on writing robust and fault-tolerant feature engineering functions that handle edge cases and missing data gracefully, without stopping the program.
   6. Specific categories of features can be filtered, and processing can be applied to those categories. For example, normalization can be applied to float-type features, but such processing should not be done on one-hot encoded features.
+  7. You are participating in a Kaggle competition and need data engineering ideas that are small, efficient, and quick to execute. Your suggestions should avoid unnecessary complexity or excessive processing time. Focus on delivering concise, impactful transformations or preprocessing steps that improve model performance with minimal resource usage. Please suggest clear, targeted approaches that can be implemented and tested rapidly.
 
 kg_model_interface: |-
   Your code should contain several parts:
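
Items 4 to 6 of `kg_feature_interface` ask for cheap, fault-tolerant, type-aware feature processing. The following is a minimal sketch of item 6 in plain Python, under the simplifying assumption that a table is a dict mapping column names to lists (no pandas). All function names are hypothetical. A column counts as float-type only if every non-missing value is a Python `float`, so integer 0/1 one-hot columns are left untouched. In the spirit of item 5, missing values (`None`) are preserved and a constant column yields zeros instead of a division-by-zero error. `pairwise_interaction_count` shows the quadratic growth that item 4 warns about: n features give n(n-1)/2 pairwise products.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def is_float_feature(values: Column) -> bool:
    """True if the column has data and every non-missing value is a float."""
    present = [v for v in values if v is not None]
    return bool(present) and all(isinstance(v, float) for v in present)


def zscore(values: Column) -> Column:
    """Standardize non-missing values; missing entries stay None."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    std = (sum((v - mean) ** 2 for v in present) / len(present)) ** 0.5
    if std == 0:
        # Constant column: return zeros rather than raising (item 5).
        return [0.0 if v is not None else None for v in values]
    return [(v - mean) / std if v is not None else None for v in values]


def normalize_float_features(table: Dict[str, Column]) -> Dict[str, Column]:
    """Normalize float-type columns only; copy every other column unchanged."""
    return {
        name: zscore(values) if is_float_feature(values) else list(values)
        for name, values in table.items()
    }


def pairwise_interaction_count(n_features: int) -> int:
    """Number of new columns created by all pairwise products (item 4)."""
    return n_features * (n_features - 1) // 2
```

With 100 input features, `pairwise_interaction_count(100)` is 4,950 extra columns, which is why item 4 discourages blanket pairwise interactions. In a pandas-based pipeline the same type filter would more likely be expressed with `DataFrame.select_dtypes(include=["float"])`.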

diff --git a/rdagent/scenarios/kaggle/prompts.yaml b/rdagent/scenarios/kaggle/prompts.yaml
@@ -229,8 +229,7 @@ feature_selection_feedback_generation:
     {% if last_hypothesis %} 
     Last Round Information:
     Hypothesis: {{last_hypothesis.hypothesis}}
-    Task: {{last_task}}
-    Code Implemented: {{last_code}}
+    Last Task and Code: {{last_task_and_code}}
     Result: {{last_result}}
     {% else %}
     This is the first round. No previous information available. As long as the performance is not too negative (e.g., ICIR is greater than 0), treat it as successful. Do not set the threshold too high.
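
The `{% if last_hypothesis %}` branch of `feature_selection_feedback_generation` can be mimicked in plain Python to see what the model receives in each round. `render_last_round` below is a hypothetical helper, not RD-Agent code. It assumes `last_hypothesis` arrives as the plain hypothesis text, which is what the updated `feedback.py` passes (`trace.hist[-1][0].hypothesis`). The template line still reads `{{last_hypothesis.hypothesis}}`. If `last_hypothesis` is a string, Jinja2 renders that attribute lookup as empty under the default `Undefined` and raises under `StrictUndefined`, so the sketch uses the value directly.

```python
from typing import Optional

# Text of the template's else-branch, copied from prompts.yaml.
FIRST_ROUND_NOTE = (
    "This is the first round. No previous information available. "
    "As long as the performance is not too negative (e.g., ICIR is greater than 0), "
    "treat it as successful. Do not set the threshold too high."
)


def render_last_round(
    last_hypothesis: Optional[str],
    last_task_and_code: Optional[str],
    last_result: Optional[object],
) -> str:
    """Plain-Python stand-in for the if/else block of the feedback prompt."""
    # Falsy values (None or "") take the first-round branch, as in Jinja2.
    if last_hypothesis:
        return "\n".join([
            "Last Round Information:",
            f"Hypothesis: {last_hypothesis}",
            f"Last Task and Code: {last_task_and_code}",
            f"Result: {last_result}",
        ])
    return FIRST_ROUND_NOTE
```

For example, `render_last_round("Add ratio features", "data: ratio features", 0.81)` yields four lines starting with `Last Round Information:`, while `render_last_round(None, None, None)` returns the first-round note.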