feat: refactor for general data science #498

Draft
wants to merge 155 commits into
base: main
155 commits
c437bd7
Draft new data science RDLoop
you-n-g Nov 26, 2024
c5c5d7f
Pushforward and refactor
you-n-g Nov 27, 2024
db87704
prompt T fix
XianBW Nov 27, 2024
98121f4
finish data loader template
WinstonLiyt Nov 27, 2024
a5a0d70
ci
WinstonLiyt Nov 27, 2024
9874fbb
fix Task __init__ name bug
XianBW Nov 27, 2024
cc94bd2
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Nov 27, 2024
a9b04ff
changes about source data in scenario
XianBW Nov 27, 2024
4d782c0
add more stubs
you-n-g Nov 27, 2024
69a861d
fix bug
XianBW Nov 28, 2024
87e8308
small changes
XianBW Nov 28, 2024
cc1dae1
update some docs
you-n-g Nov 28, 2024
ea5dc12
Interface
you-n-g Nov 29, 2024
fc4e063
some changes
XianBW Nov 29, 2024
55f3737
ds_model_initial_changes
TPLin22 Nov 29, 2024
abcf1cf
model base py files
XianBW Nov 29, 2024
e0f8042
CI
XianBW Nov 29, 2024
a4c6845
ds_model some changes in eval and exp
TPLin22 Dec 2, 2024
f1003b8
annotation
TPLin22 Dec 3, 2024
6d30f57
changes in model coder final evaluate
TPLin22 Dec 3, 2024
33e9a00
ds model test init
TPLin22 Dec 5, 2024
d04b280
remove value detection from data_science model evaluator
TPLin22 Dec 5, 2024
da8d0b7
data loader CoSTEER
XianBW Dec 6, 2024
9e8d74d
merge
XianBW Dec 6, 2024
b2a445c
ds model eval: init use gpt for shape evaluator
TPLin22 Dec 6, 2024
231bf91
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
TPLin22 Dec 6, 2024
dea5605
refactor: Update data loader evaluation and execution logic
you-n-g Dec 6, 2024
e023791
redundance
XianBW Dec 9, 2024
bf36de8
split spec.md
XianBW Dec 9, 2024
9674c16
ds model test: init evolving strategy and unit test
TPLin22 Dec 9, 2024
06e2043
data science scenario changes
XianBW Dec 10, 2024
4f76a0b
data science base file
XianBW Dec 10, 2024
14325de
proposal related
XianBW Dec 10, 2024
7031610
proposal related
XianBW Dec 10, 2024
075aa9e
complete judge
XianBW Dec 11, 2024
e2133a0
some changes
XianBW Dec 11, 2024
ca7d785
simple readme for data loader costeer
XianBW Dec 11, 2024
9c89f4b
proposal related
XianBW Dec 12, 2024
3352c99
Draft for Bowen
you-n-g Dec 12, 2024
318e457
add property
XianBW Dec 12, 2024
a26bdb4
fix knowledgemn
XianBW Dec 12, 2024
3c0883d
fix feedback prompt
XianBW Dec 12, 2024
bb4a0a4
fix feedback bug
XianBW Dec 12, 2024
67be799
fix json_mode bug
XianBW Dec 12, 2024
d24ba79
feature processing
WinstonLiyt Dec 12, 2024
7d7cab8
fix execute data volume problem
XianBW Dec 12, 2024
67ec2b0
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 12, 2024
57293b1
proposal related
XianBW Dec 12, 2024
3a746fa
hypothesis2experiment base
XianBW Dec 13, 2024
68e4c1f
only hypothesis gen and task gen
XianBW Dec 13, 2024
a570408
proposal related
XianBW Dec 13, 2024
3324601
exp_gen base code
XianBW Dec 13, 2024
8d73ea3
dependency_codes inject
XianBW Dec 13, 2024
00526ff
proposal completed(not test)
XianBW Dec 13, 2024
6980e1d
rewrite ds model evaluate
TPLin22 Dec 13, 2024
1b051e3
fix data bug
XianBW Dec 13, 2024
db09a45
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 13, 2024
a37071e
small code refinement on conf and other
peteryang1 Dec 13, 2024
f2ed789
load data in ds model
TPLin22 Dec 13, 2024
9bbf9a3
fix some bugs
WinstonLiyt Dec 13, 2024
93b6656
a debug llm tool app
XianBW Dec 15, 2024
b283cb1
model task base_code added
XianBW Dec 16, 2024
1728a90
return code dict in ds model evolvingstrategy
TPLin22 Dec 16, 2024
a23731f
fix ds_scen description_template
WinstonLiyt Dec 16, 2024
1178618
redundent prompts.yaml
XianBW Dec 16, 2024
505cc4f
fix some bugs
TPLin22 Dec 16, 2024
9fba70c
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 16, 2024
f18478d
feature test change
XianBW Dec 16, 2024
a108072
fix a bug
WinstonLiyt Dec 16, 2024
636a1d5
fix prompts.yaml
XianBW Dec 16, 2024
0c3881c
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 16, 2024
d7b6ca4
remove feature test local path
XianBW Dec 16, 2024
5c4490f
refine the structure of scene
WinstonLiyt Dec 16, 2024
6cf001e
fix some bugs
WinstonLiyt Dec 16, 2024
e1abb6f
exp_gen change
XianBW Dec 17, 2024
f7dff55
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 17, 2024
1379654
exp_gen change
XianBW Dec 17, 2024
8d1eca9
spec & workspace changes
XianBW Dec 17, 2024
8ccd32f
init for workflow
TPLin22 Dec 17, 2024
2e2d153
fix
TPLin22 Dec 17, 2024
6926e18
ds model fit for spec & workspace change
TPLin22 Dec 17, 2024
3118c67
improve data_loader_spec
WinstonLiyt Dec 17, 2024
b630f79
ds model eval for more cases
TPLin22 Dec 17, 2024
84cf4d4
refine prompts
WinstonLiyt Dec 17, 2024
81a427a
spec change
XianBW Dec 17, 2024
7fae309
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 17, 2024
b7aad31
spell check
qew21 Dec 18, 2024
12f1217
refine ds modal for more cases: eval and es
TPLin22 Dec 18, 2024
e86aa73
update model template
TPLin22 Dec 18, 2024
cf5e18c
prompts for model and ensemble
WinstonLiyt Dec 18, 2024
81b27b4
fix a bug
WinstonLiyt Dec 18, 2024
dc8f71c
fix a bug
WinstonLiyt Dec 18, 2024
b6acea3
init: ds workflow evovingstrategy
TPLin22 Dec 18, 2024
7f70ce2
Adding ensemble (#505)
xisen-w Dec 18, 2024
62dbcf5
data science loop changes
XianBW Dec 18, 2024
3e240f3
merge pull
XianBW Dec 18, 2024
13fae9a
data science loop base
XianBW Dec 18, 2024
999d133
ds loop feedback
XianBW Dec 19, 2024
b6241cd
fix
XianBW Dec 19, 2024
7e2874f
remove measure_time because it's duplicated (in LoopBase)
XianBW Dec 19, 2024
3335406
add the knowledge query for data_loader & feature
WinstonLiyt Dec 19, 2024
26da5c4
edit ds workflow evaluator
TPLin22 Dec 19, 2024
00ad54e
data_loader bug fix
XianBW Dec 19, 2024
35a1db9
stop evolving when all tasks completed
XianBW Dec 19, 2024
f96f9a2
llm app change
XianBW Dec 20, 2024
a0a3db5
fix break all complete strategy
peteryang1 Dec 20, 2024
74a2829
Adding queried knowledge (#508)
xisen-w Dec 20, 2024
737bdb9
fix loop bug
XianBW Dec 20, 2024
6234966
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 20, 2024
ab41352
ds workflow evaluator; test; refine prompts
TPLin22 Dec 20, 2024
c2ed6e1
workflow spec
WinstonLiyt Dec 20, 2024
02ddf81
fix ci
WinstonLiyt Dec 20, 2024
bfa455a
feature task changes
XianBW Dec 20, 2024
61f0cb8
ds loop change
XianBW Dec 23, 2024
251688b
fix a bug in feat
WinstonLiyt Dec 23, 2024
438a569
add query knowledge for model and workflow
WinstonLiyt Dec 23, 2024
6497957
llm_debug info(for show) using pickle instead of json
XianBW Dec 23, 2024
4b7f4f2
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 23, 2024
3920a5c
remove NextLoopException
peteryang1 Dec 23, 2024
e8a85a6
loop change
XianBW Dec 23, 2024
3114f78
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 23, 2024
5845173
coder raise CoderError when all sub_tasks failed
XianBW Dec 23, 2024
3db73f0
rename code_dict to file_dict in FBWorkspace
XianBW Dec 23, 2024
7f85fdf
add CoSTEER unittest
XianBW Dec 23, 2024
9009b73
now show self.version in Task.get_task_information(), simplify CoSTEE…
XianBW Dec 23, 2024
39abb25
remove some properties in ModelTask, add model_type in it.
XianBW Dec 23, 2024
a6505d1
fix llm app bug
XianBW Dec 24, 2024
87dea18
llm web app bug fix
XianBW Dec 24, 2024
d2d88d9
ds loop bug fix
XianBW Dec 24, 2024
e8c2d6c
fix: give component code to feature&ens eval
XianBW Dec 24, 2024
0722d77
loop catch error bug
XianBW Dec 25, 2024
b53e03e
rename load_from_raw_data to load_data
XianBW Dec 25, 2024
01ad2e9
feat: Add debug data creation functionality for data science scenarios
you-n-g Dec 25, 2024
db1455b
support local folder (#511)
qew21 Dec 25, 2024
12a27ec
update sample data script
qew21 Dec 25, 2024
de2825f
make sure frac < 1
qew21 Dec 25, 2024
9a4ba5f
fix a bug
WinstonLiyt Dec 25, 2024
75b40d8
feature spec changes
XianBW Dec 25, 2024
84f0fb8
Merge branch 'ds_refactor' of github.com:microsoft/RD-Agent into ds_r…
XianBW Dec 25, 2024
e8f2410
fix
XianBW Dec 25, 2024
418c2ce
changeimport order
qew21 Dec 26, 2024
fe10da7
clear unnecessary std outputs
WinstonLiyt Dec 26, 2024
a4e3ced
fix a typo
WinstonLiyt Dec 26, 2024
e009fd7
create sample folder after unzip kaggle data
qew21 Dec 26, 2024
36d26ee
feature/model test script update
XianBW Dec 26, 2024
08df71a
Align the data types across modules.
WinstonLiyt Dec 26, 2024
c02fd79
fix a bug in model eval
WinstonLiyt Dec 26, 2024
d3e3f60
show line number
XianBW Dec 26, 2024
fa21c04
move sample entry point to app
qew21 Dec 26, 2024
6682711
spec & model prompt changes
XianBW Dec 27, 2024
f8113b2
Refine the competition specification to address the data type problem…
WinstonLiyt Dec 27, 2024
36b4191
fix some bugs
WinstonLiyt Dec 27, 2024
34aa750
add file filter in FBworkspace.code property
XianBW Dec 27, 2024
d30ff40
support non-binary prediction
qew21 Dec 27, 2024
72bfa90
avoid too much warnings
qew21 Dec 27, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -170,3 +170,4 @@ mlruns/
 # shell script
 *.out
 *.sh
+.aider*
2 changes: 1 addition & 1 deletion rdagent/app/data_mining/conf.py
@@ -23,7 +23,7 @@ class MedBasePropSetting(BasePropSetting):
     runner: str = "rdagent.scenarios.data_mining.developer.model_runner.DMModelRunner"
     """Runner class"""

-    summarizer: str = "rdagent.scenarios.data_mining.developer.feedback.DMModelHypothesisExperiment2Feedback"
+    summarizer: str = "rdagent.scenarios.data_mining.developer.feedback.DMModelExperiment2Feedback"
     """Summarizer class"""

     evolving_n: int = 10
48 changes: 48 additions & 0 deletions rdagent/app/data_science/conf.py
@@ -0,0 +1,48 @@
+from rdagent.app.kaggle.conf import KaggleBasePropSetting
+from rdagent.core.conf import ExtendedSettingsConfigDict
+
+
+class DataScienceBasePropSetting(KaggleBasePropSetting):
+    model_config = ExtendedSettingsConfigDict(env_prefix="DS_", protected_namespaces=())
+
+    # Main components
+    ## Scen
+    scen: str = "rdagent.scenarios.data_science.scen.KaggleScen"
+    """Scenario class for data mining model"""
+
+    ## proposal
+    exp_gen: str = "rdagent.scenarios.data_science.proposal.exp_gen.DSExpGen"
+
+    # the two below should be used in ExpGen
+    # hypothesis_gen: str = "rdagent.scenarios.kaggle.proposal.proposal.KGHypothesisGen"
+    # """Hypothesis generation class"""
+    #
+    # hypothesis2experiment: str = "rdagent.scenarios.kaggle.proposal.proposal.KGHypothesis2Experiment"
+    # """Hypothesis to experiment class"""
+
+    ## dev/coder
+    data_loader_coder: str = "rdagent.components.coder.data_science.raw_data_loader.DataLoaderCoSTEER"
+    """Data Loader CoSTEER"""
+
+    # feature_coder: str = "rdagent.scenarios.kaggle.developer.coder.KGFactorCoSTEER"
+    # """Feature Coder class"""
+
+    # model_feature_selection_coder: str = "rdagent.scenarios.kaggle.developer.coder.KGModelFeatureSelectionCoder"
+    # """Model Feature Selection Coder class"""
+
+    # model_coder: str = "rdagent.scenarios.kaggle.developer.coder.KGModelCoSTEER"
+    # """Model Coder class"""
+
+    ## dev/runner
+    feature_runner: str = "rdagent.scenarios.kaggle.developer.runner.KGFactorRunner"
+    """Feature Runner class"""
+
+    model_runner: str = "rdagent.scenarios.kaggle.developer.runner.KGModelRunner"
+    """Model Runner class"""
+
+    ## feedback
+    summarizer: str = "rdagent.scenarios.kaggle.developer.feedback.KGExperiment2Feedback"
+    """Summarizer class"""
+
+
+DS_RD_SETTING = DataScienceBasePropSetting()
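Reviewer note: as a rough illustration of what `env_prefix="DS_"` appears intended to achieve, the sketch below mimics prefix-based environment overrides using only the standard library. `DemoSettings`, `load_settings`, and the empty `competition` default are hypothetical stand-ins; the real class relies on `ExtendedSettingsConfigDict`, whose exact behavior is not shown in this diff.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional

ENV_PREFIX = "DS_"  # mirrors env_prefix="DS_" from the diff


@dataclass
class DemoSettings:
    """Hypothetical stand-in for DataScienceBasePropSetting (not the real class)."""

    scen: str = "rdagent.scenarios.data_science.scen.KaggleScen"
    competition: str = ""  # assumed field, used only for this illustration


def load_settings(env: Optional[Mapping[str, str]] = None) -> DemoSettings:
    """Build settings, letting DS_<FIELD> variables override the defaults."""
    env = os.environ if env is None else env
    overrides = {}
    for field in fields(DemoSettings):
        key = f"{ENV_PREFIX}{field.name.upper()}"
        if key in env:
            overrides[field.name] = env[key]
    return DemoSettings(**overrides)


settings = load_settings({"DS_COMPETITION": "titanic"})
print(settings.competition)
```

Under this assumption, a variable without the prefix (for example `COMPETITION`) is ignored, which keeps the data science settings separate from the `KG_`-prefixed Kaggle ones.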
7 changes: 7 additions & 0 deletions rdagent/app/data_science/debug.py
@@ -0,0 +1,7 @@
+import fire
+
+from rdagent.scenarios.data_science.debug.data import create_debug_data
+
+
+if __name__ == "__main__":
+    fire.Fire(create_debug_data)
131 changes: 131 additions & 0 deletions rdagent/app/data_science/loop.py
@@ -0,0 +1,131 @@
+from pathlib import Path
+from typing import Any
+
+import fire
+
+from rdagent.app.data_science.conf import DS_RD_SETTING
+from rdagent.components.coder.data_science.ensemble import EnsembleCoSTEER
+from rdagent.components.coder.data_science.feature import FeatureCoSTEER
+from rdagent.components.coder.data_science.model import ModelCoSTEER
+from rdagent.components.coder.data_science.raw_data_loader import DataLoaderCoSTEER
+from rdagent.components.coder.data_science.workflow import WorkflowCoSTEER
+from rdagent.components.workflow.conf import BasePropSetting
+from rdagent.components.workflow.rd_loop import RDLoop
+from rdagent.core.proposal import HypothesisFeedback
+from rdagent.core.scenario import Scenario
+from rdagent.core.utils import import_class
+from rdagent.log import rdagent_logger as logger
+from rdagent.scenarios.data_science.dev.feedback import DSExperiment2Feedback
+from rdagent.scenarios.data_science.dev.runner import DSRunner
+from rdagent.scenarios.data_science.experiment.experiment import DSExperiment
+from rdagent.scenarios.data_science.proposal.exp_gen import DSExpGen, DSTrace
+from rdagent.scenarios.kaggle.kaggle_crawler import download_data
+
+
+class DataScienceRDLoop(RDLoop):
+    skip_loop_error = ()
+
+    def __init__(self, PROP_SETTING: BasePropSetting):
+        scen: Scenario = import_class(PROP_SETTING.scen)(PROP_SETTING.competition)
+
+        ### shared components in the workflow  # TODO: check if
+        knowledge_base = (
+            import_class(PROP_SETTING.knowledge_base)(PROP_SETTING.knowledge_base_path, scen)
+            if PROP_SETTING.knowledge_base != ""
+            else None
+        )
+
+        # 1) task generation from scratch
+        # self.scratch_gen: tuple[HypothesisGen, Hypothesis2Experiment] = DummyHypothesisGen(scen),
+
+        # 2) task generation from a complete solution
+        # self.exp_gen: ExpGen = import_class(PROP_SETTING.exp_gen)(scen)
+        self.exp_gen = DSExpGen(scen)
+        self.data_loader_coder = DataLoaderCoSTEER(scen)
+        self.feature_coder = FeatureCoSTEER(scen)
+        self.model_coder = ModelCoSTEER(scen)
+        self.ensemble_coder = EnsembleCoSTEER(scen)
+        self.workflow_coder = WorkflowCoSTEER(scen)
+
+        self.runner = DSRunner(scen)
+        # self.summarizer: Experiment2Feedback = import_class(PROP_SETTING.summarizer)(scen)
+        # logger.log_object(self.summarizer, tag="summarizer")
+
+        # self.trace = KGTrace(scen=scen, knowledge_base=knowledge_base)
+        self.trace = DSTrace(scen=scen)
+        self.summarizer = DSExperiment2Feedback(scen)
+        super(RDLoop, self).__init__()
+
+    def direct_exp_gen(self, prev_out: dict[str, Any]):
+        exp = self.exp_gen.gen(self.trace)
+        return exp
+
+    def coding(self, prev_out: dict[str, Any]):
+        exp: DSExperiment = prev_out["direct_exp_gen"]
+        if exp.hypothesis.component == "DataLoadSpec":
+            exp = self.data_loader_coder.develop(exp)
+        elif exp.hypothesis.component == "FeatureEng":
+            exp = self.feature_coder.develop(exp)
+        elif exp.hypothesis.component == "Model":
+            exp = self.model_coder.develop(exp)
+        elif exp.hypothesis.component == "Ensemble":
+            exp = self.ensemble_coder.develop(exp)
+        elif exp.hypothesis.component == "Workflow":
+            exp = self.workflow_coder.develop(exp)
+        else:
+            raise NotImplementedError(f"Unsupported component in DataScienceRDLoop: {exp.hypothesis.component}")
+
+        return exp
+
+    def running(self, prev_out: dict[str, Any]):
+        if self.trace.all_components_completed():
+            exp = self.runner.develop(prev_out["coding"])
+        else:
+            exp = prev_out["coding"]
+        return exp
+
+    def feedback(self, prev_out: dict[str, Any]):
+        if self.trace.all_components_completed():
+            feedback = self.summarizer.generate_feedback(
+                prev_out["running"], prev_out["direct_exp_gen"].hypothesis, self.trace
+            )
+        else:
+            feedback = HypothesisFeedback(
+                observations="Not all 5 components are completed, skip feedback of DataScienceRDLoop.",
+                hypothesis_evaluation="",
+                new_hypothesis="",
+                reason="",
+                decision=True,
+            )
+        self.trace.hist.append((prev_out["direct_exp_gen"].hypothesis, prev_out["running"], feedback))
+
+
+def main(path=None, step_n=None, competition="bms-molecular-translation"):
+    """
+    Auto R&D Evolving loop for models in a kaggle{} scenario.
+    You can continue running session by
+    .. code-block:: bash
+        dotenv run -- python rdagent/app/data_science/loop.py [--competition titanic] $LOG_PATH/__session__/1/0_propose --step_n 1 # `step_n` is a optional parameter
+        rdagent kaggle --competition playground-series-s4e8 # You are encouraged to use this one.
+    """
+    if competition is not None:
+        DS_RD_SETTING.competition = competition
+
+    if DS_RD_SETTING.competition:
+        if DS_RD_SETTING.scen.endswith("KaggleScen"):
+            download_data(competition=DS_RD_SETTING.competition, settings=DS_RD_SETTING)
+        else:
+            if not Path(f"{DS_RD_SETTING.local_data_path}/{competition}").exists():
+                logger.error(f"Please prepare data for competition {competition} first.")
+                return
+    else:
+        logger.error("Please specify competition name.")
+    if path is None:
+        kaggle_loop = DataScienceRDLoop(DS_RD_SETTING)
+    else:
+        kaggle_loop = DataScienceRDLoop.load(path)
+    kaggle_loop.run(step_n=step_n)
+
+
+if __name__ == "__main__":
+    fire.Fire(main)
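Reviewer note: the `coding` step in `DataScienceRDLoop` routes an experiment to one of five coders by component name, and `running` only invokes the runner once all components are completed. The sketch below shows the same routing as a table lookup. It is not the PR's implementation: the coders are stubs, and experiments are plain dicts rather than `DSExperiment` objects. Only the five component names and the error message come from the diff.

```python
from typing import Callable, Dict, Set


def make_coder(name: str) -> Callable[[dict], dict]:
    """Return a stub coder that records which component handled the experiment."""

    def develop(exp: dict) -> dict:
        return {**exp, "developed_by": name}

    return develop


# Component names are taken from the diff; the coders themselves are stubs.
CODERS: Dict[str, Callable[[dict], dict]] = {
    "DataLoadSpec": make_coder("data_loader_coder"),
    "FeatureEng": make_coder("feature_coder"),
    "Model": make_coder("model_coder"),
    "Ensemble": make_coder("ensemble_coder"),
    "Workflow": make_coder("workflow_coder"),
}


def coding(exp: dict) -> dict:
    """Dispatch the experiment to the coder registered for its component."""
    component = exp["component"]
    try:
        coder = CODERS[component]
    except KeyError:
        raise NotImplementedError(
            f"Unsupported component in DataScienceRDLoop: {component}"
        ) from None
    return coder(exp)


def running(exp: dict, completed: Set[str]) -> dict:
    """Run the full pipeline only once every component has been developed."""
    if completed >= set(CODERS):
        return {**exp, "ran": True}
    return exp  # pass the coded experiment through untouched


print(coding({"component": "Model"})["developed_by"])
```

A lookup table keeps the list of supported components in one place, which may matter if more components are added later; the if/elif chain in the PR is equivalent for the current five.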
28 changes: 12 additions & 16 deletions rdagent/app/kaggle/conf.py
@@ -1,8 +1,7 @@
-from rdagent.components.workflow.conf import BasePropSetting
-from rdagent.core.conf import ExtendedSettingsConfigDict
+from rdagent.core.conf import ExtendedBaseSettings, ExtendedSettingsConfigDict


-class KaggleBasePropSetting(BasePropSetting):
+class KaggleBasePropSetting(ExtendedBaseSettings):
     model_config = ExtendedSettingsConfigDict(env_prefix="KG_", protected_namespaces=())

     # 1) overriding the default
@@ -30,7 +29,7 @@ class KaggleBasePropSetting(BasePropSetting):
     model_runner: str = "rdagent.scenarios.kaggle.developer.runner.KGModelRunner"
     """Model Runner class"""

-    summarizer: str = "rdagent.scenarios.kaggle.developer.feedback.KGHypothesisExperiment2Feedback"
+    summarizer: str = "rdagent.scenarios.kaggle.developer.feedback.KGExperiment2Feedback"
     """Summarizer class"""

     evolving_n: int = 10
@@ -45,12 +44,21 @@ class KaggleBasePropSetting(BasePropSetting):
     local_data_path: str = ""
     """Folder storing Kaggle competition data"""

+    if_using_mle_data: bool = False
+    auto_submit: bool = False
+    """Automatically upload and submit each experiment result to Kaggle platform"""
+    # Conditionally set the knowledge_base based on the use of graph RAG
+    knowledge_base: str = ""
+    """Knowledge base class, uses 'KGKnowledgeGraph' when advanced graph-based RAG is enabled, otherwise empty."""
     if_action_choosing_based_on_UCB: bool = False
     """Enable decision mechanism based on UCB algorithm"""

     domain_knowledge_path: str = "/data/userdata/share/kaggle/domain_knowledge"
     """Folder storing domain knowledge files in .case format"""

+    knowledge_base_path: str = "kg_graph.pkl"
+    """Advanced version of graph-based RAG"""
+
     rag_path: str = "git_ignore_folder/kaggle_vector_base.pkl"
     """Base version of vector-based RAG"""

@@ -60,20 +68,8 @@ class KaggleBasePropSetting(BasePropSetting):
     if_using_graph_rag: bool = False
     """Enable advanced graph-based RAG"""

-    # Conditionally set the knowledge_base based on the use of graph RAG
-    knowledge_base: str = ""
-    """Knowledge base class, uses 'KGKnowledgeGraph' when advanced graph-based RAG is enabled, otherwise empty."""
-
-    knowledge_base_path: str = "kg_graph.pkl"
-    """Advanced version of graph-based RAG"""
-
-    auto_submit: bool = False
-    """Automatically upload and submit each experiment result to Kaggle platform"""
-
     mini_case: bool = False
     """Enable mini-case study for experiments"""

-    if_using_mle_data: bool = False
-

 KAGGLE_IMPLEMENT_SETTING = KaggleBasePropSetting()
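Reviewer note: several fields in these settings classes (`scen`, `summarizer`, `model_runner`, `knowledge_base`) hold dotted-path strings that the loops later pass to `import_class`. The body of `rdagent.core.utils.import_class` is not part of this diff, so the sketch below is only an assumption about how such a string could be resolved with the standard library; `resolve_class` is a hypothetical name.

```python
import importlib
from typing import Any


def resolve_class(class_path: str) -> Any:
    """Resolve a 'package.module.ClassName' string to the object it names."""
    module_path, _, class_name = class_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted path, got {class_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# A standard-library class is used here so the example runs without rdagent.
ordered_dict_cls = resolve_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

Storing class paths as strings lets an environment variable such as `KG_SUMMARIZER` swap an implementation without code changes, at the cost of typos surfacing only at import time.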
28 changes: 15 additions & 13 deletions rdagent/app/kaggle/loop.py
@@ -9,14 +9,13 @@
 from rdagent.core.developer import Developer
 from rdagent.core.exception import FactorEmptyError, ModelEmptyError
 from rdagent.core.proposal import (
+    Experiment2Feedback,
     Hypothesis2Experiment,
-    HypothesisExperiment2Feedback,
     HypothesisGen,
 )
 from rdagent.core.scenario import Scenario
 from rdagent.core.utils import import_class
 from rdagent.log import rdagent_logger as logger
-from rdagent.log.time import measure_time
 from rdagent.scenarios.kaggle.experiment.scenario import (
     KG_ACTION_FEATURE_ENGINEERING,
     KG_ACTION_FEATURE_PROCESSING,
@@ -28,7 +27,6 @@


 class KaggleRDLoop(RDLoop):
-    @measure_time
     def __init__(self, PROP_SETTING: BasePropSetting):
         with logger.tag("init"):
             scen: Scenario = import_class(PROP_SETTING.scen)(PROP_SETTING.competition)
@@ -55,27 +53,31 @@ def __init__(self, PROP_SETTING: BasePropSetting):
             logger.log_object(self.feature_runner, tag="feature runner")
             self.model_runner: Developer = import_class(PROP_SETTING.model_runner)(scen)
             logger.log_object(self.model_runner, tag="model runner")
-            self.summarizer: HypothesisExperiment2Feedback = import_class(PROP_SETTING.summarizer)(scen)
+            self.summarizer: Experiment2Feedback = import_class(PROP_SETTING.summarizer)(scen)
             logger.log_object(self.summarizer, tag="summarizer")
             self.trace = KGTrace(scen=scen, knowledge_base=knowledge_base)
             super(RDLoop, self).__init__()

-    @measure_time
     def coding(self, prev_out: dict[str, Any]):
         with logger.tag("d"):  # develop
-            if prev_out["propose"].action in [KG_ACTION_FEATURE_ENGINEERING, KG_ACTION_FEATURE_PROCESSING]:
-                exp = self.feature_coder.develop(prev_out["exp_gen"])
-            elif prev_out["propose"].action == KG_ACTION_MODEL_FEATURE_SELECTION:
-                exp = self.model_feature_selection_coder.develop(prev_out["exp_gen"])
+            if prev_out["direct_exp_gen"]["propose"].action in [
+                KG_ACTION_FEATURE_ENGINEERING,
+                KG_ACTION_FEATURE_PROCESSING,
+            ]:
+                exp = self.feature_coder.develop(prev_out["direct_exp_gen"]["exp_gen"])
+            elif prev_out["direct_exp_gen"]["propose"].action == KG_ACTION_MODEL_FEATURE_SELECTION:
+                exp = self.model_feature_selection_coder.develop(prev_out["direct_exp_gen"]["exp_gen"])
             else:
-                exp = self.model_coder.develop(prev_out["exp_gen"])
+                exp = self.model_coder.develop(prev_out["direct_exp_gen"]["exp_gen"])
             logger.log_object(exp.sub_workspace_list, tag="coder result")
         return exp

-    @measure_time
     def running(self, prev_out: dict[str, Any]):
         with logger.tag("ef"):  # evaluate and feedback
-            if prev_out["propose"].action in [KG_ACTION_FEATURE_ENGINEERING, KG_ACTION_FEATURE_PROCESSING]:
+            if prev_out["direct_exp_gen"]["propose"].action in [
+                KG_ACTION_FEATURE_ENGINEERING,
+                KG_ACTION_FEATURE_PROCESSING,
+            ]:
                 exp = self.feature_runner.develop(prev_out["coding"])
             else:
                 exp = self.model_runner.develop(prev_out["coding"])
@@ -126,7 +128,7 @@ def main(path=None, step_n=None, competition=None):
     """
     if competition:
         KAGGLE_IMPLEMENT_SETTING.competition = competition
-        download_data(competition=competition, local_path=KAGGLE_IMPLEMENT_SETTING.local_data_path)
+        download_data(competition=competition, settings=KAGGLE_IMPLEMENT_SETTING)
         if KAGGLE_IMPLEMENT_SETTING.if_using_graph_rag:
             KAGGLE_IMPLEMENT_SETTING.knowledge_base = (
                 "rdagent.scenarios.kaggle.knowledge_management.graph.KGKnowledgeGraph"
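Reviewer note: the key changes in `KaggleRDLoop` (`prev_out["propose"]` becoming `prev_out["direct_exp_gen"]["propose"]`) suggest that the proposal and experiment-generation steps were merged into a single `direct_exp_gen` step whose return value bundles both results. The sketch below illustrates that reading under the assumption that the loop base stores each step's return value under the step's name; `run_steps` and the string payloads are hypothetical, and the real `LoopBase` is not shown in this diff.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict[str, Any]], Any]]


def run_steps(steps: List[Step]) -> Dict[str, Any]:
    """Run steps in order, storing each result under the step's name."""
    prev_out: Dict[str, Any] = {}
    for name, func in steps:
        prev_out[name] = func(prev_out)
    return prev_out


def direct_exp_gen(prev_out: Dict[str, Any]) -> Dict[str, str]:
    # One step now bundles what used to be two top-level entries.
    return {"propose": "hypothesis", "exp_gen": "experiment"}


def coding(prev_out: Dict[str, Any]) -> str:
    # Later steps reach the bundled values through the step name.
    return f"code for {prev_out['direct_exp_gen']['exp_gen']}"


out = run_steps([("direct_exp_gen", direct_exp_gen), ("coding", coding)])
print(out["coding"])
```

If that assumption holds, any subclass that still reads `prev_out["propose"]` or `prev_out["exp_gen"]` directly would raise a `KeyError` after this change.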
4 changes: 2 additions & 2 deletions rdagent/app/qlib_rd_loop/conf.py
@@ -21,7 +21,7 @@ class ModelBasePropSetting(BasePropSetting):
     runner: str = "rdagent.scenarios.qlib.developer.model_runner.QlibModelRunner"
     """Runner class"""

-    summarizer: str = "rdagent.scenarios.qlib.developer.feedback.QlibModelHypothesisExperiment2Feedback"
+    summarizer: str = "rdagent.scenarios.qlib.developer.feedback.QlibModelExperiment2Feedback"
     """Summarizer class"""

     evolving_n: int = 10
@@ -47,7 +47,7 @@ class FactorBasePropSetting(BasePropSetting):
     runner: str = "rdagent.scenarios.qlib.developer.factor_runner.QlibFactorRunner"
     """Runner class"""

-    summarizer: str = "rdagent.scenarios.qlib.developer.feedback.QlibFactorHypothesisExperiment2Feedback"
+    summarizer: str = "rdagent.scenarios.qlib.developer.feedback.QlibFactorExperiment2Feedback"
     """Summarizer class"""

     evolving_n: int = 10
2 changes: 0 additions & 2 deletions rdagent/app/qlib_rd_loop/factor.py
@@ -10,13 +10,11 @@
 from rdagent.components.workflow.rd_loop import RDLoop
 from rdagent.core.exception import FactorEmptyError
 from rdagent.log import rdagent_logger as logger
-from rdagent.log.time import measure_time


 class FactorRDLoop(RDLoop):
     skip_loop_error = (FactorEmptyError,)

-    @measure_time
     def running(self, prev_out: dict[str, Any]):
         with logger.tag("ef"):  # evaluate and feedback
             exp = self.runner.develop(prev_out["coding"])