Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error when HORIZON=1 #1040

Closed
bachng2017 opened this issue Nov 4, 2022 · 8 comments · Fixed by #1202
Closed

error when HORIZON=1 #1040

bachng2017 opened this issue Nov 4, 2022 · 8 comments · Fixed by #1202
Assignees
Labels
bug Something isn't working wontfix Will not fix / very low priority

Comments

@bachng2017
Copy link
Contributor

Your Environment

  • Python version: python 3.8.8
  • Operating system: oraclelinux:8 container
  • Lightwood version: 22.10.4.2 (compiled from code)

Describe your issue

when HORIZON=1 some predictor creation stopped with error

INFO:lightwood-490943:The block ICP is now running its analyze() method
Traceback (most recent call last):
  File "./mindsdb/integrations/handlers/lightwood_handler/lightwood_handler/functions.py", line 150, in run_learn
    run_fit(predictor_id, df, company_id)
  File "./mindsdb/utilities/functions.py", line 57, in wrapper
    return func(*args, **kwargs)
  File "./mindsdb/integrations/handlers/lightwood_handler/lightwood_handler/functions.py", line 118, in run_fit
    raise e
  File "./mindsdb/integrations/handlers/lightwood_handler/lightwood_handler/functions.py", line 86, in run_fit
    predictor.learn(df)
  File "/home/7538911/.local/lib/python3.8/site-packages/lightwood/helpers/log.py", line 30, in wrap
    result = f(predictor, *args, **kw)
  File "/tmp/9dfe41087bab7a240661225931a1369d5c36e65fc4183ac91667568302975776.py", line 315, in learn
    self.analyze_ensemble(enc_train_test)
  File "/home/7538911/.local/lib/python3.8/site-packages/lightwood/helpers/log.py", line 30, in wrap
    result = f(predictor, *args, **kw)
  File "/tmp/9dfe41087bab7a240661225931a1369d5c36e65fc4183ac91667568302975776.py", line 277, in analyze_ensemble
    self.model_analysis, self.runtime_analyzer = model_analyzer(data=encoded_test_data,train_data=encoded_train_data,stats_info=self.statistical_analysis,tss=self.problem_definition.timeseries_settings,accuracy_functions=self.accuracy_functions,predictor=self.ensemble,target=self.target,dtype_dict=self.dtype_dict,analysis_blocks=self.analysis_blocks,ts_analysis=self.ts_analysis)
  File "/home/7538911/.local/lib/python3.8/site-packages/lightwood/analysis/analyze.py", line 83, in model_analyzer
    runtime_analyzer = block.analyze(runtime_analyzer, **kwargs)
  File "/home/7538911/.local/lib/python3.8/site-packages/lightwood/analysis/nc/calibrate.py", line 150, in analyze
    output['icp']['__default'].calibrate(icp_df.values, y)
  File "/home/7538911/.local/lib/python3.8/site-packages/lightwood/analysis/nc/icp.py", line 102, in calibrate
    cal_scores = self.nc_function.score(self.cal_x, self.cal_y)
  File "/home/7538911/.local/lib/python3.8/site-packages/lightwood/analysis/nc/nc.py", line 409, in score
    err = self.err_func.apply(prediction, y)
  File "/home/7538911/.local/lib/python3.8/site-packages/lightwood/analysis/nc/nc.py", line 223, in apply
    return np.abs(prediction - y)
TypeError: unsupported operand type(s) for -: 'list' and 'float'

How can we replicate it?

run home-sale predict tutorial but using HORIZON=1

@bachng2017 bachng2017 added the bug Something isn't working label Nov 4, 2022
@paxcema paxcema self-assigned this Nov 4, 2022
@bachng2017
Copy link
Contributor Author

a little bit debug showed that, when HORIZON=1 the pd.abs try to compare [ list(double) ] with [ double ], which causes the error. And when HORIZON=2 for example, it correctly compares [ (double,double) ] to [ (double,double)]

@paxcema
Copy link
Contributor

paxcema commented Nov 6, 2022

Hi @bachng2017, thanks for reporting

I've added an extra test to explicitly check this in #1041. However, tests are passing, which means the issue is in the handler upstream. I'll move this issue to the mindsdb repo and we can continue the conversation there.

@bachng2017
Copy link
Contributor Author

bachng2017 commented Nov 6, 2022

hello @paxcema , is this really an upstream issue ? the following test code (copied and modified from your commit) will reproduce the error

data = pd.read_csv('tests/data/house_sales.csv')
gby = ['bedrooms', 'type']
target = 'MA'
order_by = 'saledate'
window = 8
horizon = 1

train, _, test = stratify(data, pct_train=0.8, pct_dev=0, pct_test=0.2, stratify_on=gby, seed=1,
                          reshuffle=False)
jai = json_ai_from_problem(train,
                           ProblemDefinition.from_dict({'target': target,
                                                        'time_aim': 30,  # short time aim
                                                        'timeseries_settings': {
                                                            'group_by': gby,
                                                            'horizon': horizon,
                                                            'order_by': order_by,
                                                            'window': window
                                                        }}))
                                                        
# this will reproduce the error 
jai.model['args']['submodels'][0] = {
  "module": "SkTime"
} 
                                                             
code = code_from_json_ai(jai)
pred = predictor_from_code(code)

# Test with inferring mode, check timestamps are further into the future than test dates
test['__mdb_forecast_offset'] = 1
train_and_check_time_aim(pred, train, ignore_time_aim=True)
preds = pred.predict(test)

@paxcema
Copy link
Contributor

paxcema commented Nov 6, 2022

Whoops, reopening 😅 - I can definitely replicate this. Let me get to the bottom of it tomorrow and we will get the fix deployed shortly. Thanks!

@paxcema paxcema reopened this Nov 6, 2022
@bachng2017
Copy link
Contributor Author

bachng2017 commented Nov 7, 2022

FYI, this might be a fix. It solved the above test case but not sure about the other

diff --git a/lightwood/mixer/sktime.py b/lightwood/mixer/sktime.py
index 8f16380c..895aa49a 100644
--- a/lightwood/mixer/sktime.py
+++ b/lightwood/mixer/sktime.py
@@ -308,7 +308,11 @@ class SkTime(BaseMixer):
         for true_idx, (idx, _) in enumerate(series.items()):
             start_idx = max(0, true_idx)
             end_idx = start_idx + self.horizon
-            ydf['prediction'].loc[idx] = all_preds[start_idx:end_idx]
+            if self.horizon > 1 :
+                ydf['prediction'].loc[idx] = all_preds[start_idx:end_idx]
+            else:
+                ydf['prediction'].loc[idx] = all_preds[start_idx]
+
         return ydf

     def _call_default(self, ydf, data, idxs):

@paxcema
Copy link
Contributor

paxcema commented Nov 7, 2022

Okay, it turns out the error is expected, but it is not obvious at all why.

The reason is that the sktime mixer does not support horizon=1 yet. You can check this is by design if you check the dispatch rules in the JsonAI generation phase, in particular this bit. However, it does raise two interesting points:

  1. Can we add support for horizon=1 in sktime? I can't think of a reason why not, and evidently we are not impeding users to try it out right now (hence the error you got), so we should at least try. I will open a new issue for this, plus a branch that explores the proposed fix you've posted above (unless you want to work on it yourself, which if so please let us know and we will assign you).
  2. Exactly how did you get the error in MindsDB by following the tutorial? You mentioned "run home-sale predict tutorial but using HORIZON=1", did you by any chance also force the sktime mixer via USING syntax? Can you post the complete SQL query?

@bachng2017
Copy link
Contributor Author

bachng2017 commented Nov 7, 2022

thanks for the feedback on this and mindsdb #3984

yes, we also somehow had observed that . And actually we tried another fix that introduce is_single_ts (comparing to the existed is_multi_ts ) but it was a little bit complicated so we replaced it with the mentioned patch.

The reason is that the sktime mixer does not support horizon=1 ye

anyway it's good that the issue is aware now

  • for (1), please go ahead :) Because i have only limited information about this issue only but not for the whole architecture so our proposed patch is very limited.
  • for (2), yes we might unconsciously forced sktime. But as far as I remember, we ran into the problem firstly with some other dataset and configuration including HORIZON=1 then we realized that sktime was related to the cause. For now, we could not reproduce the issue with only HORZION=1.

@paxcema
Copy link
Contributor

paxcema commented Jan 22, 2024

Closing via #1202.

@paxcema paxcema closed this as completed Jan 22, 2024
@paxcema paxcema added the wontfix Will not fix / very low priority label Jan 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working wontfix Will not fix / very low priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants