
Commit

update xgboost2categorysql
anystar authored and anystar committed Nov 27, 2024
1 parent 3e6fb63 commit 32d42a5
Showing 4 changed files with 27 additions and 17 deletions.
8 changes: 2 additions & 6 deletions README.md
@@ -4,19 +4,15 @@

## Exchange of ideas

| WeChat | WeChat Official Account |
| :---: | :----: |
| <img src="https://github.com/ZhengRyan/autotreemodel/blob/master/images/%E5%B9%B2%E9%A5%AD%E4%BA%BA.png" alt="RyanZheng.png" width="50%" border=0/> | <img src="https://github.com/ZhengRyan/autotreemodel/blob/master/images/%E9%AD%94%E9%83%BD%E6%95%B0%E6%8D%AE%E5%B9%B2%E9%A5%AD%E4%BA%BA.png" alt="魔都数据干饭人.png" width="50%" border=0/> |
| 干饭人 | 魔都数据干饭人 |

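This project is a fork of ZhengRyan's excellent project xgboost2sql. Related links:

> Repository: https://github.com/ZhengRyan/xgboost2sql
>
> WeChat Official Account article: https://mp.weixin.qq.com/s/z3IjzMFKP7iEoag5KP6nAA
>
> PyPI package: https://pypi.org/project/xgboost2sql/

Building on xgboost2sql, this project fixes the handling of missing values and adds SQL conversion for XGBoost models that take categorical (category) features as input.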
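
As a rough illustration of the categorical case, the sketch below parses one split line of an XGBoost text dump and produces the SQL predicate for it. The helper name `parse_split`, the regular expression, and the sample dump lines are assumptions made for this example and are not code from xgboost2sql; it also assumes feature names contain no `<`, `:` or `]`.

```python
import re

# Hypothetical helper (not part of xgboost2sql): matches a split line such as
#   "0:[f0<0.5] yes=1,no=2,missing=1"        (numeric split)
#   "3:[color:{0,2}] yes=7,no=8,missing=8"   (categorical split)
SPLIT_RE = re.compile(
    r"\[(?P<name>[^<:\]]+)(?P<cond><[^\]]+|:\{[^}]*\})\]\s*"
    r"yes=(?P<yes>\d+),no=(?P<no>\d+),missing=(?P<missing>\d+)"
)


def parse_split(line):
    """Return (feature, sql_predicate, yes, no, missing) for one split line."""
    m = SPLIT_RE.search(line)
    if m is None:
        raise ValueError(f"not a split line: {line!r}")
    cond = m.group("cond")
    if cond.startswith(":{"):
        # Categorical split: ":{0,2}" becomes " in (0,2)"
        predicate = " in (" + cond[2:-1] + ")"
    else:
        # Numeric split: "<0.5" is already a valid SQL comparison
        predicate = cond
    return (
        m.group("name").strip(),
        predicate,
        int(m.group("yes")),
        int(m.group("no")),
        int(m.group("missing")),
    )


numeric = parse_split("0:[f0<0.5] yes=1,no=2,missing=1")
categorical = parse_split("3:[color:{0,2}] yes=7,no=8,missing=8")
```

Here `numeric` carries the predicate `<0.5` and `categorical` carries ` in (0,2)`, so both can be appended to the feature name in a `case when` clause in the same way.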

## Environment setup
A separate virtual environment is not required, because the package dependencies have no version requirements.
Binary file removed images/干饭人.png
Binary file not shown.
Binary file removed images/魔都数据干饭人.png
Binary file not shown.
36 changes: 25 additions & 11 deletions xgboost2sql/xgboost2sql.py
@@ -137,16 +137,29 @@ def pre_tree(self, lines, is_right, n):
             self.code_str += res + '\n'
             return
         v = lines[0].strip()
-        start_index = v.find('[')
-        median_index = v.find('<')
-        end_index = v.find(']')
-        v_name = v[start_index + 1:median_index].strip()
-        v_value = v[median_index:end_index]
-        ynm = v[end_index + 1:].strip().split(',')
-        yes_v = int(ynm[0].replace('yes=', '').strip())
-        no_v = int(ynm[1].replace('no=', '').strip())
-        miss_v = int(ynm[2].replace('missing=', '').strip())
-        z_lines = lines[1:]
+        if ':{' in v:
+            start_index = v.find('[')
+            median_index = v.find(':{')
+            end_index = v.find(']')
+            v_name = v[start_index + 1:median_index].strip()
+            v_value = v[median_index + 1:end_index]
+            v_value = ' in ('+v_value[1:-1]+')'
+            ynm = v[end_index + 1:].strip().split(',')
+            yes_v = int(ynm[0].replace('yes=', '').strip())
+            no_v = int(ynm[1].replace('no=', '').strip())
+            miss_v = int(ynm[2].replace('missing=', '').strip())
+            z_lines = lines[1:]
+        else:
+            start_index = v.find('[')
+            median_index = v.find('<')
+            end_index = v.find(']')
+            v_name = v[start_index + 1:median_index].strip()
+            v_value = v[median_index:end_index]
+            ynm = v[end_index + 1:].strip().split(',')
+            yes_v = int(ynm[0].replace('yes=', '').strip())
+            no_v = int(ynm[1].replace('no=', '').strip())
+            miss_v = int(ynm[2].replace('missing=', '').strip())
+            z_lines = lines[1:]

         if is_right:
             format = '\t' * (n - 1)
@@ -156,7 +169,8 @@ def pre_tree(self, lines, is_right, n):
             res = res + format + 'case when (' + v_name + v_value + ' or ' + v_name + ' is null' + ') then'
         else:
             format = '\t' * n
-            res = res + format + 'case when (' + v_name + v_value + ' and ' + v_name + ' is null' + ') then'
+            # res = res + format + 'case when (' + v_name + v_value + ' and ' + v_name + ' is null' + ') then'
+            res = res + format + 'case when (' + v_name + v_value + ' and ' + v_name + ' is not null' + ') then'
         self.code_str += res + '\n'
         left_right = self.get_tree_str(z_lines, yes_v, no_v)
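
The change from `is null` to `is not null` in the second hunk can be checked against SQL's three-valued logic. The following is a minimal sketch using Python's built-in `sqlite3` module; the table `t`, the column `x`, and the threshold 5 are made up for illustration and do not come from the repository.

```python
import sqlite3

# In-memory table with one value below the threshold, one above, and one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (x real)")
conn.executemany("insert into t values (?)", [(1.0,), (9.0,), (None,)])


def count(where):
    """Count the rows of t that satisfy the given WHERE expression."""
    return conn.execute(f"select count(*) from t where {where}").fetchone()[0]


# Old condition: a row cannot satisfy "x<5" and be NULL at the same time,
# so this branch could never be taken.
old_branch = count("x<5 and x is null")

# New condition: only non-NULL rows below the threshold take the branch.
new_branch = count("x<5 and x is not null")

# Untouched sibling condition: NULL rows are routed together with "x<5".
null_with_yes = count("x<5 or x is null")

conn.close()
```

With these three rows, the old condition matches nothing, the new one matches only the row with `x = 1.0`, and the `or ... is null` form additionally matches the NULL row.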
