Decide fate of -viz starters and viz tool #263

Open
2 tasks
astrojuanlu opened this issue Feb 7, 2025 · 1 comment

Hi @ankatiyar and @DimedS, I am not sure it is worth keeping the viz starters after removing ET. The only difference I see is the Plotly charts. I feel we can keep the viz starters and rename them to spaceflights-pandas and spaceflights-pyspark to get the plots. What do you think?

Originally posted by @ravi-kumar-pilla in #262 (comment)

About keeping "Viz" as a tool in kedro new: initially the idea was to make it add kedro-viz as a dependency, but... it was already a dependency of the non -viz starters!

Therefore we need to:

  • Agree on having only spaceflights-pandas and spaceflights-pyspark, optionally merging the contents of -viz
  • Agree on what to do with the viz tool in kedro new

astrojuanlu commented Feb 13, 2025

Difference between answering no and yes to the Viz tool, with tools 1-5 and the example pipeline selected in both cases:

--- tree_no_viz.txt	2025-02-13 10:36:20
+++ tree_yes_viz.txt	2025-02-13 10:36:28
@@ -1,4 +1,4 @@
-kedro-no-viz/
+kedro-yes-viz/
 ├── README.md
 ├── conf
 │   ├── README.md
@@ -6,7 +6,8 @@
 │   │   ├── catalog.yml
 │   │   ├── parameters.yml
 │   │   ├── parameters_data_processing.yml
-│   │   └── parameters_data_science.yml
+│   │   ├── parameters_data_science.yml
+│   │   └── parameters_reporting.yml
 │   ├── local
 │   │   └── credentials.yml
 │   └── logging.yml
@@ -30,7 +31,7 @@
 ├── pyproject.toml
 ├── requirements.txt
 ├── src
-│   └── kedro_no_viz
+│   └── kedro_yes_viz
 │       ├── __init__.py
 │       ├── __main__.py
 │       ├── pipeline_registry.py
@@ -40,7 +41,11 @@
 │       │   │   ├── __init__.py
 │       │   │   ├── nodes.py
 │       │   │   └── pipeline.py
-│       │   └── data_science
+│       │   ├── data_science
+│       │   │   ├── __init__.py
+│       │   │   ├── nodes.py
+│       │   │   └── pipeline.py
+│       │   └── reporting
 │       │       ├── __init__.py
 │       │       ├── nodes.py
 │       │       └── pipeline.py
@@ -53,4 +58,4 @@
     │       └── test_pipeline.py
     └── test_run.py
 
-24 directories, 30 files
+25 directories, 34 files
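
A comparison like the one above can also be reproduced programmatically. The following is a minimal sketch, assuming both projects were generated side by side; the helper names `project_files` and `compare_projects` are illustrative and not part of Kedro. The package directory name is masked so that the rename from `kedro_no_viz` to `kedro_yes_viz` is not reported as a difference.

```python
from pathlib import Path


def project_files(root: Path, package_name: str) -> set[str]:
    """Return relative file paths under root, with the package name masked."""
    files = set()
    for path in root.rglob("*"):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            # Mask the package directory so a pure rename is not reported.
            files.add(relative.replace(package_name, "<package>"))
    return files


def compare_projects(no_viz: Path, yes_viz: Path) -> tuple[list[str], list[str]]:
    """Return (files only in no_viz, files only in yes_viz), both sorted."""
    left = project_files(no_viz, "kedro_no_viz")
    right = project_files(yes_viz, "kedro_yes_viz")
    return sorted(left - right), sorted(right - left)
```

Calling `compare_projects(Path("kedro-no-viz"), Path("kedro-yes-viz"))` returns the two lists of files unique to each project. For the trees shown here, the second list would be expected to contain the reporting pipeline and its parameters file.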

Relevant content differences between the two:

diff --color=auto -u kedro-no-viz/pyproject.toml kedro-yes-viz/pyproject.toml
--- kedro-no-viz/pyproject.toml	2025-02-13 10:35:55
+++ kedro-yes-viz/pyproject.toml	2025-02-13 10:36:09
@@ -4,7 +4,7 @@
 
 [project]
 requires-python = ">=3.9"
-name = "kedro_no_viz"
+name = "kedro_yes_viz"
 readme = "README.md"
 dynamic = ["version"]
 dependencies = [
@@ -12,13 +12,14 @@
     "jupyterlab>=3.0",
     "notebook",
     "kedro[jupyter]~=0.19.11",
-    "kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset]>=3.0",
+    "kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset, plotly-plotlydataset, plotly-jsondataset, matplotlib-matplotlibwriter]>=3.0",
     "kedro-viz>=6.7.0",
-    "scikit-learn~=1.5.1"
+    "scikit-learn~=1.5.1",
+    "seaborn~=0.12.1"
 ]
 
 [project.scripts]
-"kedro-no-viz" = "kedro_no_viz.__main__:main"
+"kedro-yes-viz" = "kedro_yes_viz.__main__:main"
 
 [project.entry-points."kedro.hooks"]
diff --color=auto -u kedro-no-viz/requirements.txt kedro-yes-viz/requirements.txt
--- kedro-no-viz/requirements.txt	2025-02-13 10:35:56
+++ kedro-yes-viz/requirements.txt	2025-02-13 10:36:09
@@ -1,7 +1,8 @@
 ipython>=8.10
 jupyterlab>=3.0
-kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset]>=3.0
+kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset, plotly-plotlydataset, plotly-jsondataset, matplotlib-matplotlibwriter]>=3.0
 kedro-viz>=6.7.0
 kedro[jupyter]~=0.19.11
 notebook
 scikit-learn~=1.5.1
+seaborn~=0.12.1
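
To summarise the dependency changes without reading the diff line by line, here is a minimal sketch that compares the two requirements lists. The requirement strings are copied from the `requirements.txt` diff above; the helpers `parse_requirement` and `diff_requirements` are hypothetical names for illustration, and the regular expression only handles the simple `name[extras]spec` form seen here, not the full requirement syntax.

```python
import re

NO_VIZ = [
    "ipython>=8.10",
    "jupyterlab>=3.0",
    "kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset]>=3.0",
    "kedro-viz>=6.7.0",
    "kedro[jupyter]~=0.19.11",
    "notebook",
    "scikit-learn~=1.5.1",
]

YES_VIZ = [
    "ipython>=8.10",
    "jupyterlab>=3.0",
    "kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset, "
    "plotly-plotlydataset, plotly-jsondataset, matplotlib-matplotlibwriter]>=3.0",
    "kedro-viz>=6.7.0",
    "kedro[jupyter]~=0.19.11",
    "notebook",
    "scikit-learn~=1.5.1",
    "seaborn~=0.12.1",
]


def parse_requirement(line: str) -> tuple[str, set[str], str]:
    """Split a simple requirement line into (name, extras, version spec)."""
    match = re.match(r"^([A-Za-z0-9_.-]+)(?:\[([^\]]*)\])?(.*)$", line.strip())
    if match is None:
        raise ValueError(f"Cannot parse requirement: {line!r}")
    name, extras, spec = match.groups()
    extra_set = {e.strip() for e in extras.split(",")} if extras else set()
    return name, extra_set, spec.strip()


def diff_requirements(old: list[str], new: list[str]) -> dict[str, set[str]]:
    """Report packages and extras that appear in new but not in old."""
    old_extras = {name: extras for name, extras, _ in map(parse_requirement, old)}
    added_packages: set[str] = set()
    added_extras: set[str] = set()
    for name, extras, _ in map(parse_requirement, new):
        if name not in old_extras:
            added_packages.add(name)
        else:
            added_extras |= {f"{name}[{e}]" for e in extras - old_extras[name]}
    return {"packages": added_packages, "extras": added_extras}


if __name__ == "__main__":
    result = diff_requirements(NO_VIZ, YES_VIZ)
    print("Added packages:", sorted(result["packages"]))
    print("Added extras:", sorted(result["extras"]))
```

On these inputs the only added package is seaborn, and the only added extras are the two Plotly datasets and the matplotlib writer on kedro-datasets.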

So given that we were installing Kedro-Viz in both cases anyway and that the content differences are quite minor, I contend that

  • Kedro-Viz should continue to be installed in all cases,
  • we should add information to the README of the starters on how to launch Kedro-Viz ("run kedro viz run and you will see [screenshot] in the browser"),
  • while we're at it, we should mention the VS Code extension too,
  • we should add the Plotly and matplotlib datasets as well as seaborn if the user asks for an example pipeline,
  • and the Viz tool should be removed (given that Kedro-Viz will be installed anyway).

If I understand correctly, this means merging spaceflights-pandas-viz with spaceflights-pandas, and spaceflights-pyspark-viz with spaceflights-pyspark.
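
To make the dependency side of this proposal concrete, here is a minimal sketch of the selection logic. It is an illustration only, not how the Kedro project templates are implemented: the function `proposed_requirements` and its `base` parameter are hypothetical, and the version pins are copied from the diffs above. The point is that Kedro-Viz no longer depends on any tool choice, while the reporting dependencies depend only on whether an example pipeline was requested.

```python
# Always installed under the proposal, regardless of tool selection.
KEDRO_VIZ = "kedro-viz>=6.7.0"

# Added only when the user asks for an example pipeline (pins from the diff).
REPORTING_REQUIREMENTS = [
    "kedro-datasets[plotly-plotlydataset, plotly-jsondataset, matplotlib-matplotlibwriter]>=3.0",
    "seaborn~=0.12.1",
]


def proposed_requirements(base: list[str], example_pipeline: bool) -> list[str]:
    """Return the requirements for a new project under the proposal.

    `base` is whatever the project would otherwise require; this sketch
    makes no assumption about its contents.
    """
    requirements = list(base)
    if KEDRO_VIZ not in requirements:
        requirements.append(KEDRO_VIZ)
    if example_pipeline:
        requirements.extend(REPORTING_REQUIREMENTS)
    return requirements
```

Note that there is no `viz` argument at all, which is the reason the Viz tool becomes redundant.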

Thoughts?

@astrojuanlu astrojuanlu moved this from To Do to In Progress in Kedro 🔶 Feb 13, 2025