diff --git a/Makefile b/Makefile
index 505fa2d0..46575686 100644
--- a/Makefile
+++ b/Makefile
@@ -7,6 +7,7 @@ SPHINXBUILD = sphinx-build
 SPHINXPROJ = DaskExamples
 SOURCEDIR = .
 BUILDDIR = _build
+PYTEST = pytest
 
 # Put it first so that "make" without argument is like "make help".
 help:
@@ -17,4 +18,7 @@ help:
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
\ No newline at end of file
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+linkcheck:
+	$(PYTEST) --check-links --check-links-cache
\ No newline at end of file
diff --git a/applications/async-web-server.ipynb b/applications/async-web-server.ipynb
index 022a3ec9..343cce25 100644
--- a/applications/async-web-server.ipynb
+++ b/applications/async-web-server.ipynb
@@ -18,7 +18,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -70,7 +69,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -111,7 +109,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -125,7 +122,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -139,7 +135,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -153,7 +148,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -176,7 +170,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -225,7 +218,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -285,7 +277,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -300,7 +291,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -326,7 +316,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -344,7 +333,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -364,19 +352,18 @@
    "source": [
     "### Other options\n",
     "\n",
-    "In these situations people today tend to use [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) or [Celery](http://www.celeryproject.org/).\n",
+    "In these situations people today tend to use [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) or [Celery](https://docs.celeryproject.org/en/stable/).\n",
     "\n",
     "- concurrent.futures allows easy parallelism on a single machine and integrates well into async frameworks. The API is exactly what we showed above (Dask implements the concurrent.futures API). However concurrent.futures doesn't easily scale out to a cluster.\n",
     "- Celery scales out more easily to multiple machines, but has higher latencies, doesn't scale down as nicely, and needs a bit of effort to integrate into async frameworks (or at least this is my understanding, my experience here is shallow)\n",
     "\n",
-    "In this context Dask provides some of the benefits of both. It is easy to set up and use in the common single-machine case, but can also [scale out to a cluster](http://distributed.readthedocs.io/en/latest/setup.html). It integrates nicely with async frameworks and adds only very small latencies."
+    "In this context Dask provides some of the benefits of both. It is easy to set up and use in the common single-machine case, but can also [scale out to a cluster](https://docs.dask.org/en/latest/setup.html). It integrates nicely with async frameworks and adds only very small latencies."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": false,
     "jupyter": {
      "outputs_hidden": false
     }
@@ -409,7 +396,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.4"
+   "version": "3.6.6"
   }
  },
 "nbformat": 4,
diff --git a/applications/clip.gif b/applications/clip.gif
new file mode 100644
index 00000000..77f67f2c
Binary files /dev/null and b/applications/clip.gif differ
diff --git a/applications/embarrassingly-parallel.ipynb b/applications/embarrassingly-parallel.ipynb
index 21b65751..2def031a 100644
--- a/applications/embarrassingly-parallel.ipynb
+++ b/applications/embarrassingly-parallel.ipynb
@@ -384,7 +384,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We encourage you to watch the [dashboard's status page](../proxy/8787/status) to watch on going computation."
+    "We encourage you to watch the [dashboard's status page](https://docs.dask.org/en/latest/diagnostics-distributed.html#dashboard) to watch the ongoing computation."
    ]
   },
   {
@@ -524,4 +524,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
\ No newline at end of file
diff --git a/applications/image-processing.ipynb b/applications/image-processing.ipynb
index 833b2816..12e7479a 100644
--- a/applications/image-processing.ipynb
+++ b/applications/image-processing.ipynb
@@ -36,7 +36,7 @@
     " 1. [Segmenting](#segmenting)\n",
     " 1. [Analyzing](#analyzing)\n",
     "1. [Next steps](#next_steps)\n",
-    "1. [Cleaning up temporary directories and files]('#cleanup)"
+    "1. [Cleaning up temporary directories and files](#cleanup)"
    ]
   },
   {
@@ -525,9 +525,9 @@
    "metadata": {},
    "source": [
     "We'll walk through a simple image segmentation and analysis pipeline with three steps:\n",
-    "1. [Filtering]('#filtering)\n",
-    "1. [Segmenting]('#segmenting')\n",
-    "1. [Analyzing]('#analyzing')"
+    "1. [Filtering](#filtering)\n",
+    "1. [Segmenting](#segmenting)\n",
+    "1. [Analyzing](#analyzing)"
    ]
   },
   {
@@ -825,7 +825,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.7"
+   "version": "3.6.6"
   }
  },
 "nbformat": 4,
diff --git a/applications/prefect-etl.ipynb b/applications/prefect-etl.ipynb
index c6165006..176ec747 100644
--- a/applications/prefect-etl.ipynb
+++ b/applications/prefect-etl.ipynb
@@ -325,14 +325,35 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "\"Rick"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "from IPython.display import HTML\n",
     "\n",
     "HTML('\"Rick')"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
@@ -351,7 +372,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.4"
+   "version": "3.6.6"
   }
  },
 "nbformat": 4,
diff --git a/applications/stencils-with-numba.ipynb b/applications/stencils-with-numba.ipynb
index a1ddd01f..329b33ac 100644
--- a/applications/stencils-with-numba.ipynb
+++ b/applications/stencils-with-numba.ipynb
@@ -16,7 +16,7 @@
     "In particular we show off two Numba features, and how they compose with Dask:\n",
     "\n",
     "1. Numba's [stencil decorator](https://numba.pydata.org/numba-doc/dev/user/stencil.html)\n",
-    "2. NumPy's [Generalized Universal Functions](https://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html)\n",
+    "2. NumPy's [Generalized Universal Functions](https://numpy.org/devdocs/reference/c-api/generalized-ufuncs.html)\n",
     "\n",
     "*This was originally published as a blogpost [here](https://blog.dask.org/2019/04/09/numba-stencil)*"
    ]
@@ -200,7 +200,7 @@
     "\n",
     "**Numba Docs:** https://numba.pydata.org/numba-doc/dev/user/vectorize.html\n",
     "\n",
-    "**NumPy Docs:** https://docs.scipy.org/doc/numpy-1.16.0/reference/c-api. generalized-ufuncs.html\n",
+    "**NumPy Docs:** https://numpy.org/devdocs/reference/c-api/generalized-ufuncs.html\n",
     "\n",
     "A generalized universal function (gufunc) is a normal function that has been\n",
     "annotated with typing and dimension information. For example we can redefine\n",
@@ -346,7 +346,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.1"
+   "version": "3.6.6"
   }
  },
 "nbformat": 4,
diff --git a/dataframes/03-from-pandas-to-dask.ipynb b/dataframes/03-from-pandas-to-dask.ipynb
index 9a70e7bf..efede45e 100644
--- a/dataframes/03-from-pandas-to-dask.ipynb
+++ b/dataframes/03-from-pandas-to-dask.ipynb
@@ -110,7 +110,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "See [documentation for addtional cluster configuration](http://distributed.dask.org/en/latest/local-cluster.html)"
+    "See [documentation for additional cluster configuration](http://distributed.dask.org/en/1.12.2/local-cluster.html)"
    ]
   },
   {
diff --git a/delayed.ipynb b/delayed.ipynb
index a9340ee3..6633fc21 100644
--- a/delayed.ipynb
+++ b/delayed.ipynb
@@ -287,12 +287,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "If you're watching the [dashboard's status page](../proxy/8787/status) then you may want to note two things:\n",
+    "If you're watching the [dashboard's status page](https://docs.dask.org/en/latest/diagnostics-distributed.html#dashboard) then you may want to note two things:\n",
     "\n",
     "1. The red bars are for inter-worker communication. They happen as different workers need to combine their intermediate values\n",
     "2. There is lots of parallelism at the beginning but less towards the end as we reach the top of the tree where there is less work to do.\n",
     "\n",
-    "Alternativley you may want to navigate to the [dashboard's graph page](../proxy/8787/graph) and then run the cell above again. You will be able to see the task graph evolve during the computation."
+    "Alternatively you may want to navigate to the dashboard's graph page and then run the cell above again. You will be able to see the task graph evolve during the computation."
    ]
   },
   {
@@ -322,7 +322,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.1"
+   "version": "3.6.6"
   }
  },
 "nbformat": 4,
diff --git a/futures.ipynb b/futures.ipynb
index 87b2a988..11fadb0d 100644
--- a/futures.ipynb
+++ b/futures.ipynb
@@ -274,12 +274,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "If you're watching the [dashboard's status page](../proxy/8787/status) then you may want to note two things:\n",
+    "If you're watching the [dashboard's status page](https://docs.dask.org/en/latest/diagnostics-distributed.html#dashboard) then you may want to note two things:\n",
     "\n",
     "1. The red bars are for inter-worker communication. They happen as different workers need to combine their intermediate values\n",
     "2. There is lots of parallelism at the beginning but less towards the end as we reach the top of the tree where there is less work to do.\n",
     "\n",
-    "Alternatively you may want to navigate to the [dashboard's graph page](../proxy/8787/graph) and then run the cell above again. You will be able to see the task graph evolve during the computation."
+    "Alternatively you may want to navigate to the [dashboard's graph page](https://docs.dask.org/en/latest/diagnostics-distributed.html#dashboard) and then run the cell above again. You will be able to see the task graph evolve during the computation."
    ]
   },
   {
diff --git a/machine-learning/hyperparam-opt.ipynb b/machine-learning/hyperparam-opt.ipynb
index 29fd2d05..c7ae4904 100644
--- a/machine-learning/hyperparam-opt.ipynb
+++ b/machine-learning/hyperparam-opt.ipynb
@@ -574,7 +574,7 @@
    "source": [
     "This notebook covered basic usage `HyperbandSearchCV`. The following documentation and resources might be useful to learn more about `HyperbandSearchCV`, including some of the finer use cases:\n",
     "\n",
-    "* [A talk](https://www.youtube.com/watch?v=x67K9FiPFBQ) introducing `HyperbandSearchCV` to the SciPy 2019 audience and the [corresponding paper](https://conference.scipy.org/proceedings/scipy2019/pdfs/scott_sievert.pdf)\n",
+    "* [A talk](https://www.youtube.com/watch?v=x67K9FiPFBQ) introducing `HyperbandSearchCV` to the SciPy 2019 audience and the [corresponding paper](http://conference.scipy.org/proceedings/scipy2019/pdfs/scott_sievert.pdf)\n",
     "* [HyperbandSearchCV's documentation](https://ml.dask.org/modules/generated/dask_ml.model_selection.HyperbandSearchCV.html)\n",
     "\n",
     "Performance comparisons can be found in the SciPy 2019 talk/paper."
diff --git a/machine-learning/incremental.ipynb b/machine-learning/incremental.ipynb
index 065a5e34..f34e26e9 100644
--- a/machine-learning/incremental.ipynb
+++ b/machine-learning/incremental.ipynb
@@ -16,7 +16,7 @@
     "...\n",
     "```\n",
     "\n",
-    "The Scikit-Learn documentation discusses this approach in more depth in their [user guide](http://scikit-learn.org/stable/modules/scaling_strategies.html).\n",
+    "The Scikit-Learn documentation discusses this approach in more depth in their [user guide](https://sklearn.org/modules/scaling_strategies.html).\n",
     "\n",
     "This notebook demonstrates the use of Dask-ML's `Incremental` meta-estimator, which automates the use of Scikit-Learn's `partial_fit` over Dask arrays and dataframes. Scikit-Learn handles all of the computation while Dask handles the data management, loading and moving batches of data as necessary. This allows scaling to large datasets distributed across many machines, or to datasets that do not fit in memory, all with a familiar workflow.\n",
     "\n",
@@ -27,7 +27,7 @@
     "\n",
     "Although this example uses Scikit-Learn's SGDClassifer, the `Incremental` meta-estimator will work for any class that implements `partial_fit` and the [scikit-learn base estimator API].\n",
     "\n",
-    " \n",
+    " \n",
     "\n",
     "[scikit-learn base estimator API]:http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html\n",
     "\n"
@@ -181,7 +181,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Here we use `SGDClassifier`, but any estimator that implements the `partial_fit` method will work. A list of Scikit-Learn models that implement this API is available [here](http://scikit-learn.org/stable/modules/scaling_strategies.html#incremental-learning).\n"
+    "Here we use `SGDClassifier`, but any estimator that implements the `partial_fit` method will work. A list of Scikit-Learn models that implement this API is available [here](https://scikit-learn.org/0.15/modules/scaling_strategies.html#incremental-learning)."
    ]
   },
   {
@@ -316,9 +316,9 @@
     "\n",
     "In this notebook we went over using Dask-ML's `Incremental` meta-estimator to automate the process of incremental training with Scikit-Learn estimators that implement the `partial_fit` method. If you want to learn more about this process you might want to investigate the following documentation:\n",
     "\n",
-    "1. http://scikit-learn.org/stable/modules/scaling_strategies.html\n",
+    "1. https://sklearn.org/modules/scaling_strategies.html\n",
     "2. [Dask-ML Incremental API documentation](http://ml.dask.org/modules/generated/dask_ml.wrappers.Incremental.html#dask_ml.wrappers.Incremental)\n",
-    "3. [List of Scikit-Learn estimators compatible with Dask-ML's Incremental](http://scikit-learn.org/stable/modules/scaling_strategies.html#incremental-learning)\n",
+    "3. [List of Scikit-Learn estimators compatible with Dask-ML's Incremental](https://sklearn.org/modules/scaling_strategies.html#incremental-learning)\n",
     "4. For more info on the train-test split for model evaluation, see [Hyperparameters and Model Validation](https://jakevdp.github.io/PythonDataScienceHandbook/05.03-hyperparameters-and-model-validation.html)."
    ]
   }
@@ -339,7 +339,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.4"
+   "version": "3.6.6"
   }
  },
 "nbformat": 4,
diff --git a/machine-learning/voting-classifier.ipynb b/machine-learning/voting-classifier.ipynb
index 7a0796aa..b72701d4 100644
--- a/machine-learning/voting-classifier.ipynb
+++ b/machine-learning/voting-classifier.ipynb
@@ -87,7 +87,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Creating a Dask [client](https://distributed.readthedocs.io/en/latest/client.html) provides performance and progress metrics via the dashboard. Because ```Client``` is given no arugments, its output refers to a [local cluster](http://distributed.readthedocs.io/en/latest/local-cluster.html) (not a distributed cluster).\n",
+    "Creating a Dask [client](https://distributed.readthedocs.io/en/latest/client.html) provides performance and progress metrics via the dashboard. Because ```Client``` is given no arguments, its output refers to a [local cluster](https://distributed.dask.org/en/latest/api.html#cluster) (not a distributed cluster).\n",
     "\n",
     "We can view the dashboard by clicking the link after running the cell."
    ]
@@ -149,7 +149,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.3"
+   "version": "3.6.6"
   }
  },
 "nbformat": 4,
diff --git a/machine-learning/xgboost.ipynb b/machine-learning/xgboost.ipynb
index 897e597a..d8876917 100644
--- a/machine-learning/xgboost.ipynb
+++ b/machine-learning/xgboost.ipynb
@@ -272,7 +272,6 @@
    "metadata": {},
    "source": [
     "## Learn more\n",
-    "* Similar example that uses DataFrames for a real world dataset: http://ml.dask.org/examples/xgboost.html\n",
     "* Recorded screencast stepping through the real world example above:\n",
     "* A blogpost on dask-xgboost http://matthewrocklin.com/blog/work/2017/03/28/dask-xgboost\n",
     "* XGBoost documentation: https://xgboost.readthedocs.io/en/latest/python/python_intro.html#\n",
@@ -296,7 +295,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.4"
+   "version": "3.6.6"
   }
  },
 "nbformat": 4,
diff --git a/surveys/2019.ipynb b/surveys/2019.ipynb
index 6c01f527..f9eeb617 100644
--- a/surveys/2019.ipynb
+++ b/surveys/2019.ipynb
@@ -147,7 +147,7 @@
    "source": [
     "Overall, documentation is still the leader across user user groups.\n",
     "\n",
-    "The usage of the [Dask tutorial](https://github.com/dask/dask-tutorial) and the [dask examples](examples.dask.org) are relatively consistent across groups. The primary difference between regular and new users is that regular users are more likely to engage on GitHub.\n",
+    "The usage of the [Dask tutorial](https://github.com/dask/dask-tutorial) and the [dask examples](https://examples.dask.org) are relatively consistent across groups. The primary difference between regular and new users is that regular users are more likely to engage on GitHub.\n",
     "\n",
     "From StackOverflow questions and GitHub issues, we have a vague idea about which parts of the library are used.\n",
     "The survey shows that (for our respondents at least) DataFrame and Delayed are the most commonly used APIs."
    ]
@@ -465,7 +465,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.3"
+   "version": "3.6.6"
   }
  },
 "nbformat": 4,
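
A note on the new `linkcheck` target in the Makefile hunk above: the `--check-links` and `--check-links-cache` options come from the `pytest-check-links` plugin, not from pytest itself, so the target assumes that plugin is installed. A minimal sketch of the same check driven from Python rather than Make, under that assumption:

```python
# Rough Python equivalent of `make linkcheck`. Assumes the
# pytest-check-links plugin is installed; it provides the
# --check-links and --check-links-cache options, and pytest
# rejects these flags without it.
import sys

import pytest

# Collect files in the current directory (notebooks, Markdown, etc.),
# fail if any link they contain does not resolve, and cache results
# so repeated runs skip links that were already checked.
sys.exit(pytest.main(["--check-links", "--check-links-cache"]))
```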