Fix broken links (#152) #202

Merged
2 changes: 1 addition & 1 deletion applications/async-web-server.ipynb
@@ -364,7 +364,7 @@
"source": [
"### Other options\n",
"\n",
"In these situations people today tend to use [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) or [Celery](http://www.celeryproject.org/).\n",
"In these situations people today tend to use [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) or [Celery](https://docs.celeryproject.org/en/latest/index.html).\n",
"\n",
"- concurrent.futures allows easy parallelism on a single machine and integrates well into async frameworks. The API is exactly what we showed above (Dask implements the concurrent.futures API). However concurrent.futures doesn't easily scale out to a cluster.\n",
"- Celery scales out more easily to multiple machines, but has higher latencies, doesn't scale down as nicely, and needs a bit of effort to integrate into async frameworks (or at least this is my understanding, my experience here is shallow)\n",
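As a minimal sketch of the concurrent.futures-style API the cell above refers to, here is Dask's drop-in equivalent; the local `Client()` and the toy `inc` function are illustrative assumptions, not part of the notebook:

```python
# Hedged sketch: Dask implements the concurrent.futures API, so submit/result
# look the same whether the backend is a thread pool or a cluster.
from dask.distributed import Client

def inc(x):
    return x + 1

if __name__ == "__main__":
    client = Client()                # local cluster of worker processes
    future = client.submit(inc, 10)  # mirrors concurrent.futures .submit()
    print(future.result())           # blocks until the task finishes -> 11
    client.close()
```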
6 changes: 3 additions & 3 deletions applications/stencils-with-numba.ipynb
@@ -16,7 +16,7 @@
"In particular we show off two Numba features, and how they compose with Dask:\n",
"\n",
"1. Numba's [stencil decorator](https://numba.pydata.org/numba-doc/dev/user/stencil.html)\n",
"2. NumPy's [Generalized Universal Functions](https://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html)\n",
"2. NumPy's [Generalized Universal Functions](https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html)\n",
"\n",
"*This was originally published as a blogpost [here](https://blog.dask.org/2019/04/09/numba-stencil)*"
]
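A hedged sketch of the stencil decorator named in point 1 above; the `_smooth` kernel and the array size are illustrative assumptions (the pattern follows the linked blogpost):

```python
import numba
import numpy as np

@numba.stencil
def _smooth(x):
    # relative indices describe the 3x3 neighborhood of each element;
    # Numba compiles the implied loop over the whole array
    return (x[-1, -1] + x[-1, 0] + x[-1, 1] +
            x[ 0, -1] + x[ 0, 0] + x[ 0, 1] +
            x[ 1, -1] + x[ 1, 0] + x[ 1, 1]) / 9

img = np.random.random((100, 100))
out = _smooth(img)  # boundary elements default to 0
```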
@@ -168,7 +168,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And then because each of the chunks of a Dask array are just NumPy arrays, we can use the [map_blocks](https://docs.dask.org/en/latest/array-api.html#dask.array.core.map_blocks) function to apply this function across all of our images, and then save them out.\n",
"And then because each of the chunks of a Dask array are just NumPy arrays, we can use the [map_blocks](https://docs.dask.org/en/latest/generated/dask.array.map_blocks.html) function to apply this function across all of our images, and then save them out.\n",
"\n",
"This is fine, but lets go a bit further, and discuss generalized universal functions from NumPy."
]
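A minimal sketch of the `map_blocks` pattern described above, assuming a toy `double` function in place of the notebook's image-processing kernel:

```python
import dask.array as da

x = da.random.random((4000, 4000), chunks=(1000, 1000))

def double(block):
    # receives one plain NumPy chunk at a time
    return block * 2

y = x.map_blocks(double)  # still lazy: one task per chunk
y.compute()
```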
@@ -200,7 +200,7 @@
"\n",
"**Numba Docs:** https://numba.pydata.org/numba-doc/dev/user/vectorize.html\n",
"\n",
"**NumPy Docs:** https://docs.scipy.org/doc/numpy-1.16.0/reference/c-api. generalized-ufuncs.html\n",
"**NumPy Docs:** https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html\n",
"\n",
"A generalized universal function (gufunc) is a normal function that has been\n",
"annotated with typing and dimension information. For example we can redefine\n",
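The hunk is truncated at this point; as a hedged illustration of the typing and dimension annotation the text describes, a hypothetical gufunc might look like this (not the notebook's own redefinition):

```python
import numba
import numpy as np

# types list + dimension signature: maps a length-n vector to a length-n
# vector, broadcasting over any leading dimensions
@numba.guvectorize(["void(float64[:], float64[:])"], "(n)->(n)")
def add_one(x, out):
    for i in range(x.shape[0]):
        out[i] = x[i] + 1.0

a = np.arange(12.0).reshape(3, 4)
add_one(a)  # applied row by row over the leading dimension
```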
2 changes: 1 addition & 1 deletion dataframes/02-groupby.ipynb
@@ -130,7 +130,7 @@
"source": [
"This is the same as with Pandas. Generally speaking, Dask.dataframe groupby-aggregations are roughly same performance as Pandas groupby-aggregations, just more scalable.\n",
"\n",
"You can read more about Pandas' common aggregations in [the Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/groupby.html#aggregation).\n",
"You can read more about Pandas' common aggregations in [the Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#aggregation).\n",
"\n"
]
},
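As a minimal sketch of the point above (toy data assumed), the groupby-aggregation API is the same in both libraries, with an extra `.compute()` on the Dask side:

```python
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"k": ["a", "b", "a", "b"], "v": [1, 2, 3, 4]})
ddf = dd.from_pandas(pdf, npartitions=2)

pdf.groupby("k").v.mean()            # eager Pandas result
ddf.groupby("k").v.mean().compute()  # same result, built lazily by Dask
```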
18 changes: 9 additions & 9 deletions dataframes/03-from-pandas-to-dask.ipynb
@@ -242,7 +242,7 @@
}
},
"source": [
"* Remember `Dask framework` is **lazy** thus in order to see the result we need to run [compute()](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.compute) \n",
"* Remember `Dask framework` is **lazy** thus in order to see the result we need to run [compute()](https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.compute.html) \n",
" (or `head()` which runs under the hood compute()) )"
]
},
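A minimal sketch of the laziness described above, on hypothetical toy data rather than the notebook's dataset:

```python
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2)

lazy_mean = ddf["x"].mean()   # builds a task graph; no work happens yet
result = lazy_mean.compute()  # executes the graph -> 4.5
preview = ddf.head()          # also computes, but only the first partition
```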
@@ -481,7 +481,7 @@
},
"source": [
"## Creating a `Dask dataframe` from `Pandas`\n",
"In order to utilize `Dask` capablities on an existing `Pandas dataframe` (pdf) we need to convert the `Pandas dataframe` into a `Dask dataframe` (ddf) with the [from_pandas](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.from_pandas) method. \n",
"In order to utilize `Dask` capablities on an existing `Pandas dataframe` (pdf) we need to convert the `Pandas dataframe` into a `Dask dataframe` (ddf) with the [from_pandas](https://docs.dask.org/en/latest/generated/dask.dataframe.from_pandas.html) method. \n",
"You must supply the number of partitions or chunksize that will be used to generate the dask dataframe"
]
},
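A hedged sketch of both calling conventions mentioned above, on a hypothetical pdf:

```python
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"a": range(100)})

ddf = dd.from_pandas(pdf, npartitions=4)  # fix the number of partitions...
ddf = dd.from_pandas(pdf, chunksize=25)   # ...or the rows per partition
```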
@@ -1211,7 +1211,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For more information see [dask mask documentation](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.mask)"
"For more information see [dask mask documentation](https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.mask.html)"
]
},
{
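A minimal sketch of `mask` on toy data (an illustrative assumption, not the notebook's dataframe); as in Pandas, values where the condition holds are replaced:

```python
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"a": [1, 5, 10]}), npartitions=1)

masked = ddf["a"].mask(ddf["a"] > 5, 0)  # replace values where a > 5
masked.compute()                         # 10 becomes 0
```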
@@ -1597,7 +1597,7 @@
"metadata": {},
"source": [
"### Map partitions\n",
"* We can supply an ad-hoc function to run on each partition using the [map_partitions](https://dask.readthedocs.io/en/latest/dataframe-api.html#dask.dataframe.DataFrame.map_partitions) method. \n",
"* We can supply an ad-hoc function to run on each partition using the [map_partitions](https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.map_partitions.html) method. \n",
"Mainly useful for functions that are not implemented in `Dask` or `Pandas` . \n",
"* Finally we can return a new `dataframe` which needs to be described in the `meta` argument \n",
"The function could also include arguments."
@@ -2274,7 +2274,7 @@
"ddf = dd.read_csv('data/pd2dd/ddf*.csv', compression='gzip', header=False). \n",
"* However some are not available such as `nrows`.\n",
"\n",
"[see documentaion](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.to_csv) (including the option for output file naming)."
"[see documentaion](https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.to_csv.html) (including the option for output file naming)."
]
},
{
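A minimal sketch of the output naming behavior (toy data assumed; the glob path follows the cell above):

```python
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"a": range(6)}), npartitions=3)

# '*' is replaced by the partition number: ddf0.csv, ddf1.csv, ddf2.csv
ddf.to_csv("data/pd2dd/ddf*.csv", index=False)
```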
@@ -2415,9 +2415,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To find the number of partitions which will determine the number of output files use [dask.dataframe.npartitions](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.npartitions) \n",
"To find the number of partitions which will determine the number of output files use [dask.dataframe.npartitions](https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.npartitions.html) \n",
"\n",
"To change the number of output files use [repartition](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.repartition) which is an expensive operation."
"To change the number of output files use [repartition](https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.repartition.html) which is an expensive operation."
]
},
{
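A hedged sketch on toy data, tying the two statements above together:

```python
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"a": range(6)}), npartitions=3)
print(ddf.npartitions)                 # 3 -> to_csv would write three files

ddf1 = ddf.repartition(npartitions=1)  # expensive: moves data between partitions
print(ddf1.npartitions)                # 1 -> a single output file
```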
@@ -2527,7 +2527,7 @@
"source": [
" ## Consider using client.persist()\n",
" Since Dask is lazy - it may run the **entire** graph/DAG (again) even if it already run part of the calculation in a previous cell. Thus use [persist](https://docs.dask.org/en/latest/dataframe-best-practices.html?highlight=parquet#persist-intelligently) to keep the results in memory \n",
"Additional information can be read in this [stackoverflow issue](https://stackoverflow.com/questions/45941528/how-to-efficiently-send-a-large-numpy-array-to-the-cluster-with-dask-array/45941529#45941529) or see an exampel in [this post](http://matthewrocklin.com/blog/work/2017/01/12/dask-dataframes) \n",
"Additional information can be read in this [stackoverflow issue](https://stackoverflow.com/questions/45941528/how-to-efficiently-send-a-large-numpy-array-to-the-cluster-with-dask-array/45941529#45941529) or see an example in [this post](http://matthewrocklin.com/blog/work/2017/01/12/dask-dataframes) \n",
"This concept should also be used when running a code within a script (rather then a jupyter notebook) which incoperates loops within the code.\n"
]
},
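A minimal sketch of the persist pattern above; the local `Client()` and the toy filter are illustrative assumptions:

```python
from dask.distributed import Client
import pandas as pd
import dask.dataframe as dd

client = Client()  # assumes a dask.distributed setup
ddf = dd.from_pandas(pd.DataFrame({"x": range(1000)}), npartitions=4)

filtered = ddf[ddf["x"] > 500]
filtered = client.persist(filtered)  # compute now, keep partitions in memory

filtered["x"].mean().compute()  # reuses the persisted results...
filtered["x"].sum().compute()   # ...instead of re-running the whole graph
```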
@@ -2893,7 +2893,7 @@
"metadata": {},
"source": [
"We can do better... \n",
"Using [dask custom aggregation](https://docs.dask.org/en/latest/dataframe-api.html?highlight=dropna#dask.dataframe.groupby.Aggregation) is consideribly better"
"Using [dask custom aggregation](https://docs.dask.org/en/latest/generated/dask.dataframe.groupby.Aggregation.html) is consideribly better"
]
},
{
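A hedged sketch of `dd.Aggregation` (the `custom_sum` example follows the linked documentation; the toy data is an assumption):

```python
import pandas as pd
import dask.dataframe as dd

custom_sum = dd.Aggregation(
    name="custom_sum",
    chunk=lambda grouped: grouped.sum(),  # runs per partition
    agg=lambda partials: partials.sum(),  # combines the partition results
)

pdf = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 3]})
ddf = dd.from_pandas(pdf, npartitions=2)
ddf.groupby("g").agg(custom_sum).compute()
```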
4 changes: 2 additions & 2 deletions dataframes/04-reading-messy-data-into-dataframes.ipynb
@@ -6,7 +6,7 @@
"source": [
"# DataFrames: Reading in messy data\n",
" \n",
"In the [01-data-access](./01-data-access.ipynb) example we show how Dask Dataframes can read and store data in many of the same formats as Pandas dataframes. One key difference, when using Dask Dataframes is that instead of opening a single file with a function like [pandas.read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html), we typically open many files at once with [dask.dataframe.read_csv](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_csv). This enables us to treat a collection of files as a single dataset. Most of the time this works really well. But real data is messy and in this notebook we will explore a more advanced technique to bring messy datasets into a dask dataframe."
"In the [01-data-access](./01-data-access.ipynb) example we show how Dask Dataframes can read and store data in many of the same formats as Pandas dataframes. One key difference, when using Dask Dataframes is that instead of opening a single file with a function like [pandas.read_csv](https://docs.dask.org/en/latest/generated/dask.dataframe.read_csv.html), we typically open many files at once with [dask.dataframe.read_csv](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_csv). This enables us to treat a collection of files as a single dataset. Most of the time this works really well. But real data is messy and in this notebook we will explore a more advanced technique to bring messy datasets into a dask dataframe."
]
},
{
@@ -408,7 +408,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we use [dask.dataframe.from_delayed](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.from_delayed). This function creates a Dask DataFrame from a list of delayed objects as long as each delayed object returns a pandas dataframe. The structure of each individual dataframe returned must also be the same."
"Then we use [dask.dataframe.from_delayed](https://docs.dask.org/en/latest/generated/dask.dataframe.from_delayed.html). This function creates a Dask DataFrame from a list of delayed objects as long as each delayed object returns a pandas dataframe. The structure of each individual dataframe returned must also be the same."
]
},
{
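A minimal sketch of the pattern above; `load_and_clean` and the file paths are hypothetical stand-ins for the notebook's messy-file handling:

```python
import dask
import dask.dataframe as dd
import pandas as pd

@dask.delayed
def load_and_clean(path):
    # every delayed object must return a pandas dataframe of the same structure
    return pd.read_csv(path).rename(columns=str.lower)

parts = [load_and_clean(p) for p in ["data/a.csv", "data/b.csv"]]
ddf = dd.from_delayed(parts)  # one partition per delayed object
```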
2 changes: 1 addition & 1 deletion machine-learning/incremental.ipynb
@@ -27,7 +27,7 @@
"\n",
"Although this example uses Scikit-Learn's SGDClassifer, the `Incremental` meta-estimator will work for any class that implements `partial_fit` and the [scikit-learn base estimator API].\n",
"\n",
"<img src=\"http://scikit-learn.org/stable/_static/scikit-learn-logo-small.png\"> <img src=\"https://www.continuum.io/sites/default/files/dask_stacked.png\" width=\"100px\">\n",
"<img src=\"http://scikit-learn.org/stable/_static/scikit-learn-logo-small.png\"> <img src=\"http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg\" width=\"200px\">\n",
"\n",
"[scikit-learn base estimator API]:http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html\n",
"\n"
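A hedged sketch of the `Incremental` wrapper described above, using dask-ml's synthetic data helper (names and parameters assumed from dask-ml, not taken from this notebook):

```python
from dask_ml.datasets import make_classification
from dask_ml.wrappers import Incremental
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=10_000, chunks=1_000)  # dask arrays

est = SGDClassifier()                       # implements partial_fit
inc = Incremental(est, scoring="accuracy")  # calls partial_fit block by block
inc.fit(X, y, classes=[0, 1])
```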
1 change: 0 additions & 1 deletion machine-learning/xgboost.ipynb
@@ -272,7 +272,6 @@
"metadata": {},
"source": [
"## Learn more\n",
"* Similar example that uses DataFrames for a real world dataset: http://ml.dask.org/examples/xgboost.html\n",
"* Recorded screencast stepping through the real world example above:\n",
"* A blogpost on dask-xgboost http://matthewrocklin.com/blog/work/2017/03/28/dask-xgboost\n",
"* XGBoost documentation: https://xgboost.readthedocs.io/en/latest/python/python_intro.html#\n",
2 changes: 1 addition & 1 deletion surveys/2019.ipynb
@@ -151,7 +151,7 @@
"source": [
"Overall, documentation is still the leader across user user groups.\n",
"\n",
"The usage of the [Dask tutorial](https://github.com/dask/dask-tutorial) and the [dask examples](examples.dask.org) are relatively consistent across groups. The primary difference between regular and new users is that regular users are more likely to engage on GitHub.\n",
"The usage of the [Dask tutorial](https://github.com/dask/dask-tutorial) and the [dask examples](https://examples.dask.org) are relatively consistent across groups. The primary difference between regular and new users is that regular users are more likely to engage on GitHub.\n",
"\n",
"From StackOverflow questions and GitHub issues, we have a vague idea about which parts of the library are used.\n",
"The survey shows that (for our respondents at least) DataFrame and Delayed are the most commonly used APIs."