diff --git a/README.md b/README.md index 0d388da3..59114fbf 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,7 @@ siuba ([小巴](http://www.cantonese.sheik.co.uk/dictionary/words/9139/)) is a p * `summarize()` - reduce one or more columns down to a single number. * `arrange()` - reorder the rows of data. -These actions can be preceeded by a `group_by()`, which causes them to be applied individually to grouped rows of data. Moreover, many SQL concepts, such as `distinct()`, `count()`, and joins are implemented. +These actions can be preceded by a `group_by()`, which causes them to be applied individually to grouped rows of data. Moreover, many SQL concepts, such as `distinct()`, `count()`, and joins are implemented. Inputs to these functions can be a pandas `DataFrame` or SQL connection (currently postgres, redshift, or sqlite). For more on the rationale behind tools like dplyr, see this [tidyverse paper](https://tidyverse.tidyverse.org/articles/paper.html). diff --git a/docs/api_table_core/01_filter.Rmd b/docs/api_table_core/01_filter.Rmd index 7ceb3ee1..2ad3a17b 100644 --- a/docs/api_table_core/01_filter.Rmd +++ b/docs/api_table_core/01_filter.Rmd @@ -64,7 +64,7 @@ Otherwise, python will group the operation like `_.cyl == (4 | _.gear) == 5`. ### Dropping NAs -As with most subsetting in pandas, when a condition evalutes to an `NA` value, the row is automatically excluded. This is different from pandas indexing, where `NA` values produce errors. +As with most subsetting in pandas, when a condition evaluates to an `NA` value, the row is automatically excluded. This is different from pandas indexing, where `NA` values produce errors. 
```{python} df = pd.DataFrame({ diff --git a/docs/api_table_core/07_summarize.Rmd b/docs/api_table_core/07_summarize.Rmd index 1ba446e4..0302c822 100644 --- a/docs/api_table_core/07_summarize.Rmd +++ b/docs/api_table_core/07_summarize.Rmd @@ -36,7 +36,7 @@ mtcars >> summarize(avg_mpg = _.mpg.mean()) ### Summarizing per group -When you use summarize with a grouped DataFrame, the result has the same number of rows as there are groups in the data. For example, there are 3 values of cylinders (`cyl`) a row can have (4, 6, or 8), so ther result will be 3 rows. +When you use summarize with a grouped DataFrame, the result has the same number of rows as there are groups in the data. For example, there are 3 values of cylinders (`cyl`) a row can have (4, 6, or 8), so the result will be 3 rows. ```{python} (mtcars diff --git a/docs/developer/sql-translators.ipynb b/docs/developer/sql-translators.ipynb index c058d4e0..52e0fc26 100644 --- a/docs/developer/sql-translators.ipynb +++ b/docs/developer/sql-translators.ipynb @@ -26,7 +26,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Using sqlalchemy select statment for convenience\n", + "### Using sqlalchemy select statement for convenience\n", "\n", "Throughout this vignette, we'll use a select statement object from sqlalchemy,\n", "so we can conveniently access its columns as needed." diff --git a/docs/guide_analysis.Rmd b/docs/guide_analysis.Rmd index 62aa8d4c..f1ce68cb 100644 --- a/docs/guide_analysis.Rmd +++ b/docs/guide_analysis.Rmd @@ -270,4 +270,4 @@ Select works okay, now let's uncomment the next line. ) ``` -We found our bug! Note that when working with SQL, siuba prints out the name of the verb where the error occured. This is very useful, and will be added to working with pandas in the future! +We found our bug! Note that when working with SQL, siuba prints out the name of the verb where the error occurred. This is very useful, and will be added to working with pandas in the future! 
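The summarize hunk above states that a grouped summarize yields one row per group (3 `cyl` values, so 3 rows). A quick plain-pandas sketch of the same behavior — not siuba's `>>` syntax, and the tiny `df` here is a hypothetical stand-in for `mtcars`:

```python
import pandas as pd

# stand-in for mtcars: 3 distinct cyl groups (4, 6, 8)
df = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8, 8],
    "mpg": [30.0, 28.0, 21.0, 19.0, 15.0, 13.0],
})

# grouped aggregation collapses each group to a single row,
# so the result has exactly one row per cyl value
res = df.groupby("cyl").agg(avg_mpg=("mpg", "mean")).reset_index()
print(len(res))  # 3
```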
diff --git a/docs/key_features.ipynb b/docs/key_features.ipynb index 60289ace..a2708a71 100644 --- a/docs/key_features.ipynb +++ b/docs/key_features.ipynb @@ -703,7 +703,7 @@ ")\n", " \n", " \n", -    "<pre>\n", +    "      <pre>\n",      "mtcars.assign(\n",      "  res = lambda d: d.hp - d.hp.mean()\n",      ")</pre>\n", @@ -721,7 +721,7 @@ ")\n", " \n", " \n", -    "<pre>\n", +    "      <pre>\n",      "mtcars.assign(\n",      "  res = mtcars.hp - g_cyl.hp.transform(\"mean\")\n",      ")</pre>
\n", diff --git a/examples/architecture/004-user-defined-functions.ipynb b/examples/architecture/004-user-defined-functions.ipynb index 23dd5350..8f90e6f2 100644 --- a/examples/architecture/004-user-defined-functions.ipynb +++ b/examples/architecture/004-user-defined-functions.ipynb @@ -36,7 +36,7 @@ "\n", "This is the tyranny of methods. The object defining the method owns the method. To add or modify a method, you need to modify the class behind the object.\n", "\n", - "Now, this isn't totally true--the class could provide a way for you to register your method (like accessors). But wouldn't it be nice if the actions we wanted to perform on data didn't have to check in with the data class itself? Why does the data class get to decide what we do with it, and why does it get priviledged methods?\n", + "Now, this isn't totally true--the class could provide a way for you to register your method (like accessors). But wouldn't it be nice if the actions we wanted to perform on data didn't have to check in with the data class itself? Why does the data class get to decide what we do with it, and why does it get privileged methods?\n", "\n", "### Enter singledispatch\n", "\n", @@ -82,7 +82,7 @@ "\n", "This concept is incredibly powerful for two reasons...\n", "\n", - "* many people can define actions over a DataFrame, without a quorum of priviledged methods.\n", + "* many people can define actions over a DataFrame, without a quorum of privileged methods.\n", "* you can use normal importing, so don't have to worry about name conflicts\n", "\n" ] diff --git a/examples/architecture/006-autocompletion.ipynb b/examples/architecture/006-autocompletion.ipynb index df2cb220..e8d861d3 100644 --- a/examples/architecture/006-autocompletion.ipynb +++ b/examples/architecture/006-autocompletion.ipynb @@ -293,7 +293,7 @@ "\n", "Essentially, our challenge is figuring how where autocomplete could fit in. 
Just to set the stage, the IPython IPCompleter uses some of its own useful completion strategies, but the bulk of where we benefit comes from its use of the library jedi.\n", "\n", - "In the sections below, I'll first give a quick preview of how jedi works, followed by two sequence diagrams of how it's intergrated into the ipykernel." + "In the sections below, I'll first give a quick preview of how jedi works, followed by two sequence diagrams of how it's integrated into the ipykernel." ] }, { diff --git a/siuba/dply/verbs.py b/siuba/dply/verbs.py index e7583f81..ffc2cdba 100644 --- a/siuba/dply/verbs.py +++ b/siuba/dply/verbs.py @@ -415,8 +415,8 @@ def __getitem__(self, x): def var_slice(colnames, x): - """Return indices in colnames correspnding to start and stop of slice.""" - # TODO: produces bahavior similar to df.loc[:, "V1":"V3"], but can reverse + """Return indices in colnames corresponding to start and stop of slice.""" + # TODO: produces behavior similar to df.loc[:, "V1":"V3"], but can reverse # TODO: make DRY # TODO: reverse not including end points if isinstance(x.start, Var): @@ -1345,7 +1345,7 @@ def unite(__data, col, *args, sep = "_", remove = True): __data: a DataFrame col: name of the to-be-created column (string). *args: names of each column to combine. - sep: separater joining each column being combined. + sep: separator joining each column being combined. remove: whether to remove the combined columns from the returned DataFrame. 
""" diff --git a/siuba/experimental/pivot/__init__.py b/siuba/experimental/pivot/__init__.py index a85c412a..38ac1f47 100644 --- a/siuba/experimental/pivot/__init__.py +++ b/siuba/experimental/pivot/__init__.py @@ -68,7 +68,7 @@ def pivot_longer( if not np.all(split_lengths == split_lengths[0]): raise ValueError( - "Splitting by {} leads to unequal lenghts ({}).".format( + "Splitting by {} leads to unequal lengths ({}).".format( names_sep if names_sep is not None else names_pattern ) ) diff --git a/siuba/siu/dispatchers.py b/siuba/siu/dispatchers.py index 88d0e040..cccf68b6 100644 --- a/siuba/siu/dispatchers.py +++ b/siuba/siu/dispatchers.py @@ -237,7 +237,7 @@ def __rrshift__(self, x): This function handles two cases: * callable >> pipe -> pipe - * otherewise, evaluate the pipe + * otherwise, evaluate the pipe """ if isinstance(x, (Symbolic, Call)): diff --git a/siuba/sql/__init__.py b/siuba/sql/__init__.py index 61eb2f4f..2bd6e3aa 100644 --- a/siuba/sql/__init__.py +++ b/siuba/sql/__init__.py @@ -1,7 +1,7 @@ from .verbs import LazyTbl, sql_raw from .translate import SqlColumn, SqlColumnAgg, SqlFunctionLookupError -# preceed w/ underscore so it isn't exported by default +# proceed w/ underscore so it isn't exported by default # we just want to register the singledispatch funcs from .dply import vector as _vector from .dply import string as _string diff --git a/siuba/sql/verbs.py b/siuba/sql/verbs.py index 60af12c5..81295a2e 100644 --- a/siuba/sql/verbs.py +++ b/siuba/sql/verbs.py @@ -454,7 +454,7 @@ def _show_query(tbl, simplify = False): if simplify: - # try to strip table names and labels where uneccessary + # try to strip table names and labels where unnecessary with use_simple_names(): print(compile_query()) else: diff --git a/siuba/tests/test_dply_series_methods.py b/siuba/tests/test_dply_series_methods.py index a9127020..11a4c544 100644 --- a/siuba/tests/test_dply_series_methods.py +++ b/siuba/tests/test_dply_series_methods.py @@ -146,10 +146,10 @@ def 
do_test_missing_implementation(entry, backend): #if get_spec_no_mutate(entry, backend): # pytest.skip("Spec'd failure") - ## case: Needs to be implmented + ## case: Needs to be implemented ## TODO(table): uses xfail #if backend_status == "todo": - # pytest.xfail("TODO: impelement this translation") + # pytest.xfail("TODO: implement this translation") # ## case: Can't be used in a mutate (e.g. a SQL ordered set aggregate function) ## TODO(table): no_mutate
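The architecture notes touched above (004-user-defined-functions.ipynb) argue that `functools.singledispatch` frees actions from the data class's privileged methods. A minimal standalone sketch of that pattern — the `collect` name and implementations here are illustrative, not siuba's actual code:

```python
from functools import singledispatch

# the generic function owns the behavior, not the data class:
# no class needs to grant this verb a privileged method slot
@singledispatch
def collect(data):
    raise NotImplementedError(f"no collect implementation for {type(data)}")

# anyone can register an implementation for a type,
# without modifying that type's class definition
@collect.register(list)
def _collect_list(data):
    return sorted(data)

@collect.register(dict)
def _collect_dict(data):
    return sorted(data.values())

print(collect([3, 1, 2]))          # [1, 2, 3]
print(collect({"a": 2, "b": 1}))   # [1, 2]
```

Dispatch happens on the type of the first argument, which is what lets siuba route the same verb to pandas DataFrames or SQL tables.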