- Load DataFrame with ``to_gbq`` to a table in a project different from
  the API client project. Specify the target table ID as
  ``project.dataset.table`` to use this feature. (:issue:`321`, :issue:`347`)
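As a sketch (project, dataset, and table names here are hypothetical), the destination table ID can be fully qualified so the data lands in a project other than the one that runs and bills the job:

```python
def qualified_table(project: str, dataset: str, table: str) -> str:
    """Build the fully qualified ``project.dataset.table`` ID."""
    return f"{project}.{dataset}.{table}"

destination = qualified_table("data-project", "my_dataset", "events")
# pandas_gbq.to_gbq(df, destination, project_id="client-project")
```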
- Avoid 403 error from ``to_gbq`` when table has ``policyTags``. (:issue:`354`)
- Drop support for Python 3.5 and 3.6. (:issue:`337`)
- Drop support for ``google-cloud-bigquery==2.4.*`` due to a query hanging bug. (:issue:`343`)
- Use ``object`` dtype for ``TIME`` columns. (:issue:`328`)
- Encode floating point values with greater precision. (:issue:`326`)
- Support ``INT64`` and other standard SQL aliases in the
  :func:`~pandas_gbq.to_gbq` ``table_schema`` argument. (:issue:`322`)
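For illustration (column names are hypothetical), a ``table_schema`` using standard SQL type aliases might look like:

```python
# Standard SQL aliases such as INT64 and FLOAT64 are accepted
# alongside the legacy names (INTEGER, FLOAT).
table_schema = [
    {"name": "user_id", "type": "INT64"},
    {"name": "score", "type": "FLOAT64"},
    {"name": "label", "type": "STRING"},
]
# pandas_gbq.to_gbq(df, "my_dataset.scores", project_id="my-project",
#                   table_schema=table_schema)
```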
- Add ``dtypes`` argument to ``read_gbq``. Use this argument to override
  the default ``dtype`` for a particular column in the query results. For
  example, this can be used to select nullable integer columns as the
  ``Int64`` nullable integer pandas extension type. (:issue:`242`,
  :issue:`332`)

  .. code-block:: python

     df = gbq.read_gbq(
         "SELECT CAST(NULL AS INT64) AS null_integer",
         dtypes={"null_integer": "Int64"},
     )
- Support ``google-cloud-bigquery-storage`` 2.0 and higher. (:issue:`329`)
- Update the minimum version of ``pandas`` to 0.20.1. (:issue:`331`)
- Update tests to run against Python 3.8. (:issue:`331`)
- Include needed "extras" from the ``google-cloud-bigquery`` package as
  dependencies. Exclude the incompatible 2.0 version. (:issue:`324`, :issue:`329`)
- Fix ``Provided Schema does not match Table`` error when the existing
  table contains required fields. (:issue:`315`)
- Fix ``AttributeError`` when using the BQ Storage API to download empty
  results. (:issue:`299`)
- Raise ``NotImplementedError`` when the deprecated ``private_key``
  argument is used. (:issue:`301`)
- Add ``max_results`` argument to :func:`~pandas_gbq.read_gbq`. Use this
  argument to limit the number of rows in the results DataFrame. Set
  ``max_results`` to 0 to ignore query outputs, such as for DML or DDL
  queries. (:issue:`102`)
- Add ``progress_bar_type`` argument to :func:`~pandas_gbq.read_gbq`. Use
  this argument to display a progress bar when downloading data.
  (:issue:`182`)
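A sketch combining the two new arguments (query and project names are hypothetical; ``progress_bar_type="tqdm"`` requires the optional ``tqdm`` package):

```python
read_kwargs = {
    "max_results": 10,            # cap the rows in the returned DataFrame
    "progress_bar_type": "tqdm",  # show a progress bar while downloading
}
# df = pandas_gbq.read_gbq(
#     "SELECT name FROM my_dataset.users",
#     project_id="my-project",
#     **read_kwargs,
# )
```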
- Update the minimum version of ``google-cloud-bigquery`` to 1.11.1.
  (:issue:`296`)
- Add code samples to introduction and refactor howto guides. (:issue:`239`)
- Breaking Change: Python 2 support has been dropped. This is to align with the pandas package which dropped Python 2 support at the end of 2019. (:issue:`268`)
- Ensure ``table_schema`` argument is not modified in place. (:issue:`278`)
- Use object dtype for ``STRING``, ``ARRAY``, and ``STRUCT`` columns when
  there are zero rows. (:issue:`285`)
- Populate ``user-agent`` with ``pandas`` version information. (:issue:`281`)
- Fix ``pytest.raises`` usage for latest pytest. Fix warnings in tests.
  (:issue:`282`)
- Update CI to install nightly packages in the conda tests. (:issue:`254`)
- Breaking Change: Default SQL dialect is now ``standard``. Use
  :attr:`pandas_gbq.context.dialect` to override the default value.
  (:issue:`195`, :issue:`245`)
- Document :ref:`BigQuery data type to pandas dtype conversion
  <reading-dtypes>` for ``read_gbq``. (:issue:`269`)
- Update the minimum version of ``google-cloud-bigquery`` to 1.9.0.
  (:issue:`247`)
- Update the minimum version of ``pandas`` to 0.19.0. (:issue:`262`)
- Update the authentication credentials. Note: You may need to set
  ``reauth=True`` in order to update your credentials to the most recent
  version. This is required to use new functionality such as the BigQuery
  Storage API. (:issue:`267`)
- Use ``to_dataframe()`` from ``google-cloud-bigquery`` in the
  ``read_gbq()`` function. (:issue:`247`)
- Fix a bug where pandas-gbq could not upload an empty DataFrame. (:issue:`237`)
- Allow ``table_schema`` in :func:`to_gbq` to contain only a subset of
  columns, with the rest being populated using the DataFrame dtypes.
  (:issue:`218`) (contributed by @johnpaton)
- Read ``project_id`` in :func:`to_gbq` from provided ``credentials`` if
  available. (contributed by @daureg)
- ``read_gbq`` uses the timezone-aware ``DatetimeTZDtype(unit='ns', tz='UTC')``
  dtype for BigQuery ``TIMESTAMP`` columns. (:issue:`269`)
- Add ``use_bqstorage_api`` to :func:`read_gbq`. The BigQuery Storage API
  can be used to download large query results (>125 MB) more quickly. If
  the BQ Storage API can't be used, the BigQuery API is used instead.
  (:issue:`133`, :issue:`270`)
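A minimal sketch of opting in to the faster download path (query and project names are hypothetical; the import is deferred so the sketch stays importable without pandas-gbq installed):

```python
def read_large(query: str, project: str):
    """Download a large result set, preferring the BQ Storage API."""
    # Lazy import: pandas-gbq is only needed when the function runs.
    import pandas_gbq
    # Falls back to the plain BigQuery API if the Storage API can't be used.
    return pandas_gbq.read_gbq(query, project_id=project, use_bqstorage_api=True)

# df = read_large("SELECT * FROM my_dataset.large_table", "my-project")
```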
- Warn when deprecated ``private_key`` parameter is used. (:issue:`240`)
- New dependency: Use the ``pydata-google-auth`` package for
  authentication. (:issue:`241`)
- Deprecate ``private_key`` parameter to :func:`pandas_gbq.read_gbq` and
  :func:`pandas_gbq.to_gbq` in favor of new ``credentials`` argument.
  Instead, create a credentials object using
  :func:`google.oauth2.service_account.Credentials.from_service_account_info`
  or :func:`google.oauth2.service_account.Credentials.from_service_account_file`.
  See the :doc:`authentication how-to guide <howto/authentication>` for
  examples. (:issue:`161`, :issue:`231`)
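A sketch of the replacement pattern (the key path and project name are hypothetical; ``from_service_account_file`` is the documented google-auth constructor, imported lazily so the sketch stays importable without google-auth):

```python
def make_credentials(key_path: str):
    """Build service-account credentials to pass via ``credentials=``."""
    from google.oauth2 import service_account
    return service_account.Credentials.from_service_account_file(key_path)

# creds = make_credentials("/path/to/key.json")
# df = pandas_gbq.read_gbq("SELECT 1", project_id="my-project",
#                          credentials=creds)
```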
- Allow newlines in data passed to ``to_gbq``. (:issue:`180`)
- Add :attr:`pandas_gbq.context.dialect` to allow overriding the default
  SQL syntax dialect. (:issue:`195`, :issue:`235`)
- Support Python 3.7. (:issue:`197`, :issue:`232`)
- Migrate tests to CircleCI. (:issue:`228`, :issue:`232`)
- ``int`` columns which contain ``NULL`` are now cast to ``float``,
  rather than ``object`` type. (:issue:`174`)
- ``DATE``, ``DATETIME`` and ``TIMESTAMP`` columns are now parsed as
  pandas' timestamp objects. (:issue:`224`)
- Add :class:`pandas_gbq.Context` to cache credentials in-memory, across
  calls to ``read_gbq`` and ``to_gbq``. (:issue:`198`, :issue:`208`)
- Fast queries now do not log above ``DEBUG`` level. (:issue:`204`) With
  BigQuery's release of clustering, querying smaller samples of data is
  now faster and cheaper.
- Don't load credentials from disk if reauth is ``True``. (:issue:`212`)
  This fixes a bug where pandas-gbq could not refresh credentials if the
  cached credentials were invalid, revoked, or expired, even when
  ``reauth=True``.
- Catch ``RefreshError`` when trying credentials. (:issue:`226`)
- Avoid listing datasets and tables in system tests. (:issue:`215`)
- Improved performance from eliminating some duplicative parsing steps (:issue:`224`)
- Improved ``read_gbq`` performance and memory consumption by delegating
  ``DataFrame`` construction to the pandas library, radically reducing the
  number of loops that execute in Python. (:issue:`128`)
- Reduced verbosity of logging from ``read_gbq``, particularly for short
  queries. (:issue:`201`)
- Avoid ``SELECT 1`` query when running ``to_gbq``. (:issue:`202`)
- Warn when ``dialect`` is not passed in to ``read_gbq``. The default
  dialect will be changing from 'legacy' to 'standard' in a future
  version. (:issue:`195`)
- Use general float with 15 decimal digit precision when writing to local
  CSV buffer in ``to_gbq``. This prevents numerical overflow in certain
  edge cases. (:issue:`192`)
- Project ID parameter is optional in ``read_gbq`` and ``to_gbq`` when it
  can be inferred from the environment. Note: you must still pass in a
  project ID when using user-based authentication. (:issue:`103`)
- Progress bar added for ``to_gbq``, through the optional ``tqdm`` library
  as a dependency. (:issue:`162`)
- Add location parameter to ``read_gbq`` and ``to_gbq`` so that pandas-gbq
  can work with datasets in the Tokyo region. (:issue:`177`)
- Add :doc:`authentication how-to guide <howto/authentication>`. (:issue:`183`)
- Update :doc:`contributing` guide with new paths to tests. (:issue:`154`, :issue:`164`)
- Tests now use nox to run in multiple Python environments. (:issue:`52`)
- Renamed internal modules. (:issue:`154`)
- Refactored auth to an internal auth module. (:issue:`176`)
- Add unit tests for ``get_credentials()``. (:issue:`184`)
- Only show ``verbose`` deprecation warning if the Pandas version does
  not populate it. (:issue:`157`)
- Fix bug in read_gbq when building a dataframe with integer columns on Windows. Explicitly use 64bit integers when converting from BQ types. (:issue:`119`)
- Fix bug in read_gbq when querying for an array of floats (:issue:`123`)
- Fix bug in read_gbq with configuration argument. Updates read_gbq to
  account for a breaking change in the way ``google-cloud-python``
  version 0.32.0+ handles query configuration API representation.
  (:issue:`152`)
- Fix bug in to_gbq where seconds were discarded in timestamp columns.
  (:issue:`148`)
- Fix bug in to_gbq when supplying a user-defined schema (:issue:`150`)
- Deprecate the ``verbose`` parameter in read_gbq and to_gbq. Messages
  use the logging module instead of printing progress directly to
  standard output. (:issue:`12`)
- Fix an issue where Unicode couldn't be uploaded in Python 2 (:issue:`106`)
- Add support for a passed schema in :func:`to_gbq` instead of inferring
  the schema from the passed ``DataFrame`` with ``DataFrame.dtypes``.
  (:issue:`46`)
- Fix an issue where a dataframe containing both integer and floating
  point columns could not be uploaded with ``to_gbq``. (:issue:`116`)
- ``to_gbq`` now uses ``to_csv`` to avoid manually looping over rows in a
  dataframe (should result in faster table uploads). (:issue:`96`)
- Use the google-cloud-bigquery library for API calls. The
  ``google-cloud-bigquery`` package is a new dependency, and dependencies
  on ``google-api-python-client`` and ``httplib2`` are removed. See the
  installation guide for more details. (:issue:`93`)
- Structs and arrays are now named properly (:issue:`23`) and BigQuery
  functions like ``array_agg`` no longer run into errors during type
  conversion (:issue:`22`).
- :func:`to_gbq` now uses a load job instead of the streaming API. Remove
  ``StreamingInsertError`` class, as it is no longer used by
  :func:`to_gbq`. (:issue:`7`, :issue:`75`)
- :func:`read_gbq` now raises ``QueryTimeout`` if the request exceeds the
  ``query.timeoutMs`` value specified in the BigQuery configuration.
  (:issue:`76`)
- Environment variable ``PANDAS_GBQ_CREDENTIALS_FILE`` can now be used to
  override the default location where the BigQuery user account
  credentials are stored. (:issue:`86`)
- BigQuery user account credentials are now stored in an
  application-specific hidden user folder on the operating system.
  (:issue:`41`)
- Drop support for Python 3.4 (:issue:`40`)
- The dataframe passed to ``.to_gbq(..., if_exists='append')`` needs to
  contain only a subset of the fields in the BigQuery schema. (:issue:`24`)
- Use the google-auth library for authentication because ``oauth2client``
  is deprecated. (:issue:`39`)
- :func:`read_gbq` now has an ``auth_local_webserver`` boolean argument
  for controlling whether to use web server or console flow when getting
  user credentials. Replaces the ``--noauth_local_webserver`` command line
  argument. (:issue:`35`)
- :func:`read_gbq` now displays the BigQuery Job ID and standard price in
  verbose output. (:issue:`70` and :issue:`71`)
- All gbq errors will simply be subclasses of ``ValueError`` and no
  longer inherit from the deprecated ``PandasError``.
- ``InvalidIndexColumn`` will be raised instead of ``InvalidColumnOrder``
  in :func:`read_gbq` when the index column specified does not exist in
  the BigQuery schema. (:issue:`6`)
- Fix a bug when appending to a BigQuery table where fields have modes
  (NULLABLE, REQUIRED, REPEATED) specified. These modes were compared
  against the remote schema, and writing a table via :func:`to_gbq` would
  previously raise. (:issue:`13`)
Initial release of transferred code from pandas.
Includes patches since the 0.19.2 release on pandas with the following:
- :func:`read_gbq` now allows query configuration preferences pandas-GH#14742
- :func:`read_gbq` now stores ``INTEGER`` columns as ``dtype=object`` if
  they contain ``NULL`` values. Otherwise they are stored as ``int64``.
  This prevents precision loss for integers greater than 2**53.
  Furthermore, ``FLOAT`` columns with values above 10**4 are no longer
  cast to ``int64``, which also caused precision loss. (pandas-GH#14064,
  pandas-GH#14305)
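The 2**53 cutoff follows from float64 having a 53-bit significand; integers beyond that bound cannot all be represented exactly, which is why casting a ``NULL``-containing ``INTEGER`` column to float would silently corrupt large values. A quick pure-Python check:

```python
# float64 stores integers exactly only up to 2**53; the next integer
# rounds back down, so a float-cast INTEGER column would lose precision.
exact_limit = 2 ** 53
assert float(exact_limit) == exact_limit          # still exact
assert float(exact_limit + 1) != exact_limit + 1  # rounds to 2**53
```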