
Commit fb97b11

Add support for serializing query responses into a Pandas DataFrame (#41)
1 parent 1e2fae1 commit fb97b11

17 files changed: +1185 −12 lines

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
+## 1.1.0 [unreleased]
+
+### Features
+1. [#29](https://github.com/influxdata/influxdb-client-python/issues/29): Added support for serializing a query response into a Pandas DataFrame
+
 ## 1.0.0 [2019-11-11]

 ### Features

README.rst

Lines changed: 73 additions & 3 deletions
@@ -43,7 +43,7 @@ InfluxDB 2.0 client features

 - Querying data
   - using the Flux language
-  - into csv, raw data, `flux_table <https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/flux_table.py#L5>`_ structure
+  - into csv, raw data, `flux_table <https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/flux_table.py#L5>`_ structure, `Pandas DataFrame <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_
   - `How to queries <#queries>`_
 - Writing data using
   - `Line Protocol <https://docs.influxdata.com/influxdb/v1.6/write_protocols/line_protocol_tutorial>`_
@@ -65,6 +65,7 @@ InfluxDB 2.0 client features
 - `Connect to InfluxDB Cloud`_
 - `How to efficiently import large dataset`_
 - `Efficiency write data from IOT sensor`_
+- `How to use Jupyter + Pandas + InfluxDB 2`_

 Installation
 ------------
@@ -300,6 +301,7 @@ The result retrieved by `QueryApi <https://github.com/influxdata/influxdb-client
 1. Flux data structure: `FluxTable <https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/flux_table.py#L5>`_, `FluxColumn <https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/flux_table.py#L22>`_ and `FluxRecord <https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/flux_table.py#L31>`_
 2. `csv.reader <https://docs.python.org/3.4/library/csv.html#reader-objects>`__ which will iterate over CSV lines
 3. Raw unprocessed results as a ``str`` iterator
+4. `Pandas DataFrame <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_

 The API also support streaming ``FluxRecord`` via `query_stream <https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/query_api.py#L77>`_, see example below:

@@ -372,6 +374,57 @@ The API also support streaming ``FluxRecord`` via `query_stream <https://github.
     """
     client.__del__()

+Pandas DataFrame
+""""""""""""""""
+.. marker-pandas-start
+
+.. note:: If a query returns more than one table, the client generates a ``DataFrame`` for each of them.
+
+The ``client`` is able to retrieve data in `Pandas DataFrame <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_ format through ``query_data_frame``:
+
+.. code-block:: python
+
+    from influxdb_client import InfluxDBClient, Point, Dialect
+    from influxdb_client.client.write_api import SYNCHRONOUS
+
+    client = InfluxDBClient(url="http://localhost:9999", token="my-token", org="my-org")
+
+    write_api = client.write_api(write_options=SYNCHRONOUS)
+    query_api = client.query_api()
+
+    """
+    Prepare data
+    """
+
+    _point1 = Point("my_measurement").tag("location", "Prague").field("temperature", 25.3)
+    _point2 = Point("my_measurement").tag("location", "New York").field("temperature", 24.3)
+
+    write_api.write(bucket="my-bucket", org="my-org", record=[_point1, _point2])
+
+    """
+    Query: using Pandas DataFrame
+    """
+    data_frame = query_api.query_data_frame('from(bucket:"my-bucket") '
+                                            '|> range(start: -10m) '
+                                            '|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value") '
+                                            '|> keep(columns: ["location", "temperature"])')
+    print(data_frame.to_string())
+
+    """
+    Close client
+    """
+    client.__del__()
+
+Output:
+
+.. code-block::
+
+        result  table  location  temperature
+    0  _result      0  New York         24.3
+    1  _result      1    Prague         25.3
+
+.. marker-pandas-end
+
 Examples
 ^^^^^^^^

@@ -560,7 +613,7 @@ Efficiency write data from IOT sensor
 .. marker-iot-end

 Connect to InfluxDB Cloud
-^^^^^^^^^^^^^^^^^^^^^^^^^
+"""""""""""""""""""""""""
 The following example demonstrate a simplest way how to write and query date with the InfluxDB Cloud.

 At first point you should create an authentication token as is described `here <https://v2.docs.influxdata.com/v2.0/security/tokens/create-token/>`_.
@@ -634,7 +687,24 @@ The last step is run a python script via: ``python3 influx_cloud.py``.
     finally:
         client.close()

-.. marker-iot-end
+How to use Jupyter + Pandas + InfluxDB 2
+""""""""""""""""""""""""""""""""""""""""
+The first example shows how to use client capabilities to predict stock prices via `Keras <https://keras.io>`_, `TensorFlow <https://www.tensorflow.org>`_, `sklearn <https://scikit-learn.org/stable/>`_:
+
+* sources - `stock-predictions.ipynb <notebooks/stock-predictions.ipynb>`_
+
+.. image:: docs/images/stock-price-prediction.gif
+
+Result:
+
+.. image:: docs/images/stock-price-prediction-results.png
+
+The second example shows how to use client capabilities for real-time visualization via `hvPlot <https://hvplot.pyviz.org>`_, `Streamz <https://streamz.readthedocs.io/en/latest/>`_, `RxPY <https://rxpy.readthedocs.io/en/latest/>`_:
+
+* sources - `realtime-stream.ipynb <notebooks/realtime-stream.ipynb>`_
+
+.. image:: docs/images/realtime-result.gif
+

 Advanced Usage
 --------------
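
Since ``query_data_frame`` returns a list of ``DataFrame`` objects when the Flux result contains more than one table (see the ``query_api.py`` change below), callers that skip the ``pivot``/``keep`` normalisation may want to handle both shapes. A minimal sketch, assuming the same local InfluxDB setup as the README example above:

    from influxdb_client import InfluxDBClient

    client = InfluxDBClient(url="http://localhost:9999", token="my-token", org="my-org")
    query_api = client.query_api()

    # Without pivot/keep the query typically yields several Flux tables,
    # so query_data_frame may return a list of DataFrames instead of one.
    result = query_api.query_data_frame('from(bucket:"my-bucket") |> range(start: -10m)')
    data_frames = result if isinstance(result, list) else [result]
    for df in data_frames:
        print(df.to_string())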

docs/api.rst

Lines changed: 5 additions & 0 deletions
@@ -55,3 +55,8 @@ TasksApi

 .. autoclass:: influxdb_client.domain.Task
     :members:
+
+DeleteApi
+""""""""
+.. autoclass:: influxdb_client.DeleteApi
+    :members:
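
The API docs now cover ``DeleteApi`` as well. A hypothetical usage sketch (the ``client.delete_api()`` accessor and the ``delete(start, stop, predicate, bucket, org)`` signature are assumed from later client releases; neither appears in this diff):

    from influxdb_client import InfluxDBClient

    client = InfluxDBClient(url="http://localhost:9999", token="my-token", org="my-org")
    delete_api = client.delete_api()  # assumed accessor, not part of this commit

    # Delete points of "my_measurement" in the given time range (assumed signature).
    delete_api.delete(start="2019-01-01T00:00:00Z", stop="2019-12-31T23:59:59Z",
                      predicate='_measurement="my_measurement"',
                      bucket="my-bucket", org="my-org")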

docs/images/realtime-result.gif

Binary file added (600 KB). Two further binary image files were added in this commit (23.4 KB and 958 KB).

docs/usage.rst

Lines changed: 6 additions & 0 deletions
@@ -10,6 +10,12 @@ Query
     :start-after: marker-query-start
     :end-before: marker-query-end

+Pandas DataFrame
+^^^^^^^^^^^^^^^^
+.. include:: ../README.rst
+    :start-after: marker-pandas-start
+    :end-before: marker-pandas-end
+
 Write
 ^^^^^
 .. include:: ../README.rst

examples/query.py

Lines changed: 12 additions & 0 deletions
@@ -60,6 +60,18 @@
     if not len(csv_line) == 0:
         print(f'Temperature in {csv_line[9]} is {csv_line[6]}')

+print()
+print()
+
+"""
+Query: using Pandas DataFrame
+"""
+data_frame = query_api.query_data_frame('from(bucket:"my-bucket") '
+                                        '|> range(start: -10m) '
+                                        '|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value") '
+                                        '|> keep(columns: ["location", "temperature"])')
+print(data_frame.to_string())
+
 """
 Close client
 """

influxdb_client/client/flux_csv_parser.py

Lines changed: 41 additions & 5 deletions
@@ -1,8 +1,11 @@
 import base64
 import codecs
 import csv as csv_parser
+from enum import Enum
+from typing import List

 import ciso8601
+from pandas import DataFrame
 from urllib3 import HTTPResponse

 from influxdb_client.client.flux_table import FluxTable, FluxColumn, FluxRecord
@@ -18,12 +21,20 @@ class FluxCsvParserException(Exception):
     pass


+class FluxSerializationMode(Enum):
+    tables = 1
+    stream = 2
+    dataFrame = 3
+
+
 class FluxCsvParser(object):

-    def __init__(self, response: HTTPResponse, stream: bool) -> None:
+    def __init__(self, response: HTTPResponse, serialization_mode: FluxSerializationMode,
+                 data_frame_index: List[str] = None) -> None:
         self._response = response
         self.tables = []
-        self._stream = stream
+        self._serialization_mode = serialization_mode
+        self._data_frame_index = data_frame_index
         pass

     def __enter__(self):
@@ -64,6 +75,11 @@ def _parse_flux_response(self):
             token = csv[0]
             # start new table
             if "#datatype" == token:
+
+                # Return already parsed DataFrame
+                if (self._serialization_mode is FluxSerializationMode.dataFrame) & hasattr(self, '_data_frame'):
+                    yield self._prepare_data_frame()
+
                 start_new_table = True
                 table = FluxTable()
                 self._insert_table(table, table_index)
@@ -86,6 +102,12 @@ def _parse_flux_response(self):
             if start_new_table:
                 self.add_column_names_and_tags(table, csv)
                 start_new_table = False
+                # Create DataFrame with default values
+                if self._serialization_mode is FluxSerializationMode.dataFrame:
+                    self._data_frame = DataFrame(data=[], columns=[], index=None)
+                    for column in table.columns:
+                        self._data_frame[column.label] = column.default_value
+                    pass
                 continue

             # to int converions todo
@@ -101,14 +123,28 @@ def _parse_flux_response(self):

             flux_record = self.parse_record(table_index - 1, table, csv)

-            if not self._stream:
+            if self._serialization_mode is FluxSerializationMode.tables:
                 self.tables[table_index - 1].records.append(flux_record)

-            yield flux_record
+            if self._serialization_mode is FluxSerializationMode.stream:
+                yield flux_record
+
+            if self._serialization_mode is FluxSerializationMode.dataFrame:
+                self._data_frame.loc[len(self._data_frame.index)] = flux_record.values
+                pass

             # debug
             # print(flux_record)

+        # Return latest DataFrame
+        if (self._serialization_mode is FluxSerializationMode.dataFrame) & hasattr(self, '_data_frame'):
+            yield self._prepare_data_frame()
+
+    def _prepare_data_frame(self):
+        if self._data_frame_index:
+            self._data_frame = self._data_frame.set_index(self._data_frame_index)
+        return self._data_frame
+
     def parse_record(self, table_index, table, csv):
         record = FluxRecord(table_index)

@@ -180,5 +216,5 @@ def add_column_names_and_tags(table, csv):
             i += 1

     def _insert_table(self, table, table_index):
-        if not self._stream:
+        if self._serialization_mode is FluxSerializationMode.tables:
             self.tables.insert(table_index, table)
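
For readers unfamiliar with the pandas idiom used above: the parser builds each ``DataFrame`` by creating empty columns from the table's defaults and then appending one row per parsed record via ``.loc``. A standalone illustration of that pattern (the column names and values are just sample data):

    from pandas import DataFrame

    # Start with an empty frame and add one empty column per Flux column label;
    # the parser assigns each FluxColumn's default_value at this point.
    columns = ["result", "table", "location", "temperature"]
    df = DataFrame(data=[], columns=[], index=None)
    for label in columns:
        df[label] = None

    # Append each record's values as a new row, mirroring
    # self._data_frame.loc[len(self._data_frame.index)] = flux_record.values
    for values in (["_result", 0, "New York", 24.3], ["_result", 1, "Prague", 25.3]):
        df.loc[len(df.index)] = values

    print(df.to_string())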

influxdb_client/client/query_api.py

Lines changed: 32 additions & 3 deletions
@@ -2,9 +2,11 @@
 import csv
 from typing import List, Generator, Any

+from pandas import DataFrame
+
 from influxdb_client import Dialect
 from influxdb_client import Query, QueryService
-from influxdb_client.client.flux_csv_parser import FluxCsvParser
+from influxdb_client.client.flux_csv_parser import FluxCsvParser, FluxSerializationMode
 from influxdb_client.client.flux_table import FluxTable, FluxRecord


@@ -68,7 +70,7 @@ def query(self, query: str, org=None) -> List['FluxTable']:
         response = self._query_api.post_query(org=org, query=self._create_query(query, self.default_dialect),
                                               async_req=False, _preload_content=False, _return_http_data_only=False)

-        _parser = FluxCsvParser(response=response, stream=False)
+        _parser = FluxCsvParser(response=response, serialization_mode=FluxSerializationMode.tables)

         list(_parser.generator())

@@ -88,10 +90,37 @@ def query_stream(self, query: str, org=None) -> Generator['FluxRecord', Any, Non
         response = self._query_api.post_query(org=org, query=self._create_query(query, self.default_dialect),
                                               async_req=False, _preload_content=False, _return_http_data_only=False)

-        _parser = FluxCsvParser(response=response, stream=True)
+        _parser = FluxCsvParser(response=response, serialization_mode=FluxSerializationMode.stream)

         return _parser.generator()

+    def query_data_frame(self, query: str, org=None, data_frame_index: List[str] = None):
+        """
+        Synchronously executes the Flux query and returns a Pandas DataFrame.
+        Note that if a query returns more than one table, the client generates a DataFrame for each of them.
+
+        :param query: the Flux query
+        :param org: organization name (optional if already specified in InfluxDBClient)
+        :param data_frame_index: the list of columns that are used as DataFrame index
+        :return:
+        """
+        if org is None:
+            org = self._influxdb_client.org
+
+        response = self._query_api.post_query(org=org, query=self._create_query(query, self.default_dialect),
+                                              async_req=False, _preload_content=False, _return_http_data_only=False)
+
+        _parser = FluxCsvParser(response=response, serialization_mode=FluxSerializationMode.dataFrame,
+                                data_frame_index=data_frame_index)
+        _dataFrames = list(_parser.generator())
+
+        if len(_dataFrames) == 0:
+            return DataFrame(columns=[], index=None)
+        elif len(_dataFrames) == 1:
+            return _dataFrames[0]
+        else:
+            return _dataFrames
+
     # private helper for c
     @staticmethod
     def _create_query(query, dialect=default_dialect):
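
When a query matches no data, ``query_data_frame`` falls into the ``len(_dataFrames) == 0`` branch above and returns an empty ``DataFrame``. A small sketch of guarding for that case, assuming the same local setup as the README example:

    from influxdb_client import InfluxDBClient
    from pandas import DataFrame

    client = InfluxDBClient(url="http://localhost:9999", token="my-token", org="my-org")
    query_api = client.query_api()

    # A range that matches nothing returns DataFrame(columns=[], index=None).
    df = query_api.query_data_frame('from(bucket:"my-bucket") |> range(start: -1s)')
    if isinstance(df, DataFrame) and df.empty:
        print("no rows returned")
    else:
        print(df.to_string())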
