This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

Data Model

Hector Garcia Tellado edited this page Oct 31, 2017 · 1 revision


Storage backend

There are two main use cases for the storage:

  1. User data manually generated by the User Interface
  2. Streams generated by IoT Devices, Operations, UX

The two types of data might be stored in different storage services. In particular, the first type is managed via the Storage Adapter API.

Storage details

The storage is organized in tables, which are partitioned and indexed differently depending on their access patterns. Some data, such as images and deployment artifacts, is hosted in blobs.

All tables have a partition key, used to distribute data across multiple servers, and a primary key, used to identify individual records. Other indexes are available to support additional access patterns. Small tables store all their records in one partition.
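As a rough illustration of how the two keys relate, the sketch below models a table as an in-memory mapping from partition key to primary key to record. The class name `PartitionedTable` and its methods are hypothetical and do not come from any storage service; they only show the lookup shape described above.

```python
from collections import defaultdict
from typing import Any, Dict, Optional


class PartitionedTable:
    """In-memory stand-in for a table with a partition key and a primary key."""

    def __init__(self) -> None:
        # partition key -> (primary key -> record)
        self._partitions: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def put(self, partition_key: str, primary_key: str, record: Any) -> None:
        # The partition key selects where the record lives,
        # the primary key identifies it inside that partition.
        self._partitions[partition_key][primary_key] = record

    def get(self, partition_key: str, primary_key: str) -> Optional[Any]:
        # Using .get() avoids creating an empty partition on a miss.
        return self._partitions.get(partition_key, {}).get(primary_key)

    def partition_count(self) -> int:
        return len(self._partitions)
```

Under this model, a small table would simply use one constant partition key (for example `"default"`) for every record, so that everything lands in a single partition.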

[Configuration] UI Config table

A small non-partitioned storage.

It might be a single record or a blob, holding settings such as the PCS name and the PCS logo.

The storage is accessed via the Storage Adapter API.

[Configuration] Device Groups table

A small non-partitioned table, indexed by Group ID.

The storage is accessed via the Storage Adapter API.

[Telemetry] Monitoring Rules table

A small non-partitioned table, indexed by Rule ID.

The storage is accessed via the Storage Adapter API.

[Telemetry] Device Telemetry Records table

A storage designed to contain billions of records about millions of devices, with long retention. Records are never updated.

The table is partitioned into a fixed number of partitions (e.g. 16) to allow queries fetching telemetry for multiple devices at once, and indexed by timestamps.

Alternative partitioning methods not used:

  • Partition by Device ID: when creating a graph showing data about multiple devices, a client would have to run one query per device
  • Partition by time: reads and writes would always hit one hot partition

All queries must specify a partition ID, which is calculated from the Device ID. For instance, it's possible to use one query to fetch data about multiple devices, as long as the devices have the same partition ID.
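The page does not say how the partition ID is derived from the Device ID. A minimal sketch, assuming a stable hash of the Device ID taken modulo the partition count, could look like the following. The function names, the choice of SHA-256, and the constant `PARTITION_COUNT` are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

# The "e.g. 16" from the text; the real value is a deployment decision.
PARTITION_COUNT = 16


def partition_id(device_id: str, partitions: int = PARTITION_COUNT) -> int:
    """Map a Device ID to a stable partition number (hypothetical scheme)."""
    # A digest is used because it is stable across processes, unlike
    # Python's built-in hash() for strings, which is salted per run.
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def group_by_partition(device_ids: Iterable[str]) -> Dict[int, List[str]]:
    """Group Device IDs so that each group can be fetched with one query."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for device_id in device_ids:
        groups[partition_id(device_id)].append(device_id)
    return dict(groups)
```

With a fixed number of partitions, a chart covering many devices needs at most `PARTITION_COUNT` queries (one per non-empty group) rather than one query per device, while writes are still spread across all partitions.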

The storage is accessed directly by the Streaming Service and the Telemetry API.

[Telemetry] Annotations table

A storage designed to contain millions of records, with long retention. Records can be updated.

The table is partitioned into a fixed number of partitions (e.g. 16) to allow queries fetching annotations for multiple devices at once, and indexed by timestamps.

Alternative partitioning methods not used:

  • Partition by Device ID: when creating a graph showing data about multiple devices, a client would have to run one query per device
  • Partition by time: reads and writes would always hit one hot partition

All queries must specify a partition ID, which is calculated from the Device ID. For instance, it's possible to use one query to fetch data about multiple devices, as long as the devices have the same partition ID.
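To make the query shape concrete, here is a small in-memory sketch of a store that keeps each partition sorted by timestamp and answers "these devices, this time range, this partition" in one call. The class `TimeIndexedPartitions`, its row layout, and the half-open time range are assumptions for illustration; the actual storage service and its query API are not specified on this page.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (timestamp, device ID, annotation text)
Row = Tuple[float, str, str]


class TimeIndexedPartitions:
    """Rows grouped by partition ID and kept sorted by timestamp."""

    def __init__(self) -> None:
        self._rows: Dict[int, List[Row]] = defaultdict(list)

    def insert(self, partition: int, timestamp: float, device_id: str, text: str) -> None:
        # insort keeps the list ordered, which plays the role of the timestamp index.
        bisect.insort(self._rows[partition], (timestamp, device_id, text))

    def query(self, partition: int, start: float, end: float, device_ids: Set[str]) -> List[Row]:
        """Return rows with start <= timestamp < end for the given devices."""
        rows = self._rows.get(partition, [])
        # A 1-tuple sorts before any longer tuple with the same first element,
        # so these bounds select exactly the half-open range [start, end).
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return [row for row in rows[lo:hi] if row[1] in device_ids]
```

The caller is expected to pass a partition ID that was computed from the Device IDs it asks for; devices that map to a different partition need a separate query.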

The storage is accessed directly by the Streaming Service and the Telemetry API.

Annotations can be of multiple types, for example:

  • Device Alert, with acknowledgement status
  • Operational event, e.g. "deployed firmware 1.2", "added DPS"
  • Business event, e.g. "product going public"
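The annotation types listed above could be modelled roughly as follows. This is a sketch under stated assumptions: the field names, the enum values, and the `acknowledge` helper are hypothetical and are not taken from the actual schema. It also illustrates why records in this table must be updatable, since acknowledging an alert changes an existing record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AnnotationType(Enum):
    DEVICE_ALERT = "device_alert"
    OPERATIONAL_EVENT = "operational_event"
    BUSINESS_EVENT = "business_event"


@dataclass
class Annotation:
    annotation_id: str
    device_id: str
    type: AnnotationType
    timestamp: datetime
    text: str
    # Only meaningful for device alerts; None for the other types.
    acknowledged: Optional[bool] = None


def acknowledge(annotation: Annotation) -> Annotation:
    """Mark a device alert as acknowledged, updating the record in place."""
    if annotation.type is not AnnotationType.DEVICE_ALERT:
        raise ValueError("only device alerts carry an acknowledgement status")
    annotation.acknowledged = True
    return annotation


alert = Annotation(
    annotation_id="a-1",
    device_id="device-42",
    type=AnnotationType.DEVICE_ALERT,
    timestamp=datetime(2017, 10, 31, tzinfo=timezone.utc),
    text="temperature above threshold",
    acknowledged=False,
)
acknowledge(alert)
```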