Commit 0961b95

Initial commit with basic description
0 parents  commit 0961b95

1 file changed: +65 -0 lines changed

README.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# JSON Schema Profile

JSON Schema Profile augments the vocabulary of [JSON Schema](http://json-schema.org/) so that a schema can describe properties of the data itself, not only its structure.

## Definitions
### Bloom filter
A string representing a serialized Bloom filter. Currently, this is the Base64-encoded serialized form of the specific Bloom filter class used by [JSONoid](https://github.com/dataunitylab/jsonoid-discovery), but we plan to move to a more reusable format.

Bloom filters make it possible to check whether a specific value was observed for a property without storing every value. A lookup can return a false positive, but never a false negative.

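To make the membership check concrete, here is a minimal Python sketch of a Bloom filter. It is only an illustration of the idea: the class name `BloomFilter`, the bit-array size, and the SHA-256-based hashing are assumptions made for this example, and the result is not compatible with the serialized format produced by JSONoid.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter for illustration; NOT JSONoid's serialization format."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str):
        # Derive num_hashes bit positions by salting a SHA-256 hash (assumed scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


observed = BloomFilter()
for value in ["alice", "bob", "carol"]:
    observed.add(value)
```

Calling `observed.might_contain("alice")` returns `True`. For a value that was never added, the result is usually `False`, but a `True` answer is possible, which is why the method is named `might_contain`.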
### Histogram
property | description
:-- | :--
`bins` | An array of two-element arrays where the first element is the mean of the bin and the second is the number of elements in the bin
`hasExtremeValues` | A Boolean indicating whether the histogram contains values that cannot be represented within the given bounds. This usually happens only for values with extremely large absolute magnitude and is rare in practice

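As a rough illustration of how a consumer might read the `bins` array, the following Python sketch sums the bin counts and computes a count-weighted mean. The sample histogram and the helper names `total_count` and `approximate_mean` are hypothetical and are not part of JSONoid.

```python
# Hypothetical `histogram` value following the shape described above.
histogram = {
    "bins": [[1.5, 4], [10.0, 3], [42.0, 1]],
    "hasExtremeValues": False,
}


def total_count(hist: dict) -> int:
    """Number of observed values, i.e. the sum of all bin counts."""
    return sum(count for _mean, count in hist["bins"])


def approximate_mean(hist: dict) -> float:
    """Count-weighted average of the bin means (fails on an empty histogram)."""
    weighted = sum(mean * count for mean, count in hist["bins"])
    return weighted / total_count(hist)


print(total_count(histogram))       # 8
print(approximate_mean(histogram))  # 9.75
```

If `hasExtremeValues` is `true`, some observed values fall outside the bins, so summaries derived this way should be treated as approximate.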
### Statistics
property | description
:-- | :--
`variance` | The variance of all values of this property
`stdev` | The standard deviation of all values of this property
`skewness` | The skewness of all values of this property
`kurtosis` | The kurtosis of all values of this property

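The sketch below shows one way these four statistics could be computed from raw values in Python. The definitions are an assumption: it uses population moments (dividing by `n`) and non-excess kurtosis, whereas this document does not state which conventions a producer such as JSONoid follows. The function name `describe` is hypothetical.

```python
import math


def describe(values: list[float]) -> dict:
    """Population variance, stdev, skewness and (non-excess) kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments m2, m3, m4, each divided by n (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "variance": m2,
        "stdev": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
```

For this symmetric sample the variance is `2.0` and the skewness is `0.0`. The function assumes at least two distinct values, since a zero variance would cause a division by zero.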
## Arrays
property | description
:-- | :--
`lengthHistogram` | A [histogram](#histogram) of array lengths

## Booleans
property | description
:-- | :--
`pctTrue` | Percentage of Boolean values that are `true`

## Integers
property | description
:-- | :--
`bloomFilter` | A [Bloom filter](#bloom-filter) of integer values
`distinctValues` | An estimate of the number of distinct values of this property
`histogram` | A [histogram](#histogram) of integer values
`statistics` | A set of [statistics](#statistics) of integer values

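To show how these keywords might sit together, the following Python sketch builds a hypothetical annotated schema for an integer property and serializes it to JSON. Everything here is illustrative: the values are made up (chosen to be consistent with the sample `1, 2, 3, 4, 5`), the `bloomFilter` string is a placeholder and not a real serialized filter, and placing the profile keywords next to `type` is an assumption about layout.

```python
import json

# Hypothetical annotated schema; values are illustrative only.
integer_schema = {
    "type": "integer",
    "bloomFilter": "<Base64-encoded Bloom filter>",  # placeholder, not real data
    "distinctValues": 5,
    "histogram": {
        "bins": [[1, 1], [2, 1], [3, 1], [4, 1], [5, 1]],
        "hasExtremeValues": False,
    },
    "statistics": {
        "variance": 2.0,
        "stdev": 1.4142135623730951,
        "skewness": 0.0,
        "kurtosis": 1.7,
    },
}

serialized = json.dumps(integer_schema, indent=2)
print(serialized)
```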
## Numbers
property | description
:-- | :--
`bloomFilter` | A [Bloom filter](#bloom-filter) of number values
`distinctValues` | An estimate of the number of distinct values of this property
`histogram` | A [histogram](#histogram) of number values
`statistics` | A set of [statistics](#statistics) of number values

## Objects
property | description
:-- | :--
`fieldPresence` | An object that maps each key to the percentage of observed objects in which that key appears

## Strings
61+
property | description
62+
:-- |:--
63+
`bloomFilter` | A [Bloom filter](#bloom-filter) of string values
64+
`distinctValues` | An estimate of the number of distinct values of this property
65+
`lengthHistogram` | A [histogram](#Histogram) of string lengths
