JSON data
Just like delimited data, JSON (short for JavaScript Object Notation) is nearly everywhere in the data science world. Similar to a data frame, a JSON object is a 'container' for data, and the values it contains can have different types. The basic structure is a set of key/value pairs in which every key is a string. Let's look at the possible combinations for key/value pairs and a specific example.
// generally
{
<string>: <string, number, object, array, true, false, null>
}
// example
{
"key1": "bla",
"key2": 80,
"key3": { "key3Inner": 9 },
"key4": [5, 6, 6, 7, 3, 4],
"key5": true,
"key6": false,
"key7": null
}

When working with JSON data in Python or R, the same data is represented as a different data structure in memory: typically a dictionary in Python and a list in R. Accessing keys and values then becomes a matter of using the dictionary or list interface as usual. Both languages have libraries for reading JSON data into memory, converting it to a particular data structure, and writing data back to JSON. The process of transforming a data structure into a format that can be stored, such as JSON, is called serialization, while the opposite is called deserialization. This is why one often hears people talk about serializing and deserializing JSON. For the rest of this section, suppose the example JSON data above is stored in a file called ex.json.
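To make the mapping concrete, the following minimal sketch uses Python's standard json module to deserialize the example object from a string and inspect the resulting types. The variable names (raw, data, round_tripped and so on) are only illustrative.

```python
import json

# The example object from above, held as a JSON string
raw = """
{
  "key1": "bla",
  "key2": 80,
  "key3": { "key3Inner": 9 },
  "key4": [5, 6, 6, 7, 3, 4],
  "key5": true,
  "key6": false,
  "key7": null
}
"""

# Deserialization: JSON text -> Python objects
data = json.loads(raw)

# Each JSON type arrives as a built-in Python type
type_names = {key: type(value).__name__ for key, value in data.items()}
print(type_names)

# Nested values are reached through the normal dict/list interface
inner = data["key3"]["key3Inner"]
third = data["key4"][2]

# Serialization: Python objects -> JSON text; parsing it again gives equal data
round_tripped = json.loads(json.dumps(data))
```

JSON objects become dictionaries, arrays become lists, true/false become booleans and null becomes None, so no JSON-specific access syntax is needed once the data is loaded.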
The Python standard library ships with a module called json, which provides most of the functionality a user typically needs. However, it is common to install a faster third-party module named ujson (for UltraJSON) when an application demands speed. Both modules offer largely the same interface for reading and writing JSON data, so switching between them has little impact on your code.
import ujson as json
# read the file; the with-block closes it automatically
with open("ex.json", "r") as f:
    data = json.load(f)
# have a look at the data
data.keys()
data.values()
# remove a key/value pair
data.pop("key1")
# serialize the dictionary to JSON
json.dumps(data)

The short session above shows the very basics of working with JSON data in Python.
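The session above only serializes to a string. As a rough sketch of a full round trip, the snippet below writes a dictionary to a file and reads it back, falling back to the standard json module when ujson is not installed. The file name roundtrip.json and the temporary directory are assumptions made for this example, not part of the original session.

```python
import os
import tempfile

try:
    import ujson as json  # faster third-party module, if installed
except ImportError:
    import json  # standard library fallback

data = {"key2": 80, "key4": [5, 6, 6, 7, 3, 4], "key5": True, "key7": None}

# Hypothetical output location inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip.json")

    # dump() serializes straight to an open file handle
    with open(path, "w") as f:
        json.dump(data, f)

    # load() deserializes from an open file handle
    with open(path, "r") as f:
        restored = json.load(f)

# True when the data survived the round trip unchanged
print(restored == data)
```

Using dump and load with file handles avoids building the whole JSON string in a separate step, which is the usual choice when the data lives in a file anyway.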
Base R has no built-in support for JSON data, so one has to rely on a package. Several packages provide roughly the same functionality, for example jsonlite, rjson and RJSONIO. The small example below shows how to use the jsonlite package.
library(jsonlite)
# read the file
data <- fromJSON("ex.json")
# have a look
str(data)
# remove a key
data$key1 <- NULL
# serialize the list to JSON
toJSON(data)
- Please drop me a mail