JSON Serialization

This topic explains some details about how data are serialized to JSON, beyond the behavior of the standard Python json module.

The first thing to note is that JSON serialization is lossy. This is why Newt saves data in both pickle and JSON format.

The serialization nevertheless preserves class information.

The serialization, like pickle, supports cyclic data structures using a combination of persistent references and intra-record references.

Non-persistent instances

Non-persistent instances are converted to JSON objects with :: properties giving their dotted class names. In the common case of objects with their instance dictionaries used as their pickled state, the object attributes become properties.

So, for example, given a class MyClass in module mymodule:

class MyClass:

    def __init__(self, a, b):
        self.a = a
        self.b = b

The JSON serialization would look like:

{"::": "mymodule.MyClass", "a": 1, "b": 2}

Non-dictionary state

For instances with pickled state that’s not a dictionary, a JSON object is created with a state property containing the serialized state and a :: property with the dotted class name.

New arguments

Objects that take arguments to their __new__ method will have the arguments serialized in the ::() property.

Intra-object reference ids

If a record has cycles and an object in the record is referenced more than once, then the object will have an ::id property who’s value is an internal reference id.

For objects like lists and sets, which aren’t normally serialized as objects, when an object is referenced more than once, it’s wrapped in a “shared” object with an ::id property and a value property.

Intra-record cycles

Cyclic data structures are allowed within persistent object records, although they are extremely rare. When there’s a cycle, then objects that are referenced more than once:

  • have ::id properties that assign them intra-record ids.

    Objects like lists, who’s state are not dictionary are wrapped in a “shared” objects.

  • Are replaced with reference objects in all bit one of the references. Reference objects have a single property, ::-> giving the intra-record id of the object being referenced.

Here’s an example:

>>> from newt.db.tests.testjsonpickle import I
>>> i = I(a=1)
>>> d = dict(b=1)
>>> l = [i, i, d, d]
>>> l.append(l)

The serialization of the list, l would be equivalent to:

{
  "::": "shared",
  "::id": 0,
  "value": [
    {
      "::": "newt.db.tests.testjsonpickle.I",
      "::id": 2,
      "a": 1
    },
    {"::->": 2},
    {
      "::id": 5,
      "b": 1
    },
    {"::->": 5},
    {"::->": 0}
  ]
}

Intra-record references like these are difficult to work with, which is a good reason to avoid intra-record cycles.

Persistent object

Persistent objects are stored in 4 columns of the newt table:

Column Type
zoid bigint
class_name text
ghost_pickle bytea
state jsonb

The class name and state are separated and the state doesn’t have a :: property containing the dotted class name.

The ghost_pickle field contains the class name and __new__ arguments if necessary. It’s used to create new objects when searching.

Persistent references

When one persistent object references another persistent object, the reference is serialized with a reference object, having a property ::=> whose value is the object id of the referenced object [1]. For example, serialization of a sub-task object containing a reference to a parent task would be equivalent to:

{
  "title": "Do something",
  "parent": {"::=>": 42}
}

Note that cycles among persistent objects are common and don’t present any problems for serialization because persistent objects are serialized separately.

Dates and times

datetime.date objects and datetime.datetime instances without time zones are converted strings using their isoformat methods.

datetime.datetime instances with time zones are serialized as objects with a :: property of datetime, a value property with their ISO formatted value, and a tz property containing a JSON serialization of their time zones.

[1]This is a change from versions of Newt before 0.4.0. Earlier versions represented persistent references as objects with a :: property with the value persistent and an id property who’s value is an integer object id or a list containing an integer object id and a dotted class name. The attributes will be retained until Newt DB version 1, at which point they will no longer be included.