Container Types

The Universal Binary JSON Specification defines a total of 2 container types matching JSON’s container types:

  1. Array Type
  2. Object Type

Ignoring special-case optimizations, the design of the Universal Binary JSON containers is intentionally identical to JSON (the same start and end markers) and is streaming-friendly: a container can be written out on demand without knowing its size ahead of time.
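As a rough illustration of the streaming property, the sketch below writes an array of int32 values from an iterator whose length is not known up front. The function name write_int32_stream is hypothetical, and big-endian byte order for numeric payloads is an assumption not stated in this section.

```python
import io
import struct
from typing import BinaryIO, Iterable


def write_int32_stream(out: BinaryIO, values: Iterable[int]) -> None:
    """Write an unoptimized UBJSON array of int32 values, one at a time."""
    out.write(b"[")                              # array start marker
    for v in values:                             # length is never inspected
        out.write(b"l" + struct.pack(">i", v))   # [l] marker + 4-byte payload
    out.write(b"]")                              # array end marker


buf = io.BytesIO()
write_int32_stream(buf, (n * n for n in range(3)))  # a generator has no len()
```

Because the end marker is written only after the iterator is exhausted, the writer never needs to buffer the values or count them in advance.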

Optimized Format

Both array and object container types in UBJSON support being represented in a more optimized format that can increase parsing performance as well as shrink data size in most cases (without compression).

Please see Optimized Format below for details on how to leverage this support.

Array Type


The array type in Universal Binary JSON is defined as:

Type  Size  Marker  Length  Data Payload
array  2+ bytes**  [ and ]  Optional  Yes (if non-empty)

** See Optimized Format below.

Usage

The array type in Universal Binary JSON is equivalent to the array type from the JSON specification.

Example

JSON snippet (42 bytes compacted):

[
    null,
    true,
    false,
    4782345193,
    153.132,
    "ham"
]

UBJSON snippet (25 bytes, 40% smaller):

[[]
    [Z]
    [T]
    [F]
    [L][4782345193]
    [d][153.132]
    [S][i][3][ham]
[]]

Universal Binary JSON format is 40% smaller than the compacted JSON.

NOTE: The value 4782345193 does not fit in an int32 (maximum 2147483647), so it is written as an int64 [L].
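The array example above can be reproduced with a small encoder. This is a minimal sketch rather than a reference implementation: the function names are hypothetical, only the int8/int16/int32/int64 markers used on this page are handled, every float is written as a float32 [d] (which loses precision), and big-endian byte order is assumed.

```python
import struct


def encode_int(n: int) -> bytes:
    """Encode an integer using the smallest signed type that can hold it."""
    for marker, fmt, bits in ((b"i", ">b", 8), (b"I", ">h", 16),
                              (b"l", ">i", 32), (b"L", ">q", 64)):
        if -(2 ** (bits - 1)) <= n <= 2 ** (bits - 1) - 1:
            return marker + struct.pack(fmt, n)
    raise ValueError("integer does not fit in an int64")


def encode(value) -> bytes:
    """Encode None, bool, int, float, str and list values (sketch only)."""
    if value is None:
        return b"Z"
    if value is True:
        return b"T"
    if value is False:
        return b"F"
    if isinstance(value, int):
        return encode_int(value)
    if isinstance(value, float):
        return b"d" + struct.pack(">f", value)      # assumption: always float32
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b"S" + encode_int(len(raw)) + raw    # [S][length][bytes]
    if isinstance(value, list):
        return b"[" + b"".join(encode(v) for v in value) + b"]"
    raise TypeError(f"unsupported type: {type(value).__name__}")


data = encode([None, True, False, 4782345193, 153.132, "ham"])
```

Choosing the smallest integer type that fits means 4782345193 is written with the int64 marker, since it is larger than any int32 value.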

Object Type


The object type in Universal Binary JSON is defined as:

Type  Size  Marker  Length  Data Payload
object  2+ bytes**  { and }  Optional  Yes (if non-empty)

** See Optimized Format below.

Usage

The object type in Universal Binary JSON is equivalent to the object type from the JSON specification.

Example

JSON snippet (90 bytes compacted):

{
    "post": {
        "id": 1137,
        "author": "rkalla",
        "timestamp": 1364482090592,
        "body": "I totally agree!"
    }
}

UBJSON snippet (79 bytes, 12% smaller):

[{]
    [i][4][post][{]
        [i][2][id][I][1137]
        [i][6][author][S][i][6][rkalla]
        [i][9][timestamp][L][1364482090592]
        [i][4][body][S][i][16][I totally agree!]
    [}]
[}]

NOTE: The [S] (string) marker is omitted from each of the names in the name/value pairs inside the object. The JSON specification does not allow non-string names, therefore the [S] marker is redundant and must not be used.
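To make the name rule concrete, the following sketch encodes an object so that names carry only a length and the raw bytes, while string values keep their [S] marker. The helper names are hypothetical, only integer, string and nested object values are supported, and big-endian byte order is assumed.

```python
import struct


def encode_int(n: int) -> bytes:
    """Encode an integer using the smallest signed type that can hold it."""
    for marker, fmt, bits in ((b"i", ">b", 8), (b"I", ">h", 16),
                              (b"l", ">i", 32), (b"L", ">q", 64)):
        if -(2 ** (bits - 1)) <= n <= 2 ** (bits - 1) - 1:
            return marker + struct.pack(fmt, n)
    raise ValueError("integer does not fit in an int64")


def encode_name(name: str) -> bytes:
    """Object names: length + UTF-8 bytes, with no leading [S] marker."""
    raw = name.encode("utf-8")
    return encode_int(len(raw)) + raw


def encode(value) -> bytes:
    if isinstance(value, bool):
        raise TypeError("booleans are outside the scope of this sketch")
    if isinstance(value, int):
        return encode_int(value)
    if isinstance(value, str):
        return b"S" + encode_name(value)   # values do carry the [S] marker
    if isinstance(value, dict):
        body = b"".join(encode_name(k) + encode(v) for k, v in value.items())
        return b"{" + body + b"}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


post = {"post": {"id": 1137, "author": "rkalla",
                 "timestamp": 1364482090592, "body": "I totally agree!"}}
data = encode(post)
```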

Optimized Format


While the basic specification for the array and object types is identical to the JSON specification (i.e. simple beginning and end markers), both containers support optional parameters that can help optimize the container for better parsing performance and smaller size.

At a very high level, the optimized format for both array and object container types is built around two optional parameters: type and count.

Type  Size  Marker  Arg. Type  Example  Desc
type  1-byte  $  Value Type or Container Type Marker  [$][S]  string type
count  1-byte  #  Integer Numeric Value  [#][i][64]  count of 64

The effect on the container when specifying one or both parameters is as follows:

  • type [$] – when a type is specified, all values stored in the container (either array or object) are considered to be of that single type and, as a result, type markers are omitted for each value in the container. This can be thought of as providing the ability to create a strongly typed container in UBJSON.
    • If a type is specified, it must be done so before a count.
    • If a type is specified, a count must be specified as well (otherwise it is impossible to tell when a container is ending; e.g., did you just parse ‘]’ or the int8 value of 93?)
  • count [#] – when a count is specified, the parser is able to know ahead of time how many child elements will be parsed. This allows the parser to pre-size any internal construct used for parsing, verify that the promised number of child values were found and avoid scanning for any terminating bytes while parsing.
    • count can be specified without a type.

NOTE: Yes, it is possible for an array or object to define its type as ‘[’ or ‘{’ to signal that it contains additional containers!

BONUS: Parsers can provide highly optimized implementations for strongly typed containers of fixed-length types (e.g. numeric, boolean, etc.) because the exact byte length of the data is known!

Some rules that generators and parsers need to be aware of when dealing with these optional parameters are as follows:

  • [count] A count must be >= 0.
  • [count] A count can be specified by itself.
  • [count] If a count is specified the container must not specify an end-marker.
  • [count] A container that specifies a count must contain the specified number of child elements.
  • [type] If a type is specified, it must be done so before count.
  • [type] If a type is specified, a count must also be specified. A type cannot be specified by itself.
  • [type] A container that specifies a type must not contain any additional type markers for any contained value.
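The rules above translate fairly directly into a header reader. The sketch below assumes the stream is positioned just after the container's start marker; read_container_header and read_int are hypothetical names, only the signed integer markers used on this page are accepted for the count, and big-endian byte order is assumed.

```python
import io
import struct

INT_FORMATS = {b"i": ">b", b"I": ">h", b"l": ">i", b"L": ">q"}


def read_int(stream) -> int:
    """Read one integer value (marker + payload) from the stream."""
    marker = stream.read(1)
    fmt = INT_FORMATS.get(marker)
    if fmt is None:
        raise ValueError(f"expected an integer marker, got {marker!r}")
    size = struct.calcsize(fmt)
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("truncated integer payload")
    return struct.unpack(fmt, payload)[0]


def read_container_header(stream):
    """Return (type_marker, count); each is None when not specified."""
    type_marker = None
    count = None
    nxt = stream.read(1)
    if nxt == b"$":                       # type, if present, comes first
        type_marker = stream.read(1)
        if len(type_marker) != 1:
            raise ValueError("truncated type parameter")
        nxt = stream.read(1)
        if nxt != b"#":                   # a type cannot stand alone
            raise ValueError("a type must be followed by a count")
    if nxt == b"#":
        count = read_int(stream)
        if count < 0:
            raise ValueError("count must be >= 0")
    elif nxt:
        stream.seek(-1, io.SEEK_CUR)      # not a parameter: un-read the byte
    return type_marker, count


header = read_container_header(io.BytesIO(b"$d#i\x05"))
```

A caller that receives a count would then read exactly that many children and not look for an end marker; a caller that receives (None, None) falls back to scanning for the end marker.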

Array Example

Below are examples of incrementally more optimized representations of an array in UBJSON.

No Optimization

[[]
    [d][29.97]
    [d][31.13]
    [d][67.0]
    [d][2.113]
    [d][23.8889]
[]]

Optimized with count

[[][#][i][5] // An array of 5 elements.
    [d][29.97]
    [d][31.13]
    [d][67.0]
    [d][2.113]
    [d][23.8889]
// No end marker since a count was specified.

Optimized with type & count

[[][$][d][#][i][5] // An array of 5 float32 elements.
    [29.97] // Value type is known, so type markers are omitted.
    [31.13]
    [67.0]
    [2.113]
    [23.8889]
// No end marker since a count was specified.
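The three array forms can be compared byte for byte with a short script. This is a sketch under a few assumptions: the function names are made up for illustration, the count is written as an int8 (so at most 127 elements), and float32 payloads are big-endian.

```python
import struct

values = [29.97, 31.13, 67.0, 2.113, 23.8889]


def plain(vals) -> bytes:
    """No optimization: start marker, [d]-prefixed values, end marker."""
    return b"[" + b"".join(b"d" + struct.pack(">f", v) for v in vals) + b"]"


def counted(vals) -> bytes:
    """Count only: values keep their markers, the end marker is dropped."""
    head = b"[#i" + struct.pack(">b", len(vals))
    return head + b"".join(b"d" + struct.pack(">f", v) for v in vals)


def typed(vals) -> bytes:
    """Type and count: a bare run of float32 payloads follows the header."""
    head = b"[$d#i" + struct.pack(">b", len(vals))
    return head + struct.pack(f">{len(vals)}f", *vals)


print(len(plain(values)), len(counted(values)), len(typed(values)))  # 27 29 26
```

For an array this small, the count parameter on its own costs two bytes more than the unoptimized form (three header bytes, minus the end marker). The size benefit comes from the type parameter, which saves one byte per element and therefore grows with the length of the array.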

Object Example

Below are examples of incrementally more optimized representations of an object in UBJSON.

NOTE: Remember, in UBJSON the string markers ([S]) are omitted from the names in the name-value pairs of an object because JSON only allows names of type string.

No Optimization

[{]
    [i][3][lat][d][29.976]
    [i][4][long][d][31.131]
    [i][3][alt][d][67.0]
[}]

Optimized with count

[{][#][i][3] // An object of 3 name:value pairs.
    [i][3][lat][d][29.976]
    [i][4][long][d][31.131]
    [i][3][alt][d][67.0]
// No end marker since a count was specified.

Optimized with type & count

[{][$][d][#][i][3] // An object of 3 name:float32-value pairs.
    [i][3][lat][29.976] // Value type is known, so type markers are omitted.
    [i][4][long][31.131] 
    [i][3][alt][67.0] 
// No end marker since a count was specified.

Special Cases (Null, No-Op and Boolean)

Up until now all the examples of leveraging type and count have illustrated the benefit of optimizing out the markers from value types that have a data payload (e.g. numeric values, strings, etc.); since the type of all the values is known, the markers are easily omitted. There are, however, a few special value types that have no data payload and whose markers themselves represent the value, specifically: null, no-op and boolean.

This section will take a look at how those types behave when used with strongly-typed containers.

At a high level, placing one of these types in a strongly-typed container pre-defines the value of every element in the container: in the case of an array, all the values contained in it; in the case of an object, the values associated with all the names in the name-value pairs.

Array

[[][$][N][#][I][512] // 512 'no-op' values.

The example above is a strongly typed array of type no-op and with a count of 512.

This simple declaration is equivalent to a 514-byte array containing 512 [N] markers; instead, this single line is 7 bytes (five 1-byte markers plus a 2-byte int16 count), a 99% size reduction.

Admittedly this is a selective example of leveraging this feature, but the point is that there are potentially very large performance and size optimizations available if your data can take advantage of this shorthand.

NOTE: Strongly-typed arrays of null, no-op and boolean must have an empty body. The header itself defines the container’s contents.

Object

[{][$][Z][#][i][3]
    [i][4][name] // name only, no value specified.
    [i][8][password]
    [i][5][email]

The example above is a strongly typed object of type null and with a count of 3.

When used in the context of an object, specifying one of these special-case values as a type has the effect of setting the default value for every name-value pair in the object; therefore the object only contains the names of all the pairs.

In the case of objects, the space savings are typically a little less drastic than in the array case and depend on the size of the names; with small names the savings can be significant, approaching a 50% reduction.

NOTE: Strongly-typed objects of null, no-op and boolean must not have any values specified in the body, just the name portions of the name-value pairs. The header itself defines the value for every name-value pair.
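A decoder for such name-only objects might look like the sketch below. It is illustrative only: decode_valueless_object is a hypothetical name, the count and name lengths are assumed to be int8 values, a boolean type is assumed to be written as either the [T] or the [F] marker, and no-op is left out because it has no JSON value to map to.

```python
import io
import struct

# Markers whose presence alone is the value (no data payload).
VALUELESS = {b"Z": None, b"T": True, b"F": False}


def read_exact(stream, n: int) -> bytes:
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of data")
    return data


def read_int8(stream) -> int:
    if read_exact(stream, 1) != b"i":
        raise ValueError("this sketch only handles int8 lengths and counts")
    return struct.unpack(">b", read_exact(stream, 1))[0]


def decode_valueless_object(data: bytes) -> dict:
    """Decode '{$<type>#<count>' followed by names only."""
    stream = io.BytesIO(data)
    if read_exact(stream, 2) != b"{$":
        raise ValueError("expected a strongly typed object")
    type_marker = read_exact(stream, 1)
    if type_marker not in VALUELESS:
        raise ValueError(f"not a valueless type: {type_marker!r}")
    if read_exact(stream, 1) != b"#":
        raise ValueError("a type must be followed by a count")
    count = read_int8(stream)
    result = {}
    for _ in range(count):
        name = read_exact(stream, read_int8(stream)).decode("utf-8")
        result[name] = VALUELESS[type_marker]   # value comes from the header
    return result


obj = decode_valueless_object(b"{$Z#i\x03i\x04namei\x08passwordi\x05email")
```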

Size & Performance Benefits

The benefits realized by leveraging the optimized container types in UBJSON depend heavily on the data being stored and the implementation of the generator or parser. Barring the frustration of “it depends” as an answer, the benefits can be viewed at a very high level as the following:

Optimized for Parsing

By specifying a count, you are hinting to the parser about the number of elements to expect. The performance gains are primarily around allowing the parser to pre-size its internal data structures to exactly the right size to hold pointers to the parsed values.

By specifying a type and count, the parser not only knows how many child elements to expect, it also has less data to parse and fewer conditions to evaluate (no marker checks). For fixed-length values, the parser also knows the exact byte length of the payload!

For example, consider:

[[][$][l][#][I][1024] // 1,024 int32 values
    [32]
    [2147483647]
    [101231]
    [77832823]
    ... 1,020 more int32 values ...

After the parser parses the container’s header, it knows the byte length of the entire payload is 4096 and in a single read operation can read all the values in and quickly break them up into their int32 representations.
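The single-read approach described above can be sketched as follows. The function name read_int32_array is hypothetical, the header is assumed to be exactly [[][$][l][#][I] followed by an int16 count, and big-endian byte order is assumed.

```python
import struct

HEADER = b"[$l#I"   # array start, type int32, count given as an int16


def read_int32_array(data: bytes) -> tuple:
    """Decode a strongly typed int32 array with one bulk unpack call."""
    if not data.startswith(HEADER):
        raise ValueError("not a strongly typed int32 array with int16 count")
    (count,) = struct.unpack_from(">h", data, len(HEADER))
    if count < 0:
        raise ValueError("count must be >= 0")
    start = len(HEADER) + 2
    size = 4 * count                      # payload length known up front
    payload = data[start:start + size]
    if len(payload) != size:
        raise ValueError("fewer values than the count promised")
    return struct.unpack(f">{count}i", payload)   # no per-value marker checks


numbers = list(range(1024))
encoded = HEADER + struct.pack(">h", 1024) + struct.pack(">1024i", *numbers)
decoded = read_int32_array(encoded)
```

The length check doubles as the simple validation that a count provides: a payload shorter than 4 × count bytes is rejected before any value is decoded.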

The real performance gains come from leveraging type and count together, which gives the parser a much more detailed picture of the payload.

Simple Validation Mechanism

By specifying a count parameter, you are telling the parser the number of child elements it should find in the container. In the case where the parser is unable to find the specified number of child elements it can quickly report a format error to the caller.

This is a very simple form of verification and not as robust as, say, a checksum-based approach, but it still provides a benefit in addition to the performance gain.

Reduce Size up to 50%

Strong typing removes the 1-byte type marker from every value in the container.

In the case of containers holding large amounts of fairly compact data (small numbers, chars, small strings or value-types like null), removing the type marker from the beginning of each of the values in the container can almost cut the size requirements for the data in half.

The smaller the container and the bigger the individual values (large numbers, large strings), the less size benefit this optimization provides. It still gives the parser a potentially significant opportunity to optimize its code paths for parsing large chunks of same-type values (without needing to worry about type changes mid-container). This is covered in more detail in the previous section, Optimized for Parsing.

Binary Data Support

This section is here for referential convenience; please see Binary Data for information on storing binary data in UBJSON.