This page is for developers implementing a Universal Binary JSON (UBJSON) library.
Libraries implementing the Universal Binary JSON spec must adhere to the following guidelines:
- Parsers must follow a “writer-makes-right” policy: if a parser encounters unexpected or invalid data (e.g. a negative container length), it should throw an exception and stop parsing.
Best practices covered on this page:
- Optimizing Container Performance
- Using Smallest Number Representation
- Handling High-Precision Numbers on Unsupported Platforms
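The “writer-makes-right” guideline above (fail immediately on invalid input rather than trying to repair it) could look roughly like the following when reading a length value. This is only a sketch: the names `UBJSONDecodeError` and `read_length` are hypothetical, and the integer markers are assumed to be the Draft 12 ones (`i`, `U`, `I`, `l`, `L`, all big-endian).

```python
import struct
from typing import Tuple


class UBJSONDecodeError(ValueError):
    """Hypothetical error type, raised as soon as invalid input is seen."""


# Integer marker -> big-endian struct format (assumed Draft 12 markers).
_INT_FORMATS = {b"i": ">b", b"U": ">B", b"I": ">h", b"l": ">i", b"L": ">q"}


def read_length(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an integer length starting at buf[pos]; return (length, next_pos)."""
    marker = buf[pos:pos + 1]
    fmt = _INT_FORMATS.get(marker)
    if fmt is None:
        raise UBJSONDecodeError(
            f"expected an integer marker at offset {pos}, got {marker!r}"
        )
    size = struct.calcsize(fmt)
    payload = buf[pos + 1:pos + 1 + size]
    if len(payload) != size:
        raise UBJSONDecodeError(f"truncated length at offset {pos}")
    (length,) = struct.unpack(fmt, payload)
    if length < 0:
        # Writer-makes-right: do not guess or repair, stop parsing here.
        raise UBJSONDecodeError(f"negative length {length} at offset {pos}")
    return length, pos + 1 + size
```

For example, `read_length(b"i\x05", 0)` returns `(5, 2)`, while `read_length(b"i\xff", 0)` raises because the encoded length is -1.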
The best practices below were collected from work with the community, feedback from others and our own experience with the specification. They are gathered in one place so that anyone working with the format can easily find answers about the more flexible portions of the spec.
[box type=”tick”]Why: (Potentially large) data size reduction and parsing performance increase.
How: Use a single, homogeneous value type within a container.[/box]
Very large performance advantages are available when writing out ARRAY or OBJECT containers that contain same-type values. Be sure to read through the optimized container format that can be leveraged in these cases.
The typical optimization is omitting the marker character from every value in a same-typed container, which makes each value 1 byte smaller.
A less common optimization, and the one that gives the biggest reduction, applies to the 1-byte value types (e.g. NO-OP, NULL, etc.): when they are used with the optimized container format, the values themselves can be omitted from the container entirely, giving a space saving that approaches 100% as the container grows.
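To make the two levels of saving concrete, here is a small size comparison. It is a sketch that assumes the Draft 12 optimized layout (`[`, then `$` and a type marker, then `#` and a count, then the bare payloads with no closing `]`); the helper names are hypothetical and the layout should be checked against the spec version you target.

```python
import struct


def encode_count(n: int) -> bytes:
    """Encode a container count as int8 or int32 (kept simple for this sketch)."""
    if 0 <= n <= 127:
        return b"i" + struct.pack(">b", n)
    return b"l" + struct.pack(">i", n)


def plain_array(marker: bytes, payloads: list) -> bytes:
    """Unoptimized array: '[' + (marker + payload) for each value + ']'."""
    return b"[" + b"".join(marker + p for p in payloads) + b"]"


def typed_array(marker: bytes, payloads: list) -> bytes:
    """Optimized array: '[' '$' type '#' count, then bare payloads, no ']'."""
    return b"[$" + marker + b"#" + encode_count(len(payloads)) + b"".join(payloads)


int8_values = [struct.pack(">b", v) for v in range(100)]
nulls = [b""] * 100  # NULL ('Z') is a marker with no payload

# 100 int8 values: one marker byte saved per value.
print(len(plain_array(b"i", int8_values)), len(typed_array(b"i", int8_values)))  # 202 106
# 100 NULLs: the values disappear entirely, only the 6-byte header remains.
print(len(plain_array(b"Z", nulls)), len(typed_array(b"Z", nulls)))  # 102 6
```

The header costs a few bytes, so for very small containers the plain form can be shorter; the saving only appears once the container holds more than a handful of values.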
[box type=”tick”]Why: ~50% size reduction for numbers > 5 digits and < 20 digits.
How: Always use the most compact numeric type possible when writing UBJSON.[/box]
Numeric values can be represented in several ways in UBJSON. You can reduce the size of your UBJSON by inspecting each value as it is written and choosing the most compact numeric type that can hold it.
Keep in mind that varying the type of values inside of a container may impact your ability to use the type parameter to optimize container storage.
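One way to pick the smallest representation for an integer is to try the integer types from narrowest to widest. The sketch below assumes the Draft 12 integer markers and big-endian encoding, and falls back to the high-precision type (marker `H`, assumed to be a length followed by the decimal text) for anything that does not fit in 64 bits; `encode_int` is a hypothetical helper name.

```python
import struct

# (marker, struct format, min, max), narrowest first. Draft 12 markers assumed.
_INT_TYPES = [
    (b"i", ">b", -2**7, 2**7 - 1),    # int8
    (b"U", ">B", 0, 2**8 - 1),        # uint8
    (b"I", ">h", -2**15, 2**15 - 1),  # int16
    (b"l", ">i", -2**31, 2**31 - 1),  # int32
    (b"L", ">q", -2**63, 2**63 - 1),  # int64
]


def encode_int(value: int) -> bytes:
    """Encode an integer using the narrowest type that can hold it."""
    for marker, fmt, lo, hi in _INT_TYPES:
        if lo <= value <= hi:
            return marker + struct.pack(fmt, value)
    # Does not fit in 64 bits: write it as a high-precision decimal string.
    digits = str(value).encode("ascii")
    return b"H" + encode_int(len(digits)) + digits


print(len(encode_int(5)))      # 2 bytes (int8)
print(len(encode_int(300)))    # 3 bytes (int16)
print(len(encode_int(70000)))  # 5 bytes (int32)
```

A writer that wants to use the typed-container optimization would instead pick one type wide enough for every value in the container, trading a few bytes per value for the ability to drop the markers.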
[box type=”tick”]Why: Cleanly handle > 64-bit numbers on platforms that don’t support them.
How: By using the high-precision type.[/box]
Not every language supports arbitrarily long numbers, and some do not even support numbers larger than 64 bits. To allow numbers larger than 64 bits to be transported and handled safely on every platform, UBJSON provides the high-precision numeric type.
The high-precision type is a string-based type (identical in format to the string type) that provides a universally compatible mechanism by which arbitrarily large or precise numbers can be handled.
Platforms with support for arbitrarily large or precise numbers are free to parse the high-precision value into a native type. On platforms without that support, the high-precision value can be safely passed on, persisted to storage or handled in other non-numeric ways, so the client can still handle the request without overflowing or otherwise balking at the unsupported numeric type.
That said, for libraries written to support platforms that do not natively support arbitrarily large or precise values, the following guidance can be employed to provide a safe and consistent behavior when encountering them:
- [Default] Exception/Error: Throw an exception (or return an error) when an unsupported high-precision value is encountered during parsing. The platform doesn’t support these values, so give the client a chance to learn that it is receiving data it won’t know how to parse into a native type.
- [Optional] Handle as a String: (must be user-enabled) When the client doesn’t need to process the value and is only passing it through, for example persisting it to a data store, treat the high-precision value as a string and return it to the caller.
- [Optional] Skip: (must be user-enabled) Allow the parser to skip unsupported values during parsing. Be aware that this is a dangerous approach and will likely lead to data loss (skipped values won’t be visible to the client), but it has to be considered when a client must be able to parse any and all UBJSON it receives, even if it doesn’t support arbitrarily large or precise numbers.
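The three behaviors above could be exposed as a parser option along the following lines. This is a sketch only: `HighPrecisionPolicy`, `UnsupportedValueError`, `SKIPPED` and `handle_high_precision` are hypothetical names, and Python is used purely for illustration (it supports arbitrarily large integers natively, so a real Python library would not need this).

```python
from enum import Enum


class HighPrecisionPolicy(Enum):
    ERROR = "error"          # default: fail loudly
    AS_STRING = "as_string"  # opt-in: pass the raw text through
    SKIP = "skip"            # opt-in: drop the value (risks data loss)


class UnsupportedValueError(ValueError):
    """Hypothetical error for values the platform cannot represent."""


# Sentinel returned for skipped values so the caller can omit them.
SKIPPED = object()


def handle_high_precision(text: str, policy=HighPrecisionPolicy.ERROR):
    """Apply the configured policy to the decoded text of a high-precision value."""
    if policy is HighPrecisionPolicy.ERROR:
        raise UnsupportedValueError(
            f"high-precision value {text!r} is not supported on this platform"
        )
    if policy is HighPrecisionPolicy.AS_STRING:
        return text
    return SKIPPED
```

Making `ERROR` the default argument means the two lossy or non-numeric behaviors only apply when the caller asks for them explicitly, which matches the "must be user-enabled" requirement.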
These guidelines should provide the most functional experience for a client to work with UBJSON on their platform of choice.
[box type=”alert”]Example files below only support Draft 8[/box]
You can find files to test your implementation with here. There are formatted-json, compacted-json and UBJ versions of each of the testing files contained in the repository.
The simple Java classes whose names match the UBJ files are Java representations of those files (for Java testing). The Marshaller classes contain the hand-coded serialization and deserialization code used to write those test files out to, and read them in from, UBJ format.
Even if you are not working in Java, you can use those classes as a high-level guide if you are curious, or ignore them completely and just test against the raw file resources.