Bao Format

What is bao?

The basic format Assign Onward uses to communicate messages is called bao.
*Bao is also well suited for compact data storage, but the specifics of data storage and retrieval implementations are outside the Ⓐ protocol scope and are left to the discretion of the application architects.
*Could stand for "Binary Assign Onward" format, "Basic Assign Onward" format, or "tasty, juicy filling with soft, fluffy wrappers." Take your pick, although I'd say the bao wrappers used in the Ⓐ protocol are more thin than fluffy. Also probably best represented as 包 in Chinese.
It is a strongly typed binary encoding which can easily be translated to and from json, and to dot representation.
*Although bao can be translated to json and back again, not all json can be represented in bao, due first to the strong type definition and also to the dictionary references required for keys and key codes.

What is in bao?

The data stored in bao always has one of the following defined types:
*While some types are redundant, they are specified separately to ease implementation (for instance, an int64 is much easier to access than an arbitrary length integer) and to prevent common problems such as confusing arbitrary binary data with strings that are intended to be human readable. Floating point types are specifically excluded to discourage their use in protocols that are meant to be easily and 100% reproducible on varying hardware and software.
  • signed int64
  • arbitrary length integer
    *such as is implemented by arbitrary precision math libraries like GMP. Other libraries implementing the same functions may be used, as long as they produce correct results.
  • ratio of two arbitrary length integers
  • key code
  • UTF-8 string
  • octet stream
  • bao object
  • an array of any of the above types
Like json, bao wraps data in objects or arrays which may be nested. In bao, data items, objects and arrays are always paired with keys.

Bao keys

Bao keys are defined in a dictionary. Each defined key has a unique human readable name, which is used in json translations of bao objects and when referring to the key in code. The key also has a unique integer representation, and that integer is encoded into a compact binary form in bao byte streams. The lower bits of a bao key specify the type of the data which follows, as well as whether that data is a single item or an array of items, all of the type specified in the key.
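To make the idea of type bits concrete, here is a Python sketch of packing a key id, a value type and an array flag into one integer key code. The text only says that "the lower bits" carry this information, so the three type bits, the single array bit above them, the type numbering and the sample dictionary entries are all hypothetical.

```python
from enum import IntEnum

# HYPOTHETICAL layout: the text only says the lower bits carry the value type
# and the single/array flag. Widths and numbering here are illustrative.
TYPE_BITS = 3
ARRAY_BIT = 1 << TYPE_BITS      # 0b1000: set when the value is an array
ID_SHIFT = TYPE_BITS + 1        # the key's own id sits above the flag bits


class BaoType(IntEnum):
    INT64 = 0
    BIGINT = 1
    RATIO = 2
    KEYCODE = 3
    STRING = 4
    OCTETS = 5
    OBJECT = 6


def make_key(key_id: int, typ: BaoType, is_array: bool = False) -> int:
    """Pack a key id, value type and array flag into one integer key code."""
    return (key_id << ID_SHIFT) | (ARRAY_BIT if is_array else 0) | int(typ)


def split_key(code: int) -> tuple[int, BaoType, bool]:
    """Recover (key id, value type, is_array) from a key code."""
    typ = BaoType(code & (ARRAY_BIT - 1))
    return code >> ID_SHIFT, typ, bool(code & ARRAY_BIT)


# A toy dictionary mapping human readable names (used in json) to key codes.
DICTIONARY = {
    "name": make_key(1, BaoType.STRING),
    "tags": make_key(2, BaoType.STRING, is_array=True),
    "amount": make_key(3, BaoType.BIGINT),
}
```

With this layout, `split_key(DICTIONARY["tags"])` returns `(2, BaoType.STRING, True)`: a decoder learns the element type and arity from the key alone, before reading any value bytes.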

The meaning of keys is often context specific, depending on what wrappers they are contained in. The first key, on the outermost wrapper of a bao message, defines the type of the message and what protocol rules apply to everything within the message's wrappers.

Bao objects

A bao object contains zero or more key-value pair elements. Each element has a key which is unique within the object, followed by either an "atomic" data item (listed above), an array of data items or objects, or a single object. The key defines both the type of data in the element and how that data is to be handled by the operating protocol.

Bao arrays

Bao arrays are a single key followed by a non-negative integer size, which specifies how many data elements (of the type specified by the key) follow. Bao arrays may contain objects, but may not directly contain arrays.
*Any object may contain zero or more key value pairs whose value is an array, but if the goal is a multi-dimensional structure like a table or volume description, I would suggest laying the data into a one-dimensional array and including the dimensions of the table / volume / whatever in a wrapper object that also holds the array.
*In a sense, bao key codes specify an array of dimension zero or one. With the addition of a single bit to the array dimension encoding, two and three dimensional arrays could be specified, enabling more json-like arrays of arrays where each element in the outer array could contain an inner array of a different length. While academically interesting, the complexity of implementation seems a poor tradeoff for the value of having such encoding options available.
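The footnote's advice on multi-dimensional data can be sketched in Python: a table is flattened row by row into one array, and its dimensions travel alongside it in the wrapper object. The key names "rows", "cols" and "cells" and the row-major order are hypothetical choices, and plain dicts and lists stand in for bao objects and arrays.

```python
def table_to_wrapper(table: list[list[int]]) -> dict:
    """Flatten a rectangular table into a wrapper object (hypothetical keys)."""
    rows = len(table)
    cols = len(table[0]) if rows else 0
    if any(len(row) != cols for row in table):
        raise ValueError("table rows must all have the same length")
    # Row-major order: all of row 0, then all of row 1, and so on.
    return {"rows": rows, "cols": cols, "cells": [v for row in table for v in row]}


def wrapper_to_table(obj: dict) -> list[list[int]]:
    """Rebuild the table from its dimensions and the one-dimensional array."""
    rows, cols, cells = obj["rows"], obj["cols"], obj["cells"]
    if len(cells) != rows * cols:
        raise ValueError("cell count does not match the stated dimensions")
    return [cells[i * cols:(i + 1) * cols] for i in range(rows)]
```

Because the dimensions are explicit, a reader can validate the array length before using it, which a ragged array of arrays would not allow.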

Bao encoding

Bao encoding is a fairly straightforward system where the outermost key is followed by the binary encoding of the value belonging to that key. Object values start with a coded integer telling how many key-value elements are in the object, followed by each key and its value. Arrays similarly start with a coded integer telling how many value elements follow, and then each value follows. Remember that values may themselves be objects, which then nest further. The atomic types are encoded as follows:
  • signed int64 - 8 bytes, least significant byte first.
  • arbitrary length integer - BCD digits, terminated by hex E for a positive or F for a negative number. If the terminator is the first nybble of a byte, it is followed by hex D.
  • ratio of two arbitrary length integers - two arbitrary length integers as above, numerator first.
  • key code - a sequence of one to nine bytes, most significant bits first. The seven least significant bits of each byte are part of the value, and the first byte with a 0 in its most significant bit is the final byte of the sequence.
  • octet stream - an encoded length followed by that number of bytes.
  • UTF-8 string - exactly as the octet stream, but the contents are required to be valid UTF-8.
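A minimal Python sketch of the atomic encodings listed above. Two details are assumptions, since the text does not pin them down: BCD digits are packed most significant digit first, two per byte with the high nybble first, and octet-stream lengths reuse the key code's variable-length form. The function names are illustrative, not part of any published bao library.

```python
import struct


def encode_int64(n: int) -> bytes:
    """Signed int64: eight bytes, least significant byte first."""
    return struct.pack("<q", n)


def encode_keycode(n: int) -> bytes:
    """Key code: 7 value bits per byte, most significant group first.
    Every byte but the last has its top bit set; a clear top bit ends it."""
    if not 0 <= n < 1 << 63:
        raise ValueError("key code must fit in nine 7-bit groups")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_keycode(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position just past the key code)."""
    value = 0
    for i in range(9):
        b = data[pos + i]
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:
            return value, pos + i + 1
    raise ValueError("key code longer than nine bytes")


def encode_bigint(n: int) -> bytes:
    """BCD digits (ASSUMED most significant first, high nybble first), then
    terminator E (positive) or F (negative); a terminator landing in the
    first nybble of a byte is followed by D."""
    nybbles = [int(d) for d in str(abs(n))] + [0xF if n < 0 else 0xE]
    if len(nybbles) % 2:
        nybbles.append(0xD)
    return bytes((hi << 4) | lo for hi, lo in zip(nybbles[0::2], nybbles[1::2]))


def decode_bigint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position just past the byte holding the terminator)."""
    magnitude = 0
    while True:
        byte = data[pos]
        pos += 1
        for nyb in (byte >> 4, byte & 0x0F):
            if nyb in (0xE, 0xF):
                return (-magnitude if nyb == 0xF else magnitude), pos
            if nyb > 9:
                raise ValueError("invalid BCD digit")
            magnitude = magnitude * 10 + nyb


def encode_ratio(numerator: int, denominator: int) -> bytes:
    """Two arbitrary length integers, numerator first."""
    return encode_bigint(numerator) + encode_bigint(denominator)


def encode_octets(data: bytes) -> bytes:
    """Length (ASSUMED to use the key code form) followed by the raw bytes."""
    return encode_keycode(len(data)) + data


def encode_string(s: str) -> bytes:
    """Encoded exactly like an octet stream; str.encode("utf-8") raises on
    text it cannot encode, so the payload is always valid UTF-8."""
    return encode_octets(s.encode("utf-8"))
```

For example, under these assumptions 123 encodes as `12 3E`, while 12 encodes as `12 ED` because its terminator falls in the first nybble of the second byte and is padded with D.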
When bao objects are stored, the elements are sorted by the RiceyInt representation of the key value. The order of bao arrays is preserved, not sorted. This means that the bao binary representation of an object is canonical, and will always be the same for any valid representation of the object. Json representations of bao are often in a very different order, sorted by key name instead of key code value, but when they are translated back to bao, the original bao order is restored because the elements are sorted by key value rather than by name.
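The structural rules (a count, then the elements; object elements sorted by key; array elements left in order) can be sketched in Python. To stay short and self-contained, the sketch assumes atomic values arrive already encoded as bytes, that counts use the same variable-length form as key codes, and that "sorted by the RiceyInt representation" means byte-wise order of each encoded key. Python dicts keyed by integer key codes and lists stand in for bao objects and arrays; none of these names come from a real bao library.

```python
def encode_count(n: int) -> bytes:
    """Variable-length integer: 7 bits per byte, most significant group first,
    top bit set on every byte except the last (the key code form)."""
    if n < 0:
        raise ValueError("counts and key codes are non-negative")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def encode_value(value) -> bytes:
    """Encode pre-encoded atomic bytes, a nested object (dict) or an array (list)."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)  # atomic value, already encoded by its type's rule
    if isinstance(value, dict):
        return encode_object(value)
    if isinstance(value, list):
        if any(isinstance(v, list) for v in value):
            raise ValueError("bao arrays may not directly contain arrays")
        # Array order is preserved: count first, then each element as given.
        return encode_count(len(value)) + b"".join(encode_value(v) for v in value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def encode_object(obj: dict) -> bytes:
    """Element count, then key + value pairs sorted by each key's encoded bytes."""
    elements = sorted(obj.items(), key=lambda kv: encode_count(kv[0]))
    out = encode_count(len(elements))
    for key, value in elements:
        out += encode_count(key) + encode_value(value)
    return out


def encode_message(key: int, value) -> bytes:
    """The outermost key, followed by the encoding of its value."""
    return encode_count(key) + encode_value(value)
```

Two dicts with the same contents but different insertion order encode to identical bytes, which is the canonical property described above. One consequence of sorting by encoded bytes rather than by numeric value: key 16384 encodes as `81 80 00` and key 16383 as `FF 7F`, so 16384 sorts first. If numeric order is what the protocol intends, sorting on `kv[0]` instead would give it.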