# GM Serialization

Serialization helpers for the Gajumaru.

For an overview of the static serializer, see [this document](doc/static.md).

## Build

    $ rebar3 compile

## Test

    $ rebar3 eunit

## Dynamic encoding

The module `gmser_dyn` offers dynamic encoding support, encoding most 'regular'
Erlang data types into an internal RLP representation.

Main API:
* `encode(term()) -> iolist()`
* `encode_typed(template(), term()) -> iolist()`
* `decode(iolist()) -> term()`

* `serialize(term()) -> binary()`
* `serialize_typed(template(), term()) -> binary()`
* `deserialize(binary()) -> term()`

In the examples below, we use the `encode` functions to illustrate
how the type information is represented. The fully serialized form is
produced by the `serialize` functions.

The basic types supported by the encoder are:
* `integer()` (`anyint`, code: 246)
* `neg_integer()` (`negint`, code: 247)
* `non_neg_integer()` (`int`, code: 248)
* `binary()` (`binary`, code: 249)
* `boolean()` (`bool`, code: 250)
* `list()` (`list`, code: 251)
* `map()` (`map`, code: 252)
* `tuple()` (`tuple`, code: 253)
* `gmser_id:id()` (`id`, code: 254)
* `atom()` (`label`, code: 255)

(The range of codes is chosen because the `gmser_chain_objects` codes
range from 10 to 200, and also to stay within 1 byte.)

When encoding `map` types, the map elements are first sorted.

When specifying a map type for template-driven encoding, use
the `#{items => [{Key, ValueType} | {opt, Key, ValueType}]}` construct.
The key names are included in the encoding, and are matched against the item
specs during decoding. If the key names don't match, decoding fails, except
for an `{opt, K, V}` item, in which case that item spec is skipped.

```erlang
T = #{items => [{a,int},{opt,b,int},{c,int}]}
E1 = gmser_dyn:encode_typed(T, #{a => 1, b => 2, c => 3}) ->
  [<<0>>,<<1>>,[<<252>>,
   [[[<<255>>,<<97>>],[<<248>>,<<1>>]],
    [[<<255>>,<<98>>],[<<248>>,<<2>>]],
    [[<<255>>,<<99>>],[<<248>>,<<3>>]]]]]
E2 = gmser_dyn:encode_typed(T, #{a => 1, c => 3}) ->
  [<<0>>,<<1>>,[<<252>>,
   [[[<<255>>,<<97>>],[<<248>>,<<1>>]],
    [[<<255>>,<<99>>],[<<248>>,<<3>>]]]]]
gmser_dyn:decode_typed(T,E2) ->
  #{c => 3,a => 1}
```
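The `serialize` functions listed under the main API can be exercised with a round trip. The following is a minimal sketch, assuming the `gmser_dyn` module from this repository is on the code path and that a map encoded with an `items` template can be decoded dynamically; the module name `gmser_dyn_roundtrip` and its function are hypothetical and only serve the illustration.

```erlang
-module(gmser_dyn_roundtrip).
-export([roundtrip/2]).

%% Hypothetical helper: serialize Map against the template T and decode
%% the resulting binary again without a template.
%% Uses only serialize_typed/2 and deserialize/1 from the API list above.
roundtrip(T, Map) ->
    Bin = gmser_dyn:serialize_typed(T, Map),
    true = is_binary(Bin),          % serialize_* returns a binary
    gmser_dyn:deserialize(Bin).
```

With `T = #{items => [{a,int},{opt,b,int},{c,int}]}`, the call `gmser_dyn_roundtrip:roundtrip(T, #{a => 1, c => 3})` is expected to return the original map.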
## Labels

Labels correspond to (existing) atoms in Erlang.
Decoding of a label results in a call to `binary_to_existing_atom/2`, so it will
fail if the corresponding atom does not already exist.

This behavior can be modified using the option `#{missing_labels => fail | create | convert}`,
where `fail` is the default, as described above, `convert` means that missing atoms are
converted to binaries, and `create` means that the atom is created dynamically.

The option can be passed, for example, as:
```erlang
gmser_dyn:deserialize(Binary, gmser_dyn:set_opts(#{missing_labels => convert}))
```

or
```erlang
gmser_dyn:deserialize(Binary, gmser_dyn:set_opts(#{missing_labels => convert}, Types))
```

By calling `gmser_dyn:register_types/1`, after having added options to the type map,
the options can be made to take effect automatically.

It's possible to cache labels for more compact encoding.
Note that when caching labels, the same cache mapping needs to be used on the
decoder side.

Labels are encoded as `[<<255>>, << AtomToBinary/binary >>]`.
If a cached label is used, the encoding becomes `[<<255>>, [Ix]]`, where
`Ix` is the integer-encoded index value of the cached label.

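The three `missing_labels` modes described earlier in this section can be mimicked with plain Erlang. This is only a sketch of the described behavior, not the `gmser_dyn` implementation; the module `label_modes` and the error term `{missing_label, Bin}` are hypothetical.

```erlang
-module(label_modes).
-export([to_label/2]).

%% Hypothetical helper, not part of gmser_dyn: resolve a label name
%% according to one of the modes fail | convert | create.
to_label(Bin, Mode) when is_binary(Bin) ->
    try binary_to_existing_atom(Bin, utf8)
    catch
        error:badarg ->
            %% The atom does not exist yet in this VM.
            case Mode of
                fail    -> error({missing_label, Bin});
                convert -> Bin;                        % keep the binary
                create  -> binary_to_atom(Bin, utf8)   % grows the atom table
            end
    end.
```

The `create` mode deserves care with untrusted input, since atoms are never garbage collected.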
## Examples

Dynamically encoded objects have the basic structure `[<<0>>,V,Obj]`, where `V` is the
integer-coded version, and `Obj` is the top-level encoding of the form `[Tag,Data]`.

```erlang
E = fun(T) -> io:fwrite("~w~n", [gmser_dyn:encode(T)]) end.

E(17) -> [<<0>>,<<1>>,[<<248>>,<<17>>]]
E(<<"abc">>) -> [<<0>>,<<1>>,[<<249>>,<<97,98,99>>]]
E(true) -> [<<0>>,<<1>>,[<<250>>,<<1>>]]
E(false) -> [<<0>>,<<1>>,[<<250>>,<<0>>]]
E([1,2]) -> [<<0>>,<<1>>,[<<251>>,[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]
E({1,2}) -> [<<0>>,<<1>>,[<<253>>,[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]
E(#{a=>1, b=>2}) ->
  [<<0>>,<<1>>,[<<252>>,[[[<<255>>,<<97>>],[<<248>>,<<1>>]],[[<<255>>,<<98>>],[<<248>>,<<2>>]]]]]
E(gmser_id:create(account,<<1:256>>)) ->
  [<<0>>,<<1>>,[<<254>>,<<1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1>>]]
```
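The outer `[<<0>>,V,[Tag,Data]]` shape can be taken apart with ordinary pattern matching. The sketch below is an illustration only: the module `dyn_shape` is hypothetical, it uses the type codes listed in this document, and it does not handle custom codes, where the tag is itself a list.

```erlang
-module(dyn_shape).
-export([describe/1, type_name/1]).

%% Split the outer structure of a dynamically encoded term.
%% Only plain one-byte tags are handled here.
describe([<<0>>, Vsn, [Tag, Data]]) when is_binary(Vsn), is_binary(Tag) ->
    #{version => binary:decode_unsigned(Vsn),
      type    => type_name(binary:decode_unsigned(Tag)),
      data    => Data}.

%% Codes as listed under "Dynamic encoding".
type_name(246) -> anyint;
type_name(247) -> negint;
type_name(248) -> int;
type_name(249) -> binary;
type_name(250) -> bool;
type_name(251) -> list;
type_name(252) -> map;
type_name(253) -> tuple;
type_name(254) -> id;
type_name(255) -> label.
```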
Note that tuples and lists are encoded the same way, except for the initial type tag.
Maps are encoded as `[<<252>>, [KV1, KV2, ...]]`, where `[KV1, KV2, ...]` is the sorted
list of key-value tuples from `maps:to_list(Map)`, but with the `tuple` type tag omitted.

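Sorting the key-value pairs makes the encoding independent of the order in which a map was built. A minimal sketch of that step, assuming the sort follows standard Erlang term order (the module `map_sort` is hypothetical and not part of this library):

```erlang
-module(map_sort).
-export([sorted_pairs/1]).

%% Key-value pairs of a map in standard Erlang term order.
sorted_pairs(Map) when is_map(Map) ->
    lists:sort(maps:to_list(Map)).
```

For example, `map_sort:sorted_pairs(#{b => 2, a => 1})` returns `[{a,1},{b,2}]`, which matches the order of the pairs in the `E(#{a=>1, b=>2})` output shown earlier.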
## Template-driven encoding

Templates can be provided to the encoder by either naming an already registered
type, or by passing a template directly. In both cases, the encoder will enforce
the type information in the template.

If the template has been registered, the encoder uses the registered type specification
to drive the encoding. The code of the registered template is embedded in the encoded
output:

```erlang
gmser_dyn:encode_typed({int,int,int}, {1,2,3}) ->
  [<<0>>,<<1>>,[<<253>>,
   [[<<248>>,<<1>>],[<<248>>,<<2>>],[<<248>>,<<3>>]]]]

Types = gmser_dyn_types:add_type(t3,1013,{int,int,int}).
gmser_dyn:encode_typed(t3, {1,2,3}, Types) ->
  [<<0>>,<<1>>,[[<<3,245>>,<<253>>],
   [[<<248>>,<<1>>],[<<248>>,<<2>>],[<<248>>,<<3>>]]]]
```

Note that the original `<<253>>` type code is wrapped as `[<<3,245>>,<<253>>]`,
where `<<3,245>>` corresponds to the custom code `1013`.

Using the default option `#{strict => true}`, the decoder will extract the custom
type spec, and validate the encoded data against it. If no type is registered for
the custom code on the decoding side, the decoder aborts. Using `#{strict => false}`,
the registered type is used if it exists; otherwise, the custom code is ignored, and
the encoded data is decoded using the dynamic type info, without validation.

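The byte pair `<<3,245>>` shown for the custom code `1013` is what a minimal big-endian unsigned encoding of that integer produces. The sketch below only illustrates that arithmetic with the standard `binary` module; that the library encodes codes exactly this way is an assumption consistent with the example, and the module `code_bytes` is hypothetical.

```erlang
-module(code_bytes).
-export([code_to_bin/1, bin_to_code/1]).

%% Minimal big-endian unsigned encoding of a type code.
code_to_bin(Code) when is_integer(Code), Code >= 0 ->
    binary:encode_unsigned(Code).

%% Inverse: read a type code back from its bytes.
bin_to_code(Bin) when is_binary(Bin) ->
    binary:decode_unsigned(Bin).
```

Here `code_bytes:code_to_bin(1013)` gives `<<3,245>>`, since 3 * 256 + 245 = 1013, while a built-in code such as `253` still fits in the single byte `<<253>>`.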
### Alternative types

The dynamic encoder supports a few additions to the `gmserialization` template
language: `any`, `#{list => Type}`, `#{alt => [AltTypes]}` and `#{switch => #{Tag => Type}}`.

#### `any`

The `any` type doesn't have an associated code, but enforces dynamic encoding.

#### `list`

The original list type notation expects a key-value list, e.g.

`[{name, binary}, {age, int}]`

```erlang
EL = gmser_dyn:encode_typed([{name,binary},{age,int}], [{name,<<"Ulf">>},{age,29}]) ->
  [<<0>>,<<1>>,[<<251>>,
   [[<<253>>,[[<<255>>,<<110,97,109,101>>],[<<249>>,<<85,108,102>>]]],
    [<<253>>,[[<<255>>,<<97,103,101>>],[<<248>>,<<29>>]]]]]]
```
Note that the encoding explicitly lays out a `[{Key, Value}]` structure, all
dynamically typed. This means it can be dynamically decoded without templates.

```erlang
gmser_dyn:decode(EL).
[{name,<<"Ulf">>},{age,29}]
```

In order to specify something like Erlang's `[integer()]` type, we can use
the following:

```erlang
gmser_dyn:encode_typed(#{list => int}, [1,2,3,4]) ->
  [<<0>>,<<1>>,[<<251>>,
   [[<<248>>,<<1>>],[<<248>>,<<2>>],[<<248>>,<<3>>],[<<248>>,<<4>>]]]]
```

#### `alt`

The `#{alt => [Type]}` construct also enforces dynamic encoding, and will try
to encode as each type in the list, in the specified order, until one matches.

```erlang
gmser_dyn:encode_typed(#{alt => [negint,int]}, -5) -> [<<0>>,<<1>>,[<<247>>,<<5>>]]
gmser_dyn:encode_typed(#{alt => [negint,int]}, 5) -> [<<0>>,<<1>>,[<<248>>,<<5>>]]

gmser_dyn:encode_typed(anyint,-5) -> [<<0>>,<<1>>,[<<246>>,[<<247>>,<<5>>]]]
gmser_dyn:encode_typed(anyint,5) -> [<<0>>,<<1>>,[<<246>>,[<<248>>,<<5>>]]]
```

#### `switch`

The `switch` type allows for encoding a 'tagged' object, where the tag determines
the type.

```erlang
E1 = gmser_dyn:encode_typed(#{switch => #{name => binary, age => int}}, #{age => 29}) ->
  [<<0>>,<<1>>,[<<252>>,[[[<<255>>,<<97,103,101>>],[<<248>>,<<29>>]]]]]
gmser_dyn:decode_typed(#{switch => #{name => binary, age => int}}, E1) ->
  #{age => 29}
E2 = gmser_dyn:encode_typed(#{switch => #{name => binary, age => int}}, #{name => <<"Ulf">>}) ->
  [<<0>>,<<1>>,[<<252>>,[[[<<255>>,<<110,97,109,101>>],[<<249>>,<<85,108,102>>]]]]]
gmser_dyn:decode_typed(#{switch => #{name => binary, age => int}}, E2) ->
  #{name => <<"Ulf">>}
```

A practical use of `switch` would be in a protocol schema:

```erlang
t_msg(_) ->
    #{switch => #{ call => t_call
                 , reply => t_reply
                 , notification => t_notification }}.

t_call(_) ->
    #{items => [ {id, anyint}
               , {req, t_req} ]}.

t_reply(_) ->
    #{alt => [#{items => [ {id, anyint}
                         , {result, t_result} ]},
              #{items => [ {id, anyint}
                         , {code, anyint}
                         , {message, binary} ]}
             ]}.
```

In this scenario, messages are 'tagged' as 1-element maps, e.g.:

```erlang
async_request(Msg) ->
    Id = erlang:unique_integer(),
    gmmp_cp:to_server(
      whereis(gmmp_core_connector),
      #{call => #{ id => Id
                 , req => Msg }}),
    Id.
```
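On the receiving side, a decoded 1-element map can be dispatched on its tag with plain pattern matching. The following is a hedged sketch based on the schema above; the module `msg_dispatch` and the shapes of its return tuples are hypothetical and not part of any Gajumaru library.

```erlang
-module(msg_dispatch).
-export([classify/1]).

%% Classify a decoded message by its single tag key.
classify(Msg) when is_map(Msg), map_size(Msg) =:= 1 ->
    [{Tag, Body}] = maps:to_list(Msg),
    classify(Tag, Body).

%% The two reply clauses mirror the two alternatives in t_reply/1.
classify(call, #{id := Id, req := Req}) ->
    {call, Id, Req};
classify(reply, #{id := Id, result := Res}) ->
    {ok, Id, Res};
classify(reply, #{id := Id, code := Code, message := Text}) ->
    {error, Id, Code, Text};
classify(notification, Body) ->
    {notification, Body}.
```

A message with more than one key, or with an unknown tag, fails with a `function_clause` error, which keeps malformed input from being silently accepted.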
### Notes

Note that `anyint` is a standard type. The static serializer supports only
non-negative integers (`int`), as negative numbers are forbidden on-chain.
For dynamic encoding e.g. in messaging protocols, handling negative numbers can
be useful, so the `negint` type was added as a dynamic type. A full-range
integer can be encoded with `anyint`, or with the `alt` construct, e.g.
`#{alt => [negint,int]}`.

(Floats are not supported, as they are non-deterministic. Rationals and fixed-point
numbers could easily be handled as high-level types, e.g. as `{int,int}`.)

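As an illustration of the `{int,int}` idea, the sketch below reduces a rational to lowest terms and converts a fixed-point value into such a pair. It is one possible convention and not something the library defines; the module `ratio` and the `{Mantissa, Scale}` representation are assumptions made for the example.

```erlang
-module(ratio).
-export([rational/2, fixed_to_rational/1]).

%% Reduce N/D to lowest terms. Both parts stay non-negative integers,
%% so the pair fits an {int,int} template.
rational(N, D) when is_integer(N), N >= 0, is_integer(D), D > 0 ->
    G = gcd(N, D),
    {N div G, D div G}.

%% {Mantissa, Scale} means Mantissa * 10^-Scale, e.g. {1250, 3} is 1.250.
fixed_to_rational({Mantissa, Scale}) when is_integer(Scale), Scale >= 0 ->
    rational(Mantissa, pow10(Scale)).

gcd(A, 0) -> A;
gcd(A, B) -> gcd(B, A rem B).

pow10(0) -> 1;
pow10(N) when N > 0 -> 10 * pow10(N - 1).
```

The resulting pair could then be passed to `gmser_dyn:encode_typed({int,int}, ratio:rational(6, 4))`, using the tuple template form shown under template-driven encoding.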