# GM Serialization

Serialization helpers for the Gajumaru.

For an overview of the static serializer, see [this document](doc/static.md).

## Build

    $ rebar3 compile

## Test

    $ rebar3 eunit

## Dynamic encoding

The module `gmser_dyn` offers dynamic encoding support, encoding most 'regular'
Erlang data types into an internal RLP representation.

Main API:
* `encode(term()) -> iolist()`
* `encode_typed(template(), term()) -> iolist()`
* `decode(iolist()) -> term()`

* `serialize(term()) -> binary()`
* `serialize_typed(template(), term()) -> binary()`
* `deserialize(binary()) -> term()`

In the examples below, we use the `encode` and `decode` functions (rather than
`serialize` and `deserialize`) to illustrate how the type information is
represented. The fully serialized form is produced by the `serialize` functions.

The basic types supported by the encoder are:
* `integer()` (`anyint`, code: 246)
* `neg_integer()` (`negint`, code: 247)
* `non_neg_integer()` (`int`, code: 248)
* `binary()` (`binary`, code: 249)
* `boolean()` (`bool`, code: 250)
* `list()` (`list`, code: 251)
* `map()` (`map`, code: 252)
* `tuple()` (`tuple`, code: 253)
* `gmser_id:id()` (`id`, code: 254)
* `atom()` (`label`, code: 255)

(The codes occupy the range 246 to 255 because the `gmser_chain_objects` codes
range from 10 to 200, and because each code should fit within 1 byte.)

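As a quick reference, the type list above can be written as a lookup function. This is only a sketch for illustration: the module name `dyn_codes_sketch` and the functions `code/1` and `tag/1` are hypothetical and not part of `gmser_dyn`; only the name-to-code pairs are taken from the list.

```erlang
-module(dyn_codes_sketch).
-export([code/1, tag/1]).

%% Type name -> numeric code, copied from the list of basic types.
code(anyint) -> 246;
code(negint) -> 247;
code(int)    -> 248;
code(binary) -> 249;
code(bool)   -> 250;
code(list)   -> 251;
code(map)    -> 252;
code(tuple)  -> 253;
code(id)     -> 254;
code(label)  -> 255.

%% The tag as it appears in the encoded examples: the code as a 1-byte binary.
tag(Type) ->
    <<(code(Type))>>.
```

For example, `dyn_codes_sketch:tag(int)` yields `<<248>>`, the tag seen in front of non-negative integers in the encoded examples.
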
When encoding `map` types, the map elements are first sorted.

When specifying a map type for template-driven encoding, use
the `#{items => [{Key, ValueType} | {opt, Key, ValueType}]}` construct.
The key names are included in the encoding, and are matched against the item
specs during decoding. If a key name doesn't match, decoding fails, unless
the item spec is an `{opt, K, V}` item, in which case that item spec is skipped.

```erlang
T = #{items => [{a,int},{opt,b,int},{c,int}]}
E1 = gmser_dyn:encode_typed(T, #{a => 1, b => 2, c => 3}) ->
  [<<0>>,<<1>>,[<<252>>,
    [[[<<255>>,<<97>>],[<<248>>,<<1>>]],
     [[<<255>>,<<98>>],[<<248>>,<<2>>]],
     [[<<255>>,<<99>>],[<<248>>,<<3>>]]]]]
E2 = gmser_dyn:encode_typed(T, #{a => 1, c => 3}) ->
  [<<0>>,<<1>>,[<<252>>,
    [[[<<255>>,<<97>>],[<<248>>,<<1>>]],
     [[<<255>>,<<99>>],[<<248>>,<<3>>]]]]]
gmser_dyn:decode_typed(T,E2) ->
  #{c => 3,a => 1}
```

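The matching rule for item specs can be sketched in a few lines of plain Erlang. This is an illustration of the rule as described above, not the actual `gmser_dyn` implementation: the module `items_sketch` and the function `match_items/2` are hypothetical, and the sketch assumes the key-value pairs arrive in the same order as the item specs.

```erlang
-module(items_sketch).
-export([match_items/2]).

%% Specs: [{Key, Type} | {opt, Key, Type}]
%% KVs:   [{Key, Value}] in encoded order.
%% Types are ignored here; only the key matching is illustrated.
match_items([{K, _Type} | Specs], [{K, V} | KVs]) ->
    %% Mandatory item whose key matches.
    [{K, V} | match_items(Specs, KVs)];
match_items([{opt, K, _Type} | Specs], [{K, V} | KVs]) ->
    %% Optional item that is present.
    [{K, V} | match_items(Specs, KVs)];
match_items([{opt, _K, _Type} | Specs], KVs) ->
    %% Optional item that is absent: skip the spec, keep the data.
    match_items(Specs, KVs);
match_items([], []) ->
    [];
match_items(_Specs, _KVs) ->
    %% Mandatory key missing, or unexpected extra data.
    error(badarg).
```

With the specs from `T` above, `items_sketch:match_items([{a,int},{opt,b,int},{c,int}], [{a,1},{c,3}])` returns `[{a,1},{c,3}]`, while leaving out the mandatory `a` raises `badarg`.
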
## Labels

Labels correspond to (existing) atoms in Erlang.
Decoding a label results in a call to `binary_to_existing_atom/2`, so decoding
fails if the corresponding atom does not already exist.

This behavior can be modified using the option `#{missing_labels => fail | create | convert}`,
where `fail` is the default, as described above, `convert` means that labels whose atoms
are missing are returned as binaries, and `create` means that the atom is created dynamically.

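The three policies can be sketched with standard BIFs. This is a minimal illustration, not the library's code: the module `label_sketch` and the function `to_label/2` are hypothetical, and the `utf8` encoding argument is an assumption.

```erlang
-module(label_sketch).
-export([to_label/2]).

%% Policy is one of fail | create | convert, mirroring `missing_labels`.
to_label(Bin, fail) ->
    %% Raises badarg if the atom does not already exist.
    binary_to_existing_atom(Bin, utf8);
to_label(Bin, create) ->
    %% Creates the atom if needed.
    binary_to_atom(Bin, utf8);
to_label(Bin, convert) ->
    %% Falls back to the raw binary when the atom is unknown.
    try binary_to_existing_atom(Bin, utf8)
    catch error:badarg -> Bin
    end.
```

Note that `create` grows the atom table, which is never garbage collected, so it is best reserved for trusted input.
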
The option can be passed, for example, as:
```erlang
gmser_dyn:deserialize(Binary, set_opts(#{missing_labels => convert}))
```

or
```erlang
gmser_dyn:deserialize(Binary, set_opts(#{missing_labels => convert}, Types))
```

By calling `gmser_dyn:register_types/1`, after having added options to the type map,
the options can be made to take effect automatically.

It's possible to cache labels for more compact encoding.
Note that when caching labels, the same cache mapping needs to be used on the
decoder side.

Labels are encoded as `[<<255>>, << AtomToBinary/binary >>]`.
If a cached label is used, the encoding becomes `[<<255>>, [Ix]]`, where
`Ix` is the integer-encoded index value of the cached label.

## Examples

Dynamically encoded objects have the basic structure `[<<0>>,V,Obj]`, where `V` is the
integer-coded version, and `Obj` is the top-level encoding of the form `[Tag,Data]`.

```erlang
E = fun(T) -> io:fwrite("~w~n", [gmser_dyn:encode(T)]) end.

E(17) -> [<<0>>,<<1>>,[<<248>>,<<17>>]]
E(<<"abc">>) -> [<<0>>,<<1>>,[<<249>>,<<97,98,99>>]]
E(true) -> [<<0>>,<<1>>,[<<250>>,<<1>>]]
E(false) -> [<<0>>,<<1>>,[<<250>>,<<0>>]]
E([1,2]) -> [<<0>>,<<1>>,[<<251>>,[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]
E({1,2}) -> [<<0>>,<<1>>,[<<253>>,[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]
E(#{a=>1, b=>2}) ->
  [<<0>>,<<1>>,[<<252>>,[[[<<255>>,<<97>>],[<<248>>,<<1>>]],[[<<255>>,<<98>>],[<<248>>,<<2>>]]]]]
E(gmser_id:create(account,<<1:256>>)) ->
  [<<0>>,<<1>>,[<<254>>,<<1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1>>]]
```

Note that tuples and lists are encoded the same way, except for the initial type tag.
Maps are encoded as `[<Map>, [KV1, KV2, ...]]`, where `<Map>` is the `map` type tag
(`<<252>>`) and `[KV1, KV2, ...]` is the sorted list of key-value tuples from
`maps:to_list(Map)`, but with the `tuple` type tag omitted.

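To make the `[<<0>>,V,[Tag,Data]]` structure concrete, the sketch below takes such an envelope apart and turns the version and tag binaries back into integers. It is only an illustration: the module `envelope_sketch` and the function `unpack/1` are hypothetical, and the sketch assumes that both `V` and `Tag` are plain big-endian unsigned integers, as the examples suggest.

```erlang
-module(envelope_sketch).
-export([unpack/1]).

%% Split a dynamically encoded object into version, type code and payload.
%% The payload is returned as-is, without further decoding.
unpack([<<0>>, Vsn, [Tag, Data]]) ->
    #{vsn  => binary:decode_unsigned(Vsn),
      code => binary:decode_unsigned(Tag),
      data => Data}.
```

Applied to the encoding of `17` above, `envelope_sketch:unpack([<<0>>,<<1>>,[<<248>>,<<17>>]])` gives version `1`, code `248` (`int`) and the payload `<<17>>`. A two-byte tag such as `<<3,232>>` comes out as code `1000`.
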
## Template-driven encoding

Templates can be provided to the encoder by either naming an already registered
type, or by passing a template directly. In both cases, the encoder will enforce
the type information in the template.

If the template has been registered, the encoder omits inner type tags (still
inserting the top-level tag), leading to some compression of the output.
This also means that the serialized term cannot be decoded without the same
schema information on the decoder side.

In some cases, the type tags will still be emitted. This happens when alternative types
appear, and for enumerated map types (`#{items => ...}`). In the latter case, it is
due to the support for optional items.

In the case of a directly provided template, all type information is inserted,
such that the serialized term can be decoded without any added type information.
The template types are still enforced during encoding.

```erlang
ET = fun(Type,Term) -> io:fwrite("~w~n", [gmser_dyn:encode_typed(Type,Term)]) end.

ET([{int,int}], [{1,2}]) -> [<<0>>,<<1>>,[<<251>>,[[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]]

gmser_dyn:register_type(1000,lt2i,[{int,int}]).
ET(lt2i, [{1,2}]) -> [<<0>>,<<1>>,[<<3,232>>,[[<<1>>,<<2>>]]]]
```

### Alternative types

The dynamic encoder supports three additions to the `gmserialization` template
language: `any`, `#{alt => [AltTypes]}` and `#{switch => [AltTypes]}`.

#### `any`

The `any` type doesn't have an associated code, but enforces dynamic encoding.

#### `alt`

The `#{alt => [Type]}` construct also enforces dynamic encoding, and will try
to encode as each type in the list, in the specified order, until one matches.

```erlang
gmser_dyn:encode_typed(#{alt => [negint,int]}, -5) -> [<<0>>,<<1>>,[<<247>>,<<5>>]]
gmser_dyn:encode_typed(#{alt => [negint,int]}, 5) -> [<<0>>,<<1>>,[<<248>>,<<5>>]]

gmser_dyn:encode_typed(anyint,-5) -> [<<0>>,<<1>>,[<<246>>,[<<247>>,<<5>>]]]
gmser_dyn:encode_typed(anyint,5) -> [<<0>>,<<1>>,[<<246>>,[<<248>>,<<5>>]]]
```

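The "first type that matches wins" rule of `alt` can be sketched independently of the encoder. The module `alt_sketch` and its functions are hypothetical, and the membership tests cover only the two integer types used in the example above.

```erlang
-module(alt_sketch).
-export([pick/2]).

%% Membership tests for the two integer types used in the `alt` example.
matches(negint, X) -> is_integer(X) andalso X < 0;
matches(int, X)    -> is_integer(X) andalso X >= 0;
matches(_Type, _X) -> false.

%% Return the first type in the list that the term satisfies,
%% trying the types in the order given.
pick([Type | Rest], Term) ->
    case matches(Type, Term) of
        true  -> {ok, Type};
        false -> pick(Rest, Term)
    end;
pick([], _Term) ->
    error.
```

Here `alt_sketch:pick([negint,int], -5)` returns `{ok, negint}` and `alt_sketch:pick([negint,int], 5)` returns `{ok, int}`, which corresponds to the tags `<<247>>` and `<<248>>` in the encoded output.
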
#### `switch`

The `switch` type allows for encoding a 'tagged' object, where the tag determines
the type.

```erlang
E1 = gmser_dyn:encode_typed(#{switch => #{name => binary, age => int}}, #{age => 29}) ->
  [<<0>>,<<1>>,[<<252>>,[[[<<255>>,<<97,103,101>>],[<<248>>,<<29>>]]]]]
gmser_dyn:decode_typed(#{switch => #{name => binary, age => int}}, E1) ->
  #{age => 29}
E2 = gmser_dyn:encode_typed(#{switch => #{name => binary, age => int}}, #{name => <<"Ulf">>}) ->
  [<<0>>,<<1>>,[<<252>>,[[[<<255>>,<<110,97,109,101>>],[<<249>>,<<85,108,102>>]]]]]
gmser_dyn:decode_typed(#{switch => #{name => binary, age => int}}, E2) ->
  #{name => <<"Ulf">>}
```

A practical use of `switch` would be in a protocol schema:

```erlang
t_msg(_) ->
    #{switch => #{ call => t_call
                 , reply => t_reply
                 , notification => t_notification }}.

t_call(_) ->
    #{items => [ {id, anyint}
               , {req, t_req} ]}.

t_reply(_) ->
    #{alt => [#{items => [ {id, anyint}
                         , {result, t_result} ]},
              #{items => [ {id, anyint}
                         , {code, anyint}
                         , {message, binary} ]}
             ]}.
```

In this scenario, messages are 'tagged' as 1-element maps, e.g.:

```erlang
async_request(Msg) ->
    Id = erlang:unique_integer(),
    gmmp_cp:to_server(
      whereis(gmmp_core_connector),
      #{call => #{ id => Id
                 , req => Msg }}),
    Id.
```

### Notes

Note that `anyint` is a standard type. The static serializer supports only
non-negative integers (`int`), as negative numbers are forbidden on-chain.
For dynamic encoding, e.g. in messaging protocols, handling negative numbers can
be useful, so the `negint` type was added as a dynamic type. To encode a full-range
integer, the `alt` construct is needed.

(Floats are not supported, as they are non-deterministic. Rationals and fixed-point
numbers could easily be handled as high-level types, e.g. as `{int,int}`.)