# GM Serialization

Serialization helpers for the Gajumaru.

For an overview of the static serializer, see [this document](doc/static.md).

## Build

    $ rebar3 compile

## Test

    $ rebar3 eunit

## Dynamic encoding

The module `gmser_dyn` offers dynamic encoding support, encoding most 'regular'
Erlang data types into an internal RLP representation.

Main API:
* `encode(term()) -> iolist()`
* `encode_typed(template(), term()) -> iolist()`
* `decode(iolist()) -> term()`

* `serialize(term()) -> binary()`
* `serialize_typed(template(), term()) -> binary()`
* `deserialize(binary()) -> term()`

In the examples below, we use the `encode` functions to illustrate
how the type information is represented. The fully serialized form is
produced by the `serialize` functions.

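
As a quick illustration of the `serialize`/`deserialize` pair listed above, the following shell sketch round-trips a term through the binary form. It assumes the `gmserialization` application is compiled and on the code path, and that deserializing returns a term equal to the input; the sample term is made up for illustration.

```erlang
%% Shell sketch (assumes gmser_dyn is on the code path).
Term = #{a => 1, b => [<<"x">>, true]},   % hypothetical sample term
Bin  = gmser_dyn:serialize(Term),         % fully serialized form
true = is_binary(Bin),                    % serialize/1 is specified to return a binary
Term = gmser_dyn:deserialize(Bin).        % assumed to round-trip to the same term
```
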
The basic types supported by the encoder are:
* `integer()` (`anyint`, code: 246)
* `neg_integer()` (`negint`, code: 247)
* `non_neg_integer()` (`int`, code: 248)
* `binary()` (`binary`, code: 249)
* `boolean()` (`bool`, code: 250)
* `list()` (`list`, code: 251)
* `map()` (`map`, code: 252)
* `tuple()` (`tuple`, code: 253)
* `gmser_id:id()` (`id`, code: 254)
* `atom()` (`label`, code: 255)

(This range of codes was chosen to stay clear of the `gmser_chain_objects` codes,
which range from 10 to 200, while still fitting within 1 byte.)

When encoding `map` types, the map elements are first sorted.

When specifying a map type for template-driven encoding, use
the `#{items => [{Key, Value}]}` construct.

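
The sorting step for maps can be pictured with the standard library alone. This is only a sketch of the idea, assuming the sort is a plain `lists:sort/1` over the key-value pairs (standard Erlang term order); the module and function names are hypothetical and not part of `gmser_dyn`.

```erlang
-module(map_sort_sketch).
-export([sorted_items/1]).

%% Hypothetical helper: returns the key-value pairs of Map in standard
%% term order, so the result does not depend on how the map was built.
sorted_items(Map) when is_map(Map) ->
    lists:sort(maps:to_list(Map)).
```

For example, `map_sort_sketch:sorted_items(#{b => 2, a => 1})` returns `[{a,1},{b,2}]`.
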
## Labels

Labels correspond to (existing) atoms in Erlang.
Decoding of a label results in a call to `binary_to_existing_atom/2`, so it will
fail if the corresponding atom does not already exist.

This behavior can be modified using the option `#{missing_labels => fail | create | convert}`,
where `fail` is the default, as described above, `convert` means that labels with no
existing atom are returned as binaries, and `create` means that the atom is created dynamically.

The option can be passed e.g.:
```erlang
gmser_dyn:deserialize(Binary, set_opts(#{missing_labels => convert}))
```

or
```erlang
gmser_dyn:deserialize(Binary, set_opts(#{missing_labels => convert}, Types))
```

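
The three `missing_labels` policies can be mimicked with standard library calls. The module below is a hypothetical stand-in, not the `gmser_dyn` implementation, and it assumes labels are UTF-8 encoded; it is only meant to show how the policies differ when the atom is unknown.

```erlang
-module(label_sketch).
-export([to_label/2]).

%% Hypothetical helper mirroring the documented policies.
%% The utf8 encoding is an assumption made for this sketch.
to_label(Bin, fail) ->
    %% Raises error:badarg if no such atom exists yet.
    binary_to_existing_atom(Bin, utf8);
to_label(Bin, create) ->
    %% Always yields an atom, creating it if needed.
    binary_to_atom(Bin, utf8);
to_label(Bin, convert) ->
    %% Falls back to the binary itself if the atom is unknown.
    try binary_to_existing_atom(Bin, utf8)
    catch error:badarg -> Bin
    end.
```
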
By calling `gmser_dyn:register_types/1`, after having added options to the type map,
the options can be made to take effect automatically.

It's possible to cache labels for more compact encoding.
Note that when caching labels, the same cache mapping needs to be used on the
decoder side.

Labels are encoded as `[<<255>>, << AtomToBinary/binary >>]`.
If a cached label is used, the encoding becomes `[<<255>>, [Ix]]`, where
`Ix` is the integer-encoded index value of the cached label.

## Examples

Dynamically encoded objects have the basic structure `[<<0>>,V,Obj]`, where `V` is the
integer-coded version, and `Obj` is the top-level encoding of the form `[Tag,Data]`.

```erlang
E = fun(T) -> io:fwrite("~w~n", [gmser_dyn:encode(T)]) end.

%% Notation: `Call -> Output` shows the term printed by each call.
E(17) -> [<<0>>,<<1>>,[<<248>>,<<17>>]]
E(<<"abc">>) -> [<<0>>,<<1>>,[<<249>>,<<97,98,99>>]]
E(true) -> [<<0>>,<<1>>,[<<250>>,<<1>>]]
E(false) -> [<<0>>,<<1>>,[<<250>>,<<0>>]]
E([1,2]) -> [<<0>>,<<1>>,[<<251>>,[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]
E({1,2}) -> [<<0>>,<<1>>,[<<253>>,[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]
E(#{a=>1, b=>2}) ->
  [<<0>>,<<1>>,[<<252>>,[[[<<255>>,<<97>>],[<<248>>,<<1>>]],[[<<255>>,<<98>>],[<<248>>,<<2>>]]]]]
E(gmser_id:create(account,<<1:256>>)) ->
  [<<0>>,<<1>>,[<<254>>,<<1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1>>]]
```

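
To make the `[<<0>>,V,Obj]` layout concrete, the sketch below takes the output shown for `E(17)` as a literal and pulls it apart with pattern matching. It uses only the standard library; the variable names are chosen for illustration.

```erlang
%% Literal copied from the E(17) example above.
Encoded = [<<0>>, <<1>>, [<<248>>, <<17>>]],
[<<0>>, Vsn, [Tag, Data]] = Encoded,
1   = binary:decode_unsigned(Vsn),    % version
248 = binary:decode_unsigned(Tag),    % type code for `int`
17  = binary:decode_unsigned(Data).   % the integer itself
```
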
Note that tuples and lists are encoded the same way, except for the initial type tag.
Maps are encoded as `[<Map>, [KV1, KV2, ...]]`, where `<Map>` is the `map` type tag and
`[KV1, KV2, ...]` is the sorted list of key-value tuples from `maps:to_list(Map)`, but
with the `tuple` type tag omitted.

## Template-driven encoding

Templates can be provided to the encoder either by naming an already registered
type, or by passing a template directly. In both cases, the encoder will enforce
the type information in the template.

If the template has been registered, the encoder omits inner type tags (still
inserting the top-level tag), leading to some compression of the output.
This also means that the serialized term cannot be decoded without the same
schema information on the decoder side.

In the case of a directly provided template, all type information is inserted,
such that the serialized term can be decoded without any added type information.
The template types are still enforced during encoding.

```erlang
ET = fun(Type,Term) -> io:fwrite("~w~n", [gmser_dyn:encode_typed(Type,Term)]) end.

%% Template passed directly: inner type tags are kept.
ET([{int,int}], [{1,2}]) -> [<<0>>,<<1>>,[<<251>>,[[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]]

%% Registered template: the top-level tag is the type code (1000 = <<3,232>>).
gmser_dyn:register_type(1000,lt2i,[{int,int}]).
ET(lt2i, [{1,2}]) -> [<<0>>,<<1>>,[<<3,232>>,[[<<1>>,<<2>>]]]]
```

### Alternative types

The dynamic encoder supports two additions to the `gmserialization` template
language: `any` and `#{alt => [AltTypes]}`.

The `any` type doesn't have an associated code, but enforces dynamic encoding.

The `#{alt => [Type]}` construct also enforces dynamic encoding, and will try
to encode as each type in the list, in the specified order, until one matches.

```erlang
gmser_dyn:encode_typed(#{alt => [negint,int]}, -5) -> [<<0>>,<<1>>,[<<247>>,<<5>>]]
gmser_dyn:encode_typed(#{alt => [negint,int]}, 5) -> [<<0>>,<<1>>,[<<248>>,<<5>>]]

gmser_dyn:encode_typed(anyint,-5) -> [<<0>>,<<1>>,[<<246>>,[<<247>>,<<5>>]]]
gmser_dyn:encode_typed(anyint,5) -> [<<0>>,<<1>>,[<<246>>,[<<248>>,<<5>>]]]
```

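
The "first matching type wins" rule of the `alt` construct can be sketched without the library. The module below is a hypothetical stand-in, not the `gmser_dyn` implementation: it checks a value against a few of the basic type names listed earlier and returns the first one that fits.

```erlang
-module(alt_sketch).
-export([first_match/2]).

%% Simplified type checks for a few basic type names (illustrative only).
matches(int, X)    -> is_integer(X) andalso X >= 0;
matches(negint, X) -> is_integer(X) andalso X < 0;
matches(binary, X) -> is_binary(X);
matches(bool, X)   -> is_boolean(X);
matches(_, _)      -> false.

%% Try each type in list order; return the first that accepts the value.
first_match([Type | Rest], Value) ->
    case matches(Type, Value) of
        true  -> {ok, Type};
        false -> first_match(Rest, Value)
    end;
first_match([], _Value) ->
    error.
```

With this sketch, `alt_sketch:first_match([negint,int], -5)` returns `{ok,negint}` and `alt_sketch:first_match([negint,int], 5)` returns `{ok,int}`, which corresponds to the type codes 247 and 248 respectively.
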
### Notes

Note that `anyint` is a standard type. The static serializer supports only
non-negative integers (`int`), as negative numbers are forbidden on-chain.
For dynamic encoding, e.g. in messaging protocols, handling negative numbers can
be useful, so the `negint` type was added as a dynamic type. To encode a full-range
integer, the `alt` construct is needed.

(Floats are not supported, as they are non-deterministic. Rationals and fixed-point
numbers could easily be handled as high-level types, e.g. as `{int,int}`.)

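
As one possible reading of the `{int,int}` suggestion, a non-negative rational can be kept as a normalized numerator/denominator pair, so that equal values always produce the same pair. The module below is a hypothetical sketch using only the standard library; it is not part of `gmserialization`.

```erlang
-module(rational_sketch).
-export([new/2]).

%% Euclid's algorithm for non-negative integers.
gcd(A, 0) -> A;
gcd(A, B) -> gcd(B, A rem B).

%% Build a normalized {Numerator, Denominator} pair of non-negative
%% integers, reduced to lowest terms.
new(Num, Den) when is_integer(Num), Num >= 0, is_integer(Den), Den > 0 ->
    G = gcd(Num, Den),
    {Num div G, Den div G}.
```

For example, `rational_sketch:new(6, 8)` returns `{3,4}`. Such a pair could then be encoded with the `{int,int}` template mentioned above.
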