gmserialization/README.md
Ulf Wiger dd1c2455f0
All checks were successful
Gajumaru Serialization Tests / tests (push) Successful in 48m53s
Fix type-driven encode, more docs
2025-04-05 21:44:36 +02:00

115 lines
3.9 KiB
Markdown

GM Serialization
=====
Serialization helpers for the Gajumaru.
Build
-----
$ rebar3 compile
Test
----
$ rebar3 eunit
Dynamic encoding
----
The module `gmser_dyn` offers dynamic encoding support, encoding most 'regular'
Erlang data types into an internal RLP representation.
Main API:
* `encode(term()) -> iolist()`
* `encode_typed(template(), term()) -> iolist()`
* `decode(iolist()) -> term()`
* `serialize(term()) -> binary()`
* `serialize_typed(template(), term()) -> binary()`
* `deserialize(binary()) -> term()`
In the examples below, we use the `decode` functions, to illustrate
how the type information is represented. The fully serialized form is
produced by the `serialize` functions.
The basic types supported by the encoder are:
* `non_neg_integer()` (`int` , code: 248)
* `binary()` (`binary`, code: 249)
* `boolean()` (`bool` , code: 250)
* `list()` (`list` , code: 251)
* `map()` (`map` , code: 252)
* `tuple()` (`tuple` , code: 253)
* `gmser_id:id()` (`id` , code: 254)
* `atom()` (`label` , code: 255)
When encoding `map` types, the map elements are first sorted.
When specifying a map type for template-driven encoding, use
the `#{items => [{Key, Value}]}` construct.
Labels
----
Labels correspond to (existing) atoms in Erlang.
Decoding of a label results in a call to `binary_to_existing_atom/2`, so will
fail if the corresponding atom does not already exist.
It's possible to cache labels for more compact encoding.
Note that when caching labels, the same cache mapping needs to be used on the
decoder side.
Labels are encoded as `[<<255>>, << AtomToBinary/binary >>]`.
If a cached label is used, the encoding becomes `[<<255>, [Ix]]`, where
`Ix` is the integer-encoded index value of the cached label.
Examples
----
Dynamically encoded objects have the basic structure `[<<0>>,V,Obj]`, where `V` is the
integer-coded version, and `Obj` is the top-level encoding on the form `[Tag,Data]`.
```erlang
E = fun(T) -> io:fwrite("~w~n", [gmser_dyn:encode(T)]) end.
E(17) -> [<<0>>,<<1>>,[<<248>>,<<17>>]]
E(<<"abc">>) -> [<<0>>,<<1>>,[<<249>>,<<97,98,99>>]]
E(true) -> [<<0>>,<<1>>,[<<250>>,<<1>>]]
E(false) -> [<<0>>,<<1>>,[<<250>>,<<0>>]]
E([1,2]) -> [<<0>>,<<1>>,[<<251>>,[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]
E({1,2}) -> [<<0>>,<<1>>,[<<253>>,[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]
E(#{a=>1, b=>2}) ->
[<<0>>,<<1>>,[<<252>>,[[[<<255>>,<<97>>],[<<248>>,<<1>>]],[[<<255>>,<<98>>],[<<248>>,<<2>>]]]]]
E(gmser_id:create(account,<<1:256>>)) ->
[<<0>>,<<1>>,[<<254>>,<<1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1>>]]
```
Note that tuples and list are encoded the same way, except for the initial type tag.
Maps are encoded as `[<Map>, [KV1, KV2, ...]]`, where `[KV1, KV2, ...]` is the sorted
list of key-value tuples from `map:to_list(Map)`, but with the `tuple` type tag omitted.
Template-driven encoding
----
Templates can be provided to the encoder by either naming an already registered
type, or by passing a template directly. In both cases, the encoder will enforce
the type information in the template.
If the template has been registered, the encoder omits inner type tags (still
inserting the top-level tag), leading to some compression of the output.
This also means that the serialized term cannot be decoded without the same
schema information on the decoder side.
In the case of a directly provided template, all type information is inserted,
such that the serialized term can be decoded without any added type information.
The template types are still enforced during encoding.
```erlang
ET = fun(Type,Term) -> io:fwrite("~w~n", [gmser_dyn:encode_typed(Type,Term)]) end.
ET([{int,int}], [{1,2}]) -> [<<0>>,<<1>>,[<<251>>,[[[<<248>>,<<1>>],[<<248>>,<<2>>]]]]]
gmser_dyn:register_type(1000,lt2i,[{int,int}]).
ET(lt2i, [{1,2}]) -> [<<0>>,<<1>>,[<<3,232>>,[[<<1>>,<<2>>]]]]
```