API Encoding (`xy_ABCD` strings)

tl;dr

%% DECODE: pull the binary data out of an xy_ABCD string
%% deconstructing and pulling useful data out of the resultant bytestring is a
%% different task (see Protocol page)
%% https://git.qpq.swiss/QPQ-AG/gmserialization/src/commit/dda5cac7a91ba0ad105815ec3cb1de551a04d719/src/gmser_api_encoder.erl#L69-L82
{KnownType, Payload} = gmser_api_encoder:decode(ApiStr)

%% ENCODE: from actual data to API garbage
%% https://git.qpq.swiss/QPQ-AG/gmserialization/src/commit/dda5cac7a91ba0ad105815ec3cb1de551a04d719/src/gmser_api_encoder.erl#L57-L67
ApiStr = gmser_api_encoder:encode(KnownType, Payload)

%% ApiStr    :: binary()
%% Payload   :: binary()
%% KnownType :: known_type()

%% https://git.qpq.swiss/QPQ-AG/gmserialization/src/commit/dda5cac7a91ba0ad105815ec3cb1de551a04d719/src/gmser_api_encoder.erl#L18-L46
-type known_type() :: key_block_hash
                    | micro_block_hash
                    | block_pof_hash
                    | block_tx_hash
                    | block_state_hash
                    | block_witness_hash
                    | channel
                    | contract_bytearray
                    | contract_pubkey
                    | contract_store_key
                    | contract_store_value
                    | contract_source
                    | transaction
                    | tx_hash
                    | account_pubkey
                    | account_seckey
                    | associate_chain
                    | entry
                    | signature
                    | name
                    | native_token
                    | commitment
                    | peer_pubkey
                    | state
                    | poi
                    | state_trees
                    | call_state_tree
                    | mp_tree_hash
                    | bytearray.

Introduction

When you are interacting with Gajumaru you often encounter garbage strings like these:

cb_OwQELwGfAKAgjs50MOABi6flmiNru6qg/5U9bjKymkvlywP7RtPnWASG88h7
th_2H8EreT7LNw43jEG9yL7yvSw3AHbG2TTimMgVTYVFwqh6ucSeV
ak_2CNR6NcNj5cFUa28wmVNyptiadtqcsqhG8qoQpPwULfyLiFHD6

These are called "API strings" or "API encoding" in official lingo. Suppose you have a string xy_ABCD.

The xy prefix (cb, th, etc) indicates what sort of data is contained in the rest of the string, and whether it is Base64 or Base58. (See BaseN).
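
Pulling the prefix off is the easy part. A minimal sketch, assuming the first underscore is always the separator (neither Base58 nor standard Base64 uses an underscore in its alphabet); the module name api_str_split is made up for illustration and is not part of gmserialization:

```erlang
-module(api_str_split).
-export([split/1]).

%% Split an API string such as <<"ak_2CNR...">> into its prefix and its
%% still-encoded payload. Only the first underscore counts as the separator.
-spec split(binary()) ->
    {ok, Prefix :: binary(), Encoded :: binary()} | {error, no_prefix}.
split(ApiStr) when is_binary(ApiStr) ->
    case binary:split(ApiStr, <<"_">>) of
        [Prefix, Encoded] when byte_size(Prefix) > 0 -> {ok, Prefix, Encoded};
        _ -> {error, no_prefix}
    end.
```

This only separates the two halves. Mapping the prefix to a known_type() and picking Base58 or Base64 is what gmser_api_encoder:decode/1 does for you.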

The ABCD part is binary data (plus 4 check bytes) encoded in either Base64 or Base58. Generally, anything that is both fixed-length and likely to be input manually (e.g. public keys) is going to be Base58, else it will be Base64.

The check bytes are the first 4 bytes of the double SHA-256 hash of the data:
```
add_check_bytes(Bin) when is_binary(Bin) ->
    <<CheckBytes:4/bytes, _/binary>> = crypto:hash(sha256, crypto:hash(sha256, Bin)),
    <<Bin/binary, CheckBytes/binary>>.
```
When you decode the ABCD stuff, it decodes to the binary data suffixed by the 4 check bytes (i.e. the output of the add_check_bytes/1 function above).
Sometimes the binary data is plain data (e.g. account public keys). Sometimes it's compound data (e.g. a transaction).
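
Going the other direction means peeling off the last 4 bytes and recomputing the hash. A minimal sketch of that round trip; the module name api_check and the function strip_check_bytes/1 are made-up names for illustration, and the input is assumed to be already Base58- or Base64-decoded:

```erlang
-module(api_check).
-export([add_check_bytes/1, strip_check_bytes/1]).

%% First 4 bytes of sha256(sha256(Bin)).
check_bytes(Bin) ->
    <<CheckBytes:4/bytes, _/binary>> = crypto:hash(sha256, crypto:hash(sha256, Bin)),
    CheckBytes.

%% Same behaviour as the add_check_bytes/1 shown above.
add_check_bytes(Bin) when is_binary(Bin) ->
    <<Bin/binary, (check_bytes(Bin))/binary>>.

%% Inverse: split off the trailing 4 bytes and verify them.
-spec strip_check_bytes(binary()) ->
    {ok, binary()} | {error, bad_checksum | too_short}.
strip_check_bytes(Checked) when is_binary(Checked), byte_size(Checked) >= 4 ->
    DataSize = byte_size(Checked) - 4,
    <<Data:DataSize/bytes, CheckBytes:4/bytes>> = Checked,
    case check_bytes(Data) =:= CheckBytes of
        true  -> {ok, Data};
        false -> {error, bad_checksum}
    end;
strip_check_bytes(_) ->
    {error, too_short}.
```

Returning a tagged tuple instead of crashing on a bad checksum is a design choice of the sketch, since API strings typed in by hand are exactly the ones likely to be wrong.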

Example: Decoding Public Keys

For instance, account public keys are encoded in Base58 and have the prefix ak_.

To decode one, extract the raw bytes of the public key, and verify the check bytes, I wrote this (a clause from a command-line tool, invoked as gw akd below):

do(["akd", AkStr]) ->
    "ak_" ++ Base58Shit = AkStr,
    CheckedBytes = gw_b58:dec(Base58Shit),
    ShaSha4 =
        fun(Bytes) ->
            <<CheckBytes:4/bytes, _/binary>> = crypto:hash(sha256, crypto:hash(sha256, Bytes)),
            CheckBytes
        end,
    <<DataBytes:(byte_size(CheckedBytes) - 4)/bytes,
      CheckBytes:4/bytes>> = CheckedBytes,
    io:format("~p~n", [DataBytes]),
    case ShaSha4(DataBytes) =:= CheckBytes of
        true  -> io:format("checksum: passed~n", []);
        false -> io:format("checksum: failed~n", [])
    end;

[~] % gw akd ak_A3aMregStEULMXyPzNXWEfq1u75yM7BaQ5k8qVhCCpcvCr9Rx
<<20,137,82,130,217,195,19,25,115,137,60,225,221,88,168,194,156,8,88,244,17,30,
  121,7,114,180,61,27,194,44,94,166>>
checksum: passed

Compound Data (e.g. transactions)

If it's compound data (i.e. it has fields), then the binary data you get out of the decode process is going to be encoded in RLP. (See RLP)

RLP-decode is going to give you a list-of-lists-of-...-of-binaries.

RLP data you get from Gajumaru is going to be of the format:

[Tag, Version | Fields]

The Tag and Version are going to be binaries that you need to interpret as integers.

The tag and version tell you what the format of the fields is going to be.

These formats are documented in Serialization.
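
Reading the header off an already RLP-decoded list can be sketched as below. This assumes the tag and version binaries are big-endian unsigned integers (the usual RLP convention, but treat it as an assumption here); the module name rlp_header is made up for illustration, and the RLP decoding itself is not shown:

```erlang
-module(rlp_header).
-export([tag_and_version/1]).

%% Takes the list produced by RLP-decoding and returns the tag and version
%% as integers, plus the remaining, still undecoded fields.
-spec tag_and_version(list()) ->
    {Tag :: non_neg_integer(), Version :: non_neg_integer(), Fields :: list()}.
tag_and_version([TagBin, VsnBin | Fields])
  when is_binary(TagBin), is_binary(VsnBin) ->
    {binary:decode_unsigned(TagBin), binary:decode_unsigned(VsnBin), Fields}.
```

For example, tag_and_version([<<12>>, <<1>>, <<"a">>]) gives {12, 1, [<<"a">>]}. The values 12 and 1 are placeholders here, not a claim about any particular object type.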

The Representation Problem

What is conceptually one piece of data has 4 different representations that you have to juggle between in your code. I call this The Representation Problem. This is the hardest practical problem to deal with as a developer using Gajumaru.

There is no real consistency or convention about when you use one representation over another.

For instance, the public key I showed you above. When you encode it as an ak_... string, you use just the 32-byte public key.

But when you are encoding that same key as a field to use in a spend transaction, you need to encode it as <<1:8, Pubkey/binary>>.

Some functions you call from Erlang code take the public key as an argument. There is no consistency about whether they want the ak_... string or if they want the 32 bytes, etc.

Sometimes the function you call wants some weird data structure. Like the 1:8 thing above corresponds to a convention about what sort of object the public key points to (because you can spend to a normal account, to a contract, to a name, etc; the number in the first byte indicates which type this is). There was some function I called the other day which needed that information, but instead of sending it the 33-byte augmented public key, it wanted a tuple that was something like {{type, account_pubkey}, {key, Bytes}}.
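
To make the 1:8 thing concrete, here is a small sketch of wrapping and unwrapping a tagged key. The only tag value taken from this page is 1 for accounts; the module and function names are hypothetical, and the tags for contracts, names, etc. are deliberately left out because they are not given here:

```erlang
-module(id_bytes).
-export([account_id/1, split_id/1]).

%% Wrap a raw 32-byte account public key as the 33-byte form used in
%% transaction fields: one type byte (1 = account) followed by the key.
-spec account_id(<<_:256>>) -> <<_:264>>.
account_id(<<Pubkey:32/bytes>>) ->
    <<1:8, Pubkey/binary>>.

%% Split a 33-byte tagged key back into its type byte and raw key.
-spec split_id(<<_:264>>) -> {Tag :: byte(), Key :: <<_:256>>}.
split_id(<<Tag:8, Key:32/bytes>>) ->
    {Tag, Key}.
```

Both functions crash with function_clause if the sizes are off, which is usually the fastest way to find out that you handed over the wrong representation.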

There is no consistency across different modules about which Erlang atoms are used. Sometimes it's account_pubkey, sometimes it's account. It's hell. In practice you have to just look at the source code of the function you're calling and see what it expects.

You also can't expect error messages that tell you what mistake you made and where, particularly when you're interacting with a node over HTTP (generally, HTTP wants API-encoded versions of things).

The Representation Problem is a (moderate) annoyance to excellent programmers. The worst thing it can do is nerdsnipe you and foolishly make you think the problem has some elegant solution and then you spend months trying to solve it before you realize that there is no solution because the problem itself is wrong.

This problem absolutely cripples average programmers trying to use Gajumaru. It's an open question whether that's a good thing (average programmers probably shouldn't be anywhere near code that handles people's money), but that question is out of scope.

There are many layers to the onion. Also there are multiple different onion schemas that sometimes collapse down to the same inner onion and sometimes you have to traverse down one onion and up another and also the onion hates you and it's rotten and poisonous.

All I can really do here is give you a field guide to how to deal with this problem in practice. The best practical solution in general is to quit trying to make sense of computers and give up on this whole programming thing and become a beekeeper instead.