RLP: draft
parent
09f6a49111
commit
e52ef2072e
281
RLP.md
281
RLP.md
@ -0,0 +1,281 @@
|
|||||||
|
# RLP: Ethereum Recursive-Length Prefix Codec
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## Quick Reference
|
||||||
|
|
||||||
|
1. Erlang implementation:
|
||||||
|
<https://github.com/aeternity/Vanillae/blob/d5fc02c7e6314b6d32ba914f9bc9ea90bb86c7dd/utils/vw/src/vrlp.erl>
|
||||||
|
|
||||||
|
2. TypeScript implementation:
|
||||||
|
<https://github.com/aeternity/Vanillae/blob/d5fc02c7e6314b6d32ba914f9bc9ea90bb86c7dd/bindings/typescript/src/rlp.ts>
|
||||||
|
|
||||||
|
3. Ethereum docs:
|
||||||
|
<https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/>
|
||||||
|
|
||||||
|
## RLP: overview
|
||||||
|
|
||||||
|
RLP is a standard for taking arbitrary-depth
|
||||||
|
lists-of-lists-of-...-of-binaries and encoding them as bytestrings.
|
||||||
|
|
||||||
|
We start by defining what the decoded data is:
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
-type decoded_data() :: binary() | [decoded_data()].
|
||||||
|
```
|
||||||
|
|
||||||
|
## RLP: decode
|
||||||
|
|
||||||
|
Easier to start with the unfamiliar thing and get back out the familiar
|
||||||
|
thing.
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
-spec decode(RLP) -> {Data, Rest}
|
||||||
|
when RLP :: binary(),
|
||||||
|
Data :: decoded_data(),
|
||||||
|
Rest :: binary().
|
||||||
|
```
|
||||||
|
|
||||||
|
So we decode however much data we can, hand back the decoded data plus
|
||||||
|
whatever data we didn't consume.
|
||||||
|
|
||||||
|
In the encoded (binary) data, there's a 1-byte prefix that indicates
|
||||||
|
both
|
||||||
|
|
||||||
|
1. what type of data it is (list or binary)
|
||||||
|
|
||||||
|
2. how long the payload is (as in how long it encodes to in bytes)
|
||||||
|
|
||||||
|
This is our first branch point.
|
||||||
|
|
||||||
|
Suppose we are decoding a bytestring and the first byte is $B$. Remember
|
||||||
|
a byte is an integer $0 \le B \le 255$.
|
||||||
|
|
||||||
|
If $0 \le B \le 191$, we are decoding a byte array. If
|
||||||
|
$191 \le B \le 255$, we are decoding a list. The number $B$ encodes how
|
||||||
|
long the payload is.
|
||||||
|
|
||||||
|
Specifically:
|
||||||
|
|
||||||
|
- If $0 \le B \le 127$, then the decoded data is literally just the
|
||||||
|
byte $B$ (really the bytestring `<<B>>`).
|
||||||
|
|
||||||
|
This case is for single-byte bytestrings where the byte is between
|
||||||
|
$0$ and $127$.
|
||||||
|
|
||||||
|
In the Erlang context, this case is kind of dumb and unnecessary
|
||||||
|
because it could be covered by the next case. But RLP was developed
|
||||||
|
in Ethereum, which mostly uses Go/Python/JS.
|
||||||
|
|
||||||
|
A technically correct interpretation of the Ethereum protocol would
|
||||||
|
have `-type decoded_data() :: 0..127 | binary() | [decoded_data()].`
|
||||||
|
|
||||||
|
There's really no purpose to this case in an Erlang context. It
|
||||||
|
introduces tons of unnecessary complexity for code that interacts
|
||||||
|
with our RLP decode procedure. Converting between binaries and
|
||||||
|
bigints is a big nothing in Erlang. Call `binary:decode_unsigned/1`
|
||||||
|
or `binary:encode_unsigned/2`. In other languages it's a giant
|
||||||
|
chore.
|
||||||
|
|
||||||
|
Put another way: pretending the decoded data is either binaries or
|
||||||
|
LoLs effects no behavioral change in our library or in the calling
|
||||||
|
code, and only serves to make RLP easier to deal with. But
|
||||||
|
definitionally it is incorrect.
|
||||||
|
|
||||||
|
I wrote this example RLP code in 2022 (date of writing is 2025) and
|
||||||
|
have used it in production this entire time, and never once has this
|
||||||
|
issue come up. I did not even notice that I had written something
|
||||||
|
technically incorrect until I went to write out this explainer and
|
||||||
|
was wondering why there were two separate cases that covered
|
||||||
|
single-byte strings.
|
||||||
|
|
||||||
|
I'm still debating to myself whether or not it's a good idea to go
|
||||||
|
back and "correct" my code. Probably no. "If it's a bug that people
|
||||||
|
rely on, it's not a bug, it's a feature." -- Linus Torvalds.
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
decode(<<Byte, Rest/binary>>) when Byte =< 127 ->
|
||||||
|
{<<Byte>>, Rest};
|
||||||
|
```
|
||||||
|
|
||||||
|
- If $128 \le B \le 183$, then the decoded data is a bytestring of
|
||||||
|
length $L$, where $L := B - 128$. Consequently, $0 \le L \le 55$.
|
||||||
|
|
||||||
|
The bytestring (target decoded data) is the following $L$ bytes.
|
||||||
|
|
||||||
|
This case exists for short bytestrings.
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
% the length is Byte - 128
|
||||||
|
decode(<<Byte, Rest/binary>>) when Byte =< 183 ->
|
||||||
|
PayloadByteLength = Byte - 128,
|
||||||
|
%PayloadBitLength = 8 * PayloadByteLength,
|
||||||
|
%io:format("Byte : ~p~n"
|
||||||
|
% "Rest : ~w~n"
|
||||||
|
% "PayloadByteLength : ~p~n",
|
||||||
|
% %"PayloadBitLength : ~p~n",
|
||||||
|
% [Byte, Rest, PayloadByteLength]),
|
||||||
|
<<Payload:PayloadByteLength/binary,
|
||||||
|
Rest2/binary>> = Rest,
|
||||||
|
{Payload, Rest2};
|
||||||
|
```
|
||||||
|
|
||||||
|
- Suppose $184 \le B \le 191$. This is a trickier case.
|
||||||
|
|
||||||
|
This case is for long bytestrings where the payload is $56$ bytes or
|
||||||
|
longer.
|
||||||
|
|
||||||
|
Let $M := B - 183$ ($M$ for meta-length). Note $1 \le M \le 8$.
|
||||||
|
|
||||||
|
The following $M$ bytes encode an integer $56 \le L \le 2^{64} - 1$.
|
||||||
|
|
||||||
|
The $L$ bytes after that correspond to the byte array (target
|
||||||
|
decoded data).
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
% bytestring. The byte length of the byte length of bytestring is FirstByte -
|
||||||
|
% 183. Then pull out the actual data
|
||||||
|
decode(<<Byte, Rest/binary>>) when Byte =< 191 ->
|
||||||
|
ByteLengthOfByteLength = Byte - 183,
|
||||||
|
BitLengthOfByteLength = 8 * ByteLengthOfByteLength,
|
||||||
|
<<ByteLengthInt:BitLengthOfByteLength,
|
||||||
|
Rest2/binary>> = Rest,
|
||||||
|
<<Payload:ByteLengthInt/binary,
|
||||||
|
Rest3/binary>> = Rest2,
|
||||||
|
{Payload, Rest3};
|
||||||
|
```
|
||||||
|
|
||||||
|
- Suppose $192 \le B \le 247$.
|
||||||
|
|
||||||
|
This case is for short lists where the encoded payload is between
|
||||||
|
$0$ and $55$ bytes.
|
||||||
|
|
||||||
|
Let $L = B - 192$. Then $0 \le L \le 55$.
|
||||||
|
|
||||||
|
$L$ is the byte length of the encoded list payload.
|
||||||
|
|
||||||
|
We'll talk separately about how to decode list payloads. Short
|
||||||
|
version is you use this `decode/1` function we're writing to decode
|
||||||
|
individual items. You call it repeatedly on the remainder `Rest`
|
||||||
|
until it's empty.
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
% length of the list-payload is FirstByte - 192. Then the list payload, which
|
||||||
|
% needs to be decoded on its own.
|
||||||
|
decode(<<Byte, Rest/binary>>) when Byte =< 247 ->
|
||||||
|
ByteLengthOfListPayload = Byte - 192,
|
||||||
|
<<ListPayload:ByteLengthOfListPayload/binary,
|
||||||
|
Rest2/binary>> = Rest,
|
||||||
|
List = decode_list(ListPayload),
|
||||||
|
{List, Rest2};
|
||||||
|
```
|
||||||
|
|
||||||
|
- Suppose $248 \le B \le 255$.
|
||||||
|
|
||||||
|
This case is for long lists where the encoded list payload is $56$
|
||||||
|
bytes or longer.
|
||||||
|
|
||||||
|
We have the same deal as above where $B$ corresponds to the byte
|
||||||
|
length of an integer, which is the length of the list payload.
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
% The byte length of the byte length of the list-payload is FirstByte - 247.
|
||||||
|
% Then the byte length of the list. Then the list payload, which needs to be
|
||||||
|
% decoded on its own.
|
||||||
|
decode(<<Byte, Rest/binary>>) ->
|
||||||
|
ByteLengthOfByteLengthOfListPayload_int = Byte - 247,
|
||||||
|
BitLengthOfByteLengthOfListPayload_int = 8 * ByteLengthOfByteLengthOfListPayload_int,
|
||||||
|
<<ByteLengthOfListPayload_int:BitLengthOfByteLengthOfListPayload_int,
|
||||||
|
Rest2/binary>> = Rest,
|
||||||
|
<<ListPayload_bytes:ByteLengthOfListPayload_int/binary,
|
||||||
|
Rest3/binary>> = Rest2,
|
||||||
|
List = decode_list(ListPayload_bytes),
|
||||||
|
{List, Rest3}.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Decoding lists
|
||||||
|
|
||||||
|
Keep decoding items until you run out of payload.
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
decode_list(<<>>) ->
|
||||||
|
[];
|
||||||
|
decode_list(Bytes) ->
|
||||||
|
{Item, Rest} = decode(Bytes),
|
||||||
|
[Item | decode_list(Rest)].
|
||||||
|
```
|
||||||
|
|
||||||
|
## RLP encode procedure
|
||||||
|
|
||||||
|
Now that we understand the decode procedure, the encode procedure should
|
||||||
|
be pretty obvious.
|
||||||
|
|
||||||
|
If it's a binary, we form a prefix depending on its length. If it's a
|
||||||
|
list, we encode the list items individually, then form a prefix
|
||||||
|
depending on the byte-length of all the encoded items concatenated together.
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
-spec encode(Data) -> RLP
|
||||||
|
when Data :: decoded_data(),
|
||||||
|
RLP :: binary().
|
||||||
|
|
||||||
|
encode(Binary) when is_binary(Binary) ->
|
||||||
|
encode_binary(Binary);
|
||||||
|
encode(List) when is_list(List) ->
|
||||||
|
encode_list(List).
|
||||||
|
```
|
||||||
|
|
||||||
|
### Encoding binary data
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
-spec encode_binary(Bytes) -> RLP
|
||||||
|
when Bytes :: binary(),
|
||||||
|
RLP :: binary().
|
||||||
|
|
||||||
|
% single byte case when the byte is between 0..127
|
||||||
|
% result is the byte itself
|
||||||
|
encode_binary(<<Byte>>) when Byte =< 127 ->
|
||||||
|
<<Byte>>;
|
||||||
|
% if the bytestring is 0..55 items long, the first byte is 128 + Length,
|
||||||
|
% the rest of the string is the string
|
||||||
|
encode_binary(Bytes) when byte_size(Bytes) =< 55 ->
|
||||||
|
Size = byte_size(Bytes),
|
||||||
|
<<(128 + Size), Bytes/binary>>;
|
||||||
|
% more than 55 bytes long, first byte is 183 + ByteLengthOfLength
|
||||||
|
% max byte size is 2^64 - 1
|
||||||
|
encode_binary(Bytes) when 55 < byte_size(Bytes), byte_size(Bytes) < (1 bsl 64) ->
|
||||||
|
SizeInt = byte_size(Bytes),
|
||||||
|
SizeBytes = binary:encode_unsigned(SizeInt, big),
|
||||||
|
SizeOfSizeInt = byte_size(SizeBytes),
|
||||||
|
%% 183 = 128 + 55
|
||||||
|
%% SizeOfSizeInt > 0
|
||||||
|
<<(183 + SizeOfSizeInt),
|
||||||
|
SizeBytes/binary,
|
||||||
|
Bytes/binary>>.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Encoding lists
|
||||||
|
|
||||||
|
```erlang
|
||||||
|
-spec encode_list(List) -> RLP
|
||||||
|
when List :: [decoded_data()],
|
||||||
|
RLP :: binary().
|
||||||
|
|
||||||
|
% first we encode the total payload of the list
|
||||||
|
% depending on how long it is, we then branch
|
||||||
|
encode_list(List) ->
|
||||||
|
Payload = << (encode(Item)) || Item <- List>>,
|
||||||
|
Payload_Size = byte_size(Payload),
|
||||||
|
if
|
||||||
|
Payload_Size =< 55 ->
|
||||||
|
<<(192 + Payload_Size), Payload/binary>>;
|
||||||
|
55 < Payload_Size ->
|
||||||
|
SizeBytes = binary:encode_unsigned(Payload_Size, big),
|
||||||
|
SizeOfSizeInt = byte_size(SizeBytes),
|
||||||
|
%% 247 = 192 + 55
|
||||||
|
%% SizeOfSizeInt > 0
|
||||||
|
<<(247 + SizeOfSizeInt),
|
||||||
|
SizeBytes/binary,
|
||||||
|
Payload/binary>>
|
||||||
|
end.
|
||||||
|
```
|
BIN
uploads/rlp-thumb-original-1024x576.png
Normal file
BIN
uploads/rlp-thumb-original-1024x576.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 109 KiB |
Loading…
x
Reference in New Issue
Block a user