RLP: draft

Peter Harpending 2025-03-23 16:20:59 -07:00
parent 09f6a49111
commit e52ef2072e
2 changed files with 281 additions and 0 deletions

281
RLP.md

@ -0,0 +1,281 @@
# RLP: Ethereum Recursive-Length Prefix Codec
![](./uploads/rlp-thumb-original-1024x576.png)
## Quick Reference
1. Erlang implementation:
<https://github.com/aeternity/Vanillae/blob/d5fc02c7e6314b6d32ba914f9bc9ea90bb86c7dd/utils/vw/src/vrlp.erl>
2. TypeScript implementation:
<https://github.com/aeternity/Vanillae/blob/d5fc02c7e6314b6d32ba914f9bc9ea90bb86c7dd/bindings/typescript/src/rlp.ts>
3. Ethereum docs:
<https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/>
## RLP: overview
RLP is a standard for taking arbitrary-depth
lists-of-lists-of-...-of-binaries and encoding them as bytestrings.
We start by defining what the decoded data is:
```erlang
-type decoded_data() :: binary() | [decoded_data()].
```
## RLP: decode
Easier to start with the unfamiliar thing and get back out the familiar
thing.
```erlang
-spec decode(RLP) -> {Data, Rest}
when RLP :: binary(),
Data :: decoded_data(),
Rest :: binary().
```
So we decode however much data we can, hand back the decoded data plus
whatever data we didn't consume.
In the encoded (binary) data, there's a 1-byte prefix that indicates
both
1. what type of data it is (list or binary)
2. how long the payload is (as in how long it encodes to in bytes)
This is our first branch point.
Suppose we are decoding a bytestring and the first byte is $B$. Remember
a byte is an integer $0 \le B \le 255$.
If $0 \le B \le 191$, we are decoding a byte array. If
$191 \le B \le 255$, we are decoding a list. The number $B$ encodes how
long the payload is.
Specifically:
- If $0 \le B \le 127$, then the decoded data is literally just the
byte $B$ (really the bytestring `<<B>>`).
This case is for single-byte bytestrings where the byte is between
$0$ and $127$.
In the Erlang context, this case is kind of dumb and unnecessary
because it could be covered by the next case. But RLP was developed
in Ethereum, which mostly uses Go/Python/JS.
A technically correct interpretation of the Ethereum protocol would
have `-type decoded_data() :: 0..127 | binary() | [decoded_data()].`
There's really no purpose to this case in an Erlang context. It
introduces tons of unnecessary complexity for code that interacts
with our RLP decode procedure. Converting between binaries and
bigints is a big nothing in Erlang. Call `binary:decode_unsigned/1`
or `binary:encode_unsigned/2`. In other languages it's a giant
chore.
Put another way: pretending the decoded data is either binaries or
LoLs effects no behavioral change in our library or in the calling
code, and only serves to make RLP easier to deal with. But
definitionally it is incorrect.
I wrote this example RLP code in 2022 (date of writing is 2025) and
have used it in production this entire time, and never once has this
issue come up. I did not even notice that I had written something
technically incorrect until I went to write out this explainer and
was wondering why there were two separate cases that covered
single-byte strings.
I'm still debating to myself whether or not it's a good idea to go
back and "correct" my code. Probably no. "If it's a bug that people
rely on, it's not a bug, it's a feature." -- Linus Torvalds.
```erlang
decode(<<Byte, Rest/binary>>) when Byte =< 127 ->
{<<Byte>>, Rest};
```
- If $128 \le B \le 183$, then the decoded data is a bytestring of
length $L$, where $L := B - 128$. Consequently, $0 \le L \le 55$.
The bytestring (target decoded data) is the following $L$ bytes.
This case exists for short bytestrings.
```erlang
% the length is Byte - 128
decode(<<Byte, Rest/binary>>) when Byte =< 183 ->
PayloadByteLength = Byte - 128,
%PayloadBitLength = 8 * PayloadByteLength,
%io:format("Byte : ~p~n"
% "Rest : ~w~n"
% "PayloadByteLength : ~p~n",
% %"PayloadBitLength : ~p~n",
% [Byte, Rest, PayloadByteLength]),
<<Payload:PayloadByteLength/binary,
Rest2/binary>> = Rest,
{Payload, Rest2};
```
- Suppose $184 \le B \le 191$. This is a trickier case.
This case is for long bytestrings where the payload is $56$ bytes or
longer.
Let $M := B - 183$ ($M$ for meta-length). Note $1 \le M \le 8$.
The following $M$ bytes encode an integer $56 \le L \le 2^{64} - 1$.
The $L$ bytes after that correspond to the byte array (target
decoded data).
```erlang
% bytestring. The byte length of the byte length of bytestring is FirstByte -
% 183. Then pull out the actual data
decode(<<Byte, Rest/binary>>) when Byte =< 191 ->
ByteLengthOfByteLength = Byte - 183,
BitLengthOfByteLength = 8 * ByteLengthOfByteLength,
<<ByteLengthInt:BitLengthOfByteLength,
Rest2/binary>> = Rest,
<<Payload:ByteLengthInt/binary,
Rest3/binary>> = Rest2,
{Payload, Rest3};
```
- Suppose $192 \le B \le 247$.
This case is for short lists where the encoded payload is between
$0$ and $55$ bytes.
Let $L = B - 192$. Then $0 \le L \le 55$.
$L$ is the byte length of the encoded list payload.
We'll talk separately about how to decode list payloads. Short
version is you use this `decode/1` function we're writing to decode
individual items. You call it repeatedly on the remainder `Rest`
until it's empty.
```erlang
% length of the list-payload is FirstByte - 192. Then the list payload, which
% needs to be decoded on its own.
decode(<<Byte, Rest/binary>>) when Byte =< 247 ->
ByteLengthOfListPayload = Byte - 192,
<<ListPayload:ByteLengthOfListPayload/binary,
Rest2/binary>> = Rest,
List = decode_list(ListPayload),
{List, Rest2};
```
- Suppose $248 \le B \le 255$.
This case is for long lists where the encoded list payload is $56$
bytes or longer.
We have the same deal as above where $B$ corresponds to the byte
length of an integer, which is the length of the list payload.
```erlang
% The byte length of the byte length of the list-payload is FirstByte - 247.
% Then the byte length of the list. Then the list payload, which needs to be
% decoded on its own.
decode(<<Byte, Rest/binary>>) ->
ByteLengthOfByteLengthOfListPayload_int = Byte - 247,
BitLengthOfByteLengthOfListPayload_int = 8 * ByteLengthOfByteLengthOfListPayload_int,
<<ByteLengthOfListPayload_int:BitLengthOfByteLengthOfListPayload_int,
Rest2/binary>> = Rest,
<<ListPayload_bytes:ByteLengthOfListPayload_int/binary,
Rest3/binary>> = Rest2,
List = decode_list(ListPayload_bytes),
{List, Rest3}.
```
### Decoding lists
Keep decoding items until you run out of payload.
```erlang
decode_list(<<>>) ->
[];
decode_list(Bytes) ->
{Item, Rest} = decode(Bytes),
[Item | decode_list(Rest)].
```
## RLP encode procedure
Now that we understand the decode procedure, the encode procedure should
be pretty obvious.
If it's a binary, we form a prefix depending on its length. If it's a
list, we encode the list items individually, then form a prefix
depending on the byte-length of all the encoded items concatenated together.
```erlang
-spec encode(Data) -> RLP
when Data :: decoded_data(),
RLP :: binary().
encode(Binary) when is_binary(Binary) ->
encode_binary(Binary);
encode(List) when is_list(List) ->
encode_list(List).
```
### Encoding binary data
```erlang
-spec encode_binary(Bytes) -> RLP
when Bytes :: binary(),
RLP :: binary().
% single byte case when the byte is between 0..127
% result is the byte itself
encode_binary(<<Byte>>) when Byte =< 127 ->
<<Byte>>;
% if the bytestring is 0..55 items long, the first byte is 128 + Length,
% the rest of the string is the string
encode_binary(Bytes) when byte_size(Bytes) =< 55 ->
Size = byte_size(Bytes),
<<(128 + Size), Bytes/binary>>;
% more than 55 bytes long, first byte is 183 + ByteLengthOfLength
% max byte size is 2^64 - 1
encode_binary(Bytes) when 55 < byte_size(Bytes), byte_size(Bytes) < (1 bsl 64) ->
SizeInt = byte_size(Bytes),
SizeBytes = binary:encode_unsigned(SizeInt, big),
SizeOfSizeInt = byte_size(SizeBytes),
%% 183 = 128 + 55
%% SizeOfSizeInt > 0
<<(183 + SizeOfSizeInt),
SizeBytes/binary,
Bytes/binary>>.
```
### Encoding lists
```erlang
-spec encode_list(List) -> RLP
when List :: [decoded_data()],
RLP :: binary().
% first we encode the total payload of the list
% depending on how long it is, we then branch
encode_list(List) ->
Payload = << (encode(Item)) || Item <- List>>,
Payload_Size = byte_size(Payload),
if
Payload_Size =< 55 ->
<<(192 + Payload_Size), Payload/binary>>;
55 < Payload_Size ->
SizeBytes = binary:encode_unsigned(Payload_Size, big),
SizeOfSizeInt = byte_size(SizeBytes),
%% 247 = 192 + 55
%% SizeOfSizeInt > 0
<<(247 + SizeOfSizeInt),
SizeBytes/binary,
Payload/binary>>
end.
```

Binary file not shown.

After

Width:  |  Height:  |  Size: 109 KiB