Update RLP

Craig Everett 2026-01-07 16:50:13 +09:00
parent 474dacdf2d
commit 9a47d263e0

562
RLP.md

@@ -1,281 +1,281 @@
# RLP: Ethereum Recursive-Length Prefix Codec # RLP: Ethereum Recursive-Length Prefix Codec
![](./uploads/rlp-thumb-original-1024x576.png) ![](./uploads/rlp-thumb-original-1024x576.png)
## Quick Reference ## Quick Reference
1. Erlang implementation: 1. Erlang implementation:
<https://github.com/aeternity/Vanillae/blob/d5fc02c7e6314b6d32ba914f9bc9ea90bb86c7dd/utils/vw/src/vrlp.erl> <https://git.qpq.swiss/QPQ-AG/public-wiki/src/branch/master/snippets/vrlp.erl>
2. TypeScript implementation: 2. TypeScript implementation:
<https://github.com/aeternity/Vanillae/blob/d5fc02c7e6314b6d32ba914f9bc9ea90bb86c7dd/bindings/typescript/src/rlp.ts> <https://git.qpq.swiss/QPQ-AG/public-wiki/src/branch/master/snippets/rlp.ts>
3. Ethereum docs: 3. Ethereum docs:
<https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/> <https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/>
## RLP: overview ## RLP: overview
RLP is a standard for taking arbitrary-depth RLP is a standard for taking arbitrary-depth
lists-of-lists-of-...-of-binaries and encoding them as bytestrings. lists-of-lists-of-...-of-binaries and encoding them as bytestrings.
We start by defining what the decoded data is: We start by defining what the decoded data is:
```erlang ```erlang
-type decoded_data() :: binary() | [decoded_data()]. -type decoded_data() :: binary() | [decoded_data()].
``` ```
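To make the type concrete, here is a minimal sketch of a checker for it. The module name `rlp_type_sketch` and the function `is_decoded_data/1` are made up for illustration and are not part of the linked implementation. It assumes proper lists only.

```erlang
%% Hypothetical helper module, for illustration only.
-module(rlp_type_sketch).
-export([is_decoded_data/1]).
-export_type([decoded_data/0]).

-type decoded_data() :: binary() | [decoded_data()].

%% Returns true for a binary, or for a proper list whose elements all
%% fit decoded_data(). An improper list such as [a | b] is not handled
%% and crashes inside lists:all/2.
-spec is_decoded_data(term()) -> boolean().
is_decoded_data(Bin) when is_binary(Bin) ->
    true;
is_decoded_data(List) when is_list(List) ->
    lists:all(fun is_decoded_data/1, List);
is_decoded_data(_) ->
    false.
```

Note that an Erlang string like `"cat"` is a list of integers, not a binary, so it does not fit the type; `<<"cat">>` does.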
## RLP: decode ## RLP: decode
Easier to start with the unfamiliar thing and get back out the familiar Easier to start with the unfamiliar thing and get back out the familiar
thing. thing.
```erlang ```erlang
-spec decode(RLP) -> {Data, Rest} -spec decode(RLP) -> {Data, Rest}
when RLP :: binary(), when RLP :: binary(),
Data :: decoded_data(), Data :: decoded_data(),
Rest :: binary(). Rest :: binary().
``` ```
So we decode however much data we can, hand back the decoded data plus So we decode however much data we can, hand back the decoded data plus
whatever data we didn't consume. whatever data we didn't consume.
In the encoded (binary) data, there's a 1-byte prefix that indicates In the encoded (binary) data, there's a 1-byte prefix that indicates
both both
1. what type of data it is (list or binary) 1. what type of data it is (list or binary)
2. how long the payload is (as in how long it encodes to in bytes) 2. how long the payload is (as in how long it encodes to in bytes)
This is our first branch point. This is our first branch point.
Suppose we are decoding a bytestring and the first byte is $B$. Remember Suppose we are decoding a bytestring and the first byte is $B$. Remember
a byte is an integer $0 \le B \le 255$. a byte is an integer $0 \le B \le 255$.
If $0 \le B \le 191$, we are decoding a byte array. If If $0 \le B \le 191$, we are decoding a byte array. If
$192 \le B \le 255$, we are decoding a list. The number $B$ encodes how $192 \le B \le 255$, we are decoding a list. The number $B$ encodes how
long the payload is. long the payload is.
Specifically: Specifically:
- If $0 \le B \le 127$, then the decoded data is literally just the - If $0 \le B \le 127$, then the decoded data is literally just the
byte $B$ (really the bytestring `<<B>>`). byte $B$ (really the bytestring `<<B>>`).
This case is for single-byte bytestrings where the byte is between This case is for single-byte bytestrings where the byte is between
$0$ and $127$. $0$ and $127$.
In the Erlang context, this case is kind of dumb and unnecessary In the Erlang context, this case is kind of dumb and unnecessary
because it could be covered by the next case. But RLP was developed because it could be covered by the next case. But RLP was developed
in Ethereum, which mostly uses Go/Python/JS. in Ethereum, which mostly uses Go/Python/JS.
A technically correct interpretation of the Ethereum protocol would A technically correct interpretation of the Ethereum protocol would
have `-type decoded_data() :: 0..127 | binary() | [decoded_data()].` have `-type decoded_data() :: 0..127 | binary() | [decoded_data()].`
There's really no purpose to this case in an Erlang context. It There's really no purpose to this case in an Erlang context. It
introduces tons of unnecessary complexity for code that interacts introduces tons of unnecessary complexity for code that interacts
with our RLP decode procedure. Converting between binaries and with our RLP decode procedure. Converting between binaries and
bigints is a big nothing in Erlang. Call `binary:decode_unsigned/1` bigints is a big nothing in Erlang. Call `binary:decode_unsigned/1`
or `binary:encode_unsigned/2`. In other languages it's a giant or `binary:encode_unsigned/2`. In other languages it's a giant
chore. chore.
Put another way: pretending the decoded data is either binaries or Put another way: pretending the decoded data is either binaries or
LoLs effects no behavioral change in our library or in the calling LoLs effects no behavioral change in our library or in the calling
code, and only serves to make RLP easier to deal with. But code, and only serves to make RLP easier to deal with. But
definitionally it is incorrect. definitionally it is incorrect.
I wrote this example RLP code in 2022 (date of writing is 2025) and I wrote this example RLP code in 2022 (date of writing is 2025) and
have used it in production this entire time, and never once has this have used it in production this entire time, and never once has this
issue come up. I did not even notice that I had written something issue come up. I did not even notice that I had written something
technically incorrect until I went to write out this explainer and technically incorrect until I went to write out this explainer and
was wondering why there were two separate cases that covered was wondering why there were two separate cases that covered
single-byte strings. single-byte strings.
I'm still debating to myself whether or not it's a good idea to go I'm still debating to myself whether or not it's a good idea to go
back and "correct" my code. Probably no. "If it's a bug that people back and "correct" my code. Probably no. "If it's a bug that people
rely on, it's not a bug, it's a feature." -- Linus Torvalds. rely on, it's not a bug, it's a feature." -- Linus Torvalds.
```erlang ```erlang
decode(<<Byte, Rest/binary>>) when Byte =< 127 -> decode(<<Byte, Rest/binary>>) when Byte =< 127 ->
{<<Byte>>, Rest}; {<<Byte>>, Rest};
``` ```
- If $128 \le B \le 183$, then the decoded data is a bytestring of - If $128 \le B \le 183$, then the decoded data is a bytestring of
length $L$, where $L := B - 128$. Consequently, $0 \le L \le 55$. length $L$, where $L := B - 128$. Consequently, $0 \le L \le 55$.
The bytestring (target decoded data) is the following $L$ bytes. The bytestring (target decoded data) is the following $L$ bytes.
This case exists for short bytestrings. This case exists for short bytestrings.
```erlang ```erlang
% the length is Byte - 128 % the length is Byte - 128
decode(<<Byte, Rest/binary>>) when Byte =< 183 -> decode(<<Byte, Rest/binary>>) when Byte =< 183 ->
PayloadByteLength = Byte - 128, PayloadByteLength = Byte - 128,
%PayloadBitLength = 8 * PayloadByteLength, %PayloadBitLength = 8 * PayloadByteLength,
%io:format("Byte : ~p~n" %io:format("Byte : ~p~n"
% "Rest : ~w~n" % "Rest : ~w~n"
% "PayloadByteLength : ~p~n", % "PayloadByteLength : ~p~n",
% %"PayloadBitLength : ~p~n", % %"PayloadBitLength : ~p~n",
% [Byte, Rest, PayloadByteLength]), % [Byte, Rest, PayloadByteLength]),
<<Payload:PayloadByteLength/binary, <<Payload:PayloadByteLength/binary,
Rest2/binary>> = Rest, Rest2/binary>> = Rest,
{Payload, Rest2}; {Payload, Rest2};
``` ```
- Suppose $184 \le B \le 191$. This is a trickier case. - Suppose $184 \le B \le 191$. This is a trickier case.
This case is for long bytestrings where the payload is $56$ bytes or This case is for long bytestrings where the payload is $56$ bytes or
longer. longer.
Let $M := B - 183$ ($M$ for meta-length). Note $1 \le M \le 8$. Let $M := B - 183$ ($M$ for meta-length). Note $1 \le M \le 8$.
The following $M$ bytes encode an integer $56 \le L \le 2^{64} - 1$. The following $M$ bytes encode an integer $56 \le L \le 2^{64} - 1$.
The $L$ bytes after that correspond to the byte array (target The $L$ bytes after that correspond to the byte array (target
decoded data). decoded data).
```erlang ```erlang
% bytestring. The byte length of the byte length of bytestring is FirstByte - % bytestring. The byte length of the byte length of bytestring is FirstByte -
% 183. Then pull out the actual data % 183. Then pull out the actual data
decode(<<Byte, Rest/binary>>) when Byte =< 191 -> decode(<<Byte, Rest/binary>>) when Byte =< 191 ->
ByteLengthOfByteLength = Byte - 183, ByteLengthOfByteLength = Byte - 183,
BitLengthOfByteLength = 8 * ByteLengthOfByteLength, BitLengthOfByteLength = 8 * ByteLengthOfByteLength,
<<ByteLengthInt:BitLengthOfByteLength, <<ByteLengthInt:BitLengthOfByteLength,
Rest2/binary>> = Rest, Rest2/binary>> = Rest,
<<Payload:ByteLengthInt/binary, <<Payload:ByteLengthInt/binary,
Rest3/binary>> = Rest2, Rest3/binary>> = Rest2,
{Payload, Rest3}; {Payload, Rest3};
``` ```
- Suppose $192 \le B \le 247$. - Suppose $192 \le B \le 247$.
This case is for short lists where the encoded payload is between This case is for short lists where the encoded payload is between
$0$ and $55$ bytes. $0$ and $55$ bytes.
Let $L := B - 192$. Then $0 \le L \le 55$. Let $L := B - 192$. Then $0 \le L \le 55$.
$L$ is the byte length of the encoded list payload. $L$ is the byte length of the encoded list payload.
We'll talk separately about how to decode list payloads. Short We'll talk separately about how to decode list payloads. Short
version is you use this `decode/1` function we're writing to decode version is you use this `decode/1` function we're writing to decode
individual items. You call it repeatedly on the remainder `Rest` individual items. You call it repeatedly on the remainder `Rest`
until it's empty. until it's empty.
```erlang ```erlang
% length of the list-payload is FirstByte - 192. Then the list payload, which % length of the list-payload is FirstByte - 192. Then the list payload, which
% needs to be decoded on its own. % needs to be decoded on its own.
decode(<<Byte, Rest/binary>>) when Byte =< 247 -> decode(<<Byte, Rest/binary>>) when Byte =< 247 ->
ByteLengthOfListPayload = Byte - 192, ByteLengthOfListPayload = Byte - 192,
<<ListPayload:ByteLengthOfListPayload/binary, <<ListPayload:ByteLengthOfListPayload/binary,
Rest2/binary>> = Rest, Rest2/binary>> = Rest,
List = decode_list(ListPayload), List = decode_list(ListPayload),
{List, Rest2}; {List, Rest2};
``` ```
- Suppose $248 \le B \le 255$. - Suppose $248 \le B \le 255$.
This case is for long lists where the encoded list payload is $56$ This case is for long lists where the encoded list payload is $56$
bytes or longer. bytes or longer.
As in the long-bytestring case, let $M := B - 247$, so $1 \le M \le 8$. As in the long-bytestring case, let $M := B - 247$, so $1 \le M \le 8$.
The following $M$ bytes encode an integer $L$, which is the byte The following $M$ bytes encode an integer $L$, which is the byte
length of the list payload. length of the list payload.
```erlang ```erlang
% The byte length of the byte length of the list-payload is FirstByte - 247. % The byte length of the byte length of the list-payload is FirstByte - 247.
% Then the byte length of the list. Then the list payload, which needs to be % Then the byte length of the list. Then the list payload, which needs to be
% decoded on its own. % decoded on its own.
decode(<<Byte, Rest/binary>>) -> decode(<<Byte, Rest/binary>>) ->
ByteLengthOfByteLengthOfListPayload_int = Byte - 247, ByteLengthOfByteLengthOfListPayload_int = Byte - 247,
BitLengthOfByteLengthOfListPayload_int = 8 * ByteLengthOfByteLengthOfListPayload_int, BitLengthOfByteLengthOfListPayload_int = 8 * ByteLengthOfByteLengthOfListPayload_int,
<<ByteLengthOfListPayload_int:BitLengthOfByteLengthOfListPayload_int, <<ByteLengthOfListPayload_int:BitLengthOfByteLengthOfListPayload_int,
Rest2/binary>> = Rest, Rest2/binary>> = Rest,
<<ListPayload_bytes:ByteLengthOfListPayload_int/binary, <<ListPayload_bytes:ByteLengthOfListPayload_int/binary,
Rest3/binary>> = Rest2, Rest3/binary>> = Rest2,
List = decode_list(ListPayload_bytes), List = decode_list(ListPayload_bytes),
{List, Rest3}. {List, Rest3}.
``` ```
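The five prefix ranges above can be condensed into one classifier. This is only a sketch for checking the arithmetic: the module `rlp_prefix_sketch`, the function `classify/1`, and the tag atoms are invented here and do not appear in the linked implementation.

```erlang
%% Hypothetical helper module, for illustration only.
-module(rlp_prefix_sketch).
-export([classify/1]).

%% Maps a prefix byte B to the case it selects, plus the number that
%% case derives from B: the byte itself, the payload length L, or the
%% meta-length M (the byte length of the length field).
-spec classify(byte()) ->
    {single_byte, 0..127}
    | {short_binary, 0..55}
    | {long_binary, 1..8}
    | {short_list, 0..55}
    | {long_list, 1..8}.
classify(B) when is_integer(B), 0 =< B, B =< 127 ->
    {single_byte, B};
classify(B) when is_integer(B), 128 =< B, B =< 183 ->
    {short_binary, B - 128};
classify(B) when is_integer(B), 184 =< B, B =< 191 ->
    {long_binary, B - 183};
classify(B) when is_integer(B), 192 =< B, B =< 247 ->
    {short_list, B - 192};
classify(B) when is_integer(B), 248 =< B, B =< 255 ->
    {long_list, B - 247}.
```

Unlike the `decode/1` clauses, which rely on clause order, each guard here spells out both ends of its range, so the clauses can be read independently.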
### Decoding lists ### Decoding lists
Keep decoding items until you run out of payload. Keep decoding items until you run out of payload.
```erlang ```erlang
decode_list(<<>>) -> decode_list(<<>>) ->
[]; [];
decode_list(Bytes) -> decode_list(Bytes) ->
{Item, Rest} = decode(Bytes), {Item, Rest} = decode(Bytes),
[Item | decode_list(Rest)]. [Item | decode_list(Rest)].
``` ```
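To see what `decode_list/1` does step by step, here is a hand-unrolled walk through the encoding of `[<<"cat">>, <<"dog">>]`, using nothing but pattern matching. The module `rlp_list_walkthrough` and the function `walk/0` are made up for this illustration.

```erlang
%% Hypothetical walkthrough module, for illustration only.
-module(rlp_list_walkthrough).
-export([walk/0]).

-spec walk() -> [binary()].
walk() ->
    Encoded = <<200, 131, "cat", 131, "dog">>,
    %% 200 = 192 + 8: a short list whose payload is 8 bytes long.
    <<200, ListPayload:8/binary>> = Encoded,
    %% 131 = 128 + 3: a short bytestring of 3 bytes.
    <<131, Item1:3/binary, Rest1/binary>> = ListPayload,
    %% The same prefix again for the second item.
    <<131, Item2:3/binary, Rest2/binary>> = Rest1,
    %% The payload is exhausted, which is where decode_list/1 stops.
    <<>> = Rest2,
    [Item1, Item2].
```

Each match on `Rest1` and `Rest2` corresponds to one recursive call in `decode_list/1`, and the final `<<>>` match is its base case.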
## RLP encode procedure ## RLP encode procedure
Now that we understand the decode procedure, the encode procedure should Now that we understand the decode procedure, the encode procedure should
be pretty obvious. be pretty obvious.
If it's a binary, we form a prefix depending on its length. If it's a If it's a binary, we form a prefix depending on its length. If it's a
list, we encode the list items individually, then form a prefix list, we encode the list items individually, then form a prefix
depending on the byte-length of all the encoded items concatenated together. depending on the byte-length of all the encoded items concatenated together.
```erlang ```erlang
-spec encode(Data) -> RLP -spec encode(Data) -> RLP
when Data :: decoded_data(), when Data :: decoded_data(),
RLP :: binary(). RLP :: binary().
encode(Binary) when is_binary(Binary) -> encode(Binary) when is_binary(Binary) ->
encode_binary(Binary); encode_binary(Binary);
encode(List) when is_list(List) -> encode(List) when is_list(List) ->
encode_list(List). encode_list(List).
``` ```
### Encoding binary data ### Encoding binary data
```erlang ```erlang
-spec encode_binary(Bytes) -> RLP -spec encode_binary(Bytes) -> RLP
when Bytes :: binary(), when Bytes :: binary(),
RLP :: binary(). RLP :: binary().
% single byte case when the byte is between 0..127 % single byte case when the byte is between 0..127
% result is the byte itself % result is the byte itself
encode_binary(<<Byte>>) when Byte =< 127 -> encode_binary(<<Byte>>) when Byte =< 127 ->
<<Byte>>; <<Byte>>;
% if the bytestring is 0..55 bytes long, the first byte is 128 + Length, % if the bytestring is 0..55 bytes long, the first byte is 128 + Length,
% the rest of the string is the string % the rest of the string is the string
encode_binary(Bytes) when byte_size(Bytes) =< 55 -> encode_binary(Bytes) when byte_size(Bytes) =< 55 ->
Size = byte_size(Bytes), Size = byte_size(Bytes),
<<(128 + Size), Bytes/binary>>; <<(128 + Size), Bytes/binary>>;
% more than 55 bytes long, first byte is 183 + ByteLengthOfLength % more than 55 bytes long, first byte is 183 + ByteLengthOfLength
% max byte size is 2^64 - 1 % max byte size is 2^64 - 1
encode_binary(Bytes) when 55 < byte_size(Bytes), byte_size(Bytes) < (1 bsl 64) -> encode_binary(Bytes) when 55 < byte_size(Bytes), byte_size(Bytes) < (1 bsl 64) ->
SizeInt = byte_size(Bytes), SizeInt = byte_size(Bytes),
SizeBytes = binary:encode_unsigned(SizeInt, big), SizeBytes = binary:encode_unsigned(SizeInt, big),
SizeOfSizeInt = byte_size(SizeBytes), SizeOfSizeInt = byte_size(SizeBytes),
%% 183 = 128 + 55 %% 183 = 128 + 55
%% SizeOfSizeInt > 0 %% SizeOfSizeInt > 0
<<(183 + SizeOfSizeInt), <<(183 + SizeOfSizeInt),
SizeBytes/binary, SizeBytes/binary,
Bytes/binary>>. Bytes/binary>>.
``` ```
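The long case is easier to check with the header separated from the payload. The sketch below computes only the header bytes for a given payload size. The module `rlp_long_prefix_sketch` and the function `long_binary_header/1` are hypothetical names; the only library call is `binary:encode_unsigned/2`, the same one used above.

```erlang
%% Hypothetical helper module, for illustration only.
-module(rlp_long_prefix_sketch).
-export([long_binary_header/1]).

%% Header for a bytestring of Size bytes, where Size is in the long
%% range 56..2^64-1. binary:encode_unsigned/2 yields the shortest
%% big-endian form, so the length field has no leading zero bytes.
-spec long_binary_header(pos_integer()) -> binary().
long_binary_header(Size) when is_integer(Size), 55 < Size, Size < (1 bsl 64) ->
    SizeBytes = binary:encode_unsigned(Size, big),
    <<(183 + byte_size(SizeBytes)), SizeBytes/binary>>.
```

For a 56-byte payload the header is `<<184, 56>>`: one length byte, so the prefix is $183 + 1$. For a 1024-byte payload it is `<<185, 4, 0>>`: 1024 needs two bytes, so the prefix is $183 + 2$.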
### Encoding lists ### Encoding lists
```erlang ```erlang
-spec encode_list(List) -> RLP -spec encode_list(List) -> RLP
when List :: [decoded_data()], when List :: [decoded_data()],
RLP :: binary(). RLP :: binary().
% first we encode the total payload of the list % first we encode the total payload of the list
% depending on how long it is, we then branch % depending on how long it is, we then branch
encode_list(List) -> encode_list(List) ->
Payload = << (encode(Item)) || Item <- List>>, Payload = << (encode(Item)) || Item <- List>>,
Payload_Size = byte_size(Payload), Payload_Size = byte_size(Payload),
if if
Payload_Size =< 55 -> Payload_Size =< 55 ->
<<(192 + Payload_Size), Payload/binary>>; <<(192 + Payload_Size), Payload/binary>>;
55 < Payload_Size -> 55 < Payload_Size ->
SizeBytes = binary:encode_unsigned(Payload_Size, big), SizeBytes = binary:encode_unsigned(Payload_Size, big),
SizeOfSizeInt = byte_size(SizeBytes), SizeOfSizeInt = byte_size(SizeBytes),
%% 247 = 192 + 55 %% 247 = 192 + 55
%% SizeOfSizeInt > 0 %% SizeOfSizeInt > 0
<<(247 + SizeOfSizeInt), <<(247 + SizeOfSizeInt),
SizeBytes/binary, SizeBytes/binary,
Payload/binary>> Payload/binary>>
end. end.
``` ```
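As a worked example of the list rule, the nested term `[[], [[]], [[], [[]]]]` can be assembled by hand from the inside out. This sketch does not call `encode/1`; it rebuilds the bytes manually so the prefix arithmetic stays visible. The module `rlp_nested_list_sketch` and the function `build/0` are invented for illustration.

```erlang
%% Hypothetical walkthrough module, for illustration only.
-module(rlp_nested_list_sketch).
-export([build/0]).

-spec build() -> binary().
build() ->
    %% []: empty payload, so the prefix is 192 + 0.
    Empty = <<192>>,
    %% [[]]: the payload is the 1-byte encoding of [].
    One = <<(192 + byte_size(Empty)), Empty/binary>>,
    %% [[], [[]]]: the payload is 1 + 2 = 3 bytes.
    TwoPayload = <<Empty/binary, One/binary>>,
    Two = <<(192 + byte_size(TwoPayload)), TwoPayload/binary>>,
    %% [[], [[]], [[], [[]]]]: the payload is 1 + 2 + 4 = 7 bytes.
    ThreePayload = <<Empty/binary, One/binary, Two/binary>>,
    <<(192 + byte_size(ThreePayload)), ThreePayload/binary>>.
```

The result is `<<199, 192, 193, 192, 195, 192, 193, 192>>`. Every payload here stays under 56 bytes, so only the short-list branch of `encode_list/1` is exercised.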