diff --git a/RLP.md b/RLP.md index e69de29..45a3090 100644 --- a/RLP.md +++ b/RLP.md @@ -0,0 +1,281 @@ +# RLP: Ethereum Recursive-Length Prefix Codec + +![](./uploads/rlp-thumb-original-1024x576.png) + +## Quick Reference + +1. Erlang implementation: + + +2. TypeScript implementation: + + +3. Ethereum docs: + + +## RLP: overview + +RLP is a standard for taking arbitrary-depth +lists-of-lists-of-...-of-binaries and encoding them as bytestrings. + +We start by defining what the decoded data is: + +```erlang +-type decoded_data() :: binary() | [decoded_data()]. +``` + +## RLP: decode + +Easier to start with the unfamiliar thing and get back out the familiar +thing. + +```erlang +-spec decode(RLP) -> {Data, Rest} + when RLP :: binary(), + Data :: decoded_data(), + Rest :: binary(). +``` + +So we decode however much data we can, hand back the decoded data plus +whatever data we didn't consume. + +In the encoded (binary) data, there's a 1-byte prefix that indicates +both + +1. what type of data it is (list or binary) + +2. how long the payload is (as in how long it encodes to in bytes) + +This is our first branch point. + +Suppose we are decoding a bytestring and the first byte is $B$. Remember +a byte is an integer $0 \le B \le 255$. + +If $0 \le B \le 191$, we are decoding a byte array. If +$191 \le B \le 255$, we are decoding a list. The number $B$ encodes how +long the payload is. + +Specifically: + +- If $0 \le B \le 127$, then the decoded data is literally just the + byte $B$ (really the bytestring `<>`). + + This case is for single-byte bytestrings where the byte is between + $0$ and $127$. + + In the Erlang context, this case is kind of dumb and unnecessary + because it could be covered by the next case. But RLP was developed + in Ethereum, which mostly uses Go/Python/JS. + + A technically correct interpretation of the Ethereum protocol would + have `-type decoded_data() :: 0..127 | binary() | [decoded_data()].` + + There's really no purpose to this case in an Erlang context. It + introduces tons of unnecessary complexity for code that interacts + with our RLP decode procedure. Converting between binaries and + bigints is a big nothing in Erlang. Call `binary:decode_unsigned/1` + or `binary:encode_unsigned/2`. In other languages it's a giant + chore. + + Put another way: pretending the decoded data is either binaries or + LoLs effects no behavioral change in our library or in the calling + code, and only serves to make RLP easier to deal with. But + definitionally it is incorrect. + + I wrote this example RLP code in 2022 (date of writing is 2025) and + have used it in production this entire time, and never once has this + issue come up. I did not even notice that I had written something + technically incorrect until I went to write out this explainer and + was wondering why there were two separate cases that covered + single-byte strings. + + I'm still debating to myself whether or not it's a good idea to go + back and "correct" my code. Probably no. "If it's a bug that people + rely on, it's not a bug, it's a feature." -- Linus Torvalds. + + ```erlang + decode(<>) when Byte =< 127 -> + {<>, Rest}; + ``` + +- If $128 \le B \le 183$, then the decoded data is a bytestring of + length $L$, where $L := B - 128$. Consequently, $0 \le L \le 55$. + + The bytestring (target decoded data) is the following $L$ bytes. + + This case exists for short bytestrings. + + ```erlang + % the length is Byte - 128 + decode(<>) when Byte =< 183 -> + PayloadByteLength = Byte - 128, + %PayloadBitLength = 8 * PayloadByteLength, + %io:format("Byte : ~p~n" + % "Rest : ~w~n" + % "PayloadByteLength : ~p~n", + % %"PayloadBitLength : ~p~n", + % [Byte, Rest, PayloadByteLength]), + <> = Rest, + {Payload, Rest2}; + ``` + +- Suppose $184 \le B \le 191$. This is a trickier case. + + This case is for long bytestrings where the payload is $56$ bytes or + longer. + + Let $M := B - 183$ ($M$ for meta-length). Note $1 \le M \le 8$. + + The following $M$ bytes encode an integer $56 \le L \le 2^{64} - 1$. + + The $L$ bytes after that correspond to the byte array (target + decoded data). + + ```erlang + % bytestring. The byte length of the byte length of bytestring is FirstByte - + % 183. Then pull out the actual data + decode(<>) when Byte =< 191 -> + ByteLengthOfByteLength = Byte - 183, + BitLengthOfByteLength = 8 * ByteLengthOfByteLength, + <> = Rest, + <> = Rest2, + {Payload, Rest3}; + ``` + +- Suppose $192 \le B \le 247$. + + This case is for short lists where the encoded payload is between + $0$ and $55$ bytes. + + Let $L = B - 192$. Then $0 \le L \le 55$. + + $L$ is the byte length of the encoded list payload. + + We'll talk separately about how to decode list payloads. Short + version is you use this `decode/1` function we're writing to decode + individual items. You call it repeatedly on the remainder `Rest` + until it's empty. + + ```erlang + % length of the list-payload is FirstByte - 192. Then the list payload, which + % needs to be decoded on its own. + decode(<>) when Byte =< 247 -> + ByteLengthOfListPayload = Byte - 192, + <> = Rest, + List = decode_list(ListPayload), + {List, Rest2}; + ``` + +- Suppose $248 \le B \le 255$. + + This case is for long lists where the encoded list payload is $56$ + bytes or longer. + + We have the same deal as above where $B$ corresponds to the byte + length of an integer, which is the length of the list payload. + + ```erlang + % The byte length of the byte length of the list-payload is FirstByte - 247. + % Then the byte length of the list. Then the list payload, which needs to be + % decoded on its own. + decode(<>) -> + ByteLengthOfByteLengthOfListPayload_int = Byte - 247, + BitLengthOfByteLengthOfListPayload_int = 8 * ByteLengthOfByteLengthOfListPayload_int, + <> = Rest, + <> = Rest2, + List = decode_list(ListPayload_bytes), + {List, Rest3}. + ``` + +### Decoding lists + +Keep decoding items until you run out of payload. + +```erlang +decode_list(<<>>) -> + []; +decode_list(Bytes) -> + {Item, Rest} = decode(Bytes), + [Item | decode_list(Rest)]. +``` + +## RLP encode procedure + +Now that we understand the decode procedure, the encode procedure should +be pretty obvious. + +If it's a binary, we form a prefix depending on its length. If it's a +list, we encode the list items individually, then form a prefix +depending on the byte-length of all the encoded items concatenated together. + +```erlang +-spec encode(Data) -> RLP + when Data :: decoded_data(), + RLP :: binary(). + +encode(Binary) when is_binary(Binary) -> + encode_binary(Binary); +encode(List) when is_list(List) -> + encode_list(List). +``` + +### Encoding binary data + +```erlang +-spec encode_binary(Bytes) -> RLP + when Bytes :: binary(), + RLP :: binary(). + +% single byte case when the byte is between 0..127 +% result is the byte itself +encode_binary(<>) when Byte =< 127 -> + <>; +% if the bytestring is 0..55 items long, the first byte is 128 + Length, +% the rest of the string is the string +encode_binary(Bytes) when byte_size(Bytes) =< 55 -> + Size = byte_size(Bytes), + <<(128 + Size), Bytes/binary>>; +% more than 55 bytes long, first byte is 183 + ByteLengthOfLength +% max byte size is 2^64 - 1 +encode_binary(Bytes) when 55 < byte_size(Bytes), byte_size(Bytes) < (1 bsl 64) -> + SizeInt = byte_size(Bytes), + SizeBytes = binary:encode_unsigned(SizeInt, big), + SizeOfSizeInt = byte_size(SizeBytes), + %% 183 = 128 + 55 + %% SizeOfSizeInt > 0 + <<(183 + SizeOfSizeInt), + SizeBytes/binary, + Bytes/binary>>. +``` + +### Encoding lists + +```erlang +-spec encode_list(List) -> RLP + when List :: [decoded_data()], + RLP :: binary(). + +% first we encode the total payload of the list +% depending on how long it is, we then branch +encode_list(List) -> + Payload = << (encode(Item)) || Item <- List>>, + Payload_Size = byte_size(Payload), + if + Payload_Size =< 55 -> + <<(192 + Payload_Size), Payload/binary>>; + 55 < Payload_Size -> + SizeBytes = binary:encode_unsigned(Payload_Size, big), + SizeOfSizeInt = byte_size(SizeBytes), + %% 247 = 192 + 55 + %% SizeOfSizeInt > 0 + <<(247 + SizeOfSizeInt), + SizeBytes/binary, + Payload/binary>> + end. +``` diff --git a/uploads/rlp-thumb-original-1024x576.png b/uploads/rlp-thumb-original-1024x576.png new file mode 100644 index 0000000..9f2023d Binary files /dev/null and b/uploads/rlp-thumb-original-1024x576.png differ