# RLP: Ethereum Recursive-Length Prefix Codec

![](./uploads/rlp-thumb-original-1024x576.png)

## Quick Reference

1. Erlang implementation:
2. TypeScript implementation:
3. Ethereum docs:

## RLP: overview

RLP is a standard for taking arbitrary-depth
lists-of-lists-of-...-of-binaries and encoding them as bytestrings.

We start by defining what the decoded data is:

```erlang
-type decoded_data() :: binary() | [decoded_data()].
```

## RLP: decode

It's easier to start with the unfamiliar thing (RLP bytes) and get back out
the familiar thing (binaries and lists).

```erlang
-spec decode(RLP) -> {Data, Rest}
    when RLP  :: binary(),
         Data :: decoded_data(),
         Rest :: binary().
```

So we decode one item from the front of the input, then hand back the
decoded item plus whatever bytes we didn't consume.

In the encoded (binary) data, a 1-byte prefix indicates both

1. what type of data it is (list or binary), and

2. how long the payload is (that is, how many bytes it occupies in encoded
   form).

This is our first branch point.

Suppose we are decoding a bytestring and the first byte is $B$. Remember
a byte is an integer $0 \le B \le 255$.

If $0 \le B \le 191$, we are decoding a byte array. If
$192 \le B \le 255$, we are decoding a list. The value of $B$ also tells us
how long the payload is. For short payloads it gives the length directly. For
long payloads it says how many of the following bytes hold the length.
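The five byte ranges listed next can be summarized as one small classification function. Below is a minimal sketch that maps a first byte $B$ to the branch the decoder takes. The module name `rlp_prefix_sketch`, the function `classify/1`, and its return terms are hypothetical and exist only for illustration. They are not part of the Erlang implementation linked above.

```erlang
-module(rlp_prefix_sketch).
-export([classify/1]).

%% Hypothetical helper: maps the first byte B of an RLP item to the
%% decoder branch it selects. For short items the number is the payload
%% length; for long items it is how many following bytes hold the length.
-spec classify(0..255) -> single_byte
                        | {short_string, 0..55}
                        | {long_string, 1..8}
                        | {short_list, 0..55}
                        | {long_list, 1..8}.
classify(B) when is_integer(B), B >= 0, B =< 127 -> single_byte;
classify(B) when B =< 183 -> {short_string, B - 128};
classify(B) when B =< 191 -> {long_string, B - 183};
classify(B) when B =< 247 -> {short_list, B - 192};
classify(B) when B =< 255 -> {long_list, B - 247}.
```

For example, the Ethereum docs encode `<<"dog">>` as `<<16#83, "dog">>`, and `classify(16#83)` returns `{short_string, 3}`, which is the three bytes of `dog`.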

Specifically:

- If $0 \le B \le 127$, then the decoded data is literally just the
  byte $B$ (really the bytestring `<<B>>`).

  This case is for single-byte bytestrings where the byte is between
  $0$ and $127$.

  In the Erlang context, this case feels kind of unnecessary, because the
  next case could also describe a one-byte string. Encoders must still use
  this shorter form, which saves a byte on the wire. But RLP was developed
  in Ethereum, which mostly uses Go/Python/JS.

  A technically correct interpretation of the Ethereum protocol would
  arguably have `-type decoded_data() :: 0..127 | binary() | [decoded_data()].`
  (The decoding rules in the Ethereum docs do describe this range as a
  string whose only byte is the prefix itself, so returning a binary is
  defensible.)

  There's really no purpose to singling out this case in an Erlang
  context. Doing so would introduce tons of unnecessary complexity for code
  that interacts with our RLP decode procedure. Converting between binaries
  and bigints is a non-event in Erlang: call `binary:decode_unsigned/1`
  or `binary:encode_unsigned/2`. In other languages it's a giant chore.

  Put another way: pretending the decoded data is either binaries or
  lists-of-lists (LoLs) has no behavioral effect on our library or on the
  calling code, and only serves to make RLP easier to deal with. But it
  does not match that stricter definition.

  I wrote this example RLP code in 2022 (date of writing is 2025) and
  have used it in production this entire time, and this issue has never
  once come up. I did not even notice that I had written something
  technically incorrect until I sat down to write this explainer and
  wondered why there were two separate cases that covered single-byte
  strings.

  I'm still debating whether it's a good idea to go back and "correct" my
  code. Probably not. "If it's a bug that people rely on, it's not a bug,
  it's a feature." -- Linus Torvalds.

  ```erlang
  decode(<<Byte, Rest/binary>>) when Byte =< 127 ->
      {<<Byte>>, Rest};
  ```

- If $128 \le B \le 183$, then the decoded data is a bytestring of
  length $L$, where $L := B - 128$. Consequently, $0 \le L \le 55$.

  The bytestring (target decoded data) is the following $L$ bytes.

  This case exists for short bytestrings.

  ```erlang
  % short bytestring: the payload length is Byte - 128
  decode(<<Byte, Rest/binary>>) when Byte =< 183 ->
      PayloadByteLength = Byte - 128,
      <<Payload:PayloadByteLength/binary, Rest2/binary>> = Rest,
      {Payload, Rest2};
  ```

- Suppose $184 \le B \le 191$. This is a trickier case.

  This case is for long bytestrings where the payload is $56$ bytes or
  longer.

  Let $M := B - 183$ ($M$ for meta-length). Note $1 \le M \le 8$.

  The following $M$ bytes encode the payload length $L$ as a big-endian
  integer, where $56 \le L \le 2^{64} - 1$.

  The $L$ bytes after that are the byte array (target decoded data).

  ```erlang
  % long bytestring: the byte length of the payload's byte length is
  % Byte - 183. Read that many bytes as a big-endian integer, then pull out
  % the actual data.
  decode(<<Byte, Rest/binary>>) when Byte =< 191 ->
      ByteLengthOfByteLength = Byte - 183,
      BitLengthOfByteLength = 8 * ByteLengthOfByteLength,
      <<PayloadByteLength:BitLengthOfByteLength, Rest2/binary>> = Rest,
      <<Payload:PayloadByteLength/binary, Rest3/binary>> = Rest2,
      {Payload, Rest3};
  ```

- Suppose $192 \le B \le 247$.

  This case is for short lists where the encoded payload is between
  $0$ and $55$ bytes.

  Let $L := B - 192$. Then $0 \le L \le 55$.

  $L$ is the byte length of the encoded list payload.

  We'll talk separately about how to decode list payloads. The short
  version is that you use the `decode/1` function we're writing to decode
  individual items, calling it repeatedly on the remainder `Rest` until
  it's empty.

  ```erlang
  % short list: the list payload is Byte - 192 bytes long, and its items
  % need to be decoded on their own
  decode(<<Byte, Rest/binary>>) when Byte =< 247 ->
      ByteLengthOfListPayload = Byte - 192,
      <<ListPayload:ByteLengthOfListPayload/binary, Rest2/binary>> = Rest,
      List = decode_list(ListPayload),
      {List, Rest2};
  ```

- Suppose $248 \le B \le 255$.

  This case is for long lists where the encoded list payload is $56$
  bytes or longer.

  We have the same deal as with long bytestrings: $B - 247$ is the byte
  length of an integer, and that integer is the byte length of the list
  payload.

  ```erlang
  % long list: the byte length of the byte length of the list payload is
  % Byte - 247. Then comes the byte length of the list payload, then the
  % list payload itself, whose items need to be decoded on their own.
  decode(<<Byte, Rest/binary>>) ->
      ByteLengthOfByteLengthOfListPayload_int = Byte - 247,
      BitLengthOfByteLengthOfListPayload_int = 8 * ByteLengthOfByteLengthOfListPayload_int,
      <<ByteLengthOfListPayload_int:BitLengthOfByteLengthOfListPayload_int, Rest2/binary>> = Rest,
      <<ListPayload_bytes:ByteLengthOfListPayload_int/binary, Rest3/binary>> = Rest2,
      List = decode_list(ListPayload_bytes),
      {List, Rest3}.
  ```

### Decoding lists

Keep decoding items until you run out of payload.

```erlang
decode_list(<<>>) ->
    [];
decode_list(Bytes) ->
    {Item, Rest} = decode(Bytes),
    [Item | decode_list(Rest)].
```

## RLP encode procedure

Now that we understand the decode procedure, the encode procedure should
be pretty obvious.

If it's a binary, we form a prefix depending on its length. If it's a
list, we encode the list items individually, then form a prefix
depending on the byte length of all the encoded items concatenated together.

```erlang
-spec encode(Data) -> RLP
    when Data :: decoded_data(),
         RLP  :: binary().

encode(Binary) when is_binary(Binary) ->
    encode_binary(Binary);
encode(List) when is_list(List) ->
    encode_list(List).
```

### Encoding binary data

```erlang
-spec encode_binary(Bytes) -> RLP
    when Bytes :: binary(),
         RLP   :: binary().

% single byte between 0..127: the result is the byte itself
encode_binary(<<Byte>>) when Byte =< 127 ->
    <<Byte>>;
% 0..55 bytes long: the first byte is 128 + Length,
% followed by the bytes themselves
encode_binary(Bytes) when byte_size(Bytes) =< 55 ->
    Size = byte_size(Bytes),
    <<(128 + Size), Bytes/binary>>;
% more than 55 bytes long: the first byte is 183 + ByteLengthOfLength
% max byte size is 2^64 - 1
encode_binary(Bytes) when 55 < byte_size(Bytes), byte_size(Bytes) < (1 bsl 64) ->
    SizeInt = byte_size(Bytes),
    SizeBytes = binary:encode_unsigned(SizeInt, big),
    SizeOfSizeInt = byte_size(SizeBytes),
    %% 183 = 128 + 55
    %% SizeOfSizeInt > 0
    <<(183 + SizeOfSizeInt),
      SizeBytes/binary,
      Bytes/binary>>.
```

### Encoding lists

```erlang
-spec encode_list(List) -> RLP
    when List :: [decoded_data()],
         RLP  :: binary().

% first we encode each item and concatenate the results into the list
% payload; then we branch on how long that payload is
encode_list(List) ->
    Payload = << <<(encode(Item))/binary>> || Item <- List >>,
    Payload_Size = byte_size(Payload),
    if
        Payload_Size =< 55 ->
            <<(192 + Payload_Size), Payload/binary>>;
        55 < Payload_Size ->
            SizeBytes = binary:encode_unsigned(Payload_Size, big),
            SizeOfSizeInt = byte_size(SizeBytes),
            %% 247 = 192 + 55
            %% SizeOfSizeInt > 0
            <<(247 + SizeOfSizeInt),
              SizeBytes/binary,
              Payload/binary>>
    end.
```
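As a sanity check on the encode rules, here is a minimal sketch that builds three examples from the Ethereum docs by hand: a short string, a short list, and a 56-byte long string. It applies the prefix arithmetic directly instead of calling `encode/1`. The module `rlp_by_hand` and its functions are hypothetical names chosen for illustration.

```erlang
-module(rlp_by_hand).
-export([dog/0, cat_dog/0, lorem/0]).

%% Short string (assumes Str is not a single byte =< 127):
%% prefix 128 + length, then the bytes.
short_string(Str) when byte_size(Str) =< 55 ->
    <<(128 + byte_size(Str)), Str/binary>>.

dog() ->
    short_string(<<"dog">>).

%% Short list: concatenate the encoded items, then prefix 192 + payload length.
cat_dog() ->
    Payload = <<(short_string(<<"cat">>))/binary,
                (short_string(<<"dog">>))/binary>>,
    <<(192 + byte_size(Payload)), Payload/binary>>.

%% Long string (56+ bytes): prefix 183 + length-of-length,
%% then the big-endian length, then the bytes.
lorem() ->
    Str = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit">>,
    LenBytes = binary:encode_unsigned(byte_size(Str), big),
    <<(183 + byte_size(LenBytes)), LenBytes/binary, Str/binary>>.
```

`dog()` should give `<<16#83, "dog">>`, `cat_dog()` should give `<<16#c8, 16#83, "cat", 16#83, "dog">>`, and `lorem()` should start with `16#b8, 16#38` because the string is 56 bytes. These should match what the `encode/1` above returns for `<<"dog">>`, `[<<"cat">>, <<"dog">>]`, and the Lorem string.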
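Because `decoded_data()` only has binaries and lists, integers have to be converted by the caller, as the decode section's aside notes. The Ethereum docs specify the representation: positive integers are big-endian with no leading zeros, so zero becomes the empty byte array. The docs also say higher-level protocols should treat leading zeros as invalid. One subtlety is that `binary:encode_unsigned(0)` returns `<<0>>`, which has a leading zero. The sketch below handles that case. The module and function names are hypothetical.

```erlang
-module(rlp_uint_sketch).
-export([uint_to_bytes/1, bytes_to_uint/1]).

%% Hypothetical helper: non-negative integer -> big-endian bytes with
%% no leading zeros (the form RLP expects before calling encode/1).
-spec uint_to_bytes(non_neg_integer()) -> binary().
uint_to_bytes(0) ->
    %% binary:encode_unsigned(0) would return <<0>>, a leading zero
    <<>>;
uint_to_bytes(N) when is_integer(N), N > 0 ->
    binary:encode_unsigned(N, big).

%% Hypothetical helper: decoded binary -> integer, rejecting leading zeros.
-spec bytes_to_uint(binary()) -> non_neg_integer().
bytes_to_uint(<<0, _/binary>>) ->
    error(non_canonical_integer);
bytes_to_uint(Bin) when is_binary(Bin) ->
    Bits = 8 * byte_size(Bin),
    %% a zero-width segment matches <<>> and yields 0
    <<N:Bits/unsigned-big>> = Bin,
    N.
```

Combined with the `encode/1` above, `encode(uint_to_bytes(1024))` should produce `<<16#82, 4, 0>>`, which matches the Ethereum docs' example for 1024. `encode(uint_to_bytes(0))` should produce `<<16#80>>`.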