Clone
3
RLP
Peter Harpending edited this page 2025-03-23 16:23:11 -07:00

RLP: Ethereum Recursive-Length Prefix Codec

Quick Reference

  1. Erlang implementation: https://github.com/aeternity/Vanillae/blob/d5fc02c7e6314b6d32ba914f9bc9ea90bb86c7dd/utils/vw/src/vrlp.erl

  2. TypeScript implementation: https://github.com/aeternity/Vanillae/blob/d5fc02c7e6314b6d32ba914f9bc9ea90bb86c7dd/bindings/typescript/src/rlp.ts

  3. Ethereum docs: https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/

RLP: overview

RLP is a standard for taking arbitrary-depth lists-of-lists-of-...-of-binaries and encoding them as bytestrings.

We start by defining what the decoded data is:

-type decoded_data() :: binary() | [decoded_data()].

RLP: decode

Easier to start with the unfamiliar thing and get back out the familiar thing.

-spec decode(RLP) -> {Data, Rest}
    when RLP  :: binary(),
         Data :: decoded_data(),
         Rest :: binary().

So we decode however much data we can, hand back the decoded data plus whatever data we didn't consume.

In the encoded (binary) data, there's a 1-byte prefix that indicates both

  1. what type of data it is (list or binary)

  2. how long the payload is (as in how long it encodes to in bytes)

This is our first branch point.

Suppose we are decoding a bytestring and the first byte is B. Remember a byte is an integer 0 \le B \le 255.

If 0 \le B \le 191, we are decoding a byte array. If 192 \le B \le 255, we are decoding a list. The number B encodes how long the payload is.

Specifically:

  • If 0 \le B \le 127, then the decoded data is literally just the byte B (really the bytestring <<B>>).

    This case is for single-byte bytestrings where the byte is between 0 and 127.

    In the Erlang context, this case is kind of dumb and unnecessary because it could be covered by the next case. But RLP was developed in Ethereum, which mostly uses Go/Python/JS.

    A technically correct interpretation of the Ethereum protocol would have -type decoded_data() :: 0..127 | binary() | [decoded_data()].

    There's really no purpose to this case in an Erlang context. It introduces tons of unnecessary complexity for code that interacts with our RLP decode procedure. Converting between binaries and bigints is a big nothing in Erlang. Call binary:decode_unsigned/1 or binary:encode_unsigned/2. In other languages it's a giant chore.

    Put another way: pretending the decoded data is either binaries or LoLs effects no behavioral change in our library or in the calling code, and only serves to make RLP easier to deal with. But definitionally it is incorrect.

    I wrote this example RLP code in 2022 (date of writing is 2025) and have used it in production this entire time, and never once has this issue come up. I did not even notice that I had written something technically incorrect until I went to write out this explainer and was wondering why there were two separate cases that covered single-byte strings.

    I'm still debating to myself whether or not it's a good idea to go back and "correct" my code. Probably no. "If it's a bug that people rely on, it's not a bug, it's a feature." -- Linus Torvalds.

    decode(<<Byte, Rest/binary>>) when Byte =< 127 ->
        {<<Byte>>, Rest};
    
  • If 128 \le B \le 183, then the decoded data is a bytestring of length L, where L := B - 128. Consequently, 0 \le L \le 55.

    The bytestring (target decoded data) is the following L bytes.

    This case exists for short bytestrings.

    % the length is Byte - 128
    decode(<<Byte, Rest/binary>>) when Byte =< 183 ->
        PayloadByteLength = Byte - 128,
        %PayloadBitLength  = 8 * PayloadByteLength,
        %io:format("Byte              : ~p~n"
        %          "Rest              : ~w~n"
        %          "PayloadByteLength : ~p~n",
        %          %"PayloadBitLength  : ~p~n",
        %          [Byte, Rest, PayloadByteLength]),
        <<Payload:PayloadByteLength/binary,
          Rest2/binary>>  = Rest,
        {Payload, Rest2};
    
  • Suppose 184 \le B \le 191. This is a trickier case.

    This case is for long bytestrings where the payload is 56 bytes or longer.

    Let M := B - 183 (M for meta-length). Note 1 \le M \le 8.

    The following M bytes encode an integer 56 \le L \le 2^{64} - 1.

    The L bytes after that correspond to the byte array (target decoded data).

    % bytestring. The byte length of the byte length of bytestring is FirstByte -
    % 183. Then pull out the actual data
    decode(<<Byte, Rest/binary>>) when Byte =< 191 ->
        ByteLengthOfByteLength = Byte - 183,
        BitLengthOfByteLength  = 8 * ByteLengthOfByteLength,
        <<ByteLengthInt:BitLengthOfByteLength,
          Rest2/binary>> = Rest,
        <<Payload:ByteLengthInt/binary,
          Rest3/binary>> = Rest2,
        {Payload, Rest3};
    
  • Suppose 192 \le B \le 247.

    This case is for short lists where the encoded payload is between 0 and 55 bytes.

    Let L = B - 192. Then 0 \le L \le 55.

    L is the byte length of the encoded list payload.

    We'll talk separately about how to decode list payloads. Short version is you use this decode/1 function we're writing to decode individual items. You call it repeatedly on the remainder Rest until it's empty.

    % length of the list-payload is FirstByte - 192. Then the list payload, which
    % needs to be decoded on its own.
    decode(<<Byte, Rest/binary>>) when Byte =< 247 ->
        ByteLengthOfListPayload = Byte - 192,
        <<ListPayload:ByteLengthOfListPayload/binary,
          Rest2/binary>> = Rest,
        List = decode_list(ListPayload),
        {List, Rest2};
    
  • Suppose 248 \le B \le 255.

    This case is for long lists where the encoded list payload is 56 bytes or longer.

    We have the same deal as above where B corresponds to the byte length of an integer, which is the length of the list payload.

    % The byte length of the byte length of the list-payload is FirstByte - 247.
    % Then the byte length of the list. Then the list payload, which needs to be
    % decoded on its own.
    decode(<<Byte, Rest/binary>>) ->
        ByteLengthOfByteLengthOfListPayload_int = Byte - 247,
        BitLengthOfByteLengthOfListPayload_int  = 8 * ByteLengthOfByteLengthOfListPayload_int,
        <<ByteLengthOfListPayload_int:BitLengthOfByteLengthOfListPayload_int,
          Rest2/binary>> = Rest,
        <<ListPayload_bytes:ByteLengthOfListPayload_int/binary,
          Rest3/binary>> = Rest2,
        List = decode_list(ListPayload_bytes),
        {List, Rest3}.
    

Decoding lists

Keep decoding items until you run out of payload.

decode_list(<<>>) ->
    [];
decode_list(Bytes) ->
    {Item, Rest} = decode(Bytes),
    [Item | decode_list(Rest)].

RLP encode procedure

Now that we understand the decode procedure, the encode procedure should be pretty obvious.

If it's a binary, we form a prefix depending on its length. If it's a list, we encode the list items individually, then form a prefix depending on the byte-length of all the encoded items concatenated together.

-spec encode(Data) -> RLP
    when Data :: decoded_data(),
         RLP  :: binary().

encode(Binary) when is_binary(Binary) ->
    encode_binary(Binary);
encode(List) when is_list(List) ->
    encode_list(List).

Encoding binary data

-spec encode_binary(Bytes) -> RLP
    when Bytes :: binary(),
         RLP   :: binary().

% single byte case when the byte is between 0..127
% result is the byte itself
encode_binary(<<Byte>>) when Byte =< 127 ->
    <<Byte>>;
% if the bytestring is 0..55 items long, the first byte is 128 + Length,
% the rest of the string is the string
encode_binary(Bytes) when byte_size(Bytes) =< 55 ->
    Size = byte_size(Bytes),
    <<(128 + Size), Bytes/binary>>;
% more than 55 bytes long, first byte is 183 + ByteLengthOfLength
% max byte size is 2^64 - 1
encode_binary(Bytes) when 55 < byte_size(Bytes), byte_size(Bytes) < (1 bsl 64) ->
    SizeInt       = byte_size(Bytes),
    SizeBytes     = binary:encode_unsigned(SizeInt, big),
    SizeOfSizeInt = byte_size(SizeBytes),
    %% 183 = 128 + 55
    %% SizeOfSizeInt > 0
    <<(183 + SizeOfSizeInt),
      SizeBytes/binary,
      Bytes/binary>>.

Encoding lists

-spec encode_list(List) -> RLP
    when List :: [decoded_data()],
         RLP  :: binary().

% first we encode the total payload of the list
% depending on how long it is, we then branch
encode_list(List) ->
    Payload      = << (encode(Item)) || Item <- List>>,
    Payload_Size = byte_size(Payload),
    if
        Payload_Size =< 55 ->
            <<(192 + Payload_Size), Payload/binary>>;
        55 < Payload_Size ->
            SizeBytes     = binary:encode_unsigned(Payload_Size, big),
            SizeOfSizeInt = byte_size(SizeBytes),
            %% 247 = 192 + 55
            %% SizeOfSizeInt > 0
            <<(247 + SizeOfSizeInt),
              SizeBytes/binary,
              Payload/binary>>
    end.