BaseN drafting WIP
parent
2c63bdd744
commit
ef7c6be734
373
BaseN.md
373
BaseN.md
@ -1,3 +1,376 @@
|
||||
# Base64 vs Base58 Explained
|
||||
|
||||

|
||||
|
||||
## Quick Reference
|
||||
|
||||
1. Original article: <https://zxq9.com/archives/2688>
|
||||
2. Base64 Erlang:
|
||||
<https://github.com/aeternity/Vanillae/blob/829dd2930ff20ea0473cf2ad562e0a1c2aba0411/utils/vw/src/vb64.erl>
|
||||
3. Base58 Erlang:
|
||||
<https://github.com/aeternity/Vanillae/blob/829dd2930ff20ea0473cf2ad562e0a1c2aba0411/utils/vw/src/vb58.erl>
|
||||
4. Base64 TypeScript:
|
||||
<https://github.com/aeternity/Vanillae/blob/829dd2930ff20ea0473cf2ad562e0a1c2aba0411/bindings/typescript/src/b64.ts>
|
||||
5. Base58 TypeScript:
|
||||
<https://github.com/aeternity/Vanillae/blob/829dd2930ff20ea0473cf2ad562e0a1c2aba0411/bindings/typescript/src/b58.ts>
|
||||
|
||||
## tl;dr
|
||||
|
||||
Base64 and Base58 are two schema for taking binary data and encoding it
|
||||
to and from plain text.
|
||||
|
||||
1. Conceptually:
|
||||
|
||||
1. These are **NOT** two instances of a general "base N" concept.
|
||||
|
||||
2. Base64 thinks of the binary data as a long stream of bytes.
|
||||
|
||||
Base64 converts bytes to/from text 3 bytes at a time in a
|
||||
"fire-and-forget" manner.
|
||||
|
||||
3. Base58 thinks of the binary data as a really really long
|
||||
integer.
|
||||
|
||||
Base58 requires processing the entire bytestring all at once as
|
||||
a singular unit.
|
||||
|
||||
4. Base58 *is* the general BaseN algorithm. Base64 is simpler
|
||||
because 64 and 256 are both powers of 2 and computers use
|
||||
binary.
|
||||
|
||||
2. Terminologically:
|
||||
|
||||
1. The terms "encode" and "decode" are meant *from the perspective
|
||||
of the computer program*. From the program's perspective, binary
|
||||
data is what makes sense and plain text is gobbletygook. So
|
||||
|
||||
1. we *decode* plain text (gahbage) into binary data (what
|
||||
makes sense)
|
||||
|
||||
2. we *encode* binary data (what makes sense) to plain text
|
||||
(gahbage)
|
||||
|
||||
3. Practically:
|
||||
|
||||
1. Almost every language (including Erlang) has base64 in its
|
||||
standard library, and you should probably just use that.
|
||||
|
||||
2. You probably have to code Base58 yourself.
|
||||
|
||||
3. If you are implementing Base58 in a language that does not have
|
||||
bignum arithmetic, you have to implement it yourself. Thankfully
|
||||
nobody will ever need any language other than Erlang so we can
|
||||
ignore this problem in the context of this wiki.
|
||||
|
||||
4. Base58 is *super* inefficient both space-wise and time-wise, because it
|
||||
requires processing the entirety of the binary string as a single
|
||||
monolithic piece of data.
|
||||
|
||||
5. Base58 exists to preempt `Il10O`-type problems (visual
|
||||
ambiguity) and doesn't involve `=+/` characters (the idea being
|
||||
email clients are likely to break long lines at these
|
||||
characters, increasing the likelihood of input errors).
|
||||
|
||||
6. Consequently, Base58 is only suitable for binary data that is
|
||||
**both**
|
||||
|
||||
1. short in length
|
||||
|
||||
2. likely to be entered manually (e.g. wallet/contract
|
||||
addresses)
|
||||
# Base64
|
||||
|
||||
The term "binary" is misleading, because it leads people to think that
|
||||
the basic unit in computing is a bit (a 1 or a 0).
|
||||
|
||||
It is more accurate to think of the basic unit in computing as a byte,
|
||||
which is a number between 0 (`2#0000_0000`) and 255 (`2#1111_1111`).
|
||||
|
||||
(If you're totally bewildered by binary notation, you might want to read
|
||||
[\[s:qr-algorithm\]](#s:qr-algorithm){reference-type="ref"
|
||||
reference="s:qr-algorithm"} and come back).
|
||||
|
||||
Base64 processes a binary string 3 bytes at a time (which is 24 bits).
|
||||
It regroups those 3 groups of 8 bits into 4 groups of 6 bits.
|
||||
|
||||
ABCDEFGH 12345678 ABCDEFGH
|
||||
ABCDEF GH1234 5678AB CDEFGH
|
||||
|
||||
We now have 4 numbers ranging between 0 (`2#00_0000`) and 63
|
||||
(`2#11_1111`), hence the name "Base64".
|
||||
|
||||
Each number in the range 0..63 is assigned to a character from the
|
||||
alphabet (see [Base64 Alphabet](#base64-alphabet))
|
||||
|
||||
```erlang
|
||||
% abbreviated
|
||||
% 0..25 map to $A..$Z
|
||||
int2char(N) when 0 =< N, N =< 25 -> $A + N;
|
||||
% 26..51 map to $a..$z
|
||||
int2char(N) when 26 =< N, N =< 51 -> $a + (N - 26);
|
||||
% 52..61 map to $0..$9
|
||||
int2char(N) when 52 =< N, N =< 61 -> $0 + (N - 52);
|
||||
% special cases
|
||||
int2char(62) -> $+;
|
||||
int2char(63) -> $/.
|
||||
|
||||
% $A..$Z map to 0..25
|
||||
char2int(C) when $A =< C, C =< $Z -> C - $A;
|
||||
% $a..$z map to 26..51
|
||||
char2int(C) when $a =< C, C =< $z -> C - $a;
|
||||
% $0..$9 map to 52..61
|
||||
char2int(C) when $0 =< C, C =< $9 -> C - $0;
|
||||
% special cases
|
||||
char2int($+) -> 62;
|
||||
char2int($/) -> 63.
|
||||
```
|
||||
|
||||
The only stupid cases arise when the number of bytes in the binary data
|
||||
is not a multiple of 3. In this case there are two padding rules:
|
||||
|
||||
``` {.Erlang language="Erlang"}
|
||||
% general case: at least 3 bytes (24 bits = 6+6+6+6) remaining
|
||||
%
|
||||
% 12345678 abcdefgh 12345678 ...
|
||||
% 123456 78abcd efgh12 345678 ...
|
||||
% A B C D Rest
|
||||
% convert to chars ->
|
||||
% CA CB CC CD
|
||||
enc(<<A:6, B:6, C:6, D:6, Rest/binary>>) ->
|
||||
CA = int2char(A),
|
||||
CB = int2char(B),
|
||||
CC = int2char(C),
|
||||
CD = int2char(D),
|
||||
[CA, CB, CC, CD | enc(Rest)];
|
||||
% terminal case: 2 bytes (16 bits = 6+6+4) remaining
|
||||
%
|
||||
% 12345678 abcdefgh
|
||||
% 123456 78abcd efgh__
|
||||
% A B C bsl 2
|
||||
% convert to chars ->
|
||||
% CA CB CC =
|
||||
enc(<<A:6, B:6, C:4>>) ->
|
||||
CA = int2char(A),
|
||||
CB = int2char(B),
|
||||
CC = int2char(C bsl 2),
|
||||
[CA, CB, CC, $=];
|
||||
% terminal case: 1 byte (8 bits = 6+2) remaining
|
||||
%
|
||||
% 12345678 ->
|
||||
% 123456 78____
|
||||
% A B bsl 4
|
||||
% convert to chars ->
|
||||
% CA CB = =
|
||||
enc(<<A:6, B:2>>) ->
|
||||
CA = int2char(A),
|
||||
CB = int2char(B bsl 4),
|
||||
[CA, CB, $=, $=];
|
||||
% terminal case: 0 bytes remaining
|
||||
enc(<<>>) ->
|
||||
[].
|
||||
```
|
||||
|
||||
By the way, that's the *entire* encode procedure right there.
|
||||
|
||||
The decode procedure is similarly simple but slightly trickier:
|
||||
|
||||
``` {.Erlang language="Erlang"}
|
||||
dec(Base64_String) ->
|
||||
dec(Base64_String, <<>>).
|
||||
|
||||
|
||||
% terminal case: two equal signs at the end = 1 byte (8 bits = 6+2) remaining
|
||||
% input (characters) ->
|
||||
% W X = =
|
||||
% convert to numbers ->
|
||||
% abcdef gh____ = =
|
||||
% NW NX
|
||||
% regroup ->
|
||||
% abcdefgh ____ abcdef gh____
|
||||
% <<LastByte:8, 0:4>> = << NW:6, NX:6 >>
|
||||
dec([W, X, $=, $=], Acc) ->
|
||||
NW = char2int(W),
|
||||
NX = char2int(X),
|
||||
<<LastByte:8, 0:4>> = <<NW:6, NX:6>>,
|
||||
<<Acc/binary, LastByte:8>>;
|
||||
% terminal case: one equal sign at the end = 2 bytes remaining
|
||||
%
|
||||
% input (characters) ->
|
||||
% W X Y =
|
||||
% convert to numbers ->
|
||||
% abcdef gh1234 5678__ =
|
||||
% NW NX NY
|
||||
% regroup ->
|
||||
% abcdefgh 12345678 __ abcdef gh1234 5678__
|
||||
% << B1:8, B2:8, 0:2 >> = << NW:6, NX:6 NY:6 >>
|
||||
dec([W, X, Y, $=], Acc) ->
|
||||
NW = char2int(W),
|
||||
NX = char2int(X),
|
||||
NY = char2int(Y),
|
||||
<<B1:8, B2:8, 0:2>> = <<NW:6, NX:6, NY:6>>,
|
||||
<<Acc/binary, B1:8, B2:8>>;
|
||||
% terminal case: 0 bytes remaining
|
||||
% nothing to do
|
||||
dec([], Acc) ->
|
||||
Acc;
|
||||
% general case: no equal signs = 3 or more bytes remaining
|
||||
%
|
||||
% input (characters) ->
|
||||
% W X Y Z
|
||||
% convert to numbers ->
|
||||
% abcdef gh1234 5678ab cdefgh
|
||||
% NW NX NY NZ
|
||||
% decompose ->
|
||||
% abcdefgh 12345678 abcdefgh abcdef gh1234 5678ab cdefgh
|
||||
% << B1:8, B2:8, B3:2 >> = << NW:6, NX:6 NY:6, NZ:6 >>
|
||||
dec([W, X, Y, Z | Rest], Acc) ->
|
||||
NW = char2int(W),
|
||||
NX = char2int(X),
|
||||
NY = char2int(Y),
|
||||
NZ = char2int(Z),
|
||||
NewAcc = <<Acc/binary, NW:6, NX:6, NY:6, NZ:6>>,
|
||||
dec(Rest, NewAcc).
|
||||
```
|
||||
|
||||
This is marginally trickier because
|
||||
|
||||
1. it needs an accumulator
|
||||
|
||||
2. the order of the cases matters
|
||||
|
||||
|
||||
## Tables etc
|
||||
|
||||
### Base64 Alphabet
|
||||
|
||||
``` {.Erlang language="Erlang"}
|
||||
int2char( 0) -> $A;
|
||||
int2char( 1) -> $B;
|
||||
int2char( 2) -> $C;
|
||||
int2char( 3) -> $D;
|
||||
int2char( 4) -> $E;
|
||||
int2char( 5) -> $F;
|
||||
int2char( 6) -> $G;
|
||||
int2char( 7) -> $H;
|
||||
int2char( 8) -> $I;
|
||||
int2char( 9) -> $J;
|
||||
int2char(10) -> $K;
|
||||
int2char(11) -> $L;
|
||||
int2char(12) -> $M;
|
||||
int2char(13) -> $N;
|
||||
int2char(14) -> $O;
|
||||
int2char(15) -> $P;
|
||||
int2char(16) -> $Q;
|
||||
int2char(17) -> $R;
|
||||
int2char(18) -> $S;
|
||||
int2char(19) -> $T;
|
||||
int2char(20) -> $U;
|
||||
int2char(21) -> $V;
|
||||
int2char(22) -> $W;
|
||||
int2char(23) -> $X;
|
||||
int2char(24) -> $Y;
|
||||
int2char(25) -> $Z;
|
||||
int2char(26) -> $a;
|
||||
int2char(27) -> $b;
|
||||
int2char(28) -> $c;
|
||||
int2char(29) -> $d;
|
||||
int2char(30) -> $e;
|
||||
int2char(31) -> $f;
|
||||
int2char(32) -> $g;
|
||||
int2char(33) -> $h;
|
||||
int2char(34) -> $i;
|
||||
int2char(35) -> $j;
|
||||
int2char(36) -> $k;
|
||||
int2char(37) -> $l;
|
||||
int2char(38) -> $m;
|
||||
int2char(39) -> $n;
|
||||
int2char(40) -> $o;
|
||||
int2char(41) -> $p;
|
||||
int2char(42) -> $q;
|
||||
int2char(43) -> $r;
|
||||
int2char(44) -> $s;
|
||||
int2char(45) -> $t;
|
||||
int2char(46) -> $u;
|
||||
int2char(47) -> $v;
|
||||
int2char(48) -> $w;
|
||||
int2char(49) -> $x;
|
||||
int2char(50) -> $y;
|
||||
int2char(51) -> $z;
|
||||
int2char(52) -> $0;
|
||||
int2char(53) -> $1;
|
||||
int2char(54) -> $2;
|
||||
int2char(55) -> $3;
|
||||
int2char(56) -> $4;
|
||||
int2char(57) -> $5;
|
||||
int2char(58) -> $6;
|
||||
int2char(59) -> $7;
|
||||
int2char(60) -> $8;
|
||||
int2char(61) -> $9;
|
||||
int2char(62) -> $+;
|
||||
int2char(63) -> $/.
|
||||
|
||||
char2int($A) -> 0;
|
||||
char2int($B) -> 1;
|
||||
char2int($C) -> 2;
|
||||
char2int($D) -> 3;
|
||||
char2int($E) -> 4;
|
||||
char2int($F) -> 5;
|
||||
char2int($G) -> 6;
|
||||
char2int($H) -> 7;
|
||||
char2int($I) -> 8;
|
||||
char2int($J) -> 9;
|
||||
char2int($K) -> 10;
|
||||
char2int($L) -> 11;
|
||||
char2int($M) -> 12;
|
||||
char2int($N) -> 13;
|
||||
char2int($O) -> 14;
|
||||
char2int($P) -> 15;
|
||||
char2int($Q) -> 16;
|
||||
char2int($R) -> 17;
|
||||
char2int($S) -> 18;
|
||||
char2int($T) -> 19;
|
||||
char2int($U) -> 20;
|
||||
char2int($V) -> 21;
|
||||
char2int($W) -> 22;
|
||||
char2int($X) -> 23;
|
||||
char2int($Y) -> 24;
|
||||
char2int($Z) -> 25;
|
||||
char2int($a) -> 26;
|
||||
char2int($b) -> 27;
|
||||
char2int($c) -> 28;
|
||||
char2int($d) -> 29;
|
||||
char2int($e) -> 30;
|
||||
char2int($f) -> 31;
|
||||
char2int($g) -> 32;
|
||||
char2int($h) -> 33;
|
||||
char2int($i) -> 34;
|
||||
char2int($j) -> 35;
|
||||
char2int($k) -> 36;
|
||||
char2int($l) -> 37;
|
||||
char2int($m) -> 38;
|
||||
char2int($n) -> 39;
|
||||
char2int($o) -> 40;
|
||||
char2int($p) -> 41;
|
||||
char2int($q) -> 42;
|
||||
char2int($r) -> 43;
|
||||
char2int($s) -> 44;
|
||||
char2int($t) -> 45;
|
||||
char2int($u) -> 46;
|
||||
char2int($v) -> 47;
|
||||
char2int($w) -> 48;
|
||||
char2int($x) -> 49;
|
||||
char2int($y) -> 50;
|
||||
char2int($z) -> 51;
|
||||
char2int($0) -> 52;
|
||||
char2int($1) -> 53;
|
||||
char2int($2) -> 54;
|
||||
char2int($3) -> 55;
|
||||
char2int($4) -> 56;
|
||||
char2int($5) -> 57;
|
||||
char2int($6) -> 58;
|
||||
char2int($7) -> 59;
|
||||
char2int($8) -> 60;
|
||||
char2int($9) -> 61;
|
||||
char2int($+) -> 62;
|
||||
char2int($/) -> 63.
|
||||
```
|
||||
|
Loading…
x
Reference in New Issue
Block a user