gex/README.md
Peter Harpending 9c5b332e00 readme
2025-09-23 16:13:19 -07:00

# gex = gajumaru exchange
Currently there is only one thing, which is the Gajumaru HTTP Daemon.
## How to run `gex_httpd`
Last updated: September 23, 2025 (PRH).
### Install Erlang and zx/zomp
Source: [*Building Erlang 26.2.5 on Ubuntu 24.04*](https://zxq9.com/archives/2905)
Adapt this to your Linux distribution.
1. **Install necessary build tools**
```bash
sudo apt update
sudo apt upgrade
sudo apt install \
gcc curl g++ dpkg-dev build-essential automake autoconf \
libncurses-dev libssl-dev flex xsltproc libwxgtk3.2-dev \
wget vim git
```
2. **Put [Kerl](https://github.com/kerl/kerl) somewhere
in your `$PATH`**. This is a tool to build Erlang releases.
```bash
## make sure ~/bin exists before downloading into it
mkdir -p ~/bin
wget -O ~/bin/kerl https://raw.githubusercontent.com/kerl/kerl/master/kerl
chmod u+x ~/bin/kerl
```
3. **Build Erlang from source using Kerl**
```bash
kerl update releases
## use the most recent one that looks stable
## you do need to type the number twice, that's not a typo
kerl build 28.1 28.1
kerl install 28.1 ~/.erts/28.1
```
4. **Put Erlang in your `$PATH`**
Update `.bashrc` or `.zshrc` (or whatever your shell reads at startup) with the following line:
```bash
. $HOME/.erts/28.1/activate
```
5. **Install zx**
```bash
wget -q https://zxq9.com/projects/zomp/get_zx && bash get_zx
```
6. **Test zx works**
zx installs itself to `~/bin`, so make sure that's in your
`$PATH`.
```bash
zx run erltris
```
## Notes
### Convention: \[brackets\] for jargon
- You know how sometimes people will intermix technical jargon which has a very
specific context-local definition with common parlance?
- Our notational convention is to put \[jargon terms\] in square brackets to warn
the reader that the word is meant in some extremely precise technical sense,
and the word doesn't necessarily mean what the dictionary says it means.
- Specifically, \[supervisor\] is a jargon term that is standard in Erlang.
- Do not confuse \[supervisor\] with \[manager\]. \[Manager\] is AFAIK
Craig-specific nomenclature.
- A \[supervisor\] is (roughly) a process that is in charge of a bunch of child
processes. It is responsible for restarting processes that crash. (Yes Craig
I know it's more nuanced than that).
- The other common pattern is a \[`gen_server`\].
- If all you take away from this document is that erlang has things called
\[supervisor\]s and things called \[`gen_server`\]s, consider that a good
day.
### Big Picture: telnet chat server -> HTTP server
- The default project (see initial commit) is a telnet echo server. It's about
the most bare-bones, low-budget chat server imaginable.
![screenshot of `gex_httpd` running](./etc/gh-telnet.png)
Lefty and middley can chat just like normal.
However, righty (`curl`) foolishly thinks he is talking to an HTTP server.
His request is echoed to lefty and middley.
Curl crashed because instead of a valid HTTP response back, he got something
like
```
Message from YOU: GET / HTTP/1.1
```
- We make this into an HTTP server by replacing the "echo my message to
everyone else" logic with "parse this message as an HTTP request and send
back an HTTP response" logic.
- Our "application logic" or "business logic" or whatever is contained in that
process of how the request is mapped to a response.
- It really is not more complicated than that.
### Basics of Erlang Processes
These are heuristics that are good starting points:
- each module ~= 1 process
- it helps to think of erlang as an operating system, and erlang modules as
shell scripts that run in that operating system.
- some modules correspond to fungible processes, some are non-fungible
- in Observer (`observer:start()`)
- named processes are non-fungible (e.g. `gh_client_sup`)
- the name can be anything, but conventionally it's the module name
- fungible processes are identified only by their numbers (PIDs) (e.g. the
`gh_client` processes, which are the Erlang processes that were
on the other end of the conversation with the `telnet`
windows)
- named processes also have PIDs, they just also have names
![](./etc/observer.png)
- you will want to get in the habit, any time you read code, of asking
what process context the code is running in.
- it is **NOT** the case that all code in module `foo` runs inside the context
of process `foo`. It is **very** important that you make sure you understand
that distinction, and always know where code is running.
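To make that distinction concrete, here is a minimal sketch. The module name `ctx_sketch` and both of its functions are made up for illustration and are not part of gex. The same function, `whoami/0`, is called from two different processes and reports a different PID each time, because `self()` is whoever is running the code, not the module the code lives in.

```erlang
-module(ctx_sketch).
-export([whoami/0, demo/0]).

%% Returns the PID of whichever process happens to call this function.
%% There is no "ctx_sketch process" anywhere in this example.
whoami() ->
    self().

%% Calls whoami/0 once from the current process and once from a freshly
%% spawned process, and returns both answers.
demo() ->
    Caller = self(),
    Here = whoami(),
    _ = spawn(fun() -> Caller ! {there, whoami()} end),
    There =
        receive
            {there, Pid} -> Pid
        after 1000 ->
            timeout
        end,
    {Here, There}.
```

Running `ctx_sketch:demo()` from the shell gives back two different PIDs, and the first one is the shell's own.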
### Following the call chain of `gex_httpd:listen(8080)`
- Reference commit: `49a09d192c6f2380c5186ec7d81e98785d667214`
- By default, the telnet server doesn't occupy a port
- `gex_httpd:listen(8080)` tells it to listen on port 8080
```erlang
%% gex_httpd.erl
-spec listen(PortNum) -> Result
    when PortNum :: inet:port_number(),
         Result  :: ok
                  | {error, {listening, inet:port_number()}}.
%% @doc
%% Make the server start listening on a port.
%% Returns an {error, Reason} tuple if it is already listening.
listen(PortNum) ->
    gh_client_man:listen(PortNum).

%% gh_client_man.erl
listen(PortNum) ->
    gen_server:call(?MODULE, {listen, PortNum}).
```
- So this is a bit tricky.
- The code inside that function runs in the context of the `gex_httpd`
process (or whatever calling process)
- the effect of that code is to send a message to the `gh_client_man`
process (`= ?MODULE`)
- that message is `{listen, 8080}`
- in general, it's `gen_server:call(PID, Message)`
- every process has a "mailbox" of messages. the process usually just sits
there doing nothing until it gets a message, and then does something
deterministically in response to the message.
- `gen_server`, \[`supervisor`\], etc, all are standard library factoring-outs
of common patterns of process configuration
- The low-level primitive to receive messages is `receive`. We'll see its
use later when we look at the `gh_client` code.
- See [pingpong example](./etc/pingpong.erl) for a simplified example.
[Permalink.](https://git.qpq.swiss/QPQ-AG/gex/src/commit/28193491e7dabadac9bac756b44ee2ea3869eede/etc/pingpong.erl)
- All of this `gen_server` nonsense is a bunch of boilerplate that rewrites
to a bunch of `receive`s
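As a rough illustration of what that boilerplate boils down to, here is a hand-rolled "call" built from nothing but `!` and `receive`. This is a sketch, not how `gen_server` is actually implemented (the real thing also handles monitors, system messages, and more). The module name `call_sketch`, the `bump` request, and the message shapes are all invented for this example.

```erlang
-module(call_sketch).
-export([start/0, call/2, stop/1]).

%% Spawn a tiny server process whose whole state is a counter.
start() ->
    spawn(fun() -> loop(0) end).

%% Hand-rolled "call": send a tagged request, then block until the reply
%% carrying the same tag shows up in our mailbox.
call(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {call, self(), Ref, Request},
    receive
        {reply, Ref, Reply} -> Reply
    after 5000 ->
        exit(timeout)
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

%% The server sits in receive until a message arrives, reacts, and loops.
loop(Count) ->
    receive
        {call, From, Ref, bump} ->
            From ! {reply, Ref, Count + 1},
            loop(Count + 1);
        {call, From, Ref, _Other} ->
            From ! {reply, Ref, {error, unknown_request}},
            loop(Count);
        stop ->
            ok
    end.
```

The `call/2` function runs in the caller's process and `loop/1` runs in the spawned one, which is the same split as `gh_client_man:listen/1` versus `handle_call/3`.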
#### Very Important: casts v. calls
- **Very important:** sometimes you will also see
`gen_server:cast(PID, Message)`.
It's very important that you understand the difference.
- So in our example
- `gex_httpd` makes a call to `gh_client_man`
- `gex_httpd` sends the message `{listen, 8080}` to `gh_client_man`
- he is going to sit there and wait until `gh_client_man` sends him a
message back. this is what makes a call a call. if `gh_client_man` never
responds, `gex_httpd` sits there waiting until the call times out (5
seconds by default with `gen_server:call/2`) and then crashes with a
`timeout` exit. pass `infinity` as the timeout to `gen_server:call/3` and he
really will wait forever, and never move on with his life.
- in a cast, the message is sent and you move on with your day
- think of calls like actual phone calls, where the other person has to
answer, but if they don't, there's no voicemail, the phone just rings
until you give up (after 5 seconds by default; with `infinity` you're stuck
listening to the phone ringing like sisyphus).
- casts are like text messages. You send them. maybe you get a text back. who
knows.
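The difference can be poked at with a toy `gen_server`. This is a minimal sketch: the module `callcast_sketch` and its `never_answered` and `note` messages are invented for illustration. The server deliberately never replies to `never_answered`, so the call times out, while the cast returns `ok` immediately whether or not anyone is listening.

```erlang
-module(callcast_sketch).
-behaviour(gen_server).
-export([start/0, slow_call/2, note/2, notes/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Not linked to the caller, so experiments here can't take the shell down.
start() ->
    gen_server:start(?MODULE, [], []).

%% A call the server never answers. The caller blocks for Timeout
%% milliseconds and then gen_server:call/3 exits with a timeout reason,
%% which we catch so we can look at it.
slow_call(Pid, Timeout) ->
    try
        gen_server:call(Pid, never_answered, Timeout)
    catch
        exit:{timeout, _} -> timed_out
    end.

%% A cast: returns ok right away, no reply is awaited.
note(Pid, Term) ->
    gen_server:cast(Pid, {note, Term}).

%% An ordinary call that does get a reply.
notes(Pid) ->
    gen_server:call(Pid, notes).

init([]) ->
    {ok, []}.

handle_call(notes, _From, Notes) ->
    {reply, lists:reverse(Notes), Notes};
handle_call(never_answered, _From, Notes) ->
    {noreply, Notes}.

handle_cast({note, Term}, Notes) ->
    {noreply, [Term | Notes]}.
```

After `{ok, Pid} = callcast_sketch:start()`, `callcast_sketch:slow_call(Pid, 100)` blocks for about 100 ms and returns `timed_out`, whereas `callcast_sketch:note(Pid, hello)` returns `ok` at once.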
#### Continuing
- Inside of `gh_client_man`'s own process context, it listens for calls in the
`handle_call` function
```erlang
%% gh_client_man.erl
-spec handle_call(Message, From, State) -> Result
    when Message  :: term(),
         From     :: {pid(), reference()},
         State    :: state(),
         Result   :: {reply, Response, NewState}
                   | {noreply, State},
         Response :: ok
                   | {error, {listening, inet:port_number()}},
         NewState :: state().
%% @private
%% The gen_server:handle_call/3 callback.
%% See: http://erlang.org/doc/man/gen_server.html#Module:handle_call-3
handle_call({listen, PortNum}, _, State) ->
    {Response, NewState} = do_listen(PortNum, State),
    {reply, Response, NewState};
handle_call(Unexpected, From, State) ->
    ok = io:format("~p Unexpected call from ~tp: ~tp~n", [self(), From, Unexpected]),
    {noreply, State}.
```
- following the call chain, we look at `gh_client_man:do_listen/2`
This is running inside the `gh_client_man` process context
```erlang
%% gh_client_man.erl
-spec do_listen(PortNum, State) -> {Result, NewState}
    when PortNum  :: inet:port_number(),
         State    :: state(),
         Result   :: ok
                   | {error, Reason :: {listening, inet:port_number()}},
         NewState :: state().
%% @private
%% The "doer" procedure called when a "listen" message is received.
do_listen(PortNum, State = #s{port_num = none}) ->
    SocketOptions =
        [inet6,
         {packet,    line},
         {active,    once},
         {mode,      binary},
         {keepalive, true},
         {reuseaddr, true}],
    {ok, Listener} = gen_tcp:listen(PortNum, SocketOptions),
    {ok, _} = gh_client:start(Listener),
    {ok, State#s{port_num = PortNum, listener = Listener}};
do_listen(_, State = #s{port_num = PortNum}) ->
    ok = io:format("~p Already listening on ~p~n", [self(), PortNum]),
    {{error, {listening, PortNum}}, State}.
```
- If we're already listening (i.e. our state already has a port), we refuse
and send `{error, {listening, PortNum}}` back to the calling process.
- If we don't have a port number, we
- make a TCP listen socket on that port
- start a `gh_client` process which spawns an acceptor socket on the
listen socket (kind of a "subsocket")
- the `gh_client` process is the erlang process that talks to either
the telnet chat clients, or eventually web browsers.
- send `ok` back to whomever called us
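The socket options are worth a quick experiment. The sketch below is self-contained and makes a few assumptions that differ from `do_listen/2`: it listens on port `0` so the OS picks a free port, it uses IPv4 loopback rather than `inet6`, and it plays both the server and the client in one process. The module name `listen_sketch` is hypothetical. It shows two things: `{packet, line}` delivers one line per message, and `{active, once}` delivers exactly one message before the socket has to be re-armed with `inet:setopts/2`.

```erlang
-module(listen_sketch).
-export([demo/0]).

demo() ->
    Options = [{packet, line}, {active, once}, {mode, binary}, {reuseaddr, true}],
    %% Port 0 means "any free port"; inet:port/1 tells us which one we got.
    {ok, Listener} = gen_tcp:listen(0, Options),
    {ok, Port} = inet:port(Listener),
    %% Play the client ourselves. The pending connection waits in the
    %% listen backlog until we accept it.
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
    %% The accepted socket inherits the listen socket's options.
    {ok, Socket} = gen_tcp:accept(Listener),
    ok = gen_tcp:send(Client, <<"one\r\ntwo\r\n">>),
    %% {active, once}: only the first line arrives as a message...
    First = next_line(Socket),
    %% ...and the second stays buffered until we re-arm the socket.
    ok = inet:setopts(Socket, [{active, once}]),
    Second = next_line(Socket),
    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Socket),
    ok = gen_tcp:close(Listener),
    {First, Second}.

next_line(Socket) ->
    receive
        {tcp, Socket, Line} -> Line
    after 2000 ->
        timeout
    end.
```

`listen_sketch:demo()` returns `{<<"one\r\n">>, <<"two\r\n">>}`: two separate messages, even though the client sent both lines in a single `gen_tcp:send/2`.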
- Next let's look at how clients are started up
- `gh_client` is called `gh_client` because from the perspective of our HTTP
daemon, that is a client. `gh_client` is the representation of clients within
the context of our HTTP server.
- analogously, to disambiguate directionality re "encode"/"decode", usually the
directionality is from the perspective of the program. This can be
counterintuitive, because the program's perspective is usually the opposite
of a human's; e.g. Binary data is clear to a program, but its representation
as plain text is opaque.
- In Erlang you always need to think about perspective
- Last call in the call chain was `gh_client:start(Listener)`. We expect it to
return the PID of the process that talks to clients.
```erlang
%% gh_client.erl
-spec start(ListenSocket) -> Result
    when ListenSocket :: gen_tcp:socket(),
         Result       :: {ok, pid()}
                       | {error, Reason},
         Reason       :: {already_started, pid()}
                       | {shutdown, term()}
                       | term().
%% @private
%% How the gh_client_man or a prior gh_client kicks things off.
%% This is called in the context of gh_client_man or the prior gh_client.
start(ListenSocket) ->
    gh_client_sup:start_acceptor(ListenSocket).

%% gh_client_sup.erl
-spec start_acceptor(ListenSocket) -> Result
    when ListenSocket :: gen_tcp:socket(),
         Result       :: {ok, pid()}
                       | {error, Reason},
         Reason       :: {already_started, pid()}
                       | {shutdown, term()}
                       | term().
%% @private
%% Spawns the first listener at the request of the gh_client_man when
%% gex_httpd:listen/1 is called, or the next listener at the request of the
%% currently listening gh_client when a connection is made.
%%
%% Error conditions, supervision strategies and other important issues are
%% explained in the supervisor module docs:
%% http://erlang.org/doc/man/supervisor.html
start_acceptor(ListenSocket) ->
    supervisor:start_child(?MODULE, [ListenSocket]).
```
- Reference:
![](./etc/observer.png)
- `gh_client_sup` is the \[supervisor\] responsible for restarting client
processes when they crash.
- he is tasked at this moment with starting one. Let's see how that goes
- If we look in the configuration for `gh_client_sup`, we see this:
```erlang
-spec init(none) -> {ok, {supervisor:sup_flags(), [supervisor:child_spec()]}}.
%% @private
%% The OTP init/1 function.
init(none) ->
    RestartStrategy = {simple_one_for_one, 1, 60},
    Client = {gh_client,
              {gh_client, start_link, []},
              temporary,
              brutal_kill,
              worker,
              [gh_client]},
    {ok, {RestartStrategy, [Client]}}.
```
- my eyes are drawn to `{gh_client, start_link, []}`
- probably that's what's called to spawn one of the worker processes
- let's look
```erlang
%% gh_client.erl
-spec start_link(ListenSocket) -> Result
    when ListenSocket :: gen_tcp:socket(),
         Result       :: {ok, pid()}
                       | {error, Reason},
         Reason       :: {already_started, pid()}
                       | {shutdown, term()}
                       | term().
%% @private
%% This is called by the gh_client_sup. While start/1 is called to initiate a startup
%% (essentially requesting a new worker be started by the supervisor), this is
%% actually called in the context of the supervisor.
start_link(ListenSocket) ->
    proc_lib:start_link(?MODULE, init, [self(), ListenSocket]).
```
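One detail that is easy to miss: with a `simple_one_for_one` \[supervisor\], the list passed to `supervisor:start_child/2` is appended to the argument list in the child spec. That is how `{gh_client, start_link, []}` plus `[ListenSocket]` becomes a call to `gh_client:start_link(ListenSocket)`. Here is a minimal sketch of that mechanism. The module `sofo_sketch`, the `fixed` and `Extra` arguments, and the toy worker are all made up for illustration, and the sketch uses the map form of the child spec rather than the tuple form used in `gh_client_sup`.

```erlang
-module(sofo_sketch).
-behaviour(supervisor).
-export([start_link/0, start_worker/1, args/1]).
-export([init/1, worker_start_link/2]).

%% Start the supervisor under a registered name.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, none).

%% Ask the supervisor for one more worker. [Extra] is appended to the
%% argument list in the child spec below.
start_worker(Extra) ->
    supervisor:start_child(?MODULE, [Extra]).

%% Ask a worker which arguments it was started with.
args(Worker) ->
    Worker ! {args, self()},
    receive
        {args, Worker, Args} -> Args
    after 1000 ->
        timeout
    end.

init(none) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 1, period => 60},
    Worker = #{id       => worker,
               start    => {?MODULE, worker_start_link, [fixed]},
               restart  => temporary,
               shutdown => brutal_kill,
               type     => worker},
    {ok, {SupFlags, [Worker]}}.

%% Runs in the supervisor's context, called as worker_start_link(fixed, Extra).
worker_start_link(Fixed, Extra) ->
    Pid = proc_lib:spawn_link(fun() -> worker_loop({Fixed, Extra}) end),
    {ok, Pid}.

worker_loop(Args) ->
    receive
        {args, From} ->
            From ! {args, self(), Args},
            worker_loop(Args)
    end.
```

After `sofo_sketch:start_link()` and `{ok, W} = sofo_sketch:start_worker(extra)`, calling `sofo_sketch:args(W)` returns `{fixed, extra}`: the spec's argument first, then the one from `start_child/2`.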
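- Any time you see a 3-tuple of `{Module, FunctionName, ArgumentList}`,
probably that's information about how to call some function
- In this case, the tuple says "to start one of the `gh_client` processes,
call `gh_client:start_link`". Because the \[supervisor\] is
`simple_one_for_one`, the `[ListenSocket]` given to
`supervisor:start_child/2` is appended to the empty argument list, so the
actual call is `gh_client:start_link(ListenSocket)`.
- `start_link/1` runs in the \[supervisor\]'s context. It uses
`proc_lib:start_link/3` to spawn the new process, and the first thing that
new process runs is `gh_client:init(SupervisorPID, ListenSocket)`.
Let's take a look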
```erlang
%% gh_client.erl
-spec init(Parent, ListenSocket) -> no_return()
    when Parent       :: pid(),
         ListenSocket :: gen_tcp:socket().
%% @private
%% This is the first code executed in the context of the new worker itself.
%% This function does not have any return value, as the startup return is
%% passed back to the supervisor by calling proc_lib:init_ack/2.
%% We see the initial form of the typical arity-3 service loop form here in the
%% call to listen/3.
init(Parent, ListenSocket) ->
    ok = io:format("~p Listening.~n", [self()]),
    Debug = sys:debug_options([]),
    ok = proc_lib:init_ack(Parent, {ok, self()}),
    listen(Parent, Debug, ListenSocket).
```
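The `proc_lib:start_link/3` and `proc_lib:init_ack/2` pair is a small handshake: the starter blocks until the new process acknowledges, and whatever the new process passes to `init_ack/2` becomes the starter's return value. A stripped-down sketch of just that handshake follows. The module `ack_sketch`, the `Greeting` argument, and the `greet` message are invented for this example.

```erlang
-module(ack_sketch).
-export([start_link/1, greet/1, init/2]).

%% Runs in the starter's process. Blocks until the new process calls
%% proc_lib:init_ack/2, then returns whatever was passed to it.
start_link(Greeting) ->
    proc_lib:start_link(?MODULE, init, [self(), Greeting]).

%% Runs in the caller's process: ask the worker for its greeting.
greet(Pid) ->
    Pid ! {greet, self()},
    receive
        {greeting, Pid, Greeting} -> Greeting
    after 1000 ->
        timeout
    end.

%% First code to run inside the new process. It never returns to its
%% caller; it acknowledges startup and then drops into its loop.
init(Parent, Greeting) ->
    ok = proc_lib:init_ack(Parent, {ok, self()}),
    loop(Greeting).

loop(Greeting) ->
    receive
        {greet, From} ->
            From ! {greeting, self(), Greeting},
            loop(Greeting)
    end.
```

`{ok, Pid} = ack_sketch:start_link(hello)` returns only after `init/2` has run far enough to acknowledge, and `ack_sketch:greet(Pid)` then returns `hello`.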
- Ok let's look at the `listen/3` function
```erlang
-spec listen(Parent, Debug, ListenSocket) -> no_return()
    when Parent       :: pid(),
         Debug        :: [sys:dbg_opt()],
         ListenSocket :: gen_tcp:socket().
%% @private
%% This function waits for a TCP connection. The owner of the socket is still
%% the gh_client_man (so it can still close it on a call to gh_client_man:ignore/0),
%% but the only one calling gen_tcp:accept/1 on it is this process. Closing the socket
%% is one way a manager process can gracefully unblock child workers that are blocking
%% on a network accept.
%%
%% Once it makes a TCP connection it will call start/1 to spawn its successor.
listen(Parent, Debug, ListenSocket) ->
    case gen_tcp:accept(ListenSocket) of
        {ok, Socket} ->
            {ok, _} = start(ListenSocket),
            {ok, Peer} = inet:peername(Socket),
            ok = io:format("~p Connection accepted from: ~p~n", [self(), Peer]),
            ok = gh_client_man:enroll(),
            State = #s{socket = Socket},
            loop(Parent, Debug, State);
        {error, closed} ->
            ok = io:format("~p Retiring: Listen socket closed.~n", [self()]),
            exit(normal)
     end.
```
- The lines that jump out to me are
```erlang
ok = gh_client_man:enroll(),
State = #s{socket = Socket},
loop(Parent, Debug, State);
```
- The `gh_client_man` module is responsible for keeping track of all the
running clients. So probably `gh_client_man:enroll()` is just
informing `gh_client_man` that this `gh_client` instance exists.
If we look, that's precisely what's happening
```erlang
%% gh_client_man.erl
%% remember, enroll/0 is running in the context of the calling code, and
%% do_enroll/2 is running in the context of the gh_client_man process
-spec enroll() -> ok.
%% @doc
%% Clients register here when they establish a connection.
%% Other processes can enroll as well.
enroll() ->
    gen_server:cast(?MODULE, {enroll, self()}).
%% ...
-spec do_enroll(Pid, State) -> NewState
    when Pid      :: pid(),
         State    :: state(),
         NewState :: state().
do_enroll(Pid, State = #s{clients = Clients}) ->
    case lists:member(Pid, Clients) of
        false ->
            Mon = monitor(process, Pid),
            ok = io:format("Monitoring ~tp @ ~tp~n", [Pid, Mon]),
            State#s{clients = [Pid | Clients]};
        true ->
            State
    end.
```
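The `monitor(process, Pid)` call is what lets the manager find out when an enrolled client goes away: when the monitored process exits, the monitoring process gets a `'DOWN'` message in its mailbox. The handler for that message is not shown in this walkthrough, so the following is only a standalone sketch of the mechanism, with a made-up module name `monitor_sketch`.

```erlang
-module(monitor_sketch).
-export([demo/0]).

%% Spawn a process, monitor it, tell it to stop, and wait for the
%% 'DOWN' message that the runtime sends to whoever holds the monitor.
demo() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    Mon = monitor(process, Pid),
    Pid ! stop,
    receive
        {'DOWN', Mon, process, Pid, Reason} ->
            {down, Reason}
    after 1000 ->
        timeout
    end.
```

`monitor_sketch:demo()` returns `{down, normal}`, because the spawned process simply ran off the end of its function. Unlike a link, a monitor is one-directional: the watcher is told about the exit but is not taken down with it.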
- Next line is `loop(Parent, Debug, State)`. Let's look at `gh_client:loop/3`
```erlang
-spec loop(Parent, Debug, State) -> no_return()
    when Parent :: pid(),
         Debug  :: [sys:dbg_opt()],
         State  :: state().
%% @private
%% The service loop itself. This is the service state. The process blocks on receive
%% of Erlang messages, TCP segments being received themselves as Erlang messages.
loop(Parent, Debug, State = #s{socket = Socket}) ->
    ok = inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, <<"bye\r\n">>} ->
            ok = io:format("~p Client saying goodbye. Bye!~n", [self()]),
            ok = gen_tcp:send(Socket, "Bye!\r\n"),
            ok = gen_tcp:shutdown(Socket, read_write),
            exit(normal);
        {tcp, Socket, Message} ->
            ok = io:format("~p received: ~tp~n", [self(), Message]),
            ok = gh_client_man:echo(Message),
            loop(Parent, Debug, State);
        {relay, Sender, Message} when Sender == self() ->
            ok = gen_tcp:send(Socket, ["Message from YOU: ", Message]),
            loop(Parent, Debug, State);
        {relay, Sender, Message} ->
            From = io_lib:format("Message from ~tp: ", [Sender]),
            ok = gen_tcp:send(Socket, [From, Message]),
            loop(Parent, Debug, State);
        {tcp_closed, Socket} ->
            ok = io:format("~p Socket closed, retiring.~n", [self()]),
            exit(normal);
        {system, From, Request} ->
            sys:handle_system_msg(Request, From, Parent, ?MODULE, Debug, State);
        Unexpected ->
            ok = io:format("~p Unexpected message: ~tp~n", [self(), Unexpected]),
            loop(Parent, Debug, State)
    end.
```
- I'll let you figure this one out
- I think the picture is clear. There's a lot of moving parts, but the basic
principle is as follows:
- `gh_client` instances are like infinity-spawn receptionists. Every time a
web browser wants to talk to our server, we spawn a `gh_client` instance
that talks to the web browser.
- `gh_client_man` is responsible for any logic that spans across different
`gh_client` instances (e.g. relaying messages).
- Everything else is boilerplate
- So our task is to remove the relay-messages logic, and replace it with http
parse/respond logic.
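As a rough idea of what that replacement could look like, here is a sketch of the "parse and respond" step for a single request line. It is not the project's implementation. The module name `http_sketch` and the response bodies are invented, and it leans on the built-in `erlang:decode_packet/3` with the `http_bin` packet type to do the parsing. It also ignores request headers entirely, which a real server would have to read (up to the blank line) before answering.

```erlang
-module(http_sketch).
-export([respond/1]).

%% Takes one request line, as a {packet, line} socket would deliver it,
%% and returns iodata suitable for gen_tcp:send/2.
respond(Line) when is_binary(Line) ->
    case erlang:decode_packet(http_bin, Line, []) of
        {ok, {http_request, 'GET', {abs_path, Path}, _Version}, _Rest} ->
            reply(200, <<"OK">>, [<<"You asked for ">>, Path, <<"\n">>]);
        {ok, {http_request, _Method, _Uri, _Version}, _Rest} ->
            reply(405, <<"Method Not Allowed">>, <<"Only GET is sketched here\n">>);
        _NotARequestLine ->
            reply(400, <<"Bad Request">>, <<"Could not parse request line\n">>)
    end.

%% Build a minimal HTTP/1.1 response: status line, a few headers,
%% a blank line, then the body.
reply(Code, Reason, Body) ->
    Length = iolist_size(Body),
    [<<"HTTP/1.1 ">>, integer_to_binary(Code), <<" ">>, Reason, <<"\r\n">>,
     <<"Content-Type: text/plain\r\n">>,
     <<"Content-Length: ">>, integer_to_binary(Length), <<"\r\n">>,
     <<"Connection: close\r\n\r\n">>,
     Body].
```

In the `{tcp, Socket, Message}` clause of `loop/3`, something along the lines of `gen_tcp:send(Socket, http_sketch:respond(Message))` would take the place of `gh_client_man:echo(Message)`. With this in place, the request line that `curl` sent in the screenshot above would get a `200 OK` back instead of a chat message.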