diff --git a/README.md b/README.md
index d81e5e1..2c5162a 100644
--- a/README.md
+++ b/README.md
@@ -68,6 +68,24 @@ Adapt this to your Linux distribution.
 
 ## Notes
 
+### Convention: \[brackets\] for jargon
+
+- You know how sometimes people will intermix technical jargon which has a very
+  specific context-local definition with common parlance?
+- Our notational convention is to put \[jargon terms\] in square brackets to warn
+  the reader that the word is meant in some extremely precise technical sense,
+  and the word doesn't necessarily mean what the dictionary says it means.
+- Specifically, \[supervisor\] is a jargon term that is standard in Erlang.
+- Do not confuse \[supervisor\] with \[manager\]. \[Manager\] is AFAIK
+  Craig-specific nomenclature.
+- A \[supervisor\] is (roughly) a process that is in charge of a bunch of child
+  processes. It is responsible for restarting processes that crash. (Yes Craig
+  I know it's more nuanced than that).
+- The other common pattern is a \[`gen_server`\].
+- If all you take away from this document is that Erlang has things called
+  \[supervisor\]s and things called \[`gen_server`\]s, consider that a good
+  day.
+
 ### Big Picture: telnet chat server -> HTTP server
 
 - The default project (see initial commit) is a telnet echo server. It's like
@@ -105,30 +123,441 @@ These are heuristics that are good starting points
 - in Observer (`observer:start()`)
   - named processes are non-fungible (e.g. `gh_client_sup`)
     - the name can be anything, but conventionally it's the module name
-  - fungible processes have numbers (PIDs) (e.g. the `gh_client` code)
+  - fungible processes have numbers (PIDs) (e.g. the
+    `gh_client` code, which is the Erlang process that was
+    on the other end of the conversation with the `telnet`
+    windows)
   - named processes also have PIDs, they just also have names
 
 ![](./etc/observer.png)
 
+- you will want to get in the habit, any time you read code, of asking
+  what process context the code is running in.
+- it is **NOT** the case that all code in module `foo` runs inside the context
+  of process `foo`. It is **very** important that you make sure you understand
+  that distinction, and always know where code is running.
 
 ### Following the call chain of `gex_httpd:listen(8080)`
 
-- Reference commit: `49a09d192c6f2380c5186ec7d81e98785d667214`
-- By default, the telnet server doesn't occupy a port
-- `gex_httpd:listen(8080)` tells it to listen on port 8080
+- Reference commit: `49a09d192c6f2380c5186ec7d81e98785d667214`
+- By default, the telnet server doesn't occupy a port
+- `gex_httpd:listen(8080)` tells it to listen on port 8080
+
+    ```erlang
+    %% gex_httpd.erl
+    -spec listen(PortNum) -> Result
+        when PortNum :: inet:port_num(),
+             Result  :: ok
+                      | {error, {listening, inet:port_num()}}.
+    %% @doc
+    %% Make the server start listening on a port.
+    %% Returns an {error, Reason} tuple if it is already listening.
+
+    listen(PortNum) ->
+        gh_client_man:listen(PortNum).
+
+
+    %% gh_client_man:listen(8080)
+    listen(PortNum) ->
+        gen_server:call(?MODULE, {listen, PortNum}).
+    ```
+
+- So this is a bit tricky.
+- The code inside that function runs in the context of the `gex_httpd`
+  process (or whatever calling process)
+
+    - the effect of that code is to send a message to the `gh_client_man`
+      process (`= ?MODULE`)
+    - that message is `{listen, 8080}`
+    - in general, it's `gen_server:call(PID, Message)`
+    - every process has a "mailbox" of messages. the process usually just sits
+      there doing nothing until it gets a message, and then does something
+      deterministically in response to the message.
+    - `gen_server`, \[`supervisor`\], etc, all are standard library factoring-outs
+      of common patterns of process configuration
+    - The low-level primitive to receive messages is `receive`. We'll see its
+      use later when we look at the `gh_client` code.
+    - See [pingpong example](./etc/pingpong.erl) for a simplified example.
+      [Permalink.](https://git.qpq.swiss/QPQ-AG/gex/src/commit/28193491e7dabadac9bac756b44ee2ea3869eede/etc/pingpong.erl)
+    - All of this `gen_server` nonsense is a bunch of boilerplate that rewrites
+      to a bunch of `receive`s
+
+#### Very Important: casts v. calls
+
+- **Very important:** sometimes you will also see
+  `gen_server:cast(PID, Message)`.
+
+  It's very important that you understand the difference
+
+- So in our example
+    - `gex_httpd` makes a call to `gh_client_man`
+    - `gex_httpd` sends the message `{listen, 8080}` to `gh_client_man`
+    - he is going to sit there and wait until `gh_client_man` sends him a
+      message back. this is what makes a call a call. if `gh_client_man` never
+      responds, `gex_httpd` will just sit there forever waiting, and never move
+      on with his life.
+
+- in a cast, the message is sent and you move on with your day
+- think of calls like actual phone calls, where the other person has to
+  answer, but if they don't, there's no voicemail, the phone just rings
+  forever and you're just stuck listening to the phone ringing like Sisyphus
+  (simplifying: `gen_server:call/2` really has a default timeout of 5 seconds,
+  after which the call fails in the caller instead of waiting forever).
+- casts are like text messages. You send them. maybe you get a text back. who
+  knows.
+
+#### Continuing
+
+- Inside of `gh_client_man`'s own process context, it listens for calls in the
+  `handle_call` function
+
+    ```erlang
+    %% gh_client_man.erl
+
+    -spec handle_call(Message, From, State) -> Result
+        when Message  :: term(),
+             From     :: {pid(), reference()},
+             State    :: state(),
+             Result   :: {reply, Response, NewState}
+                       | {noreply, State},
+             Response :: ok
+                       | {error, {listening, inet:port_number()}},
+             NewState :: state().
+    %% @private
+    %% The gen_server:handle_call/3 callback.
+    %% See: http://erlang.org/doc/man/gen_server.html#Module:handle_call-3
+
+    handle_call({listen, PortNum}, _, State) ->
+        {Response, NewState} = do_listen(PortNum, State),
+        {reply, Response, NewState};
+    handle_call(Unexpected, From, State) ->
+        ok = io:format("~p Unexpected call from ~tp: ~tp~n", [self(), From, Unexpected]),
+        {noreply, State}.
+    ```
+
+- following the call chain, we look at `gh_client_man:do_listen/2`
+
+  This is running inside the `gh_client_man` process context
+
+    ```erlang
+    %% gh_client_man.erl
+
+    -spec do_listen(PortNum, State) -> {Result, NewState}
+        when PortNum  :: inet:port_number(),
+             State    :: state(),
+             Result   :: ok
+                       | {error, Reason :: {listening, inet:port_number()}},
+             NewState :: state().
+    %% @private
+    %% The "doer" procedure called when a "listen" message is received.
+
+    do_listen(PortNum, State = #s{port_num = none}) ->
+        SocketOptions =
+            [inet6,
+             {packet, line},
+             {active, once},
+             {mode, binary},
+             {keepalive, true},
+             {reuseaddr, true}],
+        {ok, Listener} = gen_tcp:listen(PortNum, SocketOptions),
+        {ok, _} = gh_client:start(Listener),
+        {ok, State#s{port_num = PortNum, listener = Listener}};
+    do_listen(_, State = #s{port_num = PortNum}) ->
+        ok = io:format("~p Already listening on ~p~n", [self(), PortNum]),
+        {{error, {listening, PortNum}}, State}.
+    ```
+
+    - If we're already listening (i.e. our state already has a port), we tell
+      the calling process to fuck off (i.e. we reply
+      `{error, {listening, PortNum}}`).
+
+    - If we don't have a port number, we
+        - make a TCP listen socket on that port
+        - start a `gh_client` process which spawns an acceptor socket on the
+          listen socket (kind of a "subsocket")
+            - the `gh_client` process is the Erlang process that talks to either
+              the telnet chat clients, or eventually web browsers.
+        - send `ok` back to whoever called us
+
+
+- Next let's look at how clients are started up
+
+- `gh_client` is called `gh_client` because from the perspective of our HTTP
+  daemon, that is a client.
+  `gh_client` is the representation of clients within
+  the context of our HTTP server.
+- analogously, to disambiguate directionality re "encode"/"decode", usually the
+  directionality is from the perspective of the program. This can be
+  counterintuitive, because the program's perspective is usually the opposite
+  of a human's; e.g. binary data is clear to a program, but its representation
+  as plain text is opaque.
+- In Erlang you always need to think about perspective
+
+- Last call in the call chain was `gh_client:start(Listener)`. We expect it to
+  return the PID of the process that talks to clients.
 
 ```
-%% gex_httpd.erl
--spec listen(PortNum) -> Result
-    when PortNum :: inet:port_num(),
-         Result  :: ok
-                  | {error, {listening, inet:port_num()}}.
-%% @doc
-%% Make the server start listening on a port.
-%% Returns an {error, Reason} tuple if it is already listening.
+%% gh_client.erl
 
-listen(PortNum) ->
-    gh_client_man:listen(PortNum).
+-spec start(ListenSocket) -> Result
+    when ListenSocket :: gen_tcp:socket(),
+         Result       :: {ok, pid()}
+                       | {error, Reason},
+         Reason       :: {already_started, pid()}
+                       | {shutdown, term()}
+                       | term().
+%% @private
+%% How the gh_client_man or a prior gh_client kicks things off.
+%% This is called in the context of gh_client_man or the prior gh_client.
+
+start(ListenSocket) ->
+    gh_client_sup:start_acceptor(ListenSocket).
 
+%% gh_client_sup.erl
+
+-spec start_acceptor(ListenSocket) -> Result
+    when ListenSocket :: gen_tcp:socket(),
+         Result       :: {ok, pid()}
+                       | {error, Reason},
+         Reason       :: {already_started, pid()}
+                       | {shutdown, term()}
+                       | term().
+%% @private
+%% Spawns the first listener at the request of the gh_client_man when
+%% gex_httpd:listen/1 is called, or the next listener at the request of the
+%% currently listening gh_client when a connection is made.
+%%
+%% Error conditions, supervision strategies and other important issues are
+%% explained in the supervisor module docs:
+%% http://erlang.org/doc/man/supervisor.html
+
+start_acceptor(ListenSocket) ->
+    supervisor:start_child(?MODULE, [ListenSocket]).
+```
+
+- Reference:
+  ![](./etc/observer.png)
+
+- `gh_client_sup` is the \[supervisor\] responsible for the client processes.
+  (A \[supervisor\] normally restarts children that crash, but these workers are
+  declared `temporary` in the child spec below, so they are not restarted.)
+- he is tasked at this moment with starting one. Let's see how that goes
+
+    ```erlang
+    %% gh_client_sup.erl
+
+    -spec start_acceptor(ListenSocket) -> Result
+        when ListenSocket :: gen_tcp:socket(),
+             Result       :: {ok, pid()}
+                           | {error, Reason},
+             Reason       :: {already_started, pid()}
+                           | {shutdown, term()}
+                           | term().
+    %% @private
+    %% Spawns the first listener at the request of the gh_client_man when
+    %% gex_httpd:listen/1 is called, or the next listener at the request of the
+    %% currently listening gh_client when a connection is made.
+    %%
+    %% Error conditions, supervision strategies and other important issues are
+    %% explained in the supervisor module docs:
+    %% http://erlang.org/doc/man/supervisor.html
+
+    start_acceptor(ListenSocket) ->
+        supervisor:start_child(?MODULE, [ListenSocket]).
+    ```
+
+- If we look in the configuration for `gh_client_sup`, we see this:
+
+    ```erlang
+    -spec init(none) -> {ok, {supervisor:sup_flags(), [supervisor:child_spec()]}}.
+    %% @private
+    %% The OTP init/1 function.
+
+    init(none) ->
+        RestartStrategy = {simple_one_for_one, 1, 60},
+        Client = {gh_client,
+                  {gh_client, start_link, []},
+                  temporary,
+                  brutal_kill,
+                  worker,
+                  [gh_client]},
+        {ok, {RestartStrategy, [Client]}}.
+    ```
+
+- my eyes are drawn to `{gh_client, start_link, []}`
+- probably that's what's called to spawn one of the worker processes
+- let's look
+
+
+    ```erlang
+    %% gh_client.erl
+
+    -spec start_link(ListenSocket) -> Result
+        when ListenSocket :: gen_tcp:socket(),
+             Result       :: {ok, pid()}
+                           | {error, Reason},
+             Reason       :: {already_started, pid()}
+                           | {shutdown, term()}
+                           | term().
+    %% @private
+    %% This is called by the gh_client_sup. While start/1 is called to initiate a startup
+    %% (essentially requesting a new worker be started by the supervisor), this is
+    %% actually called in the context of the supervisor.
+
+    start_link(ListenSocket) ->
+        proc_lib:start_link(?MODULE, init, [self(), ListenSocket]).
+    ```
+
+- Any time you see a 3-tuple of `{Module, FunctionName, ArgumentList}`,
+  probably that's information about how to call some function
+
+- In this case, this is saying "to start one of the `gh_client` processes, we
+  call `gh_client:init(SupervisorPID, ListenSocket)`"
+
+  Let's take a look
+
+    ```erlang
+    %% gh_client.erl
+
+    -spec init(Parent, ListenSocket) -> no_return()
+        when Parent       :: pid(),
+             ListenSocket :: gen_tcp:socket().
+    %% @private
+    %% This is the first code executed in the context of the new worker itself.
+    %% This function does not have any return value, as the startup return is
+    %% passed back to the supervisor by calling proc_lib:init_ack/2.
+    %% We see the initial form of the typical arity-3 service loop form here in the
+    %% call to listen/3.
+
+    init(Parent, ListenSocket) ->
+        ok = io:format("~p Listening.~n", [self()]),
+        Debug = sys:debug_options([]),
+        ok = proc_lib:init_ack(Parent, {ok, self()}),
+        listen(Parent, Debug, ListenSocket).
+    ```
+
+- Ok let's look at the `listen/3` function
+
+    ```erlang
+    -spec listen(Parent, Debug, ListenSocket) -> no_return()
+        when Parent       :: pid(),
+             Debug        :: [sys:dbg_opt()],
+             ListenSocket :: gen_tcp:socket().
+    %% @private
+    %% This function waits for a TCP connection.
+    %% The owner of the socket is still
+    %% the gh_client_man (so it can still close it on a call to gh_client_man:ignore/0),
+    %% but the only one calling gen_tcp:accept/1 on it is this process. Closing the socket
+    %% is one way a manager process can gracefully unblock child workers that are blocking
+    %% on a network accept.
+    %%
+    %% Once it makes a TCP connection it will call start/1 to spawn its successor.
+
+    listen(Parent, Debug, ListenSocket) ->
+        case gen_tcp:accept(ListenSocket) of
+            {ok, Socket} ->
+                {ok, _} = start(ListenSocket),
+                {ok, Peer} = inet:peername(Socket),
+                ok = io:format("~p Connection accepted from: ~p~n", [self(), Peer]),
+                ok = gh_client_man:enroll(),
+                State = #s{socket = Socket},
+                loop(Parent, Debug, State);
+            {error, closed} ->
+                ok = io:format("~p Retiring: Listen socket closed.~n", [self()]),
+                exit(normal)
+        end.
+    ```
+
+- The lines that jump out to me are
+
+    ```erlang
+    ok = gh_client_man:enroll(),
+    State = #s{socket = Socket},
+    loop(Parent, Debug, State);
+    ```
+
+- The `gh_client_man` module is responsible for keeping track of all the
+  running clients. So probably `gh_client_man:enroll()` is just
+  informing `gh_client_man` that this `gh_client` instance exists.
+
+  If we look, that's precisely what's happening
+
+    ```erlang
+    %% gh_client_man.erl
+    %% remember, enroll/0 is running in the context of the calling code, and
+    %% do_enroll/2 is running in the context of the gh_client_man process
+
+    -spec enroll() -> ok.
+    %% @doc
+    %% Clients register here when they establish a connection.
+    %% Other processes can enroll as well.
+
+    enroll() ->
+        gen_server:cast(?MODULE, {enroll, self()}).
+
+    %% ...
+    -spec do_enroll(Pid, State) -> NewState
+        when Pid      :: pid(),
+             State    :: state(),
+             NewState :: state().
+
+    do_enroll(Pid, State = #s{clients = Clients}) ->
+        case lists:member(Pid, Clients) of
+            false ->
+                Mon = monitor(process, Pid),
+                ok = io:format("Monitoring ~tp @ ~tp~n", [Pid, Mon]),
+                State#s{clients = [Pid | Clients]};
+            true ->
+                State
+        end.
+    ```
+
+- Next line is `loop(Parent, Debug, State)`. Let's look at `gh_client:loop/3`
+
+    ```erlang
+    -spec loop(Parent, Debug, State) -> no_return()
+        when Parent :: pid(),
+             Debug  :: [sys:dbg_opt()],
+             State  :: state().
+    %% @private
+    %% The service loop itself. This is the service state. The process blocks on receive
+    %% of Erlang messages, TCP segments being received themselves as Erlang messages.
+
+    loop(Parent, Debug, State = #s{socket = Socket}) ->
+        ok = inet:setopts(Socket, [{active, once}]),
+        receive
+            {tcp, Socket, <<"bye\r\n">>} ->
+                ok = io:format("~p Client saying goodbye. Bye!~n", [self()]),
+                ok = gen_tcp:send(Socket, "Bye!\r\n"),
+                ok = gen_tcp:shutdown(Socket, read_write),
+                exit(normal);
+            {tcp, Socket, Message} ->
+                ok = io:format("~p received: ~tp~n", [self(), Message]),
+                ok = gh_client_man:echo(Message),
+                loop(Parent, Debug, State);
+            {relay, Sender, Message} when Sender == self() ->
+                ok = gen_tcp:send(Socket, ["Message from YOU: ", Message]),
+                loop(Parent, Debug, State);
+            {relay, Sender, Message} ->
+                From = io_lib:format("Message from ~tp: ", [Sender]),
+                ok = gen_tcp:send(Socket, [From, Message]),
+                loop(Parent, Debug, State);
+            {tcp_closed, Socket} ->
+                ok = io:format("~p Socket closed, retiring.~n", [self()]),
+                exit(normal);
+            {system, From, Request} ->
+                sys:handle_system_msg(Request, From, Parent, ?MODULE, Debug, State);
+            Unexpected ->
+                ok = io:format("~p Unexpected message: ~tp", [self(), Unexpected]),
+                loop(Parent, Debug, State)
+        end.
+    ```
+
+- I'll let you figure this one out
+- I think the picture is clear. There are a lot of moving parts, but the basic
+  principle is as follows:
+
+    - `gh_client` instances are like infinity-spawn receptionists.
+      Every time a web browser wants to talk to our server, we spawn a
+      `gh_client` instance that talks to the web browser.
+    - `gh_client_man` is responsible for any logic that spans across different
+      `gh_client` instances (e.g. relaying messages).
+    - Everything else is boilerplate
+- So our task is to remove the relay-messages logic, and replace it with HTTP
+  parse/respond logic.
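
Two points from the walkthrough are easier to see in a toy example: code in a module does not all run in that module's process, and a call waits for a reply while a cast does not. The sketch below is a minimal `gen_server` written for illustration only. The module name `ctx_demo` and its functions `where_call/0`, `bump/0` and `count/0` are hypothetical and are not part of `gex_httpd`; only the `gen_server` functions are standard OTP.

```erlang
%% ctx_demo.erl -- hypothetical demo module, NOT part of gex_httpd.
-module(ctx_demo).
-behaviour(gen_server).

-export([start_link/0, where_call/0, bump/0, count/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start the server as a named process (the name is the module name).
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, none, []).

%% Runs in the CALLER's process. gen_server:call/2 blocks until the
%% server replies, so we get the server's PID back as the result.
where_call() ->
    Caller = self(),
    Server = gen_server:call(?MODULE, whoami),
    {Caller, Server}.

%% Runs in the caller's process. gen_server:cast/2 returns ok right
%% away; nobody waits for the server to handle the message.
bump() ->
    gen_server:cast(?MODULE, bump).

%% A call again: asks the server for its current counter value.
count() ->
    gen_server:call(?MODULE, count).

%% Everything below runs in the ctx_demo server process.
init(none) ->
    {ok, 0}.

handle_call(whoami, _From, Count) ->
    {reply, self(), Count};
handle_call(count, _From, Count) ->
    {reply, Count, Count}.

handle_cast(bump, Count) ->
    {noreply, Count + 1}.
```

After `ctx_demo:start_link()`, calling `ctx_demo:where_call()` from the shell should return two different PIDs: the shell's own PID (where `where_call/0` ran) and the PID of the named `ctx_demo` process (where `handle_call/3` ran). That is the same split as `gh_client_man:listen/1` running in the caller and `gh_client_man:handle_call/3` running in the `gh_client_man` process.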