Make Hakuzaru Great Again #22

Merged
zxq9 merged 46 commits from parser into master 2026-05-10 15:26:44 +09:00
This is a significant refactor, feature enhancement and set of bug and documentation fixes -- all in one. v0.9.x is cray cray.
zxq9 added 40 commits 2026-05-10 12:34:01 +09:00
So far the interface to hz.erl is mostly unchanged, apart from prepare_aaci/1.

Maybe prepare_aaci should be re-exported, but using it is exactly in line with the
'inconvenient but more flexible primitives' that hz_aaci.erl is meant to represent,
so, maybe that is a fine place to have to go for it, dunno.
Also renamed coerce_bindings to erlang_args_to_fate, to match.
Just a little thing I noticed could be improved.
hz_aaci:aaci_get_function_signature is a bit redundant.
We tokenize, and then do the simplest possible recursive descent.

We don't want to evaluate anything, so infix operators are out,
meaning no shunting yard or tree rearranging or LR(1) shenanigans
are necessary, just write the code.

If we want to 'peek', just take the next token, and pass it around
from that point on, until it can actually be consumed.
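A minimal sketch of that shape, written in Python for illustration rather than the Erlang used in hz. The names `tokenize`, `parse_value` and `parse_list` are hypothetical, and the grammar is cut down to integers and lists. The point is the "peek" convention: a function that needs to look ahead takes the next token and returns it alongside its result, so the caller carries it until something consumes it.

```python
import re

def tokenize(text):
    """Split text into ('int', n) and ('punct', c) tokens, ending with ('eof', None)."""
    tokens = []
    for t in re.findall(r"[0-9]+|\S", text):
        tokens.append(("int", int(t)) if t[0] in "0123456789" else ("punct", t))
    return tokens + [("eof", None)]

def parse(text):
    tokens = iter(tokenize(text))
    value, tok = parse_value(next(tokens), tokens)
    if tok != ("eof", None):
        raise ValueError(f"unexpected token {tok!r} after value")
    return value

def parse_value(tok, tokens):
    """Parse one value starting at the already-taken token `tok`.

    Returns (value, next_token): the token after the value has been
    taken but not consumed, so it is handed back to the caller.
    """
    if tok[0] == "int":
        return tok[1], next(tokens)
    if tok == ("punct", "["):
        return parse_list(next(tokens), tokens)
    raise ValueError(f"unexpected token {tok!r}")

def parse_list(tok, tokens):
    """Parse list elements after '[' has been consumed; `tok` is the first token inside."""
    items = []
    if tok == ("punct", "]"):
        return items, next(tokens)
    while True:
        value, tok = parse_value(tok, tokens)
        items.append(value)
        if tok == ("punct", ","):
            tok = next(tokens)
        elif tok == ("punct", "]"):
            return items, next(tokens)
        else:
            raise ValueError(f"expected ',' or ']', got {tok!r}")
```

No precedence handling or tree rewriting is needed, because every value is fully determined by its first token.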
Slowly chipping away at cases...
Now tests compare the literal parser against the output of the
compiler. The little example contracts we are compiling for the
AACI already had the FATE value in them, in the form of the
instruction
	{'RETURNR', {immediate, FateValue}}
so we just extract that and use it for the tests.
I don't handle underscores in bytes correctly... Nor in integers, for that matter.
This forces us to test for alpha/num/hex enough times that it's now worth making macros for these things.
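As a rough illustration of the underscore handling (in Python, not the Erlang macros this commit adds), the sketch below lexes a decimal or `0x` hex integer. It assumes that a single underscore is only accepted between two digits. That rule and the names `is_digit`, `is_hex` and `lex_int` are assumptions made for the example, not taken from hz.

```python
def is_digit(c):
    return "0" <= c <= "9"

def is_hex(c):
    return is_digit(c) or "a" <= c <= "f" or "A" <= c <= "F"

def lex_int(text, pos=0):
    """Lex an integer at text[pos:]. Returns (value, position after the literal)."""
    if text.startswith(("0x", "0X"), pos):
        is_valid, base, pos = is_hex, 16, pos + 2
    else:
        is_valid, base = is_digit, 10
    digits = []
    while pos < len(text):
        c = text[pos]
        if is_valid(c):
            digits.append(c)
            pos += 1
        elif c == "_" and digits and pos + 1 < len(text) and is_valid(text[pos + 1]):
            pos += 1  # skip a separator that sits between two digits
        else:
            break
    if not digits:
        raise ValueError("expected a digit")
    return int("".join(digits), base), pos
```

Collecting the digits while scanning means the literal is only walked once, which is the same motivation given for string literals in the next commit.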
This saves some effort and probably some performance for things like integers, but I'm mainly doing this in anticipation of string literals, because it would just be ridiculous to read code that lexes string literals twice.
Records are a simple case to detect and handle correctly.

Tuples took an entire rewrite of the little tuple parsing bit of the code.
This doesn't work super consistently in the compiler, for codepoints above 127, but it should work fine for us, so, oh well!
Also signatures.
This seemed like it was going to be insanely complex, but
then it turns out the compiler doesn't accept spaces in qualified
names, so I can just dump periods in the lexer and hit it with
string:split/3. Easy.
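Roughly, the trick looks like this. The sketch is in Python, where `str.split` stands in for `string:split/3`. The set of characters allowed in a name and the function names are assumptions for the example.

```python
def is_name_char(c):
    # Assumed character set for the sketch: ASCII letters, digits, '_' and "'".
    return (c.isascii() and c.isalnum()) or c in "_'"

def lex_qualified_name(text, pos=0):
    """Lex a dotted name at text[pos:]. Returns (list of parts, end position)."""
    end = pos
    # Treat '.' as just another name character; whitespace ends the token.
    while end < len(text) and (is_name_char(text[end]) or text[end] == "."):
        end += 1
    parts = text[pos:end].split(".")
    if "" in parts:
        raise ValueError(f"malformed qualified name {text[pos:end]!r}")
    return parts, end
```

Because spaces are never part of the token, `a . b` lexes as the single name `a` and the parser never has to stitch a qualified name back together.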
There are four major fixes here:
1. some eof tokens were being pattern matched with the wrong arity
2. tuples that are too long actually speculatively parse as an untyped tuple, and then complain that there were too many elements
3. singleton tuples with a trailing comma are now handled differently to grouping parentheses, consistently between typed and untyped logic
4. the extra return values used to detect untyped singleton tuples are also used to pass the close paren position, so that too_many_elements can report the correct file position too.

Point 4 also completely removes the need for tracking open paren positions that I was doing, and that I thought I would need to do even more of in the ambiguous-open-paren-stack case.
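Points 2 to 4 can be sketched as follows. This is Python for illustration, restricted to integers, and is not the hz code. The paren parser returns extra values (a trailing-comma flag and the close paren position) and lets each caller decide what they mean. All names here, including `ParseError` and the wording of the error message, are hypothetical.

```python
import re

class ParseError(Exception):
    pass

def tokenize(text):
    """Return (token, position) pairs, ending with an ("eof", len(text)) marker."""
    tokens = [(m.group(), m.start()) for m in re.finditer(r"[0-9]+|\S", text)]
    return tokens + [("eof", len(text))]

def parse_parens(tokens, i):
    """Parse after an open paren. Returns (items, trailing_comma, close_pos, next_i)."""
    items, trailing_comma = [], False
    while True:
        tok, pos = tokens[i]
        if tok == ")":
            return items, trailing_comma, pos, i + 1
        value, i = parse_value(tokens, i)
        items.append(value)
        tok, pos = tokens[i]
        trailing_comma = tok == ","
        if trailing_comma:
            i += 1
        elif tok != ")":
            raise ParseError(f"expected ',' or ')' at {pos}")

def parse_value(tokens, i):
    tok, pos = tokens[i]
    if tok[0] in "0123456789":
        return int(tok), i + 1
    if tok == "(":
        items, trailing_comma, _close, i = parse_parens(tokens, i + 1)
        if len(items) == 1 and not trailing_comma:
            return items[0], i  # "(x)" is grouping, "(x,)" is a singleton tuple
        return tuple(items), i
    raise ParseError(f"unexpected {tok!r} at {pos}")

def parse(text):
    tokens = tokenize(text)
    value, i = parse_value(tokens, 0)
    if tokens[i][0] != "eof":
        raise ParseError(f"unexpected {tokens[i][0]!r} at {tokens[i][1]}")
    return value

def parse_typed_tuple(text, arity):
    """Parse speculatively as an untyped tuple, then check the expected arity."""
    tokens = tokenize(text)
    if tokens[0][0] != "(":
        raise ParseError("expected '(' at 0")
    items, _trailing, close_pos, _i = parse_parens(tokens, 1)
    if len(items) > arity:
        # The close paren position comes from the same extra return values.
        raise ParseError(f"too_many_elements at {close_pos}")
    return tuple(items)
```

With this shape, `(1)` parses as `1`, `(1,)` as a one-element tuple, and an over-long typed tuple is reported at its closing paren without ever recording where the open paren was.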
Character literals were the main complexity here, but I threw booleans in as well, since that covers all the major literals.
Sophia bitstrings aren't really something you initialize manually, so we have to make up a literal format for them. Failing that, we just accept arbitrary integers and bytearrays as bitstrings.
I think all of the tests roundtrip now, so if my parser was thorough, the pretty printer should be as thorough.
I reversed the argument order here, since the Format option is sort of kind of almost optional, but I am not sure if that was a good idea.
Any error reasons or paths are just term() still, and ACI doesn't have a defined spec in the compiler, so whatever, but the AACI types, the erlang representation of terms, and the four different kinds of coerce function are all spec'd now.

Also some internal type substitution functions were given types, just in the hopes of catching some errors, but dialyzer doesn't seem to complain at all no matter how badly I break my code. Strange approach to making a type system, but oh well.
zxq9 added 1 commit 2026-05-10 13:16:36 +09:00
zxq9 added 1 commit 2026-05-10 15:02:19 +09:00
zxq9 added 2 commits 2026-05-10 15:10:03 +09:00
zxq9 added 1 commit 2026-05-10 15:15:34 +09:00
zxq9 added 1 commit 2026-05-10 15:22:27 +09:00
zxq9 merged commit b950bb8a67 into master 2026-05-10 15:26:44 +09:00
zxq9 deleted branch parser 2026-05-10 15:26:45 +09:00