Bug 159

Summary: 56k keywords not allowed as symbol names in non-56k code
Product: RMAC Reporter: James Jones <atari>
Component: CoreAssignee: Shamus Hammons <jlhamm>
Severity: enhancement CC: ggnkua, jlhamm
Priority: Normal    
Version: unspecified   
Hardware: PC   
OS: Linux   
Attachments: Patch to ignore 56k keywords in !56k mode.
Tentative patch
A better patch
Fix for disabled code

Description James Jones 2020-05-04 11:47:00 CDT
Created attachment 118 [details]
Patch to ignore 56k keywords in !56k mode.

When assembling non-56k code, the 56k-specific keywords are still treated as reserved, preventing their use as symbol names/labels.  This prevents, for example, the Atari 3D library/demo code from being compiled without modification because it uses the symbol names "x" and "y", which appear to be keywords for modifier markup in 56k code.

To avoid this, the attached patch excludes 56k keywords from being tokenized as keywords when not assembling in 56k mode.  This may cause weird situations where a symbol with one of these names is accessible in non-56k code blocks, but can't be accessed within 56k code blocks, but that seems less bad than the current situation where existing code can't be compiled without modification and common variable names like "x" can not be assembled/referenced at all.
Comment 1 ggn 2020-05-08 12:42:24 CDT
Created attachment 119 [details]
Tentative patch

So, hello there James and thanks for the patch.

As we discussed in the rmac/rln thread at Atariage, here's my proposed .patch. However I'm not happy with it. It still fails with the following source:

a:dc.w 0
x:dc.w 0
y:dc.w 1
z:dc.w 2
.a:dc.w 0
.x:dc.w 0
.y:dc.w 1
.z:dc.w 2
dc.w a
dc.w x
dc.w y
dc.w z
dc.w .a
dc.w .x
dc.w .y
dc.w .z

In particular, lines "dc.w a", "dc.w x", "dc.w y", "dc.w .x" will not assemble correctly. I didn't try it but I have a feeling that your patch will fail as well.

The first three are obvious: the symbols are aliased to registers and the tokenizer doesn't know how to distinguish between a legit label name and a register, so it goes with the later. Boom.

The fourth one is more sneaky: the tokenizer thinks that the .x is an extension to a mnemonic. For example in "move.w" the ".w" is its own token. You might ask "but 68000 doesn't have .x, where does this come from?". The answer is 68020+, where ".x" is a legit size for the FPU. So the only thing I can think of is adding weirdo cases throughout the tokenizer and parser.

These are enough cases for me to formulate an opinion about this: I don't like either solution. Yours just adds extra complications to the tokenizer that will have to be extended at some point. Mine misses all cases aliased variable names might be used inside a line. 

For example, I'm thinking of adding Z80 support to rmac. Some of the registers in Z80 are named a,b,c,d,e,f,h,l. For your solution we'll have to extend that "if" clause. For my solution we'll have to add even more crazy checks at various places.

That makes the whole program's quality worse: if people who aren't fluent with the codebase want to start adding features or fix bugs there's more things they should know about, and maybe nobody can explain or remember why that weird snippet of code exists.

I encourage you or anyone to go read up on bug #148 and see how out of control the expression evaluator logic is. Hence we have a whole issue dedicated to rewriting it, even if it's working "fine".

Which sadly brings me to my recommendation to reject both patches. Shamus can add a third opinion here of course because perhaps there is something simple to be done that I haven't figured out yet...
Comment 2 ggn 2022-03-24 06:58:59 CDT
Oh hello, this bug has been here for only 2 years. Nevertheless, time for an update!

Don't think that this issue hasn't been annoying me all this time. It has! But only recently I had an idea on how to tackle it without adding hacks or destroying everything.

In short, I've now split the keywords table (kw.tab) to one file with register names per architecture (68k/gpu-dsp/56001/6502). So in the tokeniser now instead of scanning for all possible register names+keywords we now scan for the specific architecture's register names+keywords. This solves most of the problems in a 'kick the can down the road' way, but at least now it'll be a much smaller problem than what it was before. For example you still won't be able to define a label named 'a' in 6502, but given on how few people currently use the 6502 part of rmac.... I don't foresee any problems.

The ".x" case I mentioned above still remains, but we'll see, perhaps I will add some of the things I did in the first patch I posted here.

In terms of real changes (as there had to be a lot of changes in the code generators due to the register defines changing, these are not very important): the tokeniser now will scan twice in the worst case (i.e. if it doesn't detect a register it'll re-scan for a keyword). This can be fixed by adding the keywords in each register .tab file, but I wanted to get this patch approved as is right now because it's grown enough and I'm worried I'll break something and not be able to go back (git commit without push has been the cause to losing work in the past!).

Patch will be attached after I complete this post. Bear in mind that I based this patch on top of changes of #193, so applying that patch first is probably a necessity.

Anyway, tests, comments etc are welcome!
Comment 3 ggn 2022-03-24 07:00:04 CDT
Created attachment 168 [details]
A better patch
Comment 4 James Jones 2022-04-17 00:07:26 CDT
Thanks for putting more thought into this ggn. I tried to apply your patch and test it today, but it doesn't appear to apply cleanly to top-of-tree rmac (tag v2.1.3). I attempted to apply it as follows:

git am register.diff

But it fails on the second patch in the series (Full git output below). I took a quick look at the patch and stripped out all the trailing windows newline characters (I'm doing this on Linux), but it still failed the same way. Did you accidentally create the patch on top of some other local changes, or forget to include a change in the series? The first failure is here:

direct.c, in "int dgpu(void)":

    robjproc = 0;       // Unset OP assembly
    dsp56001 = 0;       // Unset 56001 assembly
    regbank = BANK_N;   // Set no default register bank
    return 0;

patch is expecting this:

        robjproc = 0;           // Unset OP assembly
        dsp56001 = 0;           // Unset 56001 assembly
+       regbase = regriscbase;  // Update register DFA tables
+       regtab = regrisctab;
+       regcheck = regrisccheck;
+       regaccept = regriscaccept;
        return 0;

--- Full git output from git am register.diff ---

Applying: Get rid of some old and deprecated macros
Applying: Fix for #159: Split register sets according to architecture into different tables so they don't clash with label/symbol names. Modified tokeniser to use different tables when scanning for registers
/home/jjones/jaguar-sdk/.git/modules/tools/src/rmac/rebase-apply/patch:2788: trailing whitespace.

/home/jjones/jaguar-sdk/.git/modules/tools/src/rmac/rebase-apply/patch:2794: trailing whitespace.

/home/jjones/jaguar-sdk/.git/modules/tools/src/rmac/rebase-apply/patch:2800: trailing whitespace.

/home/jjones/jaguar-sdk/.git/modules/tools/src/rmac/rebase-apply/patch:98: new blank line at EOF.
/home/jjones/jaguar-sdk/.git/modules/tools/src/rmac/rebase-apply/patch:265: new blank line at EOF.
error: direct.c: does not match index
error: patch failed: kw.tab:82
error: kw.tab: patch does not apply
error: patch failed: procln.c:437
error: procln.c: patch does not apply
error: patch failed: riscasm.c:202
error: riscasm.c: patch does not apply
error: patch failed: rmac.c:45
error: rmac.c: patch does not apply
error: patch failed: sect.c:18
error: sect.c: patch does not apply
error: patch failed: token.c:1171
error: token.c: patch does not apply
Patch failed at 0002 Fix for #159: Split register sets according to architecture into different tables so they don't clash with label/symbol names. Modified tokeniser to use different tables when scanning for registers
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Comment 5 ggn 2022-04-17 00:24:26 CDT
Happy Easter!

I've built this .patch on top of the work I did for issue #193, so I would suggest we address that one first.
Comment 6 James Jones 2022-04-17 00:36:11 CDT
OK, thanks. It appears to work, but I can't fully confirm as the patch series in issue #193 causes other problems, as noted there.
Comment 7 ggn 2022-04-17 01:52:37 CDT
Created attachment 171 [details]
Fix for disabled code

Ok, that was an interesting edge case.

The tokeniser was trying to be a bit too clever by replacing an already equr'd symbol with a register inside a disabled code block, which then the line parser rejected as an invalid line. We probably shouldn't tokenise anything inside an '.if 0' block except scan for an '.endif' but that's for another issue as this one has done quite a few changes so far.

Bottom line: try the new attached patch. It should fix your reported problem.
Comment 8 ggn 2022-04-17 01:59:23 CDT
You can tell I'm not fully awake yet. I posted the above on the wrong issue. I've re-posted the patch in #193. Sorry for the confusion!
Comment 9 James Jones 2022-04-17 23:52:32 CDT
With all three patch sets applied, I can confirm this allows me to build code with 'x' used as an identifier, and it runs correctly as well.
Comment 10 Shamus Hammons 2022-05-30 16:06:43 CDT
Thanks for the patch; I know at least Linkovitch will be excited about this fix.  :-)