Bug 159 - 56k keywords not allowed as symbol names in non-56k code
Summary: 56k keywords not allowed as symbol names in non-56k code
Status: CONFIRMED
Alias: None
Product: RMAC
Classification: Unclassified
Component: Core (show other bugs)
Version: unspecified
Hardware: PC Linux
: Normal enhancement
Assignee: Shamus Hammons
URL:
Depends on:
Blocks:
 
Reported: 2020-05-04 11:47 CDT by James Jones
Modified: 2020-05-08 12:42 CDT (History)
2 users (show)

See Also:


Attachments
Patch to ignore 56k keywords in !56k mode. (1.27 KB, application/octet-stream)
2020-05-04 11:47 CDT, James Jones
Details
Tentative patch (2.71 KB, patch)
2020-05-08 12:42 CDT, ggn
Details

Note You need to log in before you can comment on or make changes to this bug.
Description James Jones 2020-05-04 11:47:00 CDT
Created attachment 118 [details]
Patch to ignore 56k keywords in !56k mode.

When assembling non-56k code, the 56k-specific keywords are still treated as reserved, preventing their use as symbol names/labels.  This prevents, for example, the Atari 3D library/demo code from being compiled without modification because it uses the symbol names "x" and "y", which appear to be keywords for modifier markup in 56k code.

To avoid this, the attached patch excludes 56k keywords from being tokenized as keywords when not assembling in 56k mode.  This may cause weird situations where a symbol with one of these names is accessible in non-56k code blocks, but can't be accessed within 56k code blocks, but that seems less bad than the current situation where existing code can't be compiled without modification and common variable names like "x" can not be assembled/referenced at all.
Comment 1 ggn 2020-05-08 12:42:24 CDT
Created attachment 119 [details]
Tentative patch

So, hello there James and thanks for the patch.

As we discussed in the rmac/rln thread at Atariage, here's my proposed .patch. However I'm not happy with it. It still fails with the following source:

a:dc.w 0
x:dc.w 0
y:dc.w 1
z:dc.w 2
.a:dc.w 0
.x:dc.w 0
.y:dc.w 1
.z:dc.w 2
dc.w a
dc.w x
dc.w y
dc.w z
dc.w .a
dc.w .x
dc.w .y
dc.w .z

In particular, lines "dc.w a", "dc.w x", "dc.w y", "dc.w .x" will not assemble correctly. I didn't try it but I have a feeling that your patch will fail as well.

The first three are obvious: the symbols are aliased to registers and the tokenizer doesn't know how to distinguish between a legit label name and a register, so it goes with the later. Boom.

The fourth one is more sneaky: the tokenizer thinks that the .x is an extension to a mnemonic. For example in "move.w" the ".w" is its own token. You might ask "but 68000 doesn't have .x, where does this come from?". The answer is 68020+, where ".x" is a legit size for the FPU. So the only thing I can think of is adding weirdo cases throughout the tokenizer and parser.

These are enough cases for me to formulate an opinion about this: I don't like either solution. Yours just adds extra complications to the tokenizer that will have to be extended at some point. Mine misses all cases aliased variable names might be used inside a line. 

For example, I'm thinking of adding Z80 support to rmac. Some of the registers in Z80 are named a,b,c,d,e,f,h,l. For your solution we'll have to extend that "if" clause. For my solution we'll have to add even more crazy checks at various places.

That makes the whole program's quality worse: if people who aren't fluent with the codebase want to start adding features or fix bugs there's more things they should know about, and maybe nobody can explain or remember why that weird snippet of code exists.

I encourage you or anyone to go read up on bug #148 and see how out of control the expression evaluator logic is. Hence we have a whole issue dedicated to rewriting it, even if it's working "fine".

Which sadly brings me to my recommendation to reject both patches. Shamus can add a third opinion here of course because perhaps there is something simple to be done that I haven't figured out yet...