[Feature Request] Address to source file mappings #498

mid-kid · 2020-04-02T19:44:08Z

I'd like some way to figure out which source file, and which line, a given ROM address comes from.

The reason is that I do a lot of disassembly, and lately I have been porting different versions of Crystal to pokecrystal, and I've noticed I need this lookup a lot.
I have a tool that finds the nearest label by reading the symfile and figures out the filename through a dirty grep. That helps, but it isn't always accurate and often requires hand-decoding the instructions by comparing against a hex editor, which is time-consuming.
I've also written tools around the aforementioned method, but of course they aren't as accurate as I'd like.

I know this is a rather specific and maybe a bit out-of-the-way request for an assembler to implement properly, so I'd also appreciate pointers on how this could be achieved by hacking the codebase.

LIJI32 · 2020-04-26T22:47:25Z

The ISX format handles address-to-source mapping using the various Debug Record types (0x20, 0x21, and 0x22). These aren't documented, but I've reversed the 0x20 record type. It starts with a type byte (containing 0x20) followed by a uint32_t size, and then size bytes of encrypted data. The entire buffer can be decrypted with this C snippet (assumes a little-endian host):

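#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    FILE *f = stdin;

    uint32_t key1 = 0;
    uint32_t key2 = 0;

    /* Record type byte: must be 0x20 for this debug record. */
    uint8_t type;
    fread(&type, 1, 1, f);
    assert(type == 0x20);

    /* Payload size, read directly into a uint32_t (little-endian host assumed). */
    uint32_t size;
    fread(&size, sizeof(size), 1, f);
    /* Initial key: the sign-extended low byte of the size, XORed with 0xAA. */
    key1 = ((int32_t)(int8_t)size) ^ 0xAA;

    for (unsigned i = 0; i < size; i++) {
        uint8_t byte;
        fread(&byte, 1, 1, f);
        /* Only the low 8 bits of the key are applied to each byte. */
        putchar((uint8_t)(byte ^ key1));
        /* Advance the key stream. */
        unsigned temp = key1 + key2 + 0x43;
        key1 = temp;
        key2 = (temp >> 1);
    }

    return 0;
}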

Once decrypted, the buffer is structured as follows:

A uint16_t count of files, followed by count NUL-terminated path names of the source files.
A uint16_t count of structures, followed by count 12-byte structures.

Each 12-byte structure contains a 16-bit index into the file list above, followed by a zero-based 16-bit line number, followed by 32-bit start and end addresses.

(All data types are little-endian.)

So, for example, if at index 2 we have the path "/Users/lior/GameBoy/Demos/demo.asm", the entry 02 00 09 00 8A FF 00 00 8B FF 00 00 means that the two bytes starting at 0xFF8A were generated by the line at demo.asm:10.
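The layout above can be read back with a few helpers. This is only a sketch: the names `line_entry`, `read_entry`, and `skip_file_list` are made up for illustration, and treating the end address as inclusive is an assumption drawn from the example entry rather than from any documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 12-byte record from the decrypted buffer (field names are illustrative). */
struct line_entry {
    uint16_t file_index; /* index into the path list */
    uint16_t line;       /* zero-based line number */
    uint32_t start;      /* first address generated by the line */
    uint32_t end;        /* last address; assumed inclusive, based on the example */
};

/* Read little-endian integers byte by byte, so host endianness does not matter. */
static uint16_t read_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one 12-byte record starting at p. */
static struct line_entry read_entry(const uint8_t *p) {
    assert(p != NULL);
    struct line_entry e;
    e.file_index = read_u16(p);
    e.line = read_u16(p + 2);
    e.start = read_u32(p + 4);
    e.end = read_u32(p + 8);
    return e;
}

/*
 * Skip the file list at the start of a decrypted buffer. Returns the offset
 * of the uint16_t record count, or (size_t)-1 if the buffer is truncated.
 */
static size_t skip_file_list(const uint8_t *buf, size_t len) {
    if (len < 2) return (size_t)-1;
    uint16_t files = read_u16(buf);
    size_t pos = 2;
    for (uint16_t i = 0; i < files; i++) {
        const uint8_t *nul = memchr(buf + pos, 0, len - pos);
        if (nul == NULL) return (size_t)-1;
        pos = (size_t)(nul - buf) + 1;
    }
    return pos;
}
```

Feeding the example entry to `read_entry` gives file index 2, line 9 (so demo.asm:10 when counted from one), start 0xFF8A and end 0xFF8B.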

Of course, you don't have to use the ISX format; a custom, more compact, and non-intrusive format (one that doesn't alter the ROM itself) might be more convenient. Personally, I'd use a similar format, but make the paths relative (using the same path that the rgbasm invocation uses), encode all numbers as uleb128, and replace the start-end pair with a uleb128 size field. The file index should become 1-based instead of 0-based, so "file 0" can be used to represent bytes that were not generated by a specific line (e.g. padding bytes for banks). A very similar format is used by DWARF on ELF- and Mach-O-based platforms.
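For reference, uleb128 stores a number in 7-bit groups, least significant first, with the high bit of each byte set when more bytes follow. A minimal sketch of an encoder and decoder for 32-bit values is below; the function names are hypothetical and not part of any existing RGBDS or ISX API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode value as uleb128 into out, which must have room for 5 bytes
 * (the most a 32-bit value can need). Returns the number of bytes written.
 */
static size_t uleb128_encode(uint32_t value, uint8_t *out) {
    assert(out != NULL);
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7F;   /* low 7 bits */
        value >>= 7;
        if (value != 0) byte |= 0x80;  /* continuation bit: more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/*
 * Decode a uleb128 number from in (at most len bytes) into *value.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or runs past five bytes.
 */
static size_t uleb128_decode(const uint8_t *in, size_t len, uint32_t *value) {
    uint32_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len; i++) {
        if (shift >= 32) return 0;
        result |= (uint32_t)(in[i] & 0x7F) << shift;
        if ((in[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;
}
```

Small sizes, which is what most instructions and data directives produce, fit in a single byte, which is where the space saving over fixed 32-bit start and end fields comes from.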

mid-kid · 2020-04-26T23:51:21Z

Can't bytes not generated by source code simply have no entry in the table?

aaaaaa123456789 · 2020-04-27T00:00:03Z

How would you handle macros, rept blocks, and basically everything that isn't just an instruction or a db? In general there's a stack of things generating code, and it's not uncommon for several layers of this stack to be needed to understand what's going on.

LIJI32 · 2020-04-27T07:42:44Z

@mid-kid If you only use a size field, instead of start and end, you need some way to skip over bytes that weren't generated by any line.

@aaaaaa123456789 I don't think rept specifically needs special handling, but as for the rest, isas appears to always use the "deepest" line, i.e. the eventual d* or instruction that generates the byte. This behavior can be refined to allow some sort of a "nofollow" attribute for macros, which is helpful for pseudo-instructions such as callab. Another option is to complicate the format a bit more and let it store the entire code generation stack instead of a single line.

aaaaaa123456789 · 2020-04-27T07:44:55Z

> Another option is to complicate the format a bit more and let it store the entire code generation stack instead of a single line.

I vastly prefer this option, given that macro stacks (and even include stacks) can grow quite large.

LIJI32 · 2020-04-27T07:52:27Z

You do have to think about how consumers of this format (i.e. debugging emulators) will be able to use this data. For the most part, they will use this information to display the "correct" line when stepping through code, or to show a backtrace. When the format provides more than one line and no canonical line to show, the emulator will have to guess and will probably display a less "helpful" line.

mid-kid · 2020-04-27T12:07:40Z

Personally I think the only line that matters is the top of the macro stack, after any INCBINs. I don't know why ISX would have it descend all the way to the bottom of the stack, as that's the least interesting to anyone doing source file debugging.
I should also point out how ELF does the same thing. If you open any debuggable file in GDB, it'll show you the line with the macro before expansion, regardless of whether the macro expands to an expression or multiple lines of code.

LIJI32 · 2020-04-27T16:59:05Z

With assemblers it's a bit trickier, as macros are often used as inline functions, and you want to be able to debug them.

ISSOtm · 2020-05-06T14:39:22Z

I think embedding the full stack would be useful, along with info on what kind of level each stack entry is. Imo, the "canonical" line is the last one that isn't coming from a macro; so if you have INCLUDE -> INCLUDE -> macro -> REPT -> INCLUDE -> macro, it'd be the second INCLUDE.

The problem of representing file stacks in a compact way feels awfully similar to #491, so I'd favor fixing the problem there, and then integrating that solution into those mappings. This would also probably favor code reuse.

mid-kid · 2020-05-06T15:33:10Z

That sounds like the best course of action to me, if you're intent on embedding the full file stack. And yeah, a "canonical" pointer would be nice, but it could simply be documented as "the entry right before the first non-INCLUDE entry in the stack", and should be very easy for anyone to implement as a simple loop+break.
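The loop+break could look roughly like the sketch below. The `level_kind` enum and the array representation of the stack are hypothetical (no such format exists yet), and it assumes the outermost entry, at index 0, is always a file and is treated as an INCLUDE level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of stack level; a real format might distinguish more. */
enum level_kind { LEVEL_INCLUDE, LEVEL_MACRO, LEVEL_REPT };

/*
 * Return the index of the "canonical" level: the entry right before the
 * first non-INCLUDE entry, where index 0 is the outermost file.
 * If every level is an INCLUDE, the innermost one is returned.
 */
static size_t canonical_level(const enum level_kind *stack, size_t depth) {
    assert(depth > 0 && stack[0] == LEVEL_INCLUDE);
    size_t i;
    for (i = 1; i < depth; i++) {
        if (stack[i] != LEVEL_INCLUDE) break;
    }
    return i - 1;
}
```

For the stack INCLUDE -> INCLUDE -> macro -> REPT -> INCLUDE -> macro this returns index 1, the second INCLUDE, which matches the example given earlier in the thread.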

daid · 2020-07-25T14:09:48Z

I have a very dirty patch that adds a symbol per CPU instruction:
https://github.com/daid/rgbds-live/blob/master/rgbds.patch
It is far from ideal as output, but it works as a proof of concept. Even with the limited info, and not the full stack, it is very useful.

avivace · 2023-09-13T19:17:30Z

@ISSOtm @Rangi42 I'm interested in this, especially given the way it enables integrating RGBDS with other tools (e.g. rgbds-live). Do we have a clear enough specification in mind? I could assign a bounty if that could help.

aaaaaa123456789 · 2023-09-13T19:18:59Z

I don't mind writing the actual spec if this kind of help is needed, but I wouldn't do that without input from the people actually implementing this.

ISSOtm · 2023-09-13T19:51:31Z

The main obstacle I've found with this is that I don't know what kind of format would be desirable. I'd like it to be easier to consume than to produce, and especially to be fast to traverse.

I'm not sure how to design the format such that going from ROM addr to line, or vice-versa, is reasonably efficient; especially for bigger / more complex ROMs.

I'm not sure how to take the "file stack" into account, either.
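As one possible direction for the address-to-line case (not something settled in this thread): if a consumer loads the entries into an array sorted by start address, with non-overlapping ranges, a lookup is a plain binary search. The `source_range` struct and `find_range` below are hypothetical, the inclusive end address mirrors the ISX example, and bank handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one mapping entry. */
struct source_range {
    uint32_t start;      /* first address covered */
    uint32_t end;        /* last address covered (inclusive) */
    uint16_t file_index; /* index into a file list */
    uint16_t line;       /* zero-based line number */
};

/*
 * Find the entry covering addr in O(log n).
 * entries must be sorted by start and must not overlap.
 * Returns NULL if no entry covers addr (e.g. padding bytes).
 */
static const struct source_range *find_range(const struct source_range *entries,
                                             size_t count, uint32_t addr) {
    assert(entries != NULL || count == 0);
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < entries[mid].start) {
            hi = mid;          /* addr lies before this range */
        } else if (addr > entries[mid].end) {
            lo = mid + 1;      /* addr lies after this range */
        } else {
            return &entries[mid];
        }
    }
    return NULL;
}
```

The reverse direction, from line to address, would need a second index sorted by file and line; this sketch does not attempt that.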

aaaaaa123456789 · 2023-09-30T18:58:03Z

I just realised I never mentioned here that I've been working on this.

ISSOtm added the enhancement, rgbasm, and rgblink labels Apr 3, 2020

ISSOtm added this to the v0.5.1 milestone Mar 6, 2021

Rangi42 modified the milestones: v0.5.1, v0.6.0, v1.0.0 Apr 25, 2021

Rangi42 removed this from the v1.0.0 milestone Aug 6, 2024
