[Feature Request] Address to source file mappings #498

Open
mid-kid opened this issue Apr 2, 2020 · 15 comments
Labels: enhancement, rgbasm, rgblink

Comments

@mid-kid
Contributor

mid-kid commented Apr 2, 2020

I'd like to have some way to figure out in what source file, and on what line a certain ROM address is located.

The reason is that I do a lot of disassembly, and lately have been porting different versions of crystal to pokecrystal, and have noticed I need this a lot.
I have a tool that finds the nearest label by reading the symfile and figures out the filename through a dirty grep. That helps me, but it isn't always accurate and oftentimes requires hand-decoding the instructions by comparing with the hex editor. This is time-consuming.
I've also written tools around the aforementioned method, but they of course aren't as accurate as I'd like.

I know this is a rather specific and maybe a bit out-of-the-way request for an assembler to implement properly, so I'd also appreciate pointers on how this could be achieved by hacking the codebase.

@ISSOtm added the enhancement, rgbasm, and rgblink labels Apr 3, 2020
@LIJI32
Member

LIJI32 commented Apr 26, 2020

The ISX format handles address-to-source mapping using the various Debug Record types (0x20, 0x21, and 0x22). These aren't documented, but I've reversed the 0x20 record type. It starts with a type byte (containing 0x20) followed by a uint32_t size. After the size come size bytes of encrypted data. The entire buffer can be decrypted with this C snippet (assumes a little-endian host):

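#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    FILE *f = stdin;

    uint32_t key1 = 0;
    uint32_t key2 = 0;

    /* Record header: a type byte, then a 32-bit size (read in host byte order). */
    uint8_t type;
    fread(&type, 1, 1, f);
    assert(type == 0x20);

    uint32_t size;
    fread(&size, sizeof(size), 1, f);

    /* The initial key is the sign-extended low byte of the size, XORed with 0xAA. */
    key1 = ((int32_t)(int8_t)size) ^ 0xAA;

    for (unsigned i = 0; i < size; i++) {
        uint8_t byte;
        fread(&byte, 1, 1, f);
        /* Only the low 8 bits of key1 affect the output byte. */
        putchar((uint8_t)(byte ^ key1));
        /* Advance the key stream for the next byte. */
        unsigned temp = key1 + key2 + 0x43;
        key1 = temp;
        key2 = (temp >> 1);
    }

    return 0;
}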

Once decrypted, the buffer is structured as follows:

  • A uint16_t count of files, followed by count NUL-terminated path names of the source files
  • A uint16_t count of structures, followed by count 12-byte structures

The 12-byte structures contain a 16-bit index into the file list from before, followed by a zero-based 16-bit line number, followed by 32-bit start and end addresses.

(All data types are little-endian.)

So, for example, if at index 2 we have the path "/Users/lior/GameBoy/Demos/demo.asm", the entry 02 00 09 00 8A FF 00 00 8B FF 00 00 means that the two bytes starting at 0xFF8A were generated by the line at demo.asm:10.
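As a rough illustration of the record layout described above, the following C sketch decodes one 12-byte structure. The struct and function names are hypothetical (they are not taken from ISX or RGBDS), and treating the end address as inclusive is an inference from the example entry, not a documented fact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded 12-byte mapping record (illustrative names, not from any real tool). */
struct isx_line_record {
    uint16_t file_index; /* index into the preceding path list */
    uint16_t line;       /* zero-based line number */
    uint32_t start;      /* first address generated by the line */
    uint32_t end;        /* assumed: last address generated by the line */
};

/* Read little-endian integers byte by byte, so host endianness does not matter. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 12 bytes at `raw` into a record. */
static struct isx_line_record decode_record(const uint8_t *raw)
{
    struct isx_line_record rec;

    assert(raw != NULL);
    rec.file_index = read_le16(raw);
    rec.line = read_le16(raw + 2);
    rec.start = read_le32(raw + 4);
    rec.end = read_le32(raw + 8);
    return rec;
}
```

Feeding it the example entry yields file index 2, zero-based line 9 (displayed as line 10), start 0xFF8A, and end 0xFF8B.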

Of course, you don't have to use the ISX format; others might find it more convenient to use a custom, more compact, non-intrusive format (one that doesn't alter the ROM itself). Personally, I'd use a similar format, but make the paths relative (using the same path that the rgbasm invocation uses), replace all numbers with uleb128, and replace the start-end format with a uleb128 size field. The file index should become 1-based instead of 0-based, so "file 0" can be used to represent bytes that were not generated by a specific line (e.g. padding bytes for banks). A very similar format is used by DWARF on ELF- and Mach-O-based platforms.
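To make the uleb128 suggestion concrete, here is a minimal C sketch of the encoding, plus a hypothetical entry writer that emits a 1-based file index, a line number, and a size. The field order and the function names are assumptions for illustration only; no such format has been specified in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `value` as ULEB128 into `out`; returns bytes written (at most 5 for 32 bits). */
static size_t uleb128_encode(uint32_t value, uint8_t *out)
{
    size_t n = 0;

    assert(out != NULL);
    do {
        uint8_t byte = value & 0x7F; /* low 7 bits go first */
        value >>= 7;
        if (value != 0)
            byte |= 0x80;            /* continuation bit: more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Decode a ULEB128 value from `in`; returns bytes consumed (at most 5). */
static size_t uleb128_decode(const uint8_t *in, uint32_t *value)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;

    do {
        byte = in[n++];
        result |= (uint32_t)(byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    *value = result;
    return n;
}

/* Hypothetical entry layout: 1-based file index (0 = no source line), line, size.
 * `out` must have room for 15 bytes. */
static size_t encode_entry(uint32_t file_1based, uint32_t line, uint32_t size,
                           uint8_t *out)
{
    size_t n = 0;

    n += uleb128_encode(file_1based, out + n);
    n += uleb128_encode(line, out + n);
    n += uleb128_encode(size, out + n);
    return n;
}
```

Under these assumptions, the 12-byte example entry above (file index 2, line 9, two bytes) would shrink to the three bytes 03 09 02, since every value fits in 7 bits.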

@mid-kid
Contributor Author

mid-kid commented Apr 26, 2020

Can't bytes not generated by source code simply have no entry in the table?

@aaaaaa123456789
Member

How would you handle macros, rept blocks, and basically everything that isn't just an instruction or a db? In general there's a stack of things generating code, and it's not uncommon for several layers of this stack to be needed to understand what's going on.

@LIJI32
Member

LIJI32 commented Apr 27, 2020

@mid-kid If you only use a size field, instead of start and end, you have to have some way to skip over bytes that no source line generated.

@aaaaaa123456789 I don't think rept specifically needs special handling, but as for the rest, isas appears to always use the "deepest" line, i.e. the eventual d* or instruction that generates the byte. This behavior can be refined to allow some sort of a "nofollow" attribute for macros, which is helpful for pseudo-instructions such as callab. Another option is to complicate the format a bit more and let it store the entire code generation stack instead of a single line.

@aaaaaa123456789
Member

Another option is to complicate the format a bit more and let it store the entire code generation stack instead of a single line.

I vastly prefer this option, given that macro stacks (and even include stacks) can grow quite large.

@LIJI32
Member

LIJI32 commented Apr 27, 2020

You do have to think about how consumers of this format (i.e. debugging emulators) will be able to use this data. For the most part, they will use this information to display the "correct" line when stepping through code, or to show a backtrace. When the format provides more than one line and no canonical line to show, the consumer will have to guess and will probably display a less "helpful" line.

@mid-kid
Contributor Author

mid-kid commented Apr 27, 2020

Personally I think the only line that matters is the top of the macro stack, after any INCBINs. I don't know why ISX would have it descend all the way to the bottom of the stack, as that's the least interesting to anyone doing source file debugging.
I should also point out how ELF does the same thing. If you open any debuggable file in GDB, it'll show you the line with the macro before expansion, regardless of whether the macro expands to an expression or multiple lines of code.

@LIJI32
Member

LIJI32 commented Apr 27, 2020

With assemblers it's a bit more tricky, as macros are often used as inline functions, and you want to be able to debug them.

@ISSOtm
Member

ISSOtm commented May 6, 2020

I think embedding the full stack should be useful, but also providing info on what kind of stack level it is. Imo, the "canonical" line is the last one that isn't coming from a macro; so if you have INCLUDE -> INCLUDE -> macro -> REPT -> INCLUDE -> macro, it'd be the second INCLUDE.

The problem of representing file stacks in a compact way feels awfully similar to #491, so I'd favor fixing the problem there, and then integrating that solution into those mappings. This would also probably favor code reuse.

@mid-kid
Contributor Author

mid-kid commented May 6, 2020

That sounds like the best course of action to me, if you're intent on embedding the full file stack. And yeah, a "canonical" pointer would be nice, but it could simply be documented as "the entry right before the first non-INCLUDE entry in the stack", and should be very easy for anyone to implement as a simple loop+break.
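The loop+break rule quoted above could look roughly like this in C. The stack representation (outermost entry first, with the root file counted as an INCLUDE-level entry) and all names here are assumptions made for the sketch; the actual file stack format is not defined in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of file stack entries. */
enum stack_kind { STACK_INCLUDE, STACK_MACRO, STACK_REPT };

struct stack_entry {
    enum stack_kind kind;
    const char *name; /* file or macro name */
    unsigned line;    /* line within that file or macro */
};

/*
 * Return the index of the "canonical" entry: the one right before the first
 * non-INCLUDE entry. The stack is ordered outermost first, and entry 0 is
 * assumed to be the root file, treated here as an INCLUDE-level entry.
 * If every entry is an INCLUDE, the innermost one is returned.
 */
static size_t canonical_index(const struct stack_entry *stack, size_t depth)
{
    size_t i;

    assert(depth > 0 && stack[0].kind == STACK_INCLUDE);
    for (i = 1; i < depth; i++) {
        if (stack[i].kind != STACK_INCLUDE)
            break;
    }
    return i - 1;
}
```

For the stack INCLUDE -> INCLUDE -> macro -> REPT -> INCLUDE -> macro, this picks index 1, the second INCLUDE, which matches the example given earlier in the thread.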

@daid
Contributor

daid commented Jul 25, 2020

I have a very dirty patch that adds a symbol per CPU instruction:
https://github.com/daid/rgbds-live/blob/master/rgbds.patch
It is far from ideal as output, but it works as a proof of concept. Even with the limited info, and not the full stack, it is very useful.

@ISSOtm ISSOtm added this to the v0.5.1 milestone Mar 6, 2021
@Rangi42 Rangi42 modified the milestones: v0.5.1, v0.6.0, v1.0.0 Apr 25, 2021
@avivace
Member

avivace commented Sep 13, 2023

@ISSOtm @Rangi42 I'm interested in this, especially given the way it enables integrating RGBDS with other tools (e.g. rgbds-live). Do we have a clear enough specification in mind? I could assign a bounty if that could help.

@aaaaaa123456789
Member

I don't mind writing the actual spec if this kind of help is needed, but I wouldn't do that without input from the people actually implementing this.

@ISSOtm
Member

ISSOtm commented Sep 13, 2023

The main obstacle I've found with this is that I don't know what kind of format would be desirable. I'd like to make it easier to consume than to produce, and especially to be fast to traverse.

I'm not sure how to design the format such that going from ROM addr to line, or vice-versa, is reasonably efficient; especially for bigger / more complex ROMs.

I'm not sure how to take the "file stack" into account, either.
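For the address-to-line direction, one possible approach (not something proposed in this thread) is to keep the entries sorted by start address and binary-search them, which costs O(log n) per lookup. The sketch below assumes flat ROM offsets, non-overlapping ranges, and a single line per range; the struct and function names are hypothetical, and the file stack question is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory entry; the table must be sorted by `start`. */
struct addr_range {
    uint32_t start; /* first ROM offset covered */
    uint32_t size;  /* number of bytes covered */
    uint16_t file;  /* 1-based file index */
    uint32_t line;  /* line that generated the bytes */
};

/* Return the entry covering `addr`, or NULL if no source line generated it. */
static const struct addr_range *find_range(const struct addr_range *table,
                                           size_t count, uint32_t addr)
{
    size_t lo = 0, hi = count;

    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < table[mid].start)
            hi = mid;       /* addr lies before this range */
        else if (addr - table[mid].start >= table[mid].size)
            lo = mid + 1;   /* addr lies after this range */
        else
            return &table[mid];
    }
    return NULL;
}
```

Addresses that fall into a gap between ranges simply return NULL, so bytes with no source line need no entry in such a table.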

@aaaaaa123456789
Member

I just realised I never mentioned here that I've been working on this.

@Rangi42 Rangi42 removed this from the v1.0.0 milestone Aug 6, 2024

7 participants