Quote:
Originally Posted by Mega Byte
TLDR; How could a program know how many arguments a function has at and where those arguments come from, construct the relevant hook and log them at runtime then use that information to generate C++ hook template source.
|
You might want to take a look at Ghidra, which is an open source reverse engineering tool with quite a decent decompiler. If you were lazy you could basically just copy their code for detecting functions, but I don't know if this really satisfies you.
What you are trying to achieve is not really trivial, but let's go through this step by step.
Quote:
Originally Posted by Mega Byte
A DLL once injected and given a memory range in the code section will locate functions, identify their calling convention and args, hook them dynamically and log when they are called, what the args contain and returned value.
All at runtime, okay static analysis assisted might be fine too.
|
Let's start off with the obvious: how does a call work on x86? I'm by no means an assembly pro, so this is very superficial, but I think it should be enough to answer your question.
Whenever a function is called, the parameters are pushed onto the stack or written into registers (or both); on x86_64 you have many more registers, giving you more possibilities. The called function can then address the stack arguments by accessing the values below the return address (well, technically above: the stack grows down in the address space, so they sit at higher addresses; the saved frame pointer and any stack cookie are on the other side of the return address, but for this explanation we will ignore them). The calling convention then basically explains how these values are pushed onto the stack (order, etc.), which registers are used (usually not so many under x86, but under x86_64 this is a whole different story), how the stack in general is prepared for the call, and who cleans it up after the call (and how). The details are explained pretty neatly in

. Also, you don't need to deduce the original calling convention, you just need to find a calling convention that fits (e.g. afaik the pascal and stdcall calling conventions are basically the same except for the order in which the arguments are pushed onto the stack, so you can replace one with the other by just exchanging the order of the arguments).
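To make the "who cleans up" part a bit more concrete: on 32-bit x86 the return instruction itself often tells you. A plain ret (opcode C3) leaves the stack arguments for the caller to remove, while ret imm16 (opcode C2) pops that many argument bytes itself. Here is a minimal sketch of that check. The function name callee_cleanup_bytes is made up for this post, and it assumes you have already found the function's final return instruction with a real disassembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: looks at the bytes of a function's return instruction
// (32-bit x86) and reports how many stack bytes the callee removes itself.
//   0   -> plain "ret": caller cleans up (cdecl-like) or there are no stack args
//   N>0 -> "ret N": callee cleans up N bytes (stdcall/pascal-like)
//   -1  -> the bytes do not start with a near return instruction
int callee_cleanup_bytes(const std::uint8_t* ret_insn, std::size_t len) {
    assert(ret_insn != nullptr);
    if (len >= 1 && ret_insn[0] == 0xC3) return 0;   // ret
    if (len >= 3 && ret_insn[0] == 0xC2)             // ret imm16, little-endian operand
        return ret_insn[1] | (ret_insn[2] << 8);
    return -1;
}
```

With 4-byte stack slots, a result of 12 would suggest three stack arguments. It says nothing about arguments passed in registers, so treat it as a hint and not as proof.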
Basically, your best chance to deduce this is to search the code pages for call instructions to a given location and then analyze how each call is prepared (what is pushed onto the stack, which registers are loaded, and in which order). In other words: scan the whole binary, and for each call instruction look at the preparation just before the call and deduce from that the (or better: a) signature of the called function.
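The first half of that (finding the call sites) can be sketched very roughly as follows. This is a naive byte scan for direct near calls (E8 followed by a 32-bit relative offset), not a disassembler: an E8 byte can just as well be data or the middle of another instruction, so expect false positives. The name count_call_targets and the base parameter are assumptions for this example, and the code assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical helper: counts how often each address inside `code` is the
// target of a direct near call (E8 rel32). `base` is the address at which
// the first byte of `code` is mapped.
std::map<std::uint64_t, unsigned> count_call_targets(
        const std::vector<std::uint8_t>& code, std::uint64_t base) {
    assert(base + code.size() >= base);  // address range must not wrap around
    std::map<std::uint64_t, unsigned> hits;
    for (std::size_t i = 0; i + 5 <= code.size(); ++i) {
        if (code[i] != 0xE8) continue;
        std::int32_t rel;
        std::memcpy(&rel, &code[i + 1], sizeof rel);  // little-endian host assumed
        // The offset is relative to the instruction following the 5-byte call.
        std::uint64_t target = base + i + 5 + static_cast<std::int64_t>(rel);
        // Keep only targets inside the scanned range to drop obvious garbage.
        if (target >= base && target < base + code.size()) ++hits[target];
    }
    return hits;
}
```

Addresses that are hit from many places are good function candidates. For the second half (walking back from each call site to see which pushes and register loads prepare it) you really want a proper disassembly engine, because x86 instructions cannot be decoded backwards reliably.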
Quote:
Originally Posted by Mega Byte
With a disassembly engine it could process the bytes back into opcodes and run scans for patterns of interest there are also some nifty ones that have good attempts at "decompiling" it back into code.
|
Basically how Ghidra works.
Quote:
Originally Posted by Mega Byte
Even better if it can auto locate vtables, generate signatures to scan for the function addresses in modified versions of the executable target and provide an API to hook them.
|
That's more or less impossible. Vtables are a high-level construct which is handled solely by the compiler frontend. E.g. assume you have a program written in Rust using gcc C++, Intel C++ and Fortran libraries. Classic procedural Fortran doesn't have vtables at all, and I don't know much about Rust, but it's safe to assume that gcc C++, Rust and Intel C++ might use completely different vtable formats, making it impossible to recognize vtables from just the resulting assembly. Even different versions of the same compiler might be completely incompatible with each other.
And then there is also the optimizer. If the optimizer can prove that you don't need a vtable, you won't have a vtable, simple as that. In general, optimization usually makes it harder to make sense of the disassembly.
Quote:
Originally Posted by Mega Byte
Ideally the end product scenario is, once a user has a list of all functions and have seen some interesting things in the args they could simply select the ones they are interested in from a list and click a button to generate C++ source for a dll base to further build on containing all of the typedefs and hook functions that just call the original method by default.
|
I think this is not as useful as you might think. In a non-trivial project there are hundreds of functions that have the same signature, and having a list of them will just be completely useless. I just fired up Ghidra and looked into the function list, and having a whole list of functions like:
Code:
void FUN_100002b30(undefined8 uParm1,code *pcParm2)
undefined8 FUN_100002b40(void)
void FUN_100002b50(undefined8 uParm1)
ulonglong FUN_100002be0(void)
void FUN_100002c20(void)
undefined8 FUN_100002c30(void)
void FUN_100002c40(undefined8 *puParm1,undefined8 *puParm2,ulonglong uParm3)
void FUN_100002f40(longlong *plParm1,ulonglong uParm2,byte bParm3)
undefined * FUN_100003050(ulonglong uParm1,undefined *puParm2,char cParm3)
ulonglong FUN_1000030c0(ulonglong uParm1,ulonglong uParm2,short sParm3)
...
doesn't tell you shit without closer inspection of these functions. And if you analyze these functions to see whether or not one of them is what you are searching for, you can also easily deduce the signature, meaning you don't gain shit from fully automated signature recognition.
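For what it's worth, the generation step itself is the easy part once you know a signature. A rough sketch of what such a generator could look like is below. Everything here is made up for illustration: the name make_hook_stub, the orig_/hook_ naming scheme and the a0, a1, ... parameter names. It only emits the text; it installs nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical generator: emits a typedef, a pointer to the original function
// and a hook stub that just forwards to the original.
std::string make_hook_stub(const std::string& name, const std::string& ret,
                           const std::string& conv,
                           const std::vector<std::string>& params) {
    assert(!name.empty());
    std::string decl, call;
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i != 0) { decl += ", "; call += ", "; }
        decl += params[i] + " a" + std::to_string(i);  // e.g. "int a0"
        call += "a" + std::to_string(i);               // e.g. "a0"
    }
    std::string s;
    s += "typedef " + ret + " (" + conv + " *" + name + "_t)(" + decl + ");\n";
    s += "static " + name + "_t orig_" + name + " = nullptr;\n";
    s += ret + " " + conv + " hook_" + name + "(" + decl + ") {\n";
    s += "    return orig_" + name + "(" + call + ");\n";
    s += "}\n";
    return s;
}
```

Calling it with a name like FUN_1, return type int, convention __stdcall and parameters int and char* gives you a typedef FUN_1_t plus a hook_FUN_1 that calls orig_FUN_1(a0, a1). The hard part stays the same: knowing which of the hundreds of look-alike functions is worth hooking.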
Quote:
Originally Posted by Mega Byte
Also I can't seem to get the hang of hooking vardic functions either (variable number of args e.g. think sprintf with the ... and a format string) getting the number of arguments seems tricky?
|
You simply can't get the number of arguments, that's the fucking joke.
Varargs are an abomination, as they basically just say: push as many arguments onto the stack as you like. It's the callee's job to somehow make sense of it.
That's why you can use printf to read arbitrary memory locations. E.g. take a look at the following code:
Code:
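#include <stdio.h>
int main(void) {
    char buff[1024];
    scanf("%1023s", buff);  /* width limit so the read itself cannot overflow buff */
    printf(buff);           /* deliberately wrong: user input is used as the format string */
}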
If you supply the string %d%d%d%d, it will start dumping the stack, even though no variable was given to printf, because there is no way to check the number of arguments. By using printf you make the "promise" that you won't have more format specifiers in there than arguments you pass. If you don't keep it... well, printf doesn't care. This gets especially interesting if you write addresses into the string, because after a few specifiers the next value read will come from the buffer itself, meaning you can make printf use values from your own buffer. Since %s chases pointers, writing an arbitrary address into the buffer and reading it via %s lets you read literally any memory location. It is even worse if you use %n, as this writes the number of characters printed so far to the address provided, meaning that with the code above you can read and write arbitrary memory locations.
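For hooking this means: the only way to know how many arguments a printf-style call expects is to do what the callee does and parse the format string. Here is a simplified sketch of that. The name count_printf_args is made up, and it only understands flags, * for width and precision, the common length modifiers and %%. Positional arguments like %1$d and non-standard extensions are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of variadic arguments a printf-style format
// string will try to consume. This is what the callee *expects*, not what
// the caller actually passed.
std::size_t count_printf_args(const char* fmt) {
    assert(fmt != nullptr);
    std::size_t n = 0;
    const char* p = fmt;
    while (*p) {
        if (*p++ != '%') continue;                       // ordinary character
        if (*p == '%') { ++p; continue; }                // "%%" is a literal percent sign
        while (*p && std::strchr("-+ #0", *p)) ++p;      // flags
        if (*p == '*') { ++n; ++p; }                     // width taken from an argument
        else while (*p >= '0' && *p <= '9') ++p;         // literal width
        if (*p == '.') {
            ++p;
            if (*p == '*') { ++n; ++p; }                 // precision taken from an argument
            else while (*p >= '0' && *p <= '9') ++p;     // literal precision
        }
        while (*p && std::strchr("hljztL", *p)) ++p;     // length modifiers
        if (*p) { ++n; ++p; }                            // the conversion itself
    }
    return n;
}
```

A hook for a printf-like function could use something like this to decide how many values to log. For a variadic function without a format string (or some other convention like a terminating NULL) you are out of luck.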