Quote:
Originally Posted by Jeoni
The fastest copy method should be an optimized implementation in assembler using "rep movsd / movsq", if you can ensure that the source and destination buffer sizes are multiples of 4 / 8 bytes. Without that, I guess memcpy should be the highest-performance option if properly implemented. That may also be implemented partly with memcpy, but with memcpy you surely get some overhead (though maybe not much).
memcpy has been the subject of extreme optimization by compilers over the past X years, and you can be pretty sure that in 99.9% of use cases it will emit better code than the average human is capable of writing. Depending on various parameters, the compiler chooses between different approaches. For example, memcpying an 8-byte buffer will probably emit one mov (two in a 32-bit environment, or equivalent SSE instructions) rather than calling the CRT's memcpy. The CRT's memcpy is only invoked for large or dynamically sized buffers, and it is a **** complex construct that picks a highly optimized copy strategy based on available CPU extensions and a ton of other factors. On modern processors, "rep" loops are in fact slower than "regular" ones, because manufacturers optimize their CPUs for the RISC-style subset of the full ISA that compiler-generated code commonly uses.
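To illustrate the small-buffer case (hypothetical snippet, not from the quoted post; the name read_u64 is made up): with optimizations enabled, MSVC, GCC and Clang all see the constant 8-byte size below and fold the memcpy into a plain register load instead of emitting a call into the CRT.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: copy a fixed 8-byte field.  With optimizations
 * on, mainstream compilers see the constant size and emit a single
 * 64-bit mov (two 32-bit movs on x86) instead of calling the CRT's
 * memcpy. */
static uint64_t read_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);   /* folded to a plain load by the optimizer */
    return v;
}

int main(void)
{
    unsigned char buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    printf("%llu\n", (unsigned long long)read_u64(buf));
    return 0;
}
```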
Here, have a snapshot of memcpy from MSVC14 (memcpy internally invokes memmove in VC):
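For a rough idea of the kind of dispatch such an implementation does, here is a hypothetical, heavily simplified sketch (emphatically not the actual VC source): it only shows the overlap check and the spots where a real CRT would branch on size, alignment and CPU features.

```c
#include <stddef.h>

/* Hypothetical sketch only -- NOT the VC CRT code.  A real memmove
 * additionally special-cases small sizes and branches on alignment and
 * CPU features (SSE2/AVX/ERMSB, detected once at startup) before
 * entering a wide-register copy loop. */
static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char       *d = dst;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dst;

    if (d < s || d >= s + n) {
        /* No harmful overlap: copy forward.  A real CRT copies in
         * 16/32-byte chunks here rather than byte by byte. */
        while (n--) *d++ = *s++;
    } else {
        /* dst overlaps the tail of src: copy backward to stay correct. */
        d += n; s += n;
        while (n--) *--d = *--s;
    }
    return dst;
}
```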
That being said, I agree with the rest of the post. Synchronization through cleverly placed hook-points in the code flow is the way to go.