Quote:
Originally Posted by Jeoni
The fastest copy method should be an optimized implementation in assembler using "rep movsd / movsq", if you can ensure that the source and destination buffer sizes are multiples of 4 / 8 bytes. Without that, I guess memcpy should be the highest-performance option if properly implemented. That may also be implemented partly with memcpy, but with memcpy you surely get some overhead (though maybe not much).
memcpy has been the subject of extreme optimization by compilers over the past X years, and you can be pretty sure that in 99.9% of use cases it will emit better code than the average human is capable of writing. Depending on various parameters, the compiler chooses between different approaches. For example, memcpying an 8-byte buffer will probably emit one mov (two in a 32-bit environment, or equivalent SSE instructions) rather than calling the CRT's memcpy. The CRT's memcpy is only invoked for large or dynamically sized buffers, and it is a **** complex construct that picks a highly optimized copy strategy based on available CPU extensions and a ton of other factors. On modern processors, "rep" loops are in fact slower than "regular" ones, because manufacturers optimize their CPUs for the RISC-style subset of the full ISA that compiler-generated code commonly uses.
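To illustrate the small-buffer case (hypothetical snippet, not from the quoted post; the name read_u64 is made up): with optimizations enabled, MSVC, GCC and Clang all see the constant 8-byte size below and fold the memcpy into a plain register load instead of emitting a call into the CRT.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: copy a fixed 8-byte field.  With optimizations
 * on, mainstream compilers see the constant size and emit a single
 * 64-bit mov (two 32-bit movs on x86) instead of calling the CRT's
 * memcpy. */
static uint64_t read_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);   /* folded to a plain load by the optimizer */
    return v;
}

int main(void)
{
    unsigned char buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    printf("%llu\n", (unsigned long long)read_u64(buf));
    return 0;
}
```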
Here, have a snapshot of memcpy from MSVC14 (memcpy internally invokes memmove in VC):
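For a rough idea of the kind of dispatch such an implementation does, here is a hypothetical, heavily simplified sketch (emphatically not the actual VC source): it only shows the overlap check and the spots where a real CRT would branch on size, alignment and CPU features.

```c
#include <stddef.h>

/* Hypothetical sketch only -- NOT the VC CRT code.  A real memmove
 * additionally special-cases small sizes and branches on alignment and
 * CPU features (SSE2/AVX/ERMSB, detected once at startup) before
 * entering a wide-register copy loop. */
static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char       *d = dst;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dst;

    if (d < s || d >= s + n) {
        /* No harmful overlap: copy forward.  A real CRT copies in
         * 16/32-byte chunks here rather than byte by byte. */
        while (n--) *d++ = *s++;
    } else {
        /* dst overlaps the tail of src: copy backward to stay correct. */
        d += n; s += n;
        while (n--) *--d = *--s;
    }
    return dst;
}
```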
That being said, I agree with the rest of the post. Synchronization through cleverly placed hook-points in the code flow is the way to go.