Why is "new" so slow even if not called?

11/27/2015 18:49 Shadow992#1
I am working on a project that is very performance-sensitive.
It needs to be both fast and highly customizable.
This is why I need to save nearly every millisecond of execution time I can.
Because of this I tend to use many new/delete calls, so that I neither copy the objects themselves nor run any copy constructors.

To avoid calling new/delete too often (requesting memory takes a long time), I keep already allocated memory in an internal pool so I can reuse it later.

I have a class called "AllocationHelper", which contains some functions like the following:

PHP Code:
std::vector<void*>* AllocationHelper::allocateVoidVector()
{
    std::vector<void*>* ptr;
    if(!allocatedVoidVectors.empty())
    {
        ptr=allocatedVoidVectors.back();
        allocatedVoidVectors.pop_back();
    }
    else
    {
        ptr=new std::vector<void*>;
    }
    return ptr;
}

void AllocationHelper::recycleVoidVector(std::vector<void*>* ptr)
{
    if(allocatedVoidVectors.size()<maximumPreAllocs)
    {
        ptr->clear();
        allocatedVoidVectors.push_back(ptr);
        return;
    }
    delete ptr;
}


This code runs quite fast but when I searched for some bottlenecks in my program I stumbled over some quite strange behavior.
If I write my functions above like this:

PHP Code:
__attribute__((noinline)) std::vector<void*>* AllocationHelper::_allocateVoidVector()
{
    std::vector<void*>* ptr;
    ptr=new std::vector<void*>;
    return ptr;
}

std::vector<void*>* AllocationHelper::allocateVoidVector()
{
    std::vector<void*>* ptr;
    if(!allocatedVoidVectors.empty())
    {
        ptr=allocatedVoidVectors.back();
        allocatedVoidVectors.pop_back();
    }
    else
    {
        ptr=_allocateVoidVector();
    }
    return ptr;
}

void AllocationHelper::recycleVoidVector(std::vector<void*>* ptr)
{
    if(allocatedVoidVectors.size()<maximumPreAllocs)
    {
        ptr->clear();
        allocatedVoidVectors.push_back(ptr);
        return;
    }
    delete ptr;
}

My code suddenly runs 2x faster.
I could not believe that my code was still working correctly, so I traced everything with std::cout to see what my program was doing, and it behaved the way I wanted:
It calls the "new" operator about 10 times and then performs the push/pop operations around 1.000.000 times.

If I remove the separate call to "_allocateVoidVector()" and put the new directly into the else branch, my code takes around 150ms for 1.000.000 iterations.
If I keep the separate function and add "__attribute__((noinline))" (so gcc will not inline it), my code runs in 80ms.

This made me wonder whether the new is somehow called or prepared even if I never reach the else branch. After thinking and googling for a while (without any results), I tried exactly the same code, but this time with malloc:

PHP Code:
std::vector<void*>* AllocationHelper::allocateVoidVector()
{
    std::vector<void*>* ptr;
    if(!allocatedVoidVectors.empty())
    {
        ptr=allocatedVoidVectors.back();
        allocatedVoidVectors.pop_back();
    }
    else
    {
        // malloc returns void*, so C++ needs an explicit cast
        ptr=static_cast<std::vector<void*>*>(malloc(sizeof(std::vector<void*>)));
        ptr->setUp(); // stands in for the constructor call
    }
    return ptr;
}

void AllocationHelper::recycleVoidVector(std::vector<void*>* ptr)
{
    if(allocatedVoidVectors.size()<maximumPreAllocs)
    {
        ptr->clear();
        allocatedVoidVectors.push_back(ptr);
        return;
    }
    ptr->destroy(); // stands in for the destructor call
    free(ptr);
}

"setUp" and "destroy" stand in for my constructors and destructors (because they do not get called automatically with malloc and free). And again my code performed quite well: it took around 75ms using malloc.

So my question is not really a question but more like:
Quote:
Did you ever encounter something similar (I am using MinGW with gcc 5.1.0)?

How would you suggest I solve this problem? Should I use extra functions (which also slow things down a bit, but not too much), or should I resort to the bad practice of allocating class objects with malloc?
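On the malloc option: a minimal sketch, assuming the goal is only to take raw storage from malloc while still running the real constructor and destructor. Standard C++ does this with placement new and an explicit destructor call instead of hand-written setUp/destroy members. The names VoidVector, createVoidVector and destroyVoidVector are hypothetical and do not come from the project.

```cpp
#include <cstdlib>  // std::malloc, std::free
#include <new>      // placement new, std::bad_alloc
#include <vector>

using VoidVector = std::vector<void*>;

// Hypothetical helper: get raw storage from malloc, then run the
// constructor explicitly with placement new.
VoidVector* createVoidVector()
{
    void* raw = std::malloc(sizeof(VoidVector));
    if (raw == nullptr)
        throw std::bad_alloc();
    return new (raw) VoidVector();  // constructs the vector inside raw
}

// Hypothetical helper: run the destructor explicitly, then hand the
// storage back to free.
void destroyVoidVector(VoidVector* ptr)
{
    if (ptr == nullptr)
        return;
    ptr->~VoidVector();  // releases the vector's own buffer
    std::free(ptr);
}
```

Objects created this way must never be passed to delete, and objects created with new must never be passed to destroyVoidVector, so a pool should stick to one scheme for everything it hands out.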
11/28/2015 14:22 qqdev#2
Check the ASM code for comparison.

BTW: How did you measure the time? Was the CPU clock stable? How many iterations did you perform?
11/28/2015 16:54 Shadow992#3
Quote:
Originally Posted by qqdev View Post
Check the ASM code for comparison.

BTW: How did you measure the time? Was the CPU clock stable? How many iterations did you perform?
My project is around 15.000 lines long, so looking at the asm is really, really ugly.
I ran around 1.000.000 iterations per benchmark and repeated the benchmark about 10 times because, as stated before, I did not believe the results I got, but even some hours later the result was the same.

To benchmark my code I used:
PHP Code:
std::chrono::high_resolution_clock 
So I guess this benchmark is not that wrong (even allowing for varying CPU load etc.).
I did not invest much time in finding out what exactly the asm looks like (so I also did not try some very basic things); it was enough for me to see that when I call only the functions presented, the code slows down quite a bit.
11/29/2015 10:10 Ende!#4
What does the amount of code matter for looking at the ASM? Just let the compiler generate debug info and drop your binary into IDA PRO or if you prefer something GNUish, GDB. IDA is capable of parsing DWARF debug info and will present you with a full list of (named) functions, so finding what you need really shouldn't be a big deal. In GDB, you can just disassemble a function by name ("disas XXX").

Besides that, I compiled the first two examples you provided and, contrary to what you described, they performed exactly how I'd expect: 3020ms for the first example, 3150ms for the second. MinGW64 on Win10, -O3, 1 million iterations of recycleVoidVector(allocateVoidVector()), std::chrono::high_resolution_clock for measuring.
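A minimal sketch of the kind of benchmark described above, assuming a simplified stand-in for AllocationHelper that contains only the pool and the two functions from the first example. The constructor argument, pooledCount() and timePoolRoundTrips() are hypothetical names added for illustration, and std::chrono::steady_clock is swapped in for high_resolution_clock because it is guaranteed to be monotonic.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Simplified stand-in for the AllocationHelper discussed in the thread.
class AllocationHelper
{
public:
    explicit AllocationHelper(std::size_t maxPreAllocs)
        : maximumPreAllocs(maxPreAllocs) {}
    AllocationHelper(const AllocationHelper&) = delete;
    AllocationHelper& operator=(const AllocationHelper&) = delete;

    ~AllocationHelper()
    {
        // Free whatever is still parked in the pool.
        for (std::vector<void*>* v : allocatedVoidVectors)
            delete v;
    }

    std::vector<void*>* allocateVoidVector()
    {
        if (!allocatedVoidVectors.empty())
        {
            std::vector<void*>* ptr = allocatedVoidVectors.back();
            allocatedVoidVectors.pop_back();
            return ptr;
        }
        return new std::vector<void*>;  // cold path: pool is empty
    }

    void recycleVoidVector(std::vector<void*>* ptr)
    {
        if (allocatedVoidVectors.size() < maximumPreAllocs)
        {
            ptr->clear();
            allocatedVoidVectors.push_back(ptr);
            return;
        }
        delete ptr;  // pool is full
    }

    std::size_t pooledCount() const { return allocatedVoidVectors.size(); }

private:
    std::vector<std::vector<void*>*> allocatedVoidVectors;
    std::size_t maximumPreAllocs;
};

// Hypothetical helper: runs `iterations` allocate/recycle round trips
// and returns the elapsed wall-clock time in milliseconds.
double timePoolRoundTrips(AllocationHelper& helper, std::size_t iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        helper.recycleVoidVector(helper.allocateVoidVector());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timePoolRoundTrips(helper, 1000000) from main on a helper with a small pool limit and printing the result gives one measurement. Repeating the call several times and building both variants with the same compiler and optimization level keeps the comparison fair.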
11/29/2015 13:03 Shadow992#5
Quote:
Originally Posted by Ende! View Post
What does the amount of code matter for looking at the ASM? Just let the compiler generate debug info and drop your binary into IDA PRO or if you prefer something GNUish, GDB. IDA is capable of parsing DWARF debug info and will present you with a full list of (named) functions, so finding what you need really shouldn't be a big deal. In GDB, you can just disassemble a function by name ("disas XXX").

Besides that, I compiled the first two examples you provided and, contrary to what you described, they performed exactly how I'd expect: 3020ms for the first example, 3150ms for the second. MinGW64 on Win10, -O3, 1 million iterations of recycleVoidVector(allocateVoidVector()), std::chrono::high_resolution_clock for measuring.
You are totally right. Under Linux, using GNU g++ and compiling with these options:
PHP Code:
g++ -Wall -fexceptions --O2 -Winit-self -pedantic -std=c++11 -Wall -pthread 
I get exactly the behavior you described.
I used 32-bit MinGW. I think this is some kind of bug or similar in the 32-bit build of 5.1.0.
Maybe I will invest some time to find out what exactly is happening.