Code Performance
Dereference
Every dereference can make the compiler generate extra address-calculation code, especially when your structures contain arrays. If you need to access the same sub-element more than once or twice, you are better off taking a pointer to that sub-element rather than dereferencing the whole chain each time (this will usually also make your code more readable). This matters even more when you use these sub-elements inside loops. For example:
Code:
for( mainIdx = 0; mainIdx < NUM_ELEMENTS; ++mainIdx )
{
    mainElem[mainIdx].sum = 0;
    mainElem[mainIdx].sumSq = 0;
    for( subIdx = 0; subIdx < NUM_SUB_ELEMENTS; ++subIdx )
    {
        mainElem[mainIdx].sum += mainElem[mainIdx].subElem[subIdx].number;
        mainElem[mainIdx].sumSq += mainElem[mainIdx].subElem[subIdx].number *
                                   mainElem[mainIdx].subElem[subIdx].number;
    }
}
The above code can be noticeably improved by removing the extra dereferencing.
Code:
for( mainIdx = 0; mainIdx < NUM_ELEMENTS; ++mainIdx )
{
    pMainElem = &mainElem[mainIdx];
    pMainElem->sum = 0;
    pMainElem->sumSq = 0;
    for( subIdx = 0; subIdx < NUM_SUB_ELEMENTS; ++subIdx )
    {
        pSubElem = &pMainElem->subElem[subIdx];
        pMainElem->sum += pSubElem->number;
        pMainElem->sumSq += pSubElem->number * pSubElem->number;
    }
}
This can be improved even further by removing all array index calculations (this is specific to loops). We get a pointer to the first element, and then use pointer arithmetic to step to the next element. This only helps for loops that walk the array in order, one element after another.
Code:
pMainElem = &mainElem[0];
for( mainIdx = 0; mainIdx < NUM_ELEMENTS; ++mainIdx, ++pMainElem )
{
    pMainElem->sum = 0;
    pMainElem->sumSq = 0;

    pSubElem = &pMainElem->subElem[0];
    for( subIdx = 0; subIdx < NUM_SUB_ELEMENTS; ++subIdx, ++pSubElem )
    {
        pMainElem->sum += pSubElem->number;
        pMainElem->sumSq += pSubElem->number * pSubElem->number;
    }
}
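For anyone who wants to try the pointer-walking version above, here is a minimal self-contained sketch. The struct layouts (`SubElem`, `MainElem`), the member types, the tiny array sizes, and the function name `computeSums` are all assumptions made for illustration; the snippets above do not show any of them. Whether this actually beats the indexed version depends on your compiler and optimization settings, so measure before relying on it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sizes and layouts; the snippets above do not define them.
const std::size_t NUM_ELEMENTS = 3;
const std::size_t NUM_SUB_ELEMENTS = 4;

struct SubElem {
    int number;
};

struct MainElem {
    long sum;
    long sumSq;
    SubElem subElem[NUM_SUB_ELEMENTS];
};

// Fills sum and sumSq for every main element, stepping through both arrays
// with pointers instead of recomputing mainElem[i].subElem[j] on each access.
// Expects mainElem to point at an array of NUM_ELEMENTS elements.
void computeSums(MainElem* mainElem) {
    assert(mainElem != nullptr);

    MainElem* pMainElem = mainElem;
    for (std::size_t mainIdx = 0; mainIdx < NUM_ELEMENTS; ++mainIdx, ++pMainElem) {
        pMainElem->sum = 0;
        pMainElem->sumSq = 0;

        const SubElem* pSubElem = pMainElem->subElem;
        for (std::size_t subIdx = 0; subIdx < NUM_SUB_ELEMENTS; ++subIdx, ++pSubElem) {
            pMainElem->sum += pSubElem->number;
            pMainElem->sumSq += static_cast<long>(pSubElem->number) * pSubElem->number;
        }
    }
}
```

Calling `computeSums(elems)` on an array of `NUM_ELEMENTS` initialized `MainElem` values leaves each element's `sum` and `sumSq` filled in.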
Loops
Loops also need attention. The more iterations a loop runs, the more each statement inside it costs, because that statement is executed once per iteration. The more code you can move out of a loop, the better that loop will perform. For example:
Code:
pMainElem = &mainElem[0];
for( mainIdx = 0; mainIdx < NUM_ELEMENTS; ++mainIdx, ++pMainElem )
{
    pSubElem = &pMainElem->subElem[0];
    for( subIdx = 0; subIdx < NUM_SUB_ELEMENTS; ++subIdx, ++pSubElem )
    {
        pMainElem->id = mainIdx;
        pSubElem->id = subIdx;
    }
}
This would be better written as follows:
Code:
pMainElem = &mainElem[0];
for( mainIdx = 0; mainIdx < NUM_ELEMENTS; ++mainIdx, ++pMainElem )
{
    pMainElem->id = mainIdx;

    pSubElem = &pMainElem->subElem[0];
    for( subIdx = 0; subIdx < NUM_SUB_ELEMENTS; ++subIdx, ++pSubElem )
    {
        pSubElem->id = subIdx;
    }
}
In the above example, the assignment to pMainElem->id did not belong inside the sub-element loop because it did not depend on anything generated inside that loop, so it was better to move it outside. Applying some numbers makes this clearer. Let us say that NUM_ELEMENTS is 1000 and that NUM_SUB_ELEMENTS is 5000. In the first case, pMainElem->id is assigned 1000 * 5000 times, so 5,000,000 times. In the second case, pMainElem->id is assigned only 1000 times. What’s worse is that in the first case, 1000 * 4999 = 4,999,000 of those assignments were wasted because they wrote the same value that was already there.
Essentially, code is being executed almost 5 million times unnecessarily.
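The arithmetic above can be checked with a small counting sketch. The function names are hypothetical, and the counter simply stands in for the `pMainElem->id = mainIdx;` assignment; nothing here measures actual time.

```cpp
#include <cassert>
#include <cstddef>

// Counts the writes to the main element's id when the assignment sits
// inside the inner loop: one write per sub-element of every main element.
std::size_t writesInsideInnerLoop(std::size_t numElements, std::size_t numSubElements) {
    std::size_t writes = 0;
    for (std::size_t mainIdx = 0; mainIdx < numElements; ++mainIdx) {
        for (std::size_t subIdx = 0; subIdx < numSubElements; ++subIdx) {
            ++writes;  // stands in for pMainElem->id = mainIdx;
        }
    }
    return writes;
}

// Counts the writes when the assignment is moved out of the inner loop:
// one write per main element, regardless of the number of sub-elements.
std::size_t writesHoisted(std::size_t numElements) {
    std::size_t writes = 0;
    for (std::size_t mainIdx = 0; mainIdx < numElements; ++mainIdx) {
        ++writes;  // stands in for pMainElem->id = mainIdx;
    }
    return writes;
}

// Number of redundant writes avoided by hoisting.
// Assumes every main element has at least one sub-element.
std::size_t wastedWrites(std::size_t numElements, std::size_t numSubElements) {
    assert(numSubElements >= 1);
    return writesInsideInnerLoop(numElements, numSubElements) - writesHoisted(numElements);
}
```

With 1000 elements and 5000 sub-elements, `wastedWrites(1000, 5000)` returns 4,999,000, matching the figure above.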
In-lining
I have recently had to collaborate with someone who is not familiar with C++. He told me that we should never use private member variables in a class when there are real-time considerations. He told me that when doing so, the entire class gets pushed on the stack, which is understandably expensive. Now it took me a while to figure out what he meant by that statement, but I eventually did get it. His understanding was that if a member variable is private, then from outside a class instance, the only way to access that variable is to use an accessor function, which would then incur function call overhead.
Although he is correct (strictly speaking), there are a couple of problems with his train of thought.
First, that is really bad programming practice. If you need direct access to a variable outside the scope of the class, then make it public because it is obviously part of the interface. A private variable should (almost) only ever be accessed from inside the class’s scope. A private member variable is meant for storing the state of the class, not for telling the class to do things. If a private member is being accessed from inside a member function, you have already incurred the function call overhead to get inside class scope for whatever operation you are trying to perform. It then costs nothing extra to access the private member.
Second, an accessor function is typically pretty simple. This means that any compiler worth its salt will automatically inline that function. If it is more complex, you can tell the compiler that you would really like the function to be in-lined, and then stand an even better chance of that in-lining happening. If you play your cards right, the function that performs some class action (accessing the private member along the way) will probably get in-lined as well, meaning that you got into class scope for free.
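As a rough illustration of both cases, here is a minimal sketch. The `Counter` class and its members are hypothetical and not taken from any real code base. A function defined inside the class body is implicitly `inline`, while the `inline` keyword on an out-of-class definition is only a request that the compiler is free to ignore.

```cpp
#include <cassert>

// Hypothetical class, for illustration only.
class Counter {
public:
    Counter() : count_(0) {}

    // Simple accessor defined in the class body, so it is implicitly inline.
    // An optimizing compiler will typically reduce a call to a plain read.
    int count() const { return count_; }

    // A class action that touches the private member along the way.
    // Declared here, defined below with an explicit inline request.
    void add(int amount);

private:
    int count_;  // state of the class, only touched from inside class scope
};

// "inline" asks for in-lining; the compiler makes the final decision.
inline void Counter::add(int amount) {
    assert(amount >= 0);
    count_ += amount;
}
```

A caller would write `Counter c; c.add(2); int n = c.count();` and, with optimizations enabled, would most likely pay no function call overhead for either call.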
Compilers seem to like to inline. I did a test to compare the performance of in-lined code against that of non-in-lined code, and my biggest problem was getting the compiler to not inline it. This can of course be forced, but that was a little out of the way for the type of code I was trying to test.
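For the curious, forcing a function to stay out of line usually relies on compiler-specific attributes. The sketch below assumes GCC, Clang, or MSVC; the `NOINLINE` macro and the function names are hypothetical, and on any other compiler the macro expands to nothing, so the function may still be in-lined. It is not the test described above, only one way such a comparison could be set up.

```cpp
#include <cassert>

// Compiler-specific "do not inline" attribute behind a hypothetical macro.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE  // unknown compiler: no way to force it here
#endif

// Same body twice: one version the compiler may inline, one it must not.
inline int squareInline(int x) { return x * x; }
NOINLINE int squareNoInline(int x) { return x * x; }

// Sums the squares 1*1 + 2*2 + ... + n*n using the in-lineable helper.
long sumSquaresInline(int n) {
    assert(n >= 0);
    long total = 0;
    for (int i = 1; i <= n; ++i) {
        total += squareInline(i);
    }
    return total;
}

// Same sum, but every iteration pays for a real function call.
long sumSquaresNoInline(int n) {
    assert(n >= 0);
    long total = 0;
    for (int i = 1; i <= n; ++i) {
        total += squareNoInline(i);
    }
    return total;
}
```

Both functions return the same result; timing them against each other over a large `n` with optimizations enabled is one way to see what the function call overhead costs on a particular compiler.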