Local declaration of function variables
31 Oct 07, 11:00AM
I would like to focus this article on the declaration of function variables and the increasingly distinct styles that are being used:
In the first camp, developers state that all variables used within a function should be declared neatly at the top of the function; the second group (into which I fall) says that you should only declare variables as and when you need them.
From a readability point of view, the arguments for either approach are largely equally valid, and are as much a matter of personal taste as of logic. After all, we're almost all agreed that variables should be given 'sensible names', code should be justified and functions should never be more than 2-3 pages long. Besides, a few well-placed comments can make either approach acceptable to the other camp.
However, what are the technical implications for our compilers and, ultimately, for the CPU?
Use Of Registers
The goal of nearly every compiler is to reorganise code so that the central processing unit (CPU) is used efficiently. The first problem the CPU encounters is accessing memory (RAM), which is significantly slower than the processor itself (>10x), hence the advent of high-speed, but pricey, cache memory. One way to avoid this problem is never to use RAM, keeping all your variables on the chip itself, within a select few registers. Modern chips have about 6-12 registers that can be used for any particular purpose. Aside from removing the wait for RAM, you no longer need to retrieve the value from memory, saving at least one instruction. For this reason, compilers, particularly when optimising (gcc -O3), go out of their way to use registers as extensively as possible. As a result, any short function will typically be so heavily optimised that it barely matters how you write it. With a long or complex function this is no longer the case, and variables then need to be swapped to and from memory as they are used.
Locally Scope Your Variables And Re-declare As Often As You Like
I've often seen code in the following form:
    double result=0;
    double tmp;
    for(int i=1; i<=10; i++) {
        tmp=result*1.175 / (i* 0.1);
        result*=tmp;
    }
The programmer has reasoned that the tmp variable takes some work to set up, and so only wants to define it once. This is perfectly true when tmp is an instance of a class or structure, but not when it comes to basic types like double, int, char etc. In the latter case, any compiler will see that tmp is only used immediately in the subsequent line and leave the result in a register. As a result, the above code produces the same compiled result as:
    double result=0;
    for(int i=1; i<=10; i++) {
        double tmp=result*1.175 / (i* 0.1);
        result*=tmp;
    }
Indeed, for readability, you may actually prefer to write the following:
    double result=0;
    for(int i=1; i<=10; i++) {
        double gross=result*1.175;
        double discount=i*0.1;
        result*=gross/discount;
    }
The locally defined variables cost nothing, as these values are held in registers. If you really want to hammer this issue home to the compiler, you can add the register keyword, but most compilers are better than you at resolving this and treat the register keyword only as a suggestion (it was later deprecated in C++11 and removed in C++17).
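The caveat above about classes and structures can be made concrete with a rough sketch. The Costly struct and both function names here are hypothetical, invented purely for illustration; the struct simply counts how often its constructor runs, standing in for a type whose set-up really does cost something.

```cpp
// Hypothetical stand-in for a class that takes real work to set up.
struct Costly {
    static int constructions;   // how many times the constructor has run
    double value;
    Costly() : value(0.0) { ++constructions; }
};
int Costly::constructions = 0;

// tmp declared once, outside the loop: one constructor call in total.
int constructionsWhenHoisted(int iterations) {
    Costly::constructions = 0;
    Costly tmp;
    for (int i = 1; i <= iterations; i++) {
        tmp.value = i * 0.1;
    }
    return Costly::constructions;
}

// tmp declared inside the loop: one constructor call per iteration.
int constructionsWhenScoped(int iterations) {
    Costly::constructions = 0;
    for (int i = 1; i <= iterations; i++) {
        Costly tmp;
        tmp.value = i * 0.1;
    }
    return Costly::constructions;
}
```

Under these assumptions, constructionsWhenHoisted(10) reports a single construction while constructionsWhenScoped(10) reports ten. That per-iteration cost is real for class types and simply does not exist for a plain double.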
Use Static Variables Infrequently
Some coders are convinced that creating memory space for a variable takes up time and space, so they'll often define frequently used variables as static within the function (I used to do this myself), i.e.
    double doSomething() {
        static double result;
        static double tmp;
        result=0;
        for(int i=1; i<=10; i++) {
            tmp=result*1.175 / (i* 0.1);
            result*=tmp;
        }
        return result;
    }
Using static in this way is inefficient. Firstly, every time the function runs, the compiled code may have to load the values left in static memory by a previous execution (in this case at least 2 CPU instructions). Even if optimisation succeeds in keeping the values in registers, the function must still write the final values of tmp and result back to their static memory locations before returning (2 more instructions). Furthermore, static locals are shared by every caller, so in a multi-threaded environment several threads could all be accessing the same memory location, and the function is no longer safe to call concurrently. A compiler that cannot keep tmp and result in registers will read, change and then immediately write back the value each time either is used. In short, the static keywords can roughly double the length of the generated code. To see for yourself, compile the above example with and without the static keyword, i.e.
gcc main.cpp -S -O3
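For that comparison, a file along the following lines could be used. This is only a sketch: the two function names and the start parameter are assumptions added here (the article's version takes no arguments), with the parameter included so that the optimiser is less likely to reduce the whole loop to a compile-time constant.

```cpp
// Version with static working variables, as in the article's example.
double doSomethingStatic(double start) {
    static double result;
    static double tmp;
    result = start;
    for (int i = 1; i <= 10; i++) {
        tmp = result * 1.175 / (i * 0.1);
        result *= tmp;
    }
    return result;
}

// Same arithmetic with ordinary local variables.
double doSomethingLocal(double start) {
    double result = start;
    for (int i = 1; i <= 10; i++) {
        double tmp = result * 1.175 / (i * 0.1);
        result *= tmp;
    }
    return result;
}
```

Compiling with -S needs no main(), so this file can be passed to the command above as it stands; the two function bodies can then be compared side by side in the generated main.s.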
Long Functions
As a function grows in size, the ability of the compiler to fully comprehend the logical flow of your code decreases rapidly. For example:
    double tmp;
    double result=0;
    for(int i=1; i<=10; i++) {
        tmp=result*1.175 / (i* 0.1);
        result*=tmp;
    }
    // ..... Lots more code
    for(int i=1; i<=10; i++) {
        tmp=result*1.105 / (i* 0.2);
        result*=tmp;
    }
will require at least one more machine instruction than:
    double result=0;
    for(int i=1; i<=10; i++) {
        double tmp=result*1.175 / (i* 0.1);
        result*=tmp;
    }
    // ..... Lots more code
    for(int i=1; i<=10; i++) {
        double tmp=result*1.105 / (i* 0.2);
        result*=tmp;
    }
In the first instance, the compiler may store the value of tmp to memory after its first use, in case the final value is used again later. Depending on the complexity of the additional code between these two loops, the compiler's optimiser may pick up this wastage, but probably won't. For this reason I believe in helping the compiler by carefully scoping your variables.
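The same idea extends to anything that is only needed for part of a long function. As a minimal sketch (the function name twoPasses, the start parameter and the rate constants are hypothetical, introduced only for this illustration), a bare pair of braces limits the lifetime of a value to the loop that uses it:

```cpp
// Hypothetical long function: each pass keeps its working values in its own block.
double twoPasses(double start) {
    double result = start;

    {   // First pass: rate and tmp exist only inside this block.
        const double rate = 1.175;
        for (int i = 1; i <= 10; i++) {
            double tmp = result * rate / (i * 0.1);
            result *= tmp;
        }
    }

    // ..... Lots more code, none of which can see rate or tmp

    {   // Second pass: a fresh rate and tmp, unrelated to the first pair.
        const double rate = 1.105;
        for (int i = 1; i <= 10; i++) {
            double tmp = result * rate / (i * 0.2);
            result *= tmp;
        }
    }

    return result;
}
```

Because nothing outside each block can refer to rate or tmp, neither the compiler nor a human reader has to wonder whether the values are still needed further down the function.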
Conclusion
While we can still argue over what is easier to read:
- variables all defined at the top,
- or declared when you need them.