I'm afraid to ask this on SO because all I would get is down-votes, so I'm asking it here.

Question:

Does a C/C++ compiler (e.g., the popular "gcc"), when compiling OO code, allocate/create all data members even if some data members are not in use?

/*************************************
 * code for example:                 *
 *************************************/
#include <cstddef>

typedef std::size_t Int;

class Rectangle
{
public:
    Int x;
    Int y;
    Int width;
    Int height;
    Int border_top;
    Int border_right;
    Int border_bottom;
    Int border_left;
};

// suppose I instantiate many objects of this class, then use only "some" data members

int main()
{
    Rectangle rect1, rect2; // , ...

    rect1.x = 0;
    rect1.y = 0;
    // rect1's other members aren't used anywhere in the entire code

    rect2.width = 640;
    rect2.height = 480;
    // rect2's other members aren't used anywhere in the entire code
}
/*************************************
 * end of code                       *
 *************************************/

Does the compiler allocate or reserve memory space for the data members that are not in use?
I think compilers are smart enough to determine which data members of an object are in use and which are not. So is there a reason (WHY or WHY NOT) they (DO or DO NOT) allocate space for those resources?

--------------------------------------

Also, is there a site that discusses what the compiler is doing behind every programming concept/paradigm it encounters?

Feel free to correct me If my assumptions are wrong :)

Thanks!

Sorry, I'm bad at explaining things.

Does the compiler allocate or reserve memory space for the data members that are not in use?

Yes.
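To make that "yes" concrete, here is a minimal sketch that mirrors the Rectangle from the question; the byte counts in the comments assume a typical 64-bit platform where std::size_t is 8 bytes:

#include <cstddef>
#include <iostream>

typedef std::size_t Int;

struct Rectangle
{
    Int x, y, width, height;
    Int border_top, border_right, border_bottom, border_left;
};

int main()
{
    Rectangle rect1 = Rectangle();   // value-initialize all members to 0
    rect1.x = 0;
    rect1.y = 0;    // the other six members are never touched...

    // ...but they still occupy storage: sizeof reports the full object,
    // typically 8 * sizeof(std::size_t) = 64 bytes on a 64-bit platform,
    // no matter which members the program actually uses.
    std::cout << sizeof(Rectangle) << '\n';
    std::cout << sizeof rect1 << '\n';   // same value for every instance
}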

I think compilers are smart enough to determine which data members of an object are in use and which are not. So is there a reason (WHY or WHY NOT) they (DO or DO NOT) allocate space for those resources?

Determining which data members are used is the easy part. Generating and managing each individual object that may have different used members makes my brain hurt just thinking about it.

Not to mention the insane rules that would need to be in place in the language definition to give us any semblance of consistency between compilers. I suspect those rules would be so draconian that any benefit you think you could get from this feature would quickly be reduced to nothing for the sake of portability and backward compatibility.

But why wouldn't the compiler just allocate space for only those data members in use by that object? Does the OS already do the job here of allocating only the resources that are active? If not, I think that if the compiler implemented this kind of method, there might be a plus in memory efficiency for every piece of OO software?

If you define a box filled with, say, a pot for marmalade, a can for milk, etc., well, that is what you have. If you later "instantiate" 100 of those boxes, I hope your garage or whatever is big enough. The pots and the cans will at first be empty; you can fill them up with as much marmalade and milk as they can contain, but you will always have 100 boxes.

But why wouldn't the compiler just allocate space for only those data members in use by that object?

Apparently you completely failed to read the second part of my response that explains exactly why the compiler wouldn't do it.

@deceptikon, can you explain more? I didn't comprehend.

@ddanbe, yes, I get your point, but I think there's a possibility that some boxes don't contain the same things as the other boxes? I think of it like... (this object has 'x' and 'y', but this other object here doesn't have 'y'). Object orientation in a computer is just a concept, so I think we can break the rules here??

@deceptikon, can you explain more? I didn't comprehend.

I can't really make it any simpler than this:

  1. The C++ standard doesn't currently allow it, and adding that feature in any useful way would be virtually impossible.
  2. Compiler support would be extremely difficult and error prone.

Just because something seems easy to you doesn't mean it's actually easy, especially when it comes to language and compiler design.

So any conclusion for this? Did you just say impossible? This question will haunt me for my entire life if I don't get a satisfactory answer =P

Impossible is a big word. It would definitely be painfully complicated and would change some very fundamental ideas, though. Here's just one reason why. The compiler works on one translation unit (which you can roughly think of as one cpp file) at a time, and then forgets about it when it moves on to the next one. The compiler has no knowledge of what's going on in other translation units.

If the compiler decides to not bother creating part of some object, it has to know absolutely for sure that no other translation unit will ever try to use it. That's effectively impossible, as the compiler might not even be compiling all the other translation units. Some of them could already be compiled and just waiting to be linked. To make it work, you'd have to throw away the idea of libraries. I suppose you could redefine libraries so that they explain what parts of objects they do and do not use, but the point of libraries is that they're already done and you don't need to know anything about them apart from how to link to the functions; what the function does is meant to be something you can ignore.

Also, what if some objects use some parts, and others use different parts? Are you still going to have every object of the same type be the same size? Or if you let them be different sizes, now you have to have a way to tell every function that uses an object how big it is; there will be no uniform size for any object. This would break so much and would force a lot of code to become dynamic (rather than worked out at compile time). Slower, more prone to bugs, harder to code for, just a total nightmare.
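To illustrate the separate-compilation point, here is a sketch of a hypothetical second translation unit (the class definition is repeated inline to keep it self-contained; in practice it would come from a shared header). It can be compiled with no knowledge of which members any other .cpp file uses, precisely because sizeof(Rectangle) and the array stride are fixed:

// A hypothetical second translation unit, compiled completely separately
// from the code that assigns rect1.x and rect2.width.
#include <cstddef>

typedef std::size_t Int;

struct Rectangle
{
    Int x, y, width, height;
    Int border_top, border_right, border_bottom, border_left;
};

// Both functions bake sizeof(Rectangle) into the generated code at compile
// time, without knowing which members any other translation unit touches.
std::size_t bytes_for(std::size_t count)
{
    return count * sizeof(Rectangle);   // fixed size, known right now
}

Int get_width(const Rectangle* rects, std::size_t i)
{
    return rects[i].width;              // address = rects + i * sizeof(Rectangle)
}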

So what you're asking isn't impossible, but it would break everything and would effectively be a whole new language that was a horrible mess to code for.

So any conclusion for this? Did you just say impossible?

I said virtually impossible. Meaning it's not impossible, but realistically it won't happen. That's your conclusion, and if it's not satisfactory for you, C++ isn't the only language out there that supports OOP.

So any conclusion for this? Did you just say impossible?

Nobody's talking about it being "impossible"; everything is possible. What deceptikon said is that it currently does not work that way (not in C++, and not in any other language that I know of), and that making this work would be really hard (not impossible).

But one thing that needs to be clarified is that what you propose is, in general, a really bad idea, and will lead to far worse performance. So, this kind of feature would mean lots of pain for no gain (in fact, tremendous losses in performance and efficiency).

Here's why. Currently, if you have a simple C++ class Vector2D with two data members (x,y), each being of type double, then the memory layout is something like this:

****************
|  x   ||  y   |

where each * represents a byte (i.e., a double is 8 bytes). So, the total size of the class is 16 bytes. If you want to access the data member "x", you just look at the double value at the address of the object. If you want to access "y", you just look 8 bytes further in memory. This also means that if you have 100 objects of that class in an array, all of the data contained in all those objects is within a single chunk of 1600 bytes, which is easily loaded into cache memory and easily and efficiently traversed and operated on.
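As a small sketch of that layout (assuming the usual 8-byte double and no padding, which holds for this class on common platforms):

#include <cassert>
#include <cstddef>

class Vector2D
{
public:
    double x;   // bytes 0..7
    double y;   // bytes 8..15
};

int main()
{
    assert(sizeof(Vector2D) == 2 * sizeof(double));   // 16 bytes on typical platforms

    Vector2D points[100];              // one flat, contiguous block of ~1600 bytes
    double sum = 0.0;
    for (std::size_t i = 0; i < 100; ++i)
    {
        points[i].x = static_cast<double>(i);
        points[i].y = 0.0;
        sum += points[i].x;            // sequential, cache-friendly traversal
    }
    assert(sum == 4950.0);             // 0 + 1 + ... + 99, computed exactly
}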

If you were to allow the different data members to be "optional", there are a number of issues with doing this, and a number of ways to go about it. One of the simpler solutions would be to allocate each data member independently when it is first used (assigned a value), which leads to this memory layout (for a 32-bit computer):

********
|px||py|

Allocated separately, only if used:

********
|  x   |

********
|  y   |

where "px" either points to "x" if it exists, and is NULL otherwise, and idem for "py". First of all, in this example, it's clear that you are not getting as much memory savings as you'd expect, in fact, here you get mostly an added memory overhead, but, of course, in other cases it might start to represent a memory saving. Then again, it is not obvious that you would ever really save memory from this, because if everything has to be "optional", then you have overhead on everything, and you will nearly double your memory consumption overall. Secondly, you lose all the benefits of locality, because now if you have an array of 100 objects, you have 200 values (x and y for each objec), each individually scattered around in RAM memory, making any traversal or operations on those values extremely inefficient because you waste most of your time looking up those random memory locations (this is called "cache thrashing").

The other solution to the problem of having optional data members while conserving the compact memory layout is to provide a kind of under-the-hood set of classes for each case. So, for example, if both x and y are optional, the compiler would need to be prepared to expect four types of memory layouts:

Option 1:
*
(padding)

Option 2:
********
|  x   |

Option 3:
********
|  y   |

Option 4:
****************
|  x   ||  y   |

Of course, the compiler would need to generate code to convert between those layouts, e.g., if you decide to start using the "y" data member, then the object must be converted from Option 2 to Option 4. But more importantly, the compiler still needs to make each object look like just a Vector2D, i.e., it must hide the details of which memory layout is actually in use. Doing this comes at a cost; it isn't free. For example, if you had an array of 100 objects, how would you know how big the array needs to be? There is no easy way to solve this besides making the whole class into an indirected object, i.e., the object is just a thin wrapper around a pointer to the real data, plus an identifier to tell the compiler which layout is used. This would effectively mean that your array of 100 objects is actually an array of 100 pointers, and then, again, you get the memory overhead; but more importantly, you still get a lot of cache thrashing, which will destroy the performance.
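Here is a rough, hypothetical sketch of what that "multiple layouts behind a uniform facade" idea forces on you: every object becomes a tag plus a pointer, and merely starting to use a member triggers a reallocation and a copy (set_y, copy semantics, and error handling are omitted for brevity; the class and its names are invented for illustration only):

#include <cstdlib>

class MorphingVector2D
{
    enum Layout { kEmpty, kXOnly, kYOnly, kXY };   // which memory layout is live
    Layout  layout = kEmpty;
    double* data   = nullptr;                      // points at 0, 1, or 2 doubles

public:
    void set_x(double value)
    {
        switch (layout)
        {
        case kEmpty:                               // Option 1 -> Option 2
            data = static_cast<double*>(std::malloc(sizeof(double)));
            layout = kXOnly;
            break;
        case kYOnly:                               // Option 3 -> Option 4: move y over
        {
            double* bigger = static_cast<double*>(std::malloc(2 * sizeof(double)));
            bigger[1] = data[0];
            std::free(data);
            data = bigger;
            layout = kXY;
            break;
        }
        default:                                   // x already has a slot
            break;
        }
        data[0] = value;
    }

    ~MorphingVector2D() { std::free(data); }
};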

And then, I haven't even tackled the problem of dealing with inheritance and the ABI, and lots of other issues which mostly increase the "pain" associated with making this feature work. But the important part is that there really isn't any "gain" to be had from this; in fact, mostly losses. The only gains are in terms of run-time flexibility, but even then, it's not very convincing, IMO.

By the way, if you want this behaviour of optional data members and the like, you can have it. C++ does not provide it natively, but it allows you to make your own. In particular, you can look at boost::optional (or std::optional, standardized in C++17), or boost::any.
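For instance, a small sketch with std::optional (C++17), using a hypothetical Rect struct; note that it expresses "this member may be absent" but does not shrink the object, since each optional stores its value inline next to an "engaged" flag:

#include <iostream>
#include <optional>   // C++17

struct Rect
{
    std::optional<double> width;    // "maybe absent", but storage is inline
    std::optional<double> height;   // each holds a double plus a flag
};

int main()
{
    Rect r;
    if (!r.width)
        std::cout << "width not set yet\n";

    r.width = 640.0;
    std::cout << *r.width << '\n';

    // sizeof(Rect) is *larger* than two plain doubles, because each
    // optional carries its own flag (plus padding).
    std::cout << sizeof(Rect) << " vs " << 2 * sizeof(double) << '\n';
}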

Object orientation in a computer is just a concept, so I think we can break the rules here??

Yes and no. OO design is an abstract concept, and in that sense it can be divorced from implementation details (choices made by the compiler / language about how to lay out the memory, resolve virtual calls, and so on). But programming is not an abstract activity, and it cannot be divorced from the implementation details. As per the example above, where I explained the issues related to indirect vs. compact layouts and cache thrashing, you cannot write a good implementation if you can't control those aspects of the program.

But why wouldn't the compiler just allocate space for only those data members in use by that object?

Because the cost of keeping track of which data members are in use and which are not, as well as the cost of allowing and accounting for all the different possible combinations, is overwhelming and does not scale well at all (with N "optional" data members there are 2^N possible combinations, so the number grows exponentially). The only way to beat these curses is to use a massive amount of indirection, leading to massive cache-thrashing issues, overuse of dynamically allocated memory, and generally terrible performance.

Does the OS already do the job here of allocating only the resources that are active?

Yes and no. The program asks for resources, the OS delivers. And the OS does not deal with individual data members or objects or anything of the sort. The program acquires memory from the OS in large chunks and then distributes it internally as execution allocates memory (creates new objects, arrays, etc.). The part of the program that does this management is called the "heap", i.e., it is part of the program, not the OS. When the heap runs out of memory to give out, it asks the OS for another large chunk. As far as the OS is concerned, "active" resources are those that have been allocated to one program or another; it does not check whether the resource is actively being used or not.
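A toy sketch of that "big chunk, small pieces" idea (using one malloc as a stand-in for the large chunk a real heap would request from the OS; alignment and growth are ignored, and the class name is invented for illustration):

#include <cstddef>
#include <cstdlib>

// A toy "heap": grab one big chunk up front, then hand out small slices of it.
// Real allocators are far more sophisticated; this only illustrates that the
// program, not the OS, decides how the chunk is carved up.
class BumpHeap
{
    static const std::size_t kChunkSize = 1u << 20;   // 1 MiB "from the OS"
    char*       chunk;
    std::size_t used;

public:
    BumpHeap() : chunk(static_cast<char*>(std::malloc(kChunkSize))), used(0) {}
    ~BumpHeap() { std::free(chunk); }

    void* allocate(std::size_t bytes)
    {
        if (chunk == nullptr || used + bytes > kChunkSize)
            return nullptr;           // a real heap would request another chunk here
        void* p = chunk + used;       // next free slice (alignment ignored for brevity)
        used += bytes;
        return p;
    }
};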

there might be a plus in memory efficiency for every piece of OO software?

No, there will not be. Some managed languages like Java and C# do provide some level of run-time flexibility in terms of not allocating memory for unused members and things like that. They essentially implement the first option that I explained above (i.e., every object is actually a pointer to the actual memory). Overall, in these types of languages, the memory consumption is an order of magnitude larger than in the more lean-and-mean languages like C / C++ / D, and the performance is, similarly, an order of magnitude worse. The reason they do this is not performance or memory consumption; it is run-time flexibility. In short, if everything is just a pointer to something else, then it is easier to make anything point to anything at run-time. In other words, they accept the big performance hit in exchange for the increased flexibility.

Also, is there a site that discusses what the compiler is doing behind every programming concept/paradigm it encounters?

Compilers do not care about concepts or paradigms. As I said before, programming is not an abstract activity. For the compiler, there are precise syntax rules, precise specifications for the observable behavior of the set of instructions it sees, and a target architecture with its own specifications. The compiler is not in the business of understanding design concepts or the intent of the programmer; it just maps one precise set of specifications (the C++ code) to another (the machine code). As far as knowing what the compiler must do for any given piece of C++ code, well, the specification for that is in the ISO C++ Standard document.

Thanks for the detailed explanation and effort, mike_2000_1, much appreciated. The reason I asked this question is pure curiosity, and I admit I'm lacking follow-up questions to ask. However, there are many (very many) facts I didn't know about how the compiler generates the code. Sorry for neglecting the other necessary things to consider (byte padding, sizeof(class), etc.). I didn't know what other catastrophic things might happen if someone did this. Thanks again.
