Take the following C structure:

typedef struct dataPoint {
    tick * tickData;
    struct dataPoint * previousDataPoint;
    struct dataPoint * nextDataPoint;
} dataPoint;

Compiling on a 64-bit system with 8-byte words, this structure takes up 24 bytes (3 words) of memory.
24 % 8 = 0, so it is already word aligned, yet each structure I create uses 4 words. Why is this?

I am aware that padding and alignment are done for performance reasons, but does anyone know the specific reason in this case, given that the structure is already word aligned? Perhaps CPU cache line boundaries?

It might depend on the compiler. What compiler and operating system are you using? On 32-bit VS 2012 it produces 12. The difference might be your definition of tick.

#include <stdio.h>

typedef int tick;

typedef struct dataPoint {
    tick * tickData;
    struct dataPoint * previousDataPoint;
    struct dataPoint * nextDataPoint;
} dataPoint;


int main(void)
{
    /* sizeof yields a size_t; cast it so that %u is a matching format */
    printf("sizeof(struct) = %u\n", (unsigned)sizeof(struct dataPoint));
    return 0;
}

I'm on the latest Mac hardware/software with LLVM 4.2

tick is another typedef'd structure. After malloc'ing both using sizeof(), the memory footprint is:

dataPoint =
303E100001000000 tickData
0000000000000000 previousDataPoint (uninitialised)
0000000000000000 nextDataPoint (uninitialised)
0000000000000000 <-- unused padding

tick = 
8711259C34010000 unsigned long long
8C101E6D1CB1F43F double
191C25AFCEB1F43F double
0000000000000000 <-- unused padding (again)

Each block is 1 word, or 8 bytes.

I'll fire up the Raspberry Pi and see what happens on that...

malloc() returns a block of memory AT LEAST as large as you requested, which means it could be larger.

I've gone down the route of finding out why and read that malloc will return addresses aligned to the largest possible data type (something to do with safety of returned addresses), which on this system is 16 bytes (long double). Not sure why though, yet...

malloc() is required to give you a block with suitable alignment for any type. That's why it can return a pointer to void which you can then assign to any object pointer type. On your system aligning to the largest scalar type just happens to be the best (hopefully) way to meet that requirement.

You might need to expand a little on that, I can see what you mean by it, but there is still something I'm missing. If I requested 8 bytes of memory, I wouldn't go and try to stick 16 bytes into the block I was just given, so what purpose does it have?

Each compiler's runtime library has its own implementation of malloc(), but the ones I've debugged (by stepping through the assembly code to find out how they work) start with a large block of contiguous memory. malloc() tries to find a free block that is as close as possible to the amount of memory you requested, plus some memory for its own use, such as a pointer to the next free block and the size of this block. So if you request 8 bytes and the smallest available free block is 16 bytes, then malloc() might return a pointer to that 16-byte block. When you call free(), the library puts a pointer to that block back into the free-memory pool so that malloc() can give it out again. Some implementations may try to optimize the free-memory pool by combining adjacent free blocks to defragment the memory, but that is not always possible.

Once again, malloc() is required to align for any type. Just because you requested 8 bytes doesn't negate that requirement. That's not to say malloc() couldn't interpret the requested size and shrinkwrap the alignment accordingly, but the standards committee clearly chose not to go that route.

However, note that C11 (the latest standard) added aligned_alloc(), which supports specifying the alignment you want for a specific object rather than the maximally portable alignment.

From some further reading, the Mac implementation of malloc() performs allocations in multiples of 16 bytes, both to reduce overhead by tracking fewer memory blocks and to align memory to 16-byte boundaries for SSE instructions (used for floating point operations). Coupled with the alignment for unknown types mentioned earlier, I think I now have the correct understanding.
