Given a certain size n, I would like to implement an array (of size n) with an init function, an add-object function (at a location given by the user), and a remove-object function, all of which work in O(1) time.

In other words, I'm trying to create an array which knows where "junk values" are held without resetting all the values in the init function.

Any finite number of other data structures is allowed, but the same rules apply (i.e. no resetting in non-constant time).


So far, I have found that it can be done with rather simple tools. But the way I use them, the algorithm fails in case a junk value is "magically set" to exactly the index or pointer being checked.
Does anybody have any ideas? Even a clue would be appreciated.

Look into stl's list

Algorithms or containers?

Furthermore, isn't it impossible to do this by adding a single structure? Structures like arrays (including hash tables) require initialization before they can be used, while structures like linked lists, stacks, or queues don't require non-constant initialization but have O(n) access time in the worst case. Other structures like trees have an overall O(log(n)) and are thus irrelevant.

@ poster above me: init doesn't mean alloc. If you're assuming that space has already been set aside for your array (i.e. the array already exists), then simply by updating the appropriate variables (e.g. size, last index, whatever) you can init in constant time. Similarly, for adding, if you keep track of the index of the last element that was added, you can add in constant time by inserting at the last index plus one. To remove, you're setting a constant number of bits (to some predetermined 'wipe-away' value), so that is also constant. Anyway, I don't understand what the OP is saying about junk values being "magically set" to the pointer... please explain.
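
Something like this rough sketch is what I mean (the names, capacity, and wipe value are my own, just for illustration):

#include <cstddef>

// Rough sketch of the "track the bookkeeping, not the contents" idea.
// The element storage itself is never bulk-initialized.
struct SimpleArray {
    static const std::size_t CAPACITY = 100; // space assumed already set aside
    static const int WIPE_VALUE = -1;        // predetermined 'wipe-away' value

    int data[CAPACITY];  // contents start out as junk
    std::size_t last;    // index one past the last element added

    void init() { last = 0; }                        // O(1): only the bookkeeping is reset
    void add(int value) { data[last++] = value; }    // O(1): write at last index plus one
    void removeLast() { data[--last] = WIPE_VALUE; } // O(1): constant number of writes
};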

[Skip to the end for an example which will hopefully make this more clear]

The thing is, this isn't a "stack" kind of array. I must be able to add and remove values at specific locations, so keeping track of the size is pointless here: it doesn't tell you where the junk values are.

Explanation:
Let's say, for example, that the algorithm creates/allocates an extra array in the beginning.
This extra array consists of int pointers into the actual array holding the values. Every time a value is added to the array (the locations are significant and provided by the user), I set the corresponding entry of the extra array to point at the correct location in the actual array.
When removing a value, or checking whether one exists, I check whether the corresponding entry in the extra array equals the address of the corresponding location in the actual array.

Now, this is probably very confusing, so I'll add an example.

The actual array gets allocated in the memory to addresses 100-200, and the extra array is 600-700.
User wants to put "14" in the 2nd position.
Algorithm sets address 104-107 to hold the value "14", and address 604-607 to hold the value "104".
User wants to see if 2nd position holds a value, or delete it.
The algorithm checks whether the extra array's value in the 2nd position equals the actual array's address of the 2nd position. Both equal "104", therefore a value exists (and it can be deleted by setting the extra array's entry in this position to NULL).

Problem with this idea: what if the extra array's original (junk) value in the 2nd position was already "104"?
The user checks whether a value exists and gets a false positive, even though nothing was ever stored there. The algorithm fails.
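
Here's a rough sketch of the scheme, to make it concrete (the names and sizes are mine, and I'm using malloc so that the contents really do start out as junk):

#include <cstddef>
#include <cstdlib>

// Sketch of the two-array scheme described above.
// extra[i] == &actual[i] is supposed to mean "position i holds a real value".
struct TaggedArray {
    int  *actual;    // the array holding the values
    int **extra;     // extra[i] points at actual[i] iff position i is in use
    std::size_t n;
};

void initArray(TaggedArray &a, std::size_t n) {
    a.n = n;
    a.actual = static_cast<int *>(std::malloc(n * sizeof(int)));    // junk contents
    a.extra  = static_cast<int **>(std::malloc(n * sizeof(int *))); // junk contents too
    // no O(n) loop here -- that's the whole point, and the whole problem
}

void add(TaggedArray &a, std::size_t i, int value) {
    a.actual[i] = value;
    a.extra[i]  = &a.actual[i];   // e.g. the 2nd position stores the address of actual[1]
}

bool hasValue(const TaggedArray &a, std::size_t i) {
    // false positive if the junk in extra[i] happened to equal &actual[i]
    return a.extra[i] == &a.actual[i];
}

void removeAt(TaggedArray &a, std::size_t i) {
    a.extra[i] = NULL;   // "delete" by breaking the correspondence
}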

What to do? Find an algorithm which doesn't fail. That's the problem.

In some sense, the problem is impossible. Simply indexing an array takes O(log(n)) time. The reason is that the number of bits needed to describe the index is log_2(n). Of course, processors operate with bitstrings in chunks of 32 or 64 at a time, so we like to pretend that array indices are magic values that can be used in O(1) time. It's sensible, from a machine performance standpoint, to treat them as such.

The thing is that when we uncover problems such as the one we now face, the obvious solutions invariably produce real logarithmic-time performance for lookup and setting operations. We could make a side-array of bits that describes whether a given element in the main array is initialized. Of course we'll need a side-side-array of bits to describe which chunks of bits in that side-array are initialized. And so on. This tower of side-arrays invariably has height O(log(n)), and the base of the logarithm depends on how many bits we chunk together.

We might decide that our main array is an array of n 32-bit integers. Then we have a side-array with n bits, each bit telling whether a given element of the main array is initialized. Then we have a side-side-array with n/32 bits, each bit telling whether a given block of 32 bits in the side-array has been initialized. (We initialize the whole block in the side-array to zero the first time we try to touch it.) Then we have a side-side-side-array with n/1024 bits, a side-side-side-side-array with n/32768 bits, a side-side-side-side-side-array with n/1048576 bits, a side^6-array with n/2^25 bits, and a side^7-array with n/2^30 bits. If we're on a thirty-two-bit system, we just initialize that last one outright, because we know the array size can't be larger than 2^32-1, which means the side^7-array has at most four bits, so we might call that a constant-time operation.
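
To make the first step of that tower concrete, here's a rough sketch of just the bottom two levels (the names are mine, and I'm waving my hands about how the side-side-array itself gets initialized -- in the full scheme only the topmost, constant-size level is zeroed outright):

#include <cstddef>
#include <cstdint>

// Main array plus the first two levels of the tower of side-arrays.
struct TowerArray {
    std::uint32_t *main_arr;   // n 32-bit integers, never bulk-initialized
    std::uint32_t *side;       // n bits: is main_arr[i] initialized?
    std::uint32_t *side_side;  // n/32 bits: is 32-bit block b of `side` initialized?
    // ...further levels continue the pattern up to a constant-size top.
};

// Look up the "is main_arr[i] initialized?" bit, zeroing the side-array
// block the first time we try to touch it.
static bool sideBit(TowerArray &t, std::size_t i) {
    std::size_t block = i / 32;   // which 32-bit word of `side` holds bit i
    if (!((t.side_side[block / 32] >> (block % 32)) & 1u)) {
        t.side[block] = 0;                              // first touch: zero this block
        t.side_side[block / 32] |= 1u << (block % 32);  // and record that we did
    }
    return (t.side[block] >> (i % 32)) & 1u;
}

void setElement(TowerArray &t, std::size_t i, std::uint32_t value) {
    sideBit(t, i);                      // force the side-array block to be valid
    t.side[i / 32] |= 1u << (i % 32);   // mark element i as initialized
    t.main_arr[i] = value;
}

bool isInitialized(TowerArray &t, std::size_t i) {
    return sideBit(t, i);
}

Each level costs a constant number of word reads and writes per access, which is where the handful of memory touches per operation comes from.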

Now each array access is an O(log(n)) operation. But since n can't actually be larger than 2^32, what does that mean? For each array access, we touch 8 memory locations. That's a small number.

And what if, instead of using chunk sizes of 32 bits, we used chunk sizes of 64 bits? Then each array access touches 6 memory locations, but each memory location is twice as big. But hey, how big are cache lines these days?

The result is that while we have a logarithmic algorithm, we have one whose logarithmic base is large: 2^6 or something like that. That's not as big as 2^32, which boils array indexing down to one array access, but it's pretty good -- we're using a lot of bits at the same time.

Compare that with balanced binary trees, whose depth, when optimally balanced, reaches about 30 nodes before memory fills up on a 32-bit system, and which can reach 60 or more nodes deep when less optimally balanced. That's still an "absolute" bound on the depth, but it's a lot more memory being accessed in different locations.

So, to recap: an important thing to ask when weighing O(log(n)) constants is how many bits are getting used in parallel. Direct array indexing uses 32 or 64 bits of the index in parallel, which is, simplistically speaking, as good as we can get. In the scheme above, we consume 5 or 6 bits of the index at each level, which is decent.
