Why pointers and arrays in C are not the same October 11, 2009

Posted by haskelladdict in C programming, Linux.

In a post in February I showed the following piece of code
file1.c:

int foobar[100];

file2.c:

extern int *foobar;

stating that it won’t work as expected. Back then I referred people to Peter van der Linden’s book “Expert C Programming” for a detailed exposition. However, since a few people have asked for an explanation, I’ll attempt to give one below. Please refer to van der Linden’s book for a much more detailed and better written treatment of the whole subject.

First off, it is very important to realize that arrays and pointers in C are not the same. Unfortunately, there are many places in the C universe where the two can be used interchangeably, which has led many people to assume that this equivalence is a general truth. It is not! So, once more, arrays and pointers in C are not the same! In other words, the two declarations

extern int *x;
extern int y[];

mean something different to the compiler.

That said, let’s first think about how elements of

int foo[] = {0, 1, 2, 3, 4};

are accessed. In order to get at foo[3], say, we first retrieve the address of foo from the symbol table. Importantly, stored at this address is an int, namely the first element of foo, foo[0]. Therefore, to get at foo[3] we simply start at foo and step forward three ints from that address. Hence, retrieving foo[3] involves two steps: get the address of foo itself, then march forward three ints in memory and read the int found there.

Now that that’s out of the way let’s consider

int *bar = (int[]){0, 1, 2, 3, 4};

and again think about what is happening under the hood when retrieving bar[3]. Here, we have told the compiler that bar is a pointer to an int. So when the compiler retrieves the address of bar from the symbol table, the memory at that address does not hold an int as it did in the case of foo above. Rather, bar holds a pointer to int, which in turn tells us where in memory the int actually is. It is quite literally an “int*”. This is the important distinction! Hence, accessing bar[3] involves one additional indirection step compared to a direct array access like foo[3]. First, the compiler retrieves the pointer to int stored at bar. Then we follow this pointer to get at the actual int, bar[0]. Finally, we march ahead three ints’ worth of memory to get at bar[3].

With this in mind let’s again look at the example given at the beginning:
file1.c:

int foobar[100];

file2.c:

extern int *foobar;

In file1.c we declare the symbol foobar to be an array of ints. Since foobar is an array, its address stored in the symbol table will directly point to the first int in the array, foobar[0] in this case.

Unfortunately, in file2.c we tell the compiler that foobar is a pointer to an int. So what will happen if we try to access foobar[2] somewhere inside file2.c? According to the above, the following:

  1. Get the address of foobar from the symbol table
  2. Retrieve the pointer to int stored at foobar

Oops! That’s bad: foobar does not store a pointer to int, it stores ints directly. Hence, the additional indirection step caused by declaring foobar as “int*” causes the compiler to falsely interpret the first few ints stored at foobar as a pointer to int. Obviously, nothing good is going to happen when following this bogus pointer, and we may end up with a bug that is potentially very hard to track down.

That said, the correct way to declare foobar in file2.c is of course
file2.c:

extern int foobar[];

Here, we’re telling the compiler that when retrieving the address of foobar from the symbol table, it must directly interpret it as a memory location where ints are stored, and all is well.

Hence, despite the fact that occasions abound where C will treat arrays as pointers (e.g., array names in most expressions are converted to pointers to their first element), fundamentally they are different, and it is important to keep this in mind.