Ambiguous raw pointers

19th June 2014

I recently posted an answer on Stack Overflow about the misuses of dynamic allocation and raw pointers (that is, types like int*). You'll probably see this kind of advice regularly in the context of modern C++. There are almost always safer alternatives that make your code much cleaner and less prone to bugs. While dynamic allocation and raw pointers are often found together, they are orthogonal concepts that each have their own problems. In this article, we'll look only at raw pointers.

It's important to choose types that cannot be used in ways for which they weren't intended. This ensures that problems are detected earlier and that your interfaces are clean and expressive. I often think about what types tell me. Just by looking at the type of a variable (perhaps a function parameter), what can I determine about the values that it should take?

Raw pointers are far too ambiguous. It is possible to use them far beyond their intended purpose, often with terrible consequences. If you see a function with a pointer parameter, it is difficult to know what you should be passing to the function without reading its documentation. Even then, you had better hope that the documentation is correct and that you are passing the right things. The more specific we can be with our types the better, as it ensures that our code is safe, trustworthy, and easily understood. As with many things in C++, we can often achieve this with little to no run-time cost.

Let's consider a pointer type, T*, and some questions it leaves unanswered:

Can it be a null pointer?
Can it point at an invalid object?
What is the storage duration of the object?
Who has mutable access to the object?
Who is responsible for deleting the object, if necessary?
Is the object just one element in an array of T?

In fact, all this pointer really says is “I store either a null pointer value or the address of what is hopefully a T object”. It's not nice to have all these mysteries whenever we see a raw pointer. Ideally, our types should be explicit about as many of these things as possible.
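As a small illustration, the sketch below uses a hypothetical function readValue that takes an int const* (and assumes C++14 for std::make_unique). Pointers to automatic, static, array, and dynamically allocated storage are all accepted without complaint, and so is a null pointer, which would only go wrong at run time.

```cpp
#include <memory>

// Hypothetical function: nothing in its signature says whether p may be null,
// whether it points into an array, or who owns the object it points at.
int readValue(int const* p) {
    return *p;  // undefined behaviour if p is null or dangling
}

// Every call below compiles, yet each passes a very different kind of pointer.
int sumOfDifferentPointers() {
    int local = 1;                          // automatic storage
    static int persistent = 2;              // static storage
    int array[3] = {3, 4, 5};               // element in the middle of an array
    auto owned = std::make_unique<int>(6);  // dynamic storage owned by a unique_ptr

    // readValue(nullptr);  // also compiles, but would dereference a null pointer
    return readValue(&local) + readValue(&persistent) +
           readValue(array + 1) + readValue(owned.get());
}
```

The compiler has no way to tell which of these uses the author of readValue actually had in mind.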

Arrays

Pointers are commonly used to point at the first element in an array. This is necessary when an array type itself cannot be used, such as when the array has dynamic length or as the parameter of a function. The usual array indexing syntax can be applied to the pointer, so it can largely be treated as though it were an array. Using pointers in this way is ambiguous because nothing about the type actually says it's pointing at an element in an array.

struct Conversation {
	// ...
	std::string* messages;
	// ...
};

Here, only the plural name of the data member suggests that it might be pointing at the first element in an array. How would we know without looking at how the pointer is being assigned?

In this case, we are much better off using a standard library container:

struct Conversation {
	// ...
	std::vector<std::string> messages;
	// ...
};

Now we know for certain that messages is a container of strings. We also get a much nicer interface: the vector resizes itself as needed, so we no longer have to manage a dynamically sized array by hand.
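A minimal sketch of the container in use. The addMessage helper is hypothetical; the point is that growing the sequence is a single push_back, with no manual reallocation or separate length to keep in sync.

```cpp
#include <string>
#include <utility>
#include <vector>

struct Conversation {
    // The type says exactly what this is: a resizable sequence of strings.
    std::vector<std::string> messages;
};

// Hypothetical helper: the vector grows as needed and tracks its own size.
void addMessage(Conversation& conversation, std::string text) {
    conversation.messages.push_back(std::move(text));
}
```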

C-style strings

It is common to read the type char const* as “a C-style string”, but this is only because it is so often treated as a special case. We tend to assume that it points at the first character in a null-terminated array of characters. This is quite a strong assumption, given that there's no reason this pointer couldn't just be pointing at a single char, or even an array of char that is not null-terminated.

void print(char const*);

We would probably assume that we should pass a C-style string to this function. However, the type of the argument itself is not enough to know for certain. What would happen if we passed a pointer to a single char? What about a null pointer? Would it break? Why even allow this possibility?

void print(std::string const&);

This is much better because it explicitly asks for a std::string and we know what a std::string represents. It is impossible for us to pass something that is not a string.
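A sketch of what such a print might look like. The std::ostream parameter is an assumption, not part of the article's signature; it is added only so the output can be captured, and printToString is a hypothetical helper. A string literal still works as an argument because it converts to a temporary std::string.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// The caller must supply a std::string (or something that converts to one),
// so print never sees a lone char or an unterminated buffer.
// Assumption: the stream parameter is added for illustration only.
void print(std::string const& message, std::ostream& out = std::cout) {
    out << message << '\n';
}

// Hypothetical helper: captures what print writes, for demonstration.
std::string printToString(std::string const& message) {
    std::ostringstream out;
    print(message, out);
    return out.str();
}
```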

Optional arguments

Although the language supports default arguments, they can only be omitted from the end of the argument list. Raw pointers are therefore sometimes used to allow arguments in the middle of the list to be omitted, with a null pointer standing in for an omitted argument.

double complexAlgorithm(int, Whatsamajig const*, std::string const*, double const*);

This function is meant to have three optional arguments, but we cannot know that for certain without reading the documentation. Are we sure it won't break if we pass a null pointer?

double complexAlgorithm(int, boost::optional<Whatsamajig>, boost::optional<std::string>, boost::optional<double>);

By using boost::optional, it is clear that these arguments are optional. To omit an argument, we pass boost::none.
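The same idea can be sketched with std::optional from C++17, which is swapped in here for boost::optional and uses std::nullopt where boost uses boost::none. Whatsamajig and the body of complexAlgorithm are hypothetical; they only show that each argument is checked for presence before use.

```cpp
#include <optional>
#include <string>

// Hypothetical argument type from the example above.
struct Whatsamajig {
    double weight = 1.0;
};

// Invented body: each optional argument contributes only when it is present.
double complexAlgorithm(int base,
                        std::optional<Whatsamajig> thing,
                        std::optional<std::string> label,
                        std::optional<double> scale) {
    double result = base;
    if (thing) {
        result *= thing->weight;
    }
    if (label) {
        result += static_cast<double>(label->size());
    }
    return result * scale.value_or(1.0);  // fall back to 1.0 when omitted
}
```

A call that omits the middle arguments reads as complexAlgorithm(2, std::nullopt, std::nullopt, 0.5), and there is no null pointer for the implementation to trip over.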

Reference semantics

In general, value semantics should be preferred, where the value of an object is all that is important, rather than its identity. Sometimes, however, we need to pass a particular instance of an object around. Traditionally, this was achieved with pointers. In fact, C programmers sometimes refer to pointers as references because they provide reference semantics.

void foo(int*);

It is intended that a pointer to an int is passed to this function so that it can modify that particular int object. Once again, this is not entirely clear from the type. What happens if we pass a null pointer? How do we know it's not supposed to take an array?

void foo(int&);

Here, on the other hand, the intention is clear. C++ has built-in reference types, which neatly express the concept of reference semantics. The passed argument cannot be anything but a valid int object. A reference cannot be null, so we don't even have to worry about that.
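A minimal sketch; the increment is an invented body, chosen only to show that the function modifies the caller's own object.

```cpp
// The caller must supply an existing, modifiable int object.
void foo(int& value) {
    ++value;  // modifies that particular object
}
```

Neither foo(nullptr) nor foo(42) compiles: there is no null reference, and a temporary cannot bind to a non-const lvalue reference.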

Whether you should have output parameters at all is another matter. Consider std::tuple for returning multiple outputs. In fact, mutable reference semantics in general can be difficult to reason about because the referenced objects might be modified by multiple parts of your code. Where you do need reference semantics, prefer const references when possible; everywhere else, use value semantics.
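For instance, a hypothetical divide function can return both its results in a std::tuple instead of writing them through two int& output parameters:

```cpp
#include <tuple>

// Hypothetical example: both outputs are part of the return value,
// so nothing the caller owns is modified behind its back.
std::tuple<int, int> divide(int dividend, int divisor) {
    return std::make_tuple(dividend / divisor, dividend % divisor);
}
```

The caller can unpack the results with std::tie(quotient, remainder) = divide(17, 5); or, in C++17, with a structured binding such as auto [quotient, remainder] = divide(17, 5);.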

Transferring ownership

When an object is dynamically allocated with new, some code must be responsible for deleting that object. This code is said to “own” that object. We often want to transfer ownership of an object between parts of our code.

As an example of transferring ownership with raw pointers, a common pattern in C libraries is to have functions that allocate objects and return pointers to those objects:

Image* createImage();
void destroyImage(Image*);

It is then expected that the resulting pointer is later passed to the deallocation function when the object is no longer required. This can easily lead to memory leaks if the user fails to deallocate the object correctly. It is not clear at all from the return type that we now have the responsibility of deallocating the object or how we should go about doing so.

C++11 introduced a number of smart pointers. These smart pointers represent some particular ownership of a dynamically allocated object. For example, std::unique_ptr represents unique ownership, while std::shared_ptr represents shared ownership. These smart pointers will automatically deallocate the object when all ownership is given up.

The previous C-like library interface would now be written as follows:

std::unique_ptr<Image> createImage();

This now clearly states that we are being given unique ownership of the Image. That is, the library is promising that it does not hold on to the object. An added bonus is that now you do not need to manually deallocate the Image. This will happen automatically whenever you give up ownership.
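A minimal sketch, assuming C++14 for std::make_unique. Image and its liveCount counter are hypothetical; the counter exists only to make the automatic cleanup observable. The second half shows one way to keep an existing C-style pair of functions (here the hypothetical createImageC and destroyImageC) while still handing out a std::unique_ptr, by giving it a custom deleter.

```cpp
#include <memory>

// Hypothetical image type; liveCount only tracks how many are alive.
struct Image {
    static int liveCount;
    Image() { ++liveCount; }
    ~Image() { --liveCount; }
    Image(Image const&) = delete;             // keep the count honest
    Image& operator=(Image const&) = delete;
};
int Image::liveCount = 0;

// Ownership is in the return type: the caller receives sole ownership,
// and the Image is destroyed when the unique_ptr goes out of scope.
std::unique_ptr<Image> createImage() {
    return std::make_unique<Image>();
}

// Hypothetical stand-ins for an existing C-style API.
Image* createImageC() { return new Image(); }
void destroyImageC(Image* image) { delete image; }

// A custom deleter routes cleanup through the C API's own function.
struct ImageDeleter {
    void operator()(Image* image) const { destroyImageC(image); }
};
using ImageHandle = std::unique_ptr<Image, ImageDeleter>;

ImageHandle wrapImage() {
    return ImageHandle(createImageC());
}
```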

Other uses

Pointers allow polymorphic behavior. That is, calling a virtual member function through a pointer dispatches on the dynamic type of the object. This is also supported by references and smart pointers, so the same benefits as above apply here. We should always be able to assume, under the Liskov substitution principle, that any of these types that provide reference semantics will happily accept derived types too. There is therefore no additional ambiguity here.
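The sketch below uses hypothetical Shape and Circle classes to show virtual dispatch through a reference and through a std::unique_ptr (assuming C++14 for std::make_unique), with no raw pointer in the interface.

```cpp
#include <memory>
#include <string>

// Hypothetical class hierarchy for illustration.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Virtual dispatch works through a reference exactly as through a pointer.
std::string describe(Shape const& shape) {
    return shape.name();
}
```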

Sometimes we want to share a pointer to some object without passing any ownership. When raw pointers are used in this way, they're described as observing or non-owning pointers. There is currently no standard way to express this intent, so raw pointers are commonly used and exhibit the same problems we have seen above. There is, however, a proposal to introduce a non-owning smart pointer. Despite this, I suggest that such observing interfaces should use reference types instead.

If you require some reference semantics that are not provided by any existing type (and raw pointers are too broad, as usual), why not implement it yourself? A simple type that clearly expresses your intent can go a long way towards improving your code.
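As one possible example of such a type, here is a hypothetical Observer<T>: a non-owning handle that can only be created from a reference, so it is never null, never deletes anything, and, unlike a reference data member, can be reseated by assignment. It does not stop the observed object from being destroyed first, so lifetimes still need care. This is a sketch under those assumptions, not the design of any proposed standard type.

```cpp
#include <memory>

// Hypothetical non-owning, never-null handle to an object that lives elsewhere.
template <typename T>
class Observer {
public:
    // Only an existing object can be observed; there is no way to pass null.
    explicit Observer(T& object) : ptr_(std::addressof(object)) {}
    // Refuse temporaries, which would be destroyed while still observed.
    Observer(T&&) = delete;

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_;  // never null; Observer never deletes it
};
```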

One of the few acceptable uses of raw pointers is when you need to make sure a function overload is chosen as expected. When the compiler has a choice between a standard conversion and a conversion to a user-defined type, it will always prefer the standard conversion. This can be problematic:

void foo(bool isSomething);
void foo(std::string str);

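If you attempt to call this function with a string literal, the first overload will be chosen: the literal decays to a char const*, which converts to bool through a standard conversion, and that is preferred over the user-defined conversion to std::string. The best way to get around this is to provide a C-style string overload that forwards to the std::string overload: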

void foo(char const* str);
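A sketch of the three overloads together. The std::string return values are an assumption added so the chosen overload can be observed. A string literal reaches char const* through an array-to-pointer conversion alone, which counts as an exact match, so the new overload beats foo(bool), which additionally needs a boolean conversion. Without it, foo("hello") would call foo(bool) with true.

```cpp
#include <string>

// Return values are added for illustration so the chosen overload is visible.
std::string foo(bool isSomething) {
    return isSomething ? "bool: true" : "bool: false";
}

std::string foo(std::string str) {
    return "string: " + str;
}

// Exact match for string literals; simply forwards to the std::string overload.
std::string foo(char const* str) {
    return foo(std::string(str));
}
```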

Additionally, if you are dealing with low-level memory manipulation, then you probably do want to use raw pointers. In this case, they genuinely represent your intentions accurately.

Conclusion

We have seen that raw pointers are pretty bad at expressing intent because they do not sufficiently restrict the values they take. There is almost always a better alternative type you can use. Doing so will drastically improve the safety of your code by catching misuse at compile time rather than at run time, and will allow you to write cleaner, more expressive interfaces. Of course, there are always exceptions, but good code can often get away with very minimal use of raw pointers.

Note that the use of raw pointers is independent of the use of new for dynamic allocation. We can pass pointers to objects with automatic or static storage, after all. However, new is just as problematic and can be avoided by using smart pointers and RAII appropriately. My answer on Stack Overflow covers this briefly.
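A minimal sketch of replacing new and delete, assuming C++14 for std::make_unique. Widget, makeWidget, and makeSharedWidget are hypothetical. In each case the smart pointer owns the object, so it is destroyed automatically when the last owner goes away, including when an exception unwinds the stack.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for illustration.
struct Widget {
    explicit Widget(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Instead of:   Widget* w = new Widget("gear"); /* ... */ delete w;
// return a unique_ptr, which deletes the Widget when it goes out of scope.
std::unique_ptr<Widget> makeWidget(std::string name) {
    return std::make_unique<Widget>(std::move(name));
}

// For shared ownership, std::make_shared creates the object and its
// reference count (typically in one allocation); the Widget is deleted
// when the last shared_ptr referring to it is destroyed.
std::shared_ptr<Widget> makeSharedWidget(std::string name) {
    return std::make_shared<Widget>(std::move(name));
}
```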

For further information, read up on smart pointers, the slides for “Don't use f*cking pointers”, and “Why are pointers not recommended when coding with C++?”.