Hopefully I shouldn't need to explain what gets is or why it's easily one of the worst possible standard functions you could call in your code. In my personal opinion, the title for "holy shit, never use this function" is a tie between gets and system. I'll discuss the problem with system in another article.

The good news is that gets is no longer a problem, provided you use the latest C11 standard. Technical Corrigendum 3 to the 1999 revision of the C language declared gets to be deprecated. This means that it was still supported and compilers still had to implement it correctly, but as soon as the next revision of the standard the function could be removed entirely. And hey, that's exactly what happened. The 2011 revision of C eliminated gets; it's no longer required to be supported, and any use of it makes your code non-portable. Rejoice!

Wait, why does this qualify as a tribal knowledge article? A quick Google search will tell you all you need to know. And while using Google to do research itself may qualify as tribal knowledge (*heavy sarcasm*), I'd hope it wouldn't be necessary to explain to most of you. ;)

No, this article will focus on what to do now that you no longer have guaranteed access to gets. The answer isn't quite as simple as "use fgets" because fgets has quirks too. Rather, let's look at a couple of hand-rolled solutions that provide close to the same interface and functionality as gets, yet are vastly safer.

Alternatives

The goals for a gets alternative are typically as follows:

  1. Read a whole line, regardless of how long it is.
  2. Don't require a pre-allocated buffer.
  3. Return a pointer to the line.

Optional details that vary include:

  • Error handling mechanism.
  • Shared or unique storage for the line.
  • Reporting the length of the line.

For the purposes of this article, let's go with an error handling mechanism that simplifies the interface as much as possible. We do want it to be a gets alternative, and gets is as simple of an interface as it gets. So for error handling we'll return a null pointer if a failure leaves us with nothing usable, and set errno if an error occurs along the way, even when a partial line is returned. One wrinkle: an empty line and an immediate end-of-file both produce an empty string rather than a null pointer, so use feof to tell them apart. The interface remains simple:

char *getstr(FILE *in);

Let's start with the obvious implementation where we read into a dynamically constructed string character by character:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

char *getstr(FILE *in)
{
    size_t capacity = BUFSIZ;
    char *p = malloc(capacity + 1);

    if (p != NULL) {
        size_t length = 0;
        char *temp;
        int ch;

        while ((ch = getc(in)) != '\n' && ch != EOF) {
            if (length == capacity) {
                /* Grow the string to accommodate long lines */
                capacity += BUFSIZ;
                temp = realloc(p, capacity + 1);

                if (temp != NULL) {
                    p = temp; /* All is well, use the new block */
                }
                else {
                    /* Allocation failure is a fatal error */
                    errno = ERANGE; /* Reuse ERANGE for the hell of it */
                    break;
                }
            }

            p[length++] = (char)ch;
        }

        /*
            Always terminate the string, even after a stream error, so that
            a partial line handed back to the caller is still a valid string.
        */
        p[length] = '\0';

        if (!ferror(in)) {
            /*
                Attempt to shrink wrap the block to fit our complete string.
                Failure in this case is not fatal, just wasteful of memory.
            */
            temp = realloc(p, length + 1);

            if (temp != NULL) {
                p = temp;
            }
        }
        else {
            /* Reset the string if we didn't store anything */
            if (length == 0) {
                free(p);
                p = NULL;
            }

            errno = ERANGE; /* Reuse ERANGE for the hell of it */
        }
    }

    return p;
}

So far so good. The code should be straightforward to anyone reasonably familiar with C. The result of realloc goes into temp instead of p because realloc could fail and return NULL. The last thing you want is to suddenly lose your only reference to a block of anonymous memory, right? ;) For the sake of...laziness, I suppose, I chose to use the standard macro ERANGE to signal errors through errno. Ideally you would define two error macros of your own so that they read clearly in calling code and differentiate between an allocation error and a stream error. You might also want to set errno in the case of a shrink wrap failure.
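
The temp idiom described above can be pulled out into a small helper. This is only a sketch: grow_buffer is a hypothetical name that doesn't appear in either implementation, and it assumes the same "capacity plus one byte for the terminator" convention used by getstr.

```c
#include <stdint.h>
#include <stdlib.h>

/*
    Hypothetical helper: grow *buf so it can hold *capacity + extra
    characters plus a terminator. On failure it returns 0 and leaves
    both *buf and *capacity exactly as they were, so the caller still
    owns a valid block.
*/
int grow_buffer(char **buf, size_t *capacity, size_t extra)
{
    char *temp;

    /* Refuse requests whose size computation would wrap around */
    if (*capacity >= SIZE_MAX - 1 || extra > SIZE_MAX - 1 - *capacity) {
        return 0;
    }

    temp = realloc(*buf, *capacity + extra + 1);

    if (temp == NULL) {
        return 0; /* The old block is untouched and still referenced by *buf */
    }

    *buf = temp;
    *capacity += extra;

    return 1;
}
```

Had the result been assigned straight to *buf, a failed realloc would overwrite the only pointer to the old block with NULL and leak it.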

So that's that. But can we do better? Well, we could take advantage of other input functions like fgets or fread. This offers potential performance benefits in terms of buffering, though in terms of brevity it's not any better. Here's the code using fgets:

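#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *getstr(FILE *in)
{
    size_t capacity = BUFSIZ;
    char *p = malloc(capacity + 1);

    if (p != NULL) {
        size_t length = 0;
        char *temp;

        /* The block holds capacity + 1 bytes, so offer fgets all that remain */
        while (fgets(p + length, (int)(capacity + 1 - length), in) != NULL) {
            /* Measure only the newly read chunk instead of rescanning the string */
            length += strlen(p + length);

            if (length > 0 && p[length - 1] == '\n') {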
                /* The string fits, so trim the newline and we're done! */
                p[--length] = '\0';
                break;
            }
            else {
                /* Long line. Grow the string and try again */
                capacity += BUFSIZ;
                temp = realloc(p, capacity + 1);

                if (temp != NULL) {
                    p = temp; /* All is well, use the new block */
                }
                else {
                    /* Allocation failure is a fatal error */
                    errno = ERANGE; /* Reuse ERANGE for the hell of it */
                    break;
                }
            }
        }

        if (!ferror(in)) {
            if (length == 0) {
                /* Enforce a valid string since fgets didn't store anything */
                p[length] = '\0';
            }

            /*
                Attempt to shrink wrap the block to fit our complete string.
                Failure in this case is not fatal, just wasteful of memory.
            */
            temp = realloc(p, length + 1);

            if (temp != NULL) {
                p = temp;
            }
        }
        else {
            free(p);
            p = NULL;
            errno = ERANGE; /* Reuse ERANGE for the hell of it */
        }
    }

    return p;
}

You'll notice that the string resizing and shrink wrapping are still there by necessity. The savings you get from using fgets instead of getc are mitigated by edge cases such as when fgets fails immediately (on end-of-file, for example) and doesn't properly terminate the string. We also needed to remove the '\n' character since fgets stores it automatically if it's present.

Finally, there's an added problem with this implementation. If fgets returns NULL because of a read error, you cannot determine the status of the string: the standard says the array contents are indeterminate. It may contain valid characters, or not. You don't know, and you don't know how many. You also don't know if the string was terminated. So the only recourse is to assume that the string is always invalid. That's why the above code simply releases memory and sets p to NULL in the case of a stream error.
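
The C standard does separate the two ways fgets can return NULL: at end-of-file with nothing read, the array is left unchanged, while after a read error its contents are indeterminate. A minimal sketch of telling them apart follows; read_chunk and the read_status enum are hypothetical names used only for illustration.

```c
#include <stdio.h>

enum read_status { READ_OK, READ_EOF, READ_ERROR };

/* Hypothetical wrapper: classify the outcome of a single fgets call */
enum read_status read_chunk(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) != NULL) {
        return READ_OK; /* buf holds a terminated string */
    }

    if (ferror(in)) {
        return READ_ERROR; /* buf is indeterminate; don't touch it */
    }

    return READ_EOF; /* Nothing was read; buf is unchanged */
}
```

Only READ_ERROR forces you to throw the buffer away, which is the case getstr handles by freeing p.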

Given the two implementations, I prefer the first because it's more explicit and there are fewer gotchas. Having written these types of functions many times, I can safely say that the first is easier for me to get right on the first try. The second takes more thought and careful debugging.

Usage

Both implementations of getstr are called much like gets. There are four big differences though:

  1. Pass in a stream object. This means you can use something other than stdin.
  2. No pre-allocated buffer. getstr allocates memory for you, so you just need to define a pointer.
  3. RELEASE YOUR MEMORY! The pointer getstr returns is now owned by you, and you need to call free.
  4. errno can optionally be checked for errors.

Here's a simple test/example program using getstr:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

char *getstr(FILE *in);

int main(void)
{
    char *line;

    errno = 0;
    line = getstr(stdin);

    if (errno != 0) {
        puts("Error detected in getstr. The whole line may not have been extracted.");
    }

    /* Passing a null pointer to %s is undefined, so print only a real line */
    if (line != NULL) {
        printf(">%s<\n", line);
        free(line);
    }

    return 0;
}
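
Reading a whole file takes a little more care, because getstr hands back an empty string at end-of-file instead of a null pointer. The sketch below is one way to loop over every line. It assumes a compact stand-in for the getc-based getstr (smaller starting capacity, no shrink wrap, stream errors handled by the caller) so that it compiles on its own, and for_each_line is a hypothetical helper name. errno checks are left out for brevity.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Compact stand-in for the getc-based getstr, not the full version */
char *getstr(FILE *in)
{
    size_t capacity = 16, length = 0;
    char *p = malloc(capacity + 1);
    int ch;

    if (p == NULL) {
        return NULL;
    }

    while ((ch = getc(in)) != '\n' && ch != EOF) {
        if (length == capacity) {
            char *temp = realloc(p, capacity * 2 + 1);

            if (temp == NULL) {
                errno = ERANGE; /* Same convention as the article */
                break;
            }

            p = temp;
            capacity *= 2;
        }

        p[length++] = (char)ch;
    }

    p[length] = '\0';

    return p;
}

/* Hypothetical helper: visit every line in a stream and return the count */
size_t for_each_line(FILE *in, void (*visit)(const char *line))
{
    size_t count = 0;

    for (;;) {
        char *line = getstr(in);
        int at_end;

        if (line == NULL) {
            break; /* Nothing could be allocated */
        }

        at_end = feof(in) || ferror(in);

        /* An empty string at end-of-file means there was no line at all */
        if (!(at_end && line[0] == '\0')) {
            if (visit != NULL) {
                visit(line);
            }

            ++count;
        }

        free(line);

        if (at_end) {
            break;
        }
    }

    return count;
}
```

A final line with no trailing newline is still counted, while the empty string produced by hitting end-of-file right after a newline is not.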

Honorable Mention

C11 adds a function called gets_s. That's "safer gets" for those who haven't been lamenting Microsoft's safe C library and the spurious warning 4996 that finds itself disabled posthaste whenever I create a new C project in Visual Studio. gets_s has the following signature:

char *gets_s(char *str, rsize_t n);

It looks good on the surface. You get the behavior of gets (or do you?) with the safety of fgets. But the cake is a lie. There are two huge issues with gets_s that make it prohibitive:

  • If the line is too long (i.e. gets_s doesn't see a '\n' within the first n-1 characters), that counts as a runtime-constraint violation: gets_s sets str[0] to '\0' and discards characters up to a newline or end-of-file, so the line is lost. Certainly, discarding the excess matches the naive way people may use fgets, but that doesn't make it a robust solution. As such, you can only safely use gets_s without loss of data when you know for an absolute fact that none of the lines will be longer than n-1. Since you don't know this, and in fact can't know this, gets_s is unsafe in robust code. Fail.

  • The "safe" functions, which include gets_s, are a part of Annex K in the C11 standard. They are classified as optional for the implementation (i.e. the compiler's standard library). In other words, there's no guarantee that gets_s is even supported by all C11 compilers, despite the fact that it's defined by the C standard. So to be strictly portable, you still have to work around implementations that don't define the __STDC_LIB_EXT1__ macro. Double fail.
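
Both points can be seen in a short sketch. It assumes you want gets_s where Annex K is available and an fgets-based approximation everywhere else. bounded_line and read_stdin_line are hypothetical names, and the fallback only imitates the discard-on-overflow behavior described above; it does not invoke a constraint handler.

```c
#define __STDC_WANT_LIB_EXT1__ 1 /* Must come before the first standard header */
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fallback: roughly what gets_s does, but for any stream */
char *bounded_line(char *buf, size_t n, FILE *in)
{
    size_t length;
    int ch;

    if (buf == NULL || n == 0 || n > INT_MAX) {
        return NULL;
    }

    if (fgets(buf, (int)n, in) == NULL) {
        buf[0] = '\0';
        return NULL;
    }

    length = strcspn(buf, "\n");

    if (buf[length] == '\n') {
        buf[length] = '\0'; /* The line fit, newline and all */
        return buf;
    }

    /* No newline stored: the line ended right here, or it is too long */
    ch = getc(in);

    if (ch == '\n' || ch == EOF) {
        return buf;
    }

    /* Too long: throw the rest away and report failure, losing the line */
    while ((ch = getc(in)) != '\n' && ch != EOF) {
        /* Discard */
    }

    buf[0] = '\0';

    return NULL;
}

/* Hypothetical wrapper: use gets_s only when the library advertises Annex K */
char *read_stdin_line(char *buf, size_t n)
{
#ifdef __STDC_LIB_EXT1__
    return gets_s(buf, n);
#else
    return bounded_line(buf, n, stdin);
#endif
}
```

Either branch still drops any line that doesn't fit in n-1 characters, which is exactly why getstr grows its buffer instead.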

mike_2000_17 commented: Another insightful article! Thanks! +14

The 2011 revision of C eliminated gets; it's no longer required to be supported, and any use of it makes your code non-portable. Rejoice!

Maybe yes, and maybe no. Universities are still teaching with the ancient Turbo C compiler, 30 years after it went extinct. How soon will they get around to teaching with 2013 compilers that have dropped gets()??

How soon will they get around to teaching with 2013 compilers that have dropped gets()??

Probably never. Why update your curriculum when you can install Virtual Box and simulate DOS to support an ancient compiler that conforms to the 30 year old curriculum? :rolleyes:

commented: Indeed! It is time to kill that dinosaur! +12

To Decepticon:

Just remember that old adage - Those who can, do. Those who can't, teach! :-)
