Tribal Knowledge: atoi is evil!

Question

deceptikon 1,790 Code Sniper

11 Years Ago

A common task in C or C++ is to get a string from somewhere and convert it to an integer. "Use atoi" is a common suggestion from the unwashed masses, and on the surface it works like a charm:

#include <stdio.h>
#include <stdlib.h>

long long square(int value)
{
    return (long long)value * value; /* widen first so the multiply can't overflow int */
}

long long cube(int value)
{
    return value * square(value);
}

int main(void)
{
    char buf[BUFSIZ];

    fputs("Enter an integer: ", stdout);
    fflush(stdout);

    if (fgets(buf, sizeof buf, stdin) != NULL) {
        int value = atoi(buf);

        printf("%d\t%lld\t%lld\n", value, square(value), cube(value));
    }

    return 0;
}

"So what's the problem?", you might ask. The problem is that this code is subtly broken. atoi makes two very big assumptions indeed:

The string represents an integer.
The integer can fit in an int.

If the string does not represent an integer at all, atoi will return 0. Yes, that's right. If atoi cannot perform a conversion, it will return a valid result. Which means that if atoi ever returns 0, you have no idea whether it was because the string is actually "0", or the string was invalid. That's about as robust as a library can get...not!
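To make the ambiguity concrete, here's a minimal sketch. The helper name atoi_gives_zero is made up for illustration; it isn't part of any library:

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 when atoi yields 0 for the given string. */
int atoi_gives_zero(const char *s)
{
    return atoi(s) == 0;
}
```

Both atoi_gives_zero("0") and atoi_gives_zero("hello") return 1, so a caller looking only at the result of atoi has no way to tell a legitimate zero from garbage input.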

If the string does represent an integer but the integer fails to fit in the range of int, atoi silently invokes undefined behavior. No warning, no error, no recovery. Do not pass Go, do not collect $200: your program's behavior is completely undefined from that point forward.

By the way, don't expect any support from errno if atoi fails for any reason; atoi isn't required to set errno under any circumstances.

atoi falls under a class of truly heinous library functions that exist solely due to backward compatibility of existing code. Another notable member of this hall of shame is gets. Unlike gets, which cannot be made safe, atoi can be used safely by thoroughly validating the string before passing it in:

#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

bool is_valid_int(const char *s);

long long square(int value)
{
    return (long long)value * value; /* widen first so the multiply can't overflow int */
}

long long cube(int value)
{
    return value * square(value);
}

int main(void)
{
    char buf[BUFSIZ];

    fputs("Enter an integer: ", stdout);
    fflush(stdout);

    if (fgets(buf, sizeof buf, stdin) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';

        if (is_valid_int(buf)) {
            int value = atoi(buf);

            printf("%d\t%lld\t%lld\n", value, square(value), cube(value));
        }
    }

    return 0;
}

bool is_valid_int(const char *s)
{
    long long temp = 0;
    bool negative = false;

    if (*s == '-' || *s == '+') {
        negative = *s++ == '-';
    }

    /* Reject an empty string or a lone sign; atoi would quietly return 0 */
    if (*s == '\0') {
        return false;
    }

    while (*s != '\0') {
        if (!isdigit((unsigned char)*s)) {
            return false;
        }

        temp = 10 * temp + (*s - '0');

        if ((!negative && temp > INT_MAX) || (negative && -temp < INT_MIN)) {
            return false;
        }

        ++s;
    }

    return true;
}

Aside from this being a pain in the butt and easy to get wrong, you'll notice that is_valid_int already performs a string-to-integer conversion. If you're doing the conversion manually anyway, why call atoi afterward to do exactly the same thing in a less safe way? Obviously something is wrong here.

Now, I strongly recommend against writing your own manual conversions if you can avoid it because, as mentioned before, it's a pain in the butt and easy to get wrong. So what is a C programmer to do? The answer is strtol. atoi is defined as behaving the same as (int)strtol(s, NULL, 10), except for the behavior on error. This suggests that strtol can handle the errors that atoi can't, and that's true. Here's a safe replacement of atoi in the first program using strtol:

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

long long square(int value)
{
    return (long long)value * value; /* widen first so the multiply can't overflow int */
}

long long cube(int value)
{
    return value * square(value);
}

int main(void)
{
    char buf[BUFSIZ];

    fputs("Enter an integer: ", stdout);
    fflush(stdout);

    if (fgets(buf, sizeof buf, stdin) != NULL) {
        char *end = NULL;

        errno = 0;

        long temp = strtol(buf, &end, 10);

        if (end != buf && errno != ERANGE && temp >= INT_MIN && temp <= INT_MAX) {
            int value = (int)temp;

            printf("%d\t%lld\t%lld\n", value, square(value), cube(value));
        }
    }

    return 0;
}

The first part of the test after strtol returns is to see if any characters were converted. If not, the string isn't a valid integer. Then errno is checked for a range error, which strtol will assign if the value is out of range for long. If errno remains 0 as you initialized it, the value is in the range of long. Finally, the value is verified to be within the range of int and you're good to go (though square and cube should also check the result range, that's beyond the scope of this article). This code does the same thing that the manual code above did, except now you're letting the standard library do the heavy lifting.
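If you need this conversion in more than one place, the three checks described above can be wrapped in a function. This is only a sketch: parse_int is a hypothetical name, not a standard function, and the bool-plus-out-parameter interface is my own assumption about what's convenient:

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the standard library): converts s to an
 * int using the three checks described above. Returns true and stores the
 * value in *out on success; returns false and leaves *out untouched otherwise.
 */
bool parse_int(const char *s, int *out)
{
    char *end = NULL;
    long temp;

    errno = 0;
    temp = strtol(s, &end, 10);

    if (end == s) {
        return false; /* No characters were converted */
    }

    if (errno == ERANGE) {
        return false; /* Out of range for long */
    }

    if (temp < INT_MIN || temp > INT_MAX) {
        return false; /* Fits in long, but not in int */
    }

    *out = (int)temp;
    return true;
}
```

Like the program above, this accepts trailing characters after the number, which is why a line read by fgets such as "42\n" converts successfully. If you want to reject input like "42abc", you'd also need to inspect what end points to.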

Note that both the errno check for ERANGE and the range check against int are required for full safety. The reason for this is that long and int can have the same range. strtol will return LONG_MIN or LONG_MAX if there's an overflow situation, at which point only the value of errno will help you diagnose an error versus a legitimate boundary value.
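Here's a small sketch of that boundary case. The names strtol_overflowed and report are hypothetical helpers made up for this illustration:

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns 1 if strtol reported a range error for s,
 * 0 otherwise. The converted value is stored in *value either way.
 */
int strtol_overflowed(const char *s, long *value)
{
    errno = 0;
    *value = strtol(s, NULL, 10);
    return errno == ERANGE;
}

/* Prints the input, the value strtol produced, and whether errno flagged it. */
void report(const char *s)
{
    long value;
    int overflowed = strtol_overflowed(s, &value);

    printf("%s -> %ld (%s)\n", s, value, overflowed ? "range error" : "ok");
}
```

If you format LONG_MAX into a string with snprintf and pass it to report, you get LONG_MAX back with no range error. Append one more digit to that string and you still get LONG_MAX back, but this time with the range error flagged. The return value alone is identical in both cases.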

Hopefully this article explained why atoi is to be avoided and how to safely do conversions with strtol. I now return you to your regularly scheduled programming.

c c++

Edited 11 Years Ago by deceptikon


Ancient Dragon commented: excellent +14

nitin1 commented: hey! you and your way of explaining things is out of the world. excellent!! +3

ddanbe commented: Deep knowledge! +15

mike_2000_17 commented: Interesting! +14

All 7 Replies

mike_2000_17 2,669 21st Century Viking

11 Years Ago

Very interesting post. However, one thing I noticed is that atoi and strtol don't seem to differ in their implementation. So I checked the GNU C library (glibc), the standard C library normally used with GCC, and here is what I found:

__extern_inline int
__NTH (atoi (const char *__nptr))
{
  return (int) strtol (__nptr, (char **) NULL, 10);
}

But, of course, none of that changes what you said, i.e., you should still just use strtol instead. But I would guess all implementations of atoi are just forwarding to strtol anyway. In other words, it's just a legacy name and specification for the same function, such that the old and unsafe code would still work.

Salem 5,265 Posting Sage

8 Months Ago

hladysz said
please fix the condition:

Please explain why you think this is the wrong condition.

Edited 8 Months Ago by Salem

hladysz commented: Try with "9876543210" +0


nitin1 15 Master Poster · Answer 1 · 2013-10-18T04:47:18+00:00

How did you come across this? Did your code fail because of it, or did you read about it somewhere?

Secondly, please explain this:

"The reason for this is that long and int can have the same range. strtol will return LONG_MIN or LONG_MAX if there's an overflow situation, at which point only the value of errno will help you diagnose an error versus a legitimate boundary value." Thanks in advance.

P.S. This is simply awesome. I never knew this. Keep sharing these type of things.

sepp2k 378 Practically a Master Poster · Answer 2 · 2013-10-18T10:52:14+00:00

If the user enters a number too large to fit into a long, LONG_MAX will be returned. If the user enters a number that's exactly LONG_MAX, LONG_MAX will also be returned. The first would be an error case, the second would not (in the case that sizeof(int) == sizeof(long) - otherwise LONG_MAX would still be too large to fit into an int and the problem would be caught by the <= INT_MAX check). So we need errno to distinguish between these cases.

deceptikon 1,790 Code Sniper Team Colleague Featured Poster · Answer 3 · 2013-10-18T11:45:19+00:00

Keep sharing these type of things.

I have a bunch of things like this that are hard-earned through experience. I was thinking of either a series of posts or tutorials, each covering one specific topic in detail, where the topic is the kind of thing you'd typically learn through painful experience or from talking to experienced programmers.

nitin1 15 Master Poster · Answer 4 · 2013-10-19T14:23:01+00:00

@James sir, please, you should start a separate thread or a separate area so that we can learn these superb things from you. Yay! I'm excited to learn more things like these. Making each post its own thread will scatter everything, but if there's no other option, this way is still great. ;) Thanks a lot.

hladysz 0 Newbie Poster · Answer 5 · 2024-10-04T10:56:39+00:00

Excellent post! Could you please fix the condition?

if (end != buf && errno != ERANGE && (temp >= INT_MIN || temp <= INT_MAX)) {
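For anyone wondering what's wrong with the condition quoted above, here's a minimal sketch comparing the two forms of the range test. The function names are made up for illustration, and the parameter is long long (an assumption for this demo) so that 9876543210 is representable even where long is 32 bits:

```c
#include <limits.h>

/*
 * The range test as quoted above: true for every value, because any number
 * is either >= INT_MIN or <= INT_MAX (usually both).
 */
int in_int_range_or(long long temp)
{
    return temp >= INT_MIN || temp <= INT_MAX;
}

/* The intended test: true only when both bounds hold at the same time. */
int in_int_range_and(long long temp)
{
    return temp >= INT_MIN && temp <= INT_MAX;
}
```

With a 32-bit int, in_int_range_or(9876543210LL) returns 1 even though the value can't fit, while in_int_range_and(9876543210LL) returns 0. On a system where long is 64 bits, strtol converts "9876543210" without a range error, so the || version lets it through and the cast to int mangles the value.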
