A common task in C or C++ is to get a string from somewhere and convert it to an integer. "Use atoi" is a common suggestion from the unwashed masses, and on the surface it works like a charm:
#include <stdio.h>
#include <stdlib.h>

long long square(int value)
{
    /* Widen before multiplying so the product is computed as long long. */
    return (long long)value * value;
}

long long cube(int value)
{
    return value * square(value);
}

int main(void)
{
    char buf[BUFSIZ];

    fputs("Enter an integer: ", stdout);
    fflush(stdout);

    if (fgets(buf, sizeof buf, stdin) != NULL) {
        int value = atoi(buf);

        printf("%d\t%lld\t%lld\n", value, square(value), cube(value));
    }

    return 0;
}
"So what's the problem?", you might ask. The problem is that this code is subtly broken. atoi
makes two very big assumptions indeed:
- The string represents an integer.
- The integer can fit in an int.
If the string does not represent an integer at all, atoi will return 0. Yes, that's right. If atoi cannot perform a conversion, it will return a valid result. Which means that if atoi ever returns 0, you have no idea whether it was because the string is actually "0", or the string was invalid. That's about as robust as a library can get...not!
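To make the ambiguity concrete, here is a minimal sketch. The helper show_atoi is a hypothetical name invented for this illustration; it only prints and returns whatever atoi produces for one input. None of the inputs suggested below can overflow int, so every call is well defined.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the original program):
   prints and returns what atoi yields for a single input. */
int show_atoi(const char *s)
{
    assert(s != NULL); /* atoi itself performs no such check */

    int result = atoi(s);

    printf("atoi(\"%s\") returns %d\n", s, result);

    return result;
}
```

Calling show_atoi("0") and show_atoi("hello") gives 0 both times, so the return value alone cannot tell a real zero from garbage. Input such as "42abc" quietly converts to 42, with the trailing junk ignored.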
If the string does represent an integer but the integer fails to fit in the range of int, atoi silently invokes undefined behavior. No warning, no error, no recovery. Go directly to jail, do not collect $200: your program is completely undefined from that point forward.
By the way, don't expect any support from errno if atoi fails for any reason; atoi isn't required to set errno under any circumstances.
atoi falls under a class of truly heinous library functions that exist solely for backward compatibility with existing code. Another notable member of this hall of shame is gets. Unlike gets, which cannot be made safe, atoi can be used safely by thoroughly validating the string before passing it in:
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

bool is_valid_int(const char *s);

long long square(int value)
{
    /* Widen before multiplying so the product is computed as long long. */
    return (long long)value * value;
}

long long cube(int value)
{
    return value * square(value);
}

int main(void)
{
    char buf[BUFSIZ];

    fputs("Enter an integer: ", stdout);
    fflush(stdout);

    if (fgets(buf, sizeof buf, stdin) != NULL) {
        /* Strip the trailing newline so validation sees only the number. */
        buf[strcspn(buf, "\n")] = '\0';

        if (is_valid_int(buf)) {
            int value = atoi(buf);

            printf("%d\t%lld\t%lld\n", value, square(value), cube(value));
        }
    }

    return 0;
}
bool is_valid_int(const char *s)
{
    long long temp = 0;
    bool negative = false;

    if (*s == '-' || *s == '+') {
        negative = *s++ == '-';
    }

    /* Require at least one digit so "" and a lone sign are rejected. */
    if (*s == '\0') {
        return false;
    }

    while (*s != '\0') {
        if (!isdigit((unsigned char)*s)) {
            return false;
        }

        temp = 10 * temp + (*s - '0');

        /* Bail out as soon as the running value leaves the range of int. */
        if ((!negative && temp > INT_MAX) || (negative && -temp < INT_MIN)) {
            return false;
        }

        ++s;
    }

    return true;
}
Aside from this being a pain in the butt and easy to get wrong, you'll notice that is_valid_int performs a string to integer conversion. If you're already doing it manually, why do you then need atoi to do exactly the same thing in a less safe way? Obviously something is wrong here.
Now, I strongly recommend against writing your own manual conversions if you can avoid it, because it's a pain in the butt and easy to get wrong, as mentioned before. So what is a C programmer to do? The answer is strtol. atoi is defined as behaving the same as (int)strtol(s, NULL, 10), except for the behavior on error. This suggests that strtol can handle the errors that atoi can't, and that's true. Here's a safe replacement for atoi in the first program using strtol:
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

long long square(int value)
{
    /* Widen before multiplying so the product is computed as long long. */
    return (long long)value * value;
}

long long cube(int value)
{
    return value * square(value);
}

int main(void)
{
    char buf[BUFSIZ];

    fputs("Enter an integer: ", stdout);
    fflush(stdout);

    if (fgets(buf, sizeof buf, stdin) != NULL) {
        char *end = NULL;

        errno = 0;
        long temp = strtol(buf, &end, 10);

        /* Digits were converted, no overflow of long, and the value fits in int. */
        if (end != buf && errno != ERANGE && temp >= INT_MIN && temp <= INT_MAX) {
            int value = (int)temp;

            printf("%d\t%lld\t%lld\n", value, square(value), cube(value));
        }
    }

    return 0;
}
The first part of the test after strtol returns is to see if any characters were converted. If not, the string isn't a valid integer. Then errno is checked for a range error, which strtol will set if the value is out of range for long. If errno remains 0 as you initialized it, the value is in the range of long. Finally, the value is verified to be within the range of int and you're good to go (though square and cube should also check the result range, that's beyond the scope of this article). This code does the same thing that the manual code above did, except now you're letting the standard library do the heavy lifting.
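If you need this conversion in more than one place, the three checks can be wrapped in a small function. The following is only a sketch: parse_int and its signature are hypothetical names chosen for illustration, not a standard API. Like atoi, it accepts leading whitespace and ignores anything after the number.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: stores the converted value in *out and returns
   true on success; returns false and leaves *out untouched on failure. */
bool parse_int(const char *s, int *out)
{
    assert(s != NULL && out != NULL);

    char *end = NULL;

    errno = 0;
    long temp = strtol(s, &end, 10);

    if (end == s) {
        return false; /* no characters were converted */
    }

    if (errno == ERANGE) {
        return false; /* out of range for long */
    }

    if (temp < INT_MIN || temp > INT_MAX) {
        return false; /* fits in long, but not in int */
    }

    *out = (int)temp;

    return true;
}
```

A caller would write something like if (parse_int(buf, &value)) { ... } and treat a false return as invalid input. If trailing characters should also be rejected, the position left in end could be inspected before returning, but that goes beyond what atoi itself does.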
Note that both checking errno for ERANGE and range checking against int are required for full safety. The reason for this is that long and int can have the same range. strtol will return LONG_MIN or LONG_MAX if there's an overflow situation, at which point only the value of errno will help you distinguish an error from a legitimate boundary value.
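The boundary case is easy to see in isolation. In this sketch, strtol_overflowed is a hypothetical helper written for this illustration; it clears errno, calls strtol, and reports whether a range error was flagged.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: stores strtol's return value in *result and
   returns true only if strtol reported a range error through errno. */
bool strtol_overflowed(const char *s, long *result)
{
    assert(s != NULL && result != NULL);

    errno = 0;
    *result = strtol(s, NULL, 10);

    return errno == ERANGE;
}
```

Given a string that spells out LONG_MAX exactly (for example, one built with snprintf and "%ld"), the helper stores LONG_MAX and returns false. Given a long run of 9s that cannot fit in a long, it also stores LONG_MAX but returns true. The return value of strtol is identical in both cases; only errno differs.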
Hopefully this article explained why atoi is to be avoided and how to safely do conversions with strtol. I now return you to your regularly scheduled programming.