Here's a challenge for you C++ aces. The challenge is to write a function with the following declaration:

unsigned long extract_digits ( unsigned long x, size_t n, size_t i );

This challenge has three parts.

Part I (Beginner):

Write the extract_digits function. It should return a sub-value of the first parameter consisting of n digits starting at i. For example, the first 3 digits of 12345 starting at 1 should return 234.

Part II (Intermediate):

Make the extract_digits function as solid as possible given unexpected input.

Part III (Expert):

Make the extract_digits function as fast as possible.

There's no need to submit the solution when you're finished. This is a personal challenge, not a contest. However, programmers are a naturally competitive species, so if enough people post solutions to this thread, I'll judge them and maybe even give away a prize. :)

WolfPack commented: Nice. +7

Don't think you can trick us into doing your homework for you! This is the oldest one in the book!

And here's a lame version that assumes reasonable input, has passed only one test case, and assumes 32-bit longs.

unsigned long extract_digits (unsigned long x, size_t n, size_t i) {
  static const unsigned long tenpows[]
    = {1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000};

  size_t len;
  if (x < 1000) {
    len = 1 + (len >= 10) + (len >= 100);
  }
  else {
    if (x < 1000000) {
      len = 4 + (len >= 10000) + (len >= 100000);
    }
    else {
      len = 7 + (len >= 10000000) + (len >= 100000000);
    }
  }

  x = x % (tenpows[len - i]);
  
  return (x / tenpows[len - i - n]);
}
commented: ha ha ha. +7

>Don't think you can trick us into doing your homework for you! This is the oldest one in the book!
:D

Wow, I wouldn't do it that way at all!
I was thinking that it was more of an indexed pointer thing. You know, start at i and loop n times.
I believe a pointer would be faster than an array as well which is the third part of the "challenge".

Am I incorrect ?

Hm... I like Rashakil's solution. It's more elegant than anything I would have come up with. Unfortunately, I think I broke it. Test case: extract_digits(-1, 9, 3) which causes a bad index into the tenpows array.

Since the specifications were not accurate, I made some assumptions when it came to exceptions. Being a Java programmer I couldn't resist this humble solution...

#include <iostream>
#include <string>
#include <sstream>
#include <stdexcept>
using namespace std;

unsigned long extract_digits (unsigned long x, size_t n, size_t i)
{
    stringstream ss;
    unsigned long result = 0;
    string output;
    ss << x;
    string str = ss.str();

    if(i >= str.size() || i < 0)
        throw runtime_error("OutOfBoundsException");
    if(n < 1)
        throw runtime_error("NoDigitsSelectedException");
    output = str.substr(i, n);
    ss.str(output);
    ss >> result;
    return result;
}

int main(void)
{
    cout << extract_digits(234324, 3, 3);
    cin.get();
    return 0;
}

I guess I satisfied both conditions I and II of the problem statement though I am a bit unclear of what the third meant, considering that this would work for really long numbers without getting into the powers of 10 stuff.

PS: I hope I get a prize for this, I am in real need of some cash rewards. ;-)

>Since the specifications were not accurate
Sadly, realistic specifications tend not to be. This is an exercise in creativity and adaptation. :)

>though I am a bit unclear of what the third meant
Basically, find a worst case for the extraction and call the function with that worst case a lot of times. I'm looking for both algorithmic- and micro-optimizations because if you change the algorithm, it changes the worst case, which changes the test. ;)

Okay. Here's a version that should fix that break. It's passed zero test cases :-)

unsigned long extract_digits (unsigned long x, size_t n, size_t i) {
  static const unsigned long tenpows[]
    = {1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000};

  size_t len;
  if (x < 1000) {
    len = 1 + (len >= 10) + (len >= 100);
  }
  else {
    if (x < 1000000) {
      len = 4 + (len >= 10000) + (len >= 100000);
    }
    else {
      len = 7 + (len >= 10000000) + (len >= 100000000);
    }
  }

  x = x / (tenpows[len - i - n]);
  
  return (x % tenpows[n]);
}

Again, broken. Test case: extract_digits(5, 2, 4)

I'm assuming valid input.

commented: Programmers don't assume ;-) -4
commented: Yep +0
commented: Let's keep the C/C++ forum rep meaningful shall we? Let the non-geeks polute the geeks lounge rep. +8

I'm assuming valid input.

Aw, c'mon. You can do the intermediate level. Please? :P

I kinda forgot what forum I was in (C++/C), but here is a C version of what someone has already done in C++:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

unsigned long extract_digits(unsigned long x, size_t n, size_t i){
  char * buffer;
  char * output;
  size_t num_digits;
  unsigned long num;

  num_digits = (int) log10(x) + 1;

  if ((n - i) > num_digits) return 0;

  buffer = (char *) calloc(num_digits, sizeof(char));
  if (buffer == NULL) return 0;

  output = (char *) calloc((n + 1), sizeof(char));
  if (buffer == NULL) return 0;

  sprintf(buffer, "%u", x);

  strncpy(output, buffer + i, n);

  num = (unsigned long) atol(output);

  free(buffer);
  free(output);

  return num;
}

Aw, c'mon. You can do the intermediate level. Please? :P

F the intermediate level. I'm skipping to "expert". Or at least giving people ideas for the expert level.

F the intermediate level. I'm skipping to "expert". Or at least giving people ideas for the expert level.

Ah. I'd assumed that the expert level encompassed the intermediate as well.

Oh man, doubleplus retarded. Here's a version that uses the right variable names...

unsigned long extract_digits (unsigned long x, size_t n, size_t i) {
  static const unsigned long tenpows[]
    = {1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000};

  size_t len;
  if (x < 1000) {
    len = 1 + (x >= 10) + (x >= 100);
  }
  else {
    if (x < 1000000) {
      len = 4 + (x >= 10000) + (x >= 100000);
    }
    else {
      len = 7 + (x >= 10000000) + (x >= 100000000);
    }
  }

  x = x / (tenpows[len - i - n]);

  return (x % tenpows[n]);
}

I'd start with the basic model of s.o.s and add the following checks:

n must be less than str.size()
i must be less than str.size() - n

To make the function more secure I'd consider changing the function prototype and definition to accept only strings as input so if user tries sending a signed long (particularly a negative signed long) instead of an unsigned long the problem could be detected and a correction requested.

The only microptimization I'd know to do would be to call str.size() only once, not three times, in extract_digit().

> I'd start with the basic model of s.o.s and add the following checks:
I have already added those.

> To make the function more secure I'd consider changing the
> function prototype and definition to accept only strings as input
This is a C++ challenge, so we are not allowed to change anything ;-)

Also, sending an unsigned long is not an error, its a subtle bug. You can't find out given a number has overflowed or not by any easy means. Plus the function I posted doesn't crash given the signed input since it would anyways be converted to unsigned one.

> he only microptimization I'd know to do would be to call str.size()
> only once, not three times, in extract_digit().
Huh, three times ?

Is the code expected to be HW/compiler independent ?

To make the function more secure I'd consider changing the function prototype and definition to accept only strings as input ...

Then it would just be a substr method, and not interesting at all.

> Is the code expected to be HW/compiler independent ?
Standard C++ is always compiler independent.

Thats it, this is as fast as it gets using my technique. Basically I reduced three calls to constructors and destructors respectively. ;-)

#include <iostream>
#include <cstring>
#include <stdexcept>
const size_t MAX_SIZE = 32; //some multiple of 8 greater than 10

using namespace std;

unsigned long extract_digits (unsigned long x, size_t n, size_t i)
{
    unsigned long result = 0;
    char* input = new char[MAX_SIZE];
    sprintf(input, "%lu", x);
    if(i >= strlen(input))
        throw runtime_error("OutOfBoundsException");

    char* output = new char[MAX_SIZE];
    memset(output, '\0', MAX_SIZE); //necessary to wipe the junk
    strncpy(output, input + i, n);
    sscanf(output, "%lu", &result); //can be replaced with strtoul()

    delete[] input;
    delete[] output;

    return result;
}
commented: yes but it is bastardised c++. -2

>>I have already added those.

All I see in your post is:

if(i >= str.size() || i < 0) throw runtime_error("OutOfBoundsException");

if(n < 1) throw runtime_error("NoDigitsSelectedException");

I agree that those are appropriate checks, but at least in my documentation, substr() will throw an error if the origin of the substring to be looked for is out of bounds automatically, but it doesn't say it will throw an error for any other problems. Therefore, after a little more thought I think the above check on i will be covered by the standard form of substr(), though it probably doesn't hurt to do it yourself. I don't think the following errors will be checked by substr() however.

if(n > str.size())
//throw error of some type because if you try to return more digits than there are in x you may well be out of bounds, but you can send a value of n to extract_digits() that exceeds that value.

if(i > (str.size() - n))
//throw an error because you will be reading past the end of the array containing str if this is true.

Now I have introduced two possible additional calls to size() on the same string. I presume it is quicker to call size() on the same string and store it in a variable than it is to call it more than once, though I don't really know for sure, because it doesn't routinely matter to me in the code I write.

>>Then it would just be a substr method, and not interesting at all.

True, but if that's the way to get the "sturdiest", most error proof way to accomplish the task, and that is your goal, then why not.

> Is the code expected to be HW/compiler independent ?
Standard C++ is always compiler independent.

Uh, no, for example, my code is standard C++, but it assumes 32 bit unsigned longs.

Size of data types is architecture dependent, not compiler. For long, it is 2 * word size of the machine. The compilers just follow the specification laid down. It would be nice if someone with a copy of C++ standard could confirm this.

No, it's compiler dependent. Compilers are specific to the architecture they're compiling for. And you're wrong about the size of a long.

Size of data types is architecture dependent, not compiler. For long, it is 2 * word size of the machine. The compilers just follow the specification laid down. It would be nice if someone with a copy of C++ standard could confirm this.

I believe the standard sets no rules for any type except char. I don't have a copy of it though, so take it as is.

Why not just use mathematical functions to solve this?? :)

Basically we calculate numbe of digits with log10 and round upwards. Then we sort of devide the number into two parts, one with all digits up to start+count and same number but with everything between start and start+count canceled to zero. Now we can simply subtract these two numbers and get the digits we were after. Fast and efficient :)

unsigned long extract_digits ( unsigned long x, size_t start, size_t count ){
    //calc numner of digits
    int dn=(int)ceil(log10(x));
    if(start>dn||(start+count)>dn)
        throw (const char*)("Invalid value for start and end!");
    unsigned long l=(unsigned long)(pow(10,count)*(unsigned long)(x/pow(10,dn-start)));
    unsigned long r=(unsigned long)(x/pow(10,dn-start-count));
    return r-l;
}

Note that I use cast instead of floor()

>It would be nice if someone with a copy of C++ standard could confirm this.
The standard only requires a minimum supported range. In the case of long int, it's -2147483647 to 2147483647. The word size of the machine means nothing to the standard, but implementations generally set the size of int to be the same as the data bus size of the machine for performance reasons. So on a 32-bit system, int would be 32-bits, and on a 16-bit system, int would be 16-bits. But there's no requirement that the size of a long be related to the size of an int provided that both meet the minimum size requirements.

But there's no requirement that the size of a long be related to the size of an int provided that both meet the minimum size requirements.

Which kinda sucks when you want longs to be longer than ints :icon_wink:

>Which kinda sucks when you want longs to be longer than ints
It does indeed, but with minimal extra effort you can get around that restriction. ;)

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.