Dear all,

I am implementing a small program and I'd like to use threads. First of all, I should say that I am certainly not an expert C++ user (I usually use Perl, ouch, but in this case Perl was far too slow for the program I wanted to write).

That being said, my program is divided into two main parts:
a) reading the input and filling a set of vectors (with functions)
b) running one function to compute a score (using the vectors loaded in part a) and then running it again on a vector of randomized vectors of Elem (a bootstrap to estimate the significance of the score).


The vectors loaded in the first part are declared as:

vector <string> datasetNames; // vector of strings
vector <Elem> rankedData; // vector of Elem. An Elem is an object that contains one int and one double
vector <vector <bool> > datasets; // vector of vectors of booleans
vector <vector <Elem> > shuffledRankedVectors; // vector of numerous (at least 5000) vectors of Elem

In the beginning, I was using a function es (see below) that computed the score once on the real values (vector <Elem> rankedData) and a given number of times on the randomized values (vector <vector <Elem> > shuffledRankedVectors).

double es (vector <Elem> &rankedData, int rankedSize, vector <bool> &dataset, vector <double> ess)

Although this was far faster than the Perl version, the bootstrap part was still far too slow, so I wanted to use threads. Unfortunately, I read that I could only pass ONE argument to the function run by a thread. I thus wrote a function run_es_pval (see below) that takes a struct containing integers and pointers to the objects I need to run my calculations.

struct input_struct {
  int start; 
  int end; 
  vector <string> *datasetNames; 
  int rankedSize; 
  vector <Elem> *rankedData; 
  vector <vector <bool> > *datasets; 
  vector <vector <Elem> > *shuffledRankedVectors;
};

void* run_es_pval(void *inptr) {
  input_struct in = *((input_struct*)(inptr));
  
  int start = in.start;
  int end = in.end;
  
  vector <string> *datasetNames_ptr = in.datasetNames; 
  vector <string> datasetNames = *datasetNames_ptr;
  int rankedSize = in.rankedSize; 
  
  vector <Elem> *rankedData_ptr =  in.rankedData; 
  vector <Elem> rankedData = *rankedData_ptr;
  vector <vector <bool> > *datasets_ptr = in.datasets; 
  
  vector <vector <bool> > datasets = *datasets_ptr;
  vector <vector <Elem> > *shuffledRankedVectors_ptr = in.shuffledRankedVectors;
  vector <vector <Elem> > shuffledRankedVectors = *shuffledRankedVectors_ptr;
  
  for (int i = start; i < end; i++) {
    
    string datasetName = datasetNames[i];
    vector <double> esResults;
    esResults.resize(rankedSize);
    // real computations
    double esi = es(rankedData, rankedSize, datasets[i], esResults);
    // random controls (bootstrapping)
    double pval = getpval (shuffledRankedVectors, rankedSize, datasets[i], esi);
    // print pvalues
    cout << datasetName << "\t" <<   esi << "\t"<< pval<< endl;
  }
}

pthread_t* h1 = new pthread_t;
pthread_attr_t* atr = new pthread_attr_t;
pthread_attr_init(atr);
pthread_attr_setscope(atr,PTHREAD_SCOPE_SYSTEM);
pthread_create(h1,atr,run_es_pval,(void *) &thisthreadvalues);

However, I observe two main issues:

  • when calling run_es_pval directly, the program uses twice as much memory (which means that with more than 1 thread, the memory usage should increase further) ... but at least, it's working!
  • when calling run_es_pval through pthread_create, I get a segmentation fault at run time.

So my question is twofold:

  • How can I keep the memory usage from increasing, even if I use 1000 threads?
  • Why do I get a segmentation fault?

At the moment, the only solution I see would be to store all my vectors as global variables, but I don't find that very elegant!
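Global variables are not required to avoid the extra memory. In run_es_pval, lines such as `vector <Elem> rankedData = *rankedData_ptr;` copy the whole vector, so each call (and each thread) gets its own duplicate of the data. Binding const references to the pointed-to vectors avoids those copies. The post does not show enough to pinpoint the segmentation fault, but three things are worth checking: that `thisthreadvalues` and the vectors it points to are still alive while the thread runs, that every thread is joined before the program exits, and that the function returns a value (falling off the end of a function declared `void*` is undefined behaviour). What follows is only a minimal sketch under assumptions: the `Elem` layout, `WorkerArgs`, `placeholderScore`, `worker` and `scoreAll` are hypothetical names, and `placeholderScore` is a stand-in, not the real es.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Elem class described in the post.
struct Elem {
  int id;
  double value;
};

// Arguments for one worker. The shared vectors are only read, so every
// thread can point at the same objects.
struct WorkerArgs {
  int start;
  int end;
  const std::vector<Elem>* rankedData;
  const std::vector<std::vector<bool> >* datasets;
  std::vector<double>* scores;  // slot i is written by exactly one thread
};

// Placeholder score (NOT the real es): sums the values flagged in the dataset.
double placeholderScore(const std::vector<Elem>& ranked,
                        const std::vector<bool>& dataset) {
  double sum = 0.0;
  for (std::size_t k = 0; k < ranked.size(); ++k) {
    if (dataset[k]) sum += ranked[k].value;
  }
  return sum;
}

void* worker(void* inptr) {
  assert(inptr != NULL);
  const WorkerArgs* in = static_cast<const WorkerArgs*>(inptr);
  // References bind to the caller's vectors; nothing is copied here.
  const std::vector<Elem>& rankedData = *in->rankedData;
  const std::vector<std::vector<bool> >& datasets = *in->datasets;
  for (int i = in->start; i < in->end; ++i) {
    (*in->scores)[i] = placeholderScore(rankedData, datasets[i]);
  }
  return NULL;  // a void* function must return a value
}

// Splits the datasets over numThreads workers and waits for all of them.
std::vector<double> scoreAll(const std::vector<Elem>& rankedData,
                             const std::vector<std::vector<bool> >& datasets,
                             int numThreads) {
  const int n = static_cast<int>(datasets.size());
  std::vector<double> scores(n, 0.0);
  std::vector<WorkerArgs> args(numThreads);  // outlives every thread
  std::vector<pthread_t> threads(numThreads);
  std::vector<bool> started(numThreads, false);
  for (int t = 0; t < numThreads; ++t) {
    WorkerArgs a = { n * t / numThreads, n * (t + 1) / numThreads,
                     &rankedData, &datasets, &scores };
    args[t] = a;
    started[t] = (pthread_create(&threads[t], NULL, worker, &args[t]) == 0);
    if (!started[t]) worker(&args[t]);  // fall back to the calling thread
  }
  for (int t = 0; t < numThreads; ++t) {
    // args, scores and the input vectors stay alive until every join returns.
    if (started[t]) pthread_join(threads[t], NULL);
  }
  return scores;
}
```

With g++ this would be compiled with the `-pthread` option. Because the workers only hold pointers and references, memory use stays close to that of the single-threaded version however many threads are started; results are collected in `scores` rather than printed from the threads, so output from different threads cannot interleave.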

Thank you very much for the time you spent reading about my issue.

Regards,

Sylvain

I am indeed using Linux and I may give your idea a try, but I'd like my code to be as portable as possible.

So, I'd like to find another solution!

Thanks for the hint...
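On the portability point: if a C++11 compiler is available (an assumption, since the thread does not say which compiler is used), `std::thread` is part of the standard library and, unlike pthread_create, forwards several arguments to the worker function, so no argument struct or `void*` cast is needed. The sketch below is only an illustration: the `Elem` layout, `placeholderScore`, `scoreRange` and `scoreAllPortable` are hypothetical names, and `placeholderScore` stands in for the real es.

```cpp
#include <cstddef>
#include <functional>  // std::ref, std::cref
#include <thread>
#include <vector>

// Hypothetical stand-in for the Elem class described in the post.
struct Elem {
  int id;
  double value;
};

// Placeholder score (NOT the real es): sums the values flagged in the dataset.
double placeholderScore(const std::vector<Elem>& ranked,
                        const std::vector<bool>& dataset) {
  double sum = 0.0;
  for (std::size_t k = 0; k < ranked.size(); ++k) {
    if (dataset[k]) sum += ranked[k].value;
  }
  return sum;
}

// The worker takes several typed parameters directly.
void scoreRange(int start, int end,
                const std::vector<Elem>& rankedData,
                const std::vector<std::vector<bool>>& datasets,
                std::vector<double>& scores) {
  for (int i = start; i < end; ++i) {
    scores[i] = placeholderScore(rankedData, datasets[i]);
  }
}

std::vector<double> scoreAllPortable(
    const std::vector<Elem>& rankedData,
    const std::vector<std::vector<bool>>& datasets,
    int numThreads) {
  const int n = static_cast<int>(datasets.size());
  std::vector<double> scores(n, 0.0);
  std::vector<std::thread> threads;
  threads.reserve(numThreads);
  for (int t = 0; t < numThreads; ++t) {
    // std::cref / std::ref pass references to the shared vectors; without
    // them std::thread would copy each argument for the new thread.
    threads.emplace_back(scoreRange,
                         n * t / numThreads, n * (t + 1) / numThreads,
                         std::cref(rankedData), std::cref(datasets),
                         std::ref(scores));
  }
  for (std::thread& th : threads) {
    th.join();  // the shared vectors must stay alive until every join returns
  }
  return scores;
}
```

`std::thread::hardware_concurrency()` gives a hint for a sensible value of `numThreads`; on a machine with a handful of cores, far fewer than 1000 threads are usually enough for CPU-bound work like this.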

oooh sorry link fail!
lol, it was supposed to be a youtube video.
http://www.youtube.com/watch?v=ys4NjnSyzkY

Thanks a lot for the details! I'll give it a try by tomorrow!

Other solutions (linked to pthreads) are welcome.

Cheers

Problem solved thanks to pseudorandom21!

Many thanks!
