
Hi everyone, I'm trying to parallelise a modified Gram-Schmidt (MGS) algorithm that orthonormalises the m rows of an m-by-n matrix q. Here's the serial code:

for( k = 0; k < m; k++ ) {
  /* orthogonalise row k against every previously processed row i */
  for( i = 0; i < k; i++ ) {

    /* s = dot product of row i with the current (already updated) row k */
    s = 0.0;
    for( j = 0; j < n; j++ ) {
      s += q[ i ][ j ] * q[ k ][ j ];
    }

    /* subtract the projection of row k onto row i */
    for( j = 0; j < n; j++ ) {
      q[ k ][ j ] -= s * q[ i ][ j ];
    }
  }

  /* normalise row k to unit length */
  s = 0.0;
  for( j = 0; j < n; j++ ) {
    s += pow( q[ k ][ j ], 2 );
  }

  for( j = 0; j < n; j++ ) {
    q[ k ][ j ] /= sqrt( s );
  }
}

I have tried a bunch of different ways to parallelise it, and I found that neither the k-loop nor the i-loop can be parallelised, because each iteration depends on the results of previous iterations. Only the inner j-loops can be parallelised, by doing something like this:

  for( k = 0; k < m; k++ ) {
    for( i = 0; i < k; i++ ) {

      s = 0.0;
#pragma omp parallel for default(none) shared(i, k, q, m, n) private(j) reduction(+:s) schedule(dynamic)
      for( j = 0; j < n; j++ ) {
        s += q[ i ][ j ] * q[ k ][ j ];
      }

/* #pragma omp parallel for */
      for( j = 0; j < n; j++ ) {
        q[ k ][ j ] -= s * q[ i ][ j ];
      }
    }

    s = 0.0;
#pragma omp parallel for default(none) shared(i, k, q, m, n) private(j) reduction(+:s) schedule(dynamic)
    for( j = 0; j < n; j++ ) {
      s += pow( q[ k ][ j ], 2 );
    }

/* #pragma omp parallel for */
    for( j = 0; j < n; j++ ) {
      q[ k ][ j ] /= sqrt( s );
    }
  }

This, however, creates a large overhead: a parallel region (with its thread start-up and synchronisation cost) is opened and closed once per inner j-loop, i.e. O(m^2) times over the whole run, which makes it a lot slower than the serial version even for very large m and n.
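Would hoisting the parallel region outside the loops be the right direction? Something like the rough, untested sketch below, where a single #pragma omp parallel encloses the whole k-loop, the j-loops become plain #pragma omp for worksharing loops, and one thread resets the shared sum with #pragma omp single (this assumes q, s, i, j, k, m and n are declared exactly as in the code above):

#pragma omp parallel default(none) shared(q, s, m, n) private(i, j, k)
{
  for( k = 0; k < m; k++ ) {
    for( i = 0; i < k; i++ ) {

#pragma omp single
      s = 0.0;                            /* one thread resets the shared sum */

#pragma omp for reduction(+:s)
      for( j = 0; j < n; j++ ) {          /* partial sums are combined into s */
        s += q[ i ][ j ] * q[ k ][ j ];   /* at the loop's implicit barrier   */
      }

#pragma omp for
      for( j = 0; j < n; j++ ) {
        q[ k ][ j ] -= s * q[ i ][ j ];
      }
    }

#pragma omp single
    s = 0.0;

#pragma omp for reduction(+:s)
    for( j = 0; j < n; j++ ) {
      s += pow( q[ k ][ j ], 2 );
    }

#pragma omp for
    for( j = 0; j < n; j++ ) {
      q[ k ][ j ] /= sqrt( s );
    }
  }
}

The thread team would then only be created once, and the implicit barrier at the end of each omp for should make the reduced value of s visible to every thread before the next loop, but I'm not sure whether all those barriers would still dominate.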

Even so, I can't find a way to parallelise this piece of code efficiently, yet I have seen parallelised MGS algorithms in research papers. What am I missing here?
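Or is the trick to reorganise the algorithm so that the update loop runs over the rows that have not been processed yet? Once row k has been normalised, orthogonalising each remaining row against it is independent work, so the i-loop itself could be split across threads. Again just an untested sketch (same variables as above, plus a local dot):

for( k = 0; k < m; k++ ) {

  /* normalise row k first (this part stays serial in k) */
  s = 0.0;
  for( j = 0; j < n; j++ ) {
    s += pow( q[ k ][ j ], 2 );
  }
  s = sqrt( s );
  for( j = 0; j < n; j++ ) {
    q[ k ][ j ] /= s;
  }

  /* each remaining row is orthogonalised against row k independently,
     so the i-loop can be divided among the threads */
#pragma omp parallel for default(none) shared(q, k, m, n) private(i, j)
  for( i = k + 1; i < m; i++ ) {
    double dot = 0.0;
    for( j = 0; j < n; j++ ) {
      dot += q[ k ][ j ] * q[ i ][ j ];
    }
    for( j = 0; j < n; j++ ) {
      q[ i ][ j ] -= dot * q[ k ][ j ];
    }
  }
}

That way the parallel region is only entered m times and each entry contains O((m - k) * n) work instead of a single j-loop. I think it computes the same factorisation as my row-by-row version above, but I'm not certain about that.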

Thanks