Hello! I am writing a program that performs many computations for a photovoltaic system. The problem is that I get a strange segmentation fault when I run it. Here is where it crashes (at the for-loop that calculates S5):

commands.....
                  ...
                  ...

/********************	WIRING	*******************************/
	Ni = N_I/(Ns*Np);
	
	S1 = 0;
	for(j=0; j<Np; j++)
		S1 += (j-1)*Lpv2 + Lpv2/2;

	S2 = 0;
	for(j=0; j<Nr; j++)
		S2 += 2*h_d + W_T + (j-1)*Np*Lpv2;

	S3 = 0;
	for(j=0; j<Nc; j++)
		S3 += h_d + (j-1)*Ns*Lpv1;

	S4 = 0;
	for(j=0; j<Nc_nm; j++)
		S4 += h_d + (j-1)*Ns*Lpv1;

	S5 = 0;
	for(j=0; j<N_row; j++)
		S5 += (j-1)*(W_T + Fy);
	

	PL_ac = (1 + SF2)*( Pi_n_ac*(N_row - 1)*Nr*S3 + Pi_n_ac*Nr*S4 + Pi_n_ac*Nr*Nc*(N_row - 1)*S5 + Pi_n_ac*Ni*(Lpv1*n_t/2 + h_d) );

	Pic = Ni*Pi_n_ac/1000;

        more commands follow......
                             ...
                             ...

When S5 is calculated in its for-loop, I get a segfault. If I comment that loop out, it works fine. If I instead comment out one of the other variables (for example S4 and its loop) and leave S5 as it is, it works again. But with all the for-loops in place, the segfault appears.

I doubled the stack size with ulimit, but it made no difference. I have a few hundred double variables and a few dozen large arrays of doubles. I suspect the problem is related to the memory the program uses, but I'm really stuck. If anyone has an idea what's happening, it would be very helpful.

My system is openSUSE 11.4 (Celadon), the processor is an Intel Core 2 Duo at 2.3 GHz, and system memory is 4 GB.

Sorry if my English is poor... Thanks!

In the S5 loop, does N_row contain a valid value?

I don't see how the code you posted could cause a stack problem, since it doesn't appear to use any arrays. The real problem could be somewhere else in your program and is merely manifesting itself in the code you posted. You would have to give us the entire program for us to help you debug it. If it's a large project, just compress it and attach it to your post. But if the code is proprietary, such as company secrets, then it's best not to do that.

It's for my thesis, but the code isn't mine; I got it from a professor. Anyway, thanks for your answer. I'll try another compiler and dig deeper with gdb. For now the problem still exists, so if you have any other suggestions that would help me track it down, please tell me. Thanks!

Oh, yes, N_row contains a valid value....

You should post your actual error message, because I'm pretty sure that whatever you're seeing is not a segfault.

Segfaults happen when you access memory you don't own, typically through an out-of-bounds array index or a "garbage" (uninitialized or dangling) pointer. Your loop contains neither of these things (unless this isn't your real code).

Examples.

The flawed code

$ cat foo.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void foo ( void ) {
  char  *p = malloc(5);
  char  *q = NULL;
  strcpy(p,"hello");    // writes 6 bytes into a 5-byte block: the \0 overflows
  strcpy(q,"world");    // NULL
  // and a memory leak
}

int main ( int argc, char *argv[] )
{
  foo();
  return EXIT_SUCCESS;
}
$ gcc -g foo.c
$ ./a.out 
Segmentation fault

Oh no, it doesn't work - I wonder why.

Use GDB.

$ gdb ./a.out 
GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/sc/Documents/coding/a.out...done.
(gdb) run
Starting program: /home/sc/Documents/coding/a.out 

Program received signal SIGSEGV, Segmentation fault.
memcpy () at ../sysdeps/x86_64/memcpy.S:79
79	../sysdeps/x86_64/memcpy.S: No such file or directory.
	in ../sysdeps/x86_64/memcpy.S
(gdb) where
#0  memcpy () at ../sysdeps/x86_64/memcpy.S:79
#1  0x00000000004005b4 in foo () at foo.c:9
#2  0x00000000004005ca in main (argc=1, argv=0x7fffffffe3a8) at foo.c:15
(gdb) up
#1  0x00000000004005b4 in foo () at foo.c:9
9	  strcpy(q,"world");    // NULL
(gdb) print q
$1 = 0x0
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) quit

Well that's easy enough to fix - q pointer is NULL, no wonder it crashes.

What about that buffer overrun, can we detect that?
Sure can!

$ valgrind ./a.out
==2143== Memcheck, a memory error detector
==2143== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==2143== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==2143== Command: ./a.out
==2143== 
==2143== Invalid write of size 1
==2143==    at 0x4C28E5D: memcpy (mc_replace_strmem.c:497)
==2143==    by 0x40059A: foo (foo.c:8)
==2143==    by 0x4005C9: main (foo.c:15)
==2143==  Address 0x51b0045 is 0 bytes after a block of size 5 alloc'd
==2143==    at 0x4C274A8: malloc (vg_replace_malloc.c:236)
==2143==    by 0x400575: foo (foo.c:6)
==2143==    by 0x4005C9: main (foo.c:15)
==2143== 
==2143== Invalid write of size 1
==2143==    at 0x4C28F04: memcpy (mc_replace_strmem.c:497)
==2143==    by 0x4005B3: foo (foo.c:9)
==2143==    by 0x4005C9: main (foo.c:15)
==2143==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==2143== 
==2143== 
==2143== Process terminating with default action of signal 11 (SIGSEGV)
==2143==  Access not within mapped region at address 0x0
==2143==    at 0x4C28F04: memcpy (mc_replace_strmem.c:497)
==2143==    by 0x4005B3: foo (foo.c:9)
==2143==    by 0x4005C9: main (foo.c:15)
==2143==  If you believe this happened as a result of a stack
==2143==  overflow in your program's main thread (unlikely but
==2143==  possible), you can try to increase the size of the
==2143==  main thread stack using the --main-stacksize= flag.
==2143==  The main thread stack size used in this run was 8388608.
==2143== 
==2143== HEAP SUMMARY:
==2143==     in use at exit: 5 bytes in 1 blocks
==2143==   total heap usage: 1 allocs, 0 frees, 5 bytes allocated
==2143== 
==2143== LEAK SUMMARY:
==2143==    definitely lost: 0 bytes in 0 blocks
==2143==    indirectly lost: 0 bytes in 0 blocks
==2143==      possibly lost: 0 bytes in 0 blocks
==2143==    still reachable: 5 bytes in 1 blocks
==2143==         suppressed: 0 bytes in 0 blocks
==2143== Rerun with --leak-check=full to see details of leaked memory
==2143== 
==2143== For counts of detected and suppressed errors, rerun with: -v
==2143== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4)
Segmentation fault

Look at the "Invalid write of size 1" reports.
For the buffer overrun, valgrind shows a stack trace for
a) where the bad write happened
b) where the memory was allocated.
The fix is either
a) fix the copy so it copies the right amount
b) fix the allocation so the copy has enough room.
