My servers are running RedHat ES 3.
A while back (maybe a month or two ago), I came in to work in the morning and found our main server not responding at all: nothing over the network or on the console. The power light was still on, though.
When I tried to restart it, it became apparent that the RAID controller card had gone bad.
I replaced the RAID controller card, and everything has pretty much seemed to be working as normal.
Except that, 2 or 3 times since I replaced the card, it has done the same thing: it stops responding completely, not even on the console. Restarting it seems to solve the problem, though.
What seems even stranger to me is that this has always happened in the middle of the night, when nothing is really going on on the server. During the day and evening, this server runs up to 140 'dumb' terminals, and many people access it through PuTTY and Samba. But it has never gone down while all this is going on.
It's not a major problem (yet), since it has happened only a few times and always at night, but I want to try to head this off before it does become a problem. I have tried looking at the logs, etc., and haven't found anything revealing (at least not to me).
Does anyone have any suggestions as to what it might be, or where I should start looking? Do you think that the new RAID card might also be going bad?
Any and all input is greatly appreciated!
(Let me know what extra information, if any, I should post.)