Except for very rare cases, programmers always work with black boxes. We rely on routines, libraries and frameworks about which we usually know nothing but what the documentation says. You have the OS, you have its API. You may have a managed environment like Sun Java or Microsoft .NET Framework. On top of that, you usually have a few frameworks and libraries. The latter are likely to be white boxes, but are usually too intricate to easily trace issues inside.
Environments this complex can easily cause stability and performance problems in your applications. Especially nasty are memory leaks, which are difficult, not to say impossible, to trace and resolve.
Recently we had a strange experience with a project in Java. The system had been working properly for many weeks with good uptime when the project was upgraded to Java 1.6. After the upgrade, performance would start declining after a few hours, with the applications consuming all the available memory. We spent days trying to find the weak spot in our code, but to no avail. Then we temporarily reverted the project to Java 1.5 and, what do you know, the problem was gone! However, we had to use Java 1.6 for various purposes.
We ultimately came to a solution - the servers running our application get restarted automatically every couple of hours, one by one, to ensure the availability of the service.
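To make the "one by one" part concrete, here is a minimal sketch of how such a rolling restart could look in Java. It is not our actual tooling: the `Server` interface, the `FakeServer` class and the `restartOneByOne` method are hypothetical names made up for illustration, and how a real server gets restarted or health-checked is left open.

```java
import java.util.ArrayList;
import java.util.List;

public class RollingRestart {

    /** Hypothetical abstraction over one application server (an assumption, not a real API). */
    public interface Server {
        String name();
        void restart() throws Exception; // e.g. stop and start the server process
        boolean isHealthy();             // e.g. poll a health-check endpoint
    }

    /**
     * Restarts the servers strictly one at a time. Stops at the first server
     * that fails to restart or does not become healthy, so the remaining
     * servers keep serving traffic. Returns the names restarted successfully.
     */
    public static List<String> restartOneByOne(List<? extends Server> servers,
                                               int maxHealthChecks,
                                               long pauseMillis) {
        List<String> restarted = new ArrayList<String>();
        for (Server server : servers) {
            try {
                server.restart();
            } catch (Exception e) {
                break; // do not touch the remaining servers
            }
            boolean healthy = false;
            for (int i = 0; i < maxHealthChecks && !healthy; i++) {
                healthy = server.isHealthy();
                if (!healthy) {
                    try {
                        Thread.sleep(pauseMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return restarted; // give up if we are asked to stop
                    }
                }
            }
            if (!healthy) {
                break; // this server did not come back; leave the rest alone
            }
            restarted.add(server.name());
        }
        return restarted;
    }

    /** In-memory stand-in used only to demonstrate the loop above. */
    public static class FakeServer implements Server {
        private final String name;
        private final boolean comesBackHealthy;
        public int restartCount = 0;

        public FakeServer(String name, boolean comesBackHealthy) {
            this.name = name;
            this.comesBackHealthy = comesBackHealthy;
        }

        public String name() { return name; }
        public void restart() { restartCount++; }
        public boolean isHealthy() { return comesBackHealthy; }
    }
}
```

The important design choice is that the loop stops as soon as one server fails to come back, so a bad restart never takes down more than one machine. Running it every couple of hours could be done with something like a `ScheduledExecutorService` or a plain cron job; that part is left out of the sketch.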
This solution looks dirty and crude, and defies all good design and programming practices. Unfortunately, sometimes you cannot help it. You cannot trace and fix a problem deep in Java’s runtime, or .NET’s runtime, or one or another closed source library.
I myself felt quite uncomfortable with that solution, until I learned from a good source that the same practice is used by… Google. Yes, Google does restart its servers on a scheduled basis.
I am still not really happy with this approach, but knowing that the big and smart guys out there do it as well is at least some comfort.