We had a thread leak on our production servers recently that caused downtime on multiple applications.¬† The culprit turned out to be an application that configured Log4J incorrectly.
How can Log4J bring down an application server?¬† I don’t think this one was Log4J’s fault so much as a developer who initialized it incorrectly.
As you may know, Log4J has a nice feature called ConfigureAndWatch which allows you to specify a configuration file and an interval.¬† Log4J will check the configuration file for updates on the specified interval and will update the configuration as needed. ¬† This is a very useful feature as it allows us to configure logging on the fly on our production servers.¬† Thus, when an app begins to misbehave, we can enable more detailed logging without having to restart the application (which would reset the app and potentially make the error disappear for awhile).
The danger with ConfigureAndWatch is that by its very nature, it must spawn a new thread when it is called.¬† This thread sleeps for a period of time, then wakes up to check the config file.¬† This works well if you initialize Log4J properly as it is only called once when the application starts up.¬† Unfortunately, a developer will occasionally embed a call to ConfigureAndWatch in the main execution path of the application.¬† In the case of a web application, this results in a new thread being spawned each time the application gets a hit.¬† The threads never die, they just keep going.¬† Because they sleep most of the time, they don’t show up on system load reports and such until the system eventually cannot allocate any more threads.
On our LINUX systems, java threads count as OS processes.¬† As the OS reached its process limit, the application crashed as did the container, and everything else that was owned by the same user.
I caught wind of the thread leak and someone involved mentioned that all the threads were running the FileWatchDog class which is the class used by Log4J to monitor files.¬† We were able to identify the offending app and issue a fix, but this should have never happened.
In general, developers should follow the following rules when developing application code:
- Shared libraries should not run any kind of Log4J initialization code.¬† This should be left to the applications that will use the library.
- If you don’t need the ability to change your logging configuration at run-time, you should allow Log4J to configure itself by simply including a log4j.properties or log4j.xml file in the application classpath.
- Never call ConfigureAndWatch in a way where it can be executed more than once during an application’s life.
- Use a static block in the application’s main class initialize logging
- In a Web Application, use a ServletContextListener to initialize logging