As an extension of your rule of logging where in the application the log statement came from, you may want to add per-module logging flags. Instead of logging everything, all the time, this allows you to selectively target sections of your application. There is overhead in this, and you need to create a facility that allows you to enable / disable that logging. Ideally, you would be able to enable / disable on-the-fly as the application is running.
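A minimal sketch of such a facility in Java, assuming a hand-rolled switchboard rather than anything from log4j. `ModuleLog`, `setEnabled`, and the module names are hypothetical. The flags live in a concurrent map so another thread can flip them while the application runs, and the message is passed as a `Supplier` so it is only built when the module is switched on.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical per-module logging switchboard; all names are illustrative.
final class ModuleLog {
    // Module name -> enabled flag. A ConcurrentHashMap lets another thread
    // (say, an admin command handler) change a flag while the app is running.
    private static final Map<String, Boolean> FLAGS = new ConcurrentHashMap<>();

    private ModuleLog() {}

    static void setEnabled(String module, boolean enabled) {
        FLAGS.put(module, enabled);
    }

    // Modules that were never configured are treated as disabled.
    static boolean isEnabled(String module) {
        return FLAGS.getOrDefault(module, false);
    }

    // The Supplier is only invoked when the module is enabled, so a disabled
    // module costs one map lookup and no string building.
    static void log(String module, Supplier<String> message) {
        if (isEnabled(module)) {
            System.out.println("[" + module + "] " + message.get());
        }
    }

    public static void main(String[] args) {
        ModuleLog.setEnabled("network", true);
        ModuleLog.log("network", () -> "connection opened");   // written
        ModuleLog.log("billing", () -> "invoice recalculated"); // skipped
        ModuleLog.setEnabled("network", false);                 // toggled at runtime
        ModuleLog.log("network", () -> "connection closed");    // skipped
    }
}
```

The overhead mentioned above is the lookup on every call. Whether that matters depends on how hot the call sites are.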
I'm used to seeing a layer below debug which I term "Trace", but that's not necessarily a universal term. "Trace" level logging tracks as much as you can possibly stand, including module entry / exit, timestamps with entry / exit, and bonus points for capturing passed values. Obviously, that generates a LOT of data and it's not something you turn on willy-nilly. But it has advantages with respect to debugging when you can't attach to the process or you don't have a core dump of the errant application.
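As a rough illustration of entry / exit tracing, here is a sketch using the JDK's `java.util.logging`, chosen only so it runs without extra dependencies (it is not the log4j API). `Logger.entering` and `Logger.exiting` log at `Level.FINER`, which I'm treating as the "Trace" layer here. `TraceDemo`, `ListHandler`, and `add` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class TraceDemo {
    // Keep a strong reference: the LogManager only holds loggers weakly.
    static final Logger LOG = Logger.getLogger("demo.trace");

    // Collects records in memory so the sketch needs no console or file setup.
    static final class ListHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override public synchronized void publish(LogRecord record) {
            if (isLoggable(record)) {
                records.add(record);
            }
        }
        @Override public void flush() {}
        @Override public void close() {}
    }

    static int add(int a, int b) {
        // Entry record, including the passed values.
        LOG.entering("TraceDemo", "add", new Object[] {a, b});
        int sum = a + b;
        // Exit record, including the return value.
        LOG.exiting("TraceDemo", "add", sum);
        return sum;
    }

    public static void main(String[] args) {
        ListHandler handler = new ListHandler();
        LOG.setUseParentHandlers(false);
        LOG.addHandler(handler);
        LOG.setLevel(Level.FINER); // turn tracing on; Level.INFO would turn it off

        add(2, 3);

        // Each record carries its own timestamp and the source method name.
        for (LogRecord r : handler.records) {
            System.out.println(r.getMillis() + " " + r.getSourceMethodName() + " " + r.getMessage());
        }
    }
}
```

With the level left at `Level.INFO`, the same calls produce no records, which is the "don't turn it on willy-nilly" part.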
I like to see file / module references and timestamps with my log information. It can be handy when trying to hunt down race conditions between threads as well as coordinating the activities of multiple areas of the application. To be fair, I know of some folk who think these details clutter the log file. Adding in the timestamp is something to discuss with the team. (Apologies if log4j already does that.)
If the logging isn't being taken care of by its own thread / process, that's something to consider as well. Instead of making the application thread wait for the logging to process, the log message gets passed off to the log handler and the application thread goes on its merry way. Alternatively, creating some sort of buffer mechanism to handle the log messages is another way to speed up application responsiveness.
Having controls on the size and history of log files is another feature to consider. You don't want the app blowing out all the disk space on the host system, nor do you necessarily want to keep all log files for all eternity.
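For what it's worth, a sketch of size and history limits using the JDK's `java.util.logging.FileHandler`, again picked only because it needs no extra dependencies (log4j has its own rolling appenders, which I'm not showing here). `RotatingLogSketch`, `rotatingHandler`, the `app-%g.log` file name, and the limits are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

final class RotatingLogSketch {
    // Builds a handler that rotates through at most `maxFiles` files of roughly
    // `maxBytesPerFile` each. "%g" in the pattern is the generation number;
    // when the set is full, the oldest generation is dropped.
    static FileHandler rotatingHandler(Path dir, int maxBytesPerFile, int maxFiles)
            throws IOException {
        String pattern = dir.resolve("app-%g.log").toString();
        FileHandler handler = new FileHandler(pattern, maxBytesPerFile, maxFiles, true);
        handler.setFormatter(new SimpleFormatter());
        return handler;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rotating-logs");
        FileHandler handler = rotatingHandler(dir, 64 * 1024, 5); // ~64 KiB x 5 files

        Logger log = Logger.getLogger("demo.rotation");
        log.setUseParentHandlers(false);
        log.addHandler(handler);

        for (int i = 0; i < 10_000; i++) {
            log.info("message number " + i);
        }
        handler.close();
        System.out.println("Logs written under " + dir);
    }
}
```

The worst-case disk usage is then about `maxBytesPerFile * maxFiles`, which is a number you can actually hand to whoever runs the host.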
An acceptable way is to use a singleton logger which delegates the actual logging to its own thread.
You can then use any efficient producer-consumer solution (like a non-blocking linked list based on atomic CAS, i.e. compare-and-swap) to gather the log messages, without the logger turning into an implicit global lock.
The log call will then first filter and build the log message and then pass it to the consumer; the consumer will then grab it and write it out (and free the resources of the individual message).
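A minimal sketch of that arrangement in Java, under a few assumptions: `AsyncLogger` and its methods are hypothetical names, the queue is the JDK's `ConcurrentLinkedQueue` (a non-blocking, CAS-based linked queue), and the consumer simply naps for a millisecond when the queue is empty. The constructor is left package-private only so a different output stream can be injected.

```java
import java.io.PrintStream;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

final class AsyncLogger {
    private static final AsyncLogger INSTANCE = new AsyncLogger(System.out);

    // Lock-free queue: producers never block each other on a shared lock.
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
    private final PrintStream out;
    private final Thread writer;
    private volatile boolean running = true;
    private volatile boolean enabled = true;

    AsyncLogger(PrintStream out) {
        this.out = out;
        this.writer = new Thread(this::drainLoop, "log-writer");
        this.writer.setDaemon(true); // do not keep the JVM alive just for logging
        this.writer.start();
    }

    static AsyncLogger get() {
        return INSTANCE;
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    // Producer side: filter first, then build the message, then hand it off.
    // No I/O happens on the calling thread.
    void log(String message) {
        if (!enabled) {
            return;
        }
        queue.offer(System.currentTimeMillis() + " ["
                + Thread.currentThread().getName() + "] " + message);
    }

    // Consumer side: the only thread that touches the output stream.
    private void drainLoop() {
        while (running || !queue.isEmpty()) {
            String line = queue.poll();
            if (line == null) {
                LockSupport.parkNanos(1_000_000L); // nothing queued; idle ~1 ms
            } else {
                out.println(line); // the String is garbage-collected afterwards
            }
        }
        out.flush();
    }

    // Drains what is already queued, then stops the writer thread.
    // Messages logged after this call may be lost.
    void shutdown() throws InterruptedException {
        running = false;
        writer.join();
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncLogger log = AsyncLogger.get();
        log.log("application started");
        log.log("doing work");
        log.shutdown();
    }
}
```

The trade-off in this sketch is that the queue is unbounded: if the writer can't keep up, memory grows instead of the application thread slowing down. A bounded buffer would flip that trade-off.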
Best Answer
Well, I see a few possible performance issues there. I will assume the logging framework you use is decent enough not to be the source of the problem.
To summarize: with logging you may run into trouble with: