What are the characteristics of good report-generation software for reporting and tracking software benchmarking results?

information-presentation, performance, report-generation, reporting

This is an offshoot question from this answer to a previous question, where http://speed.pypy.org is highlighted as an example having a good presentation.

(However, it appears to me that the project doesn't separate the components of execution/tracking/report-generation/web-service, which makes it harder for other people to adopt.)

I am interested in both the functional and the UI requirements of such software. I hope to be able to choose an existing one based on the criteria so that I can use it in my project.

Right now, the only thing I can think of is that the Execution UI should be similar to a Unit Testing harness, but the Reporting UI should be totally different from the xUnit family of software. Webpages seem to be a better way to navigate through the results.

I also have a couple of minor ideas:

  • There should be a tracking component to track performance changes at all levels
  • However, the presentation layer should highlight only "relevant" performance changes, that is, performance drops in important areas that are serious enough to require developers' attention.
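To make the second idea concrete, here is a minimal sketch of how a presentation layer might filter tracked results down to the "relevant" ones. Everything in it is an assumption for illustration: the function name `relevant_regressions`, the dict-of-timings input format, the set of "important" benchmark names, and the 10% threshold are all hypothetical and not taken from any existing tool.

```python
def relevant_regressions(baseline, current, important, threshold=0.10):
    """Return (name, relative_change) pairs for important benchmarks
    whose runtime grew by more than `threshold` (0.10 means 10%).

    `baseline` and `current` map benchmark names to runtimes in seconds.
    All names and the default threshold are hypothetical.
    """
    flagged = []
    for name in sorted(important):
        # Skip benchmarks that are missing from either run.
        if name not in baseline or name not in current:
            continue
        old, new = baseline[name], current[name]
        if old <= 0:
            continue  # a relative change is undefined for a non-positive baseline
        change = (new - old) / old
        if change > threshold:
            flagged.append((name, change))
    return flagged


baseline = {"parse": 1.0, "render": 2.0, "startup": 0.5}
current = {"parse": 1.25, "render": 2.02, "startup": 1.0}
print(relevant_regressions(baseline, current, important={"parse", "render"}))
```

The tracking component would still store every result; only the report is filtered. Here `startup` doubled in runtime but is not reported, because it is not in the set marked as important.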

I am also interested in whether any of Edward Tufte's advice can be applied here.

Best Answer

You might find this SO question about (macro-)benchmarking tools for Java interesting, even if you use a different language:

  1. for runtime measurements, there are a lot of technical aspects to consider, mainly optimizations;
  2. for runtime measurements, the statistics are very important, but very few tools implement them;
  3. there are a lot of monitoring tools that offer a harness similar to unit testing or logging, and a standardized reporting interface (JMX), just as you had in mind;
  4. Points 2 and 3 are somewhat contradictory: to get the statistics right, the execution component has to know about the reporting, i.e. it must be able to decide how often some code should be executed and measured to reach a high enough statistical confidence;
  5. a good tool that is pretty language independent is Auto-pilot.
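
Point 4 can be illustrated with a rough sketch in which the execution loop itself checks the statistics and decides when to stop measuring. This is only an illustration under stated assumptions, not how any of the tools mentioned above work: the function name `measure_until_confident`, its parameters, and the use of a normal approximation (z = 1.96 for roughly 95% confidence) are all my own choices.

```python
import statistics
import time


def measure_until_confident(func, rel_half_width=0.05, min_runs=5,
                            max_runs=1000, z=1.96,
                            timer=time.perf_counter):
    """Run `func` repeatedly until the confidence interval of the mean
    runtime is narrower than `rel_half_width` of the mean, or until
    `max_runs` is reached. Returns the list of measured durations.

    Assumption: a normal approximation (z = 1.96) is good enough; a real
    tool might use Student's t or bootstrapping, and would handle warm-up.
    """
    samples = []
    while len(samples) < max_runs:
        start = timer()
        func()
        samples.append(timer() - start)
        if len(samples) >= min_runs:
            mean = statistics.mean(samples)
            # Half-width of the confidence interval around the mean.
            half_width = z * statistics.stdev(samples) / len(samples) ** 0.5
            if mean > 0 and half_width <= rel_half_width * mean:
                break
    return samples


durations = measure_until_confident(lambda: sum(range(10_000)))
print(len(durations), statistics.mean(durations))
```

The number of runs is not fixed in advance: a noisy benchmark gets measured more often than a stable one, which is why the execution side cannot be fully decoupled from the statistics used in the report.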