For my graduate thesis, I had a similar challenge and spent some time reflecting on what I was actually trying to accomplish.
Putting a non-trivial amount of code in printed form brings up several challenges.
Code is best examined and manipulated in electronic form, so how do you give the reader that flexibility on the printed page?
Quite a bit of the code is irrelevant to the problem at hand. File I/O, memory management, error handling, and so on all have to be in the code but don't support the thesis itself.
You both want and need to provide all of the code so that a future student can pick up your research and continue the work. In addition, the university expects the complete code so it can verify that you actually did the work and validate your results.
My thesis involved taking an existing algorithm, refining it for performance, and then extending the algorithm to a new set of use cases.
Within the body of my thesis, I placed only the relevant portions of the old and new routines, side by side, to provide a measure of comparison. In the text, I then explained the differences between the functions and the measurable differences I had found, and included appropriate charts, graphs, and illustrations alongside the explanation to help support the point I was making.
In some cases, I had refactored the existing code into new functions. Sometimes it made sense to include the refactored-out sections of code in the discourse, and sometimes it didn't.
Think of your thesis as a narrative. Don't include anything that doesn't directly contribute to the point you're making in each section. Your thesis will be long enough as it is, and you don't want to overload the reader with unnecessary or irrelevant detail. This aspect was crucial for me, as several of my advisers didn't care as much about the code and wanted to focus on the results.
The last aspect to consider is where to include the complete source listing. I placed my source in an appendix at the end of my thesis. I also included some explanatory text on compiling and running the program so others could validate what I had done. Since some test data sets were necessary to recreate my results, I included those in the appendix as well.
The general approach to measuring these figures is:
- Establish a test plan with sufficient coverage.
- Execute the formal test plan (the tests may be automated or manual), and record the failed tests and, if necessary, the bug reports issued after root-cause analysis.
- Compare those figures with the KLOC counts, which can be computed automatically from the source code (a minimal sketch follows this list).
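As a minimal sketch of that last step in Python (the `src` directory, the `.py` extension filter, and the failure count are all assumptions for illustration, not anything prescribed by the approach):

```python
from pathlib import Path

def kloc(source_dir: str, extensions=(".py",)) -> float:
    """Count non-blank source lines under source_dir, in thousands.

    What counts as a "line" is a convention; this sketch uses
    non-blank physical lines (see the discussion of LOC below).
    """
    lines = 0
    for ext in extensions:
        for path in Path(source_dir).rglob(f"*{ext}"):
            with open(path, encoding="utf-8", errors="replace") as f:
                lines += sum(1 for line in f if line.strip())
    return lines / 1000.0

# Hypothetical inputs: failures recorded while executing the test plan,
# confirmed as bugs after root cause analysis.
confirmed_bugs = 42
size_kloc = kloc("src")  # assumes the project sources live under ./src

print(f"Defect density: {confirmed_bugs / size_kloc:.1f} bugs/KLOC")
```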
Needless to say, with a manual, ad-hoc test approach you won't get consistent bug numbers: as you mention, many bugs aren't discovered immediately. Formal test plans with unit, integration, and acceptance tests are, however, very common for larger and mission-critical software. TDD emphasises tests even further, providing very detailed unit tests that check and diagnose the promised functionality and all the invariants your code is supposed to respect; a minimal sketch of such a test follows.
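For instance (the `add` function and its invariants are invented purely for illustration):

```python
import unittest

def add(a: int, b: int) -> int:
    return a + b

class TestAdd(unittest.TestCase):
    def test_commutativity(self):
        # An invariant the code promises to respect: order must not matter.
        self.assertEqual(add(2, 3), add(3, 2))

    def test_identity(self):
        # Promised functionality: adding zero changes nothing.
        self.assertEqual(add(7, 0), 7)

if __name__ == "__main__":
    unittest.main()
```

Each failing test here is a countable, reproducible event, which is exactly what makes the figures from a formal test plan more consistent than ad-hoc observations.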
There's also the question of whether defects caught by the preventive tests a developer runs before submitting code for integration should be counted. The same question applies to issues discovered in peer reviews.
The definition of a bug is also an issue. People overuse the word in everyday language, and the boundary is not clear: is a bug a non-compliance of the code with the specification, or does it also cover issues caused by unclear requirements? Here, standards with precise definitions, such as ISO 9126, can really help.
Finally, KLOC is a concept that was introduced at a time when the dominant languages were line-oriented (e.g. Fortran, COBOL). So it's a genuine question nowadays what should count as a LOC: empty lines? Comment lines? Conditionally compiled lines? Active lines, or active instructions?
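To make the ambiguity concrete, here is a sketch that counts the same Python source under three possible conventions (the comment detection is deliberately simplistic and purely illustrative):

```python
def loc_variants(source: str) -> dict:
    """Count the same source under three different LOC conventions."""
    physical = 0   # every line, including blanks and comments
    non_blank = 0  # excludes empty lines
    logical = 0    # also excludes comment-only lines
    for line in source.splitlines():
        physical += 1
        stripped = line.strip()
        if stripped:
            non_blank += 1
            if not stripped.startswith("#"):
                logical += 1
    return {"physical": physical, "non_blank": non_blank, "logical": logical}

sample = """# a comment

x = 1  # an active line
"""
print(loc_variants(sample))  # {'physical': 3, 'non_blank': 2, 'logical': 1}
```

Which convention you pick matters less than documenting it and applying it consistently.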
All this being said, your absolute figures will of course vary with your precise definitions and methodology. But if you remain consistent, interesting facts may emerge when you look at the evolution of these metrics rather than at the absolute figures.
There are companies that keep statistics on a huge number of software projects and have developed predictive models that estimate the bug rate from the evolution of these metrics over the course of a project. They then use this prediction when deciding whether or not to release to market (I think I read a paper from HP on this some years ago, but I couldn't find it again). Such predictions have, of course, only statistical value: the fact that a model is meaningful in general doesn't prevent a particular project from contradicting it completely.
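Purely to illustrate the idea (this is a toy straight-line extrapolation, not the model from that paper, and every number in it is invented):

```python
# Defect densities observed over the last five test cycles (invented data).
defects_per_kloc = [3.1, 2.4, 1.8, 1.5, 1.1]

# Least-squares linear fit by hand, then extrapolation to the next cycle.
n = len(defects_per_kloc)
xs = range(n)
mean_x = sum(xs) / n
mean_y = sum(defects_per_kloc) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, defects_per_kloc))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

projected = slope * n + intercept
print(f"Projected defect density next cycle: {projected:.2f} bugs/KLOC")
# The 0.5 bugs/KLOC release threshold is equally invented.
print("Release" if projected < 0.5 else "Keep testing")
```

A real model would be far more sophisticated, but the structure is the same: an observed trend in the metric feeds a release decision.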
Personally, I'm not sure these predictive methods still make sense in an era of agile and TDD, where bugs are prevented in the early stages and on the fly. However, I have to admit that introducing such structured metrics on subcontracted projects (i.e. software built according to well-specified requirements) made it possible to quickly detect and address reliability issues with some contractors.
That, because you're a programmer, you know how to fix [person]'s virus-ridden machine.