The critical difference between the two approaches is that the one he considers to be the only correct way is imperative, while yours is declarative.
Your approach explicitly declares rules, i.e. the rules of the grammar are (almost) directly encoded in your code, and the parser library automatically transforms raw input into parsed output while taking care of state and other things that are hard to handle. Your code is written within a single layer of abstraction, which coincides with the problem domain: parsing. It's reasonable to assume parsec's correctness, which means the only room for error is that your grammar definition is wrong. But then again, you have fully qualified rule objects, and they are easily tested in isolation. It is also worth noting that mature parser libraries ship with one important feature: error reporting. Decent error recovery when parsing goes wrong is not trivial. As proof, I invoke PHP's parse error: unexpected T_PAAMAYIM_NEKUDOTAYIM
:D
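To make the contrast concrete, here is a minimal sketch of a declarative rule using Parsec. The grammar rule itself - assignment ::= identifier '=' integer - is a made-up example of mine, not from your code:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical rule:  assignment ::= identifier '=' integer
-- The code is a near-verbatim transcription of that rule; Parsec keeps
-- track of the input position and produces error messages for us.
data Assignment = Assignment String Integer deriving Show

assignment :: Parser Assignment
assignment = do
  name <- many1 letter
  _    <- char '='
  val  <- many1 digit
  return (Assignment name (read val))
```

Testing the rule in isolation is then just `parse assignment "" "x=42"` in GHCi, and a bad input like "x=?" already comes back with a readable "expecting digit" message for free.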
His approach manipulates strings, explicitly maintains state, and manually lifts the raw input to parsed output. You have to write everything yourself, including error reporting, and when something goes wrong, you are totally lost.
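For contrast, here is the same made-up rule sketched in his style, with state and error handling threaded by hand:

```haskell
import Data.Char (isAlpha, isDigit)

-- Hypothetical sketch of the imperative style: the same assignment rule,
-- but position tracking, failure cases and error text are all our problem.
parseAssignment :: String -> Either String (String, Integer)
parseAssignment s =
  case span isAlpha s of
    ("", _)       -> Left "expected identifier"
    (name, rest1) ->
      case rest1 of
        ('=' : rest2) ->
          case span isDigit rest2 of
            ("", _)      -> Left "expected integer after '='"
            (digits, "") -> Right (name, read digits)
            (_, junk)    -> Left ("trailing input: " ++ junk)
        _ -> Left "expected '='"
```

Even this toy version has to spell out every failure mode itself, and it only gets worse as the grammar grows.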
The irony is that the correctness of a parser written with your approach is relatively easy to prove. In his case, it is almost impossible.
There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult.
C. A. R. Hoare
Your approach is the simpler one. All it requires is for him to broaden his horizons a bit. The result of his approach will always be convoluted, no matter how broad anyone's horizons.
To be honest, it sounds to me like the guy is just an ignorant fool suffering from the Blub paradox, arrogant enough to assume you are wrong and to yell at you when he doesn't understand you.
In the end, however, the question is: who is going to have to maintain it? If it's you, then it's your call, no matter what anybody says. If it's going to be him, then there are only two possibilities: find a way to make him understand the parser library, or write an imperative parser for him. I suggest you generate it from your parser structure :D
Your question (as your final paragraph hints) is not really about the lexer; it is about the correct design of the interface between the lexer and the parser. As you might imagine, there are many books about the design of lexers and parsers. I happen to like the parsing book by Dick Grune, but it may not be a good introductory book. I intensely dislike the C-based book by Appel, because the code is not usefully extensible into your own compiler (because of the memory-management issues inherent in the decision to pretend C is like ML). My own introduction was the book by P. J. Brown, but it's not a good general introduction (though quite good for interpreters specifically). But back to your question.
The answer is: do as much as you can in the lexer without needing forward- or backward-looking constraints.
This means that (depending of course on the details of the language) you should recognise a string as a " character followed by a sequence of not-" and then another " character. Return that to the parser as a single unit (there is a sketch after the list below). There are several reasons for this, but the important ones are:
- This reduces the amount of state the parser needs to maintain, limiting its memory consumption.
- This allows the lexer implementation to concentrate on recognising the fundamental building blocks and frees the parser up to describe how the individual syntactic elements are used to build a program.
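As a sketch of the first point, here is a toy lexer that consumes a whole string literal and hands the parser one token. The token names are mine, not from any particular tool:

```haskell
-- The lexer consumes  " not-"* "  in one go and emits a single STRING
-- token, so the parser never has to track "am I inside a string?" state.
data Token = TString String | TIdent String
  deriving Show

lexString :: String -> Maybe (Token, String)
lexString ('"' : rest) =
  case break (== '"') rest of
    (body, '"' : rest') -> Just (TString body, rest')  -- closing quote found
    _                   -> Nothing                     -- unterminated string
lexString _ = Nothing
```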
Very often parsers can take immediate action on receiving a token from the lexer. For example, as soon as IDENTIFIER is received, the parser can perform a symbol-table lookup to find out if the symbol is already known. If your parser also parses string constants as QUOTE (IDENTIFIER SPACES)* QUOTE, you will perform a lot of irrelevant symbol-table lookups, or you will end up hoisting the lookups higher up the parser's tree of syntax elements, because you can only do them at the point where you're sure you are not looking at a string.
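A sketch of such an immediate action, using a hypothetical lookupOrInsert helper:

```haskell
import qualified Data.Map as Map

-- Run the moment the lexer delivers an IDENTIFIER: look the name up at
-- once, allocating a fresh slot if it is new. If string contents were
-- delivered as IDENTIFIER tokens too, every word inside every string
-- literal would trigger one of these lookups for nothing.
lookupOrInsert :: String -> Map.Map String Int -> (Int, Map.Map String Int)
lookupOrInsert name table =
  case Map.lookup name table of
    Just slot -> (slot, table)              -- already known
    Nothing   -> let slot = Map.size table  -- allocate a fresh slot
                 in (slot, Map.insert name slot table)
```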
To restate this differently: the lexer should be concerned with the spelling of things, and the parser with the structure of things.
You might notice that my description of what a string looks like seems a lot like a regular expression. This is no coincidence. Lexical analysers are frequently implemented in little languages (in the sense of Jon Bentley's excellent Programming Pearls book) which use regular expressions. I'm just used to thinking in terms of regular expressions when recognising text.
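For example, the string rule above in that notation:

```
"[^"]*"
```

that is, a quote, any number of non-quote characters, then a closing quote.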
Regarding your question about whitespace: recognise it in the lexer. If your language is intended to be fairly free-format, don't return WHITESPACE tokens to the parser, because it would only have to throw them away; your parser's production rules would essentially be spammed with noise - things to recognise just to throw them away.
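Sketched, with a hypothetical lexOne standing in for the real single-token lexer:

```haskell
import Data.Char (isSpace)

-- Free-format case: every call first swallows whitespace, so no
-- WHITESPACE token ever reaches the parser and no grammar rule has to
-- mention it.
nextToken :: (String -> Maybe (tok, String)) -> String -> Maybe (tok, String)
nextToken lexOne input = lexOne (dropWhile isSpace input)
```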
As for how you should handle whitespace when it is syntactically significant, I'm not sure I can make a judgment for you that will really work well without knowing more about your language. My snap judgment is to avoid cases where whitespace is sometimes important and sometimes not, and to use some kind of delimiter (like quotes). But if you can't design the language however you prefer, this option may not be available to you.
There are other ways to design language-parsing systems. There are certainly compiler-construction systems that allow you to specify a combined lexer and parser (I think the Java version of ANTLR does this), but I have never used one.
Lastly, a historical note. Decades ago, it was important for the lexer to do as much as possible before handing over to the parser, because the two programs would not fit in memory at the same time. Doing more in the lexer left more memory available to make the parser smart. I used the Whitesmiths C Compiler for a number of years and, if I understand correctly, it operated in only 64KB of RAM (it was a small-model MS-DOS program), and even so it translated a variant of C that was very, very close to ANSI C.
The Independent JPEG Group - http://www.ijg.org/ - provides the reference code for JPEG encoding and decoding, packaged as libjpeg, which is also the most portable and the default JPEG library on most platforms.
You should try to dig into libjpeg for this.
Specifically, there is an ambiguity in the native representation of JPEG's color space, which IJG's documentation describes. Different applications actually represent it in different ways, which is a real problem. See this thread: http://photo.net/digital-darkroom-forum/00TVcQ

However, this doesn't mean that things are completely broken. The information related to the color space is usually carried in the file format, namely JFIF or EXIF.
JFIF
For JFIF, there is a notion of a default color space: the spec defines the components as YCbCr per CCIR 601, but using the full 0-255 range. The conversion of YCbCr to RGB and vice versa is also given in the same document, and you can check the libjpeg (IJG) sources for actual code on this.
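For reference, the conversion as the JFIF document gives it (full-range CCIR 601 coefficients), transcribed into a small sketch; all channels are on the 0..255 scale:

```haskell
-- YCbCr -> RGB, per the JFIF spec.
ycbcrToRgb :: Double -> Double -> Double -> (Double, Double, Double)
ycbcrToRgb y cb cr =
  ( y + 1.402   * (cr - 128)                         -- R
  , y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128)  -- G
  , y + 1.772   * (cb - 128) )                       -- B

-- RGB -> YCbCr, per the same document.
rgbToYcbcr :: Double -> Double -> Double -> (Double, Double, Double)
rgbToYcbcr r g b =
  (  0.299  * r + 0.587  * g + 0.114  * b        -- Y
  , -0.1687 * r - 0.3313 * g + 0.5    * b + 128  -- Cb
  ,  0.5    * r - 0.4187 * g - 0.0813 * b + 128  -- Cr
  )
```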
EXIF

EXIF allows you to specify a custom color table and gamma levels through an alternative set of tags, including TransferFunction, WhitePoint, PrimaryChromaticities, ReferenceBlackWhite and YCbCrCoefficients. See Appendix E, "Color Space Guidelines", in the EXIF 2.1 specification: http://www.exif.org/Exif2-1.PDF