Option 1
While it is tempting to have a set of lightweight APIs that send the normalized data, there are some potential pitfalls with this approach.
First, the lightness of these APIs might end up being paid for with tight coupling between your normalized data model and your APIs. If you want to change your data model or APIs, you will have to change the other or engineer a means of preserving the old approach on the other side of the change. This complicates maintenance.
Second, you are correct to note that there will be lots of HTTP calls from the client to your APIs. Put yourself in the role of the implementor of the client and ask whether you really want to make all those API calls. With proper error checking and exception handling, it gets to be a lot of work.
Option 2
On the other hand, sometimes a "one call does it all" approach is the best one. The "send everything" approach of option 2 can be a lot easier for the client, and because the payload is denormalized, it provides a de facto interface between client and server that should be straightforward to maintain as the two develop and evolve separately. The price is speed and size. As you note, it could take a while to assemble all that data and then transmit it, especially if only a small portion of it is needed. But don't forget, it will be a lot faster for the server to fetch all the data over the LAN in your data center than for the client to do so over the open internet.
Recommendation
I lean toward option 2, though I propose adding a dose of option 1 to strike a balance between simplicity and performance. If some data takes much longer to assemble than the rest, see if you can leave it out of the main API and put it in a separate one. Remember, the goal is to make it easy on both the server and the client.
Caching Caveats
Since the data changes every 2-3 days, be careful about caching. Because the data is so big and expensive to assemble, caching is tempting and probably a good idea. However, because it changes regularly, be sure to refresh the cache and take steps to force the client to fetch new data when it is available. Techniques like cache-busting API parameters, observation times, expiration times, and the like could be of service here.
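As a rough illustration of combining an expiration time with a cache-busting parameter, here is a minimal sketch. Everything in it is an assumption made for illustration: the `PayloadCache` class, the `build` callback that returns a version string plus the assembled data, and the `?v=` query parameter are hypothetical names, not part of any particular framework.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class CachedPayload:
    version: str       # changes whenever the underlying data changes
    expires_at: float  # absolute time after which the entry is rebuilt
    body: Any


class PayloadCache:
    """Server-side cache for one expensive, denormalized payload (hypothetical)."""

    def __init__(self, build: Callable[[], Tuple[str, Any]], ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._build = build          # assembles (version, body); assumed to be slow
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entry: Optional[CachedPayload] = None

    def get(self) -> CachedPayload:
        """Return the cached payload, rebuilding it if missing or expired."""
        now = self._clock()
        if self._entry is None or now >= self._entry.expires_at:
            version, body = self._build()
            self._entry = CachedPayload(version, now + self._ttl, body)
        return self._entry

    def invalidate(self) -> None:
        """Drop the entry, e.g. when the data is known to have changed."""
        self._entry = None

    def url(self, base: str) -> str:
        """Cache-busting URL: a new version yields a new URL for clients."""
        return f"{base}?v={self.get().version}"
```

With data that changes every 2-3 days, the TTL would presumably be set well below that interval, and `invalidate()` called whenever an update is known to have landed.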
I don't think you should have syntax highlighting information directly in your text buffer. Instead, I would add additional data structures for the display code.
Here's why:
Once you're providing functionality like selections, you'll probably need an anchor concept (a stable pointer to a specific location in the buffer that stays valid even when characters are inserted or deleted before that location; see below for an implementation idea). If you're dealing with longer texts, you might also need these anchors to provide a fast index into your buffer for line beginnings (since you're using a gap buffer, you need some way to translate line numbers into buffer positions).
What I'm getting at is that you need several supporting data structures besides the actual text buffer in order to provide fast standard editor commands and display the currently visible part of the buffer. Despite your decision to use a gap buffer, editors are line-based and you'll need to support that somehow. So why not write a line-based syntax highlighter that takes the text buffer, an anchor into it (which should ideally be the beginning of a line) and a highlighting state, and outputs a list of ("text fragment", "style information") pairs up to the end of the line, which your display code then uses to actually display this line? If it's not fast enough to do on the fly, you can create a cache of these lists indexed by line number. You could probably do it for the whole file at once, too, but I suspect that performance will suffer if you do that every time you move around in your file.
You'll need the highlighter state as an input because of multi-line tokens such as long comments or strings. Since an editor usually displays a file from the beginning when it is first opened, you can start with an "all clear" state, call the highlighter code to spit out the first line's highlighting information, and keep the highlighter state at the end of the line (e.g., "currently in a multi-line comment" or "all clear") as the state of the highlighter at the beginning of the next line.
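A minimal sketch of such a line-based highlighter might look like the following. It assumes a toy language in which the only multi-line token is a C-style `/* ... */` comment; the state names, the style names and the `highlight_line` function are hypothetical, and the line is passed in as a plain string rather than as a buffer plus anchor.

```python
from typing import List, Tuple

CLEAR = "all-clear"        # not inside any multi-line token
IN_COMMENT = "in-comment"  # inside an unterminated /* ... */ comment

Fragment = Tuple[str, str]  # (text fragment, style name)


def highlight_line(line: str, state: str) -> Tuple[List[Fragment], str]:
    """Highlight one line, given the state at its start.

    Returns the (fragment, style) pairs for the line and the state at the
    end of the line, which is the start state of the next line.
    """
    fragments: List[Fragment] = []
    pos = 0
    while pos < len(line):
        if state == IN_COMMENT:
            end = line.find("*/", pos)
            if end == -1:
                # Comment runs past the end of this line.
                fragments.append((line[pos:], "comment"))
                return fragments, IN_COMMENT
            fragments.append((line[pos:end + 2], "comment"))
            pos = end + 2
            state = CLEAR
        else:
            start = line.find("/*", pos)
            if start == -1:
                fragments.append((line[pos:], "plain"))
                return fragments, CLEAR
            if start > pos:
                fragments.append((line[pos:start], "plain"))
            # Emit the opener on its own so "/*/" is not read as open+close.
            fragments.append(("/*", "comment"))
            pos = start + 2
            state = IN_COMMENT
    return fragments, state
```

Feeding the end state of one call into the next call is all that is needed to carry a multi-line comment across lines.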
So, basically, I'd suggest implementing anchors and then keeping around an index to quickly translate line numbers into text buffer anchors. Then I'd augment this index to provide not only anchors into the text buffer but also a cache of the syntax highlighter output for that line and the highlighter state at the beginning of that line. So if the user is editing a line, you don't have to restart the highlighter at the very top of the buffer; you simply restart highlighting at the beginning of the line. If you're clever, you can also avoid having to highlight the whole rest of the buffer: stop highlighting as soon as you reach either the bottom of the visible area or a line whose cached start state matches the state your highlighter is actually in. If you stop at the bottom of the visible area and the highlighter state doesn't match the cached one for the next line, flag the following highlighter states as "possibly outdated".
I think you can get away with only highlighting very small sections of your buffer this way, usually just a single line, except for editor commands like "jump to the end of the file", in which case you'll need to syntax-highlight the whole buffer in order to determine the state the highlighter should be in at the end of the file.
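The caching scheme could be sketched roughly as follows. This is a simplified illustration under several assumptions: lines are held in a plain Python list instead of being reached through anchors, the highlighter is any function taking `(line, start_state)` and returning `(fragments, end_state)`, and "possibly outdated" is modelled conservatively with a single watermark (`valid_upto`) below which the cache is trusted. The class and method names are hypothetical.

```python
from typing import Callable, List, Tuple

Fragment = Tuple[str, str]  # (text fragment, style)
Highlighter = Callable[[str, str], Tuple[List[Fragment], str]]


class HighlightCache:
    """Per-line cache of start states and highlighter output."""

    def __init__(self, lines: List[str], highlight: Highlighter,
                 initial_state: str) -> None:
        self.lines = lines
        self.highlight = highlight
        self.start_state: List[str] = [initial_state] * len(lines)
        self.fragments: List[List[Fragment]] = [[] for _ in lines]
        self.valid_upto = 0  # lines [0, valid_upto) have trustworthy entries
        self.work = 0        # highlighter calls so far, for demonstration only

    def _run(self, i: int) -> str:
        """Highlight line i from its cached start state; return its end state."""
        self.fragments[i], end = self.highlight(self.lines[i], self.start_state[i])
        self.work += 1
        return end

    def ensure(self, last_line: int) -> None:
        """Extend the trusted region until it covers last_line."""
        while self.valid_upto <= last_line and self.valid_upto < len(self.lines):
            i = self.valid_upto
            end = self._run(i)
            if i + 1 < len(self.lines):
                self.start_state[i + 1] = end
            self.valid_upto = i + 1

    def edit(self, line_no: int, new_text: str, last_visible: int) -> None:
        """Replace one line and re-highlight as little as possible."""
        self.lines[line_no] = new_text
        i = line_no
        while i < self.valid_upto:
            end = self._run(i)
            i += 1
            if i == len(self.lines):
                break
            if i < self.valid_upto and self.start_state[i] == end:
                break  # the next line starts in the same state: rest is valid
            self.start_state[i] = end
            if i > last_visible:
                # Below the visible area: flag the rest as possibly outdated.
                self.valid_upto = min(self.valid_upto, i)
                break
        self.ensure(last_visible)
```

In this sketch a command like "jump to the end of the file" would simply call `ensure(len(lines) - 1)`, which highlights every line that is not yet trusted.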
Edit: How to implement anchors
To implement anchors, you can use events and listeners. Whenever the buffer gets a command to insert or delete text, it notifies all its listeners that these events are about to happen. Anchors subscribe to these events as listeners and update themselves based on the kind of event. For example, if three characters are inserted before the anchor position, you need to add 3 to the anchor position; if characters are inserted after the anchor position, no action is needed, etc. So, basically, an anchor is an object that keeps track of a buffer offset and subscribes to change events from the gap buffer.
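Here is a minimal sketch of that listener arrangement. The `TextBuffer` class is only a stand-in that stores text in a plain string, not a real gap buffer, and all names are hypothetical. Two policy choices are assumptions of this sketch: an insertion exactly at the anchor offset moves the anchor (so it stays attached to the same character), and an anchor inside a deleted range collapses to the start of that range.

```python
from typing import Callable, List

Listener = Callable[[str, int, int], None]  # (kind, position, length)


class TextBuffer:
    """Stand-in for the gap buffer: same events, trivial storage."""

    def __init__(self, text: str = "") -> None:
        self.text = text
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def insert(self, pos: int, s: str) -> None:
        # Notify listeners before the change is applied.
        for notify in self._listeners:
            notify("insert", pos, len(s))
        self.text = self.text[:pos] + s + self.text[pos:]

    def delete(self, pos: int, length: int) -> None:
        for notify in self._listeners:
            notify("delete", pos, length)
        self.text = self.text[:pos] + self.text[pos + length:]


class Anchor:
    """Tracks a buffer offset across insertions and deletions."""

    def __init__(self, buffer: TextBuffer, offset: int) -> None:
        self.offset = offset
        buffer.subscribe(self._on_change)

    def _on_change(self, kind: str, pos: int, length: int) -> None:
        if kind == "insert":
            if pos <= self.offset:           # inserted at or before the anchor
                self.offset += length
        elif kind == "delete":
            if pos + length <= self.offset:  # deleted entirely before the anchor
                self.offset -= length
            elif pos < self.offset:          # anchor was inside the deleted range
                self.offset = pos
```

A line index would then be a list of such anchors, one per line beginning. A real implementation would also need a way to unsubscribe anchors that are no longer used.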
Best Answer
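An inverted index seems to be what you seek. The basic idea is to store, for each word, the set of files that contain it, so that a search looks up the words instead of scanning every file.
For example, here are the three database tables I would create: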
To find all filenames that contain the words "help" and "me":
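As a hedged sketch of what such a schema and query could look like, the following uses an in-memory SQLite database through Python's standard `sqlite3` module. The table names (`files`, `words`, `file_words`), the column names, the tokenizer and the helper functions are all assumptions made for illustration.

```python
import re
import sqlite3
from typing import Dict, List

# Hypothetical three-table layout: files, words, and a link table.
SCHEMA = """
CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE NOT NULL);
CREATE TABLE file_words (
    file_id INTEGER NOT NULL REFERENCES files(id),
    word_id INTEGER NOT NULL REFERENCES words(id),
    PRIMARY KEY (file_id, word_id)
);
"""


def build_index(docs: Dict[str, str]) -> sqlite3.Connection:
    """Index a mapping of filename -> text into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for name, text in docs.items():
        file_id = conn.execute(
            "INSERT INTO files (name) VALUES (?)", (name,)).lastrowid
        # Naive tokenizer: lowercase runs of letters, digits and apostrophes.
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
            word_id = conn.execute(
                "SELECT id FROM words WHERE word = ?", (word,)).fetchone()[0]
            conn.execute(
                "INSERT INTO file_words (file_id, word_id) VALUES (?, ?)",
                (file_id, word_id))
    conn.commit()
    return conn


def files_containing_all(conn: sqlite3.Connection, wanted: List[str]) -> List[str]:
    """Return names of files that contain every word in `wanted`."""
    terms = sorted({w.lower() for w in wanted})
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT f.name
        FROM files f
        JOIN file_words fw ON fw.file_id = f.id
        JOIN words w ON w.id = fw.word_id
        WHERE w.word IN ({placeholders})
        GROUP BY f.id, f.name
        HAVING COUNT(DISTINCT w.id) = ?
        ORDER BY f.name
    """
    return [row[0] for row in conn.execute(sql, (*terms, len(terms)))]
```

The `HAVING COUNT(DISTINCT w.id) = ?` clause is what turns "any of these words" into "all of these words": a file qualifies only if it matches as many distinct words as were asked for.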
That help?