I'm considering using parts of a GNU Affero General Public License v3 (GNU AGPL v3) licensed library in a commercial context, but we obviously can't afford to release our whole code-base.
We would only need the library for a crawler-like step in the backend. If we use the library in the crawler to generate data and push it into a database, and have our web-app hosted by another program/process entirely that reads from the database, are we legally required to release any of our source code? The web-facing ("distributed") code would not call the AGPL-v3 licensed code, only use its generated output.
Any insight is greatly appreciated!
Best Answer
In this case, the difficult legal test is deciding whether your data-consuming application can legally be considered "combined with" your AGPL data-producing application. The FSF proposes that two programs must communicate "at arms length" to be considered separate works. (The tricky bit about this rule is that no one is entirely sure how long programs' arms are.)
First, we can rule out any AGPL requirements being transferred via data output, by consulting the GPL FAQ. There is a question that directly addresses this (from the other side -- trying and failing to make GPL rules apply to generated data):
Thus, as long as the AGPL crawler doesn't include any GPL-licensed data in its output (and I see no reason whatsoever why a Web crawler would do so), then the generated data itself carries no burden of AGPL responsibility.
Still, the data-consuming code might be legally considered part of the same program as the AGPL data-producing code, in which case the AGPL would apply to the entire work. Without seeing your architecture (and possibly without the final ruling of a judge), it's not possible for me to say whether the two components can be called one work, but the information you've presented strongly suggests that they are separate works, legally speaking. Again, from the FAQ:
It seems that writing to and reading from a database falls in the same broad category as "pipes, sockets and command-line arguments." Your data-consuming code could run without the data-producing code running on the same computer at all: you could have one system generate a database, and then pass the database to another system that consumes it. From this point of view, it seems clear to me (though not a surefire legal certainty) that the programs are separate works, and a user who interacts with your data-consuming Web process is not also interacting with your data-producing AGPL code.
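To make the separation concrete, here is a minimal sketch of the architecture described above, assuming SQLite as the database. The names `produce`, `consume`, the `pages` table, and the example URLs are all hypothetical, and the crawler body is a stand-in rather than a call into any real AGPL library. In practice the two functions would live in separate programs, possibly on separate machines; they appear in one file here only so the sketch is runnable. It illustrates the data flow, not a legal conclusion.

```python
import os
import sqlite3
import tempfile


def produce(db_path):
    """Stand-in for the crawler process.

    Assumption: this is the only program that would import the
    AGPL-licensed library. Here it just writes hard-coded rows.
    """
    rows = [
        ("https://example.com/a", "Page A"),
        ("https://example.com/b", "Page B"),
    ]
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
        )
        conn.executemany("INSERT OR REPLACE INTO pages VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()


def consume(db_path):
    """Stand-in for the web-facing process.

    It knows only the database schema; it never imports or calls
    anything from the crawler.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT url, title FROM pages ORDER BY url"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "crawl.db")
        produce(path)         # could run on one machine...
        print(consume(path))  # ...and this on another, given the file
```

The only thing the two sides share is the database file and its schema, which is the property the argument above relies on: the consumer still works if the producer is not installed on its machine at all.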