Is using data generated by AGPL-v3 licensed code considered distribution?

agpl commercial licensing

I'm considering using parts of a GNU Affero General Public License v3 (GNU AGPL v3) licensed library in a commercial context, but we obviously can't afford to release our whole code-base.

We would only need the library for a crawler-like step in the backend. If we use the library in the crawler to generate data and push it into a database, and have our web-app hosted by another program/process entirely that reads from the database, are we legally required to release any of our source code? The web-facing ("distributed") code would not call the AGPL-v3 licensed code, only use its generated output.

Any insight is greatly appreciated!

Best Answer

In this case, the difficult legal test is deciding whether your data-consuming application can legally be considered "combined with" your AGPL data-producing application. The FSF proposes that two programs must communicate "at arms length" to be considered separate works. (The tricky bit about this rule is that no one is entirely sure how long programs' arms are.)

First, we can rule out any AGPL requirements being transferred via data output, by consulting the GPL FAQ. There is a question that directly addresses this (from the other side -- trying and failing to make GPL rules apply to generated data):

Is there some way that I can GPL the output people get from use of my program? For example, if my program is used to develop hardware designs, can I require that these designs must be free?

In general this is legally impossible; copyright law does not give you any say in the use of the output people make from their data using your program. If the user uses your program to enter or convert his own data, the copyright on the output belongs to him, not you. More generally, when a program translates its input into some other form, the copyright status of the output inherits that of the input it was generated from.

So the only way you have a say in the use of the output is if substantial parts of the output are copied (more or less) from text in your program...

Thus, as long as the AGPL crawler doesn't include any of its own AGPL-licensed code or data in its output (and I see no reason whatsoever why a Web crawler would do so), the generated data itself carries no burden of AGPL responsibility.

Still, the data-consuming code might be legally considered part of the same program as the AGPL data-producing code, in which case the AGPL would apply to the entire work. Without seeing your architecture (and possibly without the final ruling of a judge), it's not possible for me to say whether the two components can be called one work, but the information you've presented strongly suggests that the two components are separate works, legally speaking. Again, from the FAQ:

Where's the line between two separate programs, and one program with two parts? This is a legal question, which ultimately judges will decide. We believe that a proper criterion depends both on the mechanism of communication (exec, pipes, rpc, function calls within a shared address space, etc.) and the semantics of the communication (what kinds of information are interchanged).

If the modules are included in the same executable file, they are definitely combined in one program. If modules are designed to run linked together in a shared address space, that almost surely means combining them into one program.

By contrast, pipes, sockets and command-line arguments are communication mechanisms normally used between two separate programs. So when they are used for communication, the modules normally are separate programs. But if the semantics of the communication are intimate enough, exchanging complex internal data structures, that too could be a basis to consider the two parts as combined into a larger program.

It seems that writing to and reading from a database falls in the same broad category as "pipes, sockets and command-line arguments." Your data-consuming code could run without the data-producing code running on the same computer at all: you could have one system generate a database, and then pass the database to another system that consumes it. From this point of view, it seems clear to me (though not a surefire legal certainty) that the programs are separate works, and a user who interacts with your data-consuming Web process is not also interacting with your data-producing AGPL code.
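
To make that boundary concrete, here is a minimal Python sketch of two components that share nothing but a database file. Everything in it is an assumption for illustration: the function names, the `pages` table, and the use of SQLite are hypothetical and not taken from the asker's system, and the AGPL library call is only marked by a comment. In a real deployment the two functions would live in separate programs, possibly on separate machines. The sketch shows the shape of the architecture only; it does not settle the legal question.

```python
import sqlite3
from contextlib import closing


def crawler_step(db_path, pages):
    """Producer side: the only place the AGPL library would be imported."""
    # Hypothetical stand-in: an AGPL-licensed parser would turn raw pages
    # into plain (url, title) rows here. Only those plain rows leave this step.
    rows = [(url, title.strip()) for url, title in pages]
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
            )
            conn.executemany("INSERT OR REPLACE INTO pages VALUES (?, ?)", rows)


def web_app_lookup(db_path, url):
    """Consumer side: reads simple rows and imports nothing from the crawler."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT title FROM pages WHERE url = ?", (url,)
        ).fetchone()
    return row[0] if row else None
```

The data crossing the boundary is deliberately simple (text columns, not serialized internal objects of the library), which matches the FAQ's point that the semantics of the communication matter as well as the mechanism.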
