Including GPLv2-licensed data in an MIT-licensed project

licensing

I'd like to include some data from a GPLv2 licensed project in my MIT licensed project.

More specifically, I want to use the data from the other project as the training data for my machine learning algorithm and I'd also like to include the trained model in my project.

I don't want to include the whole project's source code, just those data files. I will not modify them. I also want to have the trained model in my project, which I think is a derivative work?

Can I create a folder for those data files, add a copy of the GPLv2 license, and make it clear that my project is MIT licensed apart from that folder, which contains GPLv2-licensed files?

Does the trained model also have to be released under GPLv2? If so, can I also keep it in that folder?

Best Answer

As I understand it, the GPL files serve as input to your software. In that case, your software cannot be considered a derivative of the GPL files and is therefore not affected by the copyleft nature of the GPL.

The output of your program (when taking those GPL files as input) is derived from the GPL files and is thus also bound by the GPL.

This assumes that the model is only loaded into your algorithm at runtime and can therefore be treated as data (for example, you can supply a different model without needing access to the source code). If that assumption is incorrect and the file containing the model is an integral part of your algorithm, then the GPL requires you to make the entire project available under that license.
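To illustrate what "loaded at runtime and treated as data" could look like, here is a minimal sketch. It assumes the model is stored as a JSON file with `weights` and `bias` entries; that file format and the function names `load_model` and `predict` are hypothetical, chosen only for illustration, and not taken from the question. The point is that the code does not embed the model: any file with the same structure can be swapped in without touching the source.

```python
import json
from pathlib import Path


def load_model(path):
    """Read model parameters from a file chosen at runtime.

    Hypothetical format: a JSON object with "weights" (list of numbers)
    and "bias" (number). Nothing from the model is compiled into the code.
    """
    with Path(path).open(encoding="utf-8") as f:
        params = json.load(f)
    return params["weights"], params["bias"]


def predict(model, features):
    """Apply a simple linear model: dot(weights, features) + bias."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, features)) + bias
```

With this arrangement the MIT-licensed code and the GPL-licensed model file stay separate artifacts, and the user decides which model file to pass to `load_model`.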

As for distribution, the clearest arrangement is to split it into two parts: one with the MIT-licensed project and a second with the GPL-licensed data and the derived model. You can then distribute each part under its own license, without confusion about which parts are under which license.