C Header File Creation – How to Generate from a Dynamic Library

headers, libraries

Suppose I have a compiled dynamic library: .dll, .lib, .so etc. Is it (theoretically) possible to automatically create a C header file for such a library? Is there an existing tool that does that?

Intuitively, it looks to me like it should be possible. After all, the linker is able to find the necessary symbols inside the dynamic library and resolve those symbols at runtime. But still, some information may be missing. If so, what exactly? Argument types? The return type? I know that when a C++ library is compiled without extern "C" linkage, information about the types is embedded into the symbol names. Would this kind of library be "reverse-engineerable"?

Update. Thanks for all the responses. There seems to be a consensus that it is generally NOT possible, unless one is willing to try really hard (I guess by examining the assembly and seeing how many parameters are popped off the stack) OR the library is compiled in debug mode.

The purpose of this question is neither to obfuscate my own library nor to decompile an existing one. Rather, it is a theoretical question: is such an action possible for a generic library? The reason for my curiosity is that I'm trying to understand the legal implications of having a library licensed under the GPL while its header files are licensed under the LGPL.

Best Answer

In general, it won't be possible (at least not with ELF files on Linux), because type and signature information is not kept (e.g. in ELF symbol tables). C++ compilers do use name mangling to encode some type information in their ELF symbol names, but C compilers don't. And C++ name mangling doesn't tell enough: it would tell you that the first argument of some function is a Foo* pointer, but it won't describe the fields inside class Foo.
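To make the point about mangling concrete, here is a minimal sketch of a decoder for a tiny subset of the Itanium C++ ABI mangling scheme used by GCC and Clang on Linux. The function name `demangle_simple` and its helpers are hypothetical, and the sketch assumes only plain (non-namespaced, non-template) functions whose parameters are a few builtin types, class names, or pointers to those. Notice what can be recovered (function name, parameter types) and what cannot (the return type, and anything about the layout of `Foo`).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append src to dst without overflowing a buffer of cap bytes. */
static void append(char *dst, size_t cap, const char *src) {
    size_t used = strlen(dst);
    if (used + 1 < cap)
        strncat(dst, src, cap - used - 1);
}

/* Decode a <length><identifier> pair such as "3Foo" and append the
   identifier to out. Returns characters consumed, or 0 on error. */
static size_t decode_name(const char *p, char *out, size_t cap) {
    size_t len = 0, digits = 0;
    while (isdigit((unsigned char)p[digits])) {
        len = len * 10 + (size_t)(p[digits] - '0');
        digits++;
    }
    if (digits == 0 || len == 0 || strlen(p + digits) < len)
        return 0;
    size_t used = strlen(out);
    if (used + len + 1 > cap)
        return 0;
    memcpy(out + used, p + digits, len);
    out[used + len] = '\0';
    return digits + len;
}

/* Decode one parameter type. Returns characters consumed, or 0 on error. */
static size_t decode_type(const char *p, char *out, size_t cap) {
    size_t n;
    switch (*p) {
    case 'P': /* pointer to the type that follows */
        n = decode_type(p + 1, out, cap);
        if (n == 0) return 0;
        append(out, cap, "*");
        return n + 1;
    case 'v': append(out, cap, "void");   return 1;
    case 'c': append(out, cap, "char");   return 1;
    case 'i': append(out, cap, "int");    return 1;
    case 'l': append(out, cap, "long");   return 1;
    case 'f': append(out, cap, "float");  return 1;
    case 'd': append(out, cap, "double"); return 1;
    default:  /* assume a class/struct name: only the name is encoded */
        return decode_name(p, out, cap);
    }
}

/* Hypothetical helper: turns e.g. "_Z3fooP3Fooi" into "foo(Foo*, int)".
   Returns 0 on success, -1 if the name is outside the supported subset. */
int demangle_simple(const char *mangled, char *out, size_t cap) {
    if (cap == 0 || strncmp(mangled, "_Z", 2) != 0)
        return -1;
    out[0] = '\0';
    const char *p = mangled + 2;
    size_t n = decode_name(p, out, cap);
    if (n == 0 || p[n] == '\0')
        return -1;
    p += n;
    append(out, cap, "(");
    if (strcmp(p, "v") == 0) { /* a lone 'v' encodes an empty parameter list */
        append(out, cap, ")");
        return 0;
    }
    for (int first = 1; *p != '\0'; first = 0) {
        if (!first) append(out, cap, ", ");
        n = decode_type(p, out, cap);
        if (n == 0) return -1;
        p += n;
    }
    append(out, cap, ")");
    return 0;
}
```

Calling `demangle_simple("_Z3fooP3Fooi", buf, sizeof buf)` fills `buf` with `foo(Foo*, int)`. A plain C symbol such as `foo` is rejected because there is nothing to decode, which is exactly the situation for a C library.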

For example, you can't even (reliably) know how many arguments a given function (notably a C one) is expecting, let alone their types. And some functions don't have externally visible names (e.g. static functions, but read also about the visibility function attribute on Linux). Read more about ABIs (e.g. the one used for Linux on PCs) and calling conventions.
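As an illustration of the visibility point, here is a small sketch of a library source file with three functions. The function names are made up for this example, and the `visibility` attribute is a GCC/Clang extension, so this assumes one of those compilers targeting ELF.

```c
/* Internal linkage: the name is local to this translation unit and is
   never exported from the shared library. */
static int clamp_percent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}

/* External linkage but hidden visibility (GCC/Clang extension): callable
   from other object files of the same library, yet left out of the
   dynamic symbol table. */
__attribute__((visibility("hidden")))
int scale_percent(int part, int whole) {
    if (whole <= 0) return 0;
    return clamp_percent(part * 100 / whole);
}

/* Default visibility: the only one of the three that a client of the
   library can look up. Even so, the exported symbol is just the name
   "demo_progress"; the (int, int) -> int signature is not recorded. */
int demo_progress(int done, int total) {
    return scale_percent(done, total);
}
```

Assuming this is saved as `demo.c` and built with something like `gcc -shared -fPIC demo.c -o libdemo.so`, running `nm -D --defined-only libdemo.so` should list `demo_progress` but neither of the other two functions, so a header generator working from the binary alone would not even learn that they exist.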

However, if the code has been compiled with debug information in DWARF (using -g), it could be possible, provided that information has not been removed afterwards. Read also about the strip command.

And if you have additional a priori information (for example, knowing that the given library is distributed by Debian) it probably should be possible. Some projects (perhaps FOSSology, but I could be wrong) simply guessed free software libraries by comparing their constant literal strings against a previously built database of them.

BTW, what you are looking for is more or less called a decompiler, and the process would be decompilation. Read also about obfuscation.

With a lot of effort and resources (e.g. what the NSA would be capable of), many things could be possible in practice, but they would be difficult and costly.
