Understand foreign function interface (FFI) and language binding


Mixing different programming languages has long been something I don't quite understand. According to the Wikipedia article on foreign function interfaces, a foreign function interface (FFI) can be implemented in several ways:

  1. Requiring that guest-language functions which are to be host-language callable be specified or implemented in a particular way; often using a compatibility library of some sort.
  2. Use of a tool to automatically "wrap" guest-language functions with appropriate glue code, which performs any necessary translation.
  3. Use of wrapper libraries
  4. Restricting the set of host language capabilities which can be used cross-language. For example, C++ functions called from C may not (in general) include reference parameters or throw exceptions.

My questions:

  1. What are the differences between the 1st, 2nd and 3rd ways? It seems to me that they all compile the code of the called language into a library with object files and header files, which is then called by the calling language.
  2. One source that the article links to says that implementing an FFI can be done in several ways:

    • Requiring the called functions in the target language to implement a specific protocol.
    • Implementing a wrapper library that takes a given low-level-language function and "wraps" it with code to do data conversion to/from the high-level language conventions.
    • Requiring functions declared native to use a subset of high-level functionality (which is compatible with the low-level language).

    I was wondering whether the first way in the linked source is the same as the first way in Wikipedia.

    What does the third way in this source mean? Does it correspond to the 4th way in Wikipedia?

  3. In the same source, when comparing the three ways it lists, it seems to say that the job of bridging the gap between the two languages is gradually shifted from the called language to the calling language. How should I understand that? Is this shift also true for the four ways in Wikipedia?
  4. Are language binding and FFI equivalent concepts? How are they related, and how do they differ?

    a binding from a programming language to a library or OS service is an API providing that service in the language.

  5. I was wondering which way, in the quotation from Wikipedia or in the source, each of the following examples belongs to.

Thanks for your enlightenment! Best regards!

Best Answer

Maybe a specific example will help. Let us take Python as the host language and C as the guest language. This means that Python will be calling C functions.

  1. The first option is to write the C library in a particular way. In the case of Python, the standard way is to write the C function so that it takes PyObject * parameters (the first being the module or self object) and returns a PyObject *, among other conditions. For example (adapted from the Python documentation):

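    #define PY_SSIZE_T_CLEAN   /* recommended before including Python.h */
    #include <Python.h>        /* must come before the standard headers */
    #include <stdlib.h>        /* system() */

    /* Callable from Python as spam.system(command) once it is
       registered in the module's method table. */
    static PyObject *
    spam_system(PyObject *self, PyObject *args)
    {
        const char *command;
        int sts;
    
        /* Convert the Python argument tuple into a C string ("s"). */
        if (!PyArg_ParseTuple(args, "s", &command))
            return NULL;   /* an exception has already been set */
        sts = system(command);
        /* Convert the C int back into a Python int ("i"). */
        return Py_BuildValue("i", sts);
    }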
    

    is a C function callable from Python. For this to work, the library has to be written with Python compatibility in mind.

  2. If you want to use an already existing C library, you need another option. One is to use a tool that generates code wrapping this existing library in a form suitable for consumption by the host language. Take SWIG, which can be used to connect C and C++ to many languages. Given an existing C library, you can use SWIG to generate C code that calls your existing library while conforming to Python conventions; that generated wrapper is then compiled into a Python extension module. See the example for building a Python module.

  3. Another option to use an already existing C library is to call it from a Python library that wraps the calls at run time, such as ctypes. Whereas option 2 requires compiling the generated wrapper code, no compilation step is needed here: the existing shared library is loaded and called directly.

Another thing to keep in mind is that there are many (overlapping) options for calling functions in one language from another. There are FFIs (equivalent to language bindings, as far as I know), which usually refer to calls between multiple languages in the same process (as part of the same executable, so to speak), and there are interprocess communication mechanisms (local and networked). CORBA, Web Services (SOAP or REST), COM+ and remote procedure calls in general belong to the second category and are not seen as FFI. In fact, they mostly don't prescribe any particular language on either side of the communication. I would loosely call them IPC (interprocess communication) options, though this is a simplification in the case of network-based APIs such as CORBA and SOAP.

Having a go at your list, I would venture the following opinions:

  • Common Object Request Broker Architecture: IPC, not FFI
  • Calling C in C++, by the extern "C" declaration in C++ to disable name mangling
  • Calling C in Matlab, by MATLAB Interface to Shared Libraries: Option 3 (ctypes-like)
  • Calling C in Matlab, by Creating C/C++ Language MEX-Files: Option 2 (swig-like)
  • Calling Matlab in C, by mcc compiler: Option 2 (swig-like)
  • Calling C++ in Java, by JNI, and Calling Java in C++ by JNI: Option 3 (ctypes-like)
  • Calling C/C++ in other languages, using SWIG: Option 2 (swig)
  • Calling C in Python, by ctypes: Option 3 (ctypes)
  • Cython: Option 2 (swig-like)
  • Calling R in Python, by RPy: Option 3 (ctypes-like) in part, and partly about data exchange (not FFI)

The next two are not foreign function interfaces at all, as the term is used. An FFI is about the interaction between two programming languages and should be capable of making any library (with suitable restrictions) from one language available to the other. A particular library being accessible from one language does not an FFI make.

  • Programming Language Bindings to OpenGL from various languages
  • Bindings for a C library from various languages