Open Source Distributed Computing and the Bindings Problem

I have been programming on the middleware and distributed-server side since the mid-nineties, and from the standpoint of the independent software engineer or small business, the increase in the sophistication, power and feature breadth of network frameworks in the last five years is fairly astounding. Software resources such as transaction processing monitors, real-time publish and subscribe, and graph databases have been made widely available through open source and cloud markets. This also applies to once-obscure APIs (obscure to those without big iron to play with and test on, that is) such as parallel processing. Unfortunately, this has opened up a problem domain as well: language bindings to the relevant APIs and interfaces of these highly sophisticated and complex resources.

I decided to do some thinking about this after researching two server-side applications I wanted to use for my IMAP implementation over DDS: Tika and OpenNLP. Both of these are offered by the Apache Software Foundation. I eliminated OpenNLP immediately since it seems to be dead, but Tika looked interesting because it solves a problem I am facing in processing and sourcing email attachments for searching and Bayesian operations. The problem with both was that they offer Java-only APIs.

I have been encountering this frequently lately. I develop primarily in C++ and Python and am surprised at the number of large open source implementations that offer bindings for neither. Related to this, I remembered reading about why the C++ bindings were deprecated in the open source MPI offering. The author’s post and the comments pretty much say it all. The deprecation of the C++ bindings is clearly laid out there and understandable: the maintainers did not see any productive advantage in continuing what was essentially a one-to-one mapping between C and C++ functions. And even if they had, it’s not as if they would have been stampeded by volunteers to do the dirty work.

I myself have recently come across something tangential to this while working with Apache Kafka. The official C/C++ interface library to Kafka is maintained by Magnus Edenhill. As someone who has worked with the C, C++ and Python Kafka bindings, I can say Mr. Edenhill has done an excellent job. A caveat about the library, however, is relevant to the MPI happenings. I find myself using the C bindings because: a) the C++ source seems to be one-to-one with the C, not giving you a whole lot of object-oriented goodness, and b) the quality of the C++ work is inferior to the C (the C++ objects are almost impossible to use virtually or inherit from). This is no knock against the maintainer, just the observation of someone who has actually used both.

I have recently been using Kafka on Android and have had to fall back to the C Kafka library, which works like a charm through a threadbare wrapper, because most of the Python clients (I am using pure Python on ARM, no Java) are broken or don’t work properly on Android (sigh, fodder for another post). Short skinny: Mr. Edenhill only has so much time; he does his best, and thank you, sir. We can conclude that there are good reasons for not binding to C++.
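For readers who have not written one, here is a rough sketch of what a threadbare C++ wrapper over a C client library can look like. Everything in it is hypothetical: `msgq_t`, `msgq_open`, `msgq_send` and `msgq_close` are a made-up stand-in for a C API, defined inline only so the example compiles on its own, and they are not the Kafka library's actual functions. The point is only that a wrapper can add ownership, exceptions and a virtual interface instead of mapping C calls one-to-one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

// --- Hypothetical C-style API (a stand-in, not any real library) ---
// In a real project these would come from the library's C header, which
// normally guards its declarations with extern "C" for C++ callers.
struct msgq_t {
    std::size_t bytes_sent;
};

inline msgq_t* msgq_open(const char* name) {
    if (name == nullptr || *name == '\0') return nullptr;  // C-style failure
    return new msgq_t{0};
}

inline int msgq_send(msgq_t* q, const char* data, std::size_t len) {
    if (q == nullptr || data == nullptr) return -1;  // C-style error code
    q->bytes_sent += len;
    return 0;
}

inline void msgq_close(msgq_t* q) { delete q; }

// --- Thin C++ wrapper: owns the handle, throws on error, can be subclassed ---
class MessageQueue {
public:
    explicit MessageQueue(const std::string& name)
        : handle_(msgq_open(name.c_str())) {
        if (!handle_) throw std::runtime_error("msgq_open failed for '" + name + "'");
    }

    virtual ~MessageQueue() = default;  // handle_ calls msgq_close for us

    // Virtual so derived classes can decorate or replace the behaviour.
    virtual void send(const std::string& payload) {
        if (msgq_send(handle_.get(), payload.data(), payload.size()) != 0)
            throw std::runtime_error("msgq_send failed");
    }

    std::size_t bytes_sent() const {
        assert(handle_ != nullptr);  // invariant established by the constructor
        return handle_->bytes_sent;
    }

private:
    struct Closer {
        void operator()(msgq_t* q) const { msgq_close(q); }
    };
    std::unique_ptr<msgq_t, Closer> handle_;
};

// A derived class that counts messages, to show the wrapper is inheritable.
class CountingQueue : public MessageQueue {
public:
    using MessageQueue::MessageQueue;

    void send(const std::string& payload) override {
        ++count_;
        MessageQueue::send(payload);
    }

    std::size_t count() const { return count_; }

private:
    std::size_t count_ = 0;
};
```

Used through a base reference, `CountingQueue q("events"); MessageQueue& base = q; base.send("hello");` dispatches to the derived `send`, and the C handle is released when `q` goes out of scope. Whether this much wrapper is worth maintaining on top of a C library is exactly the judgment call described above.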

A final segue on this point: the base implementation of the two libraries mentioned is C. The lack of a C++ binding is not as significant as it would be if the base were written in Java, for example. The moniker “C++” actually represents a four-headed hydra: the C runtime, language facilities and preprocessor; the C++ language facilities and object model; the Standard Template Library; and the C++ metaprogramming (template) facilities. So it could credibly be argued that a C-based implementation has already implemented a C++ binding, just incompletely. I have never encountered anything written in C that could not be made to work in a C++ application, although the amount of self-inflicted brain damage varies from hot to cold. So now the question: what does a C++ programmer do with, say, Tika, which is Java-based with no C++ binding? Unfortunately, the prognosis is not good.