For years I've wanted to have code I've written myself that can I look at when I need to write some projects from scratch. As I was working on jox, I noticed that the graph solving algorithm it needs would be an excellent program to write in different languages. In this blog post I document the different things I learned as I wrote the various implementations.
Background
I started programming at the age of 10 and this year I'm turning 35 and thus this is my 25th year of programming. As a teenager I wrote code for myself. Over the years I collected a vast codebase that I could reuse as I started something new.
This changed in 2008 as I started working on proprietary code for a company. Since 2012 I've worked as a full-time employee for various companies. Especially as I've switched from one company to another, I've often longed for the old code that I've written and that is now inaccessible to me.
Some years ago I got the idea of creating a set of code skeletons for various languages. While I had a clear need for it, the idea didn't sit too well with me. I had a clear idea that I want the skeletons to include at least build scripts and unit testing. However, the idea rubbed me in the wrong way as writing a silly program that does not solve any real world problem might be far too simple as reference.
As a side note, for many years now I've disliked how programming languages are often introduced by showing a simple hello world program. It shows some basic syntax of a language, but syntax isn't really the interesting part. Things like idioms and paradigms, metaprogramming support, error signalling and unit testing are what I care about.
As I was working on the design of the job dependency resolver algorithm in jox
, I realized it was an excellent choice for creating sample code. I need unit tests for verification the test works. The algorithm is a graph algorithm. The graph can have loops, which typically complicates things somewhat. Furthermore, the algorithm would need to be able to detect faulty input, report the error and exit gracefully.
I wanted to write the algorithm in python, because python is often useful for many things and I've written a bunch of python already. I wanted to include an implementation in javascript, because it's a popular choice of language in many places. Furthermore, I used jest
in a project and felt it was a rather nice unit testing library and I wanted to play around with it in my spare time.
I wanted to add an implementation in C++, because it has for many years been my favorite langauge. Also, as I've written many builds in CMake, I wanted to try out using Autotools. You can find my blog post detailing what I learned about Autoconf here: My Autoconf Primer.
Furthermore, Golang is currently a somewhat popular choice. Moreover, the list of successful projects written in Golang is impressive: Docker, Kubernetes and Terraform, just to name a few that I've used extensively myself, which are open-source.
Last but not least, I wanted to write an implementation in Scheme, because I've lately become interested in declarative programming and lambda calculus. Thus, I've been practicing Scheme.
To reiterate, the languages I wrote the algorithm in are:
- python
- javascript
- C++
- Golang
- Scheme
As I wanted to test all the languages with the same tests, I created a unified test vector set using JSON. Now, if I want to modify test data, I need to only update JSON files and tests for all languages are updated synchronously.
Nowadays GitHub provides native test pipelines on their site called GitHub Action workflows. As I've written GitHub Actions previously, I decided to use them here as well. Therefore, all compilable code is compiled in GitHub workflows and all tests are run in GitHub workflows. Obviously, you can run the tests locally, too.
The repository for this project can be found here: https://github.com/keelefi/jox-algorithm
Note: At the time of writing this post, the Scheme implementation is still work-in-progress. Also, I might later add more languages. However, more importantly, I plan to update the code samples if I learn I'm using bad or outdated patterns etc.
In this section I presented the goals of the exercise. Furthermore, I described the choice of programming languages and skimmed through some related info. In the following sections I describe the various things I learned while doing this exercise, one per section, respectively.
JSON data should be formatted as objects
As described in the background section, I used JSON files to store test data. I had input test vectors under "input"
and output test vectors under "output"
, respectively. However, under both of them, I put more keys, creating a hierarchy I expected would mimic the code I want to have. In other words, if I had input vectors "jobA"
and "jobB"
, the JSON file would look something like:
{
"input": {
"jobA": { ... },
"jobB": { ... }
},
...
}
This is exactly how I ended up writing the code. However, there are JSON parsing libraries where the library automatically creates objects for you depending on the JSON data. These libraries expect lists of objects, instead of keys to objects as I used. Namely, I should've formatted the example JSON data as follows:
{
"input": [
{ "name": "jobA", ... },
{ "name": "jobB", ... }
],
...
}
In this scenario I would've been able to fully use the automatically generated objects the various libraries provide.
Build setups are hard in every language
Already as a teenager I learned that setting up the build and dependencies in a new project can be cumbersome. At the time, I was only interested in creating builds on my own machine. As a I grew older and more professional, I got more interested in build reproducibility and having the build setup work on any machine. The increase in complexity as you want to have reproducible builds cannot be overstated. It is hard, and it is hard in every language.
I started my exercise writing the python build. I've used pip
, virtualenv
and pyenv
and I was expecting this would be easy. I spent lots of time doing research online and ultimately I came to the conclusion it seems very hard to create a fully isolated build environment in python. Ultimately I gave up and just wrote a README.md
containing instructions to install a few dependencies using pipenv
. As this exercise is rather simple, this build is unlikely to fail, but then again, I failed to create solid reference material here.
With javascript I used npm
. npm
is very easy to setup. You use npm init
to create a package.json
where you populate dependencies and scripts for the build. In my case the only dependency was jest
, which I use for unit testing. However, even though npm
was very simple to use in this case, there are severe problems w.r.t. reproducibility and security when using npm
. Packages can disappear and malicious packages can be uploaded to the package repository. Hence, like the solution I used for building python, the solution for javascript is not optimal, either.
With C++ I used Autoconf. As mention in the background section, I wrote a separate blog post on this experience. Autoconf was confusing to learn and I cannot recommend it to any proprietary inhouse projects where the build environment and target system are well-known. However, for open-source software where the integration is done by the user (or software distributor, such as a linux distribution like Debian or Fedora), Autoconf is very powerful.
Golang has it's own compiler which also ships the unit test executor. Golang was the easiest build setup in my scenario. I wrote a minimal go.mod
containing one external dependency. The go.mod
also contains the compiler version that is needed for compilation. Furthermore, running go
will create a go.sum
that, according to best practices, is uploaded in the git repository. This practice creates some extra commits in the repository as the dependencies are upgraded to newer versions. For this to work efficiently, you want good test coverage so that you can just bump versions as they're released. While I acknowledge this is opinionated, I think it's a good thing. Then again, I don't think this works nearly as well in FOSS scenarios, but I think the authors of Golang never had FOSS in mind. In fact, as far as I know, Golang compiles all code into a single binary and uses no dynamically linkable libraries.
Lastly I had Scheme. There's a multitude of implementations of Scheme interpreters. I use GNU Guile myself. I worry that my code would not work as well with other Scheme interpreters. Furthermore, Scheme is extended with SRFI's - Scheme Requests For Implementation. The SRFI's are numbered proposals for the various Scheme implementations to implement. In practice, your interpreter might ship with the SRFI you are looking for. If you're looking for a less common SRFI, you may have to find one from a separate source. But not all dependencies are SRFI's. In my case I wanted to use guile-json
and this alone likely breaks the code on all other Scheme interpreters.
The Scheme SRFI system is actually very nice on a theoretical level. Anyone can create an SRFI, there's a clear way to improve the SRFI and after it's finalized, all the various interpreters can implement the SRFI as they see fit. This is extremely decentralized which is great. However, I see problems in the distribution of the SRFI implementations. If the interpreter doesn't ship the SRFI you are looking for, where do you get the SRFI? I can get it in many different ways, but there is no standardized way and that can be a headache for build reproducibility.
GitHub Action workflows are good and bad
I've worked with many CI systems over the years and I have to admit that GitHub Action workflows are by far the simplest and fastest to setup a simple build and test pipeline in. Of course, if you don't know the syntax like I do, it can take some time to setup a pipeline. But for me it took very minimal time to setup building and testing for all five languages.
The builds are not that straightforward, either. In the C++ build I use the distribution package manager extensively to install packages. In the Scheme build I clone the guile-json
git repository. In all builds I use language specific tools for compiling (where applicable) and running the tests.
However, while it's easy to setup the steps to build and run tests, it's dramatically less simple to visualize the results. I would like to have a status matrix showing for each language the state of build and tests, respectively. In other words, for example, I would like to see at a glance if the C++ build works and if the C++ tests work, respectively. Even after spending considerable amount of time on this, I found no feasible way to achieve this. Instead, each workflow shows either "pass" or "failure", consolidating all steps into a single result.
For what it's worth, this is not unlike my previous experiences with GitHub Action workflows. When using Jenkins, it is very easy to get advanced build and test status visualization. For instance, you can show how many tests pass out of tests total. As far as I know, this is currently not possible to achieve in GitHub Actions. As such, I think GitHub Action workflows are a good fit for small FOSS projects like this exercise, but for serious software development you want something more advanced, even though the initial setup is more involved.
Python is great for algorithm prototyping and design
Already before I went to University, I used to sketch on paper (or in my head) the algorithms I needed to implement. At the University, this practice was reenforced and I liked to do UML's and whatnot. However, as I wrote increasingly complex algorithms, this practice didn't seem as feasible. I would end up with the best solution faster by just writing code and iterating the code.
Not all languages are alike in this regard. As I worked on this exercise, it became more apparent to me how different python and C++ are in this regard. I like C++ in many ways, but in C++ you need to write a considerable amount of boilerplate and making just trivial changes to data structures typically requires a magnitude more changes compared to similar changes in python.
There are many reasons for this discrepancy. One of the reasons is that the syntax of C++ is more expressive. In C++, even though we are working with a high-level language, we know exactly if we will get a hashmap (e.g. unordered_map
), red-black tree (e.g. map
or set
) etc. In python you don't need to worry about this, which can be inefficient in production, but great for sketching.
Indeed, sketching out the algorithm design in python turned out very productive for me. As I hit a dead end, it didn't take much to make suitable changes and to get up to speed again. As such, python may be a good choice for prototyping in many scenarios.
Metaprogramming is awesome
Not only was it easy to sketch the algorithm proper in python, but also many things related to testing were easy in python and javascript as they have very extensive metaprogramming support. By comparison, the metaprogramming capabilities of C++ are less, but still rather powerful. In my experience, this is not the case in Golang, and it's painful.
For example, in all three of python, javascript and C++ I could have separate test cases in their own, dynamically generated, functions. In Golang I found no option for dynamic functions for every test data file and instead I had to create one function that reads the test vector files and then runs all of the tests. For what it's worth, this is also what I ended up doing in Scheme, but I'd say that is more because I don't know Scheme that well.
Metaprogramming is such a powerful tool that I plan on writing a separate blog post on it. I think that metaprogramming capabilities are a crucial feature of a programming language. With metaprogramming you can work around many otherwise nasty problems you are facing.
Exceptions are nice, but not necessary
All four other languages I used in this exercise support exceptions, but Golang does not. Instead, in Golang you return errors. This is similar to how you propagate errors in C. I know that many C++ programmers shy away from exceptions, but I've always liked them in all languages.
Indeed, I feel that also in this exercise the solutions in python, javascript and C++ were nicer than the the Golang solution due to the use of exceptions. When the algorithm faces an unrecoverable error scenario, it can just throw an exception. In contrast, with Golang I needed to add checks for function calls to see if an error occured.
This is obviously opinionated, and proponents of the Golang design will tell you that it's great that the caller of a function make it explicit that the function can return an error. While I tend to appreciate some explicitness, over the years I've learned to like it less.
That being said, the Golang solution for this exercise did not turn out as nasty as I was worried it would. Checking the errors in Golang doesn't clutter the code nearly as much I was expecting. Personally, I prefer exceptions, but I can live with the error propagating system present in Golang.
You are most productive in the language you know
This is an obvious point to make, but this was so emphasized during my exercise that I want to mention it. Even if you know the language syntax, if you are not familiar with idioms and patterns of a language, it will take a lot more time to write the code. I was, and still am, most familiar with python, javascript and C++. As I was writing code in these languages, I focused almost exclusively on the algorithm and the libraries I was using.
Compared to the three previously mentioned languages, with Golang I have far less mileage. For example, I wrote some structs, then later I had to rewrite them as I realized the idiomatic way to solve things in Golang are different. Golang is not a complicated language, but reaching the full solution required a lot of rewriting as I learned my earlier choices were bad. I also expect I will have to improve the Golang code in the future more than the three other implementations.
Scheme is the one I have the least mileage with. In fact, even though I spent more time on Scheme than any other language, I didn't get even close to the full solution. However, I don't think Scheme is harder in itself. I think Scheme is magnitudes simpler than C++. The thing is that I'm used to imperative programming and Scheme is based on lambda calculus, which is fundamentally different. The way I see it, I have over 20 years worth of experience writing imperative code and less than 1 year worth of experience writing lambda calculus. Therefore, I'm currently very slow in writing Scheme.
Popular languages have an advantage in stack overflow questions and answers
Writing python and javascript was fast not only because I knew the languages, but also because I could ask google any question that popped in my head and a related stack overflow question and answer would be available, respectively. The answers are not poor quality, either, but rather, they often explain the background and guide you to idiomatic solutions and solid patterns, which I find very nice.
As I moved on to write the Scheme solution, this became very clear. Many questions I wanted to ask had not been asked anywhere. I had to do a lot more work to find the relevant information I was looking for. This slowed me down considerably. However, more importantly, I'm also less confident my solutions and the patterns I'm using in Scheme are good. Frankly, I feel they aren't.
For what it is worth, as I was stuck with the Scheme implementation, I even ended up creating a stack overflow account so that I could ask a question myself. It was a nice experience: I asked my question on a Sunday morning and a few hours later it was already answered. The answer was off the topic, but I asked clarifying questions and the same guy ultimately answered my original question. I hope my question will be useful to someone else, too!
I hope to learn more Scheme and lambda calculus
Even though I didn't finish the Scheme implementation, it was fun and exciting to write it. Pretty much everything is different in Scheme compared to the mostly imperative languages I'm used to. While it was rather easy to write the algorithm in python (and then copy the solution to javascript, C++ and Golang), I'm not too happy with the result. As I have extensive testing, I know the python solution works correctly. But it's not beautiful, and, in fact, it might not be optimal, either.
Over this blog post, I've described the other languages as imperative. However, the division between imperative and declarative is not clearcut. For example, C was clearly designed as an imperative language. It was meant to be an abstraction layer for assembly code. The original use case was to compile C to different assembly languages.
While the original use case still stands (even though most code now exclusively runs on x86), processors are now far more complex and compilers are way more powerful at optimizing. Instead of executing the C program the programmer wrote, the compiler and processor will run the equivalent, but optimized, program. This is especially important when considering multi-threaded programs running on multicore systems. For more information on the topic, I suggest the interested reader watch the talk Atomic weapons by Herb Sutter.
I think that Scheme and lambda calculus has still a lot to give to me and the programming industry in general. In fact, I feel like over the decades we are slowly moving away from the computation model Alan Turing envisioned and toward lambda calculus, as described by Alonzo Church. For what it's worth, Turing machines and lambda calculus are equivalent.
As I've understood, many hope that functional programming would make simultaneous multiprocessing more feasible. While I personally don't see the connection that strong, I think that writing declarative code is more productive and meaningful compared to the imperative counterpart. After all, at the end of the day, we have the computer make mathematical calculations, and we don't care what middle steps it does, as long as it ultimately returns the solution to our question. This is fundamentally declarative, not imperative.
Future work
Obviously, as the Scheme implementation is still work-in-progress, I want to learn more Scheme and I have fun writing it, I hope to finish the Scheme implementation. Currently, I'm stuck working on a depth-first-search algorith in Scheme.
As I mentioned earlier in the blog post, I'm planning on writing a blog post about metaprogramming. I think it would be fun and useful. Similarly, I might write something about the differences between imperative and declarative programing, as I'm currently quite interested in the topic.
Also, I hope to keep updating the jox-algorithm
repository relevant to this blog post. I hope to update the build setups as I learn more about reproducible builds, as I think that would be very useful. I might also change from GitHub Action workflows to something else. Also, as I learn better programming patterns, I hope to fix any bad or outdated patterns I'm currently using.
I don't expect anyone to make pull requests, but I'm open to pull requests in case someone finds this exercise useful as reference material when starting new projects in various languages.
Conclusion
In this blog post I describe my hobby project I've been sporadically working on for a few months or so. I wrote the same code in five different languages: python, javascript, C++, Golang and Scheme. In this blog post I describe various things I learned while doing the exercise. If you found this blog post useful or have opinions or questions about my points, feel free to comment below. I also accept pull request to the repository if you find it useful as reference material.
Top comments (0)