DEV Community

Hercules Lemke Merscher

Posted on • Updated on • Originally published at bitmaybewise.substack.com

On rewriting a Go command-line program in Haskell

As promised in my previous post, I decided to share here my experience of using Haskell for some mundane practical tasks.

If you haven’t read my previous post where I share my perception of revisiting Haskell after a while, here it is:

I decided to rewrite a tiny command-line application I wrote in Go a while back in Haskell.

Why? I thought it would be a good exercise. Plus, the conciseness of Haskell is appealing to me.

img2cbr is a small program that converts a directory of images into a compressed file with the cbr extension, a format used for comic books. It’s a handy tool for turning scanned comic books into cbr files, which can later be converted to other formats such as PDF or ePUB, or simply read with apps such as Calibre.

The Go version

The Go version can be seen here: https://gitlab.com/bitmaybewise/img2cbr/-/blob/f007fd5cc787019710aff60d9d9e5fb67b8b7410/main.go. It is a tiny and easy-to-grasp program in a single file with 130 lines.

I wrote the initial prototype in a shell script but decided to rewrite it in Go because it was easy to cross-compile as a single fat binary to multiple platforms, and Go is a great language to interface with the system.

The new Haskell version

Versions I used:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 9.6.3

$ cabal --version
cabal-install version 3.10.1.0
compiled using version 3.10.1.0 of the Cabal library

The img2cbr.cabal (without the comments):

cabal-version:      3.4
name:               img2cbr
version:            0.3.0.0
license:            BSD-3-Clause
license-file:       LICENSE
author:             bitmaybewise
category:           CLI
build-type:         Simple
extra-doc-files:    CHANGELOG.md
common warnings
    ghc-options: -Wall -O1
executable img2cbr
    import:           warnings
    main-is:          Main.hs
    build-depends:
        base ^>=4.18.1.0
        , optparse-applicative
        , process
        , text
        , directory
        , filepath
    hs-source-dirs:   app
    default-language: GHC2021

Here’s app/Main.hs:

{-# LANGUAGE OverloadedRecordDot #-}

module Main where

import Control.Concurrent
import Control.Monad
import Data.Text qualified as T
import Options.Applicative
import System.Directory (createDirectoryIfMissing, doesFileExist)
import System.FilePath (takeDirectory)
import System.Process (readProcess)
import Text.Printf (printf)

data Opts = Opts
  { origin :: String,
    destination :: String,
    depth :: Int,
    pool :: Int,
    verbose :: Bool
  }
  deriving (Show)

data WorkerDirectories = WorkerDirectories {pending :: MVar Int, total :: Int, channel :: Chan String}

optsParser :: Parser Opts
optsParser =
  Opts
    <$> strOption (long "origin" <> short 'i' <> help "directory of origin")
    <*> strOption (long "destination" <> short 'o' <> help "directory of destination")
    <*> option auto (long "depth" <> short 'd' <> help "directory depth" <> value 1)
    <*> option auto (long "pool" <> short 'p' <> help "number of parallel conversions" <> value 1)
    <*> switch (long "verbose" <> short 'v' <> help "verbose output")

opts :: ParserInfo Opts
opts = info (optsParser <**> helper) (header "img2cbr - converts a folder containing images to a cbr file")

findDirectories :: Opts -> IO [String]
findDirectories options = do
  output <- readProcess "find" [origin options, "-type", "d", "-mindepth", show $ depth options, "-maxdepth", show $ depth options] []
  pure $ lines output

img2cbr :: String -> Opts -> IO ()
img2cbr dir options = do
  let cbr = T.replace (T.pack options.origin) (T.pack options.destination) (T.pack $ dir ++ ".cbr")
  exists <- doesFileExist $ T.unpack cbr
  if exists
    then when (verbose options) $ do
      putStrLn $ "File already exists, skipping -- " <> T.unpack cbr
    else do
      createDirectoryIfMissing True (takeDirectory . T.unpack $ cbr)
      when (verbose options) $ do
        putStrLn $ "packaging -- " <> T.unpack cbr
      void $ readProcess "zip" ["-r", T.unpack cbr, dir] []

printProgress :: Int -> Int -> IO ()
printProgress pending total = do
  let current = total - pending
      currentProgress = current * 100 `div` total
  printf "(%d / %d) %d%s\n" current total currentProgress "%"

runWorker :: WorkerDirectories -> Opts -> MVar () -> IO ()
runWorker dirs options await = do
  totalPending <- readMVar dirs.pending
  -- checking total pending before readChan, otherwise it will block when empty
  if totalPending == 0 then takeMVar await else runWorker' totalPending
  where
    runWorker' totalPending = do
      dir <- readChan dirs.channel
      void $ swapMVar dirs.pending (totalPending - 1)
      img2cbr dir options
      printProgress (totalPending - 1) dirs.total
      runWorker dirs options await

main :: IO ()
main = do
  options <- execParser opts
  dirs <- findDirectories options
  let total = length dirs
  mDirsTotal <- newMVar total
  dirsChannel <- newChan
  writeList2Chan dirsChannel dirs
  let wDirs = WorkerDirectories {pending = mDirsTotal, total = total, channel = dirsChannel}
  awaiting <- replicateM options.pool $ do
    await <- newMVar ()
    void . forkIO $ void (runWorker wDirs options await)
    pure $ \() -> putMVar await ()
  mapM_ (\wait -> wait ()) awaiting

The whole program can be seen on the repository in my GitLab profile. I also keep a mirror on GitHub.

My conclusions

The whole main file is now 86 lines, far fewer than the 130 lines of Go. An expected outcome, I’d say: Haskell is far more concise than Go.

I’m using the optparse-applicative package to parse the command-line parameters. It has some extras, but it does pretty much what can be done with the flag package built into the Go standard library. Again, conciseness is the key advantage here, as both do practically the same work.

The findDirectories function is way more pleasant to the eyes now. I’m beating a dead horse about conciseness again. In practical terms, both Haskell and Go are good for systems programming, so spawning a process to run another CLI command is easy in either.
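As a side note, readProcess throws an exception when the command exits with a non-zero code. If that matters, a minimal sketch like the one below could surface the failure as a value instead. It uses readProcessWithExitCode from the process package; `runTool` is a hypothetical helper name and is not part of img2cbr.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Run an external command and return its stdout, or a message on failure.
-- `runTool` is a hypothetical helper for illustration, not part of img2cbr.
runTool :: FilePath -> [String] -> IO (Either String String)
runTool cmd args = do
  -- The last argument is the text fed to the command's stdin (none here).
  (code, out, err) <- readProcessWithExitCode cmd args ""
  pure $ case code of
    ExitSuccess -> Right out
    ExitFailure n -> Left (cmd ++ " exited with code " ++ show n ++ ": " ++ err)

main :: IO ()
main = do
  -- List the immediate subdirectories of the current directory.
  result <- runTool "find" [".", "-mindepth", "1", "-maxdepth", "1", "-type", "d"]
  case result of
    Right out -> mapM_ putStrLn (lines out)
    Left msg -> putStrLn msg
```

This still throws an IOException if the executable cannot be found at all, so it only covers the case where the command runs and reports failure.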

Haskell’s built-in String is a plain list of characters, and the base library lacks a function for substring replacement, so in the img2cbr function I had to use the text package for this kind of work, while in Go we have strings with batteries included. The text package is almost omnipresent in Haskell projects, so this is only a small inconvenience: one more package to add to the cabal configuration file.
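To make the path mapping concrete, here is a small sketch that isolates the T.replace call from img2cbr into a pure function. The name `cbrPath` and the example paths are made up for illustration.

```haskell
import qualified Data.Text as T

-- | Map a source directory to its .cbr path under the destination root,
-- mirroring the T.replace call in img2cbr.
-- `cbrPath` is a hypothetical name used only in this sketch.
cbrPath :: String -> String -> FilePath -> FilePath
cbrPath origin destination dir =
  T.unpack $ T.replace (T.pack origin) (T.pack destination) (T.pack (dir ++ ".cbr"))

main :: IO ()
main = do
  putStrLn (cbrPath "scans" "out" "scans/vol1")        -- out/vol1.cbr
  putStrLn (cbrPath "scans" "out" "scans/scans-extra") -- out/out-extra.cbr
```

The second call shows a caveat worth keeping in mind: T.replace substitutes every occurrence of the origin string, not just the leading one, so a directory whose name repeats the origin text is renamed as well. T.replace also raises an error when the string to search for is empty.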

In Go, I simulated multiple workers using channels. In Haskell, we have a similar abstraction for that called Chan. It has some caveats regarding race conditions and deadlocks, but it is enough for what is needed here. I had to pair it with an MVar (a mutable reference) to track how many values the workers still have to read from the Chan, because reading from an empty Chan blocks the worker indefinitely.

Each worker reads the counter and updates it in two separate steps, so I believe that, eventually, two workers could race: both may see one pending value, and the slower one would then block on a Chan that is already empty. In practice, I did not face this scenario, so I’m refraining from introducing atomic updates and the extra complexity they bring; worst case, I re-run the program. An MVar is also used as a mechanism to wait for each worker to finish its processing. I could’ve used the async package but for this small program, it’s like killing an ant with a bomb. Overall, I think the code is easier to understand than the workers implemented in Go.
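For reference, one way to close that race without extra packages is to make "check and take" a single atomic step. The sketch below is not the code from img2cbr: it swaps the Chan plus counter for a single MVar holding the list of pending items, and every name in it (`claimNext`, `worker`, `runPool`) is made up for illustration.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM)

-- | Atomically take the next item from the shared work list, if any.
-- All access goes through modifyMVar, so the check and the update happen
-- as one step and two workers can never both claim the last item.
claimNext :: MVar [a] -> IO (Maybe a)
claimNext queue = modifyMVar queue $ \items ->
  pure $ case items of
    [] -> ([], Nothing)
    (x : rest) -> (rest, Just x)

-- | Keep claiming and processing items until the list is exhausted.
worker :: MVar [a] -> (a -> IO ()) -> IO ()
worker queue process = do
  next <- claimNext queue
  case next of
    Nothing -> pure ()
    Just item -> process item >> worker queue process

-- | Run n workers over the items and block until all of them have finished.
runPool :: Int -> [a] -> (a -> IO ()) -> IO ()
runPool n items process = do
  queue <- newMVar items
  dones <- replicateM n $ do
    done <- newEmptyMVar
    -- forkFinally signals completion even if the worker dies with an exception.
    _ <- forkFinally (worker queue process) (\_ -> putMVar done ())
    pure done
  forM_ dones takeMVar

main :: IO ()
main = do
  results <- newMVar []
  runPool 3 ["a", "b", "c", "d"] (\dir -> modifyMVar_ results (pure . (dir :)))
  processed <- readMVar results
  print (length processed) -- prints 4
```

In img2cbr, the `process` argument would be something like `\dir -> img2cbr dir options`. The trade-off is that a worker that fails with an exception silently drops the item it was processing, so the sketch only guarantees that the pool terminates, not that every item succeeds.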

GHC has some nice features to optimize the generated binary, but I wish cross-compiling binaries were as simple as it is in Go. I haven’t had time to do my homework on that yet, but the closest I could get from a quick search on the internet was this article. I wonder if languages such as OCaml have tooling similar to Go’s for cross-compilation. Please let me know in the comments if you know anything.

Wrapping up, I felt much more productive writing the code in Haskell, mainly because of its terse syntax; in Go, by comparison, everything is more verbose. I still wish cross-compilation were easier.




Photo by MARK ADRIANE on Unsplash

Top comments (2)

Alexander Chichigin

I could’ve used the async package but for this small program, it’s like killing an ant with a bomb.

Well, I think it's quite the opposite: the async allows us to write the same logic in a very clean, simple and obvious way, without the need to juggle MVars and think about synchronization much. I'd say it's like Text but for concurrency: you add it to your project once, and it solves pretty much all your problems in a straightforward way.

Also, I think ghc.gitlab.haskell.org/ghc/doc/use... would make working with Opts a bit simpler and cleaner...

In general, for this "shell-like" programming style that mostly relies on running external programs, I'd suggest using hackage.haskell.org/package/turtle or a similar library. 😃

Hercules Lemke Merscher

I still need to play with Record Wildcards, you just gave me an incentive to go check it out :D

Since this is a personal project without a big relevance, I tried to use the least number of external dependencies possible to see how far I'd come in the process. For serious projects, I'd consider safer and more mature libraries for sure.

Turtle looks cool. I didn't know it. Thanks for the recommendations. :)