As promised in my previous post, here is my experience of using Haskell for a mundane, practical task.
If you haven’t read my previous post, where I share my impressions of revisiting Haskell after a while, here it is: "Revisiting Haskell after 10 years" (Hercules Lemke Merscher, Jan 15).
I decided to rewrite in Haskell a tiny command-line application I wrote in Go a while back.
Why? I thought it would be a good exercise. Plus, the conciseness of Haskell is appealing to me.
img2cbr is a small program that converts a directory of images into a compressed file with the cbr extension, an extension used for comic books. It’s a handy tool for turning scanned comic books into a cbr file, which can then be converted to other formats such as PDF or ePUB, or simply read with apps such as Calibre.
The Go version
The Go version can be seen here: https://gitlab.com/bitmaybewise/img2cbr/-/blob/f007fd5cc787019710aff60d9d9e5fb67b8b7410/main.go. It is a tiny and easy-to-grasp program in a single file with 130 lines.
I wrote the initial prototype in a shell script but decided to rewrite it in Go because it was easy to cross-compile as a single fat binary to multiple platforms, and Go is a great language to interface with the system.
The new Haskell version
Versions I used:
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 9.6.3
$ cabal --version
cabal-install version 3.10.1.0
compiled using version 3.10.1.0 of the Cabal library
The img2cbr.cabal (without the comments):
cabal-version: 3.4
name: img2cbr
version: 0.3.0.0
license: BSD-3-Clause
license-file: LICENSE
author: bitmaybewise
category: CLI
build-type: Simple
extra-doc-files: CHANGELOG.md

common warnings
    ghc-options: -Wall -O1

executable img2cbr
    import: warnings
    main-is: Main.hs
    build-depends:
        base ^>=4.18.1.0
      , optparse-applicative
      , process
      , text
      , directory
      , filepath
    hs-source-dirs: app
    default-language: GHC2021
Here’s app/Main.hs:
{-# LANGUAGE OverloadedRecordDot #-}

module Main where

import Control.Concurrent
import Control.Monad
import Data.Text qualified as T
import Options.Applicative
import System.Directory (createDirectoryIfMissing, doesFileExist)
import System.FilePath (takeDirectory)
import System.Process (readProcess)
import Text.Printf (printf)

data Opts = Opts
  { origin :: String,
    destination :: String,
    depth :: Int,
    pool :: Int,
    verbose :: Bool
  }
  deriving (Show)

data WorkerDirectories = WorkerDirectories {pending :: MVar Int, total :: Int, channel :: Chan String}

optsParser :: Parser Opts
optsParser =
  Opts
    <$> strOption (long "origin" <> short 'i' <> help "directory of origin")
    <*> strOption (long "destination" <> short 'o' <> help "directory of destination")
    <*> option auto (long "depth" <> short 'd' <> help "directory depth" <> value 1)
    <*> option auto (long "pool" <> short 'p' <> help "number of parallel conversions" <> value 1)
    <*> switch (long "verbose" <> short 'v' <> help "verbose output")

opts :: ParserInfo Opts
opts = info (optsParser <**> helper) (header "img2cbr - converts a folder containing images to a cbr file")

findDirectories :: Opts -> IO [String]
findDirectories options = do
  output <- readProcess "find" [origin options, "-type", "d", "-mindepth", show $ depth options, "-maxdepth", show $ depth options] []
  pure $ lines output

img2cbr :: String -> Opts -> IO ()
img2cbr dir options = do
  let cbr = T.replace (T.pack options.origin) (T.pack options.destination) (T.pack $ dir ++ ".cbr")
  exists <- doesFileExist $ T.unpack cbr
  if exists
    then when (verbose options) $ do
      putStrLn $ "File already exists, skipping -- " <> T.unpack cbr
    else do
      createDirectoryIfMissing True (takeDirectory . T.unpack $ cbr)
      when (verbose options) $ do
        putStrLn $ "packaging -- " <> T.unpack cbr
      void $ readProcess "zip" ["-r", T.unpack cbr, dir] []

printProgress :: Int -> Int -> IO ()
printProgress pending total = do
  let current = total - pending
      currentProgress = current * 100 `div` total
  printf "(%d / %d) %d%s\n" current total currentProgress "%"

runWorker :: WorkerDirectories -> Opts -> MVar () -> IO ()
runWorker dirs options await = do
  totalPending <- readMVar dirs.pending
  -- checking total pending before readChan, otherwise it will block when empty
  if totalPending == 0 then takeMVar await else runWorker' totalPending
  where
    runWorker' totalPending = do
      dir <- readChan dirs.channel
      void $ swapMVar dirs.pending (totalPending - 1)
      img2cbr dir options
      printProgress (totalPending - 1) dirs.total
      runWorker dirs options await

main :: IO ()
main = do
  options <- execParser opts
  dirs <- findDirectories options
  let total = length dirs
  mDirsTotal <- newMVar total
  dirsChannel <- newChan
  writeList2Chan dirsChannel dirs
  let wDirs = WorkerDirectories {pending = mDirsTotal, total = total, channel = dirsChannel}
  awaiting <- replicateM options.pool $ do
    await <- newMVar ()
    void . forkIO $ void (runWorker wDirs options await)
    pure $ \() -> putMVar await ()
  mapM_ (\wait -> wait ()) awaiting
The whole program can be seen in the repository on my GitLab profile. I also keep a mirror on GitHub.
My conclusions
The whole main file is now 86 lines. Way less than the 130 lines of Go. An expected outcome, I’d say. Haskell is far more concise than Go.
I’m using the optparse-applicative package to parse the command-line parameters. It has some extras, but it does pretty much what the flag package built into the Go standard library does. Again, conciseness is the key advantage here, as both do practically the same work.
The findDirectories function is way more pleasant to the eyes now. I’m beating a dead horse about conciseness here. In practical terms, both Haskell and Go are good for systems programming, so spawning a process to run another command-line program is easy either way.
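For comparison, the same directory listing can be done without spawning find at all. The block below is only a sketch, not the code in the repository: dirsAtDepth is a hypothetical helper built on the directory and filepath packages that the project already depends on. It assumes that following symbolic links is acceptable (doesDirectoryExist follows them, whereas find -type d does not by default) and that the order of the results does not matter.

```haskell
import Control.Monad (filterM)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

-- | Hypothetical helper: directories exactly `n` levels below `root`.
-- With n = 0 the result is `root` itself, mirroring
-- `find root -type d -mindepth n -maxdepth n`.
dirsAtDepth :: Int -> FilePath -> IO [FilePath]
dirsAtDepth n root
  | n <= 0 = pure [root]
  | otherwise = do
      entries <- listDirectory root
      -- keep only the entries that are directories (symlinks are followed)
      subdirs <- filterM doesDirectoryExist (map (root </>) entries)
      concat <$> mapM (dirsAtDepth (n - 1)) subdirs

main :: IO ()
main = dirsAtDepth 1 "." >>= mapM_ putStrLn
```

Whether this is nicer than the one-line call to find is a matter of taste; it mainly removes the dependency on an external program being installed.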
Haskell’s built-in String is a plain list of characters, and base lacks some basic functions for text replacement, so in the img2cbr function I had to use the text package for this kind of work, while in Go we have strings with batteries included. The text package is almost omnipresent in Haskell projects, so this is only a small inconvenience: one more package to add to the cabal configuration file.
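One detail worth knowing is that T.replace substitutes every occurrence of the origin path, not only the leading one. If only the prefix should change, base is enough. The following is a minimal sketch under that assumption; cbrPath is a hypothetical name, not something from the repository.

```haskell
import Data.List (stripPrefix)

-- | Hypothetical helper: swap a leading `origin` for `destination`
-- and append the ".cbr" extension. If `dir` does not start with
-- `origin`, it is left untouched apart from the extension.
cbrPath :: FilePath -> FilePath -> FilePath -> FilePath
cbrPath origin destination dir =
  maybe dir (destination ++) (stripPrefix origin dir) ++ ".cbr"

main :: IO ()
main = putStrLn (cbrPath "scans" "out" "scans/vol1/scans")
-- prints out/vol1/scans.cbr
```

With T.replace the same input would become out/vol1/out.cbr, because the second "scans" is replaced too. For typical inputs the two approaches agree.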
In Go, I simulated multiple workers using channels. Haskell has a similar abstraction called Chan. It has some caveats regarding race conditions and deadlocks, but it is enough for what is needed here. I had to pair it with an MVar (a mutable reference) to track how many values are still pending in the Chan, because reading from an empty Chan blocks the worker forever instead of signalling that the work is done.
I believe the combination of Chan and MVar can still lead to a race condition: checking the counter, reading from the Chan, and updating the counter are separate steps, so two workers may see the same pending count and one of them may end up blocked on an already empty Chan. In practice I did not hit this scenario, so I’m refraining from adding locking and the complexity that comes with it; worst case, I re-run the program.
An MVar is also used as the mechanism to wait for each worker to finish its processing. I could’ve used the async package, but for this small program it’s like killing an ant with a bomb. Overall, I think the code is easier to understand than the workers implemented in Go.
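The race between checking the counter and reading the Chan can be avoided with base alone by keeping the remaining work in a single MVar and popping from it atomically with modifyMVar. This is a sketch of that idea rather than the code in the repository: nextItem, worker and runPool are hypothetical names, and the demo sums numbers instead of calling zip.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar

-- | Atomically pop the next item from the shared queue, if any.
nextItem :: MVar [a] -> IO (Maybe a)
nextItem queue = modifyMVar queue $ \items -> pure $ case items of
  [] -> ([], Nothing)
  (x : rest) -> (rest, Just x)

-- | Keep processing items until the queue is empty.
worker :: MVar [a] -> (a -> IO ()) -> IO ()
worker queue process =
  nextItem queue >>= maybe (pure ()) (\x -> process x >> worker queue process)

-- | Run `process` over all items with `n` workers and wait for all of them.
runPool :: Int -> (a -> IO ()) -> [a] -> IO ()
runPool n process items = do
  queue <- newMVar items
  dones <- mapM (const (spawn queue)) [1 .. n]
  mapM_ takeMVar dones
  where
    spawn queue = do
      done <- newEmptyMVar
      -- forkFinally signals completion even if the worker throws
      _ <- forkFinally (worker queue process) (\_ -> putMVar done ())
      pure done

main :: IO ()
main = do
  total <- newMVar (0 :: Int)
  runPool 4 (\x -> modifyMVar_ total (pure . (+ x))) [1 .. 100]
  readMVar total >>= print
-- prints 5050
```

Because the emptiness check and the pop happen inside one modifyMVar call, no worker can block on an empty queue. A progress counter like the one in printProgress could be derived from the length of the remaining list.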
GHC has some nice features for optimizing the generated binary, but I wish cross-compiling binaries were as simple as it is in Go. I haven’t had time to do my homework on that yet; the closest I could get from a quick search on the internet was this article. I wonder whether languages such as OCaml have tooling similar to Go’s for cross-compilation. Please let me know in the comments if you know anything.
Wrapping up, I felt much more productive writing the code in Haskell, mainly because of its terse syntax; in Go, by comparison, everything is more verbose. I wish cross-compilation could be easier.
Top comments (2)
Well, I think it's quite the opposite: the async allows us to write the same logic in a very clean, simple and obvious way, without the need to juggle MVars and think about synchronization much. I'd say it's like Text but for concurrency: you add it to your project once, and it solves pretty much all your problems in a straightforward way.
Also, I think ghc.gitlab.haskell.org/ghc/doc/use... would make working with Opts a bit simpler and cleaner...
In general, for this "shell-like" programming style that mostly relies on running external programs, I'd suggest using hackage.haskell.org/package/turtle or a similar library. 😃
I still need to play with Record Wildcards, you just gave me an incentive to go check it out :D
Since this is a personal project without a big relevance, I tried to use the least number of external dependencies possible to see how far I'd come in the process. For serious projects, I'd consider safer and more mature libraries for sure.
Turtle looks cool. I didn't know it. Thanks for the recommendations. :)