
Learning Haskell with ShellCheck

Aibhstin ・ 6 min read

One of the best ways of learning more about a language and its ecosystem is by looking at open-source projects written in that language. A good example to use for Haskell is the ShellCheck project, which is both a library and an executable. The ShellCheck repository is hosted on GitHub at https://github.com/koalaman/shellcheck. Essentially, ShellCheck finds bugs in shell scripts and reports them to the user.

I'm going to start by cloning the repository so I can look at it on my machine:

~ git clone https://github.com/koalaman/shellcheck
Cloning into 'shellcheck'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 6725 (delta 11), reused 21 (delta 9), pack-reused 6697
Receiving objects: 100% (6725/6725), 4.26 MiB | 1.48 MiB/s, done.
Resolving deltas: 100% (4106/4106), done.
~ cd shellcheck
~ shellcheck git:(master) pwd
/home/aibhstin/shellcheck

We can run the tree command to look at the complete directory structure:

~ shellcheck git:(master) tree
.
├── CHANGELOG.md
├── doc
│   ├── emacs-flycheck.png
│   ├── terminal.png
│   └── vim-syntastic.png
├── Dockerfile
├── Dockerfile.multi-arch
├── LICENSE
├── manpage
├── nextnumber
├── quickrun
├── quicktest
├── README.md
├── shellcheck.1.md
├── ShellCheck.cabal
├── shellcheck.hs
├── snap
│   └── snapcraft.yaml
├── src
│   └── ShellCheck
│       ├── Analytics.hs
│       ├── Analyzer.hs
│       ├── AnalyzerLib.hs
│       ├── AST.hs
│       ├── ASTLib.hs
│       ├── Checker.hs
│       ├── Checks
│       │   ├── Commands.hs
│       │   ├── Custom.hs
│       │   └── ShellSupport.hs
│       ├── Data.hs
│       ├── Fixer.hs
│       ├── Formatter
│       │   ├── CheckStyle.hs
│       │   ├── Diff.hs
│       │   ├── Format.hs
│       │   ├── GCC.hs
│       │   ├── JSON1.hs
│       │   ├── JSON.hs
│       │   ├── Quiet.hs
│       │   └── TTY.hs
│       ├── Interface.hs
│       ├── Parser.hs
│       └── Regex.hs
├── stack.yaml
├── striptests
└── test
    ├── buildtest
    ├── check_release
    ├── distrotest
    ├── shellcheck.hs
    └── stacktest

7 directories, 45 files

Let's start by looking into the cabal file, specifically at the executable section:

~ shellcheck git:(master) cat ShellCheck.cabal
executable shellcheck
    if impl(ghc < 8.0)
      build-depends:
        semigroups
    build-depends:
      aeson,
      array,
      base >= 4 && < 5,
      bytestring,
      containers,
      deepseq >= 1.4.0.0,
      Diff >= 0.2.0,
      directory >= 1.2.3.0,
      mtl >= 2.2.1,
      filepath,
      parsec >= 3.0,
      QuickCheck >= 2.7.4,
      regex-tdfa,
      ShellCheck
    main-is: shellcheck.hs

We can see that the main entry point is "shellcheck.hs", so let's take a look at that file, starting with the imports:

import qualified ShellCheck.Analyzer
import           ShellCheck.Checker
import           ShellCheck.Data
import           ShellCheck.Interface
import           ShellCheck.Regex

import qualified ShellCheck.Formatter.CheckStyle
import           ShellCheck.Formatter.Format
import qualified ShellCheck.Formatter.Diff
import qualified ShellCheck.Formatter.GCC
import qualified ShellCheck.Formatter.JSON
import qualified ShellCheck.Formatter.JSON1
import qualified ShellCheck.Formatter.TTY
import qualified ShellCheck.Formatter.Quiet

import           Control.Exception
import           Control.Monad
import           Control.Monad.Except
import           Data.Bits
import           Data.Char
import           Data.Either
import           Data.Functor
import           Data.IORef
import           Data.List
import qualified Data.Map                        as Map
import           Data.Maybe
import           Data.Monoid
import           Data.Semigroup                  (Semigroup (..))
import           Prelude                         hiding (catch)
import           System.Console.GetOpt
import           System.Directory
import           System.Environment
import           System.Exit
import           System.FilePath
import           System.IO

Even this list of imports offers a lesson for our own Haskell projects: group the imports so that the project's own ('local') modules come first and external ones second, keeping each group roughly in alphabetical order. We can also see that Prelude is imported explicitly in order to hide catch; older versions of the Prelude exported their own catch, which would clash with the one from Control.Exception. Apart from that, this list of imports is fairly typical of a Haskell command-line program.
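The list also uses qualified imports, such as import qualified Data.Map as Map. Data.Map exports functions like lookup, filter and map whose names clash with the Prelude's, so it is normally imported qualified and used through the Map prefix. A minimal sketch of the idea follows; the severityCounts data is made up purely for illustration.

```haskell
import qualified Data.Map as Map

-- Hypothetical tally of diagnostics by severity name.
-- fromListWith (+) adds together the counts of duplicate keys.
severityCounts :: Map.Map String Int
severityCounts = Map.fromListWith (+) [("warning", 1), ("error", 1), ("warning", 1)]

main :: IO ()
main = do
  -- Data.Map's lookup, reached through the Map prefix.
  print (Map.lookup "warning" severityCounts)
  -- The Prelude's lookup on an association list is still usable unqualified.
  print (lookup "error" (Map.toList severityCounts))
```

Without the qualified import, an unqualified lookup would be ambiguous between the Prelude and Data.Map versions.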

SIDENOTE: Copying and pasting from the terminal and adding 4 spaces to every line got really annoying, really quickly. I know there's probably some keyboard shortcut that adds 4 spaces instantly, but it was far less effort for me to just make a little Haskell program that does it for me, which you can find here. The program took about a minute to write and does a very simple, repetitive task. This is the kind of job people usually reach for Python for; I chose Haskell because it's just better.
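The program itself isn't reproduced here. A minimal sketch of such a tool, assuming it reads text on standard input and writes the indented text to standard output (indentBlock is a hypothetical name, not taken from the original program), could look like this:

```haskell
-- Hypothetical stand-in for the "add 4 spaces" helper, not the author's code.

-- Prefix every line with four spaces (the indentation of a Markdown code block).
indentBlock :: String -> String
indentBlock = unlines . map ("    " ++) . lines

-- Read all of stdin, indent it, and write the result to stdout.
main :: IO ()
main = interact indentBlock
```

Used as a filter, something like tree | runghc Indent.hs prints the tree output ready to paste.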

Next we have some declarations of new datatypes and some instances defined for them:

data Flag = Flag String String
data Status =
    NoProblems
    | SomeProblems
    | SupportFailure
    | SyntaxFailure
    | RuntimeException
  deriving (Ord, Eq, Show)

instance Semigroup Status where
    (<>) = max

instance Monoid Status where
    mempty = NoProblems
    mappend = (Data.Semigroup.<>)

Firstly, we see the Flag datatype. This is a simple product type: its single data constructor, Flag, holds two Strings. Judging by how it is used later on, these are a flag's name and its value.

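Next we have the Status datatype. This is a sum type with five data constructors, each of them nullary (taking no fields), so a Status is always exactly one of five values. Instances of Ord, Eq, and Show are then derived. A derived Ord instance orders constructors by the order in which they are declared, which results in this ordering: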

NoProblems < SomeProblems < SupportFailure < SyntaxFailure < RuntimeException

This property is then used in the instance definition of Semigroup for Status, defining (<>) for Status as the max function.

GHCi> NoProblems <> SomeProblems
SomeProblems
GHCi> SomeProblems <> SupportFailure
SupportFailure
GHCi> RuntimeException <> SyntaxFailure
RuntimeException

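In the instance definition of Monoid for Status, mempty is set to NoProblems, and mappend simply delegates to (<>). Because NoProblems is the lowest value in the ordering, max NoProblems s is always s, which is exactly the identity law a Monoid requires of mempty. This ordering also makes sense intuitively: combining any number of statuses yields the most severe one, so the more serious errors 'bubble' to the surface.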
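To see the Monoid at work, here is a small self-contained sketch that repeats the Status type and instances and folds a list of statuses with mconcat. It assumes GHC 8.4 or later (base 4.11+), where Semigroup is a superclass of Monoid exported from the Prelude and mappend defaults to (<>), so the explicit mappend line is left out. overallStatus is a hypothetical helper, not part of ShellCheck.

```haskell
import Data.List (sort)

-- The Status type and instances as quoted above (mappend omitted, see lead-in).
data Status
  = NoProblems
  | SomeProblems
  | SupportFailure
  | SyntaxFailure
  | RuntimeException
  deriving (Ord, Eq, Show)

instance Semigroup Status where
  (<>) = max

instance Monoid Status where
  mempty = NoProblems

-- Hypothetical helper: combine the statuses of several checked files.
-- mconcat folds with (<>) starting from mempty, so it returns the maximum.
overallStatus :: [Status] -> Status
overallStatus = mconcat

main :: IO ()
main = do
  -- Derived Ord follows declaration order.
  print (sort [SyntaxFailure, NoProblems, SomeProblems])
  -- The most severe status wins.
  print (overallStatus [SomeProblems, NoProblems, SyntaxFailure])
  -- An empty list falls back to mempty.
  print (overallStatus [])
```

Because mempty is the bottom of the ordering, checking zero files naturally reports NoProblems.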

Next, a record datatype called Options is defined, along with a default value of that type, defaultOptions:

data Options = Options {
    checkSpec        :: CheckSpec,
    externalSources  :: Bool,
    sourcePaths      :: [FilePath],
    formatterOptions :: FormatterOptions,
    minSeverity      :: Severity
}

defaultOptions = Options {
    checkSpec = emptyCheckSpec,
    externalSources = False,
    sourcePaths = [],
    formatterOptions = newFormatterOptions {
        foColorOption = ColorAuto
    },
    minSeverity = StyleC
}

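This is fairly standard, but it hopefully shows how useful Haskell's algebraic datatypes are for modelling a problem domain: every setting the program needs lives in one typed record, and defaultOptions supplies a value for each field. Note also the record update syntax in newFormatterOptions { foColorOption = ColorAuto }, which takes an existing FormatterOptions value and replaces only its foColorOption field. Algebraic datatypes are one of Haskell's biggest strengths and definitely warrant greater attention.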
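The same pattern, a record of settings plus one default value that gets selectively overridden, can be sketched in isolation. This is a hypothetical, trimmed-down Options record: only three of the quoted fields are kept, and the Severity type below is a stand-in that borrows the StyleC name from the quoted code rather than ShellCheck's actual definition.

```haskell
-- Stand-in severity levels, not ShellCheck's own Severity type.
data Severity = ErrorC | WarningC | InfoC | StyleC
  deriving (Eq, Ord, Show)

-- Hypothetical, trimmed-down version of the quoted Options record.
data Options = Options
  { externalSources :: Bool
  , sourcePaths     :: [FilePath]
  , minSeverity     :: Severity
  } deriving (Show)

-- One value holding every default, like defaultOptions above.
defaultOptions :: Options
defaultOptions = Options
  { externalSources = False
  , sourcePaths     = []
  , minSeverity     = StyleC
  }

-- Record update syntax: copy defaultOptions, replacing only the named fields.
strictOptions :: Options
strictOptions = defaultOptions { minSeverity = WarningC, externalSources = True }

main :: IO ()
main = do
  print defaultOptions
  print strictOptions
```

Fields that are not mentioned in the update, such as sourcePaths here, keep their default values.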

Next, a simple 'usage header' is declared, as well as a list called options:

usageHeader = "Usage: shellcheck [OPTIONS...] FILES..."
options = [
    Option "a" ["check-sourced"]
        (NoArg $ Flag "sourced" "false") "Include warnings from sourced files",
    Option "C" ["color"]
        (OptArg (maybe (Flag "color" "always") (Flag "color")) "WHEN")
        "Use color (auto, always, never)",
    Option "i" ["include"]
        (ReqArg (Flag "include") "CODE1,CODE2..") "Consider only given types of warnings",
    Option "e" ["exclude"]
        (ReqArg (Flag "exclude") "CODE1,CODE2..") "Exclude types of warnings",
    Option "f" ["format"]
        (ReqArg (Flag "format") "FORMAT") $
        "Output format (" ++ formatList ++ ")",
    Option "" ["list-optional"]
        (NoArg $ Flag "list-optional" "true") "List checks disabled by default",
    Option "" ["norc"]
        (NoArg $ Flag "norc" "true") "Don't look for .shellcheckrc files",
    Option "o" ["enable"]
        (ReqArg (Flag "enable") "check1,check2..")
        "List of optional checks to enable (or 'all')",
    Option "P" ["source-path"]
        (ReqArg (Flag "source-path") "SOURCEPATHS")
        "Specify path when looking for sourced files (\"SCRIPTDIR\" for script's dir)",
    Option "s" ["shell"]
        (ReqArg (Flag "shell") "SHELLNAME")
        "Specify dialect (sh, bash, dash, ksh)",
    Option "S" ["severity"]
        (ReqArg (Flag "severity") "SEVERITY")
        "Minimum severity of errors to consider (error, warning, info, style)",
    Option "V" ["version"]
        (NoArg $ Flag "version" "true") "Print version information",
    Option "W" ["wiki-link-count"]
        (ReqArg (Flag "wiki-link-count") "NUM")
        "The number of wiki links to show, when applicable",
    Option "x" ["external-sources"]
        (NoArg $ Flag "externals" "true") "Allow 'source' outside of FILES",
    Option "" ["help"]
        (NoArg $ Flag "help" "true") "Show this usage summary and exit"
    ]

We see a data constructor, Option, but this is something defined externally to ShellCheck. In fact, this data constructor and the type (OptDescr a) it comes along with are defined in System.Console.GetOpt, a module in the base library. The Hackage documentation describes this module as follows:

This library provides facilities for parsing the command-line options in a standalone program. It is essentially a Haskell port of the GNU getopt library.

We can get more information by opening up a GHCi session, importing System.Console.GetOpt and querying the type:

GHCi> :t Option
Option :: [Char] -> [String] -> ArgDescr a -> String -> OptDescr a
GHCi> :i ArgDescr
data ArgDescr a
  = NoArg a
  | ReqArg (String -> a) String
  | OptArg (Maybe String -> a) String
    -- Defined in ‘System.Console.GetOpt’
instance [safe] Functor ArgDescr
  -- Defined in ‘System.Console.GetOpt’

We can see that the Option data constructor takes in 4 parameters:

  • [Char]: A list of characters representing 'short' flags (such as -a)
  • [String]: A list of strings representing 'long' flag names (such as --help)
  • ArgDescr a: ArgDescr is another type defined in System.Console.GetOpt, with three possible data constructors: NoArg, ReqArg, and OptArg. Flags such as 'help' take NoArg, which simply holds a ready-made value, as you don't need to pass any information to the help flag. Some flags, like 'color', take OptArg, as the information you can pass to the flag is optional. The rest require that information be passed to them, and use ReqArg. Both ReqArg and OptArg take a function that expects a String (or a Maybe String) and returns a value of type a, plus a second String naming the argument in the usage text (such as "WHEN"). In the ShellCheck code, the function given to ReqArg is just a partially applied Flag data constructor: Flag "include" has type String -> Flag. For OptArg, maybe (Flag "color" "always") (Flag "color") turns Nothing into Flag "color" "always" and Just s into Flag "color" s.
  • String: The final parameter is a description of the option, such as "Include warnings from sourced files" or "Consider only given types of warnings".
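To see these pieces produce Flag values, here is a small sketch that feeds a made-up argument list to getOpt from System.Console.GetOpt. getOpt returns the recognised options, the remaining non-option arguments (the files), and any error messages. Only three of the quoted options are kept, and Flag derives Eq and Show here (the quoted version derives nothing) so results can be compared and printed; parseArgs is a hypothetical name.

```haskell
import System.Console.GetOpt

-- Same shape as ShellCheck's Flag: a flag name and its value.
data Flag = Flag String String
  deriving (Eq, Show)

-- A subset of the options quoted above.
options :: [OptDescr Flag]
options =
  [ Option "C" ["color"]
      (OptArg (maybe (Flag "color" "always") (Flag "color")) "WHEN")
      "Use color (auto, always, never)"
  , Option "s" ["shell"]
      (ReqArg (Flag "shell") "SHELLNAME")
      "Specify dialect (sh, bash, dash, ksh)"
  , Option "" ["help"]
      (NoArg $ Flag "help" "true") "Show this usage summary and exit"
  ]

-- Split the arguments into flags, non-option arguments, and errors.
-- Permute allows options and file names to be mixed in any order.
parseArgs :: [String] -> ([Flag], [String], [String])
parseArgs = getOpt Permute options

main :: IO ()
main = do
  -- --color with no "=value" hits the Nothing branch, so it means "always".
  print (parseArgs ["--color", "-s", "bash", "script.sh"])
  -- An optional argument to a long option is attached with "=".
  print (parseArgs ["--color=never", "script.sh"])
```

Note that an optional argument must be written as --color=never; a separate word after --color would be treated as a file name.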

Next, we have these two lines of code:

getUsageInfo = usageInfo usageHeader options

printErr = lift . hPutStrLn stderr

Here, usageInfo is a function imported from System.Console.GetOpt, and has this type signature:

GHCi> :t usageInfo
usageInfo :: String -> [OptDescr a] -> String

So it takes a header string and a list of option descriptions, and returns the usage text: the header followed by a table of the options and their descriptions. In this case it uses the usageHeader and the options list declared previously.
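A quick way to see what usageInfo produces is to build the text from a tiny option list. This sketch is hypothetical: it keeps only two options and uses plain String values instead of ShellCheck's Flag type, while usageHeader and getUsageInfo mirror the quoted definitions.

```haskell
import System.Console.GetOpt

-- Hypothetical cut-down option list with plain String values.
options :: [OptDescr String]
options =
  [ Option "V" ["version"] (NoArg "version") "Print version information"
  , Option "s" ["shell"] (ReqArg id "SHELLNAME") "Specify dialect (sh, bash, dash, ksh)"
  ]

usageHeader :: String
usageHeader = "Usage: shellcheck [OPTIONS...] FILES..."

-- Same shape as the quoted definition: the header, then one row per option.
getUsageInfo :: String
getUsageInfo = usageInfo usageHeader options

main :: IO ()
main = putStr getUsageInfo
```

The rows line up the short form, the long form (with the argument name for ReqArg options) and the description in columns.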

The definition of printErr is a little bit more interesting, being written in point-free style. It is the composition of lift with hPutStrLn, which has been partially applied to stderr. Taking a look at both of these functions:

GHCi> :t lift
lift :: (MonadTrans t, Monad m) => m a -> t m a
GHCi> :t hPutStrLn
hPutStrLn :: Handle -> String -> IO ()
GHCi> :t stderr
stderr :: Handle

A Handle is a Haskell datatype representing an open file or other I/O device, such as the standard error stream. Altogether, printErr takes a string, passes it to the partially applied hPutStrLn stderr to get an IO () action, and then passes that action to lift, which lifts it into a monad transformer stacked on top of IO. To get deeper into what lift is really doing would involve a lengthier discussion of Monad transformers and Monads themselves, and I would much rather devote an entire post just for those topics than attempt to cover them here.
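Without going into transformers in depth, a small sketch shows printErr in a concrete setting. Since Control.Monad.Except is among the imports, this assumes the transformer is ExceptT over IO, with a String error type chosen only for illustration; the type signature and the checkArgs function are hypothetical, not ShellCheck's code. The point-free definition is equivalent to printErr s = lift (hPutStrLn stderr s).

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import System.IO (hPutStrLn, stderr)

-- Same definition as in shellcheck.hs, with an assumed concrete type:
-- lift turns the IO () action into an ExceptT String IO () action.
printErr :: String -> ExceptT String IO ()
printErr = lift . hPutStrLn stderr

-- Hypothetical check: complain on stderr and fail when no files are given.
checkArgs :: [String] -> ExceptT String IO Int
checkArgs [] = do
  printErr "no files given"
  throwE "usage error"
checkArgs files = return (length files)

main :: IO ()
main = do
  -- runExceptT unwraps the transformer back into plain IO.
  r1 <- runExceptT (checkArgs [])
  print r1
  r2 <- runExceptT (checkArgs ["a.sh", "b.sh"])
  print r2
```

Inside checkArgs, printErr can be used directly alongside throwE because lift has already moved the IO action into the ExceptT layer.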

Posted by Aibhstin (@aibhstin). I'm an Ethical Hacking & Cybersecurity student and a Haskell programmer.