So this project started with a need - or, not really a need, but an annoyance I realized would be a good opportunity to strengthen my Haskell, even if the solution probably wasn't worth it in the end.
There's a blog I follow (Fake Nous) that uses WordPress, meaning its comment section mechanics and account system are as convoluted and nightmarish as Haskell's package management. In particular, I wanted to see if I could stop relying on kludgy WordPress notifications that only seem to work occasionally and instead write a web scraper that'd fetch the page, find the recent comments element, and see if a new comment had been posted.
I've done the bulk of the job now: I wrote a Haskell script that outputs the "Name on Post" string of the most recent comment. And I thought it'd be interesting to compare the Haskell solution to Python and Go solutions.
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TupleSections #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE ViewPatterns #-}
import Network.HTTP.Req
import qualified Text.HTML.DOM as DOM
import qualified Text.XML.Cursor as Cursor
import qualified Text.XML.Selector as Selector
import qualified Data.XML.Types as Types
import qualified Text.XML as XML
import Data.Text (Text, unpack)
import Control.Monad
main = do
    resp <- runReq defaultHttpConfig $ req GET (https "fakenous.net") NoReqBody lbsResponse mempty
    let dom = Cursor.fromDocument $ DOM.parseLBS $ responseBody resp
        recentComments = XML.toXMLNode $ Cursor.node $ head $ Selector.query "#recentcomments" $ dom
        newest = head $ Types.nodeChildren recentComments
    putStrLn $ getCommentText newest
getCommentText commentElem =
    let children = Types.nodeChildren commentElem
    in foldl (++) "" $ unwrap <$> children
unwrap :: Types.Node -> String
unwrap (Types.NodeContent (Types.ContentText s)) = unpack s
unwrap e = unwrap $ head $ Types.nodeChildren e
My Haskell clocs in at 25 lines, although if you remove the unused language extensions, it comes down to 21 (the other four are in there just because they're my "go-to" extensions). So 21 is a fairer count. If you don't count imports as lines of code, it can be 13.
Writing this was actually not terribly difficult; of the 5 or so hours I probably put into it in the end, 90% of that time was spent struggling with package management (the worst aspect of Haskell). In the end I finally resorted to Stack even though this is a single-file script that should be able to compile with just ghc.
I'm proud of my work though, and thought it reflected fairly well on a language to do this so concisely. My enthusiasm dropped a bit when I wrote a Python solution:
import requests
from bs4 import BeautifulSoup
file = requests.get("https://fakenous.net").text
dom = BeautifulSoup(file, features='html.parser')
recentcomments = dom.find(id = 'recentcomments')
print(''.join(list(recentcomments.children)[0].strings))
6 lines to Haskell's 21, or 4 to 13. Damn. I'm becoming more and more convinced nothing will ever displace my love for Python.
Of course, you can attribute some of Haskell's relative size to having an inferior library, but still.
Here's a Go solution:
package main
import (
	"fmt"
	"net/http"
	"github.com/ericchiang/css"
	"golang.org/x/net/html"
)
func main() {
	var resp, err = http.Get("https://fakenous.net")
	must(err)
	defer resp.Body.Close()
	tree, err := html.Parse(resp.Body)
	must(err)
	sel, err := css.Compile("#recentcomments > *:first-child")
	must(err)
	// It will only match one element.
	for _, elem := range sel.Select(tree) {
		var name = elem.FirstChild
		var on = name.NextSibling
		fmt.Printf("%s%s%s\n", unwrap(name), unwrap(on), unwrap(on.NextSibling))
	}
}
func unwrap(node *html.Node) string {
	if node.Type == html.TextNode {
		return node.Data
	}
	return unwrap(node.FirstChild)
}
func must(err error) {
	if err != nil {
		panic(err)
	}
}
32 lines, including imports. So at least Haskell came in shorter than Go. I'm proud of you, Has- oh never mind, that's not a very high bar to clear.
It would be reasonable to object that the Python solution is so brief because it doesn't need a main function, whereas real Python applications generally still want one. But even if I add one:
import requests
from bs4 import BeautifulSoup
def main():
    file = requests.get("https://fakenous.net").text
    dom = BeautifulSoup(file, features='html.parser')
    recentcomments = dom.find(id = 'recentcomments')
    print(''.join(list(recentcomments.children)[0].strings))
if __name__ == '__main__': main()
It only clocs in at 8 lines, including imports.
An alternate version of the Go solution that doesn't hardcode the number of nodes (since the Python and Haskell ones don't):
package main
import (
	"fmt"
	"net/http"
	"github.com/ericchiang/css"
	"golang.org/x/net/html"
)
func main() {
	var resp, err = http.Get("https://fakenous.net")
	must(err)
	defer resp.Body.Close()
	tree, err := html.Parse(resp.Body)
	must(err)
	sel, err := css.Compile("#recentcomments > *:first-child")
	must(err)
	// It will only match one element.
	for _, elem := range sel.Select(tree) {
		fmt.Printf("%s\n", textOfNode(elem))
	}
}
func textOfNode(node *html.Node) string {
	var total string
	var elem = node.FirstChild
	for elem != nil {
		total += unwrap(elem)
		elem = elem.NextSibling
	}
	return total
}
func unwrap(node *html.Node) string {
	if node.Type == html.TextNode {
		return node.Data
	}
	return unwrap(node.FirstChild)
}
func must(err error) {
	if err != nil {
		panic(err)
	}
}
Though it ends up being 39 lines.
Maybe Python's lead would decrease if I implemented the second half, having the scripts save the last comment they found in a file, read it on startup, and update if it's different and notify me somehow (email could be an interesting test). I doubt it, but if people like this post I'll finish them.
Edit: I finished them.
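For what it's worth, the "second half" described above (remember the last comment, compare on startup, notify on change) doesn't need much machinery. Here is a minimal Python sketch of just the save-and-compare step. The names `is_new_comment` and `notify` and the state-file approach are assumptions for illustration, not the code from the finished scripts, and `notify` only prints where a real version might send an email.

```python
from pathlib import Path


def is_new_comment(latest: str, state_file: Path) -> bool:
    """Return True when `latest` differs from the comment stored in `state_file`.

    The file is rewritten whenever the comment changes, so the next run
    compares against the most recently seen value.
    """
    previous = state_file.read_text(encoding="utf-8") if state_file.exists() else None
    if latest == previous:
        return False
    state_file.write_text(latest, encoding="utf-8")
    return True


def notify(message: str) -> None:
    # Placeholder (assumption): a real script might send an email here.
    print(f"New comment: {message}")
```

A scraper would call `is_new_comment(text, Path("last_comment.txt"))` with whatever string it extracted, and call `notify(text)` only when that returns True. The file name is hypothetical.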
Discussion (20)
A Haskell one-liner:
Can you give some context for this? When I plug it in, even with all the imports I used, almost everything in there is undefined.
I'm ready with the full version that does the saving and emailing me for all three languages, but I'm holding off on posting now because I don't want to finalize if the Haskell can be improved by that much.
Apologies, I should have thought of this earlier. Anyway, adding more details:
The dependencies can be put in a dev-to.cabal file:
Doing the saving and emailing you would be a simpler addition.
You can clone my repo from github.com/smunix/dev-to
Ah. Still, that doesn't seem to be a complete solution. I ran it with cabal run and the output is the object:
Instead of the text.
I also wouldn't consider that one line. If I were to really use that code, I'd certainly break it into 2-4. Still, it is an impressive improvement! I'll have to look more into those libraries.
One may also argue that your python code uses the beautifulsoup library which has already done the hard work of parsing the html/xml for you!
(though in fairness, I don't know much about haskell or go to comment on how "bare metal" those pieces of code are).
True, Beautiful Soup seems much higher-level than the other libraries. Though I am using at least two non-standard libraries for all languages. Go has great high-level HTTP in the stdlib but needed 2 HTML/traversal libraries just to get there. For Haskell I'm using 5 libraries: req, html-conduit, dom-selector, xml-conduit and xml-types (there might be a way to cut down on those, but I really couldn't find it because some of those libraries are just like 'provides HTML helpers for XML types' or something).
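To make the "hard work" point a bit more concrete, here is a rough sketch of the same extraction using only Python's standard-library `html.parser`, with no Beautiful Soup. It is not what any of the versions above do. The class and function names are made up, the sample markup in the usage note is an assumed shape for the recent-comments widget, and the depth tracking assumes well-formed HTML where every non-void tag is closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FirstCommentParser(HTMLParser):
    """Collect the text of the first child element of #recentcomments."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside #recentcomments (0 = outside)
        self.done = False   # True once the first child element has closed
        self.parts = []     # text fragments found inside the first child

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == "recentcomments":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.done or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 1:
            # Back at container level: the first child just closed.
            self.done = True

    def handle_data(self, data):
        # depth >= 2 means we are somewhere inside the first child element.
        if self.depth >= 2 and not self.done:
            self.parts.append(data)


def newest_comment_text(html: str) -> str:
    parser = FirstCommentParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Fed a hypothetical snippet such as `<ul id="recentcomments"><li><span>Alice</span> on <a href="/p">Some Post</a></li></ul>`, `newest_comment_text` joins the text fragments of the first `<li>`. Roughly 40 lines to replace two Beautiful Soup calls, which is most of the library-versus-language argument in a nutshell.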
I would recommend Colly (github.com/gocolly/colly) to get a better comparison since you are using BeautifulSoup for Python. Both scraper libraries have superb APIs.
Wow! I didn't know about that library. That does much more for me here than even BeautifulSoup! New&Improved Go version:
That gets it down to about the same number of "meaningful" lines as Python. Technically I can drop 2 more lines by putting the function inline, but I wouldn't do that IRL.
Great post, haven't tried Haskell yet, looks interesting. Can you do a performance test on each version? The LOC is surely a factor but knowing the performance would be even better.
Python seems to average about 1.6 seconds. The first run was 3 seconds which is probs because of filesystem caching or TLS resumption. Go is averaging about 1.25 and Haskell about 1.35.
I don't think performance really means much here though, because on such a short program, factors like the time to start the interpreter and parse source code, write to the console, etc, are much more significant than they should be. The Haskell binary dynamically loads 11 system libraries while the Go binary only loads 2 dynamically, and that might account for the speed difference there. I've heard dynamic linking increases startup costs.
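The numbers above are wall-clock averages. One rough way to collect figures like that (not necessarily how they were measured here) is to run each program several times and report the first run separately, since it pays for cold caches. The helper names `time_command` and `summarize` are made up for this sketch.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], runs: int = 5) -> list[float]:
    """Run `cmd` `runs` times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list[float]) -> dict[str, float]:
    """Report the first run on its own and average the remaining runs."""
    rest = durations[1:] or durations
    return {"first": durations[0], "mean_of_rest": statistics.mean(rest)}
```

For example, `summarize(time_command([sys.executable, "scraper.py"]))` with a hypothetical script name would give the first-run time and the mean of the later runs. Network latency still dominates a scraper like this, so expect noisy results.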
Compiling the go binary with cgo disabled will reduce startup time even more (zero dynamic libraries), but you're right in that it's a terrible measure of the general strength of a programming language/runtime.
As a FaaS function, the startup/first request time is important. As a long-running service, that time doesn't matter.
LOC is an even worse measurement (worst? possibly) since I think we've all seen some of the abominable obfuscated C or golfed python solutions.
Putting all of the language battles aside, I'd say the python/bs4 and go/colly solutions are the most easily understandable and implementation-ready options. That says way more about the libraries and their API designs than the programming languages in which they're implemented.
Another point for API design over implementation language.
I know LOC as a blind measurement is problematic but I didn't golf these implementations. I believe LOC of an idiomatic implementation is a reasonable measure. Still flawed of course, but so is every metric.
I appreciate this post, but I still tend to agree with @Providence Salumu that Haskell can also do this in one or two lines depending on what package you might use.
Now, it's a good thing they don't give downvotes here because I'm about to anger many people. I'm not going to comment on Google Go, but while it is true Python has a lot of neat packages written for it, Python is (In My Opinion) a scripting language that is "broken by design"...
No matter how hard you try, Python will never be able to do certain things that Haskell can do (like purity). On the other hand, Haskell can likely be modified to do just about anything Python can.
I continue to be frustrated by coworkers that insist Haskell is difficult to learn and is obscure just because Python got better marketing over the course of the last 20 or so years. Python was adopted by the masses (In My Opinion) because people were convinced that it was a "good" language when in reality, it wasn't all that great. Yes, Python is easy to learn, but that doesn't make it a good language...
Lol, nothing to fear from me. Spicy opinions are fun. I've got a few myself that I haven't posted here mainly for fear of the reception. I'm gonna have to disagree with this one though...
I think this is really moot if not backward. Functional purity (and really all language features) isn't a goal, but a tool for reaching our goals, so it's wrong to describe it as something "Python will never be able to do no matter how hard you try". Lacking that feature doesn't reduce the domain of problems Python can solve. And it surely has features Haskell can never replicate, like breakpoint, default arguments, or proper struct inheritance.
Not to mention, isn't it a tiny minority of languages that support language-enforced functional purity? Even among other compiled languages?
There are use cases Python can't do that Haskell can, like compiling to a shared library to be called from another language. But that applies to all scripting languages. Do you think all scripting languages are broken by design?
To be fair, if you do, I don't find that totally unreasonable. I prefer compiled languages and abhor not having type checking. My opinion of Python is more that it has bad core design in a couple areas, but is so much more practical than languages that try to be perfect and fail. Basically "the best that the wrong way of doing things can provide", while most other languages are "the worst that the right way of doing things can provide".
About that... something I started thinking recently was that Haskell is the only language I know that tries to be perfect. The others don't seem like their designers ever intended to make something that would revolutionize programming. Go is the epitome of this, as its designers have said something like "Go isn't meant to advance programming theory, it's meant to advance programming practice" (translation: we don't want a good design, we just want to special-case the common stuff).
It's true that being easy to learn doesn't make it a good language, but I think it does count toward it. After all, tools exist to make work more efficient, so a tool that takes more time to learn is, all other things the same, a worse tool. And there is no way Haskell's learning curve is all due to inadequate tutorials and documentation. It's conceptually arcane.
I don't necessarily agree with you totally, but your goal of objectivity is at least refreshing so I liked your post. Thanks.
I was expecting performance write up.
Lol
Yes, I would tend to think "reasonably well written" Haskell would outperform Python, but I think this is too small of an example to really make a determination. I can't comment on Google Go. By "reasonably well written" I mean Haskell that is written without too many "rookie" mistakes like I might make. While Haskell (in general) likely has good performance, I think newbies such as myself can sometimes wind up doing things that are logically correct but inappropriate from an efficiency perspective.
Me too...
Would love to see more posts like this!