DEV Community

Tib

Check links with HTTP::Simple

This is the sequel to "Check links programmatically (with Perl)".

This time, I use HTTP::Simple.

Rewrite with HTTP::Simple

I started again from my previous simple code, which uses <> (it reads the files named on the command line, or standard input when fed through a pipe, one line at a time).

It is a loop that reads a link and prints "✓" or "✗" depending on the status of the link.

HTTP::Simple provides almost the same API as LWP::Simple, with slightly different behaviour and some extensions.

So apart from the use HTTP::Simple line and the surrounding eval, this is the same code.

(If something is unclear in the following code, check the previous blog post.)

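#!/usr/bin/env perl

use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';
use Term::ANSIColor;
use HTTP::Simple;

$| = 1;    # unbuffered output, so "Checking..." shows up before the request

while (<>) {
    chomp;
    my $link = $_;
    print "Checking [$link]...";
    eval { get($link) };    # get throws on connection and HTTP errors
    if ($@) {
        print color('red') . " \x{2717}\n" . color('reset');      # cross mark
    } else {
        print color('green') . " \x{2713}\n" . color('reset');    # check mark
    }
}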

Exceptions and status

Interestingly, some HTTP::Simple functions throw exceptions. Read the documentation carefully: some throw on both connection and HTTP errors (like get), while others throw only on connection errors (like getprint).

With get, I can't easily retrieve the status. I could have printed the exception, but it also includes the line number where the exception occurred, which is not very pretty.
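
If printing the exception is still tempting, one option is to strip the location suffix that Perl appends to die/croak messages. This is only a minimal sketch: clean_error is a hypothetical helper, not part of HTTP::Simple, and the simulated message imitates the usual "message at FILE line N." shape rather than quoting the module's exact wording.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTTP::Simple): remove the
# " at FILE line N." suffix that die/croak append to a message.
sub clean_error {
    my ($error) = @_;
    $error =~ s/\A(.*) at .+ line \d+\.?\s*\z/$1/s;
    return $error;
}

# Simulated failure; with HTTP::Simple this would be: eval { get($link) }
eval { die "404 Not Found" };
print clean_error($@), "\n" if $@;    # prints "404 Not Found"
```

Messages that do not end with a location suffix are returned unchanged.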

I can get the status from getprint or getstore. Since I don't need to print or store anything, it's a bit overkill, but I can probably handle this 😃. So I tried getstore($link, "/dev/null"), but that did not go down well 😁

My trick is then to use getprint and throw away its output with select. Here is the idea:

my $TRASH;
open($TRASH, '>', '/dev/null') or die "Cannot open /dev/null: $!";

# Later in code
select $TRASH;
print "Trashed\n";
select STDOUT;
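
A small variation on the same idea: select returns the handle that was selected before the call, so it can be restored without hard-coding STDOUT. In this sketch, File::Spec->devnull is an assumption of mine, used only as a portable stand-in for the literal /dev/null path.

```perl
use strict;
use warnings;
use File::Spec;

# Open the null device; File::Spec->devnull is "/dev/null" on Unix.
open(my $trash, '>', File::Spec->devnull)
    or die "Cannot open null device: $!";

my $previous = select $trash;    # select returns the previously selected handle
print "Trashed\n";               # goes to the null device
select $previous;                # restore whatever was selected before

print "Visible\n";               # back on the original handle
```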

Final code

Below are all the final code snippets.

The links are stored in a file, one per line:

http://cpantesters.org
https://img.shields.io/badge/Language-Perl-blue
https://www.perltutorial.org/
http://stratopan.com
https://www.perl.org/

Next to it, I have my checklinks.pl:

#!/usr/bin/env perl

use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';
use Term::ANSIColor;
use HTTP::Simple;

# Sink for the page content that getprint prints
my $TRASH;
open($TRASH, '>', '/dev/null') or die "Cannot open /dev/null: $!";

$| = 1;

while (<>) {
    chomp;
    my $link = $_;
    print "Checking [$link]...";

    select $TRASH;                   # send getprint's output to /dev/null
    my $status = getprint($link);    # returns the HTTP status code
    select STDOUT;
    if (!is_success($status)) {
        print color('red') . " \x{2717} --> $status\n" . color('reset');
    } else {
        print color('green') . " \x{2713}\n" . color('reset');
    }
}
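
Since getprint still throws on connection errors, a single unreachable link would stop the loop above. Here is a hedged sketch of a guard. quiet_status is a hypothetical helper, and the fetcher is passed in as a code reference so that the example runs without network access. In the real script the fetcher would be sub { getprint($_[0]) }.

```perl
use strict;
use warnings;
use File::Spec;

open(my $trash, '>', File::Spec->devnull)
    or die "Cannot open null device: $!";

# Hypothetical helper: call $fetch->($link) with its output discarded.
# Returns ($status, undef) on a response, (undef, $error) if the fetch died.
sub quiet_status {
    my ($fetch, $link) = @_;
    my $previous = select $trash;
    my $status   = eval { $fetch->($link) };
    my $error    = $@;
    select $previous;    # restored even when the fetch died
    return defined($status) ? ($status, undef) : (undef, $error);
}

# Stand-in fetchers, for illustration only.
my $ok_fetch   = sub { print "body"; return 200 };
my $dead_fetch = sub { die "Connection refused\n" };

for my $case ([ok => $ok_fetch], [dead => $dead_fetch]) {
    my ($name, $fetch) = @$case;
    my ($status, $error) = quiet_status($fetch, "http://example.invalid/");
    print defined($status) ? "$name: status $status\n" : "$name: failed: $error";
}
```

With this in place the loop can print a red ✗ with the error text for dead hosts and carry on with the next link.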

Then I execute my link checker through a pipe:

$ cat links.txt | perl checklinks.pl

And I get my pretty link checker report:
[Screenshot: link checker report]

Top comments (1)

Dan Book

I have always felt the exception handling was a bit inconsistent. But glad the inconsistency was actually useful for you since the getprint handling was what you needed. You could also maybe getstore to a File::Temp temporary file path so you don't have to affect the global selected output handle.

The reason getprint and getstore can return the HTTP status is because they don't need to return the content they retrieved. Trying to have get return anything more complicated strays a bit too far from "Simple". There's always the option to just use HTTP::Tiny yourself, the source of get and head are pretty small and should point you in the right direction.
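
For illustration, a minimal sketch of that HTTP::Tiny route could look like the following. It is not taken from HTTP::Simple's source. link_status is a hypothetical helper name, and the 10 second timeout is an arbitrary choice. HTTP::Tiny does not die on request failures; it reports them with the pseudo-status 599.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: GET a URL, discard the body as it arrives,
# and return the status code and reason phrase.
sub link_status {
    my ($url) = @_;
    my $http = HTTP::Tiny->new(timeout => 10);
    my $response = $http->get($url, { data_callback => sub { } });
    return ($response->{status}, $response->{reason});
}

# Usage: perl status.pl URL [URL ...]
for my $url (@ARGV) {
    my ($status, $reason) = link_status($url);
    print "$url: $status $reason\n";
}
```

Note that HTTP::Tiny needs IO::Socket::SSL and Net::SSLeay installed to fetch https links.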