DEV Community

Discussion on: Create a Simple Web Scraper in C#

Matthew F. • Edited

I found this useful, but I admit I got a bit stuck connecting your steps together. To help others in the future, here's a Gist that links everything together.

I admit its output isn't as neat as yours, so I have a mistake somewhere... but it's a start. One quick note: it's WPF rather than WinForms, so take that into consideration for all UI interactions.

Aaron L Carrick

Follow the link Matthew F. posted, but edit these lines and everything will work!

Regarding:
gist.github.com/CodeCommissions/43...

Change this:

        foreach (var term in QueryTerms)
        {
            articleLink = document.All.Where(x =>
                x.ClassName == "views-field views-field-nothing" &&
                (x.ParentElement.InnerHtml.Contains(term) || x.ParentElement.InnerHtml.Contains(term.ToLower())));

            //Overwriting articleLink above means we have to print its result for every term in QueryTerms.
            //Appending to a pre-declared collection (like a List) could mean taking this out of the main loop.
            if (articleLink.Any())
            {
                PrintResults(articleLink);
            }
        }

To this:

        foreach (var term in QueryTerms)
        {
            articleLink = document.All.Where(x =>
                x.ClassName == "views-field views-field-nothing" &&
                (x.ParentElement.InnerHtml.Contains(term) || x.ParentElement.InnerHtml.Contains(term.ToLower()))).Skip(1);

            //Overwriting articleLink above means we have to print its result for every term in QueryTerms.
            //Appending to a pre-declared collection (like a List) could mean taking this out of the main loop.
            if (articleLink.Any())
            {
                PrintResults(articleLink);
            }
        }

Take note of the:

.Skip(1)

The output was ugly because the first element in the IEnumerable was not filtered properly. Instead of spending lots of time filtering through that mess, we simply skip the first element :)
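
If it helps to see what .Skip(1) does on its own, here is a minimal sketch. It uses plain strings as a stand-in for the scraped elements, so the class name SkipDemo, the method DropFirst, and the sample values are all made up for illustration and are not part of the Gist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipDemo
{
    // Hypothetical helper: takes the matched items and drops the first one.
    public static IEnumerable<string> DropFirst(IEnumerable<string> matches)
    {
        // Skip(1) lazily bypasses the first item and yields the rest.
        // On an empty or single-item sequence it yields nothing and does not throw.
        return matches.Skip(1);
    }

    public static void Main()
    {
        // Assumed sample data: the first entry plays the role of the badly filtered match.
        var matches = new List<string> { "unfiltered first match", "Article A", "Article B" };

        foreach (var match in DropFirst(matches))
        {
            Console.WriteLine(match);
        }
    }
}
```

Because Skip(1) returns an empty sequence when there are fewer than two matches, the existing articleLink.Any() check still guards the call to PrintResults. Keep in mind that it drops the first match no matter what it contains, so it only helps as long as the unwanted element really does come first.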

Matthew F.

Thanks for the fix ^_^
I've updated the Gist to include your suggestion.