DEV Community

Ramiro - Ramgen

Extract Data From Websites - Javascript Data Processing

I think this is a great video for people starting out with JavaScript who want to learn more about array manipulation and data extraction.

We are going to use the browser inspector to extract data and put it into useful formats like JSON or CSV files.

Check out the video for this post:

If you like it, follow for more and consider subscribing to the YT channel ramgendeploy 😁

So here's what you'll learn:

  • Document element selection
  • Data processing with JavaScript array methods
    • map
    • reduce
    • filter
  • A JavaScript optional chaining example

So let's go over some snippets:

First, if you are using Chrome: when you select an element in the Elements panel, you can then reference that element in the Console tab with $0. This is useful to look at its children and work out a "route" to the data you want.

Now, the main way to select elements is:

let selEle = document.querySelectorAll('selector')

The selector can be:

  • an element name, like div
  • a class, like .btn
  • an id, like #main
  • a combination in CSS syntax, like: .container > .btn

querySelectorAll accepts any valid CSS selector (attribute selectors like [href], pseudo-classes like :nth-child(2), and so on), but those are the most useful.

After you select all the elements you want, you are going to have a NodeList. A NodeList has forEach, but not array methods like map, filter or reduce, so you need to convert it to an array.

There are a bunch of ways to do this (Array.from(selEle) works too), but here we are going to use the spread operator to create a new array from our NodeList:

let selEleArray = [...selEle]
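A real NodeList only exists in the browser, so here is a minimal sketch that runs in Node, assuming a hypothetical `fakeNodeList` object that imitates one: it has indexed entries, a `length`, and is iterable, but has no `map`. It shows that both the spread operator and `Array.from` give you a real array with the usual array methods.

```javascript
// Hypothetical stand-in for a NodeList (not a real DOM object):
// indexed entries + length, iterable, but no Array methods like map.
const fakeNodeList = {
  0: 'first',
  1: 'second',
  2: 'third',
  length: 3,
  // Borrow the generic array iterator so spread syntax works on it.
  [Symbol.iterator]: Array.prototype[Symbol.iterator],
};

console.log(typeof fakeNodeList.map); // prints: undefined

// Spread needs an iterable; Array.from also accepts plain array-likes.
const viaSpread = [...fakeNodeList];
const viaFrom = Array.from(fakeNodeList);

// Both are real arrays now, so array methods are available.
const upper = viaSpread.map((text) => text.toUpperCase());
console.log(upper, Array.isArray(viaFrom));
```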

With that, we can now use array methods to process our data:

let parsedData = selEleArray.map(
  (item)=>[item.children[0].innerText,item.src, item.innerHTML]
)
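Outside the browser there are no DOM elements, so this sketch assumes a few hypothetical plain objects shaped like elements (`children`, `innerText`, `src`, `innerHTML`). It also covers the two topics from the list above that the snippets don't show: optional chaining (`?.`), so an element without children yields `undefined` instead of throwing a TypeError, and `filter`, to drop those incomplete rows afterwards.

```javascript
// Hypothetical element-like objects; in the browser these would come from
// [...document.querySelectorAll('selector')].
const selEleArray = [
  { children: [{ innerText: 'Cat' }], src: 'https://example.com/cat.png', innerHTML: '<b>Cat</b>' },
  { children: [], src: 'https://example.com/empty.png', innerHTML: '' }, // no children
  { children: [{ innerText: 'Dog' }], src: 'https://example.com/dog.png', innerHTML: '<b>Dog</b>' },
];

// map: turn each element into a row (an array of the values we care about).
// Without ?. the second object would throw, because item.children[0] is undefined.
const parsedData = selEleArray.map((item) => [
  item.children[0]?.innerText,
  item.src,
  item.innerHTML,
]);

// filter: keep only rows where the first value was actually found.
const completeRows = parsedData.filter((row) => row[0] !== undefined);

console.log(completeRows.length); // prints: 2
```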

Now we have an array of arrays, but simply calling console.log(parsedData) is often not enough to copy the data and use it elsewhere: with big outputs, sometimes the browser says "nope, I won't display 1500 lines".

To solve this, we are going to call on our friend JSON and convert the array to a string using JSON.stringify:

JSON.stringify(parsedData)

You don't actually need console.log here; the console prints the result of the expression you type implicitly.

Now, with our data as a JSON string, we can copy it and use it anywhere that supports JSON.
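As a small sketch of what that string looks like, assuming a tiny hypothetical `parsedData` in place of the scraped one: `JSON.stringify` turns the array of arrays into text, its optional third argument pretty-prints it, and `JSON.parse` turns the text back into the same structure wherever you paste it. In Chrome's console you can also wrap the call in the DevTools `copy()` utility to put the string straight on your clipboard; that helper only exists in the console, so it isn't used below.

```javascript
// Hypothetical sample data standing in for the scraped parsedData.
const parsedData = [
  ['Cat', 'https://example.com/cat.png', '<b>Cat</b>'],
  ['Dog', 'https://example.com/dog.png', '<b>Dog</b>'],
];

// Compact string: what JSON.stringify(parsedData) returns in the console.
const compact = JSON.stringify(parsedData);

// Pretty-printed string with 2-space indentation, easier to read in a file.
const pretty = JSON.stringify(parsedData, null, 2);

// Wherever the string ends up, JSON.parse restores the array of arrays.
const restored = JSON.parse(compact);
console.log(restored[1][0]); // prints: Dog
console.log(pretty);
```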

But what if you want a CSV file? Well, .reduce to the rescue.
We are going to take that array and reduce it to a single string in CSV format.

let data_csv = parsedData.reduce(
  (accumulator, current) => {
    // current is one row: [text, src, innerHTML]
    return accumulator + `\n${current[0]},${current[1]},${current[2]}`
  },
  'header_1,header_2,header_3' // initial value: the CSV header row
)

So reduce takes two arguments: a reducer function that runs once for each item in the array, and an initial value. In this case, our initial value is the header row of the CSV file. Whatever the reducer returns becomes the accumulator for the next item, so the string keeps growing row by row.

Then, in each iteration, we append a newline (\n) and the current row's comma-separated values 😂, in order. Notice that we use backtick quotes (a template literal), so the ${...} placeholders are interpolated inside the string.
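One caveat with the CSV snippet: scraped text often contains commas, double quotes or line breaks, which would shift the columns. A common convention (the one described in RFC 4180) is to wrap such a field in double quotes and double any quotes inside it. Here is a minimal sketch of the same reduce with that escaping added; `toCsvField` is a hypothetical helper and `parsedData` is made-up sample data.

```javascript
// Hypothetical helper: quote a value if it contains a comma, a double quote
// or a line break, doubling any embedded quotes (RFC 4180 style).
function toCsvField(value) {
  const text = String(value ?? ''); // undefined/null become empty fields
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Made-up sample rows; the second one contains a comma and quotes.
const parsedData = [
  ['Cat', 'https://example.com/cat.png', '<b>Cat</b>'],
  ['Dog, "good boy"', 'https://example.com/dog.png', '<b>Dog</b>'],
];

// Same idea as before: start from the header row, then append one line per row.
const data_csv = parsedData.reduce(
  (accumulator, current) => accumulator + '\n' + current.map(toCsvField).join(','),
  'header_1,header_2,header_3'
);

console.log(data_csv);
```

Joining each row with map and join also means the code no longer assumes exactly three columns.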

And that's about it. In the video I go more in depth, with a couple more examples, but this is the core concept. Hope you liked it; if you did, drop a like!

Follow for more and consider subscribing to the YT channel ramgendeploy 😁
