DEV Community

Illia Zub for SerpApi

Posted on • Originally published at serpapi.com

How we reverse-engineered Google Maps pagination

In this story, you'll see how we decoded the URL parameters for pagination on Google Maps. It involved deobfuscating Closure-compiled JavaScript, reverse-engineering Protobuf data structures, and a bit of math. We tried decoding the URL parameters ourselves, tried pbtk, and attempted to outsource the work. In the end, we succeeded after several pair programming sessions.

How Google Maps pagination works

We can get the link for the next page only by clicking on the “next page” button.

Next page button on Google Maps results

Links look like this.

Page 1

https://www.google.com/search?tbm=map&authuser=0&hl=en&gl=us&pb=!4m8!1m3!1d24182.00605141337!2d-74.0083012!3d40.7455096!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m18!2m3!5m1!6e2!20e3!6m11!4b1!23b1!26i1!27i1!41i2!45b1!63m0!67b1!73m0!74i150000!89b1!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m57!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m42!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1sjWLeXbmnHIXt-gTm2ouwDg%3A23!2zMWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6MjM!7e81!24m40!1m12!13m6!2b1!3b1!4b1!6i1!8b1!9b1!18m4!3b1!4b1!5b1!6b1!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m2!1e3!1e6!24b1!25b1!26b1!30m1!2b1!36b1!43b1!52b1!55b1!56m2!1b1!3b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m9!3b1!4b1!6b1!8m2!1b1!3b1!9b1!12b1!14b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m40!1m39!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIMHKBc!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIQHKBg!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIUHKBk!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIYHKBo!10m2!1m1!1e2!3m1!1u2!3m1!1u1!3m1!1u3!4BIAE!59BQ2dBd0Fn&q=Coffee&tch=1&ech=1&psi=jWLeXbmnHIXt-gTm2ouwDg.1574855312836.1

Page 2

https://www.google.com/search?tbm=map&authuser=0&hl=en&gl=us&pb=!4m8!1m3!1d24182.00605141337!2d-74.0083012!3d40.7455096!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m18!2m3!5m1!6e2!20e3!6m11!4b1!23b1!26i1!27i1!41i2!45b1!63m0!67b1!73m0!74i150000!89b1!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m57!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m42!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1sjWLeXbmnHIXt-gTm2ouwDg%3A78!2zMWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6Nzg!7e81!24m40!1m12!13m6!2b1!3b1!4b1!6i1!8b1!9b1!18m4!3b1!4b1!5b1!6b1!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m2!1e3!1e6!24b1!25b1!26b1!30m1!2b1!36b1!43b1!52b1!55b1!56m2!1b1!3b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m9!3b1!4b1!6b1!8m2!1b1!3b1!9b1!12b1!14b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m40!1m39!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIMHKBc!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIQHKBg!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIUHKBk!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIYHKBo!10m2!1m1!1e2!3m1!1u2!3m1!1u1!3m1!1u3!4BIAE!59BQ2dBd0Fn&q=Coffee&tch=1&ech=2&psi=jWLeXbmnHIXt-gTm2ouwDg.1574855312836.1

They lead to an f.txt file that contains the next page of results.

The Plan

  1. Find out how the pb (protobuf) string is constructed.
  2. Generate the next page link by setting the required parameters.
  3. Catch and parse the data.

Decoding the URL parameters for Google Maps pagination

Google Maps URLs contain the pb parameter, which holds string-encoded Protobuf. The format is the same as for the data parameter in the browser URL on Google Maps: a list of !-separated values. There are several answers on StackOverflow, gists on GitHub, some blog posts about decoding, and even a semi-official guide on reverse engineering Protobuf, but none of them touch on pagination.

We tried to use pbtk, but it crashed without extracting the structures. Several attempts at reading the pretty-printed obfuscated JavaScript didn't get us anywhere either.

After pairing with Milos, we identified most of the variables in the Google Maps pagination URI: latitude, longitude, altitude_in_feets, pagination_offset, and a parameter equal to psi whose meaning we don't know. psi changes after each page reload and is available in window.APP_OPTIONS[11].

psi parameter in the window.APP_OPTIONS on Google Maps
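One place where psi shows up is the 2z token inside the 22m3 group of the Page 1 and Page 2 links. That token looks like URL-safe Base64 without padding, so a small sketch can decode it. The helper name decode_urlsafe_base64 is ours, and what the decoded fields mean is not something we know; the sketch only shows what is inside.

```ruby
# Decode a URL-safe Base64 value that may be missing its "=" padding.
def decode_urlsafe_base64(value)
  padded = value + "=" * ((4 - value.length % 4) % 4)
  padded.tr("-_", "+/").unpack1("m")
end

# The 2z values copied from the Page 1 and Page 2 links above.
page1 = "MWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6MjM"
page2 = "MWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6Nzg"

puts decode_urlsafe_base64(page1) # 1i:2,t:12696,e:1,p:jWLeXbmnHIXt-gTm2ouwDg:23
puts decode_urlsafe_base64(page2) # 1i:2,t:12696,e:1,p:jWLeXbmnHIXt-gTm2ouwDg:78
```

The p: field carries the same value as the psi query parameter, and the trailing number (23 vs 78) is the only part that differs between the two pages.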

Another moving part is a list of filters, but we don't know how to parse them.

List of filters in Google Maps URLs for pagination

We realized that we could make the first request to Google Maps, extract the variables, and construct the pagination URI ourselves, as we do in our API to scrape YouTube.

if offset
  long ||= -73.91476977539236
  lat ||= 40.68525694561602
  alt ||= 120027.44487325678

  offset ||= 40

  psi ||= "b24JYPPGOoaJrwTXlbHACw"

  google_query = "#{query_scheme_and_domain}/search?tbm=map&authuser=0&hl=en&gl=ua&pb=!4m12!1m3!1d#{alt}!2d#{lat}!3d#{long}!2m3!1f0!2f0!3f0!3m2!1i1920!2i549!4f13.1!7i20!8i#{offset}!10b1!12m8!1m1!18b1!2m3!5m1!6e2!20e3!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1s#{psi}!2s1i%3A2%2Ct%3A12696%2Ce%3A1%2Cp%3A#{psi}%3A1273!7e81!24m56!1m16!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m7!3b1!4b1!5b1!6b1!9b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i549!1m6!1m2!1i1870!2i0!2m2!1i1920!2i549!1m6!1m2!1i0!2i0!2m2!1i1920!2i20!1m6!1m2!1i0!2i529!2m2!1i1920!2i549!31b1!34m16!2b1!3b1!4b1!6b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m73!1m68!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNUJKBg!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNYJKBk!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNcJKBo!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNgJKBs!10m2!1m1!1e2!2m7!1u16!4sVisited!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNkJKBw!10m2!16m1!1e1!2m7!1u16!4sHaven%27t+visited!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNoJKB0!10m2!16m1!1e2!3m1!1u3!3m1!1u2!3m2!1u1!3e1!3m11!1u16!2m4!1m2!16m1!1e1!2sVisited!2m4!1m2!16m1!1e2!2sHaven%27t+visited!4BIAE!2e2!3m2!1b1!3b1!59BQ2dBd0Fn!65m0!69i540&q=#{safe_query}&tch=1&ech=1&psi=#{psi}.1611230833287.1"
end

The quick check confirmed that the algorithm works.

$ bundle exec rails runner 'puts Search.new(engine: :google_maps, q: "coffee", lat: 36.3996184, long: -113.9511419, alt: 2124931.1267513777, offset: 20, psi: "rZkJYJuoINHmrgTP1IVI").query_randomized' | xargs curl -s -k -x $HTTP_PROXY -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.46' - > f.txt

# "https://www.google.com/search?tbm=map&authuser=0&hl=en&gl=ua&pb=!4m8!1m3!1d2124931.1267513777!2d36.3996184!3d-113.9511419!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1sgpYJYKSwMsH9rgT0_7nAAg!2zMWk6Mix0OjEyNjk2LGU6MSxwOmdwWUpZS1N3TXNIOXJnVDBfN25BQWc6MjM!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m17!2b1!3b1!4b1!6b1!7b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m72!1m68!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMgJKBY!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMkJKBc!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMoJKBg!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMsJKBk!10m2!1m1!1e2!2m7!1u16!4sVisited!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMwJKBo!10m2!16m1!1e1!2m7!1u16!4sHaven%27t+visited!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCM0JKBs!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2sVisited!2m4!1m2!16m1!1e2!2sHaven%27t+visited!3m1!1u2!3m2!1u1!3e0!3m1!1u3!4BIAE!2e2!3m1!3b1!59BQ2dBd0Fn!65m0!69i540&q=coffee&tch=1&ech=1&psi=rZkJYJuoINHmrgTP1IVI.1611241093531.1"

# Hit next page in browser, copy URL and curl it.

$ curl -s -k -x $HTTP_PROXY -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.46' 'https://www.google.com/search?tbm=map&authuser=0&hl=en&gl=ua&pb=!4m8!1m3!1d2124931.1267513777!2d-113.9511419!3d36.3996184!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1srZkJYJuoINHmrgTP1IVI!2zMWk6Mix0OjEyNjk2LGU6MSxwOnJaa0pZSnVvSU5IbXJnVFAxSVZJOjIz!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m17!2b1!3b1!4b1!6b1!7b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m72!1m68!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCN8KKBY!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOAKKBc!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOEKKBg!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOIKKBk!10m2!1m1!1e2!2m7!1u16!4sVisited!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOMKKBo!10m2!16m1!1e1!2m7!1u16!4sHaven%27t+visited!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOQKKBs!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2sVisited!2m4!1m2!16m1!1e2!2sHaven%27t+visited!3m1!1u2!3m2!1u1!3e0!3m
1!1u3!4BIAE!2e2!3m1!3b1!59BQ2dBd0Fn!65m0!69i540&q=coffee&tch=1&ech=1&psi=rZkJYJuoINHmrgTP1IVI.1611241905488.1' > correct.txt

# Compare f.txt and correct.txt in a text editor - they are almost the same.

We didn't want to add new parameters (long, lat, alt) to our API just for pagination, so we looked for ways to derive alt from zoom. But the formulas we found didn't produce the altitude that Google Maps uses in pagination URLs.

Also, altitude depends on the number of pixels per inch, which differs between devices, and Google re-scales the map to fit all places on it. (This turned out to be irrelevant.) Milos combined multiple formulas from the Google Maps JS code into a single formula that converts altitude to zoom for a given latitude.

const EARTH_RADIUS_IN_METERS = 6371010
const TILE_SIZE = 256
const SCREEN_PIXEL_HEIGHT = 768

var zoom = (altitude, latitude) => Math.log(1 / Math.tan(Math.PI / 180 * 13.1 / 2) * (SCREEN_PIXEL_HEIGHT / 2) * 2 * Math.PI / (TILE_SIZE * altitude / (EARTH_RADIUS_IN_METERS * Math.cos(Math.PI / 180 * latitude)))) / Math.LN2;

zoom(24182.00605141337, 40.7455096);

// => 14

The last missing part is the reverse formula.

Convert zoom to an altitude

We've simplified the zoom formula in Wolfram Alpha.

The initial formula for zoom = f(alt, lat)

The simplified formula in Wolfram Alpha. Pretty nice, huh?

After several hours of reading middle-school math books on logarithmic equations, we reversed the formula.

z = f(alt, lat); alt = f(z, lat); and my reflection on the whiteboard
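For readers who can't see the whiteboard, the inversion can be written out as follows. This is our reconstruction from the JavaScript zoom function above, with R the Earth radius, H the screen pixel height, T the tile size, φ the latitude, and θ = 13.1° (the literal from the JS code; calling it an angle is our reading of how it is used).

```latex
\begin{aligned}
z &= \log_2\!\left(\frac{\cot(\theta/2)\cdot\frac{H}{2}\cdot 2\pi}{T\cdot\mathrm{alt}\,/\,(R\cos\varphi)}\right)
   = \log_2\!\left(\frac{\pi\cot(\theta/2)\,R\,H\cos\varphi}{T\cdot\mathrm{alt}}\right) \\[4pt]
2^{z} &= \frac{\pi\cot(\theta/2)\,R\,H\cos\varphi}{T\cdot\mathrm{alt}} \\[4pt]
\mathrm{alt} &= \frac{\pi\cot(\theta/2)\,R\,H\cos\varphi}{2^{z}\,T},
\qquad \pi\cot(6.55^\circ)\approx 27.3611
\end{aligned}
```

The constant 27.3611 in the Ruby code is therefore π·cot(13.1°/2), folded together with R and H into a single factor.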

The Ruby code of alt = f(zoom, lat):

EARTH_RADIUS_IN_METERS = 6371010
TILE_SIZE = 256
SCREEN_PIXEL_HEIGHT = 768
RADIUS_X_PIXEL_HEIGHT = 27.3611 * EARTH_RADIUS_IN_METERS * SCREEN_PIXEL_HEIGHT

def altitude(zoom, latitude)
  (RADIUS_X_PIXEL_HEIGHT * Math.cos((latitude * Math::PI) / 180)) / ((2 ** zoom) * TILE_SIZE)
end
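A quick way to gain confidence in the reversed formula is a round trip: convert a zoom to an altitude and back. The sketch below is a Ruby port of the JavaScript zoom function next to the altitude formula, wrapped in a module named ZoomCheck (our name) so that it doesn't clash with the constants above. It computes the 27.3611 factor from the 13.1 literal instead of hard-coding it.

```ruby
module ZoomCheck
  EARTH_RADIUS_IN_METERS = 6_371_010
  TILE_SIZE = 256
  SCREEN_PIXEL_HEIGHT = 768
  VIEW_ANGLE_DEGREES = 13.1 # the 13.1 literal from the JS formula; the name is ours

  module_function

  def radians(degrees)
    degrees * Math::PI / 180
  end

  # The part of the JS formula that depends on neither altitude nor latitude.
  def viewport_factor
    (1 / Math.tan(radians(VIEW_ANGLE_DEGREES) / 2)) * (SCREEN_PIXEL_HEIGHT / 2.0) * 2 * Math::PI
  end

  # Port of the JavaScript zoom(altitude, latitude).
  def zoom(altitude, latitude)
    numerator = viewport_factor * EARTH_RADIUS_IN_METERS * Math.cos(radians(latitude))
    Math.log2(numerator / (TILE_SIZE * altitude))
  end

  # The reversed formula, alt = f(zoom, lat).
  def altitude(zoom, latitude)
    numerator = viewport_factor * EARTH_RADIUS_IN_METERS * Math.cos(radians(latitude))
    numerator / ((2**zoom) * TILE_SIZE)
  end
end

latitude = 40.7455096
alt = ZoomCheck.altitude(14, latitude)

puts alt                                  # close to the 1d24182.006... value in the Page 1 link
puts ZoomCheck.zoom(alt, latitude)        # back to (approximately) 14
puts ZoomCheck.viewport_factor / ZoomCheck::SCREEN_PIXEL_HEIGHT # approximately 27.3611
```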

Code to generate pb parameter

The pb parameter for Google Maps pagination is a function of ll from the URL and the offset.

ll can contain a negative or positive latitude and longitude, plus the zoom. We decided to extract those parameters with a regular expression.

PAGINATION_PARAMETERS_REGEX = %r{
  \A                                      # Start of string
  (?:\s*)                                 # initial possible whitespace
  @(?<latitude>[-+]?\d{1,2}(?:[.,]\d+)?)  # latitude: @10.78472
  (?:\s*,\s*)                             # separator between latitude and longitude
  (?<longitude>[-+]?\d{1,3}(?:[.,]\d+)?)  # longitude: -110
  (?:\s*,\s*)                             # separator between longitude and zoom
  (?<zoom>\d{1,2}(?:[.,]\d+)?)z           # zoom: 9.22
  \z                                      # End of string
}x
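As a usage sketch, this is how the named captures behave on a few inputs. LL_REGEX is a compact single-line copy of the pattern above (the name is ours, chosen to keep the example self-contained), and the sample ll values are built from coordinates that appear elsewhere in this post.

```ruby
# Compact copy of PAGINATION_PARAMETERS_REGEX without the inline comments.
LL_REGEX = /\A\s*@(?<latitude>[-+]?\d{1,2}(?:[.,]\d+)?)\s*,\s*(?<longitude>[-+]?\d{1,3}(?:[.,]\d+)?)\s*,\s*(?<zoom>\d{1,2}(?:[.,]\d+)?)z\z/

def extract_ll(ll)
  match = ll.match(LL_REGEX)
  return nil unless match

  { latitude: match[:latitude], longitude: match[:longitude], zoom: match[:zoom] }
end

p extract_ll("@40.7455096,-74.0083012,14z")
# {:latitude=>"40.7455096", :longitude=>"-74.0083012", :zoom=>"14"} (newer Rubies print latitude: "40.7455096", ...)

p extract_ll(" @10.78472, -110, 9.22z") # whitespace around separators is allowed
p extract_ll("40.7455096,-74.0083012,14z") # nil: the leading @ is required
p extract_ll("@40.7455096,-74.0083012")    # nil: the zoom with its z suffix is required
```

The captures stay strings, which is why the pagination code calls to_f only where it needs numbers and passes the original text through everywhere else.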

Conversion of the ll URL parameter to pb for the specific results offset (start) on Google Maps looks like this.

def pagination(ll, start)
  extracted_parameters = ll.match(PAGINATION_PARAMETERS_REGEX)

  return "" unless extracted_parameters

  "!4m8!1m3!1d" +
    altitude(extracted_parameters[:zoom].to_f, extracted_parameters[:latitude].to_f).to_s +
    "!2d" +
    extracted_parameters[:longitude] +
    "!3d" +
    extracted_parameters[:latitude] +
    "!3m2!1i1024!2i768!4f13.1!7i20!8i" +
    (start || 0).to_s + # works for both Integer and String offsets
    "!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1s!2z!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m16!2b1!3b1!4b1!6b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m53!1m49!2m7!1u3!4s!5e1!9s!10m2!3m1!1e1!2m7!1u2!4s!5e1!9s!10m2!2m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2s!2m4!1m2!16m1!1e2!2s!3m1!1u2!3m1!1u3!4BIAE!2e2!3m1!3b1!59B!65m0!69i540"
end

Putting it all together

# https://regex101.com/r/nOoiJ6/2
PAGINATION_PARAMETERS_REGEX = %r{
  \A                                      # Start of string
  (?:\s*)                                 # initial possible whitespace
  @(?<latitude>[-+]?\d{1,2}(?:[.,]\d+)?)  # latitude: @10.78472
  (?:\s*,\s*)                             # separator between latitude and longitude
  (?<longitude>[-+]?\d{1,3}(?:[.,]\d+)?)  # longitude: -110
  (?:\s*,\s*)                             # separator between longitude and zoom
  (?<zoom>\d{1,2}(?:[.,]\d+)?)z           # zoom: 9.22
  \z                                      # End of string
}x

EARTH_RADIUS_IN_METERS = 6371010
TILE_SIZE = 256
SCREEN_PIXEL_HEIGHT = 768
RADIUS_X_PIXEL_HEIGHT = 27.3611 * EARTH_RADIUS_IN_METERS * SCREEN_PIXEL_HEIGHT

def pagination(ll, start)
  extracted_parameters = ll.match(PAGINATION_PARAMETERS_REGEX)

  return "" unless extracted_parameters

  "!4m8!1m3!1d" +
    altitude(extracted_parameters[:zoom].to_f, extracted_parameters[:latitude].to_f) +
    "!2d" +
    extracted_parameters[:longitude] +
    "!3d" +
    extracted_parameters[:latitude] +
    "!3m2!1i1024!2i768!4f13.1!7i20!8i" +
    (start || 0).to_s + # works for both Integer and String offsets
    "!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1s!2z!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m16!2b1!3b1!4b1!6b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m53!1m49!2m7!1u3!4s!5e1!9s!10m2!3m1!1e1!2m7!1u2!4s!5e1!9s!10m2!2m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2s!2m4!1m2!16m1!1e2!2s!3m1!1u2!3m1!1u3!4BIAE!2e2!3m1!3b1!59B!65m0!69i540"
end

def altitude(zoom, latitude)
  ((RADIUS_X_PIXEL_HEIGHT * Math.cos((latitude * Math::PI) / 180)) / ((2 ** zoom) * TILE_SIZE)).to_s
end
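To show where the generated pb value ends up, here is a hedged sketch that builds one URL per page. maps_search_url is a hypothetical helper, fake_pagination is a stand-in for the pagination(ll, start) method above (it returns only two tokens so the example stays short), and treating 7i20 as a page size of 20 is our assumption based on the links in this post. The sketch leaves out psi and does not verify that Google accepts the resulting request.

```ruby
require "uri"

PAGE_SIZE = 20 # assumption: mirrors the 7i20 token in the pb strings above

# Hypothetical helper: query parameters copied from the links earlier in this post.
def maps_search_url(pb, query)
  "https://www.google.com/search?tbm=map&authuser=0&hl=en&gl=us" \
    "&pb=#{pb}&q=#{URI.encode_www_form_component(query)}&tch=1&ech=1"
end

# Stand-in for pagination(ll, start); the real method returns the full pb string.
fake_pagination = ->(_ll, start) { "!7i#{PAGE_SIZE}!8i#{start}" }

ll = "@40.7455096,-74.0083012,14z"

# Offsets 0, 20 and 40 for the first three pages.
urls = (0..2).map do |page|
  maps_search_url(fake_pagination.call(ll, page * PAGE_SIZE), "Coffee")
end

puts urls
```

The pb value is interpolated as-is so that the ! separators stay readable, as they are in the browser links; only the query text is escaped.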

Parse the paginated data

We already have an API to scrape Google Maps that extracts data from the inline JavaScript in the HTML. Milos refactored it to support extraction both from inline JS in HTML and from pagination responses. All we can say here is that our parser pulls the data out of deeply nested arrays and objects.
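As a rough illustration of what reading deeply nested arrays can look like in Ruby, here is a sketch built on Array#dig. The payload, the index paths, and the safe_dig helper are all made up for this example; they are not the real Google Maps response layout and not our production parser.

```ruby
require "json"

# Made-up payload: NOT the real response structure.
raw = '[["Coffee",[[null,["Cafe A",null,[40.74,-74.0]]],[null,["Cafe B",null,[40.75,-73.99]]]]]]'
payload = JSON.parse(raw)

# dig returns nil for missing indexes but raises TypeError when it hits
# a value that cannot be dug into (a String or a number), so rescue that too.
def safe_dig(data, *path)
  data.dig(*path)
rescue TypeError
  nil
end

places = safe_dig(payload, 0, 1).map do |entry|
  {
    name: safe_dig(entry, 1, 0),
    latitude: safe_dig(entry, 1, 2, 0),
    longitude: safe_dig(entry, 1, 2, 1)
  }
end

places.each { |place| puts "#{place[:name]}: #{place[:latitude]}, #{place[:longitude]}" }
```

Keeping every index path in one place like this makes it easier to adjust when the upstream structure shifts.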


It's possible to extract data from complex single-page applications without browser automation. For us, it's more fun to understand how the scraped website works instead of tuning timeouts in waitFor function calls. It also runs faster and is simpler to maintain. If this is something that excites you, we'd love for you to join us.
