DEV Community

How to get 250k+ pages indexed by Google

David Künnen on January 04, 2019

When creating Devsnap I was pretty naive. I used create-react-app for my frontend and Go with GraphQL for my backend. A classic SPA with client sid...
 
Koji Kawano

Nice! It makes sense to bypass the Web Rendering Service if the HTML source has everything Google needs to understand your pages. Curious: have you seen a positive impact on Google search traffic to your site after this?

David Künnen

Yeah, a huge one even. You see that last image? That's the traffic change. Today it's even more.

Adrian B.G.

As an alternative to SSR, I used prerender (hub.docker.com/r/bgadrian/docker-p...) as a hosted service, running in a container.

On the first visit it caches the result, and it serves the HTML only to bots.

To speed things up (and have the cache ready when Googlebot arrives), I created this tool that visits all the pages of a website: github.com/bgadrian/warmcache

As for the sitemap, you can create a lot of them and have a master Sitemap that links to the other ones: sitemap index file.
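
A minimal sketch of the sitemap index idea, assuming a flat list of page URLs. The helper name `buildSitemaps`, the file naming scheme, and the example domain are hypothetical; the 50,000-URL cap per file comes from the sitemaps.org protocol.

```javascript
// Hypothetical helper: splits page URLs into several sitemap files and builds
// a sitemap index that links to each of them.
const MAX_URLS_PER_SITEMAP = 50000; // per-file limit in the sitemaps.org protocol
const XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>';
const SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

// Escape the characters that are not allowed raw inside XML text.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemaps(pageUrls, baseUrl, chunkSize = MAX_URLS_PER_SITEMAP) {
  const sitemaps = [];
  for (let i = 0; i < pageUrls.length; i += chunkSize) {
    const entries = pageUrls
      .slice(i, i + chunkSize)
      .map((url) => "  <url><loc>" + escapeXml(url) + "</loc></url>")
      .join("\n");
    sitemaps.push({
      name: "sitemap-" + (sitemaps.length + 1) + ".xml", // assumed naming scheme
      xml: XML_HEADER + '\n<urlset xmlns="' + SITEMAP_NS + '">\n' + entries + "\n</urlset>",
    });
  }

  // The index lists the location of every child sitemap.
  const indexEntries = sitemaps
    .map((s) => "  <sitemap><loc>" + escapeXml(baseUrl + "/" + s.name) + "</loc></sitemap>")
    .join("\n");
  const indexXml =
    XML_HEADER + '\n<sitemapindex xmlns="' + SITEMAP_NS + '">\n' + indexEntries + "\n</sitemapindex>";

  return { indexXml, sitemaps };
}
```

Only the index file then needs to be submitted; the crawler discovers the child sitemaps from it.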

Christian Oliveira

Thanks for sharing your experience!

There is a lot of misunderstanding about how Googlebot deals with JS. Even though it has improved a lot in recent years, for big and/or fast-changing websites, depending on JS to display content and/or links is a very risky thing to do if SEO is a priority.

I noticed you used my image to illustrate Google's 2 waves of indexing. It would be nice if you linked to the original article where you found it :) (christianoliveira.com/blog/en/seo-... )

David Künnen

Hey.. Of course! I loved the article and got a lot of information there. Will add a reference as soon as I'm home again.

Suresh Kumar Gondi • Edited

Hey David, I need a little help with this code:

if (isBot(req)) {
    completeHtml = completeHtml.replace(/<script[^>]*>(?:(?!<\/script>)[^])*<\/script>/g, "")
}

The function isBot: how are you detecting bots with it? Can you please share the code of that function?

I'm going to apply it to a static site and see how it works on a non-JS-based website.

Thanks in advance.
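
The isBot function asked about here is not shown in the thread. A minimal sketch of one common approach, matching the User-Agent request header against known crawler names, could look like the following. The pattern list is an assumption and far from complete, and this is not necessarily how the author implemented it.

```javascript
// Hypothetical sketch, not the author's isBot. The crawler names below are an
// illustrative, incomplete list.
const BOT_PATTERN = /googlebot|bingbot|duckduckbot|baiduspider|yandex/i;

// `req` is assumed to look like a Node.js http.IncomingMessage, where header
// names are lower-cased.
function isBot(req) {
  const userAgent = (req.headers && req.headers["user-agent"]) || "";
  return BOT_PATTERN.test(userAgent);
}
```

Keep in mind that the User-Agent header can be spoofed, so a check like this only identifies clients that claim to be crawlers.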

Ben Halpern

Wow, I’ve never heard of turning off JS for bots. Can you think of any possible downside to this (assuming the site still works appropriately), like Google won’t penalize you or anything?

David Künnen • Edited

From what I understand, that's even how Google wants it. Here is an article from Google itself about it.

Google dynamic rendering

I don't think Google wants to waste a lot of resources to render the JavaScript of a website if the HTML has already completely been rendered by the server.

Be extra careful with JSON-LD tho. You should not remove that:

"Make sure JSON-LD script tags are included in the dynamically rendered HTML of your content. Consult the documentation of your rendering solution for more information."

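One way to follow that advice is to strip script tags selectively instead of all at once. The sketch below is an assumption rather than the author's code: the function name `stripScriptsKeepJsonLd` is hypothetical, and regex-based HTML handling is brittle (for example, it breaks if a script body contains the literal text of a closing script tag).

```javascript
// Hypothetical sketch: removes every <script> element except those whose
// opening tag declares type="application/ld+json".
function stripScriptsKeepJsonLd(html) {
  // Matches one whole script element, from the opening tag to the nearest close.
  const scriptTag = /<script\b[^>]*>[\s\S]*?<\/script>/gi;
  // Matches only when the type attribute sits inside the opening tag.
  const jsonLdType = /^<script\b[^>]*\btype\s*=\s*["']?application\/ld\+json/i;
  return html.replace(scriptTag, (tag) => (jsonLdType.test(tag) ? tag : ""));
}
```

Called on the server-rendered HTML for bot requests, this keeps the structured data in place while dropping the application bundles.
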
Fabio Zammit

Great article! Out of curiosity, is there any reason why you didn't use a library like next.js, since it offers full support for SSR?

I've got quite a bit of experience with technical SEO, and we used to avoid handing Googlebot a different version, as that could be seen as trying to manipulate the bot.

Obviously Googlebot and its metrics are a black box, so sometimes things are not as clear as one would like. Thanks 🙂

David Künnen • Edited

Since React's renderToString method already offers 95% of what I need to create SSR for my website, I felt that using a new library like next.js was overkill.

It would probably even have cost me more time, since I'd have to learn next.js first. This way it only took me a few hours to get SSR done and it's just a tiny and fast codebase doing all that.

As for the manipulation: as Google mentions in this article, it even promotes handling Googlebot differently in exactly this way. It's a really logical thing to do. They save tons of resources if you cut out anything unnecessary for the bot.

After playing around with the bot for some time, I think Google just wants you to make it as easy as possible for it to crawl your site, and it rewards that greatly.

heaven on earth • Edited

Where do I implement this code? I own a WordPress website.

if (isBot(req)) {
    completeHtml = completeHtml.replace(/<script[^>]*>(?:(?!<\/script>)[^])*<\/script>/g, "")
}

David Künnen

This snippet is something I implement server side, but you have to be very careful not to remove the JSON-LD when using it. Since I have no experience with WordPress, unfortunately I can't help you much. :-/

Suresh Kumar Gondi

Hi, good morning,

Just curious about the code part. Does this code work as expected on WordPress? :)

I'm also implementing the same on WordPress.

Thanks

Не в тренде • Edited

After the closing /article tag, if you are not using JSON in widgets.
As an experiment, I added this code to the first sidebar widget.

Suresh Kumar Gondi

David, can we use the same code for any type of website CMS? I mainly work with WordPress. Does the code have to be placed at the top of the head?

Thanks.

David Künnen

This snippet is something I implement server side, but you have to be very careful not to remove the JSON-LD when using it. Since I have no experience with WordPress, unfortunately I can't help you much. :-/

Suresh Kumar Gondi • Edited

Not a problem, David. I'm planning to convert my whole site to plain HTML using a static site generator.

Then the code you mentioned is most likely to work :)

Also, thanks for pointing out the JSON-LD part. I will paste the code below it.

Thank you, David.

Benoît Emile

Hey David, thank you for your post.

Could you please explain where you got your 5th image from? I can't find it in Google Search Console. Is it from somewhere else, like Kibana or something?

David Künnen

Of course. Those are custom statistics I generated from my Google logs (Stackdriver). They show the number of times Googlebot hit my website within each 30-minute window.
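
A rough sketch of how such a statistic could be computed from exported log entries. The entry shape (`timestamp`, `userAgent`) and the function name `countHitsPerWindow` are hypothetical and do not reflect Stackdriver's actual export format or the author's tooling.

```javascript
const WINDOW_MS = 30 * 60 * 1000; // 30 minutes in milliseconds

// Hypothetical helper: counts Googlebot requests per 30-minute window.
// Each log entry is assumed to be { timestamp: <ISO 8601 string>, userAgent: <string> }.
function countHitsPerWindow(logEntries, windowMs = WINDOW_MS) {
  const counts = new Map();
  for (const entry of logEntries) {
    if (!/googlebot/i.test(entry.userAgent || "")) continue; // keep Googlebot only
    const time = Date.parse(entry.timestamp);
    if (Number.isNaN(time)) continue; // skip entries with unparseable timestamps
    // Round down to the start of the window the request falls into.
    const windowStart = new Date(Math.floor(time / windowMs) * windowMs).toISOString();
    counts.set(windowStart, (counts.get(windowStart) || 0) + 1);
  }
  return counts; // Map of window start (ISO string, UTC) to hit count
}
```

Plotting the resulting counts over time gives a crawl-rate chart of the kind described.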

Nil Portugués Calderó

Super awesome, thanks for sharing.

I'm going to try all of these. Thanks!

melonique

Awesome post! thank you!

I was wondering tho, one year later, do you still recommend doing it?