How do you feel about the robots HTTP header?
For those who don't know, it's a header you can include in a page's response that tells a web crawler what it's permitted to do with the page. It's not a replacement for robots.txt, and (just like the robots.txt file) the search engines don't have to honour it.
An example of the robots header would be something like:
```
X-Robots-Tag: noarchive, nosnippet
```
This tells any web crawler that fetches the page that it isn't permitted to archive the page or show snippets from it in search results.
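For context, sending the header from an ASP.NET Core app (my usual stack) can be done with a tiny piece of middleware. This is just a sketch, not the one true way to do it, and the directive values are the same illustrative ones from above:

```csharp
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Attach the robots header to every response; a real app would
// probably decide the directives per page rather than globally.
app.Use(async (context, next) =>
{
    context.Response.Headers["X-Robots-Tag"] = "noarchive, nosnippet";
    await next();
});

app.MapGet("/", () => "Hello, crawlers!");

app.Run();
```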
I'm a bit torn on the robots header. On one hand, it allows really fine-grained control on a per-page basis. On the other hand, you have to request the page just to find out whether you're allowed to keep the data, which feels like a waste of bandwidth.
I mean, you could do a HEAD request to find out first, but then you end up making two HTTP requests just to get the content in an "allowed" scenario.
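Just to show what I mean, here's a rough sketch of that two-request dance with HttpClient. The URL is a placeholder, and I'm only checking for noarchive to keep it short:

```csharp
using System;
using System.Linq;
using System.Net.Http;

using var client = new HttpClient();
var url = "https://example.com/page";

// Request 1: HEAD, just to inspect the robots header.
var headResponse = await client.SendAsync(
    new HttpRequestMessage(HttpMethod.Head, url));

var noArchive = headResponse.Headers.TryGetValues("X-Robots-Tag", out var values)
                && values.Any(v => v.Contains("noarchive"));

if (!noArchive)
{
    // Request 2: the GET we actually wanted in the first place.
    var html = await client.GetStringAsync(url);
}
```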
That said, I do see value in the header. I'm actually building my own web crawler (which I'll write another post about in the future), and I want to add support for the header.
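A first pass at that support might be as simple as splitting the header into its comma-separated directives and flagging the ones the crawler understands. A hypothetical sketch (the RobotsTagPolicy class is my own invention, not a real API):

```csharp
// Hypothetical: a tiny parser for the directives a crawler cares about.
public sealed class RobotsTagPolicy
{
    public bool NoIndex { get; private set; }
    public bool NoArchive { get; private set; }
    public bool NoSnippet { get; private set; }

    public static RobotsTagPolicy Parse(string headerValue)
    {
        var policy = new RobotsTagPolicy();

        foreach (var directive in headerValue.Split(','))
        {
            switch (directive.Trim().ToLowerInvariant())
            {
                case "noindex":   policy.NoIndex = true;   break;
                case "noarchive": policy.NoArchive = true; break;
                case "nosnippet": policy.NoSnippet = true; break;
            }
        }

        return policy;
    }
}
```

The crawler could then check something like policy.NoArchive before persisting anything it downloads.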