Mateusz Jasiński

Originally published at wizarddos.github.io

PHP 0 to hero pt.16 - .htaccess and robots.txt files

Welcome - sadly or not, this is the last part of this series

We've finally come to the end - today I won't be talking about PHP itself

You know, whenever we wrote code, our addresses always ended with .php. Like /login.php

But it can be better

We want something more appealing, like /login, don't we?
That's a perfect example of how useful a .htaccess file can be

Apart from that, I'll also explain the usage and importance of the robots.txt file - and, along the way, its security issues

What will we learn today?

  1. When can we use .htaccess
  2. How to add SEO-friendly addresses to our website
  3. Adding error pages
  4. Basic robots.txt structures

No more talking - let's go

When can we use .htaccess files?

.htaccess is a configuration override mechanism for the Apache HTTP Server - and only for it

So, if you are using Nginx or Microsoft IIS - sorry to say, but you'll need to use their native solutions.

Right now (I assume) we are working with an Apache server, so using .htaccess shouldn't be a problem for us

Also - a quick disclaimer

Using .htaccess files, despite their popularity, comes with some flaws.

.htaccess slows down requests - it's a server-side mechanism, so it can't be cached in the browser. Every request requires reading and applying the .htaccess file(s)

99% of the things we'll be doing in .htaccess can also be done in the main Apache config file - httpd.conf

If you have write access to it (like on a VPS) - use it as much as you can. Your site will perform much better

Also, .htaccess is very vulnerable to syntax errors - one minor typo and you get a 500 error all over your website

So to sum up - use .htaccess files only if you don't have access to httpd.conf (for example on shared hosting), and use only one at a time
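For example, on a server where you can edit httpd.conf, the same directives can go into a <Directory> block there, and .htaccess lookups can then be switched off entirely. A minimal sketch - the document root path is an assumption, adjust it to your setup:

# In httpd.conf (or a virtual host file)
<Directory "/var/www/html">
    # Put the directives you would normally keep in .htaccess here,
    # then turn off .htaccess processing so Apache stops looking for the file
    AllowOverride None
</Directory>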

With this said - let's go

SEO-friendly addresses

While browsing the internet, you've probably seen websites with links like http://domain.com/home - that's where .htaccess comes in handy

Using the RewriteEngine (Apache's mod_rewrite module), we can create such addresses for our pages

First - enable RewriteEngine

RewriteEngine on

Then, we can start adding our rules

RewriteRule ^home$ index.html [L]
RewriteRule ^about-us$ aboutus.html [L]
RewriteRule ^privacy-policy$ privacy-policy.html [L]

The general pattern looks like this:

RewriteRule ^[new address]$ [file] [L]

The ^ and $ anchor the pattern so it matches only that exact address (otherwise a request for /homework would also match home), and the [L] flag tells Apache to stop processing further rules once this one matches.
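The same mechanism also works for PHP pages with parameters - a capture group in the pattern can be passed on to the query string. A quick sketch (article.php and the id parameter are just example names):

# /article/42 is internally served by article.php?id=42
RewriteRule ^article/([0-9]+)$ article.php?id=$1 [L,QSA]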

When creating them, pay attention to letter case - write everything in lowercase and use dashes (kebab-case) to separate words

Why?

  1. Lowercase helps SEO - a search engine might otherwise index My-page twice, as My-page and my-page, which makes you lose positions in search results
  2. Dashes instead of spaces - when Google sees my-page, it treats the dash as a word separator, so the address matches typical searches for "my page"

On the other hand, an underscore is treated as a word joiner rather than a separator - so my_page may be read as the single word mypage, which matches far fewer searches
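If you're renaming an already-published page, it's also worth redirecting the old address to the new one so search engines carry its position over. A sketch with made-up page names:

# Permanently redirect the old underscore address to the hyphenated one
RewriteRule ^my_page$ /my-page [R=301,L]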

Error Documents

From part 8 we know that sometimes websites don't work as we want them to - that's when 4xx and 5xx errors appear

Now, displaying the default Apache 404 page looks unprofessional - many websites have their own error pages, with a matching theme and some funny images

We can do this in .htaccess as well - for example, for the 404 page

ErrorDocument 404 /404.php

One question - why does the path start with /?

For ErrorDocument, a local path beginning with / is resolved against the website's root directory (/var/www/html or C:/xampp/htdocs), so Apache serves the error page internally and the visitor still receives the 404 status code.

Be careful with full URLs like https://example.com/404.php - in that case Apache sends the browser a redirect to that address instead, so the original error code never reaches the client.
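The same directive covers other error codes too, so a small set of custom pages might look like this (the file names are just examples):

ErrorDocument 403 /errors/403.php
ErrorDocument 404 /errors/404.php
ErrorDocument 500 /errors/500.php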

What is robots.txt?

Imagine this:

You've created a website. What to do now?

Why not publish it and get it indexed (made visible) on Google and other search engines?

But we don't really want to show our style sheets or account configuration panel in Google, do we?
That's the main purpose of robots.txt files

Google defines it as

[robots.txt] tells search engine crawlers which URLs the crawler can access on your site

I highly recommend getting familiar with Google's robots.txt docs before publishing your website

Now, let's say we don't want topsecretpage.php to be indexed.
We need to specify:

  1. Which crawlers this rule should apply to
User-agent: *

* stands for all (just like in SQL)

  2. Which pages crawlers can't (or can) access
Disallow: /topsecretpage.php

Or

Allow: /openpage.php

In robots.txt we also declare the sitemap - an XML file listing every sub-page of your website

Sitemap: https://www.example.com/sitemap.xml
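Putting it all together, a simple robots.txt might look like this (the paths are just examples):

User-agent: *
Disallow: /topsecretpage.php
Disallow: /css/
Allow: /openpage.php

Sitemap: https://www.example.com/sitemap.xml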

Security issues

Don't use robots.txt to hide pages from everyone - this file is easily accessible.

For example - this is YouTube's robots.txt

[screenshot: robots.txt from YouTube]

And I didn't hack YT or anything - just went to https://www.youtube.com/robots.txt

So yeah, robots.txt is fine for keeping style sheets or pages that require logging in out of search results - but don't list credentials or employee-only files in there
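If a file or directory really must stay private, block access to it at the server level instead of just hiding it from crawlers - for example, with a .htaccess file placed in that directory. A minimal sketch, assuming Apache 2.4:

# Deny all HTTP access to this directory (Apache 2.4 syntax)
Require all denied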

Conclusion

As you've seen, this was more of a site administration article - but hey!
All knowledge is valuable - especially if it revolves around our main interest

Thanks for reading - check out my blog and the rest of this series

They will all get a rewrite soon - so more valuable content and better-summarized knowledge for you.

Little personal afterword

As this is the last part of this "chapter" - I'd like to say something
Thank you so much for every view, comment and like - even though it wasn't much, I'm still glad that I could help someone.

Also, big thanks to everyone who took the time to review this course - from Forum Pasja Informatyki and Reddit

I'll definitely post something like this again - once my work reaches a satisfying level

I've learned a lot, and I hope you have as well - what more can I say?

Thanks for everything - see you when we build some projects (and in some other articles)
