DEV Community

Daily Challenge #89 - Extract domain name from URL

dev.to staff on October 12, 2019

Write a function that, when given a URL as a string, returns only the domain name as a string. domainName(https://twitter.com/explore) == "twitt...

I'm Luis! \^-^/ • Oct 12 '19 • Edited

Javascript!

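function domainName(domain) {
  // Let the browser parse the URL through an anchor element (browser-only)
  const a = document.createElement('a');
  a.href = domain;
  const { hostname } = a;
  const hostSplit = hostname.split('.');
  hostSplit.pop(); // drop the TLD
  if (hostSplit.length > 1) {
    hostSplit.shift(); // drop a leading subdomain such as "www"
  }
  return hostSplit.join('.'); // rejoin any remaining labels with dots
}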

domainName('https://twitter.com/explore') == "twitter"
domainName('https://github.com/thepracticaldev/dev.to') == "github"
domainName('https://www.youtube.com') == "youtube"

Harsh Saglani • Oct 12 '19

Can you please explain what

const { hostname } = a;

does and how?

Afief S • Oct 13 '19

It has the same effect as:
const hostname = a.hostname

Midas/XIV • Oct 12 '19

That's called destructuring assignment. "a" probably is an object which has the hostname property so that assignment extracts hostname.
I'm just writing about this 😅. Hopefully that'll help you.
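
A minimal sketch of the two equivalent forms. It assumes Node's built-in URL object as a stand-in for the anchor element, so that it also runs outside a browser:

```javascript
// Assumption: a URL object stands in for the <a> element used above,
// since both expose a `hostname` property.
const a = new URL('https://www.youtube.com/watch');

// Destructuring assignment: read the `hostname` property into a variable of the same name.
const { hostname } = a;

// Plain property access, which has the same effect.
const sameHostname = a.hostname;

console.log(hostname);                  // www.youtube.com
console.log(hostname === sameHostname); // true
```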

erezwanderman • Oct 12 '19

Actually the domain name is the whole thing - "twitter.com", "github.com", "youtube.com". What do you want to get for something like "a.b.c.ac.il"?

Jay Thompson • Oct 13 '19 • Edited

Even "com" or whatever other TLD is a domain name. The most specific domain name in "www dot youtube dot com" is "www".

(edited because dev dot to removed the www from my youtube URL.)
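
To make that point concrete, here is a small sketch that lists every domain name contained in a hostname. The helper name `domainNames` is made up for illustration:

```javascript
// Hypothetical helper: each suffix of a hostname made of whole labels is itself a domain name.
const domainNames = hostname => {
  const labels = hostname.split('.');
  return labels.map((_, i) => labels.slice(i).join('.'));
};

console.log(domainNames('www.youtube.com'));
// [ 'www.youtube.com', 'youtube.com', 'com' ]
```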

SavagePixie • Oct 12 '19

JavaScript one-liner

const domainName = url => url.replace(/https?:\/\/(?:www\.)?/, "").split(".")[0]

Amin • Oct 12 '19 • Edited

Haskell

Note: Assuming the string will always start with either http:// or https:// or nothing; other protocols are not supported (out of laziness).

Note2: This won't work for the URIs that have a dot in their name, still working on it.

Note3: Working all good but the end is a mess haha...

import Control.Arrow
import Data.List (isPrefixOf)

removeProtocol :: String -> String
removeProtocol "" = ""
removeProtocol string@(firstCharacter : rest)
    | isPrefixOf https string = removeProtocol (drop (length https) string)
    | isPrefixOf http string = removeProtocol (drop (length http) string)
    | otherwise = firstCharacter : removeProtocol rest
    where
        https = "https://"
        http = "http://"

countDots :: String -> Int
countDots =
    filter (== '.') >>> length

dropUntilWithDot :: String -> String
dropUntilWithDot =
    dropWhile (/= '.') >>> drop 1 

domainName :: String -> String
domainName url
    -- zero or one dot in the host: the first label is the answer
    | dots <= 1 = takeWhile (/= '.') host
    -- otherwise drop labels until only "name.tld" is left
    | otherwise = takeWhile (/= '.') $ iterate dropUntilWithDot host !! (dots - 1)
    where
        -- host part only, so dots in the path are never counted
        host = takeWhile (/= '/') (removeProtocol url)
        dots = countDots host

main :: IO ()
main = do
    print $ domainName "domain.com"                                 -- domain
    print $ domainName "http://domain.com"                          -- domain
    print $ domainName "https://domain.com"                         -- domain
    print $ domainName "api.domain.com"                             -- domain
    print $ domainName "http://api.domain.com"                      -- domain
    print $ domainName "https://api.domain.com"                     -- domain
    print $ domainName "dev.api.domain.com"                         -- domain
    print $ domainName "http://dev.api.domain.com"                  -- domain
    print $ domainName "https://dev.api.domain.com"                 -- domain
    print $ domainName "https://dev.api.domain.com/something.cool"  -- domain

Gab • Oct 14 '19

Have you tried it using a .co.uk TLD? :D

Amin • Oct 14 '19 • Edited

No, I didn't, and I assume it will fail for this particular case. Do you have any idea how to improve this one?

Gab • Oct 14 '19

Unfortunately, I think the only way to improve on it is to use the list of all TLDs to find how much of the end of the domain is TLD.
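
A rough sketch of that idea in JavaScript. The suffix set below is a tiny hand-picked sample and the helper name `registrableLabel` is made up; a real version would load the full Public Suffix List instead:

```javascript
// Assumption: a tiny hand-picked suffix set, for illustration only.
// A real implementation would load the full Public Suffix List.
const SUFFIXES = new Set(['com', 'org', 'uk', 'co.uk', 'au', 'com.au', 'il', 'ac.il']);

// Hypothetical helper: return the label just left of the longest known suffix.
const registrableLabel = hostname => {
  const labels = hostname.toLowerCase().split('.');
  // Try the longest candidate suffix first so that 'co.uk' wins over 'uk'.
  for (let i = 0; i < labels.length; i++) {
    if (SUFFIXES.has(labels.slice(i).join('.'))) {
      // i === 0 means the whole hostname is a suffix, so there is no label to return.
      return i > 0 ? labels[i - 1] : null;
    }
  }
  return null; // no known suffix found
};

console.log(registrableLabel('api.domain.co.uk')); // domain
console.log(registrableLabel('a.b.c.ac.il'));      // c
console.log(registrableLabel('youtube.com'));      // youtube
```

The lookup is only as good as the list behind it, which is why the full list is needed in practice.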

erezwanderman • Oct 13 '19

Javascript, using npm package "tldjs"

const { parse } = require("tldjs");

const regexify = str => {
  return str.replace(/[|\\{}()[\]^$+*?.]/g, "\\$&");
};

const getDomain = url => {
  const parseResult = parse(url);
  console.log(parseResult);
  if (parseResult.domain) {
    return parseResult.domain.replace(
      RegExp(regexify("." + parseResult.publicSuffix) + "$"),
      ""
    );
  } else {
    return parseResult.hostname;
  }
};

Try it: codesandbox.io/s/affectionate-dawn...

ZizzyZizzy • Oct 13 '19

Jesus, NO! This is what's wrong with development today. This is equivalent to killing a fly with a Sherman tank.

Please, please, for the love of God and companies everywhere that are sick of obfuscated, confusing, unmanageable, unmaintainable and insecure code, please look at the one-line pure JavaScript code above this answer.

There is absolutely no reason on earth to include a library with hundreds of lines of code to perform a simple operation.

erezwanderman • Oct 14 '19

Your comment is what's wrong with development today. Taking the shortest solution that seems to somehow solve the vague requirements and declaring it solved and secure.
Each of the JS solutions will fail for one of these test URLs: 'a.b.c.ac.il/', 'news.com.au/', 'youtube.com'.
Except for mine.
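
As a quick check of that claim, here is the regex one-liner from earlier in the thread run against those three URLs. It returns "a" for the first one, while a suffix-aware answer would presumably be "c" (assuming "ac.il" is treated as a public suffix):

```javascript
// The one-liner posted earlier in the thread, copied as-is.
const domainName = url => url.replace(/https?:\/\/(?:www\.)?/, "").split(".")[0];

const testUrls = ['a.b.c.ac.il/', 'news.com.au/', 'youtube.com'];
console.log(testUrls.map(domainName));
// [ 'a', 'news', 'youtube' ]
```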

K.V.Harish • Oct 12 '19

My solution in js

const domainName = (url) => {
  const match = url.match(/:\/\/(www[0-9]?\.)?(.[^/:]+)/i);
  return (match && match.length > 2 && typeof match[2] === 'string' && match[2].length) ? match[2].split('.')[0] : null;
}

willsmart • Oct 13 '19 • Edited

One in JS

hostName is based on a quick reading of the spec and should cope with usernames and ports (see "Uniform Resource Identifier" on Wikipedia).

salientSubdomain is the human-readable part of a host name, as required. It just clips off www's and TLDs, with a little complication to handle non-matching strings cleanly.

const hostName = url => /^(?:[^:]+:\/\/(?:[^@\/?]+@)?([^:\/?]+))?/.exec(url)[1]
const salientSubdomain = url => /^(?:(?:www\.)?(.+?)(?:\.[a-zA-Z]+)?)?$/.exec(hostName(url)||'')[1]

const testUrls = [
"https://twitter.com/explore",
"https://github.com/thepracticaldev/dev.to",
"https://www.youtube.com",
"https://will:p4ssw0rd@google.com?q=cybersecurity",
"https://a.b.c.d:8080",
"mailto:mr@willsm.art",
"http://192.168.1.60:3000/home"
];

console.log(testUrls.map(hostName))
console.log(testUrls.map(salientSubdomain))

output:

[ "twitter.com", "github.com", "www.youtube.com", "google.com", "a.b.c.d", undefined, "192.168.1.60" ]
[ "twitter", "github", "youtube", "google", "a.b.c", undefined, "192.168.1.60" ]

The hostname extractor regex looks fairly funky but isn't too bad if you break it down into parts...

(vis by Debuggex, which rocks)
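
In place of the picture, here is one way to spell out the parts in code. The variable names are only illustrative, and the assembled pattern is meant to match the hostName regex above:

```javascript
// Pieces of the hostName regex above; the variable names are illustrative.
const scheme   = String.raw`[^:]+:\/\/`;      // scheme followed by "://"
const userInfo = String.raw`(?:[^@\/?]+@)?`;  // optional "user:password@"
const host     = String.raw`([^:\/?]+)`;      // captured host: everything up to ":", "/" or "?"

// The outer group is optional, so input without "://" yields undefined instead of throwing.
const hostNameRegex = new RegExp(`^(?:${scheme}${userInfo}${host})?`);
const hostName = url => hostNameRegex.exec(url)[1];

console.log(hostName('https://will:p4ssw0rd@google.com?q=cybersecurity')); // google.com
console.log(hostName('http://192.168.1.60:3000/home'));                    // 192.168.1.60
console.log(hostName('mailto:mr@willsm.art'));                             // undefined
```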

Emilie Gervais • Oct 12 '19 • Edited

JavaScript:

const domainName = url => {
    // Labels in reverse order, e.g. 'www.example.bc.ca' -> ['ca', 'bc', 'example', 'www']
    const labels = new URL(url).hostname.split('.').reverse();
    if (labels.length < 2) {
        return labels[0];
    }
    // Heuristic, not a real suffix lookup: treat a two-letter TLD preceded by a
    // short label (such as 'bc.ca' or 'co.uk') as a two-part suffix.
    const twoPartSuffix = labels.length > 2 && labels[0].length === 2 && labels[1].length <= 3;
    return twoPartSuffix ? labels[2] : labels[1];
}

considering subdomains & second-level domains (ie. '.bc.ca')

mauamy • Oct 13 '19

Python one-liner:

def domain_name(url):
    return url.split("/")[2].split(".")[-2]

Harsh Saglani • Oct 12 '19

O(N) approach:

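#include <iostream>
#include <string>
using namespace std;
string domainName(string url)
{
    string domain = "";
    bool flag = true; // true until the "//" after the scheme has been skipped
    for(size_t i = 0; i < url.size(); i++)
    {
        if(flag)
        {
            if(url[i] == '/')
            {
                flag = false;
                i++; // step over the second '/'
                // skip "www." only when the host really starts with that label
                if(i + 1 < url.size() && url.compare(i + 1, 4, "www.") == 0)
                    i += 4;
            }
            continue;
        }
        else if(url[i] == '.')
            return domain;
        domain += url[i];
    }
    return domain;
}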
int main()
{
    string url;
    cin >> url;
    cout << domainName(url) << endl;
    return 0;
}

Naive approach:

#include<iostream>
#include <string>
using namespace std;
string domainName(string url)
{
    size_t x = url.find("www."); // size_t so the comparison with npos is well-defined
    if(x==string::npos)
    {
        x = url.find("//");
        x+=2;
    }
    else
        x+=4;
    return url.substr(x,url.find(".com")-x);
}
int main()
{
    string url;
    cin >> url;
    cout << domainName(url) << endl;
    return 0;
}

Krzysztof Hankiewicz • Oct 12 '19

Quick & dirty ugly chain if you don't want to research regex:

const domainName = (domain) => domain.split('://')[1].split('/')[0].includes('www.') ? domain.split('://')[1].split('/')[0].split('www.')[1].split('.')[0] : domain.split('://')[1].split('/')[0].split('.')[0]

Afief S • Oct 13 '19

Oneline with javascript

const domainName = d => new URL(d).hostname.replace(/^www\./, "").split(".").shift()

Sai Kiran • Oct 14 '19 • Edited

No regex needed

 function domainName(url){
     return url.split("/")[2].split(".").slice(-2)[0]
 }

BenyamWorku • Jul 8 '21

doesn't work
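
The reply doesn't say which input failed. One case where the snippet above does throw is a URL without a scheme, because split("/")[2] is then undefined. A possible workaround, sketched with a made-up helper name `domainNameLenient`:

```javascript
// The split-based snippet from above, copied as-is.
function domainName(url) {
  return url.split("/")[2].split(".").slice(-2)[0];
}

// Hypothetical variant: prepend a scheme when none is present, then let URL parse the host.
function domainNameLenient(url) {
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(url) ? url : `https://${url}`;
  return new URL(withScheme).hostname.split(".").slice(-2)[0];
}

console.log(domainName("https://www.youtube.com")); // youtube
console.log(domainNameLenient("www.youtube.com"));  // youtube

try {
  domainName("www.youtube.com"); // split("/")[2] is undefined here
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```

Like the original, this still takes the second-to-last label, so suffixes such as .co.uk are not handled.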

Jonathan Akwetey • Oct 12 '19 • Edited

It's got to be in my fav, JavaScript!!