DEV Community

Daily Challenge #89 - Extract domain name from URL

dev.to staff on October 12, 2019

Write a function that, when given a URL as a string, returns only the domain name as a string. domainName(https://twitter.com/explore) == "twitt...
Collapse
 
zerquix18 profile image
I'm Luis! \^-^/ • Edited

Javascript!

function domainName(domain) {
  const a = document.createElement('a');
  a.href = domain;
  const { hostname } = a;
  const hostSplit = hostname.split('.');
  hostSplit.pop();
  if (hostSplit.length > 1) {
    hostSplit.shift();
  }
  return hostSplit.join();
}

domainName('https://twitter.com/explore') == "twitter"
domainName('https://github.com/thepracticaldev/dev.to') == "github"
domainName('https://www.youtube.com') == "youtube"
Enter fullscreen mode Exit fullscreen mode
Collapse
 
ashawe profile image
Harsh Saglani

Can you please explain what

const { hostname } = a;

does and how?

Collapse
 
midasxiv profile image
Midas/XIV

That's called destructuring assignment. "a" probably is an object which has the hostname property so that assignment extracts hostname.
I'm just writin about this 😅. Hopefully that'll help you.

Collapse
 
ap13p profile image
Afief S

It has the same effect as:
const hostname = a.hostname

Collapse
 
erezwanderman profile image
erezwanderman

Actually the domain name is the whole thing - "twitter.com", "github.com", "youtube.com". What do you want to get for something like "a.b.c.ac.il"?

Collapse
 
eyemyth profile image
Jay Thompson • Edited

Even "com" or whatever other TLD is a domain name. The most specific domain name in "www dot youtube dot com" is "www".

(edited because def dot to removed the www from my youtube URL.)

Collapse
 
savagepixie profile image
SavagePixie

JavaScript one-liner

const domainName = url => url.replace(/https?:\/\/(?:www\.)?/, "").split(".")[0]
Enter fullscreen mode Exit fullscreen mode
Collapse
 
aminnairi profile image
Amin • Edited

Haskell

Note: Assuming the string will always start with either http:// or https:// or nothing but not supporting any other protocols (by lazyness).

Note2: This won't work for the URIs that have a dot in their name, still working on it.

Note3: Working all good but the end is a mess haha...

import Control.Arrow
import Data.List (isPrefixOf)

removeProtocol :: String -> String
removeProtocol "" = ""
removeProtocol string@(firstCharacter : rest)
    | isPrefixOf https string = removeProtocol (drop (length https) string)
    | isPrefixOf http string = removeProtocol (drop (length http) string)
    | otherwise = firstCharacter : removeProtocol rest
    where
        https = "https://"
        http = "http://"

countDots :: String -> Int
countDots =
    filter (== '.') >>> length

dropUntilWithDot :: String -> String
dropUntilWithDot =
    dropWhile (/= '.') >>> drop 1 

domainName :: String -> String
domainName url 
    | countDots url == 1 = takeWhile (/= '.') urlWithoutProtocol
    | otherwise = takeWhile (/= '.') $ iterate dropUntilWithDot url !! (countDots (takeWhile (/= '/') urlWithoutProtocol) - 1)
    where urlWithoutProtocol = removeProtocol url

main :: IO ()
main = do
    print $ domainName "domain.com"                                 -- domain
    print $ domainName "http://domain.com"                          -- domain
    print $ domainName "https://domain.com"                         -- domain
    print $ domainName "api.domain.com"                             -- domain
    print $ domainName "http://api.domain.com"                      -- domain
    print $ domainName "https://api.domain.com"                     -- domain
    print $ domainName "dev.api.domain.com"                         -- domain
    print $ domainName "http://dev.api.domain.com"                  -- domain
    print $ domainName "https://dev.api.domain.com"                 -- domain
    print $ domainName "https://dev.api.domain.com/something.cool"  -- domain

Try it online

Hosted on Repl.it.

Collapse
 
larisho profile image
Gab

Have you tried it using a .co.uk TLD? :D

Collapse
 
aminnairi profile image
Amin • Edited

No I didn't and I assume it will fail for this particular case. Do you have any way on improving this one?

Thread Thread
 
larisho profile image
Gab

Unfortunately, I think the only way to improve on it is to use the list of all TLDs to find how much of the end of the domain is TLD.

Collapse
 
erezwanderman profile image
erezwanderman

Javascript, using npm package "tldjs"

const { parse } = require("tldjs");

const regexify = str => {
  return str.replace(/[|\\{}()[\]^$+*?.]/g, "\\$&");
};

const getDomain = url => {
  const parseResult = parse(url);
  console.log(parseResult);
  if (parseResult.domain) {
    return parseResult.domain.replace(
      RegExp(regexify("." + parseResult.publicSuffix) + "$"),
      ""
    );
  } else {
    return parseResult.hostname;
  }
};
Enter fullscreen mode Exit fullscreen mode

Try it: codesandbox.io/s/affectionate-dawn...

Collapse
 
zizzyzizzy profile image
ZizzyZizzy

Jesus, NO! This is what's wrong with development today. This is equivalent to killing a fly with a Sherman tank.

Please, please, for the love of God and companies everywhere that are sick of obfuscated, confusing unmanageable, unmaintainable and insecure code, please look at the one line pure Javascript code above this answer.

There is absolutely no reason on earth to include a library with hundreds of lines of code to perform a simple operation.

Collapse
 
erezwanderman profile image
erezwanderman

Your comment is what's wrong with development today. Taking the shortest solution that seems to somehow solve the vague requirements and declaring it solved and secure.
Each of the JS solutions will fail for one of these test URLs: 'a.b.c.ac.il/', 'news.com.au/', 'youtube.com'.
Except for mine.

Collapse
 
kvharish profile image
K.V.Harish

My solution in js

const domainName = (url) => {
  const match = url.match(/:\/\/(www[0-9]?\.)?(.[^/:]+)/i);
  return (match && match.length > 2 && typeof match[2] === 'string' && match[2].length) ? match[2].split('.')[0] : null;
}
Collapse
 
willsmart profile image
willsmart • Edited

One in JS

hostName is based on a quick reading of the spec and should cope with usernames and ports. Uniform_Resource_Identifier on wikipedia

salientSubdomain is the human-readable part of a domain name host as reqd. It just clips off www's and TLD's, with a little complication to handle non-matching strings cleanly.

const hostName = url => /^(?:[^:]+:\/\/(?:[^@\/?]+@)?([^:\/?]+))?/.exec(url)[1]
const salientSubdomain = url => /^(?:(?:www\.)?(.+?)(?:\.[a-zA-Z]+)?)?$/.exec(domainName(url)||'')[1]

const testUrls = [
"https://twitter.com/explore",
"https://github.com/thepracticaldev/dev.to",
"https://www.youtube.com",
"https://will:p4ssw0rd@google.com?q=cybersecurity",
"https://a.b.c.d:8080",
"mailto:mr@willsm.art",
"http://192.168.1.60:3000/home"
];

console.log(testUrls.map(hostName))
console.log(testUrls.map(salientSubdomain))

output:

[ "twitter.com", "github.com", "www.youtube.com", "google.com", "a.b.c.d", undefined, "192.168.1.60" ]
[ "twitter", "github", "youtube", "google", "a.b.c", undefined, "192.168.1.60" ]

The hostname extractor regex looks fairly funky but isn't too bad if you break it down into parts...
Regular expression visualization
(vis by Debuggex, which rocks)

Collapse
 
hexangel616 profile image
Emilie Gervais • Edited

JavaScript:

const domainName = url => {
    let hostName = new URL(url).hostname;
    let domain = hostName;
    if (hostName != null) {
        let str = hostName.split('.').reverse();
        if (str != null && str.length > 1) {
            domain = str[1] + '.' + str[0];
            if (hostName.indexOf(/[^/]+((?=\/)|$)/g) != -2 && str.length > 2) {
                domain = str[2] + '.' + domain;
            }
        }
    }
    return domain.split('.')[0];
}

considering subdomains & second-level domains (ie. '.bc.ca')

Collapse
 
mauamy profile image
mauamy

Python one-liner:

def domain_name(url):
    return url.split("/")[2].split(".")[-2]
Collapse
 
ashawe profile image
Harsh Saglani

O(N) approach:

#include<iostream>
#include <string>
using namespace std;
string domainName(string url)
{
    string domain = "";
    bool flag = true;
    for(int i = 0; i < url.size(); i++)
    {
        if(flag)
        {
            if(url[i] == '/')
            {
                flag = false;
                i+=2;
                if(url[i] == 'w')
                    i = i+3;
                else
                    i--;
            }
            continue;
        }
        else if(url[i] == '.')
            return domain;
        domain += url[i];
    }
    return domain;
}
int main()
{
    string url;
    cin >> url;
    cout << domainName(url) << endl;
    return 0;
}

Naive approach:

#include<iostream>
#include <string>
using namespace std;
string domainName(string url)
{
    int x = url.find("www");
    if(x==string::npos)
    {
        x = url.find("//");
        x+=2;
    }
    else
        x+=4;
    return url.substr(x,url.find(".com")-x);
}
int main()
{
    string url;
    cin >> url;
    cout << domainName(url) << endl;
    return 0;
}
Collapse
 
kjhank profile image
Krzysztof Hankiewicz

Quick & dirty ugly chain if you don't want to research regex:

const domainName = (domain) => domain.split('://')[1].split('/')[0].includes('www.') ? domain.split('://')[1].split('/')[0].split('www.')[1].split('.')[0] : domain.split('://')[1].split('/')[0].split('.')[0]
Enter fullscreen mode Exit fullscreen mode
Collapse
 
ap13p profile image
Afief S

Oneline with javascript

const domainName = d => new URL(d).hostname.split(".").shift()
Collapse
 
akwetey profile image
Jonathan Akwetey • Edited

it got to be in my fav JavaScript!!

Alt text of image

Collapse
 
saikiran profile image
Sai Kiran • Edited

No regex needed

 function domainName(url){
     return url.split("/")[2].split(".").slice(-2)[0]
 }
Enter fullscreen mode Exit fullscreen mode
Collapse
 
benyam profile image
BenyamWorku

doesn't work