Daily Challenge #89 - Extract domain name from URL

Write a function that, when given a URL as a string, returns only the domain name as a string.

domainName("https://twitter.com/explore") == "twitter"
domainName("https://github.com/thepracticaldev/dev.to") == "github"
domainName("https://www.youtube.com") == "youtube"

Good luck!


Want to propose a challenge idea for a future post? Email yo+challenge@dev.to with your suggestions!

 

Actually the domain name is the whole thing - "twitter.com", "github.com", "youtube.com". What do you want to get for something like "a.b.c.ac.il"?

 

Even "com" or whatever other TLD is a domain name. The most specific domain name in "www dot youtube dot com" is "www".

(edited because dev dot to removed the www from my youtube URL.)
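
To make the point about labels concrete, here is a minimal sketch that lists every domain name contained in a hostname, from least to most specific. The helper name `domainChain` is made up for illustration and is not part of the challenge.

```javascript
// Hypothetical helper: every suffix of the label list is itself a domain name.
function domainChain(hostname) {
  const labels = hostname.split(".");
  // i = 0 keeps only the last label, i = 1 the last two, and so on.
  return labels.map((_, i) => labels.slice(labels.length - 1 - i).join("."));
}

console.log(domainChain("www.youtube.com"));
// [ 'com', 'youtube.com', 'www.youtube.com' ]
```

Which of these the challenge wants is a matter of convention, which is why the later answers disagree on inputs with extra labels.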

 

Javascript!

function domainName(domain) {
  const a = document.createElement('a');
  a.href = domain;
  const { hostname } = a;
  const hostSplit = hostname.split('.');
  hostSplit.pop();
  if (hostSplit.length > 1) {
    hostSplit.shift();
  }
  // join() defaults to a comma separator, so pass '.' explicitly
  return hostSplit.join('.');
}

domainName('https://twitter.com/explore') == "twitter"
domainName('https://github.com/thepracticaldev/dev.to') == "github"
domainName('https://www.youtube.com') == "youtube"
 

Can you please explain what

const { hostname } = a;

does and how?

 

It has the same effect as:
const hostname = a.hostname

 

That's called destructuring assignment. "a" probably is an object which has the hostname property so that assignment extracts hostname.
I'm just writing about this 😅. Hopefully that'll help you.
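
A small sketch of the same idea, using a plain object in place of the anchor element (the object and its values are made up for illustration):

```javascript
// Stand-in for the anchor element: any object with a hostname property works.
const a = { hostname: "www.youtube.com", protocol: "https:" };

// Destructuring assignment: pull the property into a variable of the same name.
const { hostname } = a;

// Equivalent long form.
const sameThing = a.hostname;

// Destructuring can also rename while extracting.
const { hostname: host } = a;

console.log(hostname === sameThing, host); // true www.youtube.com
```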

 

JavaScript one-liner

const domainName = url => url.replace(/https?:\/\/(?:www\.)?/, "").split(".")[0]
 

Haskell

Note: This assumes the string always starts with http://, https://, or no protocol at all. Other protocols are not supported (out of laziness).

Note2: This won't work for the URIs that have a dot in their name, still working on it.

Note3: Working all good but the end is a mess haha...

import Control.Arrow
import Data.List (isPrefixOf)

removeProtocol :: String -> String
removeProtocol "" = ""
removeProtocol string@(firstCharacter : rest)
    | isPrefixOf https string = removeProtocol (drop (length https) string)
    | isPrefixOf http string = removeProtocol (drop (length http) string)
    | otherwise = firstCharacter : removeProtocol rest
    where
        https = "https://"
        http = "http://"

countDots :: String -> Int
countDots =
    filter (== '.') >>> length

dropUntilWithDot :: String -> String
dropUntilWithDot =
    dropWhile (/= '.') >>> drop 1 

domainName :: String -> String
domainName url 
    | countDots url == 1 = takeWhile (/= '.') urlWithoutProtocol
    | otherwise = takeWhile (/= '.') $ iterate dropUntilWithDot url !! (countDots (takeWhile (/= '/') urlWithoutProtocol) - 1)
    where urlWithoutProtocol = removeProtocol url

main :: IO ()
main = do
    print $ domainName "domain.com"                                 -- domain
    print $ domainName "http://domain.com"                          -- domain
    print $ domainName "https://domain.com"                         -- domain
    print $ domainName "api.domain.com"                             -- domain
    print $ domainName "http://api.domain.com"                      -- domain
    print $ domainName "https://api.domain.com"                     -- domain
    print $ domainName "dev.api.domain.com"                         -- domain
    print $ domainName "http://dev.api.domain.com"                  -- domain
    print $ domainName "https://dev.api.domain.com"                 -- domain
    print $ domainName "https://dev.api.domain.com/something.cool"  -- domain


 
 

No, I didn't, and I assume it will fail for this particular case. Do you have any ideas for improving this one?

Unfortunately, I think the only way to improve on it is to use the list of all TLDs to find how much of the end of the domain is TLD.
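
A rough sketch of that idea, assuming a deliberately tiny, hand-written suffix list. `SUFFIXES` and `registrableLabel` are hypothetical names; a real implementation would load a full list such as the Public Suffix List.

```javascript
// Hypothetical, incomplete suffix list, for illustration only.
const SUFFIXES = new Set(["com", "org", "au", "com.au", "il", "ac.il", "ca", "bc.ca"]);

function registrableLabel(hostname) {
  const labels = hostname.toLowerCase().split(".");
  // Start at index 1 so at least one label is left in front of the suffix.
  // Scanning left to right tries the longest candidate suffix first.
  for (let i = 1; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (SUFFIXES.has(candidate)) {
      return labels[i - 1]; // the label directly before the matched suffix
    }
  }
  return null; // no known suffix found
}

console.log(registrableLabel("a.b.c.ac.il"));     // c
console.log(registrableLabel("news.com.au"));     // news
console.log(registrableLabel("www.youtube.com")); // youtube
```

The result is only as good as the list: a suffix missing from `SUFFIXES` falls back to a shorter match or to `null`.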

 

Javascript, using npm package "tldjs"

const { parse } = require("tldjs");

const regexify = str => {
  return str.replace(/[|\\{}()[\]^$+*?.]/g, "\\$&");
};

const getDomain = url => {
  const parseResult = parse(url);
  console.log(parseResult);
  if (parseResult.domain) {
    return parseResult.domain.replace(
      RegExp(regexify("." + parseResult.publicSuffix) + "$"),
      ""
    );
  } else {
    return parseResult.hostname;
  }
};

Try it: codesandbox.io/s/affectionate-dawn...

 

Jesus, NO! This is what's wrong with development today. This is equivalent to killing a fly with a Sherman tank.

Please, please, for the love of God and companies everywhere that are sick of obfuscated, confusing unmanageable, unmaintainable and insecure code, please look at the one line pure Javascript code above this answer.

There is absolutely no reason on earth to include a library with hundreds of lines of code to perform a simple operation.

 

Your comment is what's wrong with development today. Taking the shortest solution that seems to somehow solve the vague requirements and declaring it solved and secure.
Each of the JS solutions will fail for one of these test URLs: 'a.b.c.ac.il/', 'news.com.au/', 'youtube.com'.
Except for mine.
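
As a quick check of that claim, here is a sketch that runs two of the shorter approaches from this thread against those three URLs. The names `viaRegex`, `viaURL`, and `attempt` are made up for this example, and it assumes an environment with the global `URL` class (Node or a browser).

```javascript
// The regex one-liner and the URL-based one-liner from this thread.
const viaRegex = url => url.replace(/https?:\/\/(?:www\.)?/, "").split(".")[0];
const viaURL = url => new URL(url).hostname.split(".").shift();

// Hypothetical helper: report a thrown error instead of crashing.
const attempt = (fn, url) => {
  try {
    return fn(url);
  } catch (e) {
    return "throws";
  }
};

for (const url of ["a.b.c.ac.il/", "news.com.au/", "youtube.com"]) {
  console.log(url, attempt(viaRegex, url), attempt(viaURL, url));
}
// a.b.c.ac.il/ a throws
// news.com.au/ news throws
// youtube.com youtube throws
```

The regex version returns the leftmost label, which is wrong for 'a.b.c.ac.il/', and `new URL` rejects all three inputs because none of them has a scheme.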

 

Python one-liner:

def domain_name(url):
    return url.split("/")[2].split(".")[-2]
 

One in JS

hostName is based on a quick reading of the spec (see the Uniform Resource Identifier article on Wikipedia) and should cope with usernames and ports.

salientSubdomain is the human-readable part of the host, as required by the challenge. It just clips off a leading www and the TLD, with a little complication to handle non-matching strings cleanly.

const hostName = url => /^(?:[^:]+:\/\/(?:[^@\/?]+@)?([^:\/?]+))?/.exec(url)[1]
const salientSubdomain = url => /^(?:(?:www\.)?(.+?)(?:\.[a-zA-Z]+)?)?$/.exec(hostName(url)||'')[1]

const testUrls = [
"https://twitter.com/explore",
"https://github.com/thepracticaldev/dev.to",
"https://www.youtube.com",
"https://will:p4ssw0rd@google.com?q=cybersecurity",
"https://a.b.c.d:8080",
"mailto:mr@willsm.art",
"http://192.168.1.60:3000/home"
];

console.log(testUrls.map(hostName))
console.log(testUrls.map(salientSubdomain))

output:

[ "twitter.com", "github.com", "www.youtube.com", "google.com", "a.b.c.d", undefined, "192.168.1.60" ]
[ "twitter", "github", "youtube", "google", "a.b.c", undefined, "192.168.1.60" ]

The hostname extractor regex looks fairly funky but isn't too bad if you break it down into parts...
Regular expression visualization
(vis by Debuggex, which rocks)
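
One way to read the hostname regex in parts is to rebuild it from named pieces. This is only a sketch: the variable names are made up, and the pieces are concatenated to give the same pattern as the `hostName` regex above.

```javascript
// Each piece of the pattern, written with String.raw so backslashes stay literal.
const scheme = String.raw`[^:]+:\/\/`;       // anything up to "://", e.g. "https://"
const userinfo = String.raw`(?:[^@\/?]+@)?`; // optional "user:password@"
const host = String.raw`([^:\/?]+)`;         // captured: runs until ":", "/" or "?"

// The whole match is optional, so strings without "://" yield no capture.
const hostNameRegex = new RegExp(`^(?:${scheme}${userinfo}${host})?`);
const hostName = url => hostNameRegex.exec(url)[1];

console.log(hostName("https://will:p4ssw0rd@google.com?q=cybersecurity")); // google.com
console.log(hostName("https://a.b.c.d:8080"));                             // a.b.c.d
console.log(hostName("mailto:mr@willsm.art"));                             // undefined
```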

 

My solution in js

const domainName = (url) => {
  const match = url.match(/:\/\/(www[0-9]?\.)?(.[^/:]+)/i);
  return (match && match.length > 2 && typeof match[2] === 'string' && match[2].length) ? match[2].split('.')[0] : null;
}
 

JavaScript:

const domainName = url => {
    let hostName = new URL(url).hostname;
    let domain = hostName;
    if (hostName != null) {
        let str = hostName.split('.').reverse();
        if (str != null && str.length > 1) {
            domain = str[1] + '.' + str[0];
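            // Heuristic: a two-letter label under a two-letter ccTLD (e.g. 'bc.ca')
            // is treated as part of the suffix, so keep one more label.
            if (str.length > 2 && str[0].length === 2 && str[1].length === 2) {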
                domain = str[2] + '.' + domain;
            }
        }
    }
    return domain.split('.')[0];
}

This takes subdomains and second-level domains (e.g. '.bc.ca') into account.

 

O(N) approach:

#include<iostream>
#include <string>
using namespace std;
string domainName(string url)
{
    string domain = "";
    bool flag = true;
    for(int i = 0; i < url.size(); i++)
    {
        if(flag)
        {
            if(url[i] == '/')
            {
                flag = false;
                i+=2;
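                // Skip a leading "www." only; a bare 'w' check would also eat domains like "wikipedia"
                if(i < (int)url.size() && url.compare(i, 4, "www.") == 0)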
                    i = i+3;
                else
                    i--;
            }
            continue;
        }
        else if(url[i] == '.')
            return domain;
        domain += url[i];
    }
    return domain;
}
int main()
{
    string url;
    cin >> url;
    cout << domainName(url) << endl;
    return 0;
}

Naive approach:

#include<iostream>
#include <string>
using namespace std;
string domainName(string url)
{
    int x = url.find("www");
    if(x==string::npos)
    {
        x = url.find("//");
        x+=2;
    }
    else
        x+=4;
    return url.substr(x,url.find(".com")-x);
}
int main()
{
    string url;
    cin >> url;
    cout << domainName(url) << endl;
    return 0;
}
 

Quick & dirty ugly chain if you don't want to research regex:

const domainName = (domain) => domain.split('://')[1].split('/')[0].includes('www.') ? domain.split('://')[1].split('/')[0].split('www.')[1].split('.')[0] : domain.split('://')[1].split('/')[0].split('.')[0]
 

One-liner in JavaScript

const domainName = d => new URL(d).hostname.replace(/^www\./, "").split(".").shift()
 
 

No regex needed

 function domainName(url){
     return url.split("/")[2].split(".").slice(-2)[0]
 }