
re: Building a URL shortener service series, Introduction

 

Adonis, Hi!

Curious thing! I was once asked to architect such a service on paper while being interviewed for a big data developer position.

It quickly boiled down to the understanding that we expect a URL shortener service to be able to scale infinitely. Then come the questions of which bottlenecks we expect and which approaches and technologies may help work around them.

At a small scale, the task itself is more of an exercise than a challenge (though, of course, useful in some sense).

Briefly, imagine we have an endpoint (or page) that lets you submit a URL, stores it in our database (or other storage), assigns a short identifier, and returns it. Another endpoint just fetches the data from storage by that identifier, right?
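To make the two endpoints concrete, here is a minimal single-process sketch. It assumes short codes are derived by base62-encoding an auto-incrementing integer id, and a plain dict stands in for the database; `encode_base62` and `InMemoryShortener` are hypothetical names, not part of any library.

```python
import string
from typing import Dict, Optional

# Short-code alphabet: 0-9, a-z, A-Z (62 symbols). An assumption; any URL-safe alphabet works.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer id as a base62 string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class InMemoryShortener:
    """Hypothetical single-process stand-in for the two endpoints (a dict replaces the database)."""

    def __init__(self) -> None:
        self._next_id = 1
        self._urls: Dict[str, str] = {}

    def shorten(self, long_url: str) -> str:
        # "Submit" endpoint: take the next id, turn it into a short code, store the mapping.
        code = encode_base62(self._next_id)
        self._next_id += 1
        self._urls[code] = long_url
        return code

    def resolve(self, code: str) -> Optional[str]:
        # "Redirect" endpoint: look the code up; None would become an HTTP 404.
        return self._urls.get(code)
```

For example, the first `shorten("https://example.com/some/long/path")` call returns `"1"`, and `resolve("1")` returns the original URL. A single counter keeps codes short and collision-free in one process; it is exactly the part that gets hard once several processes must assign ids in parallel.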

Now, if the number of requests is huge, we may expect the database to become a bottleneck. If you use some distributed AWS storage, you can most likely avoid this. But then you'll need to decide how to assign short identifiers so that they never collide (and do it in parallel!). Finally, once all those issues are solved, we may need to figure out how to deal with the load balancer or another "edge-side" proxy...
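One common way to hand out never-duplicating identifiers in parallel is block (range) allocation: a central counter gives each worker a disjoint range of ids, and the worker then issues ids from its range without further coordination. The sketch below is a hedged illustration of that technique; `BlockAllocator` and `Worker` are hypothetical names, and the lock-protected integer stands in for whatever shared atomic counter (for example, a database sequence) a real deployment would use.

```python
import threading
from typing import Iterator


class BlockAllocator:
    """Central counter handing out disjoint id ranges.

    Hypothetical: in a real deployment this role might be played by a database
    sequence or an atomic counter in a shared store; here it is a lock-protected int.
    """

    def __init__(self, block_size: int = 1000) -> None:
        self.block_size = block_size
        self._next_start = 1
        self._lock = threading.Lock()

    def next_block(self) -> range:
        # Only this short critical section is shared between workers.
        with self._lock:
            start = self._next_start
            self._next_start += self.block_size
        return range(start, start + self.block_size)


class Worker:
    """One shortener process/thread: issues ids from its own block without coordination.

    Each Worker instance is meant to be used by a single thread.
    """

    def __init__(self, allocator: BlockAllocator) -> None:
        self._allocator = allocator
        self._ids: Iterator[int] = iter(())  # empty until the first request

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            # Current block exhausted: fetch a fresh, disjoint one.
            self._ids = iter(self._allocator.next_block())
            return next(self._ids)
```

Because blocks never overlap, ids stay unique even though workers issue them concurrently, and each integer can then be turned into a short string (e.g. base62). The trade-off is that the unused tail of a block is skipped if a worker restarts, which is usually harmless for a shortener.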

I'm just sharing this so that you can either think it through beforehand or at least prepare some answers in case you are asked about your solution in the future :)

Good luck, looking forward to your posts!

 

Hello Rodion,

Thank you for your response. My architecture is exactly as you describe: one endpoint to shorten URLs and save them into the system, and another to do the redirection. And yes, one of the main bottlenecks to expect is how to scale it.

Initially, I think it can be scaled out easily by putting those two endpoints in separate processes (or services), and doing the main processing with an in-memory DB like Redis plus some background tasks. That processing includes, for example:

  • saving the short version of a URL along with the original URL
  • retrieving the original URL for redirection when someone uses the short URL (this path must stay fast)
  • saving click stats such as user location, ideally asynchronously
  • a background job that pulls content from Redis and organizes it into the relational DB (short and long URLs, click stats, etc.)
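The list above describes a write-behind pattern: the request path only touches a fast in-memory store, and a background job later persists everything to the relational DB. Here is a minimal sketch under loudly stated assumptions: a plain dict stands in for Redis, an in-memory SQLite database stands in for the relational DB, and the table layout, `WriteBehindStore` and `run_background_sync` are all hypothetical.

```python
import queue
import sqlite3
import threading
from typing import Dict, Optional


class WriteBehindStore:
    """Fast in-memory layer with a background sync to a relational DB.

    Assumptions for this sketch: a plain dict stands in for Redis, and an
    in-memory SQLite database stands in for the relational DB.
    """

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}      # hot path: short code -> long URL
        self.pending: queue.Queue = queue.Queue()  # writes not yet persisted
        # check_same_thread=False lets the background thread use this connection.
        self.db = sqlite3.connect(":memory:", check_same_thread=False)
        self.db.execute("CREATE TABLE urls (code TEXT PRIMARY KEY, long_url TEXT)")
        self.db.execute("CREATE TABLE clicks (code TEXT, country TEXT)")

    def save(self, code: str, long_url: str) -> None:
        self.cache[code] = long_url
        self.pending.put(("url", code, long_url))

    def redirect(self, code: str, country: str) -> Optional[str]:
        long_url = self.cache.get(code)
        if long_url is not None:
            # Click stats are only queued here; persisting them happens later.
            self.pending.put(("click", code, country))
        return long_url

    def flush(self) -> int:
        """Drain queued writes into SQLite in one transaction; return how many were written."""
        batch = []
        while True:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        with self.db:  # commits on success, rolls back on error
            for kind, code, value in batch:
                if kind == "url":
                    self.db.execute("INSERT OR REPLACE INTO urls VALUES (?, ?)", (code, value))
                else:
                    self.db.execute("INSERT INTO clicks VALUES (?, ?)", (code, value))
        return len(batch)


def run_background_sync(store: WriteBehindStore, stop: threading.Event, interval: float = 0.5) -> None:
    """The 'background job': flush periodically until asked to stop, then drain once more."""
    while not stop.wait(interval):
        store.flush()
    store.flush()
```

The redirect path stays cheap because it only reads the dict and enqueues a tuple. The price is that anything still queued is lost if the process dies before a flush, so a real setup would lean on a durable queue or the in-memory store's own persistence options.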

As everything would be stored on AWS, it shouldn't be too difficult to scale such an architecture, as you said, with a load balancer in front and the two services behind it.
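To illustrate the "load balancer in front, two services behind it" layout, here is a toy in-process router: it picks a service pool by request path and round-robins within that pool. This is only a conceptual sketch; the `/shorten` path, the backend addresses and `ToyLoadBalancer` are hypothetical, and a managed load balancer would do this job in practice.

```python
import itertools
from typing import Iterable


class ToyLoadBalancer:
    """Illustrative only: path-based routing to two service pools, round-robin within each.

    The backend addresses are hypothetical; a managed load balancer would do this for real.
    """

    def __init__(self, shorten_pool: Iterable[str], redirect_pool: Iterable[str]) -> None:
        self._shorten = itertools.cycle(list(shorten_pool))
        self._redirect = itertools.cycle(list(redirect_pool))

    def route(self, method: str, path: str) -> str:
        if method == "POST" and path == "/shorten":
            return next(self._shorten)  # write traffic -> shortener service
        if method == "GET" and path.strip("/"):
            return next(self._redirect)  # GET /<code> -> redirect service
        raise ValueError(f"no route for {method} {path}")
```

Since the two pools are independent, the redirect pool (which typically sees far more traffic) can be grown without touching the shortener pool.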

Anyway, I will keep researching the subject. It seems like I will need to put each service in a separate Docker image so I can scale them easily (horizontally or vertically) if needed. What do you think about this possibility?

Thanks for everything.

 

Well, thanks for that detailed answer :)

I think the scalability question may still arise once the supposed "in-memory DB" is no longer enough. But hopefully by then you will have further ideas :)

put each service in a separate Docker image

I don't think it's necessary, since you are going with AWS. Much depends on which AWS service you use. I see little point in using anything you have to put Docker into. AWS has something (sorry, I forget the name) that works as a managed service for you: you just deploy the code and it scales almost infinitely.

I just prefer Google services instead; there we have the famous App Engine with the same property. You don't need to care about hardware, and it offers a free tier with no time limit.

Another free alternative (supporting Python and Flask), by the way, is PythonAnywhere, though it is perhaps not that interesting in terms of scalability (I suspect it runs on AWS itself).

Yeah, you are surely referring to AWS Elastic Beanstalk; in fact, it is like App Engine. I'm even using it now.

But the thing is, I actually wanted to put the frontend app in the same folder as the backend app and maybe put everything in one Docker image, or split it into two services, each in a separate Docker image.

I didn't think of deploying on Elastic Beanstalk because it seemed too heavy for such a small app... but it seems it isn't that heavy after all.

I wanted to put the frontend app in the same folder as the backend app and maybe put everything in one Docker image

Honestly, IMHO, that is exactly what we'd prefer to avoid. All frontend assets should live at least behind a static file server (like nginx), or better on a CDN (or something similar, like GitHub Pages).

In your application, the logic traffic (sending URLs and short handles between backend and frontend) is going to be much smaller than the total size of all the JS/CSS/PNG files, right?
