Daniel The Developer


12 System Design Fundamentals

This article is intended for software engineers with prior experience in development.

How to Approach System Design Interviews?

Think like a tech lead guiding junior engineers on how to implement your design.

What interviewers want to see:

  • base-level understanding of system design fundamentals
  • back-and-forth about problem constraints and parameters of your service
  • well-reasoned, qualified decisions based on engineering trade-offs
  • unique direction your experience and decisions take them
  • holistic view of a system and its users

1) API

REST

  • APIs are modelled around the resources in the system. For instance, a single resource URL is operated on through HTTP verbs (GET, POST, PATCH, PUT, DELETE)
  • Good: versioning, structured
  • Bad: over-fetching; responses often include data the client does not need

RPC

  • Call code that executes on a remote machine as if it were a local function
  • APIs are thought of as actions/commands (e.g. /postAnOrder(OrderDetails order))
  • Good: no special syntax to learn; space-efficient
  • Bad: best kept to internal communication because of timing issues (it becomes challenging to tell apart multiple concurrent communications between machines)

GraphQL

  • Data are structured as a graph: vertices (entities) and edges (relationships)
  • Good: ideal for customer-facing apps; you get exactly what you ask for; no extra routing in the backend to get and modify information
  • Bad: documentation is harder to generate than for REST; not well suited to aggregate data
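The three styles can be contrasted with a small in-memory sketch. Everything here is hypothetical: the `orders` data, the `rest`, `post_an_order`, and `query_order` helpers, and the status codes they return are illustrations only, and `query_order` merely imitates GraphQL-style field selection rather than using a real GraphQL library.

```python
# Hypothetical in-memory "orders" resource shared by all three styles.
orders = {1: {"id": 1, "item": "book", "status": "paid", "notes": "gift wrap"}}


def rest(verb: str, path: str, body=None):
    """REST style: one resource URL, the HTTP verb selects the operation."""
    parts = path.strip("/").split("/")
    if parts[0] != "orders":
        return 404, None
    if verb == "GET" and len(parts) == 2 and parts[1].isdigit():
        order = orders.get(int(parts[1]))
        return (200, order) if order else (404, None)
    if verb == "POST" and len(parts) == 1:
        new_id = max(orders, default=0) + 1
        orders[new_id] = {"id": new_id, **(body or {})}
        return 201, orders[new_id]
    return 405, None


def post_an_order(details: dict) -> dict:
    """RPC style: the endpoint name is the action itself."""
    return rest("POST", "/orders", details)[1]


def query_order(order_id: int, fields: list[str]) -> dict:
    """GraphQL-like field selection: the client names the fields it wants."""
    order = orders[order_id]
    return {field: order[field] for field in fields}


print(rest("GET", "/orders/1"))          # whole resource, even unneeded fields
print(query_order(1, ["id", "status"]))  # {'id': 1, 'status': 'paid'}
```

The REST call returns the full order, including `notes` the client may not need, while the field-selection call returns only what was asked for.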

2) Databases (SQL vs NoSQL)

SQL

  • composed of tables made up of rows and columns
  • ACID guarantees (emphasis: strong consistency)
  • supports powerful queries
  • bad: writes can be slow because B-tree storage engines split and merge pages/blocks.

NoSQL

  • nested key-value store
  • handles a high volume of writes well (many implementations use a log-structured merge-tree, which turns writes into sequential appends)
  • emphasis: eventual consistency
  • bad: reads might be stale for a short time while replicas converge

Other types

  • document-type (JSON)
  • columnar-type (good for queries that aggregate the same column across many rows)
  • graph-type

3) Scaling (horizontal vs vertical)

Database scaling

  • use replicas first, then shard the data into separate databases. Sharding commonly uses a hash function so that entries are distributed evenly and can be located again by key.
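Hash-based sharding can be sketched in a few lines. This is a minimal illustration, assuming four shards modelled as in-memory dicts; `shard_for`, `put`, and `get` are hypothetical names, not part of any database API.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash."""
    # hashlib yields the same digest on every machine and run, unlike the
    # built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical shards, modelled as in-memory dicts.
shards = [dict() for _ in range(4)]


def put(key: str, value: str) -> None:
    shards[shard_for(key, len(shards))][key] = value


def get(key: str):
    return shards[shard_for(key, len(shards))].get(key)


put("user:42", "Ada")
print(get("user:42"))  # Ada
```

One trade-off of plain modulo hashing is that changing the number of shards remaps most keys; the consistent hashing technique from the load balancer section is one way to limit that.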

Compute Scaling

  • divide a processing task into pieces and enqueue each piece as a job so that multiple machines can work on them in parallel.

  • both approaches may introduce some latency between calls/requests.

  • replicas improve the reliability of a system by avoiding a single point of failure.
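The job-queue idea for compute scaling can be sketched as follows. Threads stand in for separate machines here, and `process` and `run_jobs` are hypothetical names; a real deployment would typically use a networked queue instead of an in-process one.

```python
import queue
import threading


def process(piece: int) -> int:
    # Stand-in for real work on one piece of the task.
    return piece * piece


def run_jobs(pieces, num_workers: int = 3) -> dict:
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            piece = jobs.get()
            if piece is None:  # sentinel: no more work for this worker
                return
            value = process(piece)
            with lock:  # results is shared between workers
                results[piece] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for piece in pieces:
        jobs.put(piece)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


print(sorted(run_jobs([1, 2, 3, 4]).items()))  # [(1, 1), (2, 4), (3, 9), (4, 16)]
```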


4) CAP Theorem

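  • In the real world, a distributed system cannot guarantee all three properties at once; when the network partitions, it has to choose between consistency and availability
  • one of the key fundamentals of distributed system design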

Consistency

  • every node in the network has access to the same, most recent data

Availability

  • even if one or more nodes are down, any client making a data request receives a response

Partition Tolerance (necessary for modern systems)

  • if the network or communication between nodes fails, the system continues to work

5) Web Authentication and Basic Security

  • It's all about the trade-offs between total safety and total convenience
  • Authentication (JWT, session tokens/cookies) is about verifying identity, whereas authorization is about deciding which actions that identity may perform.
  • For instance, user passwords can be protected by salting and hashing them before they are stored.
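Salting and hashing can be sketched with Python's standard library. This is only an illustration: `hash_password`, `verify_password`, and the `ITERATIONS` value are assumptions chosen for the example, and PBKDF2 is one of several reasonable key-derivation functions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the plain password."""
    salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Because each user gets a different salt, two users with the same password end up with different stored digests.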

6) Load Balancers

  • A load balancer distributes traffic across machines and can add or remove servers from rotation, for example when one fails.
  • 3 common techniques: round-robin, least connections/response time, consistent hashing.

Round-Robin

  • sends requests to the servers in turn, one after another
  • can overload a server, because it ignores how busy each server currently is
  • ideal when servers are stable and loads are random

Least Connections/Response Time

  • ideal when servers have similar compute power and requests have varying connection times
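Round-robin and least-connections selection can be sketched side by side. The `RoundRobin` and `LeastConnections` classes and the server names are hypothetical; a real load balancer would also track health checks and timeouts.

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self) -> str:
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when the connection to this server closes.
        self.active[server] -= 1


rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])  # ['a', 'b', 'c', 'a']

lc = LeastConnections(["a", "b"])
first = lc.pick()   # 'a' (ties go to the first server listed)
second = lc.pick()  # 'b'
lc.release(first)   # the connection to 'a' finishes early
print(lc.pick())    # a
```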

Consistent Hashing

  • place N virtual nodes for each server on a hash ring, so that load is distributed as evenly as possible and only part of the ring is affected when a server is added or removed.
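A hash ring with virtual nodes can be sketched as a sorted list searched with `bisect`. The `HashRing` class, the `server#i` naming of virtual nodes, and the default of 100 virtual nodes are assumptions made for this example.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Stable position on the ring for a server's virtual node or a key."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node at or after the key's position, wrapping around.
        index = bisect.bisect(self._ring, (_position(key),))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["s1", "s2", "s3"])
keys = [f"key{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}

ring.add("s4")
moved = [key for key in keys if ring.lookup(key) != before[key]]
# Only a fraction of the keys move, and every moved key now lives on s4.
print(len(moved), {ring.lookup(key) for key in moved})
```

Adding a server only takes keys away from its neighbours on the ring; keys never shuffle between the servers that were already there.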

7) Caching

  • Caching reduces the latency of expensive computations, network calls, database queries, and asset fetches.
  • Popular caching patterns: cache-aside, and write-through/write-back.

Cache-aside

  • fetch from the cache first; if the entry is not found, fetch it from the database, then cache it.
  • data in the cache can become stale if the database is written to frequently. A "Time-to-Live" (TTL) on each entry limits how stale it can get.
  • on a miss, checking the cache first adds extra latency.
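The cache-aside flow with a time-to-live can be sketched like this. The `CacheAside` class, its `load_from_db` callback, and the injectable `clock` are hypothetical choices for the example; the "database" is just a dict.

```python
import time


class CacheAside:
    def __init__(self, load_from_db, ttl_seconds: float, clock=time.monotonic):
        self._load = load_from_db  # hypothetical function: key -> value
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # cache hit, still fresh
        value = self._load(key)  # miss or expired: go to the database
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key) -> None:
        self._entries.pop(key, None)


db = {"a": 1}  # stand-in for the real database
calls = []


def load(key):
    calls.append(key)
    return db[key]


cache = CacheAside(load, ttl_seconds=60)
cache.get("a")
cache.get("a")
print(len(calls))      # 1 (the second read was served from the cache)

db["a"] = 2            # the database changes behind the cache's back
print(cache.get("a"))  # 1 (stale until the TTL expires or the key is invalidated)
cache.invalidate("a")
print(cache.get("a"))  # 2
```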

Write-through and write-back

  • The application writes data to the cache, and the cache propagates it to the database either asynchronously (write-back) or synchronously (write-through)

Write-back

  • writes go into a queue and are flushed to the database later.

Write-through

  • the opposite of write-back: every write is applied to the database synchronously, which keeps the cache and database in step but can slow down the whole write path.

  • cache eviction strategy: Least Recently Used (LRU)
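Least Recently Used can be sketched with `collections.OrderedDict`, which remembers insertion order and lets an entry be moved to the end. The `LRUCache` class name and its `get`/`put` interface are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used entry
cache.put("c", 3)      # over capacity: "b" is dropped
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```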


8) Message Queues (Pub/Sub)

  • beneficial when a spike of traffic could otherwise bring a server or a database down; the queue buffers requests until consumers can process them.
  • a queue can deliver a request to multiple servers/systems, so the client does not have to send the same request to each of them.
  • queues decouple the client from the server by eliminating the need to know the server address.

Common properties (they vary by implementation)

  • guaranteed delivery
  • no duplicate messages are delivered
  • the order of messages is maintained
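A toy in-process broker shows the fan-out, ordering, and duplicate handling ideas. `Broker`, `IdempotentConsumer`, the topic name, and the message IDs are all hypothetical; real message queues implement these properties across the network and with persistence.

```python
from collections import defaultdict, deque


class Broker:
    """Each subscriber gets its own FIFO queue per topic."""

    def __init__(self):
        self._queues = defaultdict(dict)  # topic -> {subscriber: deque}

    def subscribe(self, topic: str, subscriber: str) -> None:
        self._queues[topic].setdefault(subscriber, deque())

    def publish(self, topic: str, message_id: str, body: str) -> None:
        # Fan-out: the publisher sends once, every subscriber receives a copy.
        for pending in self._queues[topic].values():
            pending.append((message_id, body))

    def poll(self, topic: str, subscriber: str):
        pending = self._queues[topic][subscriber]
        return pending.popleft() if pending else None


class IdempotentConsumer:
    """Ignores a message ID it has already handled (e.g. after a redelivery)."""

    def __init__(self):
        self.seen = set()
        self.handled = []

    def handle(self, message) -> None:
        message_id, body = message
        if message_id in self.seen:
            return
        self.seen.add(message_id)
        self.handled.append(body)


broker = Broker()
broker.subscribe("orders", "billing")
broker.subscribe("orders", "shipping")
broker.publish("orders", "m1", "order created")
broker.publish("orders", "m2", "order paid")
broker.publish("orders", "m2", "order paid")  # simulated duplicate delivery

billing = IdempotentConsumer()
while (message := broker.poll("orders", "billing")) is not None:
    billing.handle(message)
print(billing.handled)                    # ['order created', 'order paid']
print(broker.poll("orders", "shipping"))  # ('m1', 'order created')
```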

9) Indexing

  • speeds up lookups by reducing how many blocks of data have to be fetched from the hard disk into primary memory
  • can be multi-levelled
  • commonly a B-tree (self-balancing; keeps keys in sorted order across pages)
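The benefit of keeping keys in sorted order can be sketched with a sorted list and binary search. This is a simplification, not a B-tree: a B-tree spreads the sorted keys across pages so that each lookup touches only a few disk blocks. The `rows` data and `find_by_id` helper are hypothetical.

```python
import bisect

# Rows as they sit in storage, in no particular order.
rows = [
    {"id": 17, "name": "Ada"},
    {"id": 3, "name": "Linus"},
    {"id": 42, "name": "Grace"},
]

# Index: (key, row position) pairs kept in sorted key order.
index = sorted((row["id"], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]


def find_by_id(target: int):
    """Binary-search the index instead of scanning every row."""
    i = bisect.bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return rows[index[i][1]]
    return None


print(find_by_id(42))  # {'id': 42, 'name': 'Grace'}
print(find_by_id(5))   # None
```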

10) Failover (active-passive or leader-follower)

  • replication is used to avoid a single point of failure. It also helps a system serve global users across geographical locations/regions, and increases throughput.

leaders

  • the machine that handles write requests to the data store

followers

  • replicas of the leader that handle read requests

synchronous replication

  • the leader waits for the followers to acknowledge a write before confirming it. This slows down writes, but guarantees that the data has reached the followers.

asynchronous replication

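  • the opposite of synchronous replication: the leader confirms a write without waiting for the followers.
  • less time-consuming, but there is no guarantee that a write reaches the followers (it can be lost if the leader fails first).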
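The difference between the two modes can be sketched with in-memory objects. `Leader`, `Follower`, and `ship_pending` are hypothetical names; in a real system the followers sit across the network and an acknowledgement can fail or time out.

```python
from collections import deque


class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, key, value) -> bool:
        self.data[key] = value
        return True  # acknowledgement sent back to the leader


class Leader:
    def __init__(self, followers, synchronous: bool):
        self.data = {}
        self.followers = followers
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet shipped (asynchronous mode)

    def write(self, key, value) -> None:
        self.data[key] = value
        if self.synchronous:
            # Do not return until every follower has acknowledged the write.
            for follower in self.followers:
                if not follower.apply(key, value):
                    raise RuntimeError("follower did not acknowledge")
        else:
            # Return immediately; followers catch up later.
            self.pending.append((key, value))

    def ship_pending(self) -> None:
        while self.pending:
            key, value = self.pending.popleft()
            for follower in self.followers:
                follower.apply(key, value)


sync_follower = Follower()
Leader([sync_follower], synchronous=True).write("x", 1)
print(sync_follower.data)   # {'x': 1}

async_follower = Follower()
leader = Leader([async_follower], synchronous=False)
leader.write("x", 1)
print(async_follower.data)  # {} (a read here would be stale)
leader.ship_pending()
print(async_follower.data)  # {'x': 1}
```

If the asynchronous leader crashed before `ship_pending` ran, the write would exist only on the leader, which is the delivery risk described above.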

  • most common types of replication systems: single-leader and multi-leader (multiple machines can handle writes, but each needs to catch up with the writes made on the other machines to stay consistent)

  • to resolve concurrent write conflicts:

    • keep the update with the largest client timestamp (last write wins)
    • sticky routing: writes from the same client go to the same leader
    • keep all the conflicting updates and return all of them, so that the client can resolve the conflict
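The three conflict-handling strategies can be sketched as small functions. The names `last_write_wins`, `sticky_leader`, and `keep_all`, and the shape of an update as a `(client_timestamp, value)` pair, are assumptions made for this example.

```python
import hashlib


def last_write_wins(updates):
    """Keep the value with the largest client timestamp."""
    return max(updates, key=lambda update: update[0])[1]


def sticky_leader(client_id: str, leaders):
    """Route every write from the same client to the same leader."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return leaders[int.from_bytes(digest[:8], "big") % len(leaders)]


def keep_all(updates):
    """Keep every conflicting value, oldest first, for the client to resolve."""
    return [value for _, value in sorted(updates, key=lambda update: update[0])]


# Two leaders accepted different writes for the same key.
conflicting = [(1700000005, "blue"), (1700000009, "green")]
print(last_write_wins(conflicting))  # green
print(keep_all(conflicting))         # ['blue', 'green']

leaders = ["leader-eu", "leader-us"]
print(sticky_leader("client-7", leaders) == sticky_leader("client-7", leaders))  # True
```

Last write wins is simple, but it silently discards the other update, and client clocks can drift, so the "largest" timestamp is not always the truly latest write.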
