hridyesh bisht for AWS Community Builders

Posted on Jan 19, 2021

Networking in Computers ?

Some basic key definitions:

Bandwidth: The maximum amount of data a link can carry per unit time. A 10 Mbps link can transfer up to 10 million bits every second.
Throughput: The amount of data actually delivered over a specified time range; it is usually lower than the bandwidth.
Latency: The delay a data packet experiences in getting from source to destination.
Router: Connects multiple devices to the Internet and to each other.
1. A router acts as a dispatcher, choosing the best route for your information to travel.
Internet Protocol (IP): The addressing system of the Internet; its core function is delivering packets of information from a source device to a target device.

Q.Is there any difference between URL, URN and URI?

Uniform Resource Identifier: A string of characters used to identify a name or a resource on the Internet. A URI has two specializations known as URL and URN.
Uniform Resource Locator: Specifies where an identified resource is available and the mechanism for retrieving it. A URL defines how the resource can be obtained.
Uniform Resource Name: Names a resource using the urn scheme, and does not imply availability of the identified resource.

Both URNs (names) and URLs (locators) are URIs, and a particular URI may be both a name and a locator at the same time.
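
To make the distinction concrete, here is a small sketch using Python's standard urllib.parse module. The two example strings are made up for illustration; the point is only that a URL carries a location (scheme plus host), while a URN carries just a name.

```python
from urllib.parse import urlparse

# Both strings are URIs. The first is a URL (it says where the resource is
# and how to fetch it); the second is a URN (it only names the resource).
url = urlparse("https://example.com/blog/networking?page=2")
urn = urlparse("urn:isbn:0451450523")

print("URL:", url.scheme, url.netloc, url.path, url.query)
print("URN:", urn.scheme, repr(urn.netloc), urn.path)
```

The URN has an empty network location, which is why it cannot be fetched directly without some other lookup service.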

Application Architecture

1.Client Server Model:

The client computer sends a request for data to the server through the Internet; the server accepts the request, processes it and delivers the requested data packets back to the client. One special feature is that the server computer can manage numerous clients at the same time.

A single client can connect to numerous servers at the same time, where each server provides a different set of services to that specific client. It creates a centralized network.

An example would be my computer and the WordPress server.

2.Peer to Peer Model:

All the computers and devices that are part of the network are referred to as peers, and they share and exchange workloads. Each peer in a peer-to-peer network is equal to the other peers. There are no privileged peers, and there is no primary administrator device in the center of the network.

The primary goal of peer-to-peer networks is to share resources and help computers and devices work collaboratively, provide specific services, or execute specific tasks. It creates a decentralized network.

An example would be torrent sites, as transfer of data happens from peer to peer.

3.Open System Interconnection model(OSI Model):

The OSI model can be seen as a universal language for computer networking. It’s based on the concept of splitting up a communication system into seven abstract layers, each one stacked upon the last.

1.Application Layer:

Acts as an interface between users and software applications. Data is packed from this interaction to transmit. Application layer is responsible for the protocols and data manipulation that the software relies on to present meaningful data to the user.

2.Presentation Layer:

Prepares data, for example by converting it between formats, so that it can be used by the application layer. The presentation layer is responsible for translation, encryption, and compression of data.

3.Session Layer:

The time between when the communication is opened and closed is known as the session.

The session layer ensures that the session stays open long enough to authenticate, authorize and transfer all the data being exchanged, and then promptly closes the session in order to avoid wasting resources.

4.Transport Layer:

Responsible for end-to-end communication between the two devices. This includes taking data from the session layer and breaking it up into chunks called segments before sending it. The transport layer on the receiving device is responsible for reassembling the segments into data the session layer can consume. The transport layer is also responsible for flow control and error control.

Flow control determines an optimal speed of transmission to ensure that a sender with a fast connection doesn’t overwhelm a receiver with a slow connection.
Performs error control on the receiving end by ensuring that the data received is complete, and requesting a re-transmission if it isn’t.

5.Network layer:

Responsible for facilitating data transfer between two different networks. If the two devices communicating are on the same network, then the network layer is unnecessary. The network layer breaks up segments from the transport layer into smaller units, called packets, on the sender’s device, and reassembles these packets on the receiving device.

6.Data Link layer:

Facilitates data transfer between two devices on the same network. The data link layer takes packets from the network layer and breaks them into smaller pieces called frames.

Like the transport layer, the data link layer is also responsible for flow control and error control, but in intra-network communication. The transport layer only does flow control and error control for inter-network communications.

7.Physical layer:

This layer includes the physical equipment involved in the data transfer, such as the cables and switches. This is also the layer where the data gets converted into a bit stream, which is a string of 1s and 0s. The physical layer of both devices must also agree on a signal convention so that the 1s can be distinguished from the 0s on both devices.

For more information on OSI model,

4.Transmission Control Protocol/Internet Protocol (TCP/IP) Model:

The Transmission Control Protocol/Internet Protocol (TCP/IP) model provides a framework for transmitting data across networks, and it requires some basic information from us to move this data.

The OSI model had redundancy and repetition of functionality across layers, so the TCP/IP model combines the Application, Presentation and Session layers into one layer called the Application layer. This layer is responsible for all the functionality of the three layers.

1.Application Layer:

Responsible for the protocols and data manipulation that the software relies on to present meaningful data to the user.
Responsible for converting data into binary code, encrypting data and compressing data, before beginning the session.
Responsible for ensuring that the session stays open long enough to authenticate, authorize and transfer all the data being exchanged, and then promptly closes the session in order to avoid wasting resources.

2.Transport Layer:

Establishes the connection between applications running on different hosts. It keeps track of the processes running in the applications above it by assigning port numbers to them and uses the Network layer to access the TCP/IP network.

3.Network Layer:

Responsible for creating the packets that move across the network. It uses IP addresses to identify the packet’s source and destination.

4.Data Link Layer:

Responsible for creating the frames that move across the network. These frames encapsulate the packets and use MAC addresses to identify the source and destination.

5.Physical Layer:

The Physical layer encodes and decodes the bits found in a frame and includes the transceiver that drives and receives the signals on the network.

For more information on tcp/IP model,

https://www.youtube.com/watch?v=3b_TAYtzuho

Network Performance:

We have to minimize the Delay and Packet Loss, and maximize Throughput.

D(proc): Between the time the packet is correctly received at the head node of the incoming link and the time the packet is assigned to an outgoing link queue for transmission.
D(queue): Between the time the packet is assigned to a queue for transmission and the time it starts being transmitted. During this time, the packet waits while other packets in the transmission queue are transmitted.
D(trans): Between the times that the first and last bits of the packet are transmitted.
D(prop): Between the time the last bit is transmitted at the head node of the link queue and the time the last bit is received at the next router. This is proportional to the physical distance between transmitter and receiver.

Total Delay = D(proc) + D(queue) + D(trans) + D(prop)
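
As a rough worked example, the four delay components can be added up in a few lines of Python. This is only a sketch: it assumes the usual textbook formulas D(trans) = packet size / link rate and D(prop) = distance / propagation speed, and the numbers (a 1500-byte packet, a 10 Mbps link, 1000 km of cable, a signal speed of 2 x 10^8 m/s) are made up for illustration.

```python
def total_delay(d_proc, d_queue, packet_bits, link_bps, distance_m, speed_mps=2e8):
    """Return the total one-hop delay in seconds.

    d_proc and d_queue are given directly; the transmission and propagation
    delays are derived from the packet size, link rate and distance.
    """
    d_trans = packet_bits / link_bps    # time to push all bits onto the link
    d_prop = distance_m / speed_mps     # time for a bit to travel the link
    return d_proc + d_queue + d_trans + d_prop

# Assumed example values: 1500-byte packet, 10 Mbps link, 1000 km distance.
delay = total_delay(d_proc=0.0001, d_queue=0.002,
                    packet_bits=1500 * 8, link_bps=10_000_000,
                    distance_m=1_000_000)
print(f"total delay: {delay * 1000:.2f} ms")
```

With these numbers the propagation delay (5 ms) dominates, which is typical for long links; on a short, slow link the transmission delay would dominate instead.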

Application Layer

Protocols Used in Application Layer are:

1. Hyper Text Transfer Protocol(HTTP):

This is the basis for data communication on the World Wide Web. Communication starts with a request sent from a client and ends with the response received from a web server.

Various Methods used are:

GET Method: Requests a representation of the specified resource. Requests using GET should only retrieve data.
HEAD Method: Asks for a response identical to that of a GET request, but without the response body.
POST Method: Used to submit an entity to the specified resource, often causing a change in state or side effects on the server.
PUT Method: Replaces all current representations of the target resource with the request payload.

HTTP Request:

A simple request message from a client computer consists of the following components:

A request line to get a required resource. It contains Method, Path and HTTP version.
Headers. They contain name value pairs.
An empty line.
A message body which is optional. Contains extra information to be delivered to the server.
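
The four components listed above can be assembled by hand to see what a request looks like on the wire. This is a minimal sketch; build_request is a hypothetical helper written for this example, and the host and path are placeholders.

```python
def build_request(method, path, headers, body=""):
    """Assemble a raw HTTP/1.1 request message as text."""
    lines = [f"{method} {path} HTTP/1.1"]                        # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # headers
    # An empty line separates the headers from the optional body.
    return "\r\n".join(lines) + "\r\n\r\n" + body

request = build_request("GET", "/index.html",
                        {"Host": "example.com", "Accept": "text/html"})
print(request)
```

Lines are separated by a carriage return and line feed, and a GET request usually has no body, so the message ends right after the empty line.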

HTTP Response:

A simple response from the server contains the following components:

Status line. It contains the HTTP version, status code and reason phrase.
Headers. They contain name-value pairs.
An empty line.
A message body, which is optional. Contains the data to be delivered to the client.

Types of Status Code: The first digit gives the class of the response; for example, 404 is a client error and 502 is a server error.

1XX: Informational
2XX: Success
3XX: Redirection
4XX: Client error
5XX: Server error
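
A short sketch of how the first line of a response can be split into its parts and how the status class follows from the first digit of the code. The function names are hypothetical helpers for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_status_line(line):
    """Split 'HTTP/1.1 404 Not Found' into version, numeric code and reason."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

def status_class(code):
    """Map a status code to its class using the hundreds digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")

version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
print(version, code, reason, "->", status_class(code))
```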

HTTPS OVER HTTP ?

HTTPS is the secured version of HTTP, used to send and receive information securely over the Internet. Nowadays browsers flag plain HTTP sites as not secure, so websites are expected to use HTTPS.

Besides the security and encryption, the communication structure of HTTPS remains the same as HTTP, as explained above.

2.Domain Name System(DNS):

It’s like a global phone book of the Internet, which maps alphabetic names to numeric addresses. DNS is a protocol for looking up information on the Internet, with domain names as the upper layer that is matched to the IP addresses beneath it.

End users access information using domain names, not IP addresses. For example, if you are looking for my blog, instead of remembering the site’s IP address you just have to remember the site’s name to connect and access information from it.

There are 4 DNS servers involved in loading a webpage:

DNS recursor - A server designed to receive queries from client machines through applications such as web browsers and resolve these queries. The recursor also caches answers, which lets it short-circuit later lookups by serving a requested resource record earlier in the DNS lookup.
1. The recursor can be thought of as a librarian who is asked to go find a particular book somewhere in a library.
Root nameserver - The root server is the first step in translating human readable host names into IP addresses.
1. It can be thought of like an index in a library that points to different racks of books - typically it serves as a reference to other more specific locations.
TLD nameserver - The top level domain server (TLD) is the next step in the search for a specific IP address, and it hosts the last portion of a host name, for example .in or .com.
1. It can be thought of as a specific rack of books in a library.
Authoritative nameserver - The authoritative nameserver is the last stop in the nameserver query. If the authoritative name server has access to the requested record, it will return the IP address for the requested host name back to the DNS Recursor that made the initial request.
1. It can be thought of as a dictionary on a rack of books, in which a specific name can be translated into its definition.

What is a DNS resolver?

The DNS resolver is the first stop in the DNS lookup, and it is responsible for dealing with the client that made the initial request. The resolver starts the sequence of queries that ultimately leads to a host name being translated into the necessary IP address.

DNS Query and Response:

DNS messages are sent over UDP (for common requests and responses smaller than 512 bytes) or TCP.

Question Section: This is a section consisting of one or more question records. It is present on both query and response messages.
Answer Section: This is a section consisting of one or more resource records. It is present only on response messages. This section includes the answer from the server to the client (resolver).
Authority Section: This is a section consisting of one or more resource records. It is present only on response messages. This section gives information (domain name) about one or more authoritative servers for the query.
Additional Information Section: This is a section consisting of one or more resource records. It is present only on response messages. This section provides additional information that may help the resolver.

Information about flags in header:

QR (query/response): This is a 1-bit subfield that defines the type of message. If it is 0, the message is a query. If it is 1, the message is a response.
OpCode: This is a 4-bit subfield that defines the type of query or response (0 if standard, 1 if inverse, and 2 if a server status request).
AA (authoritative answer): This is a 1-bit subfield. When it is set (value of 1), it means that the name server is an authoritative server. It is used only in a response message.
TC (truncated): This is a 1-bit subfield. When it is set (value of 1), it means that the response was more than 512 bytes and truncated to 512.
RD (recursion desired): This is a 1-bit subfield. When it is set (value of 1) it means the client desires a recursive answer.
RA (recursion available): This is a 1-bit subfield. When it is set in the response, it means that a recursive response is available.
Reserved: This is a 3-bit subfield set to 000.
RCode: This is a 4-bit field that shows the status of the error in the response.
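
The subfields above add up to 16 bits (1 + 4 + 1 + 1 + 1 + 1 + 3 + 4), so they fit in one 16-bit flags word. The sketch below packs and unpacks that word with plain bit shifts, assuming the fields appear in the order listed, from the most significant bit down. pack_flags and unpack_flags are hypothetical helper names.

```python
def pack_flags(qr=0, opcode=0, aa=0, tc=0, rd=0, ra=0, rcode=0):
    """Combine the DNS header subfields into one 16-bit integer."""
    return ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
            | (rd << 8) | (ra << 7) | rcode)   # the 3 reserved bits stay 000

def unpack_flags(flags):
    """Split a 16-bit DNS flags word back into its subfields."""
    return {
        "qr": (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,
        "tc": (flags >> 9) & 0x1,
        "rd": (flags >> 8) & 0x1,
        "ra": (flags >> 7) & 0x1,
        "reserved": (flags >> 4) & 0x7,
        "rcode": flags & 0xF,
    }

query_flags = pack_flags(rd=1)                 # standard query, recursion desired
response_flags = pack_flags(qr=1, rd=1, ra=1)  # response from a recursive server
print(hex(query_flags), hex(response_flags))
print(unpack_flags(response_flags))
```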

3.Email Protocols:

Whenever Bob sends mail to Alice, the steps followed are:

Bob sends the email to his mail server using SMTP.
Bob's server checks whether Alice uses the same mail server.
1. If they have the same mail server, Alice can pull the mail directly from it. An example is mail communication within one organization.
2. If they have different mail servers, Bob's mail server sends the mail to Alice's mail server, and Alice can then pull her mail from her own mail server.

Q.What do you mean by a Port and a Socket?

Port: Just as the IP address identifies the computer, the network port identifies the application or service running on the computer.
1. A port number uses 16 bits, so it can have a value from 0 to 65535 decimal.
Socket: An endpoint of a connection between two computers. It is identified by a tuple of IP address and port.

Host A has an application running on port 7. It creates a socket, or endpoint, to send and receive information.
Host B has an application running on port 81. It also creates a socket, or endpoint, to send and receive information.
Host A and B can communicate over the Internet or a network using their respective sockets.
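
The idea of two endpoints, each identified by an (IP address, port) pair, can be tried on a single machine with Python's socket module. This is a minimal sketch under some assumptions: both "hosts" are the local loopback address 127.0.0.1, the operating system picks the port numbers, and echo_once is a hypothetical helper that answers one client and stops.

```python
import socket
import threading

def echo_once(server_sock):
    """Accept one connection and send back whatever was received."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))

# "Host B": a listening socket. Port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.settimeout(5)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_addr = server.getsockname()       # (IP address, port) of the server socket

worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

# "Host A": a client socket that connects to the server's (IP, port) pair.
client = socket.create_connection(server_addr, timeout=5)
client_addr = client.getsockname()       # (IP address, port) of the client socket
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
worker.join()
server.close()

print("server socket:", server_addr)
print("client socket:", client_addr)
print("reply:", reply)
```

The two printed tuples share the same IP address here but have different ports, which is what lets the operating system tell the two endpoints apart.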

For more information on Sockets and ports,

Transport Layers:

1.Transmission Control Protocol (TCP):

The computer sending the data connects directly to the computer it is sending the data to, and stays connected for the duration of the transfer.

A real-life comparison would be picking up the phone and calling a friend. Your friend picks up and responds with a hello, an acknowledgement that he is free to talk right now. Once you have that acknowledgement, you start speaking. You have a conversation and when it is over, you both hang up, releasing the connection.

This illustrates that TCP connections are:

Set up with a three-way handshake: connection request (SYN) -> acknowledgement (SYN-ACK) -> final acknowledgement (ACK), after which data can be sent
Connection-oriented
Reliable

TCP is a connection-oriented protocol. It moves data in a continuous, unstructured byte stream. Sequence numbers identify bytes within that stream.

Source Port and Destination Port fields (16 bits each) identify the end points of the connection.
Sequence Number field (32 bits) specifies the number assigned to the first byte of data in the current message. Under certain circumstances, it can also be used to identify an initial sequence number to be used in the upcoming transmission.
Acknowledgement Number field (32 bits) contains the value of the next sequence number that the sender of the segment is expecting to receive, if the ACK control bit is set.
Data Offset (a.k.a. Header Length) field (4 bits) tells how many 32-bit words are contained in the TCP header. This information is needed because the Options field has variable length, so the header length is variable too.
Reserved field (6 bits) must be zero.
Flags field (6 bits) contains the various flags:
1. URG—Indicates that some urgent data has been placed.
2. ACK—Indicates that acknowledgement number is valid.
3. PSH—Indicates that data should be passed to the application as soon as possible.
4. RST—Resets the connection.
5. SYN—Synchronizes sequence numbers to initiate a connection.
6. FIN—Means that the sender of the flag has finished sending data.
Window field (16 bits) specifies the size of the sender's receive window.
Checksum field (16 bits) is used to detect whether the header or data was damaged in transit.
Urgent pointer field (16 bits) points to the first urgent data byte in the packet.
Options field (variable length) specifies various TCP options.
Data field (variable length) contains upper-layer information.
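
The fixed part of the header described above is 20 bytes long and can be decoded with Python's struct module. This is a sketch, not a full TCP parser: it ignores options, assumes the 6-flag layout listed above, and treats the data offset as the top 4 bits of the 16-bit word that also holds the reserved bits and flags. The sample segment is made up.

```python
import struct

# Fixed 20-byte part of the TCP header, in network byte order:
# two 16-bit ports, 32-bit sequence and acknowledgement numbers,
# one 16-bit word holding data offset + reserved bits + flags,
# then 16-bit window, checksum and urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # lowest bit first

def parse_tcp_header(raw):
    """Decode the fixed part of a TCP header from raw bytes."""
    (src, dst, seq, ack, offset_and_flags,
     window, checksum, urgent) = TCP_HEADER.unpack(raw[:TCP_HEADER.size])
    data_offset = offset_and_flags >> 12       # header length in 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES)
             if offset_and_flags & (1 << bit)]
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,
        "header_bytes": data_offset * 4,
        "flags": flags,
        "window": window, "checksum": checksum, "urgent": urgent,
    }

# Hypothetical SYN segment: data offset 5 (20 bytes), only the SYN bit set.
syn = TCP_HEADER.pack(50000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(syn)
print(header)
```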

Applications where TCP is used are:

Emails
Online Transactions
Socket Programming

2.User Datagram Protocol (UDP):

UDP does not connect directly to the receiving computer like TCP does, but rather sends the data out and relies on the devices in between the sending computer and the receiving computer to get the data where it is supposed to go properly.

A real-life comparison would be sending a letter to a friend via the postal service. You put the letter in the postbox, and then you hope it reaches the destination. Sometimes it does not.

This illustrates that UDP:

Involves no acknowledgements.
Is fast but not reliable.

UDP is a connectionless protocol. It has a checksum value for basic detection of faulty data packets.

Source Port and Destination Port fields (16 bits each) identify the sending and receiving applications.
Length field (16 bits) specifies the length of the header and data.
Checksum field (16 bits) allows packet integrity checking or error detection in the packet.
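
Because the UDP header is just four 16-bit fields, it is easy to build and decode by hand. The sketch below uses Python's struct module; the port numbers and payload are made up, and the checksum is left at 0 (meaning "not computed", which UDP over IPv4 permits) to keep the example short.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")   # source port, destination port, length, checksum

def build_udp_datagram(src_port, dst_port, payload):
    """Prefix the payload with an 8-byte UDP header (checksum left at 0)."""
    length = UDP_HEADER.size + len(payload)    # the length covers header and data
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload

def parse_udp_datagram(datagram):
    """Return (src_port, dst_port, length, checksum, payload)."""
    src, dst, length, checksum = UDP_HEADER.unpack(datagram[:UDP_HEADER.size])
    return src, dst, length, checksum, datagram[UDP_HEADER.size:length]

datagram = build_udp_datagram(40000, 53, b"ping")
print(parse_udp_datagram(datagram))
```

Compared with the TCP header, there are no sequence numbers, acknowledgement numbers or flags, which is where both the speed and the lack of reliability come from.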

Applications where UDP is used are:

YouTube
Online Images

Sliding Window Protocol:

A window is a space that holds or can hold multiple bytes. For example, if there is a window of 1000 bytes and each individual message or frame is 100 bytes long, there can be a maximum of 10 messages in the window at any time.

The sliding window is a technique for sending multiple frames at a time. In this technique, each frame is sent with a sequence number. The receiver uses the sequence numbers to find missing data and to discard duplicates.

1.Go-Back-N Automatic Repeat Request(Go back N ARQ):

It is a data link layer protocol that uses a sliding window method. In this, if any frame is corrupted or lost, that frame and all subsequent frames have to be sent again.

If the receiver receives a corrupted frame, it discards it. The receiver does not accept a corrupted frame, nor the frames that arrive out of order after it. When the timer expires, the sender sends the frames again, starting from the one that was not acknowledged.
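
The cost of going back can be seen in a small simulation. This is a simplified, round-based sketch rather than a faithful protocol implementation: the sender transmits one full window per round, each frame listed in lost_once is lost the first time it is sent, and a timeout is modelled as simply starting the next round from the first unacknowledged frame. All names are hypothetical.

```python
def go_back_n(total_frames, window_size, lost_once):
    """Return the list of frame numbers in the order they are transmitted."""
    lost = set(lost_once)
    base = 0                      # first frame not yet acknowledged
    transmissions = []
    while base < total_frames:
        expected = base           # next in-order frame the receiver will accept
        for seq in range(base, min(base + window_size, total_frames)):
            transmissions.append(seq)
            if seq in lost:
                lost.discard(seq) # lost on this attempt only
                continue
            if seq == expected:   # out-of-order frames are discarded
                expected += 1
        base = expected           # timeout: resume from first missing frame
    return transmissions

sent = go_back_n(total_frames=6, window_size=3, lost_once={1})
print(sent)
print("transmissions:", len(sent), "for 6 frames")
```

In this run frame 2 is transmitted twice even though it arrived intact the first time, because the receiver threw it away while waiting for frame 1.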

2.Selective Repeat Automatic Repeat Request(Selective Repeat ARQ):

The Go-Back-N ARQ protocol works well when there are few errors. But if frames are frequently lost or corrupted, a lot of bandwidth is wasted sending frames again. So, we use the Selective Repeat ARQ protocol. In this protocol, the size of the sender window is always equal to the size of the receiver window. The size of the sliding window is always greater than 1.

If the receiver receives a corrupt frame, it does not silently discard it. It sends a negative acknowledgement to the sender. The sender sends that frame again as soon as it receives the negative acknowledgement. There is no waiting for a time-out to send that frame.

When the network has problems, the window size is reduced by half. When the window shrinks so far that only very small amounts of data are sent at a time, the result is known as silly window syndrome.

Flow Control:

Flow control methods are techniques that regulate the data flow between sender and receiver. They determine how much data the sender may send before receiving an acknowledgement, and make the sender wait for an acknowledgement before sending more.

Prevents the sender from overwhelming the receiver
The sender paces its transmission to what the receiver can handle

Algorithms used alongside flow control in TCP include Nagle's algorithm, Karn and Partridge's algorithm, and Jacobson's algorithm.

Congestion Control:

Congestion is a situation in communication networks in which too many packets are present in a part of the subnet, so performance degrades.

Congestion in a network may occur when the load on the network is greater than the capacity of the network.

Prevents senders from overwhelming the network
Senders slow down their transmission into the network

Network congestion occurs in case of traffic overloading. Two algorithms help regulate the rate of data transmission and reduce congestion:

1.Leaky Bucket:

Suppose we have a bucket with a hole at the bottom of it, now we may input water into the bucket at any rate but the water will pour at a constant rate from the bottom.

Java code for leaky bucket algorithm
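
Independently of the Java version referenced above, the idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: time advances in discrete ticks, arrivals lists how many packets arrive in each tick, the bucket holds at most capacity packets, and at most leak_rate packets leave per tick. Packets that do not fit are dropped.

```python
def leaky_bucket(arrivals, capacity, leak_rate):
    """Return (packets sent in each tick, total packets dropped)."""
    queued = 0
    sent, dropped = [], 0
    for incoming in arrivals:
        accepted = min(incoming, capacity - queued)  # bucket overflows beyond capacity
        dropped += incoming - accepted
        queued += accepted
        out = min(queued, leak_rate)                 # constant drain rate
        queued -= out
        sent.append(out)
    return sent, dropped

sent, dropped = leaky_bucket([5, 0, 0, 3, 0], capacity=4, leak_rate=2)
print("sent per tick:", sent)
print("dropped:", dropped)
```

However bursty the input is, the output never exceeds leak_rate packets per tick, which is the smoothing effect the bucket analogy describes.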

2.Token Bucket:

To overcome the problem of water overflowing in the leaky bucket (packets being dropped during a burst), we use a token bucket. It holds a finite number of tokens, which are replenished at a fixed rate. Each packet needs a token to be sent; if the tokens are finished, the packet has to wait in a queue until a token becomes available.

Java code for token bucket algorithm
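
Independently of the Java version referenced above, here is a comparable Python sketch. The assumptions are labelled here because the text does not fix them: time advances in discrete ticks, the bucket starts full with capacity tokens, refill_rate tokens are added at the end of every tick, and packets without a token wait instead of being dropped.

```python
def token_bucket(arrivals, capacity, refill_rate):
    """Return (packets sent in each tick, packets still waiting at the end)."""
    tokens = capacity
    waiting = 0
    sent = []
    for incoming in arrivals:
        waiting += incoming
        out = min(waiting, tokens)      # one token is spent per packet
        tokens -= out
        waiting -= out
        sent.append(out)
        tokens = min(capacity, tokens + refill_rate)  # refill, capped at capacity
    return sent, waiting

sent, waiting = token_bucket([5, 0, 0, 3, 0], capacity=4, refill_rate=2)
print("sent per tick:", sent)
print("still waiting:", waiting)
```

With the same arrivals as a leaky bucket of capacity 4 and rate 2, the first tick now sends a burst of 4 packets, because saved-up tokens can be spent at once.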

It's recommended to first put a token bucket and then a leaky bucket in the network.

Q.How to reduce packet drops?

1.Additive Increase Multiplicative Decrease(AIMD):

It checks the traffic in the network and

If there is less traffic in the network, then increase the window size by 1.
If there is more traffic in the network, then decrease the window size by half.
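
The two rules can be traced with a tiny simulation. In this sketch each entry of events says whether congestion (a loss) was seen in that round; the starting window of 1 and the floor of 1 are assumptions made for the example.

```python
def aimd(events, start=1.0):
    """Return the window size after each round of an AIMD controller."""
    window = start
    history = []
    for congested in events:
        if congested:
            window = max(1.0, window / 2)   # multiplicative decrease
        else:
            window = window + 1             # additive increase
        history.append(window)
    return history

# Four quiet rounds, one congested round, then two quiet rounds.
print(aimd([False, False, False, False, True, False, False]))
```

Plotted over time, the window follows the familiar sawtooth shape: a slow linear climb followed by a sharp drop at every sign of congestion.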

2.Random Early Detection(RED):

It calculates the probability that an arriving packet should be dropped, based on how full the router's queue is on average. The packet is dropped with that probability; otherwise it is enqueued.
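
One common way to compute that probability is the linear ramp from the classic RED formulation, sketched below. The parameter names (min_th, max_th, max_p) and the example values are assumptions for illustration and are not taken from the text above: nothing is dropped below the lower threshold, everything is dropped at or above the upper threshold, and in between the probability grows linearly up to max_p.

```python
import random

def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability as a function of the average queue length."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def should_drop(avg_queue, min_th=5, max_th=15, max_p=0.1, rng=random.random):
    """Randomly decide whether to drop an arriving packet."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)

for avg in (2, 10, 14, 20):
    print(avg, red_drop_probability(avg, min_th=5, max_th=15, max_p=0.1))
```

Dropping a few packets early, before the queue is full, signals senders to slow down before the router is forced to drop everything that arrives.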

For more Information on flow and congestion control:

Network Layer:

Bandwidth and Queue of the Router:

When several sources send packets to a router, it has to decide whose packets leave the queue first. There are three strategies:

First in First out: Whichever data packet is received first by the router's queue is sent first on the output link. This approach can be unfair, as data packets differ in size and a source sending large packets can take most of the link.
Priority Queue: Whichever data packet has the higher priority is sent first on the output link.
Fair Queuing: Each source receives a time slice to send its packets, for example source 1 and source 2 both have a time slice of one minute.

IP Header:

The IP header is the piece of information that is inserted by the IP layer when sending a network packet to the remote peer.

For a message received from the peer, the IP layer removes the header. The header information works as control information for the user data.
Mainly, the IP header carries the information needed for end-to-end routing and for quality of service.

The IP packet format consists of these fields:

Version field (4 bits) indicates the version of IP currently used.
IP Header Length (IHL) field (4 bits) indicates how many 32-bit words are in the IP header.
Type-of-service field (8 bits) specifies how a particular upper-layer protocol would like the current datagram to be handled. Datagrams can be assigned various levels of importance through this field.
Total Length field (16 bits) specifies the length of the entire IP packet, including data and header, in bytes.
Identification field (16 bits) contains an integer that identifies the current datagram. This field is used to help reconstruct datagram fragments.
Flags field (3 bits; one is not used) controls whether routers are allowed to fragment a packet and indicates whether more fragments follow.
Fragment Offset field (13 bits) gives the position of a fragment within the original datagram.
Time-to-live field (8 bits) maintains a counter that gradually decrements to zero, at which point the datagram is discarded. This keeps packets from looping endlessly.
Protocol field (8 bits) indicates which upper-layer protocol receives incoming packets after IP processing is complete.
Header Checksum field (16 bits) helps ensure IP header integrity.
Source Address field (32 bits) specifies the sending node.
Destination Address field(32 bits) specifies the receiving node.
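Options field (variable length) allows IP to support various options, such as security.
Data field (variable length) contains upper-layer information.

Fragmentation: When a datagram is larger than the maximum size a link can carry, it is split into smaller fragments. The Identification, Flags and Fragment Offset fields let the receiver tell which fragments belong together and put them back in order.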

1.IPv4 addresses:

An IPv4 address is a 32-bit address that uniquely and universally defines the connection of a device.

Address Space

An address space is the total number of addresses used by the protocol. If an N-bit address is used, the total number of addresses in the address space is 2^N. IPv4 uses 32-bit addresses, so the total number of addresses in the address space is

2³² = 4,294,967,296

2.IPv6 addresses:

An IPv6 address is a 128-bit address that uniquely and universally defines the connection of a device.

Address Space

An address space is the total number of addresses used by the protocol. If an N-bit address is used, the total number of addresses in the address space is 2^N. IPv6 uses 128-bit addresses, so the total number of addresses in the address space is

2¹²⁸ = 340,282,366,920,938,463,463,374,607,431,768,211,456 (about 3.4 × 10³⁸)
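
The address-space sizes can be checked with Python's standard ipaddress module, which knows that IPv4 addresses are 32 bits wide and IPv6 addresses are 128 bits wide. The sample addresses below come from ranges reserved for documentation.

```python
import ipaddress

# The networks 0.0.0.0/0 and ::/0 cover every IPv4 and IPv6 address.
ipv4_space = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_space = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4 addresses: {ipv4_space:,}")
print(f"IPv6 addresses: {ipv6_space:,}")

# Each address object reports its version and its width in bits.
for text in ("192.0.2.1", "2001:db8::1"):
    addr = ipaddress.ip_address(text)
    print(text, "-> IPv" + str(addr.version), addr.max_prefixlen, "bits")
```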

For more information on Logical addressing,

Routing:

Finding the least expensive way from source to destination for a data packet.

Static: A predefined route has been set for data packets from A to B.
Dynamic: The cheapest route from A to B is found dynamically at each node; one algorithm used for this is the Bellman-Ford algorithm.

Algorithm for Bellman Ford,

Initialize the distance from the source node S to all other nodes as infinite (999999999) and to itself as 0.
Repeat (number of nodes - 1) times:
    For every edge E in the EdgeList do
        Node_u = E.first, Node_v = E.second
        Weight_u_v = EdgeWeight ( Node_u, Node_v )
        If ( Distance [Node_v] > Distance [Node_u] + Weight_u_v )
            Distance [Node_v] = Distance [Node_u] + Weight_u_v
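
The steps above translate almost line for line into Python. This sketch makes a few assumptions that the pseudocode leaves open: nodes are numbered from 0, edges are directed (u, v, weight) tuples, the loop stops early once a full pass changes nothing, and a final pass reports negative-weight cycles. The example graph is made up.

```python
import math

def bellman_ford(num_nodes, edges, source):
    """Return the list of shortest distances from source to every node."""
    distance = [math.inf] * num_nodes
    distance[source] = 0
    for _ in range(num_nodes - 1):          # at most |V| - 1 rounds are needed
        updated = False
        for u, v, weight in edges:          # relax every edge
            if distance[u] + weight < distance[v]:
                distance[v] = distance[u] + weight
                updated = True
        if not updated:                     # nothing changed: already optimal
            break
    for u, v, weight in edges:              # one more pass detects negative cycles
        if distance[u] + weight < distance[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return distance

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(bellman_ford(4, edges, source=0))
```

In the example, the direct edge from node 0 to node 1 costs 4, but going through node 2 costs only 3, so the cheaper indirect route wins.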

Data Link Layer

The data link layer uses error control mechanisms to ensure that frames are transmitted with a certain level of accuracy. But to understand how errors are controlled, it is essential to know what types of errors may occur.

1.Error detection:

1.Parity check: It counts the number of 1s and adds one extra bit so that the total is even (even parity) or odd (odd parity). The receiver repeats the count to detect an error.

2.Checksum: It calculates a check value from the data packet, typically a sum of its words. If the value computed at the receiver equals the one sent, the data is assumed to be error free.

3.Cyclic redundancy check(CRC): This technique involves binary division of the data bits being sent. The divisor is generated using polynomials. The sender performs a division operation on the bits being sent and calculates the remainder. Before sending the actual bits, the sender adds the remainder at the end of the actual bits. Actual data bits plus the remainder is called a codeword. The sender transmits data bits as codewords.
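
The three detection techniques can be sketched side by side. Some choices here are assumptions for the example: the parity function uses even parity, the checksum is the 16-bit ones' complement sum used by IP, TCP and UDP (one of several possible checksums), and the CRC works on bit strings with a small generator so the division is easy to follow by hand.

```python
def even_parity_bit(bits):
    """Return the bit that makes the total number of 1s even."""
    return bits.count("1") % 2

def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                              # pad to whole 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)     # wrap the carry around
    return ~total & 0xFFFF

def crc_remainder(data_bits, generator):
    """Remainder of modulo-2 division of the zero-padded data by the generator."""
    padded = list(data_bits + "0" * (len(generator) - 1))
    for i in range(len(data_bits)):
        if padded[i] == "1":                         # XOR the generator in here
            for j, g in enumerate(generator):
                padded[i + j] = str(int(padded[i + j]) ^ int(g))
    return "".join(padded[-(len(generator) - 1):])

print("parity bit:", even_parity_bit("1011000"))
print("checksum:", hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))
remainder = crc_remainder("1101", "1011")
print("CRC remainder:", remainder, "codeword:", "1101" + remainder)
```

The sender appends the remainder to the data to form the codeword; a receiver dividing an undamaged codeword by the same generator gets a remainder of zero.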

Error Correction:

1.Hamming code:

Hamming codes are used to detect errors and help recover the original binary word.

The basic concept of Hamming code is to add parity bits to the data stream to verify that the data received is correct or matches the input. These bits are placed in such a way that they identify where the error has occurred in the data stream, so it can be rectified.

In order to detect d errors, a code needs a minimum Hamming distance of (d + 1) between codewords, and to correct d errors it needs a minimum distance of (2d + 1). These codes use extra parity bits to identify the error.
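
A concrete instance is the Hamming(7,4) code, which protects 4 data bits with 3 parity bits and can correct any single-bit error. The sketch below assumes the common layout with parity bits at positions 1, 2 and 4 and data bits at positions 3, 5, 6 and 7; the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4        # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4        # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_correct(code):
    """Return (corrected codeword, 1-based error position or 0 if none)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4   # the syndrome spells out the position
    if error_pos:
        c[error_pos - 1] ^= 1          # flip the faulty bit back
    return c, error_pos

codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1                        # corrupt position 5 in transit
fixed, position = hamming74_correct(damaged)
print("sent:    ", codeword)
print("received:", damaged)
print("fixed:   ", fixed, "error at position", position)
```

Each parity bit checks a different subset of positions, so the pattern of failed checks, read as a binary number, points directly at the flipped bit.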

For more information,

https://users.cis.fiu.edu/~downeyt/cop3402/hamming.html

Media Access Control:

Media access control is a network data transfer policy that determines how data is transmitted between two computer terminals through a shared network medium, such as a cable.

The essence of a MAC protocol is to avoid collisions and ease the transfer of data packets between computer terminals. A collision takes place when two or more terminals transmit data simultaneously.

1.Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA)

Regulates how data packets are transmitted between two computer nodes. This method avoids collisions by having each computer terminal signal its intent before transmission. The signal is sent by the transmitting computer to avoid a collision.

Multiple access implies that many computers are attempting to transmit data. Collision avoidance means that when a computer node states its intention to transmit, the others wait for a specific length of time before trying to send their own data.

CSMA/CA data traffic regulation is slow, and having each computer node signal its intention before transmitting adds cost. It was used on Apple's LocalTalk networks and is used today in Wi-Fi (IEEE 802.11) networks.

2.Carrier Sense Multiple Access with Collision Detection (CSMA/CD)

Carrier sense multiple access with collision detection (CSMA/CD) takes the opposite approach to CSMA/CA. Instead of signaling its intention to transmit in order to prevent a collision, a station listens to the cable to check that it is idle before transmitting, and keeps listening while it transmits to detect a collision.

Collision detection means that when a collision is detected by the media access control policy, the network stations stop transmitting and wait for a random length of time before transmitting starts again.

It is faster than CSMA/CA because fewer frames have to be transmitted for each piece of data. CSMA/CD is not as efficient as CSMA/CA in preventing network collisions, because it reacts to collisions only after they happen, and heavy data traffic on the cable increases the possibility of a collision taking place. It is used on Ethernet networks.

3.Demand Priority:

The demand priority is an improved version of the Carrier sense multiple access with collision detection (CSMA/CD). This data control policy uses an ‘active hub’ in regulating how a network is accessed. Demand priority requires that the network terminals obtain authorization from the active hub before data can be transmitted.

Another distinct feature of this MAC policy is that data can be transmitted between two network terminals at the same time without collision. On Ethernet media, demand priority directs that data is transmitted directly to the receiving network terminal.

4.Token Passing:

This media access control method uses free token passing to prevent collisions. Only a computer that possesses a free token, which is a small data frame, is authorized to transmit. A network terminal with a higher priority transmits before one with a lower priority.

Token passing flourishes in an environment where a large number of short data frames are transmitted. This media access control policy is highly efficient in avoiding collisions. Possession of the free token is the only key to transmitting data by a network node. Each terminal holds the free token for a specific amount of time; if the terminal with the high priority does not have data to transmit, the token is passed to the adjoining station in the network.

Code is present at https://github.com/kakabisht/CN_Lab
