Representing and Manipulating Information in a Modern Computer - Part 2

#computerscience #softwareengineering #c #systems

Link to Part 1

Addressing and Byte ordering

A 4-byte int on a 32-bit machine is stored in a contiguous sequence of 4 bytes. Depending on the machine, those bytes can be ordered in one of two ways: little endian or big endian. Without going into much detail, a little-endian machine stores an int (4 bytes on 32-bit) with the hexadecimal value 0x01234567 with the least significant byte at the lowest address, something like the following (assuming the starting address is 0x100):
Addresses/Values
0x100 67
0x101 45
0x102 23
0x103 01
and similarly, a big-endian machine stores the most significant byte at the lowest address:
Addresses/Values
0x100 01
0x101 23
0x102 45
0x103 67
I hope you can see the difference in the ordering. Linux (32-bit and 64-bit) and Windows on x86 hardware are little endian, whereas SunOS on SPARC is big endian.

This is important because sending a message over a network from a little-endian machine to a big-endian machine, or vice versa, can be an issue: the receiver may read the bytes in the wrong order. Most programmers never notice this because networking libraries and applications are written in a way that does the conversion for us, but if you are writing a network application yourself, you might need to consider it.

Integer Arithmetic

You might be surprised to know that adding two positive numbers can result in a negative number, and that x < y can give you a different result than x - y < 0.

Let me give you an example. Let's say we have a computer which stores an int in 4 bits, and we have two unsigned ints, x and y.

unsigned int x = 10; // binary rep: 1010
unsigned int y = 15; // binary rep: 1111
unsigned int z = x + y; // ???

The value of z is 25, right? Right?

Well, no. If you convert 25 into its binary representation, it comes out to be 11001, but as I mentioned, our computer can only store 4-bit integers (values from 0 to 15 in the unsigned case). So, what will our computer do with the extra bit? You are right, it will drop the higher-order bit (the first bit from the left) and we will get 1001, which converts to 9. This is the same as taking the result modulo 16, i.e. 25 mod 16 = 9. This behavior, which is not limited to arithmetic, is called overflow.

But why am I using unsigned int here? Will this addition behave differently with signed integers?

Answer: Yes. But before explaining what the result will be and how our computer ends up with it, let's first understand how signed and unsigned integers differ at our 4-bit size.

signed integers

They can store both positive and negative values, from -8 (bin rep: 1000) to 7 (bin rep: 0111). The higher-order bit (the first bit from the left) is the one that contributes a negative value (-8 here), and the rest of the bits contribute positive values. This encoding is called two's complement. So, to get the smallest number we set the higher-order bit to 1 and the other bits to 0, and to get the largest number we set the higher-order bit to 0 and the other bits to 1.

unsigned integers

They can only store non-negative values, from 0 (bin rep: 0000) to 15 (bin rep: 1111).

Now, because x=10 and y=15 do not fit in a signed 4-bit integer even before the addition, we will use something smaller:
int x = 5; // 0101
int y = 6; // 0110
int z = x + y; // ???

The binary representation should be 1011 if we ignore the sign. As you can see, the higher-order bit is flipped to 1 and, from the rules above, the value of z will be -5 (= -1*2^3 + 2^1 + 2^0) instead of 11.

And also, adding two negative numbers can result in a positive number, e.g.:
int x = -8; // 1000
int y = -5; // 1011
int z = x + y; // ???

Now z should be -13, which is 10011 in binary (the higher-order bit is the negative one, i.e. -1*2^4 = -16), but our computer can only store 4 bits, so it will drop the higher-order bit and the result becomes 0011, which is 3 in decimal. Again, overflow.

This is why x < y can give a different result from x - y < 0 if we do not handle arithmetic overflow properly. As programmers, we should always pay attention when choosing data types, considering their capacities and behavior in different situations, as they might become the cause of hours of debugging.

That's all for today. Please leave a comment if some information here is wrong or missing. Thank you.