DEV Community

Yuehui Ruan


Computer System Cp2.2 Truncating and expanding in C

Suppose someone asks you what these binary numbers mean in C. (Assume a 32-bit machine; that is the default throughout, unless another machine is specifically mentioned.)

  • 1011?
  • 0100 0001?

Actually, it is hard to say, because I haven't told you which data type holds them on this 32-bit machine.

That is, what the binary number 1011 actually represents depends on the data type you assign to it.
Read as different data types, for example:

  • unsigned integer: 11
  • signed integer: -5, if the 4 bits are read as a two's-complement value (the difference between the signed and unsigned ranges is described in chapter 2.1)

And for 0100 0001:

  • unsigned integer: 65
  • char: 'A'

That is, the data type you declare in a C program determines how this 32-bit machine "reads" the bits.

  • If you declare unsigned int a = 65; a is printed as the integer 65.
  • If you declare int *p = (int *)65; p holds the address 65 and is treated as pointing to a memory location that contains an int. (Without the cast, int *p = 65; is not valid C.)
  • If you declare unsigned int c = 'A'; c is actually 65, because when a character is assigned to an integer type, the machine "reads" it as an integer, and the ASCII code of 'A' is 65.

But how does it work?

I mean, we already know from chapter 2.1 that char uses 1 byte (8 bits) to store a value, and unsigned int uses 4 bytes (32 bits).

So now we are going to think about value casting and bit truncating (or extending) between different data types in C.

There are two kinds of casting: upcasting and downcasting. Casting between types of different sizes also involves extending and truncating.

  • Upcasting: a type with fewer bytes is cast to a type with more bytes.
  • Downcasting: a type with more bytes is cast to a type with fewer bytes.
  • There is also casting between types of the same size:
    signed int => unsigned int
    or
    unsigned int => signed int

  • Extending:

    Happens when going from fewer bytes to more bytes. When a smaller variable is cast to a larger variable, the new high bits of the larger variable are filled with

  • 0: if the original type is unsigned, or the original value is non-negative

  • 1: if the original type is signed and the original value is negative (the sign bit is copied)

Example 1:

// In memory, c is 0100 0001, which is 1 byte (8 bits).
unsigned char c = 65;

unsigned int a = c; // Upcasting and extending

// c is unsigned, so the new high bits of a are
// filled with 0s on the left (the most significant
// positions) until all 4 bytes (32 bits) are filled.
// a in memory:
// 0000 0000 0000 0000 0000 0000 0100 0001

printf("%u", a); // %u because a is unsigned
// 65

Example 2:

signed char c = -1;
signed int a = c;
printf("%d", a);
// -1
// Explanation:
// c is stored in memory as:
// 1111 1111
// (To see how a negative decimal value becomes these bits,
// please read the last chapter, the section about 2's complement.)
// Upcasting 1111 1111 fills the new high bits with 1s (because the
// original value is negative), so in 4 bytes we have:
// 1111 1111 1111 1111 1111 1111 1111 1111

[Image: the answer checked with a calculator]

Truncating:

What about when a variable with more bits is cast to a variable with fewer bits? Obviously, the smaller variable cannot hold that many bits, so the high bits must be truncated.

Example 3:

unsigned int a = 129;
// which is 0000 0000 0000 0000 0000 0000 1000 0001
// stored in memory on a 32-bit machine
// now I want to downcast it to signed char
signed char c = a;
printf("%d", c);
// what is the answer?
// It's -127 (on typical two's-complement machines)

Explanation:
Always remember one sentence:
The data type decides how the computer reads!
No matter how a value is stored in memory, the value the computer actually uses depends only on how it reads the bits.

Even though 129 is stored as an unsigned integer:
0000 0000 0000 0000 0000 0000 1000 0001
the computer eventually reads it as a signed char, that is, it reads only the lowest 8 bits:
1000 0001
And 1000 0001 read as a signed char is -127 (that is, -128 + 1).

Casting between signed and unsigned:

If you follow these rules, you will hardly ever get lost:

  1. It's only about how the computer reads the bits!
  2. Remember, the most significant bit of a signed value tells you whether it is non-negative (0) or negative (1).

Besides, you should be familiar with bit operators such as |, &, <<, and >>, even though they are seldom used in everyday programs. They are used a lot around registers, memory allocation, and the ALU, which sit at a deeper level of the system.

If you have any questions, feel free to comment here!
