James Robb

Posted on Oct 24, 2021

Data types in Rust

#rust #beginners

Following on from the previous article in this series, we will now take a look at the data types Rust supports.

First let's define what a data type is:

In computer science and computer programming, a data type or simply type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data. [...] A data type constrains the values that an expression, such as a variable or a function, might take. This data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.

Source: Data Type Wikipedia Page

With this definition we can see that data types provide meaning to our code and allow us to set expectations of what data should come in and flow out of our applications.

There are two overarching categories of data types supported by Rust: scalar types and compound types. We will look at both categories in this article and explain which types lie within each.

Scalar Types

A scalar type can be defined as any type that represents a single value. Rust supports four primary scalar types: integers, floating-point numbers, Booleans, and characters.

Integers

An integer represents a whole number with no fractional component. This means that 100 and -27 are integers but 123.45 and -91.2 are not.

Integer Types

The supported integer types are:

Bit Length	Signed Type	Unsigned Type
8-bit	i8	u8
16-bit	i16	u16
32-bit	i32	u32
64-bit	i64	u64
128-bit	i128	u128
arch	isize	usize

Sidenote 1:

The isize and usize types are sized to match the pointer width of the target we compile for. For example, on a 32-bit target an isize has the same width as an i32, whereas on a 64-bit target it has the same width as an i64, etc.
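To see this on your own machine, here is a minimal sketch that prints the width of the pointer-sized types. The usize_bits helper is just a name I made up for illustration; the values printed depend on the target you compile for.

```rust
use std::mem::size_of;

/// Hypothetical helper: the width of `usize` in bits on the current target.
fn usize_bits() -> u32 {
    usize::BITS
}

fn main() {
    // isize and usize always have the same width as a pointer on the target.
    println!("usize is {} bits wide", usize_bits());
    println!(
        "usize: {} bytes, isize: {} bytes, pointer: {} bytes",
        size_of::<usize>(),
        size_of::<isize>(),
        size_of::<*const u8>()
    );
}
```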

Signed integers hold a range from $-(2^{n-1})$ to $2^{n-1} - 1$ inclusive, where n is the bit length that variant uses. Unsigned integers, on the other hand, hold a range from 0 to $2^{n} - 1$ inclusive.

As an example we can see the range of an i8 and u8 integer below:

Type	Lower Bound	Upper Bound
i8	-128	127
u8	0	255
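The bounds in this table can be checked against the MIN and MAX constants that every integer type provides. The sketch below also recomputes them from the bit length; signed_range and unsigned_range are hypothetical helpers and only handle bit lengths from 1 to 64 so that the maths fits in an i128.

```rust
/// Range of a signed n-bit integer: -(2^(n-1)) to 2^(n-1) - 1.
/// Sketch only: assumes 1 <= bits <= 64.
fn signed_range(bits: u32) -> (i128, i128) {
    let half = 1i128 << (bits - 1);
    (-half, half - 1)
}

/// Range of an unsigned n-bit integer: 0 to 2^n - 1.
/// Sketch only: assumes 1 <= bits <= 64.
fn unsigned_range(bits: u32) -> (i128, i128) {
    (0, (1i128 << bits) - 1)
}

fn main() {
    // The built-in constants give the same bounds as the table.
    println!("i8: {} to {}", i8::MIN, i8::MAX); // i8: -128 to 127
    println!("u8: {} to {}", u8::MIN, u8::MAX); // u8: 0 to 255

    // The same bounds derived from the bit length.
    println!("{:?}", signed_range(8)); // (-128, 127)
    println!("{:?}", unsigned_range(8)); // (0, 255)
}
```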

It is important to be aware of these ranges because if a calculation produces a value that is out of range for the given type, an integer overflow occurs and, in debug builds, Rust will panic. The compiler tries to catch as many cases as possible for you (an out-of-range literal such as 256 for a u8 is rejected at compile time), but always try to consider which type is most suitable for the use case at hand to avoid issues, for example when working with third-party data.

Sidenote 2:

One caveat to understand is that when Rust is compiled in release mode, it doesn't panic with an integer overflow error.

This behaviour is described within the integer overflow documentation. In short though, instead of panicking, in release mode, Rust will opt to allow the integer overflow to occur using two’s complement wrapping.

This simply means that when a value is too high for a specific signed or unsigned integer type, it wraps around to the lowest possible value for that type and carries on counting up from there. If, however, a value is too low for that type, it wraps around to the highest possible value and carries on counting down from there.

Example 1 (u8: 0 to 255): 256 -> 0
Example 2 (u8: 0 to 255): 257 -> 1
Example 3 (i8: -128 to 127): -129 -> 127
Example 4 (i8: -128 to 127): -130 -> 126
Example 5 (i8: -128 to 127): 130 -> -126

Please refer to the integer overflow documentation for more information on this topic.
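The wrapping examples above can be reproduced in any build mode with the wrapping_* methods, which always wrap and never panic. This is a minimal sketch; wrap_to_u8 and wrap_to_i8 are hypothetical helpers that use an as conversion, which keeps only the lowest 8 bits and so gives the same result as two's complement wrapping.

```rust
/// Hypothetical helper: converts with `as`, keeping only the lowest 8 bits.
fn wrap_to_u8(value: i32) -> u8 {
    value as u8
}

/// Hypothetical helper: converts with `as`, keeping only the lowest 8 bits.
fn wrap_to_i8(value: i32) -> i8 {
    value as i8
}

fn main() {
    // The wrapping_* methods wrap in both debug and release builds.
    println!("{}", 255u8.wrapping_add(1)); // 255 + 1 = 256, wraps to 0
    println!("{}", (-128i8).wrapping_sub(1)); // -128 - 1 = -129, wraps to 127
    println!("{}", 127i8.wrapping_add(3)); // 127 + 3 = 130, wraps to -126

    println!("{}", wrap_to_u8(257)); // 1
    println!("{}", wrap_to_i8(-130)); // 126
}
```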

Sidenote 3:

Rust integers have methods that can check for overflow via the checked_* functions, for example: checked_mul.

All of the checked_* functions return an Option, where the value is either Some(value) or None. None in this case represents an overflow occurring. This means that we can be sure in advance that if an overflow occurs, we have handled it properly!

Another variation on handling overflows can be seen in the saturating_* functions, such as saturating_mul.

All of the saturating_* functions will just stop at the upper or lower limit instead of returning an Option.
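Here is a short sketch of both families side by side. The describe_product function is a hypothetical helper written for this example; checked_mul and saturating_mul are the standard library methods mentioned above.

```rust
/// Hypothetical helper: multiplies two u8 values and reports an overflow
/// instead of panicking or wrapping.
fn describe_product(a: u8, b: u8) -> String {
    match a.checked_mul(b) {
        Some(product) => format!("{} * {} = {}", a, b, product),
        None => format!("{} * {} overflows a u8", a, b),
    }
}

fn main() {
    println!("{}", describe_product(100, 2)); // 100 * 2 = 200
    println!("{}", describe_product(100, 3)); // 100 * 3 overflows a u8

    // saturating_* clamps to the nearest bound of the type instead.
    println!("{}", 100u8.saturating_mul(3)); // 255
    println!("{}", (-100i8).saturating_mul(2)); // -128
}
```

Which one to reach for depends on the use case: checked_* forces you to handle the overflow explicitly, while saturating_* is handy when "as large as possible" is an acceptable answer.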

Example usage of each integer type:

  // Signed integers
  let signed_8: i8 = -1;
  let signed_16: i16 = 2;
  let signed_32: i32 = -3;
  let signed_64: i64 = 4;
  let signed_128: i128 = -5;
  let signed_size: isize = -5;

  // Unsigned integers
  let unsigned_8: u8 = 1;
  let unsigned_16: u16 = 2;
  let unsigned_32: u32 = 3;
  let unsigned_64: u64 = 4;
  let unsigned_128: u128 = 5;
  let unsigned_size: usize = 5;

Here we have declared a signed and an unsigned integer of each possible type. The type is declared by using the : <type> annotation after each variable name. When declaring variables we don't always have to add the type manually, though, because Rust will assume an integer to be of type i32 unless otherwise stated.
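One way to observe the i32 default is to look at how many bytes a value occupies. This is only a sketch: it relies on nothing else in the code constraining the type of the unannotated variable.

```rust
use std::mem::size_of_val;

fn main() {
    let inferred = 42; // no annotation and no other hints: falls back to i32
    let annotated: u8 = 42; // explicit annotation
    let suffixed = 42u64; // type suffix on the literal

    println!("{}", size_of_val(&inferred)); // 4 (bytes)
    println!("{}", size_of_val(&annotated)); // 1
    println!("{}", size_of_val(&suffixed)); // 8
}
```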

Sidenote 4:

A signed integer is an integer that can be negative, zero or positive, whereas an unsigned integer can only ever be zero or positive.

Sidenote 5:

Signed integers are stored using the two’s complement method.

Integer Literals

Integer literals are a way of writing integers in different notations. Rust supports 5 notations out of the box:

Number Literal Type	Example	Decimal Equivalent
Binary	0b1111_0000	240
Byte (u8 only)	b'A'	65
Decimal	253	253
Hexadecimal	0xff	255
Octal	0o77	63

Sidenote 6:

All integer literals except the byte literal allow a type suffix to be used, such as -29i8 to give -29 the type i8 or 254u8 to give 254 the type u8. The suffix sets the type of the literal itself; no cast takes place.

Integer literals also support _ as a visual separator, such as 2_53 for 253 or 1_021_000 for 1,021,000. This separator can be of any length meaning that, for example, 1_________0 is perfectly acceptable, to the compiler at least, as a representation for 10.
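The notations, suffixes and separators above can all be checked with a few lines of code. A minimal sketch:

```rust
fn main() {
    let binary = 0b1111_0000;
    let byte = b'A'; // always a u8
    let decimal = 253;
    let hexadecimal = 0xff;
    let octal = 0o77;

    println!("{}", binary); // 240
    println!("{}", byte); // 65
    println!("{}", decimal); // 253
    println!("{}", hexadecimal); // 255
    println!("{}", octal); // 63

    // Type suffixes and visual separators.
    let small = -29i8;
    let large = 1_021_000;
    let odd_but_valid = 1_________0;
    println!("{} {} {}", small, large, odd_but_valid); // -29 1021000 10
}
```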

Floating Point Numbers

A floating point number, or float for short, is a number that can have a fractional component. For example 12.75 and 3.1 are floats, and so is 10.0, but the literals 10 and 287 are integers.

There are two kinds of floating point numbers supported in Rust:

Bit Length	Type
32-bit	f32
64-bit	f64

The default type used by Rust, if no type annotation is added, is f64 because on a modern CPU it’s roughly the same speed as an f32 but is capable of far more precision.

Example usage of each floating point type:

 let float_32: f32 = 1.1;
 let float_64: f64 = 3.5;
 let another_float_64 = 3.5;

Rust represents floating-point numbers according to the IEEE-754 standard. The f32 type is a single-precision float, and f64 is a double-precision float.

Sidenote 7:

Floating point numbers can use _ as a visual separator just like integers, for example: 1_234.56.
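To make the precision difference between f32 and f64 a bit more concrete, here is a small sketch. The number 16,777,217 is my own choice for the example: it is 2^24 + 1, which needs one more significant bit than an f32 can hold.

```rust
fn main() {
    // 16_777_217 is 2^24 + 1: too many significant bits for an f32.
    let single: f32 = 16_777_217.0;
    let double: f64 = 16_777_217.0;

    println!("{}", single == 16_777_216.0); // true: rounded to the nearest f32
    println!("{}", double == 16_777_216.0); // false: f64 stores it exactly

    // Without an annotation a float literal is an f64 (8 bytes).
    let unannotated = 3.5;
    println!("{}", std::mem::size_of_val(&unannotated)); // 8

    // The separator does not change the value.
    println!("{}", 1_234.56 == 1234.56); // true
}
```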

Booleans

Booleans represent either a true or false value and take up exactly 1 byte of memory. Internally, true is represented as a 1 and false as a 0.

We can see below how we can assign booleans with or without a type annotation:

  let boolean_true = true;
  let boolean_false: bool = false;

Boolean data allows us to test if a statement is true or false, and booleans are generally used in control flow.

Most of the time you won't manually write true or false either but will instead use comparison and logical operators on values to produce a boolean, for example:

  let number = 3;

  if number == 3 {
    println!("condition was true");
  } else {
    println!("condition was false");
  }

In this example we state that if the statement 3 is equal to 3 is true then we should print out "condition was true" and if it is false then we should print out "condition was false". Of course in this case it is true and so "condition was true" will be printed.
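Since a comparison is just an expression that produces a bool, we can also store the result in a variable or return it from a function. The sketch below shows that, along with the size and internal representation mentioned earlier; is_single_digit is a hypothetical helper made up for this example.

```rust
use std::mem::size_of;

/// Hypothetical helper: true when `number` is between 1 and 9 inclusive.
fn is_single_digit(number: i32) -> bool {
    number > 0 && number < 10
}

fn main() {
    println!("{}", size_of::<bool>()); // 1 (byte)
    println!("{}", true as u8); // 1
    println!("{}", false as u8); // 0

    let number = 3;
    let matches = number == 3; // the comparison itself is a bool
    println!("{}", matches); // true
    println!("{}", is_single_digit(number)); // true
    println!("{}", is_single_digit(42)); // false
}
```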

Characters

A character, or char as it is known in Rust, represents a single Unicode scalar value and takes up four bytes of memory.

Sidenote 8:

I don't want to get too deep into what Unicode is as that is not really relevant to this article, but if you are interested, you can look at the characters Unicode supports for yourself.

To create a char we use single quotes around the character we wish to represent. For example:

  let a = 'a';
  let one = '1';
  let rabbit = '🐇';
  let warning = '\u{26A0}'; // ⚠, written as an escape so no invisible variation selector sneaks in
  let japanese_katakana_n = 'ン';

As you can see, each char is a single character representing a Unicode scalar value such as a letter, number, emoji or a character from a non-Latin script such as Japanese, as shown in the above example, thanks to Unicode supporting these characters in the standard too!
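A few properties of char can be checked directly. This is a minimal sketch; the last part uses explicit escapes to show that some emoji are made of more than one code point, which is why they cannot be stored in a single char.

```rust
use std::mem::size_of;

fn main() {
    // Every char occupies four bytes, whatever character it holds.
    println!("{}", size_of::<char>()); // 4

    // Encoded as UTF-8 (for example inside a string) they take 1 to 4 bytes.
    println!("{}", 'a'.len_utf8()); // 1
    println!("{}", 'ン'.len_utf8()); // 3
    println!("{}", '🐇'.len_utf8()); // 4

    // A char can be converted to and from its code point.
    println!("{:#X}", 'ン' as u32); // 0x30F3
    println!("{}", char::from_u32(0x1F407) == Some('🐇')); // true

    // U+26A0 followed by the variation selector U+FE0F is two code points.
    let warning_emoji = "\u{26A0}\u{FE0F}";
    println!("{}", warning_emoji.chars().count()); // 2
}
```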

Sidenote 9:

Here you will find a list of all languages supported by Unicode.

Compound Types

Now that we have looked into scalar types we can move on to what are known as compound types. A compound type can group multiple values into a single representation, and Rust has two primitive compound types: tuples and arrays.

Tuples

A tuple allows us to group together values of different types. Tuples have a fixed length: once declared, they cannot grow or shrink.

A tuple is represented by rounded brackets containing values, for example: ('a', 1, 2.4).

We can also manually add a type definition if we want to, like so:

  let tuple: (char, u8, f32) = ('a', 1, 2.4);

We don't need to add the type annotation unless we wish to be more precise about the types of our values than Rust's type inference is. Without the annotation, the tuple above would be inferred as (char, i32, f64).

Tuples also support a nice feature known in most languages as destructuring, for example:

  let user = ("James", 27);
  let (name, age) = user;
  println!("{} is {} years old", name, age);

This is nice because it allows us to provide a name to each item in our tuple instead of directly accessing the value.

If we want to directly access values though, tuples are zero indexed data structures and so we can access data like so:

  let user = ("James", 27);
  println!("{} is {} years old", user.0, user.1);

This works exactly the same as before by taking the first and second values in the tuple and outputting them but personally I would always use the destructured version as it is more descriptive as to what the data represents.
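A place where tuples and destructuring come in handy is returning more than one value from a function. The min_max function below is a hypothetical example of my own, not something from the standard library.

```rust
/// Hypothetical helper: returns the smallest and largest value as a tuple.
fn min_max(values: [i32; 4]) -> (i32, i32) {
    let mut min = values[0];
    let mut max = values[0];
    for &value in values.iter() {
        if value < min {
            min = value;
        }
        if value > max {
            max = value;
        }
    }
    (min, max)
}

fn main() {
    // Destructure the returned tuple into two named variables.
    let (lowest, highest) = min_max([3, -7, 12, 5]);
    println!("lowest: {}, highest: {}", lowest, highest); // lowest: -7, highest: 12

    // Or keep the tuple and access the values by index.
    let result = min_max([1, 2, 3, 4]);
    println!("{} {}", result.0, result.1); // 1 4
}
```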

Arrays

Unlike tuples, arrays must contain values of the same type but are otherwise quite similar in that they also have a fixed length and cannot grow or shrink once defined.

An array uses square brackets and the elements within are separated by commas, for example: ['a', 'b', 'c'].

Arrays are great when you want to guarantee a uniform data type within the collection or if you don't want the collection to change size over time.

Sidenote 10:

A lot of developers avoid arrays when beginning with Rust because in Rust, unlike JavaScript for example, arrays are inflexible and so they reach for a vector instead.

Vectors are great for many use cases but should not be used for all circumstances because we want to be as strict as possible with our types.

If there is a case where fixing the number of values in the collection up front is important, just use an array! Keep in mind that vectors also require every element to be of the same type, so the fixed length is the real difference.

One example use case shown to us in the Rust docs for using an array would be for storing a list of months:

  let months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];

Even though Rust will infer the types for us, we can declare the type of data our array should contain and how many elements the array should hold too. Expanding the last example we could write:

  let months: [&str; 12] = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];

This explicitly says that our list contains elements of type &str and there should be 12 of them altogether.

Sidenote 11:

We stated above that the variable months is of type [&str; 12] but one thing worth noting is that if the value type &str or the given length 12 or both change, the entire type itself changes as far as the compiler is concerned. For example:

If variable A is of type [i32; 3] and variable B is of type [i32; 4], these are totally different types according to the compiler, even though both represent an array of i32 values.
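In practice this shows up when passing arrays to functions. In the sketch below, sum_of_three and sum_of_slice are hypothetical helpers: the first only accepts arrays of exactly three i32 values, while the second borrows a slice and so accepts arrays of any length.

```rust
/// Hypothetical helper: only accepts arrays of exactly three i32 values.
fn sum_of_three(values: [i32; 3]) -> i32 {
    values.iter().sum()
}

/// Hypothetical helper: a slice parameter accepts arrays of any length.
fn sum_of_slice(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let a = [1, 2, 3]; // [i32; 3]
    let b = [1, 2, 3, 4]; // [i32; 4]

    println!("{}", sum_of_three(a)); // 6
    // sum_of_three(b); // would not compile: expected `[i32; 3]`, found `[i32; 4]`

    println!("{}", sum_of_slice(&a)); // 6
    println!("{}", sum_of_slice(&b)); // 10
}
```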

Another cool trick that arrays can do is repeating a value a set amount of times succinctly. The following code will generate us an array with 3 elements, each holding the char value 'a':

  let letters = ['a'; 3]; // -> ['a', 'a', 'a']

We can also index arrays just like we could with tuples but instead of using . notation we use square bracket notation:

  let alphabet = ['a', 'b', 'c'];

  let a = alphabet[0];
  let b = alphabet[1];
  let c = alphabet[2];

We can see here that arrays, just like tuples, are zero indexed data structures. That is to say, the first element is at index 0, the second element is at index 1, and so on.

Sidenote 12:

If you try to access an array index which does not exist, the program will panic at runtime and stop your application from executing any further. (If the index is a constant that the compiler can see is out of range, it will refuse to compile the code instead.) Thus, be careful when accessing values by their index: add a check to make sure that the index you want to use is within range!

The Rust array data type documentation describes the reasons behind this behaviour in more detail.
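One way to add such a check is the get method, which returns an Option instead of panicking. A minimal sketch, where letter_at is a hypothetical helper:

```rust
/// Hypothetical helper: looks up a letter without risking a panic.
fn letter_at(index: usize) -> Option<char> {
    let alphabet = ['a', 'b', 'c'];
    // get returns Some(&element) when the index is in range, otherwise None.
    alphabet.get(index).copied()
}

fn main() {
    for index in [1, 7] {
        match letter_at(index) {
            Some(letter) => println!("index {} holds {}", index, letter),
            None => println!("index {} is out of range", index),
        }
    }
}
```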

We will discuss more about Rust guarantees, errors, compilation and more in the future as we progress in this series.

We can now see that arrays are a useful data structure for our toolbelt when developing applications with Rust. They serve a niche where a collection's uniform value type and fixed length are important for the items we wish to store in memory.

Conclusions

Rust gives us a lot of types out of the box, each bringing their own use cases and value to the table.

As we continue through this series we will eventually start working with custom types and sub types such as &str and collections such as String, Vector and HashMap but until then this should give you a good overview of the initial building blocks that these types and collections inevitably themselves end up using.

In the next article we will look at functions in Rust: What they are, how we work with them, when they are useful to use, etc.

I look forward to seeing you then and as ever, feedback and questions are always welcome 😊!

Top comments (4)

Akash • Dec 19 '21

I was just wondering: is the build size of applications written in a language that has a type system smaller than those built without one? Because if you can declare the type to be something like an unsigned integer, then you know that the value won't be negative and consequently might end up saving some space, am I right? Or is it the runtime memory that will be affected and not the build size?

James Robb • Dec 19 '21 • Edited

I don't think your question is specific enough to give a solid answer to, could you reiterate perhaps with an example of what you wish to be compared?

Languages, compilers, etc. are a very complex topic and comparing JavaScript to Rust, for example, would be like comparing the sun to the moon. It just doesn't work that way since they are in separate classes.

If you can focus your question down though, I will happily answer!

Akash • Dec 21 '21

Sure. The question of build size struck me from the thought that we can restrain the variable range in certain languages beforehand, like making it unsigned. This might mean that it won't have to store the sign bit (not sure, just pondering and speculating) and thus perhaps may save some memory. Also, since we know more information beforehand, then maybe operations on such data types performed when using pre-built helper functions could be performed in a more optimised way under the hood which we might know about. Small differences like these could possibly be playing a role in the build of the application. Of course things also depend upon how well the code is written, and yes, I understand that compilers and languages are a very, very complex topic. It's just that I was wondering if things like this could be impacting the build size.

James Robb • Dec 23 '21 • Edited

It really depends on the language, compiler and compile target at hand.

Example 1: C# / Java compile to intermediate languages which run on the CLR and JVM respectively
Example 2: Elm compiles to JS which is executed in the browser
Example 3: Rust compiles to an executable and can be customised for specific compilation targets such as the x86 64 bit MSVC compile target which I use on my windows machine for example

etc..

Types could indeed decrease output sizes when compared to non-typed languages if the compiler optimises for such scenarios, but their main use case is to statically find bugs at the compilation step itself. Whether the compiler then optimises the code that it believes is now safe is a different subject entirely. In general, compiled languages provide smaller footprints and more performant outputs in my experience but, as always, it is a nuanced subject and isn't always going to be the case.