C Struct and Union

Bruno Diniz (freethebit) ・ 11 min read


Struct vs Union

a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap.

N1570 - 6.7.2.1

The word structure refers to struct only. A union is not a structure because, at run time, it holds the value of only one member at a time.

Components

Type-specifier Identifier { Declaration-list } Tags ;

typedef Type-specifier Identifier { Declaration-list } Alias ;

Type-specifier

Keywords struct or union.

Identifier

The name given to a struct or union. It is not a variable, it is just the name of the type being described:

struct s { int x; }; // 's' is the identifier
union u { int x; };  // 'u' is the identifier

The identifier is optional if the struct or union has one or more tags:

// No identifier
struct { int x; } z; // 'z' is a tag
union { int x; } w; // 'w' is a tag

An anonymous struct [C11] or anonymous union [C11] has no identifier.

struct a { // Regular struct, has identifier 'a'
  struct { int x; }; // Anonymous struct, no identifier and no tag
  union { int y; }; // Anonymous union, no identifier and no tag
};

Declaration-list or Members

There can be one or more members, of the same or different types, including bit-fields.

The members are delimited by curly braces: Left Curly Bracket { at the beginning, and Right Curly Bracket } at the end.

Multiple members of the same type can be declared together; this is a style choice:

struct s { int x, y, z; };
union u { int x, y, z; };

A struct or union can't contain an instance of itself, but a pointer to itself is valid. This is the basis of data structures such as linked lists, trees, and many more:

// 💩
struct s { struct s z; };  // instance of itself

// 👍
struct s { struct s *z; }; // pointer to itself

// 👍
struct Node {
  int value;
  struct Node *next;       // pointer to itself
  struct Node *prev;       // pointer to itself
};

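This applies to struct only. The only incomplete type valid as a member is a flexible array member, and it must be the last member because members are stored in sequence. It can't be the only member: the struct must have at least one named member of complete type before the flexible array. Do not confuse a flexible array member with a variable-length array (VLA): in standard C a VLA can't be a struct member at all, and VLAs are best avoided anyway.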

// 👍
struct s {
  int x;
  int z[];
};

// 💩
struct s {
  int z[]; // Flexible array is not the last member
  int x;
};

// 💩
struct s {
  int z[]; // Flexible array is the last member, but is the only member
};

Tags

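Optional. There can be none, one, or many tags, separated by commas. If the declaration starts with typedef, the names in this position are no longer tags; each one is an alias for the type. Be aware that the C standard uses different words: there, the name after struct or union (the identifier in this article) is called the tag, and the names called tags in this article are ordinary variables of that type.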

struct { int x; } x, y; // 'x' and 'y' are tags
union { int x; } z, w;  // 'z' and 'w' are tags

⚠️ [C11] An anonymous struct or anonymous union does not have tag.

Alias

An alias exists only if typedef is used. It gives the type a new name, as in: the new name for this boring_name is awesome_name. When typedef starts the declaration, the names in the tag position are not tags anymore; they are aliases and behave differently from tags.

struct s { int x; } y;         // 'y' is a tag
typedef struct s { int x; } y; // 'y' is an alias
union u { int x; } y;          // 'y' is a tag
typedef union u { int x; } y;  // 'y' is an alias

Using Tag and Alias together:

#include <stdio.h>

struct s { int x; } S; // 'S' is a tag
typedef struct s SS;   // 'SS' is an alias

int main(void) {
  SS myStruct;
  myStruct.x = 10;

  S.x = 15;

  printf("Tag: %d\n", S.x);
  printf("Alias: %d\n", myStruct.x);

  return 0;
}

Struct, Union, Members and Semicolon

Every struct or union declaration, and every member declaration inside it, must end with a semicolon ;.

// 💩
struct s { int x }
union u { int x }
struct s { int x; }
union u { int x; }

// 💩
struct s {
  int x;
  char z
};

union u {
  int x;
  char z
};

// Not standard C: some compilers accept this style if you have only 1 member, usually with a warning
struct s { int x };
union u { int x };

// 👍
struct s { int x; };
union u { int x; };

Declaring and Defining a Struct and Union

A forward declaration has the form struct x; or union x;. At this point, x is an incomplete type. A struct or union becomes a complete type only after the closing }.

struct s1;              // Incomplete type
union u1;               // Incomplete type
typedef struct s1 S;    // Incomplete type, 'S' is only an alias to incomplete struct 's1'
typedef union u1 U;     // Incomplete type, 'U' is only an alias to incomplete union 'u1'
struct s2 { int x; };   // Complete
union u2 { int x; };    // Complete
struct { int x; } s3;   // Also complete, 's3' is a tag
union { int x; } u3;    // Also complete, 'u3' is a tag

In the same file and scope, an alias and an identifier can have the same name. They refer to two different structs and two different unions. C allows this because struct and union identifiers live in a separate name space from typedef names. Avoid this style.

// Valid, but 👎 💩 😠
struct s { int x; };
typedef struct { int x; } s;
union u { int x; };
typedef union { int x; } u;

In the same file and scope, the example below is invalid: both declarations have the same identifier, so the typedef tries to redefine the struct s and union u that are already defined.

// 💩
struct s { int x; };
typedef struct s { int x; };
union u { int x; };
typedef union u { int x; };

The following may compile, but the keyword typedef is useless because there is no alias, and the only reason to write typedef is to get an alias. Compilers typically report this as a warning only; to turn it into an error with clang, use the flag -Werror.

// 💩
typedef struct s { int x; };
typedef union u { int x; };

Declaring Variable and Accessing the Content

struct example:

#include <stdio.h>

struct a { int x; };
typedef struct a aa; // Alias 'aa' for the struct 'a'
struct { int x; } b; // Tag 'b'
typedef struct b bb; // Compiles, but unusable: 'struct b' is a new incomplete type, unrelated to tag 'b'.
struct c { int x; } C; // struct 'c' and tag 'C'
typedef struct { int x; } d; // Alias 'd'
typedef struct e { int x; } ee; // struct 'e' and alias 'ee'

int main(void) {
  struct a a1; // 'a1' is a variable of struct type 'a'
  a1.x = 1;

  aa a2; // 'a2' is a variable of struct type 'a'. 'aa' is an alias of 'struct a'
  a2.x = (++a1.x);

  b.x = (++a2.x); // You can't do 'b b1', because 'b' is not a type, but has a type.

  struct c c1;
  c1.x = (++b.x);

  C.x = (++c1.x);

  d d1;
  d1.x = (++C.x);

  struct e e1;
  e1.x = (++d1.x);

  ee e2;
  e2.x = (++e1.x);

  printf("%d\n", e2.x);    
  return 0;
}

union example. Because union members overlap in memory, every assignment to one member overwrites the others, so pay attention to the order of use:

#include <stdio.h>

union e {
  int x;
  char y;
};

int main(void) {    
  union e u;

  u.x = 30200; // Remember not to cause signed integer overflow
  printf("%d\n", u.x);
  u.y = 'a'; // After this point, you will get garbage trying to read u.x
  printf("%c\n", u.y);
  printf("%d\n", u.x);
  u.x = 11111; // After this point, you will get garbage trying to read u.y
  printf("%d\n", u.x);
  printf("%c\n", u.y);
  return 0;
}

Size of a Struct and Union

The size of a struct is the sum of the sizes of all members, plus any hidden padding bytes the compiler inserts to keep members properly aligned.

struct s {
  int x;
  char y;
};

In the example above, the size will be the size of an int plus the size of a char, plus any hidden padding bytes the compiler judges necessary.

The size of a union is the size of its largest member, plus any hidden padding bytes the compiler inserts for alignment.

union u {
  int x;
  char y;
};

In the example above the size of the union u will be the size of an int, and not int plus char.

Anonymous Struct and Union [C11]

A struct or union is anonymous if it has no identifier and no tag, and it exists only inside another struct or union.

struct s { // Regular struct
  struct { int x; };     // Anonymous struct, no identifier and no tag
  union { int y; };      // Anonymous union, no identifier and no tag

  struct a { int x; };   // NOT Anonymous struct, has an identifier 'a'
  union b { int x; };    // NOT Anonymous union, has an identifier 'b'
  struct { int x; } c;   // NOT Anonymous struct, has a tag 'c'
  union { int x; } d;    // NOT Anonymous union, has a tag 'd'
  struct e { int x; } E; // NOT Anonymous struct, has an identifier and a tag
  union f { int x; } F;  // NOT Anonymous union, has an identifier and a tag
};

union u { // Regular union
  struct { int x; };    // Anonymous struct, no identifier and no tag
  union { int y; };     // Anonymous union, no identifier and no tag
};

Members of an anonymous struct or union can't have the same name as members of the parent struct or union, because their scope is the parent's scope. A regular struct or union has its own scope. This also defines how you access the members.

struct {
  int x;
  struct { int x; };   // 💩, same name of parent member identifier
  struct { int z; };   // 👍
  struct { int x; } y; // 👍, regular struct
} s;

Accessing values:

#include <stdio.h>

struct {
  int x;
  struct { int z; };
  struct { int x; } y;
} s;

int main(void) {
  s.x = 5;
  s.z = 10; // Access the same way as any parent member
  s.y.x = 15;

  printf("%d, %d, %d\n", s.x, s.z, s.y.x);

  return 0;
}

So far I haven't found a useful case for an anonymous struct inside another struct; if you find one, let me know. But an anonymous union inside a struct is useful:

enum conType { USB, SERIAL, WIFI, };

struct usbConnection { int x; };
struct serialConnection { int x; };
struct wifiConnection { int x; };

struct connection {
  enum conType type;
  union {
    struct usbConnection usb;
    struct serialConnection serial;
    struct wifiConnection wifi;
  };
};

int main(void) {
  struct connection con;    
  con.usb.x = 10; // Anonymous union allow easy access
  con.type = USB;

  return 0;
}

Another example, trying to achieve some kind of polymorphism:

#include <stdio.h>

#define LENGTH 10
enum NumberType { INT, FLOAT, DOUBLE };

// The three arrays share the same storage, only one is in use at a time
struct saveData {
  enum NumberType type;
  union {
    int     iNumbers[LENGTH];
    float   fNumbers[LENGTH];
    double  dNumbers[LENGTH];
  };
};

// The three arrays each get their own storage
struct wasteData {
  enum    NumberType type;
  int     iNumbers[LENGTH];
  float   fNumbers[LENGTH];
  double  dNumbers[LENGTH];
};

void ProcessData( struct saveData *d ){
  switch(d->type) {
    case INT:
      printf("Int\n");
      break;
    case FLOAT:
      printf("Float\n");
      break;
    case DOUBLE:
      printf("Double\n");
      break;
  }
}

int main(void) {
  struct saveData sData;
  sData.type = DOUBLE;
  ProcessData(&sData);

  struct saveData sData1;
  struct wasteData sData2;
  // sizeof yields a size_t, which is printed with %zu
  printf("sData1: %zu\nsData2: %zu\n", sizeof(sData1), sizeof(sData2));

  return 0;
}

In a situation like this, if you have millions of requests processing some kind of data, a union can reduce the resources (memory, processing time, money) needed to reach the goal. Don't forget about padding, though. In the example above, you can replace the enum with a mix of #define and char to save some bytes if needed.

Bit-field

A bit-field is a member of a struct or union with an explicit width in bits. Before you can handle bit-fields correctly, you need to understand the concepts of allocation unit, padding bits and memory alignment.

⚠️ Bit-fields are NOT portability friendly. The behavior and results of using a bit-field are tied to implementation-defined rules.

⚠️ Using bit-fields as members of a union is not recommended, because there is rarely a good reason to do it. If you do, the same rules as for bit-field members of a struct apply, except that only one member holds a value at a time. For this reason, the bit-field examples use only struct.

Components of a bit-field

type identifier : width ;

Type

Valid bit-field types:

  • int. Plain int may behave as signed int or unsigned int; which one is implementation-defined.
  • signed int
  • unsigned int
  • _Bool. A single-bit field that can only be 0 or 1, even though a _Bool object occupies at least CHAR_BIT bits.
  • implementation-defined types
  • atomic types [C11]. Implementation-defined.

Identifier

Variable name.

Width

The width of a bit-field is not dynamic: it has to be set in the declaration, must be an integer constant expression with a nonnegative value, and must not exceed the number of bits of the specified type.

// 💩
struct s { _Bool x: 5; };  // _Bool is a single bit-field, only 0 or 1 is accepted
struct s { int x: 1.2; };  // Not an integer
struct s { int x: -1; };   // Negative width
struct s { int x: 256; };  // 256 bits exceed the width of an int, typically 32 bits

const unsigned int A = 10;
struct s { unsigned int x: A; }; // A is not an integer constant expression in C

// 👍
struct s { _Bool x: 1; };

#define A 10
struct s { unsigned int x: A; }; // A is replaced by the integer constant 10 before compilation of the source code.

// 👍, enum
enum { A = 10 };
struct s { unsigned int x: A; }; // A is an integer constant, also works well.

enum { A = 10 } z;
struct s { unsigned int x: A; }; // A is an integer constant, also works well.

// 💩, enum
enum { A = 10 } z;
struct s { unsigned int x: z.A; }; // Tag z is a variable, NOT an integer constant.

Storage Behavior

A bit-field does not always start at the first bit of a byte, because bit-fields are placed inside allocation units. Multiple bit-fields in sequence are packed together if there is enough space. Whether the bits are packed left-to-right or right-to-left is implementation-defined and usually follows the endianness of the platform.

For the next examples, the premise is that int and unsigned int have 4 bytes, 32 bits in total to mess around with.

struct s { int x : 20, y : 3; }; // Bit-field
struct s { int x, y; };          // Any other type

Description of the example above:
  • Bit-field: x and y are both declared with type int. Only one int is allocated, because it is enough to hold the 23 bits required by x and y.
  • Any other type: space is reserved for x int and y int, 2 int in total using 64 bits. This uses double the memory of the bit-field version.

If you change the width of x to 30, 64 bits will be allocated, because x and y now need 33 bits and one int is not enough:

struct s { int x: 30, y: 3; };

If you want unused bits between bit-fields, declare a bit-field with no identifier, just : followed by the number of bits:

struct {
  unsigned int x: 4, :5, y: 1;
} s;

  • x uses 4 bits.
  • :5 without an identifier means that 5 bits are reserved but will not be used.
  • y uses 1 bit.

An unnamed bit-field of width 0, written :0, placed between other bit-fields of the same type, means: stop packing into the current allocation unit. The next bit-field starts at the beginning of the next allocation unit:

// 👍, sizeof is 8 for both styles.
struct { unsigned int x: 3, :0, y: 4; } s;

struct {
  unsigned int x: 3;
  unsigned int :0;
  unsigned int y: 4;
} s;

// 💩
struct {
  unsigned int x: 3;
  :0; // Does not compile because no type was specified.
  unsigned int y: 4;
} s;

struct {
  unsigned int x: 3;
  _Bool :0; // Compiles normally, but the desired behavior (break) does not work, because the type is different.
  unsigned int y: 4;
} s;

To reduce the padding the compiler inserts between members, many compilers support the non-standard #pragma pack directive.

Storing and Accessing Bit-field Values

The unary operator & (address-of) can't be used with a bit-field, which is why pointers to bit-fields and arrays of bit-fields do not exist. Also, sizeof and alignas [C11] can't be applied to a bit-field member, but it is valid to apply sizeof to a struct that contains one or more bit-fields.

#include <stdio.h>

struct { unsigned int x: 3; } s;

int main(void) {
  s.x = 7; // Assign a decimal 7
  printf("%d", s.x); // Output decimal 7
  return 0;
}

If you replace 7 with 8 or any larger number, it will not work in this case, because the bit-field has a width of 3 bits and can only hold the binary representations of the numbers 0 to 7. The number 7 in binary is 111, which fits in 3 bits, but the number 8 is 1000 and needs 4 bits.

This is a very error-prone operation, because out-of-range values can easily occur. For an unsigned bit-field the result is well defined: the value is reduced modulo 2 to the power of the width, so storing 8 in a 3-bit field yields 0. For a signed bit-field, storing an out-of-range value gives an implementation-defined result.

