DEV Community

John Robertson

Where is Your Data?

Introduction

Making source code easier to read without significantly increasing its size is always a win. For C and C++ I've developed a simple convention that makes three things clear at a glance: which of the two kinds of static data segment a piece of data resides in, whether it resides on the stack instead, and whether static data is visible globally or only within a single source file. This is very useful when writing multi-threaded apps.

Possible Data Locations

In terms of where data can exist within a given process space, there are only four locations:

  1. Static data segment
  2. Thread-specific static data segment
  3. Heap
  4. Stack

That's it. Really. If you are writing a multi-threaded application, it is vitally important that you know the location of all data. No amount of OOP wizardry will let you put off knowing this for very long. It is to your advantage to leverage each of these locations for appropriate purposes.
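To make the four locations concrete, here is a minimal C11 sketch that touches each of them once. The variable and function names are hypothetical and chosen only for illustration.

```c
#include <stdlib.h>

/* 1. Static data segment: one copy for the whole process */
static int in_static_segment = 1;

/* 2. Thread-specific static data segment: one copy per thread (C11) */
static _Thread_local int in_thread_segment = 2;

/* Hypothetical helper: reads one value from each of the four locations
 * and returns their sum, or -1 if the heap allocation fails. */
int sum_from_all_locations(void)
{
   int on_stack = 4;                       /* 4. Stack: gone when we return */
   int *on_heap = malloc(sizeof *on_heap); /* 3. Heap: lives until free() */
   if (!on_heap)
      return -1;
   *on_heap = 3;

   int sum = in_static_segment + in_thread_segment + *on_heap + on_stack;
   free(on_heap);
   return sum;
}
```

Only the first and third locations are shared between threads by default; the stack and the thread-specific segment belong to whichever thread is running the code.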

Globally Visible Static Data

Despite what programming language ideologues may say about the evils of global static data, I find some is usually necessary when writing non-trivial applications. Also, I've found it useful in my source code to make clear whenever global data is involved. Here is how I do this, usually in a file named: "NameOfProject.h"

struct Global {
   int nClients; // or whatever
   /* and so on */
};
extern struct Global G;

Now when I need to do something with nClients, it must appear prefixed with 'G.'

printf("Client count is: %d\n", G.nClients);

You will of course need to define 'G' itself in exactly one .c or .cc file somewhere.
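As a rough sketch of how the pieces fit together, the header declaration and the single definition are shown here in one file for brevity. The helper clientConnected() is a hypothetical name, not something from a real project.

```c
/* Normally in NameOfProject.h */
struct Global {
   int nClients; /* or whatever */
};
extern struct Global G; /* declaration only: no storage is reserved here */

/* Normally in exactly one .c or .cc file: the actual instance,
 * placed in the static data segment and zero-initialized. */
struct Global G = { 0 };

/* Hypothetical helper: every use of the global is visibly marked by 'G.' */
int clientConnected(void)
{
   return ++G.nClients;
}
```

Because G is shared by every thread, a real multi-threaded program would still need to protect nClients (for example with a mutex or an atomic type) if more than one thread can modify it.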

Single Source File Static Data

In C and C++, the "static" keyword is overloaded. Applied to a data definition outside of any class declaration, it means two things: the data will be located in the static data segment, and the symbol has internal linkage, so it is not exported from the corresponding object file and is invisible to other source files. This is a lot simpler than other mechanisms for scoping static data, and it works in both C and C++. (The _Atomic and _Thread_local keywords used below are C11; in C++ the equivalents are std::atomic and thread_local.) You will need to place code similar to this in your .c or .cc file:

static struct {
   /* Correction specifies _Atomic.
    * Thanks Andrew Clayton for pointing out
    * this omission!
    */
   _Atomic int n_someFuncCalled;
   /* Other stuff */
} S;

int someFunc(void)
{
   ++S.n_someFuncCalled;
   /* Atomic increment: the count stays correct
    * no matter how many threads call this */
   return 0;
}

Here the data that is scoped to the source file and resides in the static data segment is prefixed with 'S.'
Likewise, for thread-specific static data you can do this:

static _Thread_local struct {
   int n_someFuncCalled;
   /* count is only for the current thread */
} TS;

Accessing this data requires the prefix 'TS.' Data found in TS resides in a static data segment that is unique to each thread.
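The difference between 'S.' and 'TS.' can be observed with a small sketch like the one below. It assumes POSIX threads are available (on some toolchains you may need to compile with -pthread); worker(), runWorkers(), N_THREADS and N_CALLS are hypothetical names introduced only for this illustration.

```c
#include <pthread.h>
#include <stddef.h>

enum { N_THREADS = 4, N_CALLS = 1000 };

/* One copy shared by every thread in the process */
static struct {
   _Atomic int n_someFuncCalled;
} S;

/* One independent copy per thread */
static _Thread_local struct {
   int n_someFuncCalled;
} TS;

static int someFunc(void)
{
   ++S.n_someFuncCalled;  /* atomic read-modify-write on shared data */
   ++TS.n_someFuncCalled; /* no other thread can reach this copy */
   return 0;
}

/* Thread entry point: calls someFunc() N_CALLS times, then reports
 * this thread's own count through the int that arg points to. */
static void *worker(void *arg)
{
   for (int i = 0; i < N_CALLS; ++i)
      someFunc();
   *(int *)arg = TS.n_someFuncCalled;
   return NULL;
}

/* Runs N_THREADS workers and waits for them. Fills perThread[] with each
 * thread's private count and returns the shared total, or -1 if a thread
 * could not be created. */
int runWorkers(int perThread[N_THREADS])
{
   pthread_t tid[N_THREADS];
   int started = 0;

   while (started < N_THREADS &&
          pthread_create(&tid[started], NULL, worker, &perThread[started]) == 0)
      ++started;

   for (int i = 0; i < started; ++i)
      pthread_join(tid[i], NULL);

   return started == N_THREADS ? S.n_someFuncCalled : -1;
}
```

After a successful run, the shared counter in S holds the total across all threads, while each thread's TS counter holds only that thread's own calls.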

Reduce Function Argument Counts

One advantage of using the static data segments is reducing the number of arguments you must pass to some functions. This can, of course, be exploited to write sloppy and unmaintainable code. On the other hand, with proper understanding it can be used to write more robust and efficient code.
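As a hedged illustration of the idea, the sketch below keeps settings that rarely change in a file-scoped 'S' so that callers do not have to pass them on every call. The function formatMsg() and the fields prefix and verbosity are hypothetical names, not part of any real project.

```c
#include <stdio.h>

/* Settings that would otherwise be threaded through every call */
static struct {
   const char *prefix;
   int verbosity;
} S = { .prefix = "app", .verbosity = 1 };

/* Hypothetical helper: formats msg into buf if level is within the
 * configured verbosity. Returns the number of characters snprintf()
 * reports, or 0 if the message was suppressed. Without S, the signature
 * would also need the prefix and verbosity arguments. */
int formatMsg(char *buf, size_t size, int level, const char *msg)
{
   if (level > S.verbosity)
      return 0;
   return snprintf(buf, size, "%s: %s", S.prefix, msg);
}
```

The trade-off is that formatMsg() now depends on hidden state; the 'S.' prefix at least makes that dependency visible at the point of use.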

Conclusion

I cannot overstate the importance of knowing where your data resides in a multi-threaded process. This article presents a simple and practical way to represent this in source code. If you'd like to see an example of a real project using this technique, please have a look at ban2fail.
Happy coding!

Top comments (4)

Andrew Clayton

Hi,

In this example

static struct {
   int n_someFuncCalled;
   /* Other stuff */
} S;

int someFunc()
{
   ++S.n_someFuncCalled;
   /* Works no matter how many threads */
   return 0;
}

The comment

 /* Works no matter how many threads */

is a bit misleading. I'm not entirely sure what it's relating to, but if it's saying that multiple threads can simultaneously increment the counter and the result will be correct, then I'm afraid that's not true.

You would need to protect that counter with a mutex or somesuch or use an atomic type.

John Robertson

Incrementing an integer is atomic.

Andrew Clayton

Hmm, OK, an integer can generally be considered atomic in that when it's updated, you always see either its old or new value and not something in between.

However this does not help with the case of multiple threads trying to update an integer. Using a normal int type and doing a ++ on it will be three instructions; load/add/store.

So

++S.n_someFuncCalled;

translates to something like the following assembly

        movl    S(%rip), %eax
        addl    $1, %eax
        movl    %eax, S(%rip)

Let's say the counter has a value of 10 and two threads are about to increment it.

Thread A could be between the add and the second mov when thread B comes in and does the first mov.

So thread A did 10 + 1, but before it copied the result back to memory, thread B came in, copied 10 into its register, and did the add. Thread A then puts 11 back into memory (correct), but thread B also computed 10 + 1 and so also puts 11 back into memory. We are left with a counter of 11 and not 12...

This from the link I provided above also gives a hint

Objects of atomic types are the only objects that are free from data races, that is, they may be modified by two threads concurrently or modified by one and read by another.

This is easy to test yourself with a small program that has two threads updating an integer, try with and without locking... the example in the link above shows what you can expect to see...

John Robertson • Edited

You are correct, thanks for pointing out the oversight. I got the 20-minute lecture this morning from my son, and specifying atomic is the best practice for code to work on a variety of hardware platforms. Thanks for pointing this out!