Discussion on: Is "C Programming language" Still Worth Learning in 2021?

Replies for: It's worth learning in order to fully appreciate the wonders of undefined behavior. Consider the following C program -- what does it do? #inclu...

This isn't quite "undefined behaviour", just weird syntax and one of those moments when you ought to know operator precedence and evaluation order, which is pretty much the same in every language (in some languages with dialects or multiple compilers it may just be more apparent).
Undefined behaviour would be something along the lines of:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int main(void)
{
    const size_t size = 1024*1024;

    char *data = malloc(size);
    if(!data) { err(1,"malloc"); } // replace with assert if you don't have err.h from libbsd

    memset(data,0,size); // write zeroes
    free(data);
    memset(data,0xff,size); // write ones

    return 0;
}
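
For contrast, here is a minimal sketch of the same writes kept inside the buffer's lifetime, so nothing touches the memory after free. The function name write_then_free and its return convention are my own assumptions for illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 0 on success, -1 if the allocation fails. */
int write_then_free(size_t size)
{
    unsigned char *data = malloc(size);
    if (!data) { return -1; }

    memset(data, 0, size);     /* write zeroes */
    memset(data, 0xff, size);  /* write ones while the buffer is still alive */
    assert(size == 0 || data[0] == 0xff);

    free(data);                /* last use of the buffer; data is not touched again */
    return 0;
}
```

Calling write_then_free(1024 * 1024) from main performs the same two writes as above, just in an order the standard defines.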

Matthew Stokes • Jul 28 '20

Interesting. I didn't think it was undefined either, so I read up on it: in most cases the compiler will handle it as you expect, but according to the spec it doesn't have to, which is why it is undefined?

There is no guarantee in the C specification that the increment of i will already be done when you use it as the third argument to printf(). So you could reasonably get 1, 1?

I may well have misunderstood though!

pentacular • Jul 29 '20

I think you're imagining that the operations occur in an unspecified order, as would be the case for

foo(a(), b());

There is a sequence point when a call is executed, so a() and b() occur in some distinct, if unspecified, order.

The program will not have undefined behavior, though it may have unspecified behavior (if it depends on the order of those calls). Either way, we can continue to reason about it in terms of the C Abstract Machine.
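
A small sketch of that unspecified-but-defined case. The bodies of a, b and foo, the order log, and the wrapper call_foo are hypothetical and only there to make the call order observable:

```c
#include <assert.h>
#include <string.h>

static char order[3];   /* hypothetical log: ends up as "ab" or "ba" */
static int calls;

static int a(void) { order[calls++] = 'a'; return 1; }
static int b(void) { order[calls++] = 'b'; return 2; }
static int foo(int x, int y) { return x + y; }

int call_foo(void)
{
    calls = 0;
    memset(order, 0, sizeof order);

    /* a() and b() each run to completion, in whichever order the
       implementation picks; the two calls never interleave. */
    int sum = foo(a(), b());

    assert(calls == 2);
    return sum;
}
```

The sum is 3 whichever call runs first. Only the contents of order may differ between implementations, and a program that depends on them is relying on unspecified behavior.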

foo(i, i++);

There is no sequence point between i and i++, so the read of i and the modification of i are unsequenced relative to each other, which violates the rules of the C Abstract Machine and produces undefined behavior.

We cannot reason about the program from this point onward.
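
A minimal sketch of one way to make this well defined, assuming the intent was to pass the old value of i for both arguments and increment afterwards. struct args, the body of foo, and call_with_old_value are hypothetical names for illustration:

```c
#include <assert.h>
#include <stddef.h>

struct args { int first, second; };

/* Hypothetical stand-in for foo: just reports what it was given. */
static struct args foo(int first, int second)
{
    struct args seen = { first, second };
    return seen;
}

struct args call_with_old_value(int *i)
{
    assert(i != NULL);

    int before = *i;   /* the read is a full expression of its own */
    *i += 1;           /* the modification is sequenced after the read */

    return foo(before, before);
}
```

If the new value was wanted for the first argument instead, foo(*i, before) after the increment is equally well defined. The point is that the programmer has to choose, because the language does not.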

pentacular • Jul 29 '20 • Edited

It's undefined behavior, specifically the case where "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored."

This happens because there is no sequence point between the i++ and the i.

Precedence doesn't come into this.

Here's a more interesting variation on your example.

Can you spot the undefined behavior here? :)

#include <stdlib.h>

int main() {
  char *data = malloc(1);
  if (data) {
    free(data++);
    data++;
  }
}

#benaryorg • Jul 29 '20

Ah, I see.
So C literally doesn't define any order on those instructions and it's up to the compiler?
Wouldn't have expected that, though I've seen the example a few times.
Excuse my hasty assumption then please.

First off, I'd really appreciate it if you specified the syntax in the code blocks so syntax highlighting kicks in ;-)
Something along these lines (without the backslashes, the markdown dialect doesn't allow nested fences):

\`\`\`c
int main(void)
{
    return 0;
}
\`\`\`

Actually no I can't see the undefined behaviour in that example.
If I see correctly, you're only ever manipulating the pointer itself, and since free takes the pointer by value and not by reference, the call ends up with a copy of data from before the increment, and the pointer is moved along twice afterwards, but in either case the pointer is invalid.
What am I missing?

pentacular • Jul 29 '20

Pointers are only well defined for null pointer values or when pointing into or one past the end of an allocated array.

The first increment satisfies this, since it happens before the free occurs.

After the free, the pointer value is indeterminate, and so the second increment has undefined behavior.
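
One common way to stay clear of this, sketched under the assumption that the caller is finished with the allocation: overwrite the pointer right after freeing it, so the stale value is never read again. free_and_clear is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: frees *p and replaces the stale value with NULL. */
void free_and_clear(char **p)
{
    assert(p != NULL);

    free(*p);     /* free(NULL) is a no-op, so a second call is harmless */
    *p = NULL;    /* storing a new value is fine; reading the old one is not */
}
```

After free_and_clear(&data), comparing data against NULL is well defined. Arithmetic such as data++ is still not, since a null pointer does not point into an array either.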

#benaryorg • Aug 7 '20

But you're not actually using that pointer in the code, so I fail to see how that's undefined behaviour.
An invalid pointer which isn't used still doesn't cause any runtime issues, or is there something about that too in the standards?

pentacular • Aug 7 '20

The last increment of the pointer happens when it has an indeterminate value, producing undefined behavior.

For example it might behave like a trap representation.

Regardless, the program cannot be reasoned about after this point. :)