DEV Community

Discussion on: Is "C Programming language" Still Worth Learning in 2021?

pentacular

It's worth learning in order to fully appreciate the wonders of undefined behavior.

Consider the following C program -- what does it do?

#include <stdio.h>

int main() {
  int i = 1;
  printf("%d, %d", i++, i);
}
Vlastimil Pospichal

This is one of the reasons I write prototypes and tests. I'll try it.

pentacular

In that case, I think you missed the point -- but I look forward to explaining why your results are wrong. :)

Vlastimil Pospichal • Edited

It's funny. First it uses i, then increments i, then uses i as the second parameter.

The result is the same:

#include <stdio.h>

int main() {
  int i = 1;
  printf("%d, %d", i++, i+1);
}
pentacular • Edited

Your results are wrong. :)

They're wrong because they're showing how your implementation decided to implement this undefined behavior, this time, and don't reflect on how C works.

Comment deleted
pentacular

C programs are understood in terms of the CAM (C Abstract Machine).

The compiler's job is to build a program that produces the same output as the CAM would for a given program.

The CAM says that a variable can only be read, or read-to-modify, once between two sequence points.

There are no sequence points between the i++ and i+1, so this produces a read/write conflict, which means that the program has undefined behavior in the CAM, and so the compiler can do whatever it wants.

It could crash, or print out 23, 37 or -9, 12, and these would all be equally correct behaviors.

Matthew Stokes • Edited

Print 1 and then 2? Genuinely curious, where is the undefined behaviour? :)

Matthew Stokes

Ah, I see it now. There is no guarantee the increment will happen before the print. Only before the next sequence point!

pentacular

The increment must happen before the print, as there is a sequence point between the evaluation of the arguments and the call.

But there are no sequence points between the evaluations of the arguments.

Leading to undefined behavior of the case that "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored."

Matthew Stokes

Thanks for clarifying! That makes more sense.

#benaryorg

This isn't quite "undefined behaviour", just weird syntax and one of those moments when you ought to know operator precedence and evaluation order, which are pretty much the same in every language (in languages with dialects or multiple compilers it may just be more apparent).
Undefined behaviour would be something along the lines of:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int main(void)
{
    const size_t size = 1024*1024;

    char *data = malloc(size);
    if(!data) { err(1,"malloc"); } // replace with assert if you don't have err.h from libbsd

    memset(data,0,size); // write zeroes
    free(data);
    memset(data,0xff,size); // write ones

    return 0;
}
Matthew Stokes

Interesting. As far as I could read up (because I didn't think it was either): in most cases the compiler will handle it as you expect, but it doesn't have to according to the spec, which is why it's undefined?

There is no guarantee in the specification for c that the increment of i will be done when you use it as the third argument to printf(). So you could reasonably get 1, 1?

I may well have misunderstood though!

pentacular

I think you're imagining that the operations occur in an unspecified order, as would be the case for

foo(a(), b());

There is a sequence point when a call is executed, so a() and b() occur in some distinct, if unspecified, order.

The program will not have undefined behavior, but it may have unspecified behavior (if it depends on the order of those calls), and we can continue to reason about the C Abstract Machine in both cases.

foo(i, i++);

There is no sequence point between i and i++, so they occur at the same time, leading to a violation of the C Abstract Machine, producing undefined behavior.

We cannot reason about the program from this point onward.

pentacular • Edited

It's undefined behavior of the case that "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored."

This happens because there is no sequence point between the i++ and the i.

Precedence doesn't come into this.

Here's a more interesting variation on your example.

Can you spot the undefined behavior here? :)

#include <stdlib.h>

int main() {
  char *data = malloc(1);
  if (data) {
    free(data++);
    data++;
  }
}
#benaryorg

Ah, I see.
So C literally doesn't define any order on those instructions and it's up to the compiler?
Wouldn't have expected that, though I've seen the example a few times.
Excuse my hasty assumption then please.

First off, I'd really appreciate it if you specified the syntax in the code blocks so syntax highlighting kicks in ;-)
Something along the lines of this (without the backslashes; the markdown dialect doesn't allow nested fences):

\`\`\`c
int main(void)
{
    return 0;
}
\`\`\`

Actually, no, I can't see the undefined behaviour in that example.
In all cases you're only manipulating the pointer, if I see correctly, and since free takes the pointer by value and not by reference, it gets a copy of data from before the increment, and you move the pointer along twice afterwards; either way the pointer is invalid.
What am I missing?

pentacular

Pointers are only well defined for null pointer values or when pointing into or one past the end of an allocated array.

The first increment satisfies this, since it happens before the free occurs.

After the free, the pointer value is undefined and so the second increment has undefined behavior.

#benaryorg

But you're not actually using that pointer in the code, so I fail to see how that's undefined behaviour.
An invalid pointer which isn't used still doesn't cause any runtime issues, or is there something about that too in the standards?

pentacular

The last increment of the pointer is when it has an undefined value, producing undefined behavior.

For example it might behave like a trap representation.

Regardless, the program cannot be reasoned about after this point. :)

Andrew Harpin

This is the same in any language: there are many ways to do something, and not all of them are advisable.

Understand the language, learn the best practices and fundamentally write decent code.

I realise this is easier said than done, but it is the golden principle we should be adhering to in our products.

Sergiy Yevtushenko • Edited

learn the best practices

But keep in mind that they are not dogma and can be broken when there is a significant reason to do so.

Andrew Harpin

Agreed, if you can justify the need to do something with a particular unconventional approach, then go for it.

BUT it must be well documented, with the emphasis on how it works and that changes must be carefully considered.

Sergiy Yevtushenko

Again, it depends. For example, I'm currently working on a personal project (it's not C but Java). Among the goals of this project is the search for a new style of writing code. I often rewrite code several times in order to make it easier to read and/or more reliable. When I get code that looks satisfactory, I often discover that it violates one or more Sonar rules (i.e. "best practices"). In the vast majority of cases the considerations behind those rules are no longer valid because the whole approach is different. What I'm trying to say is that "best practices" are a set of compatible rules/guides/considerations, and there might be more than one such set.