DEV Community

Discussion on: Is "C Programming language" Still Worth Learning in 2021?

pentacular

It's worth learning in order to fully appreciate the wonders of undefined behavior.

Consider the following C program -- what does it do?

#include <stdio.h>

int main() {
  int i = 1;
  printf("%d, %d", i++, i);
}
Vlastimil Pospichal

This is one of the reasons I write prototypes and tests. I'll try it.

pentacular

In that case, I think you missed the point -- but I look forward to explaining why your results are wrong. :)

Vlastimil Pospichal • Edited

It's funny. First it uses i, then increments i, then uses i as the second parameter.

The result is the same:

#include <stdio.h>

int main() {
  int i = 1;
  printf("%d, %d", i++, i+1);
}
pentacular • Edited

Your results are wrong. :)

They're wrong because they're showing how your implementation decided to implement this undefined behavior, this time, and don't reflect on how C works.

Comment deleted
pentacular

C programs are understood in terms of the CAM (C Abstract Machine).

The compiler's job is to build a program that produces the same output as the CAM would for a given program.

The CAM says that a variable can only be read, or read-to-modify, once between two sequence points.

There are no sequence points between the i++ and i+1, so this produces a read/write conflict, which means that the program has undefined behavior in the CAM, and so the compiler can do whatever it wants.

It could crash, or print out 23, 37 or -9, 12, and these would all be equally correct behaviors.

Matthew Stokes • Edited

Print 1 and then 2? Genuinely curious, where is the undefined behaviour? :)

Matthew Stokes

Ah, I see it now. There is no guarantee the increment will happen before the print. Only before the next sequence point!

pentacular

The increment must happen before the print, as there is a sequence point between the evaluation of the arguments and the call.

But there are no sequence points between the evaluations of the arguments.

Leading to undefined behavior of the case that "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored."

Matthew Stokes

Thanks for clarifying! That makes more sense.

#benaryorg

This isn't quite "undefined behaviour", just weird syntax and one of those moments when you ought to know operator precedence and evaluation order, which are pretty much the same in every language (in languages with dialects or multiple compilers it may just be more apparent).
Undefined behaviour would be something along the lines of:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int main(void)
{
    const size_t size = 1024*1024;

    char *data = malloc(size);
    if(!data) { err(1,"malloc"); } // replace with assert if you don't have err.h from libbsd

    memset(data,0,size); // write zeroes
    free(data);
    memset(data,0xff,size); // write ones

    return 0;
}
Matthew Stokes

Interesting. As far as I could read up (because I didn't think it was either): in most cases the compiler will handle it as you expect, but it doesn't have to according to the spec, which is why it's undefined?

There is no guarantee in the specification for c that the increment of i will be done when you use it as the third argument to printf(). So you could reasonably get 1, 1?

I may well have misunderstood though!

pentacular

I think you're imagining that the operations occur in an unspecified order, as would be the case for

foo(a(), b());

There is a sequence point when a call is executed, so a() and b() occur in some distinct, if unspecified, order.

The program will not have undefined behavior, but it may have unspecified behavior (if it depends on the order of those calls), and we can continue to reason about the C Abstract Machine in both cases.

foo(i, i++);

There is no sequence point between i and i++, so they occur at the same time, leading to a violation of the C Abstract Machine, producing undefined behavior.

We cannot reason about the program from this point onward.

pentacular • Edited

It's undefined behavior of the case that "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored."

This happens because there is no sequence point between the i++ and the i.

Precedence doesn't come into this.

Here's a more interesting variation on your example.

Can you spot the undefined behavior here? :)

#include <stdlib.h>

int main() {
  char *data = malloc(1);
  if (data) {
    free(data++);
    data++;
  }
}
#benaryorg

Ah, I see.
So C literally doesn't define any order on those instructions and it's up to the compiler?
Wouldn't have expected that, though I've seen the example a few times.
Excuse my hasty assumption then please.

First off, I'd really appreciate it if you specified the syntax in the code blocks so syntax highlighting kicks in ;-)
Something along the lines of this (without the backslashes; the markdown dialect doesn't allow nested fences):

\`\`\`c
int main(void)
{
    return 0;
}
\`\`\`

Actually, no, I can't see the undefined behaviour in that example.
In all cases you're only manipulating the pointer, if I see correctly, and since free takes the pointer by value and not by reference, it gets a copy of data from before the increment, and you move the pointer along twice afterwards; either way the pointer is invalid.
What am I missing?

pentacular

Pointers are only well defined for null pointer values or when pointing into or one past the end of an allocated array.

The first increment satisfies this, since it happens before the free occurs.

After the free, the pointer value is undefined and so the second increment has undefined behavior.

#benaryorg

But you're not actually using that pointer in the code, so I fail to see how that's undefined behaviour.
An invalid pointer which isn't used still doesn't cause any runtime issues, or is there something about that too in the standards?

pentacular

The last increment of the pointer is when it has an undefined value, producing undefined behavior.

For example it might behave like a trap representation.

Regardless, the program cannot be reasoned about after this point. :)

Andrew Harpin

This is the same in any language: there are many ways to do something, and not all of them are advisable.

Understand the language, learn the best practices and fundamentally write decent code.

I realise this is easier said than done, but it is the golden principle we should be adhering to in our products.

Sergiy Yevtushenko • Edited

learn the best practices

But keep in mind that they are not dogma and can be broken when there is a significant reason to do so.

Andrew Harpin

Agreed, if you can justify the need to do something with a particular unconventional approach, then go for it.

BUT it must be well documented, with the emphasis on how it works and that changes must be carefully considered.

Sergiy Yevtushenko

Again, it depends. For example, I'm currently working on a personal project (it's not C but Java). Among the goals of this project is the search for a new style of writing code. I often rewrite code several times in order to make it easier to read and/or more reliable. When I get code that looks satisfactory, I often discover that it violates one or more Sonar rules (i.e. "best practices"). In the vast majority of cases the considerations behind those rules are no longer valid because the whole approach is different. What I'm trying to say is that "best practices" are a set of compatible rules/guides/considerations, and there might be more than one such set.