This is going to be my first post, so I chose a rather simple concept: assignment vs initialization in C++. I will try to keep the post as practical as possible and share keywords in case the reader wants to do in-depth research. So buckle up and enjoy the ride, folks!
int x; // Define x (uninitialized: its value is indeterminate)
x = 3; // Assign 3 to x
int y{3}; // Define and initialize y with 3
The statements above leave both x and y holding the value 3. This leads to the common misconception that assignment and initialization are identical.
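To see that the two forms really are different operations, here is a minimal sketch. It uses a hypothetical Tracer type (not from the original post) that logs which special member function runs. It assumes C++17 for the inline static member. Defining and then assigning calls the default constructor followed by operator=. Initializing calls a single constructor.

```cpp
#include <string>
#include <vector>

// Hypothetical Tracer type: records which special member function ran.
struct Tracer {
    static inline std::vector<std::string> log; // C++17 inline static member

    int value = 0;
    Tracer() { log.push_back("default ctor"); }
    Tracer(int v) : value{v} { log.push_back("ctor(int)"); }
    Tracer& operator=(int v) {
        value = v;
        log.push_back("operator=(int)");
        return *this;
    }
};

std::vector<std::string> defineThenAssign() {
    Tracer::log.clear();
    Tracer x; // the default constructor runs first...
    x = 3;    // ...then the assignment operator overwrites the value
    return Tracer::log;
}

std::vector<std::string> initialize() {
    Tracer::log.clear();
    Tracer y{3}; // a single constructor call, no assignment involved
    return Tracer::log;
}
```

For int, both paths happen to end in the same machine code. For class types, the extra default construction is real work.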
Let's wear the ISO C++ Standards Committee hat
According to the C++20 standard, which was recently published, initialization is described in section "9.4 Initializers", whereas assignment is described in section "11.4.5 Assignment operator". How dare you call them identical, you peasant!
That was a little harsh. Perhaps we should wear the C++ compiler implementer hat
GCC 10.2 produces identical output for the two functions below, with or without optimizations.
int getX(){
int x {3};
return x;
}
int getY(){
int y;
y = 3;
return y;
}
get():
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 3
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
That was a bit anticlimactic, I guess. My guess is that when the type is scalar, the compiler can assign the value directly, as cppreference.com suggests. However, I couldn't find the relevant section (built-in direct assignment) in the C++ standard. Perhaps we should try a non-scalar type, for example std::string.
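Even for scalars, where GCC emits the same code, the two forms are not interchangeable at the language level. Brace (list) initialization rejects narrowing conversions at compile time, while plain assignment converts silently. Here is a small sketch with hypothetical function names:

```cpp
// Assignment from a double compiles and silently truncates toward zero.
int assignFromDouble(double d) {
    int x;
    x = d;       // implicit double -> int conversion, fractional part dropped
    return x;
}

// List-initialization only accepts non-narrowing conversions.
int initializeFromInt() {
    int y{3};    // fine: int -> int, no narrowing
    // int z{3.7}; // ill-formed: narrowing double -> int in list-initialization
    return y;
}
```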
#include <string>
std::string getX(){
std::string x {"bugra"};
return x;
}
std::string getY(){
std::string x;
x = "bugra";
return x;
}
Let's see what GCC 10.2 produces for getX and getY with optimizations enabled.
getX[abi:cxx11]():
lea rdx, [rdi+16]
mov BYTE PTR [rdi+20], 97
mov rax, rdi
mov QWORD PTR [rdi], rdx
mov DWORD PTR [rdi+16], 1919382882
mov QWORD PTR [rdi+8], 5
mov BYTE PTR [rdi+21], 0
ret
.LC0:
.string "bugra"
getY[abi:cxx11]():
push r12
mov r8d, 5
mov ecx, OFFSET FLAT:.LC0
xor edx, edx
push rbp
xor esi, esi
mov r12, rdi
push rbx
lea rbx, [rdi+16]
mov QWORD PTR [rdi], rbx
mov QWORD PTR [rdi+8], 0
mov BYTE PTR [rdi+16], 0
call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long)
mov rax, r12
pop rbx
pop rbp
pop r12
ret
mov rbp, rax
jmp .L2
getY[abi:cxx11]() [clone .cold]:
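I am no expert, but I think getX (initialization) is a lot better than getY (assignment). getX writes "bugra" straight into the returned string's internal buffer with a handful of mov instructions. getY first constructs an empty string and then calls _M_replace to overwrite it. Since we have established that assignment and initialization can produce different code, let's try to understand the difference between them. We need to wear the formal hat again.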
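Section "6.7.7 Temporary objects" of the C++20 standard states that the expression a = f() requires a temporary object for the result of f(). That temporary is materialized so that the reference parameter of X::operator=(const X&) can bind to it. Initialization has no such requirement: since C++17, X a = f(); initializes a directly from the result of f().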
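The following is a minimal sketch of that rule, assuming C++17 or later (guaranteed copy elision). X and f mirror the names in the standard's example. The construction counter is a hypothetical addition to make the temporary visible.

```cpp
// X mirrors the standard's example: it has X::operator=(const X&).
struct X {
    static inline int constructions = 0; // counts every X object created
    X() { ++constructions; }
    X(const X&) { ++constructions; }
    X& operator=(const X&) { return *this; }
};

X f() { return X{}; } // returns a prvalue

int constructionsForInitialization() {
    X::constructions = 0;
    X a = f(); // guaranteed elision: f()'s result initializes a directly
    return X::constructions;
}

int constructionsForAssignment() {
    X::constructions = 0;
    X a;     // one construction for a itself
    a = f(); // f()'s result is materialized as a temporary so that
             // operator=(const X&) can bind to it: a second object
    return X::constructions;
}
```

Initialization creates one object. Assignment creates two: the target and the temporary that feeds operator=.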
The C++ Core Guidelines also advise preferring initialization to assignment (rule C.49, "Prefer initialization to assignment in constructors").
Let's finish with the regular developer hat
TLDR: Prefer initialization to assignment!
#include <string>
#include <utility>
class A { // GOOD
std::string s1;
public:
A(std::string p) : s1{std::move(p)} { } // GOOD: directly construct s1 from p
};
class B { // BAD
std::string s1;
public:
B(const char* p) { s1 = p; } // BAD: default constructor followed by assignment
};
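Some members leave you no choice at all. A const member or a reference member can only be set in the member initializer list, so the "assign in the body" style of class B cannot even compile for them. Here is a sketch with a hypothetical Config class:

```cpp
#include <string>
#include <utility>

// Hypothetical Config class: both members must be initialized, not assigned.
class Config {
    const std::string name_;
    const int& limit_; // refers to an int owned by the caller, which must outlive Config
public:
    Config(std::string name, const int& limit)
        : name_{std::move(name)}, limit_{limit} { }
    // Config(std::string name, const int& limit) { name_ = name; }
    //   would not compile: name_ is const and limit_ is a reference
    //   that must be bound in the initializer list.
    const std::string& name() const { return name_; }
    int limit() const { return limit_; }
};
```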
Keywords: copy assignment, Builtin direct assignment for scalar types, copy initialization, direct initialization, list-initialization, temporary objects
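To make the initialization keywords concrete, here is a sketch using a hypothetical Meters type with an explicit constructor. It assumes C++17 for the _v type traits. Direct-initialization considers explicit constructors; copy-initialization does not.

```cpp
#include <type_traits>

// Hypothetical Meters type with an explicit converting constructor.
struct Meters {
    explicit Meters(double v) : value{v} { }
    double value;
};

// Direct-initialization (Meters m(3.0);) may use the explicit constructor...
static_assert(std::is_constructible_v<Meters, double>);
// ...but copy-initialization (Meters m = 3.0;) may not, so it is ill-formed.
static_assert(!std::is_convertible_v<double, Meters>);

double directlyInitialized() {
    Meters a(3.0);    // direct-initialization
    Meters b{3.0};    // direct-list-initialization
    // Meters c = 3.0; // copy-initialization: error, the constructor is explicit
    return a.value + b.value;
}
```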
Top comments (3)
Just to complement this part:
That's not the only problem we face in getY for the string example. In fact, we have to understand how local variables are initialized. When you create a local integer (int i;), memory is reserved for the new variable, but primitive data types are not default initialized: they hold whatever value they find in the allocated memory (on the stack). On the other hand, std::string is not a primitive data type, and objects are default initialized, given that they have a default constructor. If they don't have one, well, such code wouldn't compile.
So getting back to getY for the string example: the line std::string x; creates a variable x, which is initialized to an empty string. Then on the next line, x = "bugra"; assigns a new value to x. So x is set twice, once by the default constructor and once by the assignment! (The integer i was assigned only once!)
It's yet another problem that "bugra" is not a std::string. It's a const char* that has to be copied into x at run time, and you pay for it. Hence the immense difference in the ASM code. If we want to avoid that cost, and we have access to C++14, we should use a std::string literal ("bugra"s). Then the generated ASM code becomes similar.
But even with optimizations turned on, there is no reason in similar circumstances to separate declaration from initialization. For one thing, doing so prevents you from declaring your variables const.
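When the initial value depends on a condition, a common trick is an immediately invoked lambda. It keeps the variable const instead of declaring it first and assigning it later. Here is a sketch with a hypothetical greeting function:

```cpp
#include <string>

// Hypothetical example: text stays const even though its value is chosen at run time.
std::string greeting(bool formal) {
    const std::string text = [&] {
        if (formal) {
            return std::string{"Good evening"};
        }
        return std::string{"Hi"};
    }(); // the lambda runs immediately; its result initializes text
    return text + ", bugra";
}
```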
Here is a great talk on this
Thanks for the great feedback, Sandor. I really appreciate it :)
Cool! Before I got involved with move semantics, I had no idea of the difference between assignment and initialization. I believe that many people don't even try to understand it :3
constructor -> initialization
operator= -> assignment
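That rule of thumb also covers declarations written with an = sign. T b = a; is copy-initialization and calls a constructor. Only b = a; on an existing object calls operator=. Here is a sketch with a hypothetical Probe type:

```cpp
#include <string>

// Hypothetical Probe type: remembers which special member produced its state.
struct Probe {
    std::string origin = "default ctor";
    Probe() = default;
    Probe(const Probe&) : origin{"copy ctor"} { }
    Probe& operator=(const Probe&) {
        origin = "copy assignment";
        return *this;
    }
};

std::string copyInitialized() {
    Probe a;
    Probe b = a; // despite the '=', this is initialization: the copy constructor runs
    return b.origin;
}

std::string copyAssigned() {
    Probe a;
    Probe b;
    b = a;       // b already exists, so operator= runs
    return b.origin;
}
```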
[google translator]