Preface
I recently made myself unemployed, and for a few weeks now I've been doing frequent technical interviews as part of looking for a new job. Although every project is unique, interviewers tend to ask the same questions again and again. I've decided to keep track of the most common interview questions I get and write down my answers to them.
When you're in-between jobs, getting ready for interviews may be tedious. So instead I welcome you to follow this series, in which I will try to cover different interview topics in each post. I will try to write these regularly and I hope that it can be helpful to someone.
Note: if you find a mistake, please let me know in the comments and I will correct it. Thanks!
Heap vs stack
Q: Where are objects allocated in C#?
In C# there are two places where an object can be stored -- the heap and the stack.
Objects allocated on the stack are available only inside of a stack frame (the execution of a method), while objects allocated on the heap can be accessed from any code that holds a reference to them.
Q: Which objects are allocated on the stack and which objects are allocated on the heap?
Note: you should never say "reference types are allocated on the heap while value types are allocated on the stack". This is a commonly repeated mistake, and it sets off a red flag for an experienced interviewer.
Reference types (classes, interfaces, delegates) are always allocated on the heap.
When you pass an instance of a reference type as a parameter or assign it to a variable, you're in fact passing its reference. The reference (not the referenced object) can be allocated either on the stack or on the heap.
By passing a reference to an object, you're telling where that object is located on the heap so that your code can access it.
Every time an instance of a reference type is passed around, the reference itself is copied. This means that you can change the reference to point to a different object without affecting the previous object itself or other references pointing to it. A reference is lightweight and always has the same size (32 bits or 64 bits, depending on whether the process is 32-bit or 64-bit), so copying it (and thus passing around reference types) is considered cheap.
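To make the "reference is copied" part concrete, here's a minimal sketch. The `Person` class and the method names are made up for this illustration.

```csharp
using System;

class Person
{
    public string Name = "";
}

static class ReferenceDemo
{
    // 'p' is a copy of the caller's reference; both point at the same heap object.
    public static void Rename(Person p)
    {
        p.Name = "Renamed";                    // changes the shared object
    }

    // Reassigning the parameter only changes the local copy of the reference.
    public static void Reassign(Person p)
    {
        p = new Person { Name = "Other" };     // the caller's variable is untouched
    }

    public static void Main()
    {
        var alice = new Person { Name = "Alice" };

        Rename(alice);
        Console.WriteLine(alice.Name);         // Renamed

        Reassign(alice);
        Console.WriteLine(alice.Name);         // still Renamed

        Console.WriteLine(IntPtr.Size);        // 4 in a 32-bit process, 8 in a 64-bit one
    }
}
```

`IntPtr.Size` reports the pointer size of the running process, which is also the size of a reference.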
Value types (derived from System.ValueType, e.g. int, bool, char, enum and any struct) can be allocated on the heap or on the stack, depending on where they were declared.
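- If the value type was declared as a variable inside a method then it's stored on the stack (unless the variable is captured by a lambda, in which case the compiler moves it into an object on the heap).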
- If the value type was declared as a method parameter then it's stored on the stack.
- If the value type was declared as a member of a class then it's stored on the heap, along with its parent.
- If the value type was declared as a member of a struct then it's stored wherever that struct is stored.
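The four cases above can be sketched in code. The type names (`Point`, `Shape`, `Segment`) are invented for the example, and the comments describe where the current runtime typically places each value, which cannot be observed directly from C#.

```csharp
using System;

struct Point
{
    public int X;
    public int Y;
}

class Shape
{
    public Point Origin;   // member of a class: stored inside the Shape object on the heap
}

struct Segment
{
    public Point Start;    // member of a struct: stored wherever the Segment itself is stored
    public Point End;
}

static class PlacementDemo
{
    // 'p' is a method parameter: the callee gets its own copy on the stack.
    public static int Sum(Point p)
    {
        int total = p.X + p.Y;   // local variable: stack (or simply a CPU register)
        return total;
    }

    public static void Main()
    {
        var shape = new Shape { Origin = new Point { X = 1, Y = 2 } };

        // A local struct: both of its Point fields live in this stack frame.
        var segment = new Segment
        {
            Start = new Point { X = 0, Y = 0 },
            End = new Point { X = 3, Y = 4 }
        };

        Console.WriteLine(Sum(shape.Origin) + Sum(segment.End));   // 10
    }
}
```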
Starting with C# 7.2, a struct can be declared as a ref struct, in which case it can only ever live on the stack. This is why it cannot be declared as a field of a reference type.
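As a rough illustration, here's a small ref struct wrapping a `Span<int>` (which is itself a ref struct). `Window` and `RefStructDemo` are hypothetical names, and the sketch assumes a runtime where `Span<T>` is available (.NET Core 2.1 or later).

```csharp
using System;

ref struct Window
{
    public Span<int> Items;   // only a ref struct may hold another ref struct as a field

    public Window(Span<int> items)
    {
        Items = items;
    }
}

// class Holder { Window w; }   // would not compile: a ref struct cannot be a field of a class

static class RefStructDemo
{
    public static int Sum()
    {
        Span<int> buffer = stackalloc int[3];   // memory carved out of the current stack frame
        buffer[0] = 1;
        buffer[1] = 2;
        buffer[2] = 3;

        var window = new Window(buffer);        // the wrapper never leaves the stack either
        int total = 0;
        foreach (int item in window.Items)
        {
            total += item;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum());   // 6
    }
}
```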
Instances of value types are passed by copy (unless used with reference semantics, see below). This means that every time a value type is assigned to a variable or passed as parameter, the value is copied.
Because copying value types can get expensive depending on the size of the object, it's not recommended to declare memory-heavy objects as value types.
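A quick sketch of copy semantics, using a made-up `Counter` struct:

```csharp
using System;

struct Counter
{
    public int Value;
}

static class CopyDemo
{
    // 'c' is a copy of the caller's struct, so the caller never sees this change.
    public static void Increment(Counter c)
    {
        c.Value++;
    }

    public static void Main()
    {
        var a = new Counter { Value = 1 };
        var b = a;          // assignment copies the whole struct
        b.Value = 5;

        Increment(a);       // passes yet another copy

        Console.WriteLine(a.Value);   // 1
        Console.WriteLine(b.Value);   // 5
    }
}
```

The bigger the struct, the more bytes each of these copies has to move, which is the cost the paragraph above warns about.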
Since every type in C# derives from System.Object, value types can be assigned to variables or passed to methods that expect an object. In such cases, the value is copied and stored on the heap wrapped as a reference type, in an operation known as boxing.
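Boxing is easy to see in a few lines. This is just a sketch; `BoxingDemo` and `Box` are names I picked for the example.

```csharp
using System;

static class BoxingDemo
{
    // Returning an int as object forces the runtime to box it.
    public static object Box(int value)
    {
        return value;
    }

    public static void Main()
    {
        int number = 42;
        object boxed = number;    // boxing: the value is copied into a new heap object
        number = 7;               // the box keeps its own copy

        Console.WriteLine((int)boxed);   // 42 (unboxing copies the value back out)

        object first = Box(number);
        object second = Box(number);
        Console.WriteLine(object.ReferenceEquals(first, second));   // False: two separate boxes
        Console.WriteLine(first.Equals(second));                    // True: same value inside
    }
}
```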
Q: Can we use value types with reference semantics?
Keywords such as ref and out, ref return and ref local (C# 7.0), and in (C# 7.2) allow accessing value types by reference. This means that instead of getting a copy of the value, the consuming code receives a reference to it, whether it lives on the stack or on the heap, as long as the value outlives the consuming code.
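Here's a short sketch that touches each of these keywords. The `Vector` struct and the method names are assumptions made for the example.

```csharp
using System;

struct Vector
{
    public double X;
    public double Y;
}

static class RefDemo
{
    // ref: the method works on the caller's variable, not on a copy.
    public static void Scale(ref Vector v, double factor)
    {
        v.X *= factor;
        v.Y *= factor;
    }

    // in: passed by reference, but read-only inside the method.
    public static double Length(in Vector v)
    {
        return Math.Sqrt(v.X * v.X + v.Y * v.Y);
    }

    // out: the method must assign the caller's variable before returning.
    public static void Reset(out Vector v)
    {
        v = new Vector();
    }

    // ref return: hands back a reference to the array element itself.
    public static ref int First(int[] items)
    {
        return ref items[0];
    }

    public static void Main()
    {
        var v = new Vector { X = 3, Y = 4 };
        Scale(ref v, 2);
        Console.WriteLine(Length(in v));   // 10

        Reset(out v);
        Console.WriteLine(v.X);            // 0

        int[] numbers = { 1, 2, 3 };
        ref int first = ref First(numbers);   // ref local: an alias for numbers[0]
        first = 10;
        Console.WriteLine(numbers[0]);        // 10
    }
}
```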
Q: How is the heap memory freed up?
While the objects stored on the stack are gone when the containing stack frame is popped, memory used by objects stored on the heap needs to be freed up by the garbage collector.
When an object stored on the heap no longer has any references pointing to it, it's considered eligible for garbage collection.
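At a certain point the garbage collector kicks in, suspends the running managed threads (at least briefly) while it works out which objects are no longer reachable, and then marks their memory as free to use. Objects that have finalizers are handled differently: they are queued so that their finalizers can run on a special finalizer thread, and their memory is reclaimed only by a later collection. All of this is an implementation detail of the current runtime.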
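Eligibility for collection can be observed with a `WeakReference`, which tracks an object without keeping it alive. This is only a sketch: forcing a collection with `GC.Collect()` is for demonstration, and when exactly an unreferenced object goes away is up to the runtime.

```csharp
using System;
using System.Runtime.CompilerServices;

static class GcDemo
{
    // Creates an object that nothing references once this method returns.
    // NoInlining keeps the temporary out of the caller's stack frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AllocateUnreferenced()
    {
        return new WeakReference(new object());
    }

    public static void Main()
    {
        var kept = new object();
        var weakKept = new WeakReference(kept);
        var weakDropped = AllocateUnreferenced();

        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine(weakKept.IsAlive);      // True: 'kept' is still referenced below
        Console.WriteLine(weakDropped.IsAlive);   // usually False, but not guaranteed

        GC.KeepAlive(kept);   // makes sure 'kept' counts as referenced up to this point
    }
}
```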
Q: What issue may happen due to allocation and de-allocation of memory on the heap?
As the memory on the heap is allocated and de-allocated, it becomes fragmented. See the following diagram:
HEAP:
---][-------][----------][-----]........
      obj 1      obj 2     obj 3   free
When obj 2 is de-allocated, its memory becomes free:
HEAP:
---][-------]............[-----]........
      obj 1      free     obj 3   free
Now, if the runtime needs to allocate another object on the heap, it may use the memory freed up by obj 2, but only if the new object actually "fits". If that memory is not enough, the runtime may request more contiguous memory from the operating system by expanding its working set, as shown here:
HEAP:
---][-------]............[-----][--------------------]...
      obj 1      free     obj 3          obj 4
As a result of the fragmentation, memory usage becomes less efficient. To deal with this, the garbage collector may rearrange the memory so that there are no gaps. This is done by copying the surviving objects next to each other and updating the references that point to them, in an operation called compaction (or "defragmentation").
HEAP:
---][-------][-----][--------------------]...............
      obj 1   obj 3         obj 4              free
Q: What is Large Object Heap and what is it used for?
Depending on the size of the consumed memory, memory defragmentation can be expensive. That's why the heap is further separated into the Small Object Heap (SOH) and the Large Object Heap (LOH).
An object is stored on the SOH if it's smaller than 85,000 bytes, otherwise it's stored on the LOH. This cut-off point was determined empirically as the size beyond which defragmentation no longer provides performance benefits.
Due to how CPUs deal with doubles, arrays of double are an exception on 32-bit runtimes: for alignment reasons, such arrays are stored on the LOH once they hold about 1,000 elements, even though that is well below the usual size threshold.
Memory in LOH is (normally) not defragmented, providing better performance at the cost of less efficient memory usage.
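One way to peek at this from code is `GC.GetGeneration`, since the current runtime reports LOH objects as generation 2 right after allocation. This is a sketch that relies on that implementation detail; `LohDemo` and `GenerationOfNewArray` are names invented for the example.

```csharp
using System;

static class LohDemo
{
    // Allocates a byte array of the given length and asks the GC which generation it is in.
    public static int GenerationOfNewArray(int length)
    {
        return GC.GetGeneration(new byte[length]);
    }

    public static void Main()
    {
        // Well below 85,000 bytes: allocated on the SOH, normally in generation 0.
        Console.WriteLine(GenerationOfNewArray(1_000));

        // Well above 85,000 bytes: allocated on the LOH, reported as generation 2.
        Console.WriteLine(GenerationOfNewArray(100_000));
    }
}
```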
Top comments (15)
"Reference types (classes, interfaces, delegates) are always allocated on the heap and never on the stack."
Yes, reference types are generally allocated on the heap but there's no guarantee and clever compilers can and will allocate objects on the stack if they can prove that the reference to the object never escapes (you can read up on Escape analysis).
"Keywords such as ref and out, ref return and ref local (C#7.0), in (C#7.2) allow accessing value types by reference. This means that instead of copying the value, the consuming code will receive a reference to the value instead, be it on a stack or on a heap, as long as the lifetime of that value type is longer than that of consuming code"
Yeah, that's what you'd intuitively think, but sadly that's (generally) not the case. Value types are specified to be immutable, which the compiler has to guarantee. That means if you pass a value type around via ref or similar you'll get a defensive copy. Same thing if you try to call methods on a struct. This has caused quite the performance problems over the years. Luckily readonly struct was introduced to avoid this problem.
It would also be a good idea to mention that what you're describing about the GC and LOH in particular is an implementation detail and not contractually guaranteed.
Finalization strategies are also implementation-defined. Currently it is not true that the GC stops everything while finalizers are being run (after all we have a dedicated thread for it), which is one reason that makes finalizers so complicated to implement correctly. There is also no guarantee that finalizers won't run on the thread pool instead of a single dedicated thread in the future (so don't rely on finalizers being sequentially executed!). Actually I'm not even sure if .NET Core still has a dedicated thread here.
In all fairness, everything about how the memory management works in CLR is an implementation detail, although it doesn't stop interviewers from asking these questions. :)
You're thinking about ref readonly and in specifically, in which case yes, a defensive copy has to be made because the compiler cannot be sure that the object is not mutated.
"In all fairness, everything about how the memory management works in CLR is an implementation detail, although it doesn't stop interviewers from asking these questions. :)"
I know people who ask these questions in the hope of getting push back, but I'll agree - lots of questionable interview questions out there :)
Also yes you're right, the copy only happens if you use in or access a property of a readonly field that's a struct.
This is wrong, string is a reference type and it is always passed as a reference (the reference is a pointer to the actual string contents, this pointer is passed by value, and the pointer is always on the stack when passed to or returned from a method).
However, string is designed to be immutable so you cannot modify it (this is why you feel it is copied, but it is not). Imagine you have 100KB of text in a string: copying it every time you pass it from one method to another would be time consuming. When you run a method like .ToUpper() etc., this is the time a new string is allocated on the heap and its reference is sent to you.
Also, literal strings are declared in the assembly's resources, which are loaded on the heap. A string is not copied and it can never be, it would be the worst design ever.
This statement is correct (with exception of closure), because fields do not constitute as type, class/structure containing them is a type. If experienced developers do not understand true definition of the type then it is certainly wrong place to work !!
Only in case of closure, every captured variable becomes part of a reference stored on heap.
This is the reason, there are local functions, captured value type variables in local functions are not stored on heap.
So I would recommend shorter sentence,
You're completely right, thanks for correcting.
I'm not sure what your point is here. A reference type (i.e. class) can be declared with a value type field inside of it. The lifespan of the memory allocated for this field cannot be shorter than the lifespan of the memory allocated for the containing type, so both have to be placed on the heap.
My point is, you cannot use the term value type for a field. A field is a member of a type. Members belong wherever the containing type exists, that's all (this is a well known phenomenon).
A field is a member of a type but also represents an instance of some type as well. The term "value type field" means a field whose type (not the declaring type) is a value type.
For example, see here, you can get the type of a field by getting the value of the FieldInfo.FieldType property. You can then check if it's a value type by checking the Type.IsValueType property.
Yes, that's what I said. Hence why saying "value types are allocated on the stack" is not correct, even if you exclude closures.
For example, here's an article by Jon Skeet referencing the subject in the second paragraph.
A type is something you can always do typeof(x) on, you can never do type of (field of (class/struct)). Example,
First of all you can never do typeof(A.a) because a is a field of a type, it is not a type! FieldInfo.FieldType is the type of a field, a field is not a type. Again, a value type is a type, on which you can safely do typeof(int), typeof(string). Anything that can sit inside a typeof expression is a type, a field is not a type.
Here, A is a type, and since it is a struct, it will always be on the stack unless captured by a lambda. And whatever may be the type of field a, A will always be on the stack!! A member of a type is not a type!! Field/Method/Property are all members of a type and allocation will never depend on them. If a is a string, it is a reference, but string is a type, a is not a type, and A will still sit on the stack, the contents of the string will be on the heap, a will store the reference, and the entire object will sit on the stack.
You can do A.a.GetType() to get the field type. The field type can be a value type. I'm not talking about the field being a type.
I have a question in relevance to the Stack and Heap. I will greatly appreciate any inputs. Thanks.
Question: why is the new operator not used when initializing predefined class type variables(such a variable of type string) or predefined struct type variables(such as a variable of type int)?
For example:
string name = "Richard";
int number = 36;
The new operator was never used when assigning such values to the variables name and number. But are such values still considered instances of the respective predefined type?
Reason why I am asking this is because if I were to define a custom class type(say custom class type Person) or custom struct type(say custom struct type Dog). And if I were to declare a variable of such custom class type and such custom struct type, the value assigned to such variables would be an instance of their types created via the use of the new operator.
For example:
Person person1 = new Person( );
Dog dog1 = new Dog( );
Thus, is string type value(string literal) Richard an instance of predefined class type String(alias: string) just like how new Person( ) is an instance of user-defined class type Person?
And is int type value(integral literal) 24 an instance of predefined struct type Int32(alias: int) just like how new Dog( ) is an instance of user-defined struct type Dog?
It's funny how these topics are such a focus for interview questions. You need to understand the difference between value and reference types, but beyond that, you will rarely need to know any of this. That's the whole point of managed code. There's usually a lot more that should be prioritized ahead of these topics in an interview.
Thanks for writing this! I've been reading about this topic all day and it has been rather confusing to be honest. I have read many sources that do assert that the main difference is that value types go to the stack, and reference types to the heap.
This may seem like a silly question (I know, there are no silly questions in programming), but if I declare a global variable of type int, will that be stored on the heap?
There are no "global" variables in C#, as everything is part of some class (or struct). But if it's a field in a class then it's most likely going to be on the heap.
Thanks, fixed.