[.NET Internals 02] Stack and heap – .NET data structures

In the second post of the .NET Internals series, we’re going to investigate how a .NET process’s memory is organized. We’ll see what the stack and the heap are and what kind of data is stored in each of these memory structures.

If you’re here for the first time, I encourage you to first read the .NET Internals series introductory post explaining the basics of memory structure.

Memory division

By default, when a .NET application starts and virtual address space is allocated for the process (as we saw in the previous post), the following data structures, represented as heaps, are created:
  • Code Heap – storing JIT-compiled native code,
  • Small Object Heap (SOH) – storing objects smaller than 85,000 bytes,
  • Large Object Heap (LOH) – storing objects of 85,000 bytes or more*,
  • Process Heap.

*Side note: there’s an exception for arrays of double, which in 32-bit processes are allocated on the LOH long before they reach the size threshold (a double[] is treated as “large” once it has roughly 1,000 elements). This is an optimization for 32-bit code. More details here and here.
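The size threshold can be observed indirectly with GC.GetGeneration, which reports objects on the LOH as generation 2. The sketch below is only an illustration: the class and method names (HeapPlacementDemo, GenerationOfNewArray) are hypothetical, and it assumes the default LOH threshold has not been changed in the runtime configuration.

```csharp
using System;

public static class HeapPlacementDemo
{
    // Allocates a byte array of the given length and returns the GC generation
    // the runtime reports for it right after allocation.
    public static int GenerationOfNewArray(int lengthInBytes)
    {
        var buffer = new byte[lengthInBytes];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        // Well below the threshold: allocated on the SOH, normally reported as generation 0.
        Console.WriteLine(GenerationOfNewArray(1000));

        // Above the threshold: allocated on the LOH, which is reported as generation 2.
        Console.WriteLine(GenerationOfNewArray(100000));
    }
}
```

The small array is normally reported as generation 0, although a garbage collection happening between the allocation and the query could promote it, so only the LOH result is guaranteed.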

Small and large objects are kept on separate heaps for performance reasons, which will be covered in the next posts about garbage collection.

Each object placed on the heap has an address, which acts as a pointer to the place in memory where that object is located.

The heap is, however, not the only data structure used by a .NET process. There’s also a stack, which helps track code execution and stores some special types of data. We’ll now see in detail how the stack and the heap are used and organized.

Stack

Stack Data Structure, source: Wikipedia

The stack is a memory structure organized as a LIFO (last-in-first-out) collection. If you think about it, it’s perfect for storing anything that may soon need to be used (and can easily be popped from the top of the stack). This nature of the stack leads to its two main purposes: keeping track of execution order and storing value-type variables.
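The LIFO discipline itself can be tried out with the Stack<T> collection from System.Collections.Generic. This is only an analogy for how the process stack behaves, not the process stack itself, and the LifoDemo and PushThenPopAll names are made up for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class LifoDemo
{
    // Pushes all items in order, then pops until the stack is empty,
    // returning the items in the order they came off the stack.
    public static int[] PushThenPopAll(params int[] items)
    {
        var stack = new Stack<int>();
        foreach (var item in items)
        {
            stack.Push(item);
        }

        var popped = new List<int>();
        while (stack.Count > 0)
        {
            popped.Add(stack.Pop()); // always removes the most recently pushed item
        }
        return popped.ToArray();
    }

    public static void Main()
    {
        // Items come back in reverse order of insertion.
        Console.WriteLine(string.Join(", ", PushThenPopAll(1, 2, 3))); // 3, 2, 1
    }
}
```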

Keeping track of execution order – call stack

Most of the code we write is wrapped in classes and methods, which can call other methods, which in turn call other methods, and so on. The .NET runtime must constantly keep track of this order of execution. Moreover, it also has to remember the state of the variables and parameters of the calling method while the called method executes (to be able to restore the calling method’s state after the called method returns).

As soon as any method is called, .NET initializes a stack frame (a kind of container) which stores all the data necessary to execute the method: parameters, local variables and the return address, i.e. the place in the code to go back to when the method finishes. Stack frames are created on the stack on top of each other. Each method call has its own stack frame. All that behavior is well illustrated in the figure below.
Call stack frames, source: Wikipedia

The stack used for tracking the execution order is often referred to as the call stack, execution stack or program stack.

Take a look at the following code:

In order to call Method2, the runtime must save a return address, which points to the next line of code to execute after Method2 finishes (line 4 in the example above). This address, together with the parameters and local variables of the calling and called methods, is stored on the call stack, as the diagram below shows.
Call stack for methods 1-3, source: C. Farrell and N. Harrison – Under the Hood of .NET Memory Management

You can also see what happens when Method3 returns: its stack frame is popped from the stack and disappears.
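A chain of three calls like the one in the figure might look as follows. This is a minimal sketch, not the original listing: the method names are taken from the figure, while the parameters, local variables and arithmetic are assumptions made purely for illustration. The comments describe the conceptual picture; the JIT compiler is free to inline such small methods.

```csharp
using System;

public static class CallStackDemo
{
    public static int Method1(int a)
    {
        int local1 = a + 1;            // local1 lives in Method1's stack frame
        int result = Method2(local1);  // a new frame for Method2 is pushed on top
        return result + 1;             // execution resumes here once Method2 returns
    }

    public static int Method2(int b)
    {
        int local2 = b * 2;            // local2 lives in Method2's stack frame
        return Method3(local2);        // a new frame for Method3 is pushed on top
    }

    public static int Method3(int c)
    {
        return c - 3;                  // when this returns, Method3's frame is popped
    }

    public static void Main()
    {
        Console.WriteLine(Method1(4)); // 8
    }
}
```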

Storing value types

The stack is also used to store local variables and parameters of .NET value types, such as bool, decimal and int. (A value-type field that belongs to a class instance is stored together with that instance on the heap.) The full list of .NET value types can be found here.

Value types are types whose variables hold the data directly, rather than a reference to it. What’s also interesting to know is that all value-typed local variables of a method allocated on the stack are cleaned up after the method finishes executing. This happens because the method’s stack frame becomes unreachable: the stack keeps a pointer to the beginning of the topmost stack frame (the current stack frame pointer), which is simply moved to the second frame from the top as soon as the method’s execution is finished (the data is still physically there, but it’s not reachable by default .NET mechanisms).

Heap

The heap is similar to the stack, but while the stack can be imagined as a series of boxes stacked on top of each other, where we always push or pop the top one, the heap contains boxes that can be accessed at any time. The boxes in the heap are placed in different locations, not necessarily on top of each other. To access one of the boxes we don’t need to pop the top ones first.

Storing reference types

All types that are not value types are referred to as reference types: string, object, arrays, and all classes, interfaces and delegates. Instances of reference types are allocated on the managed heap (SOH or LOH, depending on their size). However, even though the object itself lives on the heap, a reference to it (its address on the heap) held in a local variable is stored on the stack.

Consider the following code:

The figure below shows what the stack and the heap would look like here in terms of allocated data:
Stack and Heap with reference variable, source: C. Farrell and N. Harrison – Under the Hood of .NET Memory Management
The “OBJREF” stored on the stack is actually a pointer (reference) to the MyClass object on the heap.

NOTE: The MyClass myObj declaration alone doesn’t allocate any space on the heap for the myObj variable. It only creates the “OBJREF” variable on the stack, initialized with a null value. Only when the new operator is used does the actual memory allocation on the heap take place, and the reference’s value is set.
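The two steps from the note can be sketched like this. MyClass and myObj are the names used in the figure; the Value field and the IsAllocated helper are hypothetical additions for the example. C# does not allow reading an unassigned local, so the sketch assigns null explicitly to make the “reference without an object” state visible.

```csharp
using System;

public class MyClass
{
    public int Value; // hypothetical field, just so the object holds some data
}

public static class ReferenceDemo
{
    // True when the reference points to an object on the heap.
    public static bool IsAllocated(MyClass candidate)
    {
        return candidate != null;
    }

    public static void Main()
    {
        MyClass myObj = null;                  // only a reference slot exists; no object on the heap yet
        Console.WriteLine(IsAllocated(myObj)); // False

        myObj = new MyClass();                 // the object is allocated on the heap, its address is stored in myObj
        Console.WriteLine(IsAllocated(myObj)); // True
    }
}
```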

Value types vs reference types (stack vs heap)

The crucial difference between value and reference types is that when a value type variable is passed to another method as a parameter, or just assigned to another value type variable, its value is copied into the new variable. That’s why, when a value type is passed to a method that modifies the parameter, the original variable’s value is unchanged after the method returns (it was a copy that was used inside the called method). The figure below shows this behavior.
Value types copying, source: link
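The copying behavior can be checked with a few lines of code. This is a minimal sketch; the ValueCopyDemo, AddTen and CallAddTen names are invented for the example.

```csharp
using System;

public static class ValueCopyDemo
{
    // Receives a copy of the caller's int; the change stays local to this method.
    public static void AddTen(int number)
    {
        number += 10;
    }

    // Passes the value to AddTen and returns what the caller still holds afterwards.
    public static int CallAddTen(int original)
    {
        AddTen(original);
        return original;
    }

    public static void Main()
    {
        int x = 5;
        int y = x;   // y receives its own copy of the value
        y = 100;     // changing y does not touch x

        Console.WriteLine(x);             // 5
        Console.WriteLine(y);             // 100
        Console.WriteLine(CallAddTen(x)); // 5
    }
}
```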

It’s different in the case of reference types. When a reference type variable is assigned to another reference type variable (or passed as a method parameter), it’s not the object that is copied, but only the reference to it (the address in memory where the data is actually stored on the heap). As a result, the new variable (or method parameter) still points to the same place in memory, so modifying the object through that parameter inside the called method is also visible outside of the method. This case is illustrated below.
References passing, source: link
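The same experiment with a reference type gives a different result. In this sketch the Counter class and the Increment and Replace methods are hypothetical. Replace is included to show the limit of the sharing: assigning a new object to the parameter only rebinds the method’s own copy of the reference.

```csharp
using System;

public class Counter
{
    public int Count;
}

public static class ReferencePassingDemo
{
    // Modifies the object the reference points to; the caller sees the change.
    public static void Increment(Counter counter)
    {
        counter.Count++;
    }

    // Rebinds only the local copy of the reference; the caller's object is untouched.
    public static void Replace(Counter counter)
    {
        counter = new Counter();
        counter.Count = 99;
    }

    public static void Main()
    {
        var first = new Counter();
        var second = first;             // copies the reference, not the object

        Increment(second);
        Console.WriteLine(first.Count); // 1

        Replace(first);
        Console.WriteLine(first.Count); // 1
    }
}
```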
Of course, storing some data on the stack and some on the heap (moreover: different types of heaps) has its precise purpose, which we’re going to explore in the coming posts 🙂

Summary

In today’s post we went through the stack and the heap, two data structures used for storing value and reference types and for tracking the execution order of a .NET application. In the next posts in the series we will see what boxing and unboxing are and how they can affect an application’s performance. We will also examine the garbage collection process in detail.

Stay tuned!
