Garbage Collection Internals (Part 2 – C++)

In the previous post we saw that the C programming language does not have a built-in garbage collector, but what about C++? Yes and no.

C++ does NOT have a built-in garbage collector at the language level, but over time it has added “smart” pointers, which deallocate memory when they go out of scope. This at least stems the memory leaks found in a typical C++ program, provided the developer actually uses these smart pointers.

How smart are these pointers?
/*
 * File:      SmartPointer.cpp
 * Project:   CPP
 * Author:    Sanjay Vyas
 * 
 * Description:
 * 
 * Revision History:
 * 2020-Jun-20	[SV]: Created
 */

#include <iostream>
#include <memory>   // unique_ptr lives here
using namespace std;

// Resource we will allocate on heap
class AllocatedResource
{
private:
    int data;

public:
    // Constructor
    AllocatedResource(int x)
    {
        this->data = x;
        cout << "Allocated " << x << "\n";
    }

    // Destructor
    ~AllocatedResource()
    {
        cout << "Deallocated " << this->data << "\n";
    }

    void print()
    {
        cout << "Resource: " << this->data << "\n";
    }
};

int main()
{
    // Named smart pointer: it stays alive until main ends
    unique_ptr<AllocatedResource> smartPointer
        {new AllocatedResource(5)};
    return 0;
}

And here is the output
Allocated 5
Deallocated 5

As we can see, unique_ptr automatically deallocates the allocation when it goes out of “scope”, in this case when main ends. This frees us from keeping track of the allocation and manually calling delete when it’s no longer needed.

unique_ptr is itself an object and keeps track of the allocation it owns. So what happens if we don’t deallocate and simply reallocate a unique_ptr?
/*
 * File:      AutoPtr.cpp
 * Project:   CPP
 * Author:    Sanjay Vyas
 * 
 * Description:
 *      GC is not at language level in C++
 *      We can use unique_ptr to automate it
 *      (auto_ptr was deprecated in C++11 and removed in C++17)
 * 
 * Revision History:
 * 2020-Jun-20	[SV]: Created
 */
#include <iostream>
#include <memory>

using namespace std;

// Resource we will allocate on heap
class AllocatedResource
{
private:
    int data;

public:
    // Constructor
    AllocatedResource(int x)
    {
        this->data = x;
        cout << "Allocated " << x << "\n";
    }

    // Destructor
    ~AllocatedResource()
    {
        cout << "Deallocated " << this->data << "\n";
    }

    void print()
    {
        cout << "Resource: " << this->data << "\n";
    }
};

int main()
{
    // Raw pointer (C style)
    AllocatedResource *rawPointer;
    cout << "Raw pointer allocation\n";
    for (int i = 0; i < 5; i++)
    {
        // This will allocate and fire constructor
        rawPointer = new AllocatedResource(i);
        // WARNING! Memory leak occurring on rawPointer:
        // the previous address is overwritten and never deleted
    }
    cout << "Raw pointer causes memory leak\n\n";

    // Unique pointer (C++ style)
    unique_ptr<AllocatedResource> smartPointer{};
    cout << "Automatic pointer allocation\n";
    for (int i = 0; i < 5; i++)
    {
        // This will allocate and fire constructor
        // When assigned again
        //  It will fire destructor on previous allocation
        //  and fire constructor on new allocation
        smartPointer = unique_ptr<AllocatedResource>
            { new AllocatedResource(i + 100) };
    }
}

And here is the output
Raw pointer allocation
Allocated 0
Allocated 1
Allocated 2
Allocated 3
Allocated 4
Raw pointer causes memory leak

Automatic pointer allocation
Allocated 100
Allocated 101
Deallocated 100
Allocated 102
Deallocated 101
Allocated 103
Deallocated 102
Allocated 104
Deallocated 103
Deallocated 104

As we can see, the C-style “raw” pointer allocates memory, but if we don’t call delete on it the memory is never deallocated. It becomes “garbage” which is not “garbage collected”.

However, unique_ptr is a class which automatically deallocates when it goes out of scope (basically, it does a delete when its destructor is called) or when we assign a new allocation to it. This brings some semblance of sanity to deallocating unused memory in C++. However, let me repeat: there is NO built-in garbage collector in C++. Instead, we have to use unique_ptr or shared_ptr and make sure we reduce memory leaks in our code.

If unique_ptr is allocated again, it automatically deallocates the previous allocation

Neither C nor C++ has a built-in garbage collector, but at least C++ provides standard library classes like unique_ptr and shared_ptr which automatically manage the allocation. Internally, shared_ptr uses reference counting to keep track of how many pointers are pointing to a given allocation. However, this is not language-level reference counting of the kind found in Swift or Python.

Garbage Collection Internals Series

  1. Part 1 – C language
  2. Part 2 – C++ Language
  3. Part 3 – Java

Process Map

I am in love with Process Maps. What are they?

Whenever a program is loaded into memory and executed as a process, the OS doesn’t simply dump the executable into memory; rather, it arranges and structures various sections into segments: code, data, heap, stack, etc. In the case of dynamic languages there may not be a binary executable to begin with, so their process maps are populated gradually. C/C++ have relatively simple process maps, whereas others can be quite complex; .NET, for example, has 9 distinct types of heap alone.

To understand how something works, it’s best to look at how it works internally. Visualise opening up the engine of a car to understand what the gears do, what happens when we press the accelerator, etc. Similarly, to understand programming, visualising the process map instead of just reading the syntax will help us understand it better.

Compilation Process and Process Map of C/C++

Here is a video which explains what happens when we compile a program, say in C/C++, and execute it. It is interesting to visualise the process maps of Java and C#, and even more interesting are those of JavaScript and Python, as they are dynamic languages.

Fundamentals

Maybe it is the way we learnt programming, by patterns: “Do this and that will happen” or “this is how it is done”. Maybe we don’t stop to think about how things work at a more fundamental level.

Here is a piece of code which works in some languages (C, C++, JavaScript) but won’t work in others (Java, C#).

C/C++ allows a statement like 1, 2, 3;
/*
   Works in C/C++
*/
int main()
{
    1, 2, 3;  // Why does this work? What does it mean?
}

Java and C# don't allow 1, 2, 3
/*
   Java (and C#) don't allow
*/
class MyClass {
    static void main(String[] args) {
        1, 2, 3; // Gives an error.. Why?
    }
}

Why do you think this is? Is there a fundamental reason these languages differ? Why did Java disallow this? If you know why, write it in the comments below.