Garbage Collection Internals (Part 1 – C)

Why do programs produce garbage? 🤔

Well, we first have to understand memory internals of programming languages to understand what is dynamic memory and what constitutes “garbage”. So let’s take a look at this Digital ConceptVisual video which explains Process Map, Segments, Global Data area, Stack and Heap, the last one being the focus of our attention.

Compilation Process of C/C++ and Process Map

As we can see from the above video, global variables (if any in the language) are stored in DATA/BSS segment, local variables are in the STACK and any data created at run time are stored in HEAP.

Languages like C/C++ were mostly Global and Local variables and only when the developer specifically called malloc/calloc or new, it allocated memory in the HEAP.

One great thing about HEAP allocation is that it’s allocated at runtime and can be “released” by the developer. If the developer stops using the allocated memory and does not release it, well, that’s when it’s called “Garbage” memory.

Different languages have different strategy for releasing heap allocated memory once it’s no longer needed

TechnologyGrabage Collection
CYou clean up your own mess
C++You clean up your own mess
JavaMark and Sweep Garbage Collector (GC)
SwiftAutomatic Reference Counting (ARC)
COMReference Counting (RC)
C#Mark and Sweep Garbage Collector (GC)
PythonAutomatic Reference Counting (ARC) + Mark and Sweep Garbage Collector (GC)
JavaScriptEarlier was ARC but now most JS Engine implement Mark and Sweep (GC)
Language/Technology and Garbage Collection

So there are 4 Garbage collection strategies

  1. No automatic Garbage Collection – C/C++
  2. Manual Reference Counting (RC) – COM
  3. Automatic ReferenceCounting (ARC) – Swift
  4. Mark and Sweep (Compact or Copy) – Java, .NET, JavaScript
1. No Automatic Garbage Collection - C/C++
/*
 * File:      NoGC.c
 * Project:   CPP
 * Author:    Sanjay Vyas
 * 
 * Description:
 *      C program to demonstarte No GC
 * 
 * Revision History:
 * 2020-Jun-20	[SV]: Created
 */

#include <stdio.h>  // printf/scanf
#include <stdlib.h> // malloc/qsort

// Maximum random number
#define MAX (100)

// Helper function to print array
void print(int *array, int size)
{
    int offset;

    // Print comma (upto 2nd last)
    for (offset = 0;
         offset < size-1;
         offset++)
    {
        printf("%d, ", array[offset]);
    }

    // Print last element
    printf("%d\n", array[size-1]);
}

// Compare function required for qsort
// Return
//      0 if both are equal
//     <0 if lhs is < rhs
//     >0 if lhs is > rhs
int intcmp(const void *lhs, const void *rhs) 
{
    // We are given 2 elements as void *
    // We need to typecast them back to int
    return *(int *)lhs - *(int *)rhs;
}

int main()
{
    int *pointer = NULL;    // Defensive programming 🙂
    int size;
    int offset;

    while (printf("Enter size (0 to quit): "),
           scanf("%d", &size),
           size)
    {
        // Allocate heap memory
        pointer = (int *)malloc(size * sizeof(int));

        // Randomize it
        for (offset = 0;
             offset < size;
             offset++)
        {
            pointer[offset] = rand()%MAX;
        }

        // Now print the array
        print(pointer, size);

        // Let's sort it using built in function
        qsort(pointer, size, sizeof(int), intcmp);

        // And print it again
        print(pointer, size); 

        // WARNING! Memory is LEAKING here
        // We did not free(pointer)
        // The loop will use same pointer
        // to malloc again, losing the previous address
    }
}
What happens we do a malloc or new
This is what happens when we do a malloc
And what is a memory leak?
Losing address without doing free is MEMORY LEAK

What we see in the above image is a result of not deallocating memory (free or delete) and losing the address. Over a period of time, the application will fill up its heap area and would no longer be able to allocate more memory as previously unused allocations are occupying the heap area. These unused allocations are called GARBAGE.

Remember to free your allocations
/*
 * File:      FreeMem.c
 * Project:   CPP
 * Author:    Sanjay Vyas
 * 
 * Description:
 *      C program to demonstarte No GC
 * 
 * Revision History:
 * 2020-Jun-20	[SV]: Created
 */

#include <stdio.h>  // printf/scanf
#include <stdlib.h> // malloc/qsort

// Maximum random number
#define MAX (100)

// Helper function to print array
void print(int *array, int size)
{
    int offset;

    // Print comma (upto 2nd last)
    for (offset = 0;
         offset < size - 1;
         offset++)
    {
        printf("%d, ", array[offset]);
    }

    // Print last element
    printf("%d\n", array[size - 1]);
}

// Compare function required for qsort
// Return
//      0 if both are equal
//     <0 if lhs is < rhs
//     >0 if lhs is > rhs
int intcmp(const void *lhs, const void *rhs)
{
    // We are given 2 elements as void *
    // We need to typecast them back to int
    return *(int *)lhs - *(int *)rhs;
}

int main()
{
    int *pointer = NULL; // Defensive programming 🙂
    int size;
    int offset;

    while (printf("Enter size (0 to quit): "),
           scanf("%d", &size),
           size)
    {
        // Allocate heap memory
        pointer = (int *)malloc(size * sizeof(int));

        // Randomize it
        for (offset = 0;
             offset < size;
             offset++)
        {
            pointer[offset] = rand() % MAX;
        }

        // Now print the array
        print(pointer, size);

        // Let's sort it using built in function
        qsort(pointer, size, sizeof(int), intcmp);

        // And print it again
        print(pointer, size);

        // Remember to deallocate memory
        free(pointer);
    }
}

In C programming there is no built-in GC (Garbage Collection) and as developers we must keep track of the memory we are allocating and runtime and deallocate it when not required. This is tough given that the codebase may be huge and memory addresses may be passed around, losing track of who allocated it and who will deallocate it. Add to this, many developers do not have visualisation of what is happening in memory.

No wonder, languages which came after C/C++ implemented some from of Garbage Collection (Java, .NET, Python, Swift)

Garbage Collection Internals Series

  1. Part 1 – C language
  2. Part 2 – C++ Language
  3. Part 3 – Java

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s