Garbage Collection Internals (Part 3 – Java)

In Part 1 and Par 2 we saw that there is no Garbage Collection at language level in C and C++, even though we can use “smart” pointers in C++ to carry out deallocation when it goes out of scope.

Languages which came after C/C++ understood the problems of memory leaks and developers putting substantial efforts in managing memory that they implemented Garbage Collection at the platform/VM level.

Java equivalent of earlier C++ program
/*
 * File:      GCJava.java
 * Project:   CPP
 * Author:    Sanjay Vyas
 * 
 * Description:
 * 
 * Revision History:
 * 2020-Jun-21	[SV]: Created
 */

class AllocatedResource {
    private int data;

    public AllocatedResource(int x) {
        this.data = x;
    }

    @Override
    protected void finalize() throws Throwable {
        System.out.println("Deallocating " + this.data);
    }
}

public class GCJava {
    public static void main(String[] args) {
        AllocatedResource resource = new AllocatedResource(5);
        System.out.println("End of program");
    }
}
And here is the output
End of program

That’s it!!
If we were expecting the finalize() method to blurt out something, well, it didn’t even get called. Not even when the program came to an end. In fact, there is no gaurantee that finalize will ever get called. Morover, starting from Java version 9, it has been deprecated.

Let’s make our code put some pressure on GC and create thousands of objects and see what happens

/*
 * File:      GCJava.java
 * Project:   CPP
 * Author:    Sanjay Vyas
 * 
 * Description:
 * 
 * Revision History:
 * 2020-Jun-21	[SV]: Created
 */

class AllocatedResource {
    private int data;

    public AllocatedResource(int x) {
        this.data = x;
    }

    @Override
    protected void finalize() throws Throwable {
        System.out.println("Deallocating " + this.data);
    }
}

public class GCJava {
    public static void main(String[] args) {
        
        // Put pressure on GC
        for (int i = 0; i < 1000000; i++) {
            AllocatedResource resource = new AllocatedResource(i);
        }
        System.out.println("End of program");
    }
}
And what do we get?
:
Deallocating 822025
Deallocating 822024
Deallocating 822023
Deallocating 822022
Deallocating 822021
Deallocating 822020
Deallocating 822019
Deallocating 651277
Deallocating 822013
Deallocating 822012
Deallocating 822011
Deallocating 822010
:
End of Program

Finally! The Garbage Collector ran… but only when it started to run out of heap because we allocated 1000000 objects.

So, don’t use finalize() because there is no guarantee that it will be called and also its deprecated now.

So how does Java manage Garbage Collection?

Java has multiple heaps – Eden, Survivors, Tenured, Permanent Generation

Java, unlike C/C++, has multiple heaps like Eden, Survivors, Tenured and Permanent generation. It’s stack is also slightly different with operand stack part for computation (not shown in the Concept Visual, maybe topic of another blogpost).

Java Garbage collector works across 4 heaps (Eden, Survivor 1&2 and Tenured) to keep track of “unreachable” objects which it marks-sweeps-moves across these segments.

HeapPurpose
EdenNew object allocations are stored here
Survivor 1After GC, objects from Eden are moved here
Survivor 2After GC, objects from Survivor 1 are moved here
TenuredLong surviving objects are moved here
PermanentNot really object heap but keeps class artefacts

So what, happens when GC runs?
Let’s say we release two rerefences a and d by setting them to null. As a result, objects containing “5” and “8” and now “unrechable”, i.e. no one is pointing to them. So what happens? GC will come running and remove them? NO! From the first example in this post we know that GC doesn’t run as soon as a single object is orphaned. In fact, the GC wont run unless it starts to run low on heap memory. So the object will keep lying around in the heap.

However, let’s assume that the GC runs, then what happens? Well, GC will mark those objects which are still “reachable”, in this case “6” and “7”, move them to Survivor generation (S0) and then clear the Eden heap.

Mark-Sweep and Move From Eden to Survivor 0 heap

As a result, Eden now becomes empty and surviving objects are “compacted” into Survivor 0. Why not let object remain is just one heap and remove those which are unreachable? Well, if they remain where they are, the free space between them will be fragmented and if there is a larger object is to be allocated, we may not have enough “contiguous” free space in the heap. That’s the reason existing objects are “moved” to a different heap and “compacted”.

So what does the Process Map looks like now?

Post GC, object’s are moved and compacted in Survivor 0

Efficient, isn’t it? Periodically, Garbage Collector removes unreachable objects and moves the remaining in a compacted manner in Survivor generation. So why 2 Survivor generations? Survivor 0 and Survivor 1?

Well, if memory gets fragmented in Survivor 0, then is moves and compacts to Survivor 1 and vice versa. This way new (and may shortlived) objects can be allocated in Eden space while long surviving objects osciallte between S0 and S1. Huh? They will oscillate forever? Hehe.. no. If an object survives a threashold, it’s finally promoted to Tenured generation, where it will no longer be moved around (old people, you see :-)).

Here is what it might look in Tenured.

Tenured Generation

An object reaches “Tenured” generation if it survives N GC passes threshold. Interesting, isn’t it?
We have compared non-GC languages like C and library based GC like C++ Smart pointer. Java was one of the first commercial language to bring in Garbage Collection (the other was Python) and .NET followed with its own Garbage Collection system. In fact, it has 8 heaps (compared to 4 heaps of Java) and 9th has been just added to .NET 5. So, another post?

Garbage Collection Internals Series

  1. Part 1 – C language
  2. Part 2 – C++ Language
  3. Part 3 – Java

Switch is EVIL! (Part 3a)

Why 3a? Why not Part 4? Well, a lot of people have messaged me that I am writing code in JavaScript, which due to its dynamic nature can do such things conveniently, but how about static languages like C++, Java or C#? So this post is basically same as Part 3 but using static languages.

So let's get rolling

switch-case in any language leads to problems, especially modification of existing code, regression testing, not to mention that the code starts to bloat up when more cases are added

Let's look at the beast
class Evil {
public String changeCase(String caseType, Stringwords) {
// We support lower and camel cases
// Sanitize the user input
caseType = caseType
.trim()
.toLowerCase();
// Accumulate output in StringBuilder
StringBuilder result = new StringBuilder();
if (caseType.equals("lower")) {
// All words are converted to lower
// And then concatenated
for (String word : words) {
result.append(
word
.trim()
.toLowerCase()
);
}
} else if (caseType.equals("camel")) {
// First word is all lower
result.append(
words[0]
.trim()
.toLowerCase()
);
// Subsequent words are
// first -> upper
// rest -> lower
for (int offset = 1;
offset < words.length;
offset++) {
// First letter is upper
result.append(
words[offset]
.substring(0, 1)
.toUpperCase()
);
// Remaining letters are lower
result.append(
words[offset]
.substring(1)
.toLowerCase()
);
}
}
return result.toString();
}
public static void main(String[] args) {
Evil obj = new Evil();
String output = obj.changeCase("lower", "This", "Word");
System.out.println(output);
}
}
view raw SwitchCase.java hosted with ❤ by GitHub

Implementing function maps in Java and C# are nearly as easy as JavaScript. The simplest way is to use delegates in C#. Other ways include Action<>, Func<> and Predicate<>. Java too has Function<>, Consumer<> and Predicate<>. Everything else failing, we can still fall back to Interfaces (or abstract classes in languages like C++).

We will use the simplest of ways here.. interface
/**
* interface specifies just one method
* which converts the case and returns back
*
* Various converters will implement this
*. lower, camel, proper, snake etc
*/
interface CaseConverter {
String convert(Stringwords);
}

Now let’s refactor our application. We will cleave out code from individual if-else blocks and make them separate interface implementations. These implementation can be written in separate files, independent of other converters and make the unit testing much simpler

lower CaseConverter
/**
* LowerConverter implements CaseConverter
* and converts all words to lower case
*/
class LowerConverter implements CaseConverter {
@Override
public String convert(Stringwords) {
// Accumulate result in StringBuilder
StringBuilder result = new StringBuilder();
// All words are converter to lower
// And then concatenated
for (String word : words) {
result.append(
word
.trim()
.toLowerCase());
}
return result.toString();
}
}

Now we are focussing on ONE case at a time, which leads to better unit testing and no modification is required in a HUGE switch-case of if-else ladder

camel CaseConverter
/**
* CamelConverter makes
* first word all lower
* subsequent words firt letter upper, rest lower
* "TWO", "WORDS" -> "twoWords"
*/
class CamelConverter implements CaseConverter {
@Override
public String convert(Stringwords) {
// Accumulate result in StringBuilder
StringBuilder result = new StringBuilder();
// First word is all lower
result.append(
words[0]
.trim()
.toLowerCase()
);
// Subsequent words are
// first -> upper
// rest -> lower
for (int offset = 1;
offset < words.length;
offset++) {
// First letter is upper
result.append(
words[offset]
.substring(0, 1)
.toUpperCase()
);
// Remaining letters are lower
result.append(
words[offset]
.substring(1)
.toLowerCase());
}
return result.toString();
}
}

We have separated out each case (if block) into separate implementations, so nows it’s time to factor our main CaseConverter code. This is the crux of the whole exercise –
Replace switch-case/if-else ladder with a MAP

Convert EVIL into Angel
/**
* Our EVIL code has turned into an angel
* We now have a small and maintanable code
*/
class Angel {
// Get rid of old switch-case/if-else ladder
// Replace it with a sweet map
HashMap<String, CaseConverter> converters =
(HashMap<String, CaseConverter>)
Map.of(
"lower", new LowerConverter(),
"camel", new CamelConverter()
);
public String changeCase(String caseType, Stringwords) {
// We support lower and camel cases
// Sanitize the user input
caseType = caseType
.trim()
.toLowerCase();
// Lookup which converter the user wants
CaseConverter converter = converters.get(caseType);
if (converter!=null)
return converter.convert(words);
else
return null;
}
public static void main(String[] args) {
Angel obj = new Angel();
String output = obj.changeCase("camel", "This", "Word");
System.out.println(output);
}
}
view raw CaseAngel.java hosted with ❤ by GitHub

So there.. we have eliminated switch-case/if-else ladder from a static language like Java. We can do similar things in C#, C++ and others too.
I have not focussed on optimisation as the idea was to keep the code simple to understand the conversion process, but we can use Function<> or even SAM (Single Abstract Method) implementation, instead of creating ENTIRE class to implement individual converters.
You feedback, correction and improvement suggestions are welcome

Switch is EVIL Series

  1. Switch is EVIL! (Part 1) – Convert it to arrays
  2. Switch is EVIL! (Part 2) – Convert it to maps
  3. Switch is EVIL! (Part 3) – Create function maps
  4. Switch is EVIL! (Part 3a) – Implementing in Java
  5. Switch is EVIL! (Part 3b) – Implementing in C#

Fundamentals

Maybe it is the way we learnt programming by patterns – “Do this and that will happen” or “this is how it is done”. Maybe we don’t stop to think how things work at more fundamental level.

Here is a piece of code which works in some languages (C, C++, JavaScript) but won’t work in others (Java, C#)

C/C++ allows a statement like 1, 2, 3;
/*
   Works in C/C++
*/
int main()
{
    1, 2, 3;  // Why does this work? What does it mean?
}

Java and C# don't allow 1, 2, 3
/*
   Java (and C#) don't allow
*/
class MyClass {
    static void main(String[] args) {
        1, 2, 3; // Gives an error.. Why?
}

Why do you think this is? Is there a fundamental reason these languages differ? Why did Java disallow this? If you know why, write it in comments below