Running JVM Applications on Kubernetes: Beyond java -jar

Discover some important tips about running JVM applications in containerized environments orchestrated by Kubernetes

Thiago Mendes
CodeX

--

Container Tamagotchi running JVM

First off, I apologize for that lovely artwork above :D

For those of you who might be younger and not familiar, taking care of a Tamagotchi used to be quite the task. If you didn’t give it the attention it needed, it would end up perishing.

Running a JVM in a container is akin to having a modern Tamagotchi. You need to take care of it for it to function properly, feeding it with the necessary resources like CPU and memory so it can perform its tasks as best as possible.

Target Audience

Are you running JVM applications in containerized environments on Kubernetes and feel like you could be optimizing performance and cost better?

Are you considering migrating your JVM applications to this type of environment but have doubts about how to do it efficiently?

Do you struggle with latency and throughput issues in your JVM applications running on Kubernetes?

Or do you simply want to learn more about optimizing JVM applications in Kubernetes environments?

Then, this post is for you!

Before we begin, it’s important to clarify that in some very specific scenarios, not all the tips provided below may make sense. However, I believe that for the majority of use cases where we have services running within a JVM in a containerized environment orchestrated with Kubernetes, the following information can be quite useful.

For a better understanding of the tips we’ll be presenting next, it’s important to align some basic concepts:

  • Java (Programming Language) VS Java Platform/JVM: Do not confuse the Java programming language with the Java platform. Nowadays, the JVM is capable of running and supporting various programming languages beyond Java, such as Kotlin, Scala, Groovy, Clojure, among others. Throughout this post, we will mainly discuss the JVM in relation to the execution environment (runtime), rather than the programming languages themselves.
Java (Programming Language) VS Java Platform.

With that in mind, let’s move on to the tips.

1) Ergonomics: The Hero and the Villain

Today, when you run your JVM application simply by executing the java -jar command, there’s a feature of the JVM called Ergonomics that tries to find an appropriate configuration based on the environment where the JVM is being executed.

At first, you might think: Well, isn’t that good? And the answer is: it depends. On one hand, this feature can help you build and run your application faster without worrying about fine-tuning, but when it comes to large-scale environments, the results may not be optimal.

Some of the automatic configuration adjustments made by Ergonomics in the JVM, which can directly impact performance and resource consumption, relate to the choice of Garbage Collector and the size of the Heap. Let’s talk a bit more about them.
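If you are curious which values Ergonomics has chosen on a given machine or container, a quick way to list them (on JDK 10 and newer, where each flag reports its origin) is:

# List every JVM flag whose value was set by Ergonomics
java -XX:+PrintFlagsFinal -version | grep ergonomic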

Choice of Garbage Collector

The choice of GC is based on two conditions: the amount of memory and CPUs available to the JVM.

The rule works as follows:

  • Until Java 8: If the number of CPUs is equal to or greater than 2 and the amount of memory is greater than 1792MB, the chosen GC will be the ParallelGC. If either of these two conditions is below the mentioned values, the chosen GC will be the SerialGC.
  • From Java 9 onwards: The conditions are practically the same, but in scenarios where the number of CPUs is equal to or greater than 2 and the amount of memory is greater than 1792MB, instead of the ParallelGC, we will have the G1GC as the chosen GC.
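To see how many CPUs and how much memory the JVM actually detects inside the container, recent JDKs (JDK 11+ on Linux) can print their operating system metrics; the exact output varies by version:

# Prints the container limits the JVM has detected (CPU count, memory limit, etc.)
java -XshowSettings:system -version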

GC selection process in JVM running inside containers.

For the more curious ones, here’s a snippet of the GC selection implementation through Ergonomics in JDK 11.

jdk11u/gcConfig.cpp at master · openjdk/jdk11u · GitHub

jdk11u/os.cpp at master · openjdk/jdk11u · GitHub

Heap Size

With a few exceptions, in the vast majority of cases, when we don’t specify the desired heap size using the -Xmx parameter or the -XX:MaxRAMPercentage flag, Ergonomics ends up configuring the maximum heap value as ¼ of the available memory. For example, if your container has a memory limit set to 2GB, Ergonomics will configure the maximum heap size as 512MB.

Exceptions mentioned:

  • If the container has up to 256MB of available memory, the maximum heap value will be 50%.
  • If the container has between 256MB and 512MB of available memory, the maximum heap value will be approximately 127MB.

But what are the impacts of all this?

Regarding the automatic choice of GC, we need to keep in mind that the SerialGC will not perform well in high-concurrency server-side environments due to the long pause times it generates.

But then the question may arise: in a scenario where my container has a limit of 1000m and consequently the JVM has only 1 CPU available, what would be the advantage of using a GC that utilizes multi-threading resources?

To answer this, I won’t go into specific details, but to summarize: many people confuse 1000m (1000 millicores) with 1 physical CPU. In reality, millicores represent a slice of CPU time, something that can be distributed across all available CPUs of the node.

For a better understanding of millicores in Kubernetes, please visit the page: Resource Management for Pods and Containers | Kubernetes

The confusion arises because the JVM rounds the quota up: it interprets 1000m as 1 available CPU, 1001m as 2 CPUs, 2001m as 3, and so on.

Knowing this, it’s possible for the JVM to leverage multi-threading GCs even when running in containers limited to 1000m of CPU. To achieve this, we can force a number of available CPUs for the JVM using the -XX:ActiveProcessorCount flag, passing values greater than 1.
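As a minimal sketch, assuming your application jar is meuapp.jar, forcing the JVM to behave as if 2 CPUs were available would look like this:

# Tell the JVM to size its GC and thread pools as if 2 CPUs were available,
# even if the container is limited to 1000m
java -XX:ActiveProcessorCount=2 -jar meuapp.jar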

It’s important to keep in mind that even though we can use GCs other than the SerialGC with only 1000m, depending on the use case this may still result in poor performance because of the low CPU availability: the application may be throttled whenever it briefly exceeds its available quota (CFS quota).

For a better understanding of CPU quotas in Kubernetes, please visit the following page: Control CPU Management Policies on the Node | Kubernetes

To find out which GC implementation your JVM is using, access the container using kubectl exec and execute the following command:

java -XX:+PrintFlagsFinal -version | grep "Use.*GC "

You will get an output similar to the example below, indicating which implementation is being used through a boolean value:
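For illustration, here is roughly what the relevant line looks like on a JVM where Ergonomics selected the SerialGC (the exact columns vary by JDK build):

     bool UseSerialGC      = true      {product} {ergonomic}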

In the example output above, you can see that the implementation in use is the SerialGC and that it was configured through Ergonomics.

Regarding the heap size: by using only ¼ of the available memory, we may waste resources, since the container is built specifically to run your application and there shouldn’t be other parallel processes consuming the remaining memory. However, it’s always important to keep in mind that the JVM is not composed solely of the heap; it also includes other components that we call non-heap. In addition to the heap and non-heap, there are also some processes related to the ‘operating system’ that consume memory, albeit in smaller quantities. We’ll discuss this further in the next tip.

I placed ‘operating system’ in quotes because when referring to containers, it’s not actually an operating system but rather an emulated environment that utilizes the host system through isolation resources. However, I believe the analogy is valid for understanding.

JVM memory consumption in containers.

To find out the maximum heap size configured for your JVM, access the container using kubectl exec and execute the following command:

java -XX:+PrintFlagsFinal -version | grep " MaxHeapSize"

You will get an output similar to the example below:
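For illustration, in a container with a 2GB memory limit the line looks roughly like this (536870912 bytes = 512MB; columns vary by JDK build):

   size_t MaxHeapSize      = 536870912     {product} {ergonomic}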

In the example output above, you can see the MaxHeapSize represented in bytes, and that it was configured through Ergonomics.

Summary of the tip:

  • Avoid using SerialGC in high-concurrency server-side environments by ensuring that the JVM is not limited to only 1 available CPU. There are several ways to achieve this, such as adjusting the container’s CPU limit to above 1001m or using the -XX:ActiveProcessorCount flag with values greater than 1.
  • Be aware that if your container has less than 1792MB of available memory and you don’t force a specific GC version, Ergonomics will also select SerialGC.
  • You can also specify the desired GC implementation through JVM arguments. As a recommendation, use ParallelGC for heaps up to 4GB and G1 for heaps above 4GB. Although there are more GC implementations available, this should cover most use cases. Here are some examples of arguments to force the use of a specific GC:
# Serial GC
java -XX:+UseSerialGC -jar meuapp.jar
# Parallel GC
java -XX:+UseParallelGC -jar meuapp.jar
# G1 GC
java -XX:+UseG1GC -jar meuapp.jar
# Shenandoah GC
java -XX:+UseShenandoahGC -jar meuapp.jar
# ZGC
java -XX:+UseZGC -jar meuapp.jar
  • For better performance and to avoid application throttling, avoid containers with less than 2000m CPU limits. In many situations, it’s more beneficial to have a single JVM in a container with a 2000m CPU limit than two separate JVMs in two containers with 1000m CPU limits. Sometimes, less is more — think about it.
  • To avoid wasting resources, properly configure the JVM heap size using the -Xmx parameter or the -XX:MaxRAMPercentage flag. We’ll talk more about appropriate heap sizes in the next tip.
# Example
# Using MaxRAMPercentage
java -XX:MaxRAMPercentage=50.0 -jar meuapp.jar
# Using Xmx
java -Xmx1g -jar meuapp.jar

2) Proper Memory Sizing: JVM’s Life Beyond the Heap

Reading the first tip, you might wonder: Why not configure the heap size to 100% of the available memory in the container for resource optimization?

I’ll save the answer to that question for the end of this tip. Before that, I’d like to briefly explain the memory areas that compose the JVM.

As hinted in tip 1, apart from the Heap, the JVM’s memory areas include the non-heap, also known as Native Memory or Off-heap memory. Within the non-heap, we have several important components, including Metaspace, Code Cache, Stack Memory, GC Data, among others.

I won’t delve into each of these components as it’s not the aim of this post. My intention is simply to clarify that besides the heap, there are other areas of the JVM that also consume memory from the host, in this case, the container where the application is running.
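If you want to see how much memory these non-heap areas actually use, the JVM’s Native Memory Tracking feature can break it down. A minimal sketch (it adds a small overhead, so use it for investigation rather than permanently):

# Start the JVM with Native Memory Tracking enabled
java -XX:NativeMemoryTracking=summary -jar meuapp.jar

# From another shell in the same container, ask the JVM for a breakdown
# (replace <pid> with the Java process id, e.g. obtained via jps)
jcmd <pid> VM.native_memory summary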

Now you might be thinking: Well, can’t I just divide the available memory in the container between heap and non-heap?

The answer is: no. We need to consider that the base image used by this container, where the JVM is executing, also has its emulated operating system, which consumes a certain amount of memory to remain active.

So, now, answering the initial question: Why not configure the heap size to 100% of the available memory in the container?

If we set the heap size to 100% of the available memory, our container will be killed by OOM (Out Of Memory), because we need to consider that in addition to the heap, the non-heap and the operating system also use the container’s memory.

Error due to heap with 100% consumption of container memory.

But then, a new question may arise: How should this memory division between heap, non-heap, and the operating system be?

I’ve seen literature suggesting that 75% of available memory is a safe value for the heap. In practice, however, I’ve run into issues with values above 60%. My own experience is that allocating 50% to the heap is a safe value, even if it seems overly conservative at first. That doesn’t rule out using values above 50%, but such a choice should be validated, for example through load testing.
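A simple way to run that validation: start with 50%, then watch the pod’s real memory usage while the load test runs (the pod name below is a placeholder, and kubectl top requires the Metrics Server to be installed):

# Configure the heap as 50% of the container's memory limit
java -XX:MaxRAMPercentage=50.0 -jar meuapp.jar

# While the load test runs, watch the actual memory usage of the pod
kubectl top pod meu-pod --containers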

Heap size with 75% inside the container.

Another important point when it comes to heap size is to understand that very small heaps can cause excessive work for the garbage collector, which can result in higher CPU usage and compromise application performance. On the other hand, very large heaps can significantly impact application startup time and lead to long garbage collection times.

Summary of the tip:

  • To optimize memory usage in your containers, configure the heap size of your JVM between 50% and 75%, reserving the remaining value for non-heap and the operating system.
  • Use load testing to validate that the configured heap size is suitable for your application, checking for occurrences of OOM (Out Of Memory) during test execution. You can use monitoring tools to track JVM memory consumption and also monitor pod metrics through Kubernetes Metrics Server using the Kubernetes API.
  • Avoid very small heaps to prevent compromising application performance.
  • Avoid very large heaps to prevent impacting application startup time and avoid generating long garbage collection times.

3) Xms Equals Xmx: Just Tell Me How Much You Need

Before we dive into this tip, let’s have a basic summary of Xms and Xmx.
These are two parameters used to inform the JVM about the minimum (Xms) and maximum (Xmx) heap sizes.

It works like this: Xms represents the initial memory allocation for the heap by the JVM, and Xmx represents the maximum amount that can be allocated. The JVM initially allocates the value defined in Xms, and during program execution, as needed, this value may increase up to the value defined in Xmx. In cases where the JVM attempts to allocate values beyond Xmx, we encounter OutOfMemoryError occurrences.

Before container usage became as widespread as it is today, applications running on JVMs often shared the same server. In such scenarios, it was common to set a smaller value for the Xms parameter and a larger value for Xmx, aiming for better resource utilization and sharing among concurrently executing processes. Thus, the JVM would only allocate memory as needed, returning it to the host when no longer in use.

When dealing with containers, the scenario changes. In most cases, there are no other relevant parallel processes running within the same container. Therefore, there’s no need to dynamically allocate and return memory to the host during program execution. To avoid the JVM having to deal with memory allocation tasks, we can configure the value of Xms to be equal to the value of Xmx.

Summary of the tip:

  • To avoid the JVM having to deal with memory allocation and deallocation tasks, use the value of Xms equal to the value of Xmx. To achieve this, use the parameters -Xms and -Xmx in the JVM configuration.
# Example
java -Xms2g -Xmx2g -jar meuapp.jar

4) CPU Overbooking in JVM Containers: A Risky Practice, but Acceptable in Some Cases

Before we proceed with this tip, it’s important to explain two concepts of Kubernetes: “Pod and Container Resource Management” and “Quality of Service (QoS)”. These concepts will help better understand the topic to be addressed next.

Pod and Container Resource Management

At this point, I’d like to discuss the concept of requests and limits for CPU and memory. Basically, requests represent the minimum amount of resources a container needs to run, while limits represent the maximum amount of resources a container can consume within the cluster.

Requests and limits for CPU and memory can be configured in the YAML manifest used to create the Pod in Kubernetes, under the “resources” section.

A simple example would be:

apiVersion: v1
kind: Pod
metadata:
  name: exemplo
spec:
  containers:
    - name: nome-container
      image: nome-imagem:latest
      resources:
        requests:
          memory: "1Gi"
          cpu: "1"
        limits:
          memory: "2Gi"
          cpu: "2"

At this point, I would like to ask a question related to the topic addressed in tip 1, where we discussed Ergonomics and the default configurations it sets for the JVM according to the available resources on the host. The question is as follows:

If we run a JVM inside a container, without any customization, letting Ergonomics define the settings, with the container’s requests and limits equal to those mentioned in the YAML example above, what would be the GC settings and maximum heap size adopted by the JVM? Would it consider the values of request or limit defined for the container?

To answer this question, let’s run a container with a JVM version 17 and with the configurations from the YAML above. Then, we will access the container via kubectl exec, check which GC was chosen, and what the maximum heap size is.

If the JVM respects the request for 1GB of memory and 1 CPU, we will have the GC chosen as SerialGC and the maximum heap size of 256MB (¼ of 1GB).

However, if the JVM respects the limit of 2GB of memory and 2 CPUs, we will have the GC chosen as G1GC and the maximum heap size of 512MB (¼ of 2GB).

Below are the pod and container configurations after execution:
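An abbreviated, illustrative excerpt of what kubectl describe reports for the pod defined above (only the resources section is shown; the rest is omitted):

$ kubectl describe pod exemplo
...
    Limits:
      cpu:     2
      memory:  2Gi
    Requests:
      cpu:     1
      memory:  1Gi
...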

The result of the JVM configurations executed by Ergonomics is as follows:
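Illustratively, using the same commands from tip 1 inside the container, the relevant lines look roughly like this (exact values and columns vary by JDK build):

$ java -XX:+PrintFlagsFinal -version | grep "Use.*GC "
     bool UseG1GC          = true          {product} {ergonomic}

$ java -XX:+PrintFlagsFinal -version | grep " MaxHeapSize"
   size_t MaxHeapSize      = 536870912     {product} {ergonomic}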

It was noticeable that the JVM considered the value of the container’s limit. Therefore, the chosen GC was G1GC, and the maximum heap size was 512MB, corresponding to ¼ of 2GB.

Now let’s take a break from the topic of resource management and talk about the second concept…

QoS (Quality of Service)

The QoS (Quality of Service) in Kubernetes is a way to classify pods into three categories: Guaranteed, Burstable, and BestEffort, based on the resources they request and consume. This classification is used by Kubernetes to determine the priority of each pod relative to other pods in case of resource scarcity.

The rules for classifying a pod into each QoS category are defined by the following conditions:

  • Guaranteed: At this level, for both CPU and memory, the values of request and limit need to be specified and equal. Example:
resources:
  requests:
    memory: "3Gi"
    cpu: "2"
  limits:
    memory: "3Gi"
    cpu: "2"
  • Burstable: This level is for pods that do not meet the rules to be classified as Guaranteed. At least one container in the pod needs to have a request or limit for memory or CPU. Example:
resources:
  requests:
    memory: "3Gi"
# Another example of Burstable:
resources:
  requests:
    memory: "3Gi"
    cpu: "1"
  limits:
    memory: "3Gi"
    cpu: "2"
# In this example, even though the memory request and limit are equal,
# the CPU request is less than the limit,
# which classifies the pod as Burstable.
  • BestEffort: A pod is classified as BestEffort when none of the containers inside the pod have request or limit configurations for CPU or memory. If at least one container has some request or limit configuration, the classification changes to Burstable.

But after all, what is the importance of QoS and its level categories?

When a node in the cluster is overloaded or running out of resources, Kubernetes (more precisely, the kubelet on that node) can evict pods based on their QoS priority, which is defined as follows:

BestEffort: These pods have the lowest priority and are the first to be evicted. Since they have no defined resource requests or limits, they are the easiest to remove without impacting the cluster.

Burstable: These pods have medium priority and are evicted after BestEffort pods, since they have at least minimal resource request configurations.

Guaranteed: These pods have the highest priority and are the last to be evicted, since they have request and limit values set equal for CPU and memory, ensuring they will have the necessary resources to run properly.

Cool, now that we’ve discussed the above two concepts, “Pods and Containers Resource Management” and “QoS (Quality of Service),” you might be wondering if configuring all pods as Guaranteed would be the best option, as this level offers greater assurance and stability for pods, right?

So, configuring all pods as Guaranteed, always keeping request and limit equal for CPU and memory, can make your environment very costly since the nodes would accommodate fewer pods, requiring more nodes in your cluster. Additionally, this configuration can lead to underutilization, leaving idle resources that could be shared with other pods, resulting in inefficient resource utilization.

On the other hand, configuring all pods as Guaranteed ensures they will always have the necessary resources to run without failures and delays, guaranteeing application stability. Moreover, this configuration can be a good option for critical applications that require high availability and performance, as it ensures maximum priority for pods.

Now, before we move on to the tip itself, I would like to discuss just one more additional concept so that you really understand what I am suggesting.

Let’s talk about the concept of Overbooking in Kubernetes clusters.

In general, overbooking is a technique that allows allocating more resources than are available, expecting that not all allocated resources will be used simultaneously. For example, in an airline, overbooking is used to sell more seats than the total number available on a flight, assuming that not all passengers will show up for the flight.

In the context of Kubernetes, overbooking can be applied to CPU and memory resources allocated for containers in a pod. This means it is possible to allocate more resources than the total available in the cluster, based on requests and limits defined for the containers.

As an example, imagine you have a node with 4GB of memory and 4 CPUs. On this node, you have two pods, each with one container with the following resource configurations:

resources:
  requests:
    memory: "1Gi"
    cpu: "1"
  limits:
    memory: "1Gi"
    cpu: "3"

Note that the CPU limit is set to 3. The point is that, considering the two pods running, each with one container with the configurations above, if we multiply the number of pods by the CPU limit configured, we will have a result of 6 CPUs, which exceeds the maximum number of CPUs available on the node.

When working with overbooking in Kubernetes, it’s important to keep in mind that CPU is considered a “compressible” resource, while memory is not. Kubernetes guarantees that your containers receive the requested amount of CPU; if a container tries to use more than its limit, Kubernetes throttles it, capping its CPU usage. This may degrade the application’s performance, but the container will not be terminated or evicted.

Unlike CPU, memory is not a compressible resource. This means that if the node runs out of available memory, Kubernetes will need to make decisions on which containers to terminate to free up memory space.

Considering that JVM applications in most use cases demand more memory than CPU (except in situations involving intensive data processing or complex calculations), here’s a tip to reduce the cost of your environment, especially development environments.

To ensure efficient use of the cluster, for memory, use request values equal to the limits, but for CPUs, overbook and use request values lower than the limits, keeping the pods at the Burstable level. This will allow the CPU resource to be shared among the pods, resulting in greater cluster efficiency.
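A sketch of what that could look like for a single container (the values here are only illustrative):

resources:
  requests:
    memory: "2Gi"   # memory: request equal to the limit (memory is not compressible)
    cpu: "500m"     # CPU: request below the limit (overbooked, shared with other pods)
  limits:
    memory: "2Gi"
    cpu: "2"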

If by chance all pods need to use CPU limits simultaneously (which is unlikely), use excessive CPU consumption as a trigger for node autoscaling to balance the environment’s load. In this scenario, the only negative impact would be temporary container throttling until the nodes are scaled.

This strategy may seem controversial at first, but it can dramatically optimize the cost of your environment, as the price of CPU is proportionally higher compared to memory.

Summary of the tip

  • To reduce costs in your Kubernetes environment, especially in testing environments, consider configuring pods running JVM applications at the Burstable level, with memory containing equal requests and limits. However, for CPU, overbook by setting requests lower than the limits.

5) JVM & HPA: If you have to use the standard, prioritize CPU over memory

Before we proceed with this tip, I’d like to briefly talk about the Horizontal Pod Autoscaler (HPA).

The HPA is a Kubernetes feature that helps automatically adjust the number of replicas of your application based on CPU or memory usage of the pods. It increases or decreases the number of replicas to meet a specific demand. If there’s a higher need for processing power or memory, the HPA increases the number of replicas. When the demand decreases, the HPA reduces the number of replicas. This way, the HPA maintains service availability even during peak demand.

When dealing with JVM applications, it’s common for them to consume more memory than CPU. Therefore, it might be tempting to configure the HPA to scale based on memory. However, it’s important to remember that the JVM experiences fluctuations in memory consumption due to garbage collection and memory allocation processes. This can make using memory as a trigger for the HPA unstable, potentially disrupting the horizontal scaling process of your application.

Knowing this, and given that by default Kubernetes only allows scaling on CPU and memory metrics, CPU ends up being our only remaining option for configuring HPA triggers.

It’s important to remember that Kubernetes allows the use of other custom metrics for HPA, but in this post, we’ll focus on the default metrics.

But don’t think we’re advocating for using CPU as the HPA trigger just because it’s the only available option. In fact, there’s an interesting strategy for horizontally scaling JVM applications on Kubernetes that involves using CPU as the metric. Let’s delve into that.

The idea is basically this: In high-demand scenarios, the JVM starts to allocate memory intensely, leading to increased work for the application’s Garbage Collector (GC), including full garbage collection cycles (Full GC), which are CPU-intensive processes. This is where things start to make sense for the HPA’s role. In this context, the HPA can be used to horizontally scale the application based on the CPU metric, as it can be an indicator of the application’s demand.

Here’s an example of how this HPA configuration would look like:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nome-do-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nome-do-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

In this example, the HPA is configured to monitor the CPU utilization of the deployment and automatically adjust the number of pod replicas to maintain CPU utilization at 50%. The minimum number of replicas is 1, and the maximum is 5, allowing the application to horizontally scale according to demand.

The HPA periodically monitors the CPU utilization of the deployment, and if the utilization remains below the threshold defined for a certain period, the HPA will gradually begin to reduce the number of replicas until it reaches the defined minimum value. By default, Kubernetes waits for 5 minutes before starting the scale-down process.
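That waiting period is the scale-down stabilization window, and the autoscaling/v2 API lets you tune it if needed; a sketch (300 seconds is the default):

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # default; raise it to scale down more conservatively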

Summary of the tip

  • To improve the performance of your Java Virtual Machine (JVM) applications in high-demand situations, use CPU metrics as the default configuration for the Horizontal Pod Autoscaler (HPA) in Kubernetes. This is because intense memory allocation by the JVM during traffic spikes can significantly increase the workload of the garbage collector, leading to full garbage collection cycles (Full GC) and excessive CPU usage.

Conclusion

I apologize for the length of this post; my intention was to provide comprehensive and detailed content. I acquired this knowledge through extensive research and project experience throughout my career, and I hope the tips shared here have been helpful to you.

I’d like to remind you that this is just the beginning, and I intend to share more tips like these in a possible part 2.

If you have any criticism, suggestions, or simply liked the content, please leave a comment so I can know.

--
