My First Task at Amazon and How I Tried to Change 512 to 512
It has been some time since I joined Amazon, but every now and then I remember my first task, because it always gives me a good chuckle. It also reminds me that sometimes all it takes to fix something is finding that missing semicolon.
By the way, if you want to learn how I got into Amazon, you can check out my previous post here.
Before We Start, Let Me Give You Some Background
There is no doubt that starting a new job is hard, but starting a new job at a huge company after working at startups all your life is even harder.
You have to adapt to the language, the people, the meetings, the culture and the way things are done in general. Add scale on top of that: what you might consider a single task at another job may need to be split into three different tasks at a company like Amazon. So believe me when I say that even creating a pull request feels like a huge commitment in your first weeks.
That is why, whenever I join a new company, I pick as my first task the easiest-to-implement, somewhat impactful thing that also requires me to do a bit of research and talk with other people. This way I find out how the tools work, learn how people work, and still manage to be somewhat useful to the overall goal of the team I joined.
That is, if things were to go as I planned…
Change the Heap Size from 256 to 512
This was the title of the task I picked up. I do not want to get too technical, but for some context: after some load testing, the team noticed that heap memory usage during garbage collection was getting a little too close to our maximum heap limit, so we decided that doubling the maximum heap size would solve the issue.
The task was well documented, with links showing where to make the code change and where to validate it. In the configuration file you could see that the current maximum heap size was set to 256MB.
So all that was left was changing the value from 256 to 512.
But just before I was about to change the value, I talked with a teammate, who suggested that changing it alone might not be enough. He pointed me to documentation saying that the options in our configuration were deprecated and that we should use the new options instead.
Still easy: instead of only changing the value, I changed the option itself along with it.
After getting approval and merging my change, the only thing left to do was to check the dashboards after the load test to validate that my change was working.
The problem was that even after several load tests, the dashboards looked exactly the same as before.
I should mention here that the heap usage dashboards were based on percentages, so there was no way to see the actual heap size in megabytes just by looking at them.
A Quick Roadmap Appears
The change was not working. Now the problem was to find out why.
My initial suspicions were as follows:
1-) Our instances have limited memory, so they cannot allow more than 256MB of heap
Within an hour or two, a quick search showed that this assumption was not true.
2-) The configuration is not actually changing the maximum heap size
To check whether the configuration was actually working, I set the maximum heap size to 1MB to see whether the project would still build and run locally. It did not run, so the option was working correctly.
3-) There is an upper limit on the Maximum Heap Size
Since I had validated that the configuration was working, I thought maybe there was some limit set on the maximum heap size. So I set the value to 2GB and rebuilt and ran the project. From the logs I was able to see the Maximum Heap Size was set to 2GB, so this assumption was wrong as well.
4-) Some dependency is overriding the configuration
Since I had crossed out every other option, this was the only conclusion left. Improbable but not impossible: some dependency we used to deploy our project might be affecting the configuration we were setting. So I did an internal search and asked some questions, but the answers all pointed me to the configuration I was already using in our project. When I concluded that the internal search was not helping, I reached out to my more senior teammates and asked for their help. They put me in contact with another team that had previously had problems with their heap usage, and we set up a meeting.
The meeting with that team did not give me any answers either, since their problems were totally unrelated to ours.
So at this point I began to wonder what I was missing. How could I not complete what was supposed to be a one-line change?
What was I missing?
If you haven't noticed it yet, the answer to what I was missing is already written above:
From the logs I was able to see the Maximum Heap Size
Up until this point I had assumed we had a 256MB heap, but I had never actually confirmed it. The task was presented to me that way: it was literally written in the title that we had a 256MB heap, not to mention that the configuration we had before also said the heap size was 256MB.
But when I checked the logs, I found out the horrible truth.
The heap was already 512MB.
So for the past week I had been trying to change 512MB to 512MB, banging my head against the wall and wondering why the heap size was not increasing.
It turned out the configuration we had was deprecated, and the project had fallen back to using a percentage of the total memory as the maximum heap size, which in our case was 512MB.
After I pointed this out to my teammates, I made the quick code change and merged it. Then I was able to validate the change in the dashboards.
I hope you enjoyed this story and learned something along the way.
I try to write a story about my expat adventures in Barcelona every week. If you are interested, please give me a follow.
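The real lesson here is to check the value the runtime actually ended up with, not the value the configuration claims to set. The post never names the runtime, so the sketch below assumes a JVM, since heap size and garbage collection point that way. On HotSpot JVMs, when no explicit maximum is given, the default maximum heap is a fraction of available memory (controlled by `-XX:MaxRAMPercentage`, 25% by default in recent versions), which matches the kind of fallback described above. The class name `HeapCheck` and the method `maxHeapMegabytes` are hypothetical names for illustration; `Runtime.maxMemory()` and `ManagementFactory.getRuntimeMXBean().getInputArguments()` are standard Java APIs.

```java
// Minimal sketch, assuming the service runs on a JVM (the post does not say so explicitly).
// HeapCheck and maxHeapMegabytes are hypothetical names chosen for illustration.
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapCheck {

    // Runtime.maxMemory() returns the maximum amount of memory the JVM will attempt
    // to use, in bytes, or Long.MAX_VALUE if there is no inherent limit.
    static long maxHeapMegabytes() {
        long maxBytes = Runtime.getRuntime().maxMemory();
        return maxBytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        // The JVM options the process was actually started with; a deprecated or
        // misspelled heap option would show up here even if it had no effect.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);

        // The effective maximum heap, whichever option (or default) produced it.
        System.out.println("Effective max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Logging these two lines at startup would have shown the 512MB on day one. Depending on the garbage collector, `maxMemory()` can report slightly less than the configured maximum, so treat the number as approximate rather than an exact echo of the setting.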