Lambda Function to Resize EBS Volumes of EMR Nodes
I have to start by saying that you should not use EMR as a persistent Hadoop cluster. The power of EMR lies in its elasticity: you should launch an EMR cluster, process the data, write the results to S3 buckets, and terminate the cluster. However, we see a lot of AWS customers using EMR as a persistent cluster, so I was not surprised when a customer told me that they needed to resize the EBS volumes automatically on new core nodes of their EMR cluster. The core nodes were configured with 200 GB disks, but now they wanted 400 GB disks. It’s not possible to change the instance type or the EBS volume configuration of core nodes on a running cluster, so a custom solution was needed. I explained to the customer how to do it with some sample Python code, but in the end they decided not to use this method (thank God).
I wanted to see if it could be done anyway. So, for fun and out of curiosity, I wrote a Lambda function in Java. It should be scheduled to run every 5 or 10 minutes. On every run, it checks whether there’s an ongoing resizing operation. If that resizing is done, it connects to the node and runs the “growpart” and “xfs_growfs” commands to grow the partition and the filesystem. If the resizing is still in progress, it simply waits for the next run. If there’s no resizing operation in progress, it checks all volumes of a specific cluster and starts a resizing operation on a volume that is smaller than a specific size.
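The per-run decision described above can be sketched without any AWS calls. This is only an illustration of the control flow, not the actual helper classes of the function: the class name ResizeDecision, its methods, the device /dev/xvdb, partition number 2 and mount point /mnt are all assumptions made for the example, and the real values depend on how the node's disks are laid out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and device paths are assumptions, not the post's real helpers.
final class ResizeDecision {

    /** What a single scheduled run decides to do. */
    enum Action { WAIT, GROW_FILESYSTEM, START_RESIZE, NOTHING }

    /**
     * pendingVolume: volume id recorded by an earlier run, or null if none.
     * pendingDone:   whether the modification of that volume has finished.
     * sizesGb:       volume id -> current size in GB for the cluster's volumes.
     * targetGb:      volumes smaller than this should be resized.
     */
    static Action decide(String pendingVolume, boolean pendingDone,
                         Map<String, Integer> sizesGb, int targetGb) {
        if (pendingVolume != null) {
            // A resize was started earlier: finish it, or wait for the next run.
            return pendingDone ? Action.GROW_FILESYSTEM : Action.WAIT;
        }
        // No resize in progress: look for one undersized volume to start on.
        return firstUndersized(sizesGb, targetGb) != null
                ? Action.START_RESIZE : Action.NOTHING;
    }

    /** Returns the first volume id smaller than targetGb, or null if none. */
    static String firstUndersized(Map<String, Integer> sizesGb, int targetGb) {
        for (Map.Entry<String, Integer> e : sizesGb.entrySet()) {
            if (e.getValue() < targetGb) {
                return e.getKey();
            }
        }
        return null;
    }

    /** Shell commands to grow a partition and the XFS filesystem mounted on it. */
    static List<String> growCommands(String device, int partition, String mountPoint) {
        return Arrays.asList(
                "sudo growpart " + device + " " + partition,
                "sudo xfs_growfs " + mountPoint);
    }

    public static void main(String[] args) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("vol-a", 400);
        sizes.put("vol-b", 200);
        System.out.println(decide(null, false, sizes, 400));
        System.out.println(firstUndersized(sizes, 400));
        System.out.println(growCommands("/dev/xvdb", 2, "/mnt"));
    }
}
```

Handling only one volume per run keeps each invocation short, which suits a function that is triggered every few minutes anyway.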
Here’s the main class that will be used by the Lambda function:
package com.gokhanatil.volumeresizer;

import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Map;

public class Resizer implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> input, Context context) {

        String result = "{'result': 'success'}";

        // Is there a resizing operation recorded by a previous run?
        Item volumeInfo = MyDynamoDB.getVolumeInfo();

        if (volumeInfo != null) {
            String targetVolume = volumeInfo.getString("volid");
            String targetInstance = volumeInfo.getString("pip");

            if (VolumeChecker.isResized(targetVolume)) {
                // Resizing is done: grow the partition and filesystem on the node.
                MySSH.runShellCommands(targetInstance);
                MyDynamoDB.deleteVolumeInfo();
                result = "{'result': 'resized " + targetVolume + "'}";
            } else {
                // Resizing is still in progress: wait for the next run.
                result = "{'result': 'waiting for " + targetVolume + "'}";
            }
        } else {
            // No resizing in progress: check the cluster's volumes.
            VolumeChecker.checkVolumes();
        }

        return result;
    }
}