Execute shell script as one of the steps on EMR AWS

amazon-web-services hadoop

We are planning to migrate our Hadoop infrastructure from our data center to AWS EMR. Some of the stages in our ETL process depend on one another; for example, the flow looks like this:

  1. Map Reduce job will generate data
  2. Shell script will move the data generated in step 1 to the output location
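
As a rough sketch of step 2, assuming the job writes its output to HDFS and leaves the usual _SUCCESS marker there, the script might look something like this. Both paths and the DRY_RUN switch are hypothetical, chosen only for illustration:

```shell
#!/usr/bin/env bash
# Sketch of the "move the data" script from step 2.
# SRC_DIR and DEST_DIR are hypothetical paths; replace them with your own.
SRC_DIR="/user/hadoop/job-output"
DEST_DIR="/data/final"

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

move_output() {
  # Assumption: the MapReduce job writes a _SUCCESS marker when it finishes.
  run hadoop fs -test -e "$SRC_DIR/_SUCCESS" || return 1
  run hadoop fs -mkdir -p "$DEST_DIR"
  # The glob is quoted so that hadoop, not the local shell, expands it.
  run hadoop fs -mv "$SRC_DIR/part-*" "$DEST_DIR/"
}

# Dry run: only prints the commands. Drop DRY_RUN=1 to actually move the data.
DRY_RUN=1 move_output
```

The dry run makes it possible to check the generated hadoop commands on a machine without a cluster before wiring the script into the ETL flow.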

In EMR we found step types for Custom JAR, Pig, and Hive, but no option to execute a shell script. The options we see for working around this are:

  • Rewrite the shell script logic as a Java program and add it as a custom JAR step.
  • Use a bootstrap action. But since the script must run after step 1 completes, I am not sure this would be useful.

Rather than reinvent the wheel, we would prefer an option available directly from EMR or AWS that fulfils this requirement, since that would reduce our effort.

Best Answer

Please refer to the documentation: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html

A script stored in S3 can be run as a CUSTOM_JAR step through script-runner.jar:

aws emr create-cluster --name "Test cluster" \
  --release-label <release-label> \
  --applications Name=Hive Name=Pig \
  --use-default-roles \
  --ec2-attributes KeyName=myKey \
  --instance-type m3.xlarge --instance-count 3 \
  --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://mybucket/script-path/my_script.sh"]

Replace <release-label> with the EMR release you use, and region in the JAR path with the region your cluster runs in.
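
To make the script run only after the MapReduce job, the two can be submitted as consecutive steps, since steps run in submission order when the cluster uses the default non-concurrent step execution. The following is a minimal sketch that only builds and prints an `aws emr add-steps` command for an existing cluster. The region, the MapReduce JAR path, the step names, and the cluster ID are placeholders I have assumed, not values from the question:

```shell
#!/usr/bin/env bash
# Sketch: two ordered steps, MapReduce job first, shell script second.
# All values below are hypothetical; replace them with your own.
REGION="us-east-1"
MR_JAR="s3://mybucket/jars/my-mapreduce-job.jar"
SCRIPT="s3://mybucket/script-path/my_script.sh"

# Step 1: the MapReduce job. CANCEL_AND_WAIT cancels the pending steps
# if this one fails, so the script never runs on missing data.
mr_step() {
  printf 'Type=CUSTOM_JAR,Name=GenerateData,ActionOnFailure=CANCEL_AND_WAIT,Jar=%s' "$MR_JAR"
}

# Step 2: the shell script, executed through script-runner.jar.
script_step() {
  printf 'Type=CUSTOM_JAR,Name=MoveData,ActionOnFailure=CONTINUE,Jar=s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[%s]' "$REGION" "$SCRIPT"
}

# Print the command instead of running it; remove "echo" to submit the steps.
echo aws emr add-steps --cluster-id "${CLUSTER_ID:-j-XXXXXXXXXXXXX}" \
  --steps "$(mr_step)" "$(script_step)"
```

The same two step definitions can also be passed to `--steps` of `aws emr create-cluster` if the cluster is created per run.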