In addition to the out-of-the-box scheduler for Aurora, Heron can also be deployed on a YARN cluster with the YARN scheduler. The YARN scheduler is implemented using the Apache REEF framework.
Key features of the YARN scheduler:
Heterogeneous container allocation: The YARN scheduler requests heterogeneous containers from the YARN ResourceManager (RM). In other words, a topology does not request more resources than it actually needs.
Container reuse: The REEF framework allows the YARN scheduler to retain containers across events like topology restarts.
Topology deployment on a YARN Cluster
Using the YARN scheduler is similar to deploying Heron on other clusters, i.e. it is done using the Heron CLI. This document assumes that the Hadoop YARN client is installed and configured.
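Before submitting a topology, it can help to confirm that the Hadoop and YARN clients are on the PATH and can reach the cluster. The commands below are an illustrative check; any equivalent verification works:
$ hadoop version
$ yarn node -list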
The following steps are executed when a Heron topology is submitted:
- The REEF client copies the Heron Core package and the topology package to the distributed file system.
- It then starts the YARN Application Master (AM) for the topology.
- The AM subsequently invokes the Heron Scheduler in the same process.
- This is followed by container allocation for the topology's master and workers. As a result, N+2 containers are allocated for each topology: one for the AM, one for the Topology Master, and N for the workers.
Configuring the Heron client classpath
For versions up to and including 0.14.2
- The hadoop classpath command provides the list of jars needed to submit a Hadoop job. Copy all of these jars to HERON_INSTALL_DIR/lib/scheduler.
- Do not copy the commons-cli jar if it is older than version 1.3.1.
- Create a jar containing core-site.xml and yarn-site.xml and add it to HERON_INSTALL_DIR/lib/scheduler too.
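The following shell sketch walks through the steps above. It is illustrative only: HERON_INSTALL_DIR and the Hadoop configuration directory (/etc/hadoop/conf here) are assumptions that depend on your installation.
$ HERON_INSTALL_DIR=~/.heron
$ for entry in $(hadoop classpath | tr ':' ' '); do
    for jar in $entry; do                     # expands wildcard entries such as .../lib/*
      [ -f "$jar" ] && cp "$jar" "$HERON_INSTALL_DIR/lib/scheduler/"
    done
  done                                        # skip or upgrade any commons-cli jar older than 1.3.1
$ jar cf "$HERON_INSTALL_DIR/lib/scheduler/hadoop-site.jar" \
    -C /etc/hadoop/conf core-site.xml \
    -C /etc/hadoop/conf yarn-site.xml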
For versions 0.14.3 and later
It is no longer necessary to copy the Hadoop classpath jars to HERON_INSTALL_DIR/lib/scheduler as 0.14.2 required. #1245 added the extra-launch-classpath argument, which makes it easier and more convenient to submit a topology to YARN.
Tips
Regardless of which version of Heron you are using, pay attention to the following requirements when submitting a topology to YARN.
For localfs-state-manager
- The commons-cli jar must be version 1.3.1 or later.
For zookeeper-state-manager
- The commons-cli jar must be version 1.3.1 or later.
- The curator-framework jar must be version 2.10.0 or later.
- The curator-client jar must be version 2.10.0 or later.
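One illustrative way to check which of these jars are present on the scheduler classpath (the ~/.heron install location is an assumption):
$ ls ~/.heron/lib/scheduler | grep -E 'commons-cli|curator-(framework|client)'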
Configure the YARN scheduler
A set of default configuration files is provided with Heron in the conf/yarn directory. The default configuration uses the local state manager, which works only with a single-node local YARN installation. ZooKeeper-based state management is needed for topology deployment on a multi-node YARN cluster.
- Custom Heron Launcher for YARN: YarnLauncher
- Custom Heron Scheduler for YARN: YarnScheduler
- State manager for multi-node deployment: com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager
YarnLauncher also performs the job of the uploader, so NullUploader is used.
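On a multi-node cluster, the ZooKeeper connection string can be set in the state manager configuration under conf/yarn, or overridden when submitting. The sketch below is a hedged example of the latter; the ZooKeeper host is a placeholder, and whether --config-property is used or the conf files are edited directly is a matter of preference:
$ heron submit yarn heron-examples.jar com.twitter.heron.examples.AckingTopology AckingTopology \
    --config-property heron.statemgr.connection.string=zkhost:2181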
Topology management
Topology Submission
Command
For versions up to and including 0.14.2
$ heron submit yarn heron-examples.jar com.twitter.heron.examples.AckingTopology AckingTopology
For versions 0.14.3 and later
$ heron submit yarn heron-examples.jar com.twitter.heron.examples.AckingTopology AckingTopology --extra-launch-classpath <extra-classpath-value>
Tips
- More details on the --extra-launch-classpath argument introduced in 0.14.3: it accepts either a single directory containing all the Hadoop library jars, or multiple directories separated by colons, such as the output of hadoop classpath. The submit operation will fail if any path is invalid or if any file is missing. See the example after this list.
- To submit a topology to a specific YARN queue, set the heron.scheduler.yarn.queue property via --config-property, for instance --config-property heron.scheduler.yarn.queue=test. This configuration can also be found in the conf/yarn/scheduler file; its default value, default, refers to YARN's default queue.
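For example, the two options above can be combined in a single submit command. This is illustrative; passing the output of hadoop classpath is a convenience, and a plain colon-separated list of directories works equally well:
$ heron submit yarn heron-examples.jar com.twitter.heron.examples.AckingTopology AckingTopology \
    --extra-launch-classpath "$(hadoop classpath)" \
    --config-property heron.scheduler.yarn.queue=test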
Sample Output
INFO: Launching topology 'AckingTopology'
...
...
Powered by
___________ ______ ______ _______
/ ______ / / ___/ / ___/ / ____/
/ _____/ / /__ / /__ / /___
/ /\ \ / ___/ / ___/ / ____/
/ / \ \ / /__ / /__ / /
/__/ \__\ /_____/ /_____/ /__/
...
...
com.twitter.heron.scheduler.yarn.ReefClientSideHandlers INFO: Topology AckingTopology is running, jobId AckingTopology.
Verification
Visit the YARN HTTP console or run the command yarn application -list on a YARN client host.
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1466548964728_0004 AckingTopology YARN heron default RUNNING UNDEFINED 0% N/A
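To pick out the topology's application id for later commands (for example, fetching logs), a simple filter such as the following can be used (illustrative):
$ yarn application -list 2>/dev/null | awk '/AckingTopology/ {print $1}'
application_1466548964728_0004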
Topology termination
Command
$ heron kill yarn AckingTopology
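Once the kill command completes, the application should no longer appear in YARN's list of running applications. A quick illustrative check:
$ yarn application -list -appStates RUNNING | grep AckingTopology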
Log File location
Assuming HDFS as the file system, Heron logs and REEF logs can be found in the following locations:
Logs generated when the topology's AM starts:
<LOG_DIR>/userlogs/application_1466548964728_0004/container_1466548964728_0004_01_000001/driver.stderr
The scheduler's logs are created on the first (AM) container:
<NM_LOCAL_DIR>/usercache/heron/appcache/application_1466548964728_0004/container_1466548964728_0004_01_000001/log-files
Logs generated when the TMaster starts in its container:
<LOG_DIR>/userlogs/application_1466548964728_0004/container_1466548964728_0004_01_000002/evaluator.stderr
The TMaster’s logs are created on the second container owned by the topology app:
<NM_LOCAL_DIR>/usercache/heron/appcache/application_1466548964728_0004/container_1466548964728_0004_01_000002/log-files
Worker logs are created on the remaining containers in the YARN NodeManager’s local directory.
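If log aggregation is enabled on the cluster, the same logs can also be retrieved after the application finishes using the YARN CLI, for example:
$ yarn logs -applicationId application_1466548964728_0004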
Work in Progress
- The YARN scheduler will restart any failed worker and TMaster containers. However, AM HA is not supported yet, so an AM failure will result in topology failure. Issue: #949
- TMaster and Scheduler are started in separate containers. Increased network latency can result in warnings or failures. Issue: #951