We recently migrated to AWS EMR 4.1.0, which ships with Spark 1.5.0, and ran into quite a few problems while deploying our Spark Streaming application:
1. ClassCastException
org.apache.spark.deploy.SparkHadoopUtil cannot be cast to org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
Related upstream fixes:
https://github.com/apache/spark/pull/9174
https://github.com/apache/spark/pull/8911
Workaround: make sure SPARK_YARN_MODE is set in the environment before invoking spark-submit:
export SPARK_YARN_MODE=true
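A minimal launch-script sketch showing the export in context; the application jar and main class names are placeholders, not from the original setup:

# SPARK_YARN_MODE must be set in the same shell that runs spark-submit.
export SPARK_YARN_MODE=true
spark-submit \
  --master yarn-cluster \
  --class com.example.StreamingApp \
  my-streaming-app.jar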
2. Guava NoSuchMethodError
java.lang.NoSuchMethodError: com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque;
https://community.cloudera.com/t ... hodError/td-p/21159
- Add a bootstrap action that copies guava-16.0.1.jar to each data node
- Put guava-16.0.1.jar in the first position on both --driver-class-path and spark.executor.extraClassPath (see the sketch after this list)
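A sketch of both steps; the S3 bucket, local jar path, and application names are placeholders:

# Bootstrap action script: runs on every node while the cluster starts.
aws s3 cp s3://my-bucket/jars/guava-16.0.1.jar /home/hadoop/guava-16.0.1.jar

# Submit with Guava 16 ahead of the Guava bundled with Hadoop. If either
# classpath has additional entries, guava-16.0.1.jar must remain first.
spark-submit \
  --master yarn-cluster \
  --driver-class-path /home/hadoop/guava-16.0.1.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/guava-16.0.1.jar \
  --class com.example.StreamingApp \
  my-streaming-app.jar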
3. Spark Not Using All Available Cores
With the default resource calculator, YARN schedules containers by memory alone; it doesn't track how many cores each container actually uses, so the UI simply displays 1 vcore used per executor.
https://forums.aws.amazon.com/th ... dID=218950&tstart=0
Set yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in capacity-scheduler.xml
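On EMR 4.x this can be applied at cluster creation with the capacity-scheduler configuration classification, which writes the property into capacity-scheduler.xml. A minimal sketch; the instance type and count are placeholders:

aws emr create-cluster \
  --release-label emr-4.1.0 \
  --applications Name=Spark \
  --use-default-roles \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --configurations '[{
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  }]'

For a cluster that is already running, the property can instead be edited in /etc/hadoop/conf/capacity-scheduler.xml on the master node, followed by a ResourceManager restart.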