We recently migrated to AWS EMR 4.1.0, which ships with Spark 1.5.0, and ran into quite a few problems while deploying our Spark Streaming application:
1. ClassCastException
org.apache.spark.deploy.SparkHadoopUtil cannot be cast to org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
Related upstream fixes:
https://github.com/apache/spark/pull/9174
https://github.com/apache/spark/pull/8911
Workaround: make sure SPARK_YARN_MODE is set in the environment before invoking spark-submit:
export SPARK_YARN_MODE=true
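A minimal launch-script sketch showing the export in context; the application jar and main class names are placeholders, not from the original setup:

# SPARK_YARN_MODE must be set in the same shell that runs spark-submit.
export SPARK_YARN_MODE=true
spark-submit \
  --master yarn-cluster \
  --class com.example.StreamingApp \
  my-streaming-app.jar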
2. Guava NoSuchMethodError
java.lang.NoSuchMethodError: com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque;
https://community.cloudera.com/t ... hodError/td-p/21159
- Add a bootstrap action that copies guava-16.0.1.jar to each data node
- Put guava-16.0.1.jar in the first position on both --driver-class-path and spark.executor.extraClassPath (see the sketch after this list)
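A sketch of both steps; the S3 bucket, local jar path, and application names are placeholders:

# Bootstrap action script: runs on every node while the cluster starts.
aws s3 cp s3://my-bucket/jars/guava-16.0.1.jar /home/hadoop/guava-16.0.1.jar

# Submit with Guava 16 ahead of the Guava bundled with Hadoop. If either
# classpath has additional entries, guava-16.0.1.jar must remain first.
spark-submit \
  --master yarn-cluster \
  --driver-class-path /home/hadoop/guava-16.0.1.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/guava-16.0.1.jar \
  --class com.example.StreamingApp \
  my-streaming-app.jar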
3. Spark Not Using All Available Cores
With the default resource calculator, YARN schedules containers by memory alone; it doesn't track how many cores each container actually uses, so the UI simply displays 1 vcore used per executor.
https://forums.aws.amazon.com/th ... dID=218950&tstart=0
Set yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in capacity-scheduler.xml
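On EMR 4.x this can be applied at cluster creation with the capacity-scheduler configuration classification, which writes the property into capacity-scheduler.xml. A minimal sketch; the instance type and count are placeholders:

aws emr create-cluster \
  --release-label emr-4.1.0 \
  --applications Name=Spark \
  --use-default-roles \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --configurations '[{
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  }]'

For a cluster that is already running, the property can instead be edited in /etc/hadoop/conf/capacity-scheduler.xml on the master node, followed by a ResourceManager restart.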