[입 개발] Priority between --properties-file and command-line parameters in spark-submit

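Somehow I suddenly got curious about how priority works in spark-submit between --properties-file (essentially a spark-defaults.conf) and options passed directly as parameters. Common sense says a value passed as a parameter should win over one in spark-defaults.conf. To give the conclusion up front, that is exactly right. (How could it be anything else!!! *smack smack smack*)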

Still, we're engineers, so I thought it was worth pinning down and took a quick look at the source.
The logic lives in "core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala". The main flow is below. We'll only check this very briefly, and the method names already give it away: parse and mergeDefaultSparkProperties. All we care about is priority, so we only need to check whether mergeDefaultSparkProperties overwrites the values that parse collected.

  parse(args.asJava)
  // Populate `sparkProperties` map from properties file
  mergeDefaultSparkProperties()
  // Remove keys that don't start with "spark." from `sparkProperties`.
  ignoreNonSparkProperties()
  // Use `sparkProperties` map along with env vars to fill in any missing parameters
  loadEnvironmentArguments()
  useRest = sparkProperties.getOrElse("spark.master.rest.enabled", "false").toBoolean
  validateArguments()

Let's look at parse. There's nothing special here. findCliOption checks whether an argument is one of the known options in opts, and handle is what actually sets the value.

  protected final void parse(List<String> args) {
    Pattern eqSeparatedOpt = Pattern.compile("(--[^=]+)=(.+)");

    int idx = 0;
    for (idx = 0; idx < args.size(); idx++) {
      ......
    }
    ......
  }

  ......
      val properties = Utils.getPropertiesFromFile(filename)
      properties.foreach { case (k, v) =>
        defaultProperties(k) = v
      }
      // Property files may contain sensitive information, so redact before printing
      if (verbose) {
        Utils.redact(properties).foreach { case (k, v) =>
          logInfo(s"Adding default property: $k=$v")
        }
      }
    }
    defaultProperties
  }
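To make the parse step more concrete, here is a small, hypothetical Java model of what the parse() snippet does. It splits "--opt=value" with the same regular expression, falls back to reading the next argument for the "--opt value" form, and routes --conf pairs into a separate property map. It is a sketch under stated assumptions, not Spark's code. The class name CliParseSketch and the tiny KNOWN_OPTS table are made up for illustration, and the real parser handles many more options and switches.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical model of parse(): NOT Spark's code.
// KNOWN_OPTS is a made-up stand-in for Spark's real option table.
final class CliParseSketch {
    private static final Pattern EQ_SEPARATED_OPT = Pattern.compile("(--[^=]+)=(.+)");
    private static final Set<String> KNOWN_OPTS =
        Set.of("--executor-memory", "--properties-file", "--conf");

    /**
     * Collects dedicated options (e.g. --executor-memory) into the returned map and
     * --conf key=value pairs into confOut. Stops at the first non-option argument.
     */
    static Map<String, String> parse(List<String> args, Map<String, String> confOut) {
        Map<String, String> opts = new LinkedHashMap<>();
        for (int idx = 0; idx < args.size(); idx++) {
            String arg = args.get(idx);
            String value = null;

            // "--opt=value" form: the regex splits on the first '='
            Matcher m = EQ_SEPARATED_OPT.matcher(arg);
            if (m.matches()) {
                arg = m.group(1);
                value = m.group(2);
            }

            // Rough analogue of findCliOption(): is this a known option?
            if (!KNOWN_OPTS.contains(arg)) {
                if (arg.startsWith("--")) {
                    throw new IllegalArgumentException("Unrecognized option: " + arg);
                }
                break; // first non-option argument (e.g. the application jar)
            }

            // "--opt value" form: the value is the next argument
            if (value == null) {
                if (idx == args.size() - 1) {
                    throw new IllegalArgumentException("Missing argument for option '" + arg + "'.");
                }
                value = args.get(++idx);
            }

            // Rough analogue of handle(): --conf goes to the property map, the rest to opts
            if (arg.equals("--conf")) {
                String[] kv = value.split("=", 2);
                if (kv.length != 2) {
                    throw new IllegalArgumentException("Expected key=value after --conf: " + value);
                }
                confOut.put(kv[0], kv[1]);
            } else {
                opts.put(arg, value);
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        Map<String, String> opts = parse(
            List.of("--executor-memory=4g", "--conf", "spark.executor.cores=2", "app.jar"), conf);
        System.out.println(opts); // {--executor-memory=4g}
        System.out.println(conf); // {spark.executor.cores=2}
    }
}
```

The point for our question is only that the command-line values are collected first. Whether a later step overwrites them is what the rest of the code decides.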

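In other words, the contents of defaultProperties end up in sparkProperties. So where is the priority actually applied? In loadEnvironmentArguments. In the code below, executorMemory is wrapped in an Option. If it is null (not set on the command line), orElse falls back to the value stored earlier in sparkProperties. If that is missing, it falls back to the environment variable. If that is missing as well, orNull returns null.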

  private def loadEnvironmentArguments(): Unit = {
    ......
    executorMemory = Option(executorMemory)
      .orElse(sparkProperties.get(config.EXECUTOR_MEMORY.key))
      .orElse(env.get("SPARK_EXECUTOR_MEMORY"))
      .orNull
    ......
  }
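The two rules can be checked with a toy model. Here is a minimal Java sketch, not Spark's code. It assumes, as in the Spark versions we are aware of, that merging the properties file only fills in keys that --conf has not already set. This is modeled here with putIfAbsent, so verify it against your Spark version. It also mirrors the Scala Option/orElse chain with Java's Optional.or (Java 9+). The class and method names (PrioritySketch, mergeDefaults, resolveExecutorMemory) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified model of the priority rules described above; names echo SparkSubmitArguments,
// but this is NOT Spark's code.
final class PrioritySketch {
    /**
     * Modeled on mergeDefaultSparkProperties(): copy a properties-file entry into
     * sparkProperties only if --conf has not already set that key (assumption; check your version).
     */
    static void mergeDefaults(Map<String, String> sparkProperties, Map<String, String> fileProperties) {
        fileProperties.forEach(sparkProperties::putIfAbsent);
    }

    /** Modeled on loadEnvironmentArguments(): CLI value, then sparkProperties, then env var, else null. */
    static String resolveExecutorMemory(String cliValue, Map<String, String> sparkProperties,
                                        Map<String, String> env) {
        return Optional.ofNullable(cliValue)
            .or(() -> Optional.ofNullable(sparkProperties.get("spark.executor.memory")))
            .or(() -> Optional.ofNullable(env.get("SPARK_EXECUTOR_MEMORY")))
            .orElse(null);
    }

    public static void main(String[] args) {
        // Pretend --conf spark.executor.cores=4 was given on the command line.
        Map<String, String> sparkProperties = new HashMap<>(Map.of("spark.executor.cores", "4"));
        // Pretend the properties file contains these two entries.
        mergeDefaults(sparkProperties, Map.of("spark.executor.cores", "2", "spark.executor.memory", "2g"));
        Map<String, String> env = Map.of("SPARK_EXECUTOR_MEMORY", "1g");

        System.out.println(sparkProperties.get("spark.executor.cores"));       // 4  (--conf wins)
        System.out.println(resolveExecutorMemory("8g", sparkProperties, env)); // 8g (--executor-memory wins)
        System.out.println(resolveExecutorMemory(null, sparkProperties, env)); // 2g (properties file)
        System.out.println(resolveExecutorMemory(null, new HashMap<>(), env)); // 1g (environment variable)
    }
}
```

Because each fallback is only consulted when everything before it is empty, the order of the orElse calls is the priority order.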

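To sum up, the priority order is:

1. A value passed as a command-line parameter, such as --executor-memory
2. A value stored in the properties file (--properties-file)
3. An environment variable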

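But you should think a bit further about whether this always holds. Not every setting can be passed as a dedicated parameter, so related settings may still sit in the Spark configuration file. For example, if a value like spark.yarn.executor.memoryOverhead is set in the configuration file, it still applies, and you need to be aware that it can still cause problems.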
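A rough calculation shows why a leftover overhead setting matters. Here is a hypothetical Java sketch of the per-executor memory requested from YARN. It assumes the defaults we know of from Spark 2.x/3.x on YARN: the overhead is max(384 MiB, 10% of executor memory) unless spark.yarn.executor.memoryOverhead (spark.executor.memoryOverhead in newer releases) is set. It also ignores off-heap and PySpark memory. The class and method names are made up. The only point is that passing --executor-memory does not touch an overhead value pinned in the file.

```java
// Rough, hypothetical model of the per-executor container size requested from YARN.
// Assumptions (check your Spark version): overhead defaults to max(384 MiB, 10% of
// executor memory); off-heap and PySpark memory are ignored.
final class OverheadSketch {
    static final long MIN_OVERHEAD_MIB = 384;
    static final double OVERHEAD_FACTOR = 0.10;

    /** overheadFromFileMiB is null when no memoryOverhead setting is present. */
    static long containerMiB(long executorMemoryMiB, Long overheadFromFileMiB) {
        long overhead = (overheadFromFileMiB != null)
            ? overheadFromFileMiB
            : Math.max(MIN_OVERHEAD_MIB, (long) (OVERHEAD_FACTOR * executorMemoryMiB));
        return executorMemoryMiB + overhead;
    }

    public static void main(String[] args) {
        // --executor-memory 8g and no overhead setting: the overhead scales with memory.
        System.out.println(containerMiB(8192, null)); // 9011
        // Same CLI flag, but the properties file pins the overhead at 512 MiB.
        System.out.println(containerMiB(8192, 512L)); // 8704
    }
}
```

In the second case, raising --executor-memory on the command line leaves the overhead at whatever the file says. A value tuned for a smaller executor can then be too small (or too large) for the new one.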
