#2020.07.06 Originally written for spark3.0 preview2, now revised (spark3.0-preview2->spark3.0)
This post was written for spark3.0 / hadoop3.2.
1) Open Colab
https://colab.research.google.com/
2) Create a new notebook
3) Install openjdk8
!apt-get install -y openjdk-8-jdk-headless
4) Download the spark3.0 (hadoop3.2) tar
!wget -q https://www-us.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
5) Extract the archive
!tar -xvf spark-3.0.0-bin-hadoop3.2.tgz
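The version string appears in the download URL, the archive name, and later in SPARK_HOME, so a typo in one of them is easy to make. A minimal sketch that derives all three from one place is shown here. The helper name spark_paths and its parameters are hypothetical (not part of Spark or findspark), and the base URL is simply the one used in step 4.

```python
# Sketch only: spark_paths is a hypothetical helper, not a Spark or findspark API.
# It builds the archive name, download URL and SPARK_HOME from a single version string.

def spark_paths(spark_version="3.0.0", hadoop_profile="hadoop3.2",
                base_url="https://www-us.apache.org/dist/spark",
                workdir="/content"):
    # Folder name produced by extracting the tarball, e.g. spark-3.0.0-bin-hadoop3.2
    folder = f"spark-{spark_version}-bin-{hadoop_profile}"
    archive = f"{folder}.tgz"
    return {
        "archive": archive,
        "url": f"{base_url}/spark-{spark_version}/{archive}",
        "spark_home": f"{workdir}/{folder}",
    }

paths = spark_paths()
print(paths["url"])         # value to pass to wget in step 4
print(paths["spark_home"])  # value to use for SPARK_HOME in step 7
```

With the defaults, the printed values match the strings typed by hand in steps 4 and 7. Changing spark_version in one place keeps them in sync.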
6) Install findspark
!pip install findspark
7) Set the environment variables
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.0.0-bin-hadoop3.2"
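If either path is wrong, the failure usually only shows up later when the session is created, so it can help to check the two directories right after setting them. The following is a rough sketch under that assumption. The function name missing_homes is made up for illustration and uses only the standard library.

```python
import os

# Sketch only: missing_homes is a hypothetical helper for this walkthrough.
# It reports which of the given variables are unset or point to a missing directory.
def missing_homes(env=None, names=("JAVA_HOME", "SPARK_HOME")):
    env = os.environ if env is None else env
    problems = []
    for name in names:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(path):
            problems.append(f"{name} points to a missing directory: {path}")
    return problems

# Prints nothing when both variables point to existing directories.
for problem in missing_homes():
    print(problem)
```

An empty result means both directories exist. It does not prove that the JDK or Spark inside them is complete.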
8) Initialize findspark and create a SparkSession
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("spark3_test").master("local[*]").getOrCreate()
9) Check the spark version
spark.version
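Since this post moved from the preview2 build to the 3.0 release, it may be worth confirming that the running session really reports a 3.0 version string. A small sketch of such a check follows. The helper matches_release is hypothetical and only compares strings, so it runs without Spark.

```python
# Sketch only: matches_release is a hypothetical helper, not part of pyspark.
def matches_release(version, expected="3.0"):
    """Return True when `version` equals `expected` or continues it with '.' or '-'."""
    return version == expected or version.startswith((expected + ".", expected + "-"))

print(matches_release("3.0.0"))           # a 3.0 release string
print(matches_release("3.0.0-preview2"))  # a preview build of the same line
print(matches_release("2.4.5"))           # a different release line
```

In the notebook, the call would be matches_release(spark.version), using the session created in step 8.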
The next post will test the features added in spark3.0.
reference
- https://colab.research.google.com/drive/1EcotODzgSnLozSH3hDuBfZr06gJXY8I0#scrollTo=zgReRGl0y23D