SparkContext in PySpark
Run this in a Jupyter Notebook after installing Spark.
# Create a SparkContext, the entry point to Spark functionality
from pyspark import SparkContext
sc = SparkContext()
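A side note, not part of the walkthrough: only one SparkContext can be active at a time. A minimal sketch of configuring and stopping the context explicitly; the master URL and app name here are illustrative choices, not from the original post:
# Illustrative sketch: explicit master URL and application name
from pyspark import SparkContext
sc = SparkContext(master='local[*]', appName='ExampleApp')
# ... work with sc ...
sc.stop()  # release the context before creating a new one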
Create a small sample file with Jupyter's %%writefile cell magic (it must be the first line of its own cell):
%%writefile example.txt
first line
second line
third line
fourth line
# Read the file into an RDD, one element per line
textFile = sc.textFile('example.txt')
# Actions: count the lines and fetch the first one
textFile.count()   # 4
textFile.first()   # 'first line'
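Other RDD operations follow the same pattern. A quick sketch, illustrative and not from the original post, using the map transformation on the example.txt created above:
# Sketch: map is a transformation; collect() is the action that runs it
lengths = textFile.map(lambda line: len(line))
lengths.collect()  # [10, 11, 10, 11] for the four lines above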
# Transformation: build a new RDD of the lines containing 'second'.
# Transformations are lazy, so nothing is computed yet.
secfind = textFile.filter(lambda line: 'second' in line)
secfind
PythonRDD[7] at RDD at PythonRDD.scala:43
# Perform an action on the transformation: collect() returns the matching lines
secfind.collect()
['second line']
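Note that collect() pulls the entire result set back to the driver, so on large data take() is the usual safeguard. An illustrative sketch:
# Sketch: take(n) returns only the first n elements
secfind.take(1)  # ['second line']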
# Another action: count() returns the number of matching lines
secfind.count()
1
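Because transformations are lazy, they can be chained freely; nothing runs until an action is called. A hedged sketch over the same RDD:
# Sketch: two chained transformations, evaluated only when count() runs
textFile.filter(lambda line: 'line' in line) \
        .map(lambda line: line.upper()) \
        .count()  # 4, since every line contains 'line'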