Wednesday, March 4, 2020

Proper Use of the Spark Shell While Writing Spark Code


Problem Statement:
I am writing a Spark program in which I need to read a CSV file, convert its contents to an RDD, and apply a groupBy transformation to it.
While doing that, I wrote the following code.
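
For context, customerData below is assumed to be an RDD[String] holding the raw lines of the CSV file. A minimal sketch of how it could be created in the shell (the file path is hypothetical):

scala> val customerData = sc.textFile("customers.csv")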

scala> case class Customer(name: String, age: Int, gender: String, zip: String)
scala> var customers = customerData.map{line =>{ val a:Array[String] = line.split(",") Customer(a(0),a(1).toInt,a(2),a(3)) }}

When I execute the above statement, I get the following error.


<console>:25: error: value Customer is not a member of Array[String]
var customers = customerData.map{line =>{ val a:Array[String] = line.split(",") Customer(a(0),a(1).toInt,a(2),a(3)) }}

At first I did not understand the error, since the code looked correct to me.
After analyzing the code, I realized I had written two statements on a single line with no separator. Scala only infers semicolons at line breaks, so the parser reads line.split(",") Customer(...) as a single expression, treating Customer as a method called on the Array[String] returned by split.
That is exactly what the error message says: value Customer is not a member of Array[String].

Solution:
I added a semicolon (;) after the line-splitting statement, so the compiler sees two separate statements.
The code then works fine.

scala> var customers = customerData.map{line =>{ val a = line.split(","); Customer(a(0),a(1).toInt,a(2),a(3)) }}
customers: org.apache.spark.rdd.RDD[Customer] = MapPartitionsRDD[19] at map at <console>:27
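
As an aside, an alternative to the semicolon is the shell's :paste mode: type :paste, enter the statements across separate lines, then press Ctrl-D, and the whole block is compiled together.

With the customers RDD parsed, the groupBy transformation mentioned in the problem statement can be applied. A minimal sketch, assuming we group by the gender field (the grouping key is my own choice for illustration):

scala> val customersByGender = customers.groupBy(c => c.gender)
scala> customersByGender.mapValues(_.size).collect().foreach(println)

The first line produces an RDD[(String, Iterable[Customer])]; the second counts the customers in each group and prints the counts on the driver.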

