Wednesday, March 4, 2020

Proper Use of the Spark Shell While Writing Spark Code


Problem Statement:
I am writing a Spark program in which I need to read a CSV file, convert its contents to an RDD, and apply a groupBy transformation to it.
While doing that, I wrote the following code.
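
For context, customerData below is assumed to be an RDD[String] holding the raw lines of the CSV file. A minimal sketch of how it could be created in the shell (the file path is hypothetical):

scala> val customerData = sc.textFile("customers.csv")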

scala> case class Customer(name: String, age: Int, gender: String, zip: String)
scala> var customers = customerData.map{line =>{ val a:Array[String] = line.split(",") Customer(a(0),a(1).toInt,a(2),a(3)) }}

When I execute the above statement, I get the following error.


<console>:25: error: value Customer is not a member of Array[String]
var customers = customerData.map{line =>{ val a:Array[String] = line.split(",") Customer(a(0),a(1).toInt,a(2),a(3)) }}

At first I did not understand the error, since the code looked correct to me.
After analyzing the code, I realized I had written two statements on a single line with no separator. Scala only infers semicolons at line breaks, so the parser reads line.split(",") Customer(...) as a single expression, treating Customer as a method called on the Array[String] returned by split.
That is exactly what the error message says: value Customer is not a member of Array[String].

Solution:
I added a semicolon (;) after the line-splitting statement, so the compiler sees two separate statements.
The code then works fine.

scala> var customers = customerData.map{line =>{ val a = line.split(","); Customer(a(0),a(1).toInt,a(2),a(3)) }}
customers: org.apache.spark.rdd.RDD[Customer] = MapPartitionsRDD[19] at map at <console>:27
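
As an aside, an alternative to the semicolon is the shell's :paste mode: type :paste, enter the statements across separate lines, then press Ctrl-D, and the whole block is compiled together.

With the customers RDD parsed, the groupBy transformation mentioned in the problem statement can be applied. A minimal sketch, assuming we group by the gender field (the grouping key is my own choice for illustration):

scala> val customersByGender = customers.groupBy(c => c.gender)
scala> customersByGender.mapValues(_.size).collect().foreach(println)

The first line produces an RDD[(String, Iterable[Customer])]; the second counts the customers in each group and prints the counts on the driver.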

