Monday, August 3, 2020

Create tuples in Spark

The spark-shell session below builds an RDD of (Int, Int) tuples by parallelizing a Scala for-comprehension into 8 partitions.

scala> val rdd = sc.parallelize(for {
     |     x <- 1 to 3
     |     y <- 1 to 2
     | } yield (x, y), 8)

rdd: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[13] at parallelize at <console>:24

scala> rdd.collect

res8: Array[(Int, Int)] = Array((1,1), (1,2), (2,1), (2,2), (3,1), (3,2))
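
Because every element is a two-element tuple, Spark treats this as a pair RDD and makes key-based operations available on it. A minimal sketch of one such operation, reusing the rdd value from the session above (the reduceByKey step and the expected output shown are illustrative additions, not part of the original session):

// Sum the second element of each tuple per key;
// reduceByKey groups by the first tuple element (the key).
val sums = rdd.reduceByKey(_ + _)

// Expected contents (ordering may vary across partitions):
// Array((1,3), (2,3), (3,3))
sums.collect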


