Wednesday, May 19, 2021

Spark Performance Tuning Optimisation

Apache Spark Performance Tuning and Optimizations for Big Datasets

Spark performance tuning from the trenches

Spark Optimisation Techniques

Apache Spark Optimization Techniques and Tuning

Fine Tuning and Enhancing Performance of Apache Spark Jobs

Optimizing the Skew in Spark

Spark Job Optimization: Dealing with Data Skew

Spark Scala Notes

create external table with data in csv

create external table if not exists my_db.mytable(
  name string,
  age int)
COMMENT 'Example table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/warehouse/employee_data'
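
For reference, here is a minimal Python sketch of producing a data file in the layout such a table would read, assuming comma-delimited text with one record per line and no header row. The helper name `write_employee_csv`, the file name `employees.csv`, and the sample rows are hypothetical; copying the file into the table's LOCATION directory is a separate step not shown here.

```python
import csv
import os


def write_employee_csv(directory, rows, filename="employees.csv"):
    """Write (name, age) rows as comma-delimited text with no header line."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", newline="") as f:
        # No header row is written: a plain delimited text table would
        # otherwise read the header as a data record.
        # Note: csv.writer quotes fields that contain commas, which a plain
        # delimited-text table does not interpret, so keep values comma-free.
        writer = csv.writer(f, lineterminator="\n")
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Made-up sample rows matching the (name string, age int) columns.
    print(write_employee_csv("employee_data", [("alice", 30), ("bob", 25)]))
```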

What is the difference between tail -f and tailf in Unix?

On a very large log file, tailf seems to wait almost indefinitely, whereas tail -f begins printing output within seconds.

I dug deeper by examining the underlying system calls with the strace command. The results are given below:

# strace tailf /var/log/messages

stat("/var/log/messages", {st_mode=S_IFREG|0600, st_size=47432599401, ...}) = 0
open("/var/log/messages", O_RDONLY)     = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=47432600425, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7dba2d1000
read(3, "Nov  1 03:23:01 hostnameXXXX"..., 4096) = 4096
read(3, "\nNov  1 03:54:33 del"..., 4096) = 4096
read(3, "io.c(600) [receiver=3.0.6]\nNov  "..., 4096) = 4096

As you can see, tailf reads (buffers) the file from the beginning, 4096 bytes at a time, before it prints anything to the screen.

Compare the output of tail -f below. It uses the lseek system call to jump directly to the end of the file and starts reading from there:

# strace tail -f /var/log/messages

open("/var/log/messages", O_RDONLY)     = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=47294167448, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_END)                   = 47294170917
lseek(3, 47294169088, SEEK_SET)         = 47294169088
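
The seek-to-end approach seen in the tail -f trace can be sketched in Python with `os.lseek`. This is only an illustration of the idea, not the actual implementation of tail: the helper names `read_tail_bytes` and `last_lines` and the 4096-byte window are assumptions made for the example.

```python
import os


def read_tail_bytes(path, max_bytes=4096):
    """Return up to max_bytes from the end of a file without reading the rest."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Jump straight to the end; lseek returns the resulting offset,
        # which is the file size.
        size = os.lseek(fd, 0, os.SEEK_END)
        start = max(0, size - max_bytes)
        # Step back a bounded amount instead of scanning from offset 0.
        os.lseek(fd, start, os.SEEK_SET)
        chunks = []
        remaining = size - start
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def last_lines(path, n=10, max_bytes=4096):
    """Return the last n lines found in the final max_bytes of the file."""
    # The first line of the window may be cut off if the window starts
    # mid-line; it is only returned when n exceeds the lines in the window.
    return read_tail_bytes(path, max_bytes).splitlines()[-n:]
```

The cost of this approach depends on the size of the window, not on the size of the file, which is consistent with tail -f responding quickly even on a file of this size.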

Sunday, May 16, 2021

Connect Android Device to Ubuntu Laptop


Step 1: How to Install scrcpy in Ubuntu 18.04.5

Step 2: How to enable USB debugging on a Samsung tablet.

Step A:


=> About tablet

        => Software Information

                => Build Number => tap here 7 times; Developer options will then be enabled


Step B:


=> Enable USB Debugging

Step 3: Connect the tablet to the laptop with a USB cable.

Step 4: Now type the following to connect to the tablet.

scrcpy --lock-video-orientation 1

Step 5: Increase screen size in scrcpy.

scrcpy 1.13 adds two new options related to orientation: an option to lock the video orientation, and shortcuts to rotate the display in steps of 90°.

  • natural orientation:

scrcpy --lock-video-orientation 0

  • 90° counterclockwise:

scrcpy --lock-video-orientation 1

  • 180°:

scrcpy --lock-video-orientation 2

  • 90° clockwise:

scrcpy --lock-video-orientation 3
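
If scrcpy is launched from a script, the orientation values can be wrapped in a small Python helper. This is a minimal sketch: the function name `scrcpy_command` is hypothetical, the mapping assumes the scrcpy 1.13 values 0 to 3, and the code only builds the argument list rather than starting scrcpy.

```python
# Orientation values accepted by --lock-video-orientation in scrcpy 1.13.
ORIENTATIONS = {
    0: "natural orientation",
    1: "90° counterclockwise",
    2: "180°",
    3: "90° clockwise",
}


def scrcpy_command(orientation):
    """Build the scrcpy argument list for a locked video orientation."""
    if orientation not in ORIENTATIONS:
        raise ValueError("orientation must be one of 0, 1, 2, 3")
    return ["scrcpy", "--lock-video-orientation", str(orientation)]


if __name__ == "__main__":
    for value, label in ORIENTATIONS.items():
        print(label, "->", " ".join(scrcpy_command(value)))
```

The returned list can be passed to `subprocess.run` on a machine where scrcpy is installed and the tablet is connected.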

Thursday, May 13, 2021
