Wednesday, May 19, 2021

Spark Performance Tuning Optimisation

 Apache Spark Performance Tuning and Optimizations for Big Datasets

Spark performance tuning from the trenches

Spark Optimisation Techniques

Apache Spark Optimization Techniques and Tuning

Fine Tuning and Enhancing Performance of Apache Spark Jobs

Optimizing the Skew in Spark

Spark Job Optimization: Dealing with Data Skew

Spark Scala Notes

Create an external table with data in CSV

CREATE EXTERNAL TABLE IF NOT EXISTS my_db.mytable(
  name string,
  age int)
COMMENT 'Example table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/warehouse/employee_data'
TBLPROPERTIES ("skip.header.line.count"="1");
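
To make the intent of the table definition concrete, here is a rough Python sketch of how a comma-delimited text file with one header line maps onto (name, age) rows. The function name and the sample data are made up for illustration, and this is only a model of the idea, not how Hive reads the file internally.

```python
def read_rows(text, skip_header_lines=1):
    """Split comma-delimited lines into (name, age) tuples, skipping header lines."""
    rows = []
    # Drop the first `skip_header_lines` lines, like skip.header.line.count does.
    for line in text.splitlines()[skip_header_lines:]:
        # Plain split on ',' (no quoting rules), matching a simple delimited format.
        name, age = line.split(",")
        rows.append((name, int(age)))
    return rows


# Hypothetical file content: one header line followed by two records.
sample = "name,age\nalice,30\nbob,25\n"
print(read_rows(sample))
```

Without the header skip, the first line would be treated as data and the text "age" would not fit the int column.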

What is the difference between tail -f and tailf in Unix?

On a very large log file, tailf waits almost indefinitely, whereas tail -f begins printing output within seconds.

I dug deeper by examining the underlying system calls with the strace command. The results are given below:

# strace tailf /var/log/messages

(truncated)
stat("/var/log/messages", {st_mode=S_IFREG|0600, st_size=47432599401, ...}) = 0
open("/var/log/messages", O_RDONLY)     = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=47432600425, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7dba2d1000
read(3, "Nov  1 03:23:01 hostnameXXXX"..., 4096) = 4096
read(3, "0.31.148.12)\nNov  1 03:54:33 del"..., 4096) = 4096
read(3, "io.c(600) [receiver=3.0.6]\nNov  "..., 4096) = 4096
(truncated)

As you can see, tailf reads (buffers) the file from the beginning before writing any output to the screen.

Compare the output of tail -f below. It uses the lseek system call to jump directly to the end of the file and starts reading from there:

# strace tail -f /var/log/messages

(truncated)
open("/var/log/messages", O_RDONLY)     = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=47294167448, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_END)                   = 47294170917
lseek(3, 47294169088, SEEK_SET)         = 47294169088
(truncated)
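
The seek-to-end approach seen in the tail -f trace can be sketched in Python roughly as follows. This is a simplified illustration, not the actual tail implementation: the function name, the 4096-byte block size and the choice to look only at the last block are assumptions made for the example.

```python
import os


def last_lines(path, n=10, block_size=4096):
    """Return up to the last `n` lines of `path`, reading only the final block."""
    with open(path, "rb") as f:
        # Jump straight to the end; seek() returns the new offset, i.e. the file size.
        size = f.seek(0, os.SEEK_END)
        # Step back one block (or to the start for small files) and read from there.
        f.seek(max(0, size - block_size), os.SEEK_SET)
        chunk = f.read()
    # The first line of the chunk may be partial; it is dropped by the slice
    # whenever the block holds more than `n` lines.
    return chunk.splitlines()[-n:]
```

Because only the last block is read, the cost does not depend on the size of the file, which is why this style of reading responds quickly even on a multi-gigabyte log.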

Sunday, May 16, 2021

Connect Android Device to Ubuntu Laptop

 



Step 1: How to Install scrcpy in Ubuntu 18.04.5





Step 2: How to enable USB debugging on a Samsung tablet.


Step A:


Settings

=> About tablet

        => Software Information

                => Build Number => tap here 7 times; Developer options will then be enabled.


Step B:


Settings

=> Developer options

        => Enable USB Debugging



Step 3: Connect the tablet to the laptop with a USB cable


Step 4: Now type the following to connect to the tablet.


scrcpy --lock-video-orientation 1



Step 5: Increase screen size in scrcpy.



scrcpy 1.13 adds two new options related to orientation: an option to lock the video orientation, and shortcuts to rotate display in steps of 90°.



  • Natural orientation:

scrcpy --lock-video-orientation 0

  • 90° counterclockwise:

scrcpy --lock-video-orientation 1

  • 180°:

scrcpy --lock-video-orientation 2

  • 90° clockwise:

scrcpy --lock-video-orientation 3
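
If you launch scrcpy from a script, the orientation values can be kept in a small lookup. The sketch below only builds the command line; the dictionary keys and the function name are made up for illustration, and the flag syntax is the scrcpy 1.13 form used in this post (other versions may differ).

```python
# Hypothetical labels mapped to the --lock-video-orientation values of scrcpy 1.13.
ORIENTATIONS = {
    "natural": 0,  # natural orientation
    "90ccw": 1,    # 90° counterclockwise
    "180": 2,      # 180°
    "90cw": 3,     # 90° clockwise
}


def scrcpy_command(orientation):
    """Build the scrcpy argument list for a named orientation."""
    value = ORIENTATIONS[orientation]
    return ["scrcpy", "--lock-video-orientation", str(value)]


print(" ".join(scrcpy_command("90ccw")))
```

The returned list can be passed to subprocess.run once the tablet is connected and USB debugging is enabled.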









Thursday, May 13, 2021
