Apache Spark In-Memory Computations: A Mini Deep Dive

How Apache Spark uses in-memory computation to accelerate data transformations, the benefits of in-memory processing over disk operations, and the impact of hardware choices on performance.

August 2, 2024 · 6 min · Satvik Jadhav

Use PySpark Locally With Docker

Installing Spark on Linux: Here we’ll learn how to install Spark 3.0.3 on Linux. We tested it on Ubuntu 20.04 (including WSL), but it should work on other Linux distros as well. Installing Java: Download OpenJDK 11 or Oracle JDK 11 (it’s important that the version is 11, since Spark requires Java 8 or 11). We’ll use OpenJDK. Download it (e.g. to ~/spark): wget https://download.java.net/java/GA/jdk11/9/GPL/openjdk-11.0.2_linux-x64_bin.tar.gz Unpack it: ...

June 1, 2022 · 2 min · Satvik Jadhav