
Connecting Apache Spark to different Relational Databases (Locally and AWS) using PySpark

Rajan Sahu
3 min read · Mar 14, 2022



In this post, we will learn how to connect a Spark application to a locally installed relational database, as well as to AWS RDS.

Table of Contents

  • Apache Spark
  • PySpark
  • Connect PySpark with a locally installed MySQL RDB
  • Connect PySpark with an AWS MySQL RDS
  • Connect PySpark with a locally installed Postgres RDB
  • Connect PySpark with Postgres AWS RDS
  • Connect PySpark with a locally installed Oracle RDB
  • Connect PySpark with Oracle AWS RDS
  • References

Apache Spark

  • Apache Spark is a data processing framework that can quickly run processing tasks on very large datasets and distribute that work across multiple nodes or computers.
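
As a quick illustration of that distribution, here is a minimal PySpark sketch that spreads a summation across all local CPU cores; the application name and dataset are arbitrary placeholders.

```python
from pyspark.sql import SparkSession

# Start a local Spark session; "local[*]" uses every available CPU core.
spark = SparkSession.builder \
    .appName("spark-demo") \
    .master("local[*]") \
    .getOrCreate()

# Distribute a range of numbers across partitions and sum them in parallel.
rdd = spark.sparkContext.parallelize(range(1, 1_000_001))
print(rdd.sum())  # 500000500000

spark.stop()
```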

PySpark

  • PySpark is the Python API for Apache Spark, used to process large datasets across a distributed cluster.
  • It lets you write and run Spark jobs in Python, as in the connection sketch below.
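
Since the rest of this post walks through connecting Spark to relational databases, here is a minimal PySpark sketch of a JDBC read from a locally installed MySQL database. The driver version, host, database, table, and credentials are placeholders; adjust them to your own setup.

```python
from pyspark.sql import SparkSession

# The MySQL JDBC driver must be on Spark's classpath; the Maven
# coordinates below are an example version -- match them to your setup.
spark = SparkSession.builder \
    .appName("mysql-read") \
    .config("spark.jars.packages", "mysql:mysql-connector-java:8.0.28") \
    .getOrCreate()

# Placeholder connection details: host, database, table, and credentials
# stand in for your own local MySQL instance.
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:mysql://localhost:3306/mydb") \
    .option("driver", "com.mysql.cj.jdbc.Driver") \
    .option("dbtable", "employees") \
    .option("user", "root") \
    .option("password", "password") \
    .load()

df.show(5)
spark.stop()
```

For an AWS RDS instance, only the JDBC URL changes: swap `localhost` for the RDS endpoint shown in the AWS console, and make sure the instance's security group allows inbound traffic on port 3306.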
