Connecting Apache Spark to Different Relational Databases (Locally and on AWS) Using PySpark
Mar 14, 2022 · 3 min read
In this post, we will learn how to connect a Spark application to a locally installed relational database, as well as to a database hosted on AWS RDS.
Table of Contents
- Apache Spark
- PySpark
- Connect PySpark to a locally installed MySQL RDB
- Connect PySpark to a MySQL AWS RDS
- Connect PySpark to a locally installed Postgres RDB
- Connect PySpark to a Postgres AWS RDS
- Connect PySpark to a locally installed Oracle RDB
- Connect PySpark to an Oracle AWS RDS
- References
Apache Spark
- Apache Spark is a data processing framework that can quickly run processing tasks on very large datasets, distributing that work across multiple nodes or computers.
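To make the "distributed" part concrete, here is a minimal sketch (not from the article itself): Spark splits the input into partitions and processes them in parallel across whatever cores or cluster nodes are available. The app name and partition count are arbitrary choices for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-demo").getOrCreate()

# parallelize() splits the data into partitions; Spark then processes the
# partitions in parallel across the available cores or cluster nodes.
rdd = spark.sparkContext.parallelize(range(1_000_000), numSlices=8)
total = rdd.map(lambda x: x * 2).sum()
print(total)  # 999999000000
```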
PySpark
- PySpark is the Python API for Apache Spark: it lets you write Spark jobs in Python and run them against large datasets on a distributed cluster.
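As a preview of the pattern the following sections build on, here is a minimal sketch of reading a table over JDBC with PySpark. The driver JAR path, host, database, table name, and credentials are placeholders for illustration, and the MySQL Connector/J driver is assumed to have been downloaded and placed on Spark's classpath.

```python
from pyspark.sql import SparkSession

# Build a local SparkSession; the JAR path below is a placeholder and must
# point at a MySQL Connector/J driver you have actually downloaded.
spark = (
    SparkSession.builder
    .appName("jdbc-demo")
    .config("spark.jars", "/path/to/mysql-connector-j-8.0.33.jar")
    .getOrCreate()
)

# Read a table over JDBC. The host, database, table, user, and password
# are hypothetical values for illustration only.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/testdb")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", "employees")
    .option("user", "root")
    .option("password", "secret")
    .load()
)

df.show()
```

For an AWS RDS instance the only change to this sketch is the JDBC URL: swap localhost for the RDS endpoint hostname, and make sure the instance's security group allows inbound traffic from your machine.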