Introduction to Python for Data Engineering

What is Python? Why is it Essential?

Before we dive into the world of data, we need tools to explore and manipulate it. One such tool is a programming language, and in this series that language is Python. As a tech enthusiast, you have probably come across Python already: it is both popular and powerful.

What is Python?

Python is a high-level, general-purpose programming language.

High-level means that the language hides machine details such as memory management, which makes it easier to read and write than lower-level languages such as C or assembly. General-purpose means that we can use Python to build solutions across many categories.

What is Python used for?

  • Making Web Applications; Instagram was originally built on Python
  • Making Desktop Apps
  • Robotics - especially popular among the Raspberry Pi Community
  • Artificial Intelligence & Machine Learning
  • Game Development - make video games using Python
  • Scientific work involving simulations and Scientific Computing
  • Automation
  • Data Analysis
  • Connecting and working with Databases

... and the list goes on. I'll leave a few resources down below.

Why Python?

Python is easy to read and understand; a quick look at some Python code usually gives insight into what the code does. Compared with more verbose languages such as Java, the same task typically takes fewer lines and less ceremony.
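To illustrate the readability point, here is a minimal sketch. The temperature readings and the names `average` and `warm_days` are made up for this example; the goal is only to show that the code reads close to plain English.

```python
# Hypothetical sample data: daily temperature readings in Celsius.
temperatures = [21.5, 23.0, 19.8, 25.1]


def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Keep only the readings that are above the average.
mean_temperature = average(temperatures)
warm_days = [t for t in temperatures if t > mean_temperature]

print(f"Average: {mean_temperature:.2f}")
print(f"Readings above average: {warm_days}")
```

Even without knowing Python, you can likely guess what each line does, which is the point being made here.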

Python has a huge community, which has enabled the language to grow and offer many tools for solving day-to-day problems. If you want to work with Excel spreadsheets and automate data entry, for example, Python has you covered. The community also means that you have plenty of resources to learn from and people to turn to when you get stuck.

Because of its popularity, Python is supported by all major platforms and cloud services, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Integrating third-party services, such as mobile messaging (SMS) via Twilio, is also straightforward.

Python lets you build things quickly, which matters in business environments where delivering a working solution fast is preferred to a near-perfect one. Its popularity also gives teams access to a large talent pool.

Setting up Python

Installing Python on your system is as easy as ABC.

  • Go to python.org/downloads
  • Click on the Installer for your Operating System
  • After the download finishes, run the installer
  • While installing on Windows, make sure to enable the Add Python to PATH option
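Once the installer finishes, a quick way to confirm that everything works is to run a tiny script. The sketch below assumes you save it under a name of your choosing (for example the hypothetical `check_install.py`) and run it with the `python` command; it uses only modules that ship with Python.

```python
import platform
import sys

# Report which interpreter is running this script and its version.
print("Python version:", platform.python_version())
print("Interpreter path:", sys.executable)

# This series assumes Python 3; stop early on anything older.
if sys.version_info < (3,):
    raise SystemExit("Python 3 is required")
```

If the command is not recognized at all, the PATH option from the last step is the first thing to check.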

To start coding:

  • Install a code editor; I prefer VS Code
  • For Data Engineers and Data Scientists, install Anaconda
  • Anaconda comes with many commonly used data tools (such as Jupyter, pandas, and NumPy) ready to go, making it even easier to start building Data Engineering projects.

Programming in Python

You do not need to understand 100 concepts before you start building cool projects in Python.

The most important concepts to learn and understand include:

  • Data types; working with strings and integers
  • Variables
  • Lists & Tuples
  • Dictionaries
  • Conditionals; if-else conditions
  • Math Expressions; exponentials, rounding operations, ...
  • Loops
  • Functions
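As a quick taste, the concepts above fit into one short script. This is a minimal sketch with made-up sample values (the name, scores, and pass mark are all hypothetical), not a replacement for working through each concept properly.

```python
# Variables and data types: a string and an integer.
name = "Ada"
age = 36

# Lists are mutable sequences; tuples are immutable ones.
scores = [72, 88, 95]
scores.append(60)
point = (3, 4)

# Dictionaries map keys to values.
person = {"name": name, "age": age}

# Math expressions: exponentiation and rounding.
distance = round((point[0] ** 2 + point[1] ** 2) ** 0.5, 2)


# Functions bundle reusable logic; conditionals choose between branches.
def grade(score):
    """Return a pass/fail label for a numeric score."""
    if score >= 70:
        return "pass"
    else:
        return "fail"


# Loops repeat work for every item in a collection.
results = []
for score in scores:
    results.append(grade(score))

print(person)
print(distance)
print(results)
```

Try changing the pass mark inside `grade` or adding more scores to see how the output changes.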

I have attached a Jupyter Notebook/webpage demonstrating all the concepts above. You can find it here.

What next?

Given that you understand basic Python concepts, the next natural question is: what can I do with Python?

This is where you get to learn all about libraries, i.e. packages of ready-made code for solving specific problems. Some ship with Python itself, and many more are published by the community.

  • Want to build games? There's a library for that - Pygame
  • Want to process data from Excel files? There's a library for that - pandas
  • Want to work with matrices and perform numerical analysis? There's a library for that - NumPy
  • Building web applications? There's Flask
  • Want to work with data pipelines? There are libraries for that too.
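Using a library always starts with an `import`. The sketch below sticks to modules that ship with Python (`csv`, `io`, and `statistics`) so that it runs without installing anything; third-party libraries such as the ones listed above are imported the same way once they are installed. The city and temperature values are made-up sample data.

```python
import csv
import io
import statistics

# Hypothetical sample data; in practice this would come from a file.
raw_data = """city,temperature
Nairobi,24
Mombasa,30
Kisumu,27
"""

# csv.DictReader turns each row into a dictionary keyed by the header.
rows = list(csv.DictReader(io.StringIO(raw_data)))

# Values are read as strings, so convert them before doing arithmetic.
temperatures = [int(row["temperature"]) for row in rows]

print("Cities:", [row["city"] for row in rows])
print("Mean temperature:", statistics.mean(temperatures))
```

The pattern is the same for larger libraries: import the library, hand it your data, and let it do the heavy lifting.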

In the next article, we'll talk more about databases and how to work with data using Python and other tools such as SQL.


Resources
