Arsalan Shahid

Why Python for Data Analytics and Machine Learning?

In this post, I explain why Python (Wiki) is useful for data analytics.

Many programming languages offer implementations of Machine Learning (ML) algorithms, but over the years Python has emerged as a dominant contender in this regard.

I usually write my programs in C, C++ and Bash to analyze the performance and energy consumption of embedded and HPC systems. In 2018, however, I worked on analytical energy consumption models and measurement tools, and gained experience with high-dimensional datasets. My first resort for an easy-to-use software development framework was Matlab. Although I have a telecom engineering background and am quite familiar with the Matlab interface, I decided to stop using it because it is closed source and limits the commercialization of software developed on top of it. I then started using R to analyze and study the structures within my data, as it is an easy framework to pick up for exploring datasets. R is simple, and people without much programming experience can find it fascinating to play with.

An interesting observation over the past years is the deep interest data scientists have taken in Python and their quick transition to programming their data exploration tasks in it. I started using it too and found several reasons to choose Python, not only for web application development and scripting but also for manipulating data over repeated sets of tasks and for uncovering hidden structures in data in the modern era of the AI revolution.

I recommend that data analysts and ML enthusiasts learn to program in Python as their first step, for the following main reasons:

  1. Ease of understanding, which reduces the effort needed to implement and test the concepts of linear algebra and calculus. ML is mainly about understanding the hidden patterns in your data, and ML algorithms are mainly built on the core concepts of these two subjects.
  2. Packages that provide memory-efficient and performance-optimized code, such as numpy for array manipulation.
  3. There are a lot of stable and well-optimized packages available in Python to explore and understand your unstructured data. For example:
    • If you want to work with images and extract useful information, you have numpy, opencv, and scikit-image.
    • To explore text analytics, you have nltk, numpy and scikit-learn. For audio, librosa is a useful package.
    • Tabular data for ML problems can be loaded, cleaned, and prepared using the pandas package.
    • For better visualization of your data, there are specialized tools like matplotlib and seaborn.
    • For deep learning, optimized frameworks such as tensorflow and pytorch are available.
    • For web data extraction and manipulation, the scrapy framework is available as a quick starter.
    • Last but not least, the Django framework can help you integrate web applications with your program.
  4. Unlike Matlab, the ML packages implemented in Python are free to use under open-source licenses, mostly permissive ones such as BSD and Apache 2.0.
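
To make the first two points a little more concrete, here is a minimal sketch of how a basic linear algebra task (fitting a straight line by least squares) might look with numpy. The toy data and the variable names are my own assumptions for illustration; the only library calls used are standard numpy ones.

```python
import numpy as np

# Hypothetical toy data: y = 2*x + 1 exactly, so the fit is easy to check.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: one column for x, one column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution of A @ [slope, intercept] ~= y.
coeffs, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs

# Vectorized prediction and mean squared error, with no explicit Python loop.
y_pred = A @ coeffs
mse = np.mean((y - y_pred) ** 2)

# A numpy array stores its elements in one typed, contiguous buffer:
# one million float64 values take 8 bytes each.
values = np.arange(1_000_000, dtype=np.float64)
buffer_bytes = values.nbytes

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"buffer size in bytes: {buffer_bytes}")
```

The whole fit takes a handful of lines that read close to the underlying math, which is what I mean by ease of understanding. The same pattern carries over to larger, higher-dimensional datasets without changing the structure of the code.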

In future posts, I will provide a guide to free resources for learning AI (a must-have skill of today and tomorrow) using Python.
