Application fee: 0 USD


Certification Body: Aegis School of Data Science
Location: On-campus (India, Mumbai, Pune, Bangalore)
Type: Certificate course
Director: Pratik Gujrathi
Coordinator: Ritin Joshi
Language: English
Course fee: 0 USD
GST: 18%
Total course fee: 0 USD


Course Details

Python for Data Science

Python is a powerful, flexible, open-source language that is easy to learn and use, and it has powerful libraries for data manipulation and analysis. Along with R and SAS, it is among the top three languages for analytics applications. Python has become popular in recent years for building websites with its numerous web frameworks, such as Django. Adoption of Python for scientific computing in both industry applications and academic research has increased significantly since the early 2000s. Having the front-end web interface and the back-end analytics on the same platform is an advantage. Being able to integrate with other applications (even those built in other languages) gives Python an added advantage. Most modern computing environments share a set of legacy FORTRAN and C libraries for linear algebra, optimization, integration, fast Fourier transforms, and other such algorithms. Many companies and data labs have likewise used Python to glue together 30 years’ worth of legacy software.

For data analysis, interactive exploratory computing, and data visualization, Python inevitably draws comparisons with the many other domain-specific open-source and commercial programming languages and tools in wide use, such as R, MATLAB, and SAS. In recent years, Python’s improved library support (primarily pandas) has made it a strong alternative for data manipulation tasks.

Python is used in scientific computing and highly quantitative domains such as finance, oil and gas, physics, and signal processing. Python is used in web applications like YouTube (originally built on PHP but shifted to Python around 2005-6) and has powered much of Google's internal infrastructure. From Google to NASA, users love Python for its readable high-level syntax and interoperability with other programming languages and systems. The NumPy and SciPy libraries take advantage of Python’s C API to deliver fast matrix operations in compiled code. The pandas library (the name derives from "panel data") offers a viable alternative to R’s data frame type, allowing R users to quickly pick up Python. The vibrant scientific community around Python is growing rapidly, making Python the strongest competitor to R.
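As a rough illustration of the point about matrix operations, the sketch below multiplies two small matrices with NumPy and checks the result against the same product written as plain Python loops. The matrices and the helper name `matmul_loops` are made up for this example and are not part of the course material.

```python
import numpy as np

# Two small matrices; the values are arbitrary and chosen only for illustration.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.5, 0.0], [0.0, 2.0]])

# Matrix product evaluated by NumPy's compiled routines.
product = a @ b


def matmul_loops(x, y):
    """Hypothetical helper: the same product written as explicit Python loops."""
    rows, inner, cols = len(x), len(y), len(y[0])
    return [
        [sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


loop_product = matmul_loops(a.tolist(), b.tolist())

print(product)
print(np.allclose(product, loop_product))
```

Both approaches give the same numbers. On large matrices the NumPy version is typically much faster, because the inner loops run in compiled code rather than in the Python interpreter.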

Python has very strong computational capabilities for the data science workflow, and it is faster and easier to get started with than many of the other packages available. People are increasingly finding that Python is a suitable language not only for research and prototyping but also for building production systems.

What you learn in this course:

  • Hands-on experience of setting up a fully functioning integrated analysis environment for doing data science with Python.
  • An understanding of how to use the Python standard library to write programs, access the various data science tools, and document and automate analytic processes.
  • Orientation to some of the most powerful and popular Python libraries for data science, including pandas (data preparation, analysis, and modeling; time series analysis), scipy.stats (statistics), scikit-learn (machine learning), and Matplotlib (data visualization).
  • Working knowledge of the Python tools ideally suited for data science tasks, including:
    • Accessing data (e.g., text files, databases)
    • Cleansing and normalizing data
    • Exploring data (e.g., simple statistics, correlation matrices, visualization)
    • Modeling data (e.g., statistics, machine learning)
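The four tasks listed above can be sketched end to end in a few lines. The example below is a minimal illustration that uses only the Python standard library so it runs without extra installation; in practice pandas and scikit-learn would likely take over these steps. The sample data (`RAW_CSV`) and the function names `load_rows`, `clean`, and `fit_line` are hypothetical and chosen only for this sketch.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would come from a text file or database.
RAW_CSV = """hours,score
1,52
2,55
,60
4,61
5,n/a
6,68
"""


def load_rows(text):
    """Access: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleanse: keep only rows where both fields parse as numbers."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((float(row["hours"]), float(row["score"])))
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric values
    return cleaned


def fit_line(points):
    """Model: least-squares fit of score = slope * hours + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = load_rows(RAW_CSV)
points = clean(rows)

# Explore: a simple summary statistic on the cleaned data.
mean_score = statistics.mean(y for _, y in points)

slope, intercept = fit_line(points)
print(f"{len(points)} of {len(rows)} rows kept, mean score {mean_score:.1f}")
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

Keeping each stage in its own function makes it straightforward to swap in a different data source or model later without touching the other steps.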