Ecosystem

Scientific Python is both an ecosystem of software libraries and tools and a community of contributors who maintain the software and support scientists, students, and other users.

Here are some useful links:

Welcome to the Ecosystem

[DRAFT] This video has not been recorded yet.

The Scientific Python ecosystem is a collection of open-source scientific software packages written in Python. It is a broad and ever-expanding set of algorithms and data structures that grew around NumPy, SciPy, and Matplotlib.

The ecosystem includes a wide variety of tools: some more specialized to specific domains such as biological imaging or astronomy, and others quite general for tasks such as data management and high-performance computing.

It includes projects such as pandas (for data analysis), NetworkX (for graph computation), scikit-learn (for machine learning), and scikit-image (for image processing).

Ecosystem Packages

Here is a curated selection of packages available in the ecosystem:

Core

  • NumPy, the fundamental package for numerical computation. NumPy defines the n-dimensional array data structure, the most common way of exchanging data between packages in the ecosystem.
  • SciPy, a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics, and much more.
  • Matplotlib, a mature and popular plotting package that provides flexible, publication-quality 2-D and 3-D visualization.
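As a small illustration of the n-dimensional array mentioned above, the sketch below builds a 2-D array and inspects a few of its attributes. The variable names and values are made up for this example.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns; the values are arbitrary.
temperatures = np.array([[20.5, 21.0, 19.8],
                         [22.1, 23.4, 21.7]])

print(temperatures.ndim)   # number of dimensions
print(temperatures.shape)  # size along each dimension
print(temperatures.dtype)  # element type shared by all entries

# Reductions can run along a chosen axis: here, one mean per row.
row_means = temperatures.mean(axis=1)
print(row_means)
```

Because the shape and element type travel with the array, other packages can accept it without any conversion step.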

Data and computation

  • pandas, providing high-performance, easy-to-use data structures.
  • SymPy, for symbolic mathematics and computer algebra.
  • NetworkX, a collection of tools for analyzing complex networks.
  • scikit-image, a collection of algorithms for image processing.
  • scikit-learn, a collection of algorithms and tools for machine learning.

Productivity and high-performance computing

  • IPython, a command-line interface to Python, for interactively exploring code, processing data, and testing code ideas.
  • JupyterLab, which provides computational notebooks that combine interactive code with descriptive text in your web browser, especially useful for teaching and documenting research.
  • Joblib, Dask, and Ray, for parallel and distributed processing with a focus on numerical data.

Install

[DRAFT] This video has not been recorded yet.

There are many different ways to install scientific Python packages. We’ll focus on two recommended methods, specifically:

  • pip (with the built-in venv module for virtual environments)
  • conda

With both pip and conda, you can control the package versions used by a specific project, which helps prevent dependency conflicts. One advantage of pip is that it ships with most Python installations, so it is ubiquitous and well supported. conda is a more extensive package manager that can also install and manage non-Python dependencies, such as compilers or libraries like MKL or HDF5. conda also doubles as an environment manager, allowing users to create and manage separate virtual environments for their Python projects.
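For the pip route, a per-project virtual environment is usually created on the command line with python -m venv .venv. The same thing can be sketched from Python using the standard-library venv module. The function name create_project_env and the .venv directory name are assumptions chosen for this illustration.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return its config file path."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"


# Demo in a throwaway directory. with_pip=False keeps the demo quick;
# a real project environment would normally include pip.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_project_env(Path(tmp) / ".venv", with_pip=False)
    env_created = cfg.exists()

print(env_created)
```

Packages installed after activating such an environment stay inside it, which is what keeps one project's versions from interfering with another's.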

Installing with conda

To use conda, you first need to install it on your system. The simplest way to install the conda package manager is with the Miniconda minimal installer package.

Once the conda package manager has been installed and configured, installing packages from the scientific Python ecosystem is quite simple, for example:

conda install numpy scipy matplotlib ipython jupyter pandas

Installing via pip

Python comes with a package manager, pip, which can install, upgrade, or uninstall packages published on the Python Package Index.

You can install packages via the command line by entering:

python -m pip install --user numpy scipy matplotlib ipython jupyter pandas

The --user flag is useful to avoid conflicts with system-level Python packages and to circumvent the need for administrator privileges when installing packages.
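To see where a --user install would end up for the current interpreter, the standard-library site module can be queried. This is only a quick inspection sketch; the variable names are illustrative.

```python
import site

# Directory that `pip install --user` installs into for this interpreter.
user_site = site.getusersitepackages()

# Typically False inside a virtual environment, where the per-user
# directory is not consulted.
user_site_enabled = bool(site.ENABLE_USER_SITE)

print(user_site)
print(user_site_enabled)
```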

pip downloads packages from the Python Package Index (PyPI), which hosts hundreds of thousands of projects, including all the packages of the scientific Python ecosystem.

Testing the installation

An easy way to verify that the installation was successful is to import the installed packages in an interactive Python session. For example, open an IPython terminal [1] by typing ipython on the command line. Once in the session, try importing the packages you want to use:

In [1]:  import numpy as np

In [2]:  import matplotlib.pyplot as plt
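The same check can be scripted. The sketch below uses only the standard library to report whether each package can be found and, when possible, its installed version. The helper name report_installed and the package list are assumptions made for this example.

```python
from importlib import metadata, util


def report_installed(packages):
    """Map each import name to its version, "unknown", or None if missing.

    `packages` maps a top-level import name to its distribution name on PyPI.
    """
    report = {}
    for import_name, dist_name in packages.items():
        if util.find_spec(import_name) is None:
            report[import_name] = None  # not importable at all
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata was found.
            report[import_name] = "unknown"
    return report


packages = {"numpy": "numpy", "matplotlib": "matplotlib", "pandas": "pandas"}
for name, version in report_installed(packages).items():
    print(name, version if version is not None else "NOT INSTALLED")
```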

You’re now ready to start using the tools of the scientific Python ecosystem!


  1. Note that ipython was installed in the previous step.

Next Steps

[DRAFT] This video has not been recorded yet.

Scientific Python is built on the Python programming language. Using Scientific Python therefore requires a firm grasp of Python itself. We suggest reading through the official tutorial, doing an online tutorial on Exercism, or using any of the countless resources that exist online or in print.

Learning a new language can be challenging, but Python is fun—so keep trying and hang in there! The community is there to help you along the way.

So let’s cover some basics.

How to run your Python code

Python is an interpreted language: the interpreter reads a text file of instructions and executes them one by one.

The easiest way to create a text file is in an editor such as Spyder or VS Code. We can do that right now. Let’s create a file called hello.py:

print("Hello world")

And then run it:

python hello.py
Hello world

That’s it, your first Python program!

You can also play around with Python code interactively in IPython:

[launch IPython and run:]

In [1]: def fibonacci(n):
   ...:     a, b = 0, 1
   ...:     for _ in range(n):
   ...:         a, b = b, a + b
   ...:     return a
   ...:

In [2]: fibonacci(10)
Out[2]: 55

Another way to play with Python code is in JupyterLab, an interactive web application for typing in and executing Python code. Let me show you how to do a simple plot in Jupyter:

[Open JupyterLab; create notebook; import matplotlib.pyplot as plt; plt.plot([1, 2, 3])]

You can head over to https://try.jupyter.org to test it out.

Hello NumPy

What distinguishes most scientific codes from general ones is that they operate on collections of numbers. These are often represented as NumPy arrays—they are fast, and they have convenient syntax.
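To make the "convenient syntax" point concrete, here is a minimal comparison of squaring a handful of numbers with a plain Python list and with a NumPy array. The values are arbitrary.

```python
import numpy as np

values = [0.5, 1.5, 2.5, 3.5]

# Plain Python: square each element with an explicit loop.
squares_list = [v**2 for v in values]

# NumPy: the same operation written once for the whole array.
arr = np.array(values)
squares_arr = arr**2

print(squares_list)
print(squares_arr)
```

Both give the same numbers. The array version avoids the explicit loop, and for large arrays the work runs in compiled code rather than in the interpreter.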

Let’s generate 1000 random numbers and square them:

[In IPython]

import numpy as np
import matplotlib.pyplot as plt

# Generate 1000 random numbers, store in x
x = np.random.random(size=1000)

# Square them and store in y
y = x**2

# Plot the results as individual points (x is unsorted, so a
# connecting line would zigzag across the figure)
plt.plot(x, y, ".")
plt.show()

Learn more!

We’ll post a list of links below the video where you can learn more.

By far the best way to learn, however, is to start coding!

Stuck?

The first thing to do when stuck is to read the documentation. Note that almost all libraries ship with documentation right at your fingertips!

[illustrate how to look up the docstring for np.linspace]
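In IPython, typing np.linspace? displays the docstring, and help(np.linspace) does the same in any Python session. As a rough sketch of doing this programmatically, the standard-library inspect module can fetch the same text:

```python
import inspect

import numpy as np

# inspect.getdoc returns the cleaned-up docstring of any Python object.
doc = inspect.getdoc(np.linspace)

# Print only the first few lines; the full text is much longer.
for line in doc.splitlines()[:5]:
    print(line)
```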

If you are still stuck, join the community forum at https://discuss.scientific-python.org or reach out to the relevant package on its mailing list.

Good luck!