13 SciPy and Fitting#

Goal#

Learn how to use SciPy for advanced statistical analysis and curve fitting. These techniques are essential for analyzing experimental data and validating scientific hypotheses.

Prerequisites#

1. Introduction#

SciPy (Scientific Python) is a library built on NumPy that provides advanced mathematical functions for:

  • Statistical analysis and hypothesis testing

  • Curve fitting and interpolation

  • Integration and differentiation

  • Linear algebra

  • Signal and image processing

  • Optimization

In this tutorial, we focus on statistics and curve fitting—critical skills for scientific research.

2. Installation#

SciPy ships with the full Anaconda distribution. With Miniconda or a plain Python installation, install it explicitly:

pip install scipy

3. Basic Statistics#

3.1 Descriptive Statistics#

import numpy as np
from scipy import stats

data = [25, 30, 28, 35, 32, 29, 33, 31, 28, 34]

# Mean, median, standard deviation
print(np.mean(data))         # 30.5
print(np.median(data))       # 30.5
print(np.std(data))          # 2.94...

# Using scipy.stats
print(stats.describe(data))
# DescribeResult(nobs=10, minmax=(25, 35), mean=30.5, variance=9.61..., skewness=-0.18..., kurtosis=-0.85...)
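Note that np.std uses the population formula (ddof=0) by default, while stats.describe reports the sample variance (ddof=1), which is why the two outputs above do not match directly. A quick sketch of the difference:

```python
import numpy as np

data = [25, 30, 28, 35, 32, 29, 33, 31, 28, 34]

# ddof=0 (default): divide by n -> population standard deviation
print(np.std(data))          # 2.94...

# ddof=1: divide by n-1 -> sample standard deviation,
# consistent with the variance reported by stats.describe
print(np.std(data, ddof=1))  # 3.10...
```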

3.2 Percentiles and Quantiles#

import numpy as np

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Percentiles
print(np.percentile(data, 25))  # 25th percentile (Q1)
print(np.percentile(data, 50))  # 50th percentile (median)
print(np.percentile(data, 75))  # 75th percentile (Q3)
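SciPy can also compute the interquartile range (Q3 − Q1) directly:

```python
import numpy as np
from scipy import stats

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# IQR = 75th percentile - 25th percentile
iqr = stats.iqr(data)
print(iqr)  # 45.0
```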

4. Hypothesis Testing#

4.1 T-Test (Compare Two Samples)#

Testing if two datasets have different means:

from scipy import stats

sample1 = [23, 25, 27, 29, 31, 33]  # Group A
sample2 = [28, 30, 32, 34, 36, 38]  # Group B

# Two-sample t-test
t_statistic, p_value = stats.ttest_ind(sample1, sample2)

print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")

if p_value < 0.05:
    print("The groups are significantly different (p < 0.05)")
else:
    print("No significant difference between groups")

Interpretation:

  • P-value < 0.05: reject the null hypothesis; the group means differ significantly

  • P-value ≥ 0.05: insufficient evidence of a difference (this does not prove the means are equal)
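Besides comparing two groups, a one-sample t-test checks whether a single sample's mean differs from a hypothesized value. A short sketch (the reference value 30 here is just an illustrative choice):

```python
from scipy import stats

sample = [23, 25, 27, 29, 31, 33]

# Does the sample mean (28) differ significantly from 30?
t_stat, p_value = stats.ttest_1samp(sample, popmean=30)

print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.4f}")
```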

4.2 Correlation Test#

Test if two variables are correlated:

from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

correlation, p_value = stats.pearsonr(x, y)

print(f"Correlation: {correlation:.3f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Significant correlation found!")

4.3 Chi-Square Test#

Test association between categorical variables:

from scipy.stats import chi2_contingency

# Contingency table: observed frequencies
observed = [[10, 20], [15, 25]]

chi2, p_value, dof, expected = chi2_contingency(observed)

print(f"Chi-square: {chi2:.3f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")

5. Curve Fitting#

5.1 Linear Fitting#

Fit data to a line: \(y = mx + b\)

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Generate data with noise
x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2.1, 3.9, 6.1, 8.0, 9.9])  # Approximate y = 2x

# Define linear function
def linear(x, m, b):
    return m * x + b

# Fit
params, covariance = curve_fit(linear, x_data, y_data)
m, b = params

print(f"Slope (m): {m:.3f}")
print(f"Intercept (b): {b:.3f}")

# Plot
x_fit = np.linspace(0, 6, 100)
y_fit = linear(x_fit, m, b)

plt.scatter(x_data, y_data, label="Data")
plt.plot(x_fit, y_fit, label=f"Fit: y={m:.2f}x+{b:.2f}")
plt.legend()
plt.show()
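The covariance matrix returned by curve_fit also yields one-sigma uncertainties on the fitted parameters, as the square roots of its diagonal entries:

```python
import numpy as np
from scipy.optimize import curve_fit

x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2.1, 3.9, 6.1, 8.0, 9.9])

def linear(x, m, b):
    return m * x + b

params, covariance = curve_fit(linear, x_data, y_data)

# One-sigma parameter uncertainties from the covariance diagonal
m_err, b_err = np.sqrt(np.diag(covariance))
print(f"m = {params[0]:.3f} ± {m_err:.3f}")
print(f"b = {params[1]:.3f} ± {b_err:.3f}")
```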

5.2 Polynomial Fitting#

Fit data to a quadratic polynomial: \(y = ax^2 + bx + c\)

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([1, 4, 9, 16, 25])  # y = x²

# Define quadratic function
def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

# Fit
params, cov = curve_fit(quadratic, x_data, y_data)
a, b, c = params

print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")

# Plot
x_fit = np.linspace(0, 6, 100)
y_fit = quadratic(x_fit, a, b, c)

plt.scatter(x_data, y_data, label="Data")
plt.plot(x_fit, y_fit, label=f"Fit: y={a:.2f}x²+{b:.2f}x+{c:.2f}")
plt.legend()
plt.show()

5.3 Exponential Fitting#

Fit data to: \(y = A e^{Bx}\)

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

# Exponential data
x_data = np.array([0, 1, 2, 3, 4])
y_data = np.array([1, 2.7, 7.4, 20.1, 54.6])

# Define exponential function
def exponential(x, A, B):
    return A * np.exp(B * x)

# Fit
params, cov = curve_fit(exponential, x_data, y_data)
A, B = params

print(f"A={A:.3f}, B={B:.3f}")

# Plot
x_fit = np.linspace(0, 5, 100)
y_fit = exponential(x_fit, A, B)

plt.scatter(x_data, y_data, label="Data")
plt.plot(x_fit, y_fit, label=f"Fit: y={A:.2f}e^({B:.2f}x)")
plt.legend()
plt.show()
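Exponential fits can fail to converge when the optimizer starts far from the solution. Passing a rough initial guess via the p0 argument usually helps; the starting values (1.0, 1.0) below are simply a plausible choice for this data:

```python
import numpy as np
from scipy.optimize import curve_fit

x_data = np.array([0, 1, 2, 3, 4])
y_data = np.array([1, 2.7, 7.4, 20.1, 54.6])

def exponential(x, A, B):
    return A * np.exp(B * x)

# p0 gives the optimizer a reasonable starting point
params, cov = curve_fit(exponential, x_data, y_data, p0=(1.0, 1.0))
A, B = params

print(f"A={A:.3f}, B={B:.3f}")
```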

6. Goodness of Fit#

Coefficient of Determination (R²)#

Measure how well the fit explains the data:

from scipy.optimize import curve_fit
from sklearn.metrics import r2_score
import numpy as np

x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2.1, 3.9, 6.1, 8.0, 9.9])

def linear(x, m, b):
    return m * x + b

params, _ = curve_fit(linear, x_data, y_data)
y_fit = linear(x_data, *params)

r2 = r2_score(y_data, y_fit)
print(f"R² = {r2:.4f}")  # 1.0 = perfect fit; near 0 (or negative) = poor fit
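If you prefer not to depend on scikit-learn, R² can be computed directly from the residual and total sums of squares:

```python
import numpy as np
from scipy.optimize import curve_fit

x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2.1, 3.9, 6.1, 8.0, 9.9])

def linear(x, m, b):
    return m * x + b

params, _ = curve_fit(linear, x_data, y_data)
y_fit = linear(x_data, *params)

# R² = 1 - SS_res / SS_tot
ss_res = np.sum((y_data - y_fit) ** 2)
ss_tot = np.sum((y_data - np.mean(y_data)) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"R² = {r2:.4f}")
```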

7. Unit Conversion and Constants#

SciPy provides physical constants:

from scipy.constants import c, G, k, e

print(f"Speed of light: {c} m/s")
print(f"Gravitational constant: {G} m³/(kg·s²)")
print(f"Boltzmann constant: {k} J/K")
print(f"Elementary charge: {e} C")
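For values with units and standard uncertainties, SciPy also exposes the CODATA table through scipy.constants.physical_constants:

```python
from scipy import constants

# Each entry is a (value, unit, uncertainty) tuple
value, unit, uncertainty = constants.physical_constants["electron mass"]

print(f"Electron mass: {value} {unit} (± {uncertainty})")
```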

8. Integration and Differentiation#

Numerical Integration#

from scipy import integrate
import numpy as np

def f(x):
    return x**2

# Integrate from 0 to 2
result, error = integrate.quad(f, 0, 2)
print(f"Integral: {result:.4f}")  # Should be 8/3 ≈ 2.667
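For differentiation of sampled data, NumPy's np.gradient computes a finite-difference derivative. A sketch for f(x) = x², whose exact derivative is 2x:

```python
import numpy as np

# Sample f(x) = x² on a grid
x = np.linspace(0, 2, 201)
y = x ** 2

# Central finite differences at interior points
dy_dx = np.gradient(y, x)

print(dy_dx[100])  # derivative at x = 1.0, close to 2.0
```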

9. Practical Example: Analyzing Experimental Data#

import numpy as np
from scipy import stats
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Hypothetical experiment: measuring voltage vs. current
voltage = np.array([1.0, 2.1, 3.0, 4.1, 5.0])
current = np.array([0.10, 0.21, 0.30, 0.41, 0.50])  # Ohm's law: V = I*R

# 1. Calculate resistance (slope)
def ohms_law(I, R):
    return I * R

params, cov = curve_fit(ohms_law, current, voltage)
R = params[0]
print(f"Resistance: {R:.2f} Ω")

# 2. Calculate uncertainty
R_uncertainty = np.sqrt(np.diag(cov))[0]
print(f"Uncertainty: ±{R_uncertainty:.3f} Ω")

# 3. Plot
I_fit = np.linspace(0, 0.6, 100)
V_fit = ohms_law(I_fit, R)

plt.scatter(current, voltage, label="Measured")
plt.plot(I_fit, V_fit, label=f"Fit: V={R:.2f}I")
plt.xlabel("Current (A)")
plt.ylabel("Voltage (V)")
plt.legend()
plt.show()
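A quick sanity check on any fit is to inspect the residuals (measured minus fitted values); for a good model they should scatter randomly around zero with no visible trend:

```python
import numpy as np
from scipy.optimize import curve_fit

voltage = np.array([1.0, 2.1, 3.0, 4.1, 5.0])
current = np.array([0.10, 0.21, 0.30, 0.41, 0.50])

def ohms_law(I, R):
    return I * R

params, cov = curve_fit(ohms_law, current, voltage)

# Residuals: measured voltage minus the model prediction
residuals = voltage - ohms_law(current, params[0])
print(residuals)
```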

10. Resources#

Next Steps#

You now have solid foundational knowledge of Python for scientific computing! To apply these skills in collaborative projects, proceed to: 14 Creating Packages.