Curve Fitting with Python for Biology

In biology or pharmacy, I think some people want to get a dose-response curve to calculate IC50 or EC50 values.
Many students probably use R or ImageJ, but I wanted to do it using only Python.
You can do a simple regression with scipy.optimize.curve_fit(), but it often fails quite often.
So, I looked for a better method that with higher success rate (good Fitting even with a few data to plot).
The method is the same as the method using Basin-Hopping in the following article.
Fitting with Python
So, I customized it because bio students more demand CurveFitting with Sigmoid's 4 parameters.
The code is below.

"""curve_fitting.py
4 parameters curve fitting
"""
import numpy as np
from scipy import optimize
from typing import Tuple


def fourPL(x, A, B, C, D):
    """fitting function"""
    return A + ((B - A) / (1 + (10 ** ((C - x) * D)))) 
    # Please check out below for other equations
    # https://www.hulinks.co.jp/support/kaleida/cf.html


def calc_cost(params, x, y):
    """LossFunction"""
    return ((y - fourPL(x, *params)) ** 2).sum()  # RMSE


def generate_initial_params(
    xdata_opt: np.ndarray, ydata_opt: np.ndarray
) -> Tuple[float, float, float, float]:
    """generate initial 4 params
    Args:
        xdata_opt (np.ndarray): X data
        ydata_opt (np.ndarray): Y data
    Returns:
        Tuple[float, float, float, float]: generated initial params
    """
    A = ydata_opt.min()
    B = ydata_opt.max()
    C = (xdata_opt.min() + xdata_opt.max()) / 2
    if ydata_opt[0] <= ydata_opt[-1]:
        D = 1
    else:
        D = -1
    return A, B, C, D


def randomstep_fit(xdata_opt: np.ndarray, ydata_opt: np.ndarray) -> None:
    """Fitting
    Args:
        xdata_opt (np.ndarray): X data
        ydata_opt (np.ndarray): Y data
    """
    params = generate_initial_params(xdata_opt, ydata_opt)
    minimizer_kwargs = {"args": (xdata_opt, ydata_opt)}
    try: # Basin-Hopping
        # Please optimize caluculation parameters
        result = optimize.basinhopping(
            calc_cost,
            params,
            niter=512,
            stepsize=2,
            interval=50,
            seed=314,
            minimizer_kwargs=minimizer_kwargs,
        )
        return result.x
    except RuntimeWarning or RuntimeError:
        pass

As in the aforementioned article, initial values are important, so I put lots of effort into generating initial values.

The four parameters A, B, C, and D of the formula used in this study are as follows

A: Minimum value of the regression data Y
B: Maximum value of Y in the regression data
C: 50% change concentration (IC50)
D: slope of the tangent line when x=C

Since it is difficult to predict the values of C and D accurately, they are fixed to reasonable values. D is divided into two cases, depending on whether the graph is rising or falling.
※The meaning of the parameters depends on the formula, so please check and use them.
Explaining with a figure, like this↓

Then Just find parameters that minimize the Loss Fuction by Basin-Hopping with the generated initial values.
I also don't know well about Basin-Hopping, but it is a nice parameter optimizer that finds the global minimum while optimizing the parameters.
See below pages.It also explain about the arguments to scipy.optimize.basinhopping().

This will successfully fit about 9.5 out of 10 times, even when 3 points data. However, it takes a little longer time to calculate than the normal curve_fit(). (About 10sec per Fitting?).
I hope this will be helpful to someone.

DEV Community

Curve Fitting with Python for Biology

Top comments (0)

Read next

Piperidine Alkaloids: Natural Sources and Medical Benefits

How to Enable Named Pipe Activation in Windows 11?

Securing the Cloud #29

Java POJO