DEV Community π©βπ»π¨βπ» is a community of 967,911 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

Lucas Shen

Posted on

Making coefficient plots in Python using Forestplot

This post illustrates how one can use the open-source `Forestplot` package to plot estimates with confidence intervals.

This package plots correlation coefficients or regression estimates from upstream analyses (see this example of correlation analysis).

Prepare the package and load the data

To install the package from PyPI:

``````pip install forestplot
``````

Load example dataset that reports how certain factors correlate with the amount of sleep one gets:

``````import forestplot as fp

df = fp.load_data("sleep")  # companion example data
``````
var r moerror label group ll hl n power p-val
0 age 0.0903729 0.0696271 in years age 0.02 0.16 706 0.67 0.0163089
1 black -0.0270573 0.0770573 =1 if black other factors -0.1 0.05 706 0.11 0.472889
2 clerical 0.0480811 0.0719189 =1 if clerical worker occupation -0.03 0.12 706 0.25 0.201948

In the above `dataframe`, each row is an individual characteristic with a corresponding correlation coefficient from correlating the characteristic with the amount of sleep one gets per night.

The first row, `age`, for instance, with a correlation coefficient of `0.09 (p = 0.016)`, says that people who are older get more sleep.
(See this notebook to see how the correlation coefficients are computed from the real `sleep75.csv` data.)

Plot the estimates

Forest plots (or coefficient plots, dot plots, coefplots) are useful to visualize the estimates and their confidence intervals.

To plot the estimates in `df`:

``````fp.forestplot(df,  # the dataframe with results data
estimate="r",  # col containing estimated effect size
ll="ll", hl="hl",  # columns containing conf. int. lower and higher limits
varlabel="label",  # column containing variable label
ylabel="Confidence interval",  # y-label title
xlabel="Pearson correlation"  # x-label title
)
``````

Customizing and adding annotations (Pt. 1)

You can add variable group subheadings (e.g. the Labor Factors subheading) and sort the estimates (within groups). You can also sort the order of the variable group subheadings (`group_order`):

``````fp.forestplot(df,  # the dataframe with results data
estimate="r",  # col containing estimated effect size
moerror="moerror",  # columns containing conf. int. margin of error
varlabel="label",  # column containing variable label
# group ordering
group_order=["labor factors", "occupation", "age", "health factors",
"family factors", "area of residence", "other factors"],
sort=True  # sort in ascending order (sorts within group if group is specified)
)
``````

Customizing and adding annotations (Pt. 2)

You can also add more annotations to the plot, such as the sample size (e.g. `N` and `formatted_pval`) and add table lines:

``````fp.forestplot(df,  # the dataframe with results data
estimate="r",  # col containing estimated effect size
ll="ll", hl="hl",  # lower & higher limits of conf. int.
varlabel="label",  # column containing the varlabels to be printed on far left
pval="p-val",  # column containing p-values to be formatted
annote=["n", "power", "est_ci"],  # columns to report on left of plot
rightannote=["formatted_pval", "group"],  # columns to report on right of plot
groupvar="group",  # column containing group labels
group_order=["labor factors", "occupation", "age", "health factors",
"family factors", "area of residence", "other factors"],
xlabel="Pearson correlation coefficient",  # x-label title
xticks=[-.4,-.2,0, .2],  # x-ticks to be printed
sort=True,  # sort estimates in ascending order
table=True,  # Format as a table
**{"marker": "D",  # set maker symbol as diamond
"markersize": 35,  # adjust marker size
"xlinestyle": (0, (10, 5)),  # long dash for x-reference line
"xlinecolor": ".1",  # gray color for x-reference line
"xtick_size": 12,  # adjust x-ticker fontsize
}
)
``````

Final remarks

Planned future enhancements include allowing for multiple estimates per row in the plot.

Forest plots have many aliases. Other names include coefplots, coefficient plots, meta-analysis plots, dot plots, dot-and-whisker plots, blobbograms, margins plots, regression plots, and ropeladder plots.

This posts hopefully gives my `forestplot` package some visibility. At the the same time, happy to hear comments about the API's ease of use and features. plot. See the GitHub repo readme for a more substantial documentation.

Forestplot

Easy API for forest plots.
A Python package to make publication-ready but customizable forest plots

This package makes publication-ready forest plots easy to make out-of-the-box. Users provide a `dataframe` (e.g. from a spreadsheet) where rows correspond to a variable/study with columns including estimates, variable labels, and lower and upper confidence interval limits. Additional options allow easy addition of columns in the `dataframe` as annotations in the plot.

Release
Status
Coverage
Python
Docs
Meta

Installation

Install from PyPI

`pip install forestplot`

Install from source

```git clone https://github.com/LSYS/forestplot.git
cd forestplot
pip install .```

Quick start

```import forestplot as fp

df = fp.load_data("sleep")  # companion example data