Last time we discussed the WHY, now it's time to discussed the HOW. This series is based on python-pptx library, a small and great library but it's not most beginner friendly, we will try to make more friendly in this part.
After installing the library
pip install python-pptx, things maybe overwhelming if you navigate the library's documentation, to save you the trouble I will break down what you will need from the library, and once you can do simple template population, you may feel free to delve into the docs and explore more cool stuff!
The first thing you need to do is to import Presentation(), this is the core of what we will do, you can open a copy of your template using this function into a variable to use later on. The argument the function needs is the name/path of the template presentation, personally I keep my templates in a different location from where my code resides for various reason, if you want to know how to use the filesystem or my reasons, you may check this article. The function returns a presentation object prs.
from pptx import Presentation prs = Presentation('./market/data/temp.pptx')
From the first part, we broke down the presentation into a tree of hierarchy, the second level is the slides level,
prs.slides is an iteratable that allows you to loop through the slides or access slides directly using list notation starting with 0 for the first slide.
for sld in prs.slides: print(sld) print(prs.slides)
The realy intresting part from this hierarchy is the shapes. I talked about it quit a bit in the first part because of how important it's so get familiar with it before we processed.
The shapes can be accessed from a single slide object, let's say that we are interested in the first slide for now, we can get the shapes in a similar fashion to the slide like this
prs.slides.shapes. This returns an iteratable object,
to check for a specific shape, you have to iterate in a loop and compare with the name of the shape (you named them in the selection pane inside powperpoing) or by the the type number to get a specific type of shapes.
for shp in prs.slides.shapes: if shp.name == 'selectmeplz': print('hello world') if shp.shape_type == 3: print('I am a chart!!!')
diving down the shapes, we find 3 major types that you will mostly need in all presentations. The first one is the basic or box shape, these are the titles, headers, footers and so on. These are very basic and you only manipulate the text inside them.
shp.text is the method to use, you can get the text inside this shpae or you can feed it new text like this
shp.text = 'changemeplz'.
shp.chart are a critical part of any presentation and the main reason we do analytical presentations in the first place. There are different kinds of charts, but one of them is more common so it will be covered here. The common type of charts is called categorical chart, this is any chart that has a categorical axis, this includes bar charts and line charts single or stacked. The basic idea is simple, you create your chart in PowerPoint, add any formatting or modification, name it then use python pptx to populate it.
The categorical chart consists of two parts only, a category and the series/s. To work on this chart you need to import the
CategoryChartData function, this returns a chart object as we will see.
Let's say you have your data in a tabular form, where your index or categorical axis is know, this can be text, dates or numbers, but they should be unique. Each column you have left can be treated as a series in you chart. This following code will clarify things more:
from pptx.chart.data import CategoryChartData chart = shp.chart data = CategoryChartData() data.categories = df['your_index_or_main_axis'] for col in cols: data.add_series(col_name,df[col]) chart.replace_data(data)
Adding series to your data object can be done either for only 1 series or more, I do it in a loop so you get the general idea. The series could be different company sales data vs date, so your chart will be a group of trend liens with the x axis as date and so on.
once you add your series and set your categorical axis, you then replace the data using the
replace_data chart's method. Note that this method can add more than one series, if your chart is expecting a limited number only, you have to be careful.
This method of replacing the data is great as the chart ranges and axis change dynamically without your input, so you can populate the same chart with 3 series in one slide then with 5 in another slide without any modification from your side. Cool right?!
shp.table are a tricky kind of shape, the table object is more an excel object than a PowerPoint object imo. For some reason table objects don't change dynamically like charts, so if you have a table
with 10 rows, if you feed it 11 rows worth of data, it will not add and extra row, opposite is same, less data will not remove excess rows/columns, you will do some cleaning with you hand for sure. Unlike charts,
you don't work with series or uniform list, rather you work with individual cells starting with index (0,0). So to populate a table, you need to fill each cell with its data! at least this is what I found so far,
so if you have a better way, please tell me!
table = shp.table j = 1 # second column for i in range(0,len(table.rows)): table.cell(i,j).text = i*2
I like to work on tables in a column wise manner, I loop through columns and call a function to populate the rows of that column, this is my preference in the end and you can do it in another way if you like.
For more control on the font formatting of each cell, you may take a look at this answer
Now you have the basic tools to automate that boring report using only a small script. Of course you are not limited to these tools and you may further customize your presentations more and you are free to explore and share your finds with us. In the next part of this series, I will be doing a walkthrough of a simple report, to share my flow and helpful functions that make my life much easier, stay toned.