Exporting Tables and Figures from Stata to PowerPoint
Most data visualizations and tables I create are made for a specific purpose, such as communicating teacher turnover during the COVID-19 pandemic. In these scenarios, there is only one version of each figure.
However, sometimes I need to be able to produce multiple versions of the same figure/table, such as in producing reports for individual school districts using state-wide administrative data.
While I have yet to find a single package in Stata that makes these types of automated reports easy1, I have come up with a workflow for producing several individual PowerPoint presentations for different subsets of data. This page is a simplified guide to this workflow. All code and the template file is available on github.
Broadly, there are three steps:
- Create a template presentation.
- Match filenames of tables/figures to renamed versions in the template.
- Use a loop across reporting groups to create copies of the template with tables/figures updated for each group.
Create a “template” presentation.
PowerPoint presentations are essentially zip files with specific contents inside. If you rename a .pptx
file to end in .zip
, you’ll be able to open and explore the components that make up that presentation. We’re going to use this to our advantage to replace specific figures and tables within a template presentation. By also looping across subgroups within our data, we can obtain individualized PowerPoint reports for each subgroup.
For this post, I’ve created a simple, 5-slide presentation consisting of a title slide, three figures, and one table using the Gapminder dataset2. There’s no meaning or story behind this presentation, but it’s good enough for now. The code to produce these figures/graph can be found in this example’s github repo.
For the figures, I simply inserted them as I normally would. However, there are added steps to inserting and updating a table.
I began with the raw output placed into an excel file (shown above) and then formatted the table to be more visually appealing. Importantly, I didn’t delete any slides or otherwise change which cell in the table data will appear in. When inserting this table into PowerPoint you’ll need to paste it in as an embedded table by right clicking and selecting that option.
Match filenames of tables/figures to renamed versions in the template.
When we insert figures or tables into PowerPoint and save the file, those files are automatically renamed according to the program’s own taxonomy. To update them, we need to figure out what PowerPoint has renamed them to. Fortunately, this is relatively simple:
- Rename your
template.pptx
file to betemplate.zip
. You may need to change a setting to see the file extension. - Open the
.zip
file and navigate toppt/media/
. You should see a list of image files similar to the one below:
- Since I’m exporting to .svg files, I can ignore the .pngs. The .emf file is the table, but we’ll update that elsewhere. I can see that my
graph1.svg
is being stored asimage2.svg
,graph2.svg
asimage4.svg
, etc. - There’s no need to change anything in the file, but make note of these names as we’ll need to refer to them later.
The process for embedded tables is similar:
- Open the
.zip
file and navigate toppt/embeddings/
. You should see one Excel file for each embedded table. - Again, you won’t need to edit these but make note of what each file is stored as so we can refer to it in part 3.
After you’ve copied the file names down somewhere safe, rename your template back to something like template.pptx
.
Looping across groups to create presentations for each group.
In this example, I’m producing separate presentations for each of the five continents in the data set. The code probably has a few unfamiliar components so we’ll look through it one chunk at a time.
// Load data from github repo
use "https://github.com/andrewmcamp/Stata-to-PPT-Example/raw/main/gapminder.dta", clear
// Get the possible values of continents and begin loop
levelsof continent, local(continents)
foreach c in `continents' {
// Get the label for each level of the continent variable to be used when saving the file
local cname: label continent `c'
di "Creating .ppt for `cname'."
So far, we’ve simply downloaded our data from Github and began our loop across the values of the continents variable. Also, since continent is a factor variable we created a local (cname
) to store the name of each continent (e.g., Africa, Asia, etc.).
// Preserving/restoring the data before we limit to each continent
preserve
keep if continent == `c'
We’re doing two things here: (1) saving our unaltered data to restore later and (2) limiting our sample to just those where the continent variable equals whichever continent we’re focusing on for this loop.
// First, clean up any files in the current directory leftover from previous loop
cap erase "template.zip"
cap shell rd "_rels" /s /q
cap shell rd "docProps" /s /q
cap shell rd "ppt" /s /q
cap erase "[Content_Types].xml"
// Next copy our template and rename as a .zip file
shell copy "template.pptx" "template.zip"
// Next, unzip our template
unzipfile "template.zip"
Next, we clean up any files that might have been left over from a previous iteration of the loop, copy our template.pptx
file as template.zip
, and unzip the contents of that .zip
file into the current folder.
// Graph 1: GDP Per Capita vs. Life Expectancy
twoway scatter gdpPercap lifeExp
graph export "ppt/media/image2.svg", replace width(13in) height(7in)
// Graph 2: GDP Per Capita vs. Life Expectancy by Year
twoway scatter gdpPercap lifeExp, by(year)
graph export "ppt/media/image4.svg", replace width(13in) height(7in)
// Graph 3: Bar chart for populations over 1,000,000 in 2007
graph hbar pop if year == 2007 & pop > 1000000, over(country, sort(pop))
graph export "ppt/media/image6.svg", replace width(13in) height(7in)
// Table 1: Average Life Expectancy, Population, and GDP Per Capita by Country
// Getting values to put in using tabstat/tabstatmat
tabstat lifeExp, by(year) statistics(mean sd min max) nosep save
tabstatmat table1, nototal
// Using putexcel to export the summary statistics above
putexcel set "ppt/embeddings/Microsoft_Excel_Worksheet.xlsx", modify
putexcel A2 = matrix(table1), names
Notice here I am exporting figures to have the same dimensions as the ones in the template file and to replace the corresponding renamed .svg
files. Additionally, I am modifying the Excel worksheet instead of replacing it to maintain table formatting.
// To finish this iteration of the loop, we simply zip the folder back up
zipfile _rels docProps ppt "[Content_Types].xml", saving("report_`cname'.pptx", replace)
restore
}
This last step is simple, but important. We need to re-zip the folders and [Content_Types].xml
file into a finalized .pptx
file. Here, I’ve named the final output as report_'cname'.pptx
so that the presentations will have a consistent filename pattern ending in the local I defined at the beginning of the loop.
Wrapping Up
If your template includes a table that needs updating you’ll need to open the output file (e.g., report_Africa.pptx
) and double click on the table to update it.
That’s it - you should now have a bunch of reports that can be customized for specific audiences without using an exuberant amount of your time. This isn’t a perfect process by any means, but has worked really well for my purposes.
If you find bugs or think of improvements let me know! Thanks for reading.