Exporting Tables and Figures from Stata to PowerPoint

Most data visualizations and tables I create are made for a specific purpose, such as communicating teacher turnover during the COVID-19 pandemic. In these scenarios, there is only one version of each figure.

However, sometimes I need to be able to produce multiple versions of the same figure/table, such as in producing reports for individual school districts using state-wide administrative data.

While I have yet to find a single package in Stata that makes these types of automated reports easy1, I have come up with a workflow for producing several individual PowerPoint presentations for different subsets of data. This page is a simplified guide to this workflow. All code and the template file is available on github.

Broadly, there are three steps:

  1. Create a template presentation.
  2. Match filenames of tables/figures to renamed versions in the template.
  3. Use a loop across reporting groups to create copies of the template with tables/figures updated for each group.

Create a “template” presentation.

PowerPoint presentations are essentially zip files with specific contents inside. If you rename a .pptx file to end in .zip, you’ll be able to open and explore the components that make up that presentation. We’re going to use this to our advantage to replace specific figures and tables within a template presentation. By also looping across subgroups within our data, we can obtain individualized PowerPoint reports for each subgroup.

For this post, I’ve created a simple, 5-slide presentation consisting of a title slide, three figures, and one table using the Gapminder dataset2. There’s no meaning or story behind this presentation, but it’s good enough for now. The code to produce these figures/graph can be found in this example’s github repo.

For the figures, I simply inserted them as I normally would. However, there are added steps to inserting and updating a table.

table1_pre

I began with the raw output placed into an excel file (shown above) and then formatted the table to be more visually appealing. Importantly, I didn’t delete any slides or otherwise change which cell in the table data will appear in. When inserting this table into PowerPoint you’ll need to paste it in as an embedded table by right clicking and selecting that option.

table1_post

Match filenames of tables/figures to renamed versions in the template.

When we insert figures or tables into PowerPoint and save the file, those files are automatically renamed according to the program’s own taxonomy. To update them, we need to figure out what PowerPoint has renamed them to. Fortunately, this is relatively simple:

  1. Rename your template.pptx file to be template.zip. You may need to change a setting to see the file extension.
  2. Open the .zip file and navigate to ppt/media/. You should see a list of image files similar to the one below:

contents of ppt/media/ folder

  1. Since I’m exporting to .svg files, I can ignore the .pngs. The .emf file is the table, but we’ll update that elsewhere. I can see that my graph1.svg is being stored as image2.svg, graph2.svg as image4.svg, etc.
  2. There’s no need to change anything in the file, but make note of these names as we’ll need to refer to them later.

The process for embedded tables is similar:

  1. Open the .zip file and navigate to ppt/embeddings/. You should see one Excel file for each embedded table.
  2. Again, you won’t need to edit these but make note of what each file is stored as so we can refer to it in part 3.

After you’ve copied the file names down somewhere safe, rename your template back to something like template.pptx.

Looping across groups to create presentations for each group.

In this example, I’m producing separate presentations for each of the five continents in the data set. The code probably has a few unfamiliar components so we’ll look through it one chunk at a time.

// Load data from github repo
use "https://github.com/andrewmcamp/Stata-to-PPT-Example/raw/main/gapminder.dta", clear

// Get the possible values of continents and begin loop
levelsof continent, local(continents)
foreach c in `continents' {
	// Get the label for each level of the continent variable to be used when saving the file
	local cname: label continent `c'
	di "Creating .ppt for `cname'."

So far, we’ve simply downloaded our data from Github and began our loop across the values of the continents variable. Also, since continent is a factor variable we created a local (cname) to store the name of each continent (e.g., Africa, Asia, etc.).

// Preserving/restoring the data before we limit to each continent
	preserve
		keep if continent == `c'

We’re doing two things here: (1) saving our unaltered data to restore later and (2) limiting our sample to just those where the continent variable equals whichever continent we’re focusing on for this loop.

		// First, clean up any files in the current directory leftover from previous loop
		cap erase "template.zip"
		cap shell rd "_rels" /s /q
		cap shell rd  "docProps" /s /q
		cap shell rd  "ppt" /s /q
		cap erase "[Content_Types].xml"

		// Next copy our template and rename as a .zip file
		shell copy "template.pptx" "template.zip"

		// Next, unzip our template
		unzipfile "template.zip"

Next, we clean up any files that might have been left over from a previous iteration of the loop, copy our template.pptx file as template.zip, and unzip the contents of that .zip file into the current folder.

			// Graph 1: GDP Per Capita vs. Life Expectancy
			twoway scatter gdpPercap lifeExp
			graph export "ppt/media/image2.svg", replace width(13in) height(7in)
			
			// Graph 2: GDP Per Capita vs. Life Expectancy by Year
			twoway scatter gdpPercap lifeExp, by(year)
			graph export "ppt/media/image4.svg", replace width(13in) height(7in)

			// Graph 3: Bar chart for populations over 1,000,000 in 2007
			graph hbar pop if year == 2007 & pop > 1000000, over(country, sort(pop))
			graph export "ppt/media/image6.svg", replace width(13in) height(7in)

			// Table 1: Average Life Expectancy, Population, and GDP Per Capita by Country
			// Getting values to put in using tabstat/tabstatmat
			tabstat lifeExp, by(year) statistics(mean sd min max) nosep save
			tabstatmat table1, nototal

			// 	Using putexcel to export the summary statistics above
			putexcel set "ppt/embeddings/Microsoft_Excel_Worksheet.xlsx", modify
			putexcel A2 = matrix(table1), names

Notice here I am exporting figures to have the same dimensions as the ones in the template file and to replace the corresponding renamed .svg files. Additionally, I am modifying the Excel worksheet instead of replacing it to maintain table formatting.

		// To finish this iteration of the loop, we simply zip the folder back up
		zipfile _rels docProps ppt "[Content_Types].xml", saving("report_`cname'.pptx", replace)
		
restore

}

This last step is simple, but important. We need to re-zip the folders and [Content_Types].xml file into a finalized .pptx file. Here, I’ve named the final output as report_'cname'.pptx so that the presentations will have a consistent filename pattern ending in the local I defined at the beginning of the loop.

Wrapping Up

If your template includes a table that needs updating you’ll need to open the output file (e.g., report_Africa.pptx) and double click on the table to update it.

That’s it - you should now have a bunch of reports that can be customized for specific audiences without using an exuberant amount of your time. This isn’t a perfect process by any means, but has worked really well for my purposes.

If you find bugs or think of improvements let me know! Thanks for reading.



  1. I know it’s possible in R, but sometimes the language used for a projected is determined by what language everyone knows. ↩︎

  2. Free data from World Bank via Gapminder.org, CC-By License. ↩︎

Andrew M. Camp
Andrew M. Camp
Distinguished Doctoral Fellow

I research teacher labor markets, teacher quality, and Covid-19’s impact on students, families, and teachers.