Introduction to Puppet: Part 1

Puppet is a programming language that was designed to automate the administration of infrastructure. What is that? Most programming languages, like C++, Java or Ruby, are designed to build applications. C++ built the web browser you are using to read this, Java built the web server that served your Amazon purchases, and Ruby powered the Stripe transactions you used when you paid that awesome creator for their work.

Puppet is different: it's a language used to build and configure the machines that run applications. This type of programming is declarative: you describe the state you want, and Puppet works out how to get there. Application development languages like Ruby are imperative (and also object-oriented, but that's just a facade on top of imperative programming, fight me).

What's something you can do with Puppet? You can set up a self-hosted blog. In this blog post I want to introduce the Puppet language using the example of this blog. That is: how do you spin up a web server with a blog like this one?

Before we get into the code, we also need to know how this program could be run. In the standard setup, the Puppet code lives on a Puppet Master, which compiles it for a number of subordinate Puppet Agent nodes.

Puppet Master/Agent Architecture

The Puppet code (the manifests) is put into the appropriate directory on a Puppet Master. When an agent checks in, the master compiles that code into a catalog for that agent. The catalog represents a desired state of the system. In control theory, this desired state is called the setpoint.

The Puppet Agent nodes periodically check in with the master (every 30 minutes by default), sending along facts about themselves. The master uses those facts to compile a catalog, which represents the 'setpoint' desired configuration for that agent. The agent applies the catalog to itself and then sends a report back to the master. In control theory speak, the facts and reports close the feedback loop.
[Figure: Puppet master and agents (puppet-master-agents.png)]

Masterless Mode

It's also possible to run Puppet without setting up a master node, and that's what we will do for this example. To build the blog, all you need is a VM or host that you can ssh into. Install Puppet, write down this example manifest, customize it with your blog name, and then run:

puppet apply

And visit the IP address of the machine.

But first let's get to the actual code of the manifest and break it down:
node '' {
  # Ensure the required packages are installed
  package { ['nginx', 'ruby', 'rails']:
    ensure => installed,
  }

  # Configure Nginx to listen on port 80 and proxy to port 3000
  file { '/etc/nginx/sites-available/default':
    ensure  => file,
    content => template('modulename/nginx.conf.erb'),
    require => Package['nginx'],
    notify  => Service['nginx'],
  }

  # Ensure Nginx service is running and enabled
  service { 'nginx':
    ensure => running,
    enable => true,
  }

  # Define a systemd service for the Rails app
  file { '/etc/systemd/system/railsapp.service':
    ensure  => file,
    content => template('modulename/railsapp.service.erb'),
    notify  => Service['railsapp'],
  }

  # Ensure the Rails app service is running and enabled
  service { 'railsapp':
    ensure  => running,
    enable  => true,
    require => File['/etc/systemd/system/railsapp.service'],
  }
}

Node definition

The top level expression:
node '' {...} 

is called the node definition. It holds the resources that apply only to a particular node, or to a group of nodes whose names match a regular expression.
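As a rough sketch of the matching options, a node definition can name a single host, match a regular expression, or fall back to default. The hostnames and the resources inside each block are made up for illustration and are not part of this blog's setup:

```puppet
# Hypothetical hostnames, for illustration only.

# Matches exactly one node by name.
node 'blog1.example.com' {
  package { 'nginx': ensure => installed }
}

# Matches any node whose name fits the regular expression,
# e.g. web1.example.com, web2.example.com, and so on.
node /^web\d+\.example\.com$/ {
  package { 'ruby': ensure => installed }
}

# Used for any node that matched none of the definitions above.
node default {
  notify { 'no specific node definition matched this host': }
}
```

When running masterless on a single machine, node default can be a convenient choice, since it does not depend on what the host happens to be called.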


Packages

This expression sets up the packages to be installed on the node. If you are using Ubuntu, then these packages will be installed through apt.
package { ['nginx', 'ruby', 'rails']:
  ensure => installed,
}
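The ensure attribute accepts more than installed. A minimal sketch of two other common values follows; apache2 is only a hypothetical example of something you might want gone, not a package this blog setup touches:

```puppet
# Keep nginx at the newest version available from the package repository.
package { 'nginx':
  ensure => latest,
}

# Make sure a package is NOT installed (hypothetical example package).
package { 'apache2':
  ensure => absent,
}
```

Because each resource describes a state rather than an action, applying this twice in a row changes nothing the second time.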


Files

Expressions of the form:
file { '/path/to/file': ... }
create files on the nodes matched by the node definition they appear under. This is used for config files, binary executables and scripts: they put your configs and programs into place on your node.
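The file type can also manage directories and symlinks, not just regular files. Here is a small sketch: the /srv/exampleblog path and its contents are hypothetical, and the www-data user is assumed to exist, as it normally does on Ubuntu:

```puppet
# A directory for the app (hypothetical path).
file { '/srv/exampleblog':
  ensure => directory,
  owner  => 'www-data',
  group  => 'www-data',
  mode   => '0755',
}

# A regular file with inline content instead of a template.
file { '/srv/exampleblog/README':
  ensure  => file,
  owner   => 'www-data',
  group   => 'www-data',
  mode    => '0644',
  content => "This directory is managed by Puppet.\n",
}

# A symlink that enables the default nginx site.
file { '/etc/nginx/sites-enabled/default':
  ensure => link,
  target => '/etc/nginx/sites-available/default',
}
```

Inline content is handy for short files. For anything longer, a template like the ones in the manifest is usually easier to maintain.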


Services

Finally, there are services. These define the processes that you want to be running on your node, provided "ensure => running" is present.
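To make the two service attributes concrete: ensure => running means the process should be up right now, while enable => true means it should start at boot. The sketch below also uses subscribe, which is the mirror image of the notify used in the manifest. The conf.d file and its contents are hypothetical:

```puppet
package { 'nginx':
  ensure => installed,
}

# Hypothetical extra config file, for illustration only.
file { '/etc/nginx/conf.d/example.conf':
  ensure  => file,
  content => "# hypothetical extra nginx settings\n",
  require => Package['nginx'],
}

service { 'nginx':
  ensure    => running,  # running right now
  enable    => true,     # and started at boot
  # Restart nginx whenever the config file above changes.
  subscribe => File['/etc/nginx/conf.d/example.conf'],
}
```

Whether you put notify on the file or subscribe on the service is mostly a matter of taste. Both create the same relationship.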


Exec

There's something we left out: exec. Nodes, Packages, Files and Services will get us most of the way there, but in many cases we want side effects (declarative programming tries to avoid side effects but sometimes we need them).

One example could be running a command that downloads ssh public keys to the host. In our example, we are assuming you already have ssh access, but in a case where you didn't, you could set that up in the puppet manifest by adding this expression:
# Fetch and add SSH public keys from GitHub
# Note: the require below only works if a package { 'curl': ... }
# resource is declared elsewhere in the manifest.
exec { 'fetch_ssh_keys':
  command => "curl >> /home/your_user/.ssh/authorized_keys",
  path    => ['/bin', '/usr/bin'],
  unless  => "grep -Fq 'tlehman' /home/your_user/.ssh/authorized_keys",
  require => Package['curl'],
}
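An exec runs on every Puppet run unless something guards it. The example above uses unless; another common guard is refreshonly, which makes the command run only when another resource tells it to. Here is a sketch under assumptions: the exampleapp.service unit and its contents are placeholders, not the real railsapp unit from the manifest:

```puppet
# Hypothetical unit file; the name and contents are placeholders.
file { '/etc/systemd/system/exampleapp.service':
  ensure  => file,
  content => "[Unit]\nDescription=Example app\n\n[Service]\nExecStart=/bin/sleep infinity\n\n[Install]\nWantedBy=multi-user.target\n",
}

# Runs only when the unit file above changes, never on its own.
exec { 'systemd_daemon_reload':
  command     => 'systemctl daemon-reload',
  path        => ['/bin', '/usr/bin'],
  refreshonly => true,
  subscribe   => File['/etc/systemd/system/exampleapp.service'],
}
```

Guards like unless, onlyif, creates and refreshonly are what keep an exec from repeating its side effect on every run.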

Putting it all together

First ssh into your host, then put the manifest above into a file with a .pp extension, for example tlehman_blog_setup.pp.

Make sure puppet is installed (assuming Ubuntu):
sudo apt update
sudo apt install puppet

Then apply the manifest:
sudo puppet apply /path/to/tlehman_blog_setup.pp

Before you apply it, change the '' in the node definition to your blog's URL and hostname. This will only create a one-node Rails blog deployment, but if your readership grows enough that you need to scale up, email me or comment on this post and I'll put together a post about how to do multi-node deployments with load balancers.

What's nice about Puppet manifests is that you now have a file that makes your infrastructure easy to reproduce. Say your host dies, or you want to migrate to a larger EC2 instance: you can simply run puppet apply on your new host.

#puppet #programming #control-theory