Chef: Powerful Infrastructure Automation

Modeling your infrastructure

Now that you're more familiar with some of the terms you need to know, let's take a look at a sample model and map it to Chef's components. At a high level, the approach we will take is as follows:

  1. Define an overview of your infrastructure that is decomposed into roles to be performed within the model (web servers, firewalls, database servers, and so on).
  2. Collect or develop recipes that describe the configuration, software, and actions to be applied for those roles.
  3. Bootstrap hosts with the Chef client so that they can participate in being managed.
  4. Add any required configuration data, such as IP address ranges, hostnames, users, software keys, or anything else that is specific to the active configuration, into data bags so that nodes can use it when running recipes.
  5. Segregate hosts and configurations into different environments to provide a replicated infrastructure (development, staging, production, and so on). This step is optional.

In this chapter, we will be using Chef to build the infrastructure for a multi-tiered, photo-sharing application whose components are diagrammed in the following image:

[Figure: architecture diagram of the multi-tiered photo-sharing application]

Building an architecture diagram gives us a good overview of our system, a map we can refer to before we start building it. It is important to note that a model of our infrastructure doesn't need to map directly to resources (physical or virtual); rather, it provides an abstract, high-level overview of our systems. Once we have the model, we can apply it to the available resources as we see fit.

Our sample service-oriented web application is composed of the following software services:

  • A frontend web application service
  • An image-processing engine
  • An image search engine

Each of these components is a role that is being played in the system. These roles may coexist on shared resources or may be applied to dedicated resources. A service-oriented architecture is a good example to work with for several reasons:

  • It is flexible and scalable
  • It will provide us with a complete system that is composed of multiple independent components to model, making it more interesting as an example

In this example, in addition to these roles, we might want to further configure our infrastructure to provide two different environments: one for staging and integration testing and one for production. Again, because this is a model, our staging environment and production environment will be composed of the same roles and have the same overall architecture; however, each will have different resources and configuration data associated with them. You may choose, for example, to consolidate resources in a test environment in order to keep costs down.

For this initial overview, we will assume that we have an account with a popular cloud-server-hosting company, that the network and operating systems are installed and operational, and that we have a functional and configured Chef service and workstation.

In our hypothetical system, each service can be mapped to a specific role in Chef. To model the infrastructure described, we will build one role per service in our stack, as each service provides very specific features.

Roles

A role describes a part that a system plays in your infrastructure through a combination of recipes to execute and configuration data. These roles can be fine-grained or broadly described, depending on your needs. There are benefits and drawbacks to both approaches: fine-grained roles are smaller and easier to work with but result in a larger number of roles to manage, whereas broadly scoped roles are less flexible and not as reusable.

For example, consider a typical LAMP (Linux, Apache, MySQL, and PHP) stack. The stack could be represented by three roles: an Apache web service with PHP, a MySQL database service, and an OpenSSH service for administration. Alternatively, you could define one role that describes the installation of the MySQL database service, the SSH service, and the Apache service.
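As a sketch of the fine-grained approach, the Apache-with-PHP role might be described by a role JSON file like the following (the cookbook names apache2 and php are assumptions for illustration; any cookbooks providing those services would work):

```json
{
  "name": "lamp_web",
  "description": "Apache web service with PHP",
  "json_class": "Chef::Role",
  "chef_type": "role",
  "default_attributes": {},
  "override_attributes": {},
  "run_list": [
    "recipe[apache2]",
    "recipe[php]"
  ],
  "env_run_lists": {}
}
```

The broadly scoped alternative would simply put all three services' recipes into a single role's run list.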

Roles themselves know nothing about resources; instead, they are a description of how to configure a system in order to fill that role. The system administrator, via knife or the Chef console, assigns roles to the nodes they will be applied to. This may be a one-to-one, one-to-many, or many-to-one mapping, depending upon your capacity planning. At any time, an administrator can change the list of roles applied to a node, adding or removing them as needed. For example, you might decide to apply all your roles to one host today for cost savings, but scale them out in the future as your budget and needs grow.

Defining roles

Let's take a look at some roles we might define to model our SOA application as described earlier in the chapter. Here, we will define fine-grained roles as they are easier to dissect and deploy onto separate nodes later. At a high level, our services need to provide the following roles.

A web application service role

A web application server will need the following:

  • nginx HTTP service
  • Ruby 2.0
  • Memcached service
  • PostgreSQL client libraries
  • Open TCP ports on the external networks: 80 for HTTP and 443 for HTTPS

An image-processing role

This role requires some image-processing libraries and custom software to be installed:

  • ImageMagick libraries
  • Git (to check out the source code)
  • Build tools (to compile our source)
  • The latest version of our image-processing software

An image search role

The image search role is filled by a service that provides image searching through perceptual hashing. This role will require the following:

  • A Java runtime environment (JRE or JDK)
  • Our custom-built service that is developed in Java
  • TCP port 8999 open to internal hosts

A PostgreSQL service role

For the PostgreSQL database service role, the list is as follows:

  • PostgreSQL 9.x server
  • TCP port 5432 open to internal network clients
  • Database backup software to back up data to an external cloud data storage service such as S3

A Solr service role

A system that provides the Apache Solr service will need the following:

  • A compatible Java runtime (Oracle JRE or OpenJDK)
  • TCP port 8993 open to internal servers
  • Apache Solr itself

An OpenSSH service role

An OpenSSH service role will need the following:

  • OpenSSH server
  • TCP port 22 open on all the interfaces

Notice that these roles have no specific host information, such as IP addresses or servers to install the software onto; instead, they are blueprints for the packages we need to install and the configuration those roles will provide, such as open ports. To make these role definitions as reusable as possible, we will write our recipes to draw the required configuration data from node- and role-specific configuration or from our data bags.

In order to define these roles, you will need recipes that describe the sets of steps and operations that will be applied to hosts in order to fulfill each role. For example, the PostgreSQL database server will require you to install PostgreSQL, open the firewall, and so on. These definitions are created by developing recipes that contain the necessary information to perform the tasks required, such as installing packages, generating configuration files, executing commands, and so on. Most of the services mentioned here (our custom imaging software being the likely exception) have cookbooks that already exist and are available for download.
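To give a flavor of what such a recipe looks like, here is a minimal, hypothetical sketch of a recipe that installs and starts a PostgreSQL server; the real postgresql cookbook does considerably more (templating configuration files, managing users, handling multiple platforms, and so on):

```ruby
# Simplified, hypothetical recipe sketch; the community postgresql
# cookbook is far more complete. Package and service names assume Ubuntu.

# Install the PostgreSQL server package from the distribution's repositories
package 'postgresql' do
  action :install
end

# Make sure the service starts on boot and is running now
service 'postgresql' do
  action [:enable, :start]
end
```

Each resource (package, service, and so on) declares a desired state, and the Chef client makes the node match it.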

Implementing a role

Now that you have seen what our infrastructure might look like at a higher level, let's take a look at how we will go about implementing one of our roles in Chef. Here, we will implement the PostgreSQL server role as it is simple to configure and has a very robust cookbook available already.

As mentioned before, you will need to either develop your own cookbooks or download existing ones in order to build your systems. Fortunately, there are a large number of cookbooks already written (over 1,500 as of this writing in the Chef Supermarket) and, as we will see in later chapters, developing new cookbooks is a straightforward process.

In order to define a role, we need to create it; this can be accomplished through a web interface or by using knife. Here, and elsewhere in this book, we will use knife as the way to interact with the Chef service because it provides a consistent experience across self-managed and hosted Chef. So let's get started!

The first thing you will need to do is create a new role with knife, which is as simple as executing the following:

knife role create -d postgresql_server

This will tell knife to connect to your Chef server's API and create a new role named postgresql_server. The -d flag tells knife to skip opening an editor and instead accept the default values. If you want to see what the underlying JSON looks like, omit the -d flag and make sure you have an appropriate EDITOR environment variable set. Once you run this, you can verify that your role was created with the following command:

knife role list

This will show you that you have a single role in the system, postgresql_server. Currently, however, this role is empty and has no information associated with it, just a name and an entry in the system. Now that we have a role in the system, let's look at how we can work with some recipes to make our role do something useful, such as install the PostgreSQL service.

Determining which recipes you need

Recipes are how Chef knows which packages to install, which commands to execute in order to open ports on the firewall, which ports to open, and so on. Like any good cook, Chef has a wide array of cookbooks at its disposal, each of which contains recipes relevant to a particular set of functionality. These cookbooks can either be developed by the system administrator or downloaded from a variety of places such as GitHub, BitBucket, or from a collection of cookbooks maintained by the Chef community on the Chef Supermarket (http://supermarket.getchef.com). We will discuss how to download and get started with some simple recipes and then further discuss how to develop and distribute our own recipes in later chapters.

Considering how we have arranged our roles, we would need recipes to install and configure the following:

  • nginx
  • A PostgreSQL server
  • A PostgreSQL client
  • Ruby 2.0
  • Solr
  • Java
  • OpenSSH
  • A Memcached server
  • Memcached client libraries
  • ImageMagick
  • Git
  • Our custom imaging software (we will call it Image-O-Rama)

Here, we will take an in-depth look at the recipe required for our PostgreSQL server and how we can leverage that to install the service on a host.

Installing a cookbook

Installing a cookbook for use on our clients is quite simple and involves only two steps:

  1. Developing a cookbook, or downloading the cookbook from somewhere.
  2. Uploading the cookbook to the Chef service using knife.

To get started, we will download an existing PostgreSQL cookbook from the Chef cookbook collection and upload it to our Chef service. Note that in order to install the PostgreSQL cookbook, you will also need to install any dependencies that are required. For simplicity, they are provided here as part of the instructions; however, you may find that when you experiment with other cookbooks in the future, you will need to download a few cookbooks before all of the dependencies are met, or use a tool such as Berkshelf for managing them.

To download a cookbook from Chef's provided collection of cookbooks, we will use knife with the following command:

knife cookbook site download <cookbook_name>

In this case, we will need to download five different cookbooks:

  • postgresql
  • build-essential
  • apt
  • chef-sugar
  • openssl

For each of the items in the list, we will download them using the following command:

knife cookbook site download postgresql
knife cookbook site download build-essential
knife cookbook site download apt
knife cookbook site download chef-sugar
knife cookbook site download openssl

Each download will result in an archive being downloaded to your workstation. These archives contain the cookbooks, and you will want to decompress them after downloading. They can be placed anywhere, but it is a good idea to keep them in a common cookbooks directory; something like chef/cookbooks inside your home directory works well.

Once they are downloaded and decompressed, you will need to upload them to the Chef service. This can be done with a single knife cookbook upload command, run from the directory in which you stored your decompressed cookbooks:

knife cookbook upload -o . apt build-essential postgresql chef-sugar openssl

This will upload the five cookbooks we downloaded and tell knife to search the current directory by way of the -o . directive. Once this is done, you can verify that they have been uploaded using the knife cookbook list command.

Once your cookbooks are registered with the Chef service, we can take a look at how to configure and apply the PostgreSQL server role to a new Ubuntu host.

Applying recipes to roles

Now that you have some cookbooks registered with your Chef service, you need to add them to a role's run list in order for their behavior to take effect on any end hosts. The relationship between a recipe and any given node is shown in the following diagram:

[Figure: the relationship between recipes, roles, and nodes]

Because of the nature of this relationship, recipes deliberately have no knowledge of individual nodes. Just as a recipe for chocolate chip cookies has no idea who manufactured the rolling pin and spatula, a Chef recipe is simply a set of instructions on what to do and in what order to perform those actions.

Because we have uploaded our cookbooks to the system, we have already added the recipes contained in those cookbooks; therefore, we can now associate a recipe with our recently created role. If you look at the contents of the recipes directory inside the postgresql cookbook, you will see that there is a server.rb file. This describes a recipe to install the PostgreSQL server and is what we will add to our postgresql_server role in order to perform the actual installation.

To do this, we need to edit our role and add the recipe to its run list; we will do this using knife.

Tip

Ensure that you have a valid text editor in your EDITOR environment variable; otherwise, you will have difficulty editing your entities with knife.

In order to edit our role, we can use the knife role edit command:

knife role edit postgresql_server

This will open the JSON file that represents the postgresql_server role stored in the Chef server in a text editor where you should see the following content:

{
  "name": "postgresql_server",
  "description": "",
  "json_class": "Chef::Role",
  "default_attributes": {
  },
  "override_attributes": {
  },
  "chef_type": "role",
  "run_list": [

  ],
  "env_run_lists": {
  }
}

The most important section of this JSON blob at this moment is the run_list key—this is an array of all the things we want to run. This can be a list of recipes or roles, and each of those has the following naming structure:

  • recipe[cookbook::recipe] for recipes
  • role[role_name] for roles
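As an illustration of this naming convention, the syntax can be captured with a small Ruby check (this helper is hypothetical and not part of Chef itself):

```ruby
# Hypothetical illustration of Chef's run-list entry syntax;
# entries must look like recipe[cookbook::recipe] or role[role_name].
RUN_LIST_ENTRY = /\A(recipe|role)\[[^\[\]\s]+\]\z/

def valid_run_list_entry?(entry)
  !entry.match(RUN_LIST_ENTRY).nil?
end

puts valid_run_list_entry?('recipe[postgresql::server]') # true
puts valid_run_list_entry?('role[postgresql_server]')    # true
puts valid_run_list_entry?('postgresql::server')         # false
```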

Our server recipe inside the postgresql cookbook would therefore be named "recipe[postgresql::server]". This is exactly what we will add to our role's run list JSON. Update the run_list entry from the original value of an empty array:

"run_list": [

],

To include our PostgreSQL server recipe, use the following code:

"run_list": [
  "recipe[postgresql::server]"
],

This is all we need to change for now in order to apply the PostgreSQL server role to our node.

Tip

Notice that we have not added any values to the role's attributes; this means that our recipe will be executed using its default attributes. Most recipes are written with some set of acceptable default values, and the PostgreSQL server recipe is no different.
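If you did want to override a default, you would add it under the role's default_attributes (or override_attributes) key. For example, a role that changes the listening port might contain something like the following sketch; the attribute path shown here is illustrative, so consult the postgresql cookbook's attributes files for the exact names it uses:

```json
"default_attributes": {
  "postgresql": {
    "config": {
      "port": 5433
    }
  }
},
```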

For now, there is no need to modify anything else, so save the JSON file and exit your editor. Doing this will trigger knife to upload your modified JSON in place of the previous JSON value (after doing some validation on your JSON contents), and the role will now have the postgresql::server recipe in its run list. You should see an output from knife indicating that the role was saved, and you can verify that this is the case with a simple knife role show command:

knife role show postgresql_server

This will show you an overview of the role in a more human-readable format than the source JSON. For example, our role should now have one entry in the run list as shown in the following output:

chef_type: role
default_attributes:
description:
env_run_lists:
json_class: Chef::Role
name: postgresql_server
override_attributes:
run_list: recipe[postgresql::server]

Once this is complete, our role is now ready to be applied to one of our nodes. At this point, we have uploaded our cookbooks, defined a role, and associated a recipe with our newly created role. Now let's take a look at the final step: applying our new role to a node.

Mapping your roles to nodes

As has been discussed, roles are a definition of what components need to be brought together to serve a particular purpose; they are independent of the hardware they are applied to. This helps to separate concerns and build reusable components to accelerate the configuration of infrastructure in new arrangements. In order to manifest a role, it must have a node that the role is applied to; in order to manage a node, it must have the Chef client and its dependencies installed and be registered with the Chef service.

Once a node is registered with Chef, you can set node-specific properties, assign roles and run the chef-client tool on the host in order to execute the generated run lists. For our sample application stack, we may have the following hosts running Ubuntu Linux 14.04:

  • cayenne
  • watermelon
  • kiwi

Once they are bootstrapped and registered with the Chef service, we will then decide which roles are to be applied to which nodes. This could yield a configuration that looks like the following:

  • cayenne
    • Web application service role
  • watermelon
    • A PostgreSQL database role
    • A Solr search engine role
  • kiwi
    • An image-processing role
    • An image search role

Without any hardware, roles are just an abstract blueprint for what needs to be configured together to provide a particular type of functionality. Here, we have combined our resources (cloud instances or dedicated hardware) and our recipes to build a concrete instance of our services and software.

In order to apply our newly created role to our host, watermelon, we will need to bootstrap that host, which will install the Chef client on the host and then register it with the Chef service. This is really a simple process, as we will see here, and is achieved using the knife bootstrap command:

knife bootstrap -x root -d ubuntu14.04 <ip address>

Tip

For our example, the node will use an Ubuntu 14.04 host created on DigitalOcean, an inexpensive cloud-hosting provider; you can bootstrap just about any modern Linux distribution, but if you are following along with the commands in the book, you will get the best results by using an Ubuntu 14.04 machine.

This process will go through the steps required to install the Chef client on the node and register it with your Chef service. Once it is complete, you will see that the Chef client has finished with an output similar to the following:

Chef Client finished, 0/0 resources updated in 4.264559367 seconds

If you want to verify that the host has been added, a simple knife node list command will show you that it has been registered with the Chef service. If you don't see the client output above, or you don't see the newly bootstrapped node in your list, make sure that the output of knife bootstrap doesn't indicate that anything went wrong along the way.

Once our node is registered, we can add our postgresql_server role to our node's run list using the following knife command:

knife node run_list set watermelon role[postgresql_server]

This command will set the run list on our new host, watermelon, to contain the postgresql_server role as its only entry. This can be verified using the knife node show command:

knife node show watermelon 

Node Name: watermelon
Environment: _default
FQDN: watermelon
IP: 162.243.132.34
Run List: role[postgresql_server]
Roles:
Recipes:
Platform: ubuntu 14.04
Tags:

Now that the node has a run list with entries, it's time to actually converge the node. Converging a node means that the Chef server compiles the configuration attributes and provides the end host with a complete list of recipes to run, along with the required cookbook data; the node then executes that run list.

Converging a node

Converging a node is done by executing the chef-client command-line utility on the host; this can be done in one of two ways. The simplest way is to SSH into the host and execute chef-client as root; another way is to use knife to execute a command on a set of hosts in parallel, which will be discussed in later chapters. For now, simply SSH into your server and run chef-client as root:

root@watermelon:~# chef-client

The Chef client will connect to the Chef service and download any information needed to execute its complete run list. A node's run list is determined by expanding every entry in the node's run list until it is a list of recipes to execute. For example, our node contains one element in its run list, the postgresql_server role. This role, in turn, contains one entry, the postgresql::server recipe, which means that the fully expanded run list for our node contains only one entry. In this simple case, we could have just added the recipe directly to our node's run list. However, that has a number of shortcomings, including not being able to add extra configuration to all the PostgreSQL servers in our infrastructure, as well as a number of other reasons that will be discussed in later chapters.
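The expansion step can be sketched in plain Ruby. This is a simplified, hypothetical model; the real Chef server also handles environment-specific run lists, nested roles, and deduplication:

```ruby
# Simplified, hypothetical sketch of run-list expansion: role entries are
# recursively replaced by their own run lists until only recipes remain.
ROLES = {
  'postgresql_server' => ['recipe[postgresql::server]']
}

def expand_run_list(run_list)
  run_list.flat_map do |entry|
    if (m = entry.match(/\Arole\[(.+)\]\z/))
      expand_run_list(ROLES.fetch(m[1]))
    else
      [entry]
    end
  end
end

p expand_run_list(['role[postgresql_server]'])
# ["recipe[postgresql::server]"]
```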

In addition to computing the run list, the Chef service will also determine what the final set of configuration data for our node looks like and then deliver it to the client. This is computed according to a set of rules shown later in this chapter. Once that is delivered, the client will also download all the cookbooks needed in order to know how to execute the recipes specified in the final run list. In our case, it will download all five cookbooks that we uploaded previously, and when the client run is complete, the result will be a running PostgreSQL server.

Once the client run is complete, it will report on how long the run took and how many resources were modified. The output should look something like the following:

Chef Client finished, 8/10 resources updated in 61.524995797 seconds

Assuming that nothing went wrong, you should now have a fully functional PostgreSQL server deployed to your new host. This can be verified by looking at the process list for a PostgreSQL process:

root@watermelon:~# ps ax |grep post
11994 ? S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D

There you have it: with only one command, your node has been provisioned as a PostgreSQL database server. Now let's take a look at how we can use some other features of Chef to model our infrastructure.

Environments

Beyond creating roles and having resources to apply them to, there are often requirements around grouping certain resources together to create a distinct environment. An example of this might include building a staging environment that functions exactly like a production environment for the purposes of preproduction testing and simulation. In these cases, we would have an identical set of roles but would very likely be applying them to a different set of nodes and configuration values. Chef achieves this separation through the environment primitive, which provides a way to group separate configurations and hosts together so that you can model similar, yet separate, portions of your infrastructure.

In our hypothetical infrastructure, the three hosts in our production environment may be condensed down to one server in preproduction in order to save money on resources (or for other reasons). To do this, we would need to bootstrap a node, perhaps named passionfruit, and then configure it to have all of the roles applied to it, rather than spreading them out across systems, as shown in the following figure:

[Figure: the production and preproduction environments, with the same roles applied to different nodes]

In the previous image, you can see that each environment has a very similar setup but a different set of IP addresses and resources. Even though our environments differ in scale (production has three nodes, and preproduction has only one), any changes we make will be applied to all of the systems in a consistent manner.

In order to achieve this type of functionality, Chef needs a way to organize and compile the configuration data in order to provide it to the end host when the time comes to configure the host. Now that we understand how to model our systems with Chef, let's take a look at how Chef handles the configuration data to make all of this happen.