Working with data bags
Once your data is in data bags, you can encrypt it to protect sensitive values and search it from the command line or from your recipes.
Securing your data bags
Data bags are just JSON data, but they are stored in the system as plain text, without any security. They are also downloaded onto various hosts throughout their life cycle, which can leak potentially sensitive information. Fortunately, Chef lets you secure this data: knife can keep the data in a data bag item encrypted using a secret key.
Encrypting a data bag item requires a secret key. One way to generate a key is to produce random data and use its Base64 encoding as the key. Remove any line endings from the result so that the key works on all platforms, regardless of platform-specific line endings. Here is a quick way to generate one using the openssl command line tool combined with tr to remove any line endings:
$ openssl rand -base64 512 | tr -d '\r\n' > ~/.chef/data_bag_secret
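If you would rather generate the key from Ruby, the same idea can be sketched with the standard library alone. The helper names generate_data_bag_secret and write_data_bag_secret are hypothetical, chosen for this example; restricting the file to its owner is an extra precaution assumed here, not something the openssl command above does.

```ruby
require "securerandom"
require "base64"

# Returns a Base64-encoded random secret on a single line.
# SecureRandom.base64 does not insert line endings, so no tr-style cleanup
# is needed afterwards.
def generate_data_bag_secret(num_bytes = 512)
  SecureRandom.base64(num_bytes)
end

# Writes the secret to `path`, creating the file with owner-only permissions.
def write_data_bag_secret(path, secret)
  File.open(path, File::WRONLY | File::CREAT | File::TRUNC, 0o600) do |file|
    file.write(secret)
  end
end
```

Calling write_data_bag_secret(File.expand_path("~/.chef/data_bag_secret"), generate_data_bag_secret) would produce a file comparable to the one created by the shell command.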
To encrypt your data bag item, you must pass the --secret or --secret-file flag to knife when creating the item. For example, to create a data bag called credentials and store a new entry, aws, inside it, you would use the following command (make sure you set your EDITOR environment variable first):
$ knife data bag create credentials aws --secret-file ~/.chef/data_bag_secret
As mentioned before, you will be presented with the contents of your new data bag item in your editor, unencrypted:
{
  "id": "aws"
}
Here, we can add some properties, such as a secret key:
{
  "id": "aws",
  "secret_key": "A21AbFdeccFB213f"
}
Once you save this, knife will tell you that the new data bag was created, along with the new data bag item, just as it did with the user data earlier. The only difference is that this time the data stored on the Chef server has been encrypted using the symmetric key you provided:
$ knife data bag create credentials aws --secret-file ~/.chef/data_bag_secret
Created data_bag[credentials]
Created data_bag_item[aws]
To check whether your newly created data bag entry was encrypted, use knife, as we have before, to show the contents of an item:
$ knife data bag show credentials aws
id:         aws
secret_key:
  cipher:         aes-256-cbc
  encrypted_data: SG4z4jd4VAnJ4gG0wPcJWOX7H+ZNSxG5PH+n7EgHFV9e1SciVznjaAbzK61c
  EW0/
  iv:             rKB0riCr84QhBkw+Wgc/5Q==
  version:        1
To decrypt the data in the data bag item, you need to provide the same symmetric key that you used to encrypt it, using the --secret or --secret-file argument, as can be seen here:
$ knife data bag show credentials aws --secret-file ~/.chef/data_bag_secret
id:         aws
secret_key: A21AbFdeccFB213f
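To make the role of the shared secret concrete, here is a minimal Ruby sketch of symmetric encryption that produces the same four fields shown in the knife output (cipher, encrypted_data, iv, and version). The key derivation (a SHA-256 digest of the secret) and the json_wrapper envelope are assumptions made for this illustration; do not rely on the result being interchangeable with items that Chef itself writes. The method names encrypt_value and decrypt_value are hypothetical.

```ruby
require "openssl"
require "base64"
require "json"

# Illustration only: the key derivation and envelope below are assumptions,
# not a guaranteed match for Chef's stored format.
def encrypt_value(value, secret)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = OpenSSL::Digest::SHA256.digest(secret) # 32-byte key from the secret
  iv = cipher.random_iv                               # fresh random IV for each value
  wrapped = JSON.generate("json_wrapper" => value)    # lets any JSON value be encrypted
  ciphertext = cipher.update(wrapped) + cipher.final
  {
    "encrypted_data" => Base64.encode64(ciphertext),
    "iv" => Base64.encode64(iv),
    "version" => 1,
    "cipher" => "aes-256-cbc",
  }
end

# Reverses encrypt_value; only works when given the same secret.
def decrypt_value(item, secret)
  decipher = OpenSSL::Cipher.new(item["cipher"])
  decipher.decrypt
  decipher.key = OpenSSL::Digest::SHA256.digest(secret)
  decipher.iv = Base64.decode64(item["iv"])
  plaintext = decipher.update(Base64.decode64(item["encrypted_data"])) + decipher.final
  JSON.parse(plaintext)["json_wrapper"]
end
```

Because the cipher is symmetric, decrypt_value recovers the original value only when it is given the secret that encrypt_value used; any other secret will normally raise an error rather than return the plain text.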
If it wasn't already obvious, make certain you do not lose this file. Without your secret key or secret file, you will not be able to decrypt the data in your data bag. It may also be worth encrypting the secret file with a passphrase if you are going to transmit it to untrusted locations.
An encryption key can also be stored in an alternate file on the nodes that need it, and you can specify the path to that file in an attribute; however, EncryptedDataBagItem.load expects the actual secret as its third argument rather than a path to the secret file. In this case, you can use EncryptedDataBagItem.load_secret to read the secret file's contents and then pass them to load:
# Inside your attribute file:
default[:app][:aws_creds_secret] = "/opt/secret/aws.secret"

# Inside your recipe:
# Read the secret from the path stored in the attribute, then use it to decrypt the item.
aws_secret = Chef::EncryptedDataBagItem.load_secret(node[:app][:aws_creds_secret])
aws_creds = Chef::EncryptedDataBagItem.load("credentials", "aws", aws_secret)
aws_creds["secret_key"]
Searching your data
As we discussed earlier, you can search through your data bags using Boolean search logic. This lets you find only the data bag entries that you need. The same search query language is used on the command line with knife as in your recipes, so you can test your queries on the command line to ensure that they produce the right results before you put them in your recipes. You can also search through other resources, not just data bags.
The knife tool uses the search command to search through your data bags. The general syntax is:
knife search <source> "<search criteria>"
Inside a recipe, the search method is used to search through a data bag. The syntax for this is:
search(:source, "search criteria")
The search query format is reasonably straightforward and resembles that of most other search engines that support Boolean logic.
Searches on attributes take the form key:value; for example, if you wanted to find all of the users who were dwarves in our earlier data sets, you could use the search query:
knife search users "groups:dwarves"
You can negate a search term by placing NOT in front of it. For example, to find all users who are not hobbits:
knife search users "NOT groups:hobbits"
You can also use an OR modifier:
knife search users "groups:elves OR groups:hobbits"
This last query would yield the users legolas, samwise, and frodo, as Frodo and Samwise are in the group called hobbits and Legolas is in the elves group.
When combining search terms, you can also logically AND them together. For example, all users in the elves group whose GID starts with 20 can be found using the following query:
knife search users "groups:elves AND gid:20*"
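If the Boolean operators are unfamiliar, it can help to see the same four queries written as ordinary Ruby predicates over an in-memory list. This is only an analogy for how the terms combine, not how Chef evaluates searches. The USERS data below is a hypothetical stand-in for the users data bag (the GIDs in particular are invented), and field_matches? and matching_ids are helper names made up for this example.

```ruby
# Hypothetical stand-in for the users data bag; names, groups, and GIDs are
# invented for this example.
USERS = [
  { "id" => "frodo",   "gid" => "1001", "groups" => ["hobbits"] },
  { "id" => "samwise", "gid" => "1002", "groups" => ["hobbits"] },
  { "id" => "legolas", "gid" => "2001", "groups" => ["elves"] },
  { "id" => "gimli",   "gid" => "2002", "groups" => ["dwarves"] },
].freeze

# True when any value stored under `key` matches `pattern` (`*` is a wildcard).
def field_matches?(item, key, pattern)
  Array(item[key]).any? { |value| File.fnmatch(pattern, value.to_s) }
end

# Returns the ids of the items for which the block is true.
def matching_ids(items)
  items.select { |item| yield(item) }.map { |item| item["id"] }
end

# "groups:dwarves"
dwarves = matching_ids(USERS) { |u| field_matches?(u, "groups", "dwarves") }

# "NOT groups:hobbits"
not_hobbits = matching_ids(USERS) { |u| !field_matches?(u, "groups", "hobbits") }

# "groups:elves OR groups:hobbits"
elves_or_hobbits = matching_ids(USERS) do |u|
  field_matches?(u, "groups", "elves") || field_matches?(u, "groups", "hobbits")
end

# "groups:elves AND gid:20*"
elves_with_gid_20 = matching_ids(USERS) do |u|
  field_matches?(u, "groups", "elves") && field_matches?(u, "gid", "20*")
end
```

With this sample data, gid:20* on its own would match both legolas and gimli; adding the AND with groups:elves narrows the result to legolas alone.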
You can search your nodes with the same query language. To find all nodes that are running some form of Windows, you can search for any platform that starts with win:
knife search node "platform:win*"
This will yield all Windows hosts (results have been shortened a bit):
4 items found

Node Name: i-13d0bd4f
Roles:
Platform: windows 6.2.9200

Node Name: WIN-CJDQ9DEOJFK
Roles: umbraco_cms
Platform: windows 6.2.9200

Node Name: 00c0ff3300
Roles:
Platform: windows 6.2.9200

Node Name: rs-5889646228538071
Roles:
Platform: windows 6.2.9200
Or, you can search for all nodes that are running Windows and have the role umbraco_cms:
knife search node "platform:win* AND role:umbraco_cms"
Or, if you want to exclude the nodes that run the Umbraco CMS, you can easily invert the role condition:
knife search node "platform:win* AND NOT role:umbraco_cms"
Because Chef uses Apache Solr to search its data, you can refer to the Apache Solr documentation on building more advanced query logic at http://wiki.apache.org/solr/SolrQuerySyntax.
Managing multiple machines with search queries
Search queries can be used in all sorts of places: in recipes, on the command line, through API calls, and more. One very interesting application is using a search query to SSH to multiple machines and run commands on them in parallel:
knife ssh "fqdn:*.east.mycorp.com AND platform:ubuntu" "chef-client" -x app_user
This will contact the Chef server and ask for the nodes that match the given query string (machines whose FQDN matches the wildcard expression "*.east.mycorp.com" and that are running Ubuntu), then connect to each of them via SSH as the user app_user and run the chef-client command. Again, you can restrict (or expand) the server list by using a more (or less) specific query.
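To picture what running a command on several machines in parallel involves, here is a small Ruby sketch that fans a command out to a list of hosts, one thread per host. It is not how knife ssh is implemented: it assumes a system ssh client on your PATH, it takes the host list as given rather than asking the Chef server, and SSH_RUNNER and run_on_hosts are hypothetical names for this example.

```ruby
require "open3"

# Default runner: shells out to the system ssh client (assumed to be on PATH)
# and captures the combined stdout and stderr of the remote command.
SSH_RUNNER = lambda do |host, user, command|
  output, status = Open3.capture2e("ssh", "#{user}@#{host}", command)
  { host: host, ok: status.success?, output: output }
end

# Runs `command` on every host concurrently, one thread per host, and returns
# the results in the same order as `hosts`.
def run_on_hosts(hosts, user, command, runner: SSH_RUNNER)
  threads = hosts.map do |host|
    Thread.new { runner.call(host, user, command) }
  end
  threads.map(&:value) # waits for each thread and collects its result
end
```

The runner is passed in as a parameter so that the fan-out logic can be exercised without real machines, for example by supplying a lambda that returns a canned result for each host.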
Once you have mastered this aspect of using knife, you can learn more about its support for executing multiple connections concurrently and even interacting with terminal multiplexers such as screen and tmux.