Data Analysis with Stata

Variables and data types

This section looks at the kinds of variables and data types you will work with in Stata.

Indicators or data variables

The browse and edit commands are useful for inspecting the data directly before drawing any conclusions from it. Data variables store the fundamental data. As shown in the following table, the income (GDP) data for different nations is stored in the Cccgdp variable and the population data is stored in the pops variable. To know which country and year each of these values belongs to, indicator variables are needed. In the following case, Countrycode and yr identify the country and the year to which each GDP and population (pops) value refers. The data might be as follows:

After importing the data into Stata, it is always good practice to examine it. Doing so gives you an advantage in any modeling or visualization exercise.

Examining the data

It is a good idea to examine your data when you first read it into Stata: check that all the variables and observations are present and in the correct format.

While the browse and edit commands show the raw data in a spreadsheet-like window, the list command prints observations in the Results window. Listing everything is practical only for small datasets. For bigger datasets, restrict the listing to selected variables or observations. An example is shown as follows:

list country* yr pops

country       countrycode     yr        pops
India         IND             2010      23452.9
U.S.          USA             2010      22222.1
Pakistan      PAK             2010      11111.2
China         CHN             2010      98765
Russia        RUS             2010      19876
Germany       GER             2010      23467

In the preceding command, the star (*) is a wildcard: country* instructs Stata to include every variable whose name begins with country (here, country and countrycode). Alternatively, we could keep all of these variables but list only a limited number of observations, for example, observations 14 to 19:

list country countrycode yr pops in 14/19
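The wildcard and the in range can be tried without the book's data. The following is a minimal sketch that assumes Stata's bundled auto example dataset (loaded with sysuse), so its variable names (make, price, mpg, trunk, turn) are not the ones used in this chapter.

```stata
* Assumption: uses the auto example data shipped with Stata, not the book's dataset
sysuse auto, clear

* Wildcard: every variable whose name starts with t (trunk and turn in auto)
list t* in 1/5

* ds prints the names of the variables that match a pattern, without any data
ds t*

* A hyphen selects a run of variables in dataset order (make, price, mpg)
list make-mpg in 14/19
```

Running ds with the pattern first is a cheap way to check which variables a wildcard will pick up before listing them.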

How to subset the data file using in and if

The previous example used the in qualifier, which restricts a command to a range of observation numbers. Some further examples:

list in 14/19
list in 90/l
list in 30/l

The three commands in the preceding example do the following:

The first command lists observations from 14 to 19
The second command lists observations from 90 till the last observation (l is a lowercase letter L and stands for last)
The third command lists observations from 30 till the last observation
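The ranges described above can be sketched on a dataset that ships with Stata. This example assumes the bundled auto data (74 observations) rather than the book's file, and it adds one form the text does not show: a negative number, which counts from the end of the dataset.

```stata
* Assumption: uses the auto example data shipped with Stata, not the book's dataset
sysuse auto, clear

* Observations 14 through 19
list make price mpg in 14/19

* From observation 70 to the last observation
list make price mpg in 70/l

* A negative start counts back from the end: the last 5 observations
list make price mpg in -5/l
```

The in qualifier depends on the current sort order, so the same range can return different rows after a sort.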

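The if qualifier is the other way of subsetting data. It is followed by an expression that evaluates to true or false for each observation, and the command is applied only to the observations for which it is true. The following example lists the observations for the year 2010, where the year variable is named yr: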

list if yr == 2010
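Conditions after if can be combined, and if can be used together with in. The following is a small sketch on Stata's bundled auto dataset (an assumption made so that the commands run without the book's data). The variables foreign, price, mpg, and rep78 belong to that dataset.

```stata
* Assumption: uses the auto example data shipped with Stata, not the book's dataset
sysuse auto, clear

* Equality is tested with ==, as in the yr == 2010 example
list make price if foreign == 1

* Combine conditions with & (and) and | (or)
list make price mpg if price > 10000 & mpg < 20

* if and in together: only observations 1-40 are considered
list make mpg if mpg > 25 in 1/40

* Missing numeric values compare as larger than any number,
* so exclude them explicitly when using >
list make rep78 if rep78 > 3 & !missing(rep78)
```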

The browse window is used to examine the raw data. In big datasets, however, it is hard to find the few variables you actually want to look at. In that case, give browse a list of the variables you want to examine:

browse country yr popscon

Note that the edit command, unlike browse, allows the dataset to be changed manually. When a dataset is large, checking individual values through the browse or edit commands becomes impractical. In this case, the assert command is helpful: it checks whether a statement about the data is true for every observation. For example, for the population of the country (popscon), the first of the following commands checks that all values are positive, while the second makes the opposite claim:

assert popscon > 0
assert popscon < 0

If the asserted condition is true for every observation, assert does not give any output. However, if it is false for any observation, an error message appears and a running do-file stops.
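Because a failed assert stops a do-file, it is sometimes useful to trap the error instead. The sketch below assumes Stata's bundled auto dataset and its price variable in place of popscon. It uses capture, which suppresses the error and stores the return code in _rc.

```stata
* Assumption: uses the auto example data shipped with Stata, not the book's dataset
sysuse auto, clear

* True for every observation, so assert is silent
assert price > 0

* False for every observation; capture traps the error instead of stopping
capture assert price < 0

* _rc is 0 when the assertion held and nonzero when it failed
display "return code from the failed assert: " _rc
```

Leaving assert uncaptured is usually the better default in a data-cleaning do-file, since an unexpected value should halt the run.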

The describe command reports fundamental information about the dataset and its variables, such as the size of the dataset, the number of observations and variables, and the storage type and display format of each variable. The command can be abbreviated to des. With using, it can also be applied to a file that has not been read into Stata. An example is given as follows:

describe using "E:\Ind-Health-sample.dta"

The codebook command gives detailed information on the variables in the dataset. Without a list of variables, it reports on every variable. With one, it reports only on the variables named, for example, codebook country.
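The different forms of describe and codebook can be compared side by side. This sketch assumes Stata's bundled auto dataset, and it writes a temporary copy of it (the macro name copy is hypothetical) only so that describe using has a file on disk to inspect. It is meant to be run as a do-file or within one session.

```stata
* Assumption: uses the auto example data shipped with Stata, not the book's dataset
sysuse auto, clear

* All variables in the data currently in memory
describe

* Only selected variables
describe make price mpg

* Detailed report for a single variable
codebook foreign

* Save a temporary copy, then describe the file on disk without loading it
tempfile copy
save `copy'
describe using `copy'
```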

The summarize command reports summary statistics: the number of observations, mean, standard deviation, minimum, and maximum. Its output has the following layout:

summarize
Variable         Obs      Mean       Std. Dev.    Min         Max

As we can see in the preceding table, string variables such as Cntry and Countrycode do not hold numbers, which is why no summary statistics are available for them. Yr is a numeric variable, so summary statistics are reported for it. For more details, such as percentiles, add the detail option: summarize, detail.
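The behavior described above, including the empty row for a string variable, can be reproduced on Stata's bundled auto dataset. The sketch assumes that dataset (its make variable is a string, and price and mpg are numeric) and also shows that summarize leaves its results in r() for later use.

```stata
* Assumption: uses the auto example data shipped with Stata, not the book's dataset
sysuse auto, clear

* Basic statistics for two numeric variables
summarize price mpg

* make is a string variable, so summarize reports 0 observations for it
summarize make

* detail adds percentiles, variance, skewness, and kurtosis
summarize price, detail

* Results of the last summarize are stored in r()
display "median price: " r(p50)
display "mean price:   " r(mean)
```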

Stata offers a wide range of graphics. Documentation for them is available by typing help graph in Stata. A histogram can be created through the following command:

graph twoway histogram cccgdps

For a scatter plot, use the following command:

graph twoway scatter cccgdps popscon

Stata's advanced graphics are powerful, but they can be slow to draw. For quick exploratory plots that are not intended for papers or presentations, it can be better to use the older version 7 graphics. This can be seen as follows:

graph7 cccgdps popscon
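The graph commands above can be tried on data that ships with Stata. This is a minimal sketch assuming the bundled auto dataset, with price and weight standing in for the GDP and population variables. The graph names hist1 and sc1 are hypothetical labels chosen for this example.

```stata
* Assumption: uses the auto example data shipped with Stata, not the book's dataset
sysuse auto, clear

* Histogram of one numeric variable, kept in memory under a name
graph twoway histogram price, name(hist1, replace)

* Scatter plot: the first variable goes on the y axis, the second on the x axis
graph twoway scatter price weight, name(sc1, replace)

* The leading "graph" is optional
twoway scatter price weight if foreign == 1, name(sc1, replace)
```

Naming graphs keeps each one in its own window, so a later plot does not overwrite an earlier one.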

Saving the dataset takes a single command:

save "E:\Stata1\t1 less India pwt 80-2010.dta", replace

If a file with the same name already exists, the replace option is needed: it overwrites the existing file with the current version of the data. If the old version is to be kept for some reason, save the data under a different name. Keep in mind that saving a revised dataset over the original file changes the original permanently. Changes made to the data in memory, on the other hand, do not touch the file on disk until you save, so to discard them and start again, just reopen the file.

There are two ways to keep a copy of the data to fall back on. One option is to save the current data before revising it, and later, if you don't want to keep the changes, reopen the saved version. Another option is to use the preserve and restore commands: preserve takes a snapshot of the data in memory, and the data comes back in that state after you type restore.
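The preserve and restore pair can be sketched as follows, assuming Stata's bundled auto dataset rather than the book's file. The example drops observations temporarily and then brings them back.

```stata
* Assumption: uses the auto example data shipped with Stata, not the book's dataset
sysuse auto, clear

* Take a snapshot of the data in memory
preserve

* Temporary change: keep only the foreign cars
keep if foreign == 1
count

* Bring back the snapshot; all observations are present again
restore
count
```

When preserve is issued inside a do-file, Stata restores the data automatically when the do-file ends, even without an explicit restore.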
