Categorical variable characteristics and examples

Simon Doyle

A categorical variable is one used in statistics to assign a non-numerical or qualitative characteristic or property to some object, individual, entity, condition or procedure. Categorical variables of all kinds can be defined to suit each need.

Examples of categorical variables are: color, sex, blood group, marital status, type of material, form of payment or type of bank account. They are used frequently in everyday life.

Figure 1: Color is a categorical variable. Source: pixabay

The above are the variables, and their possible values are qualitative, that is, they describe a quality or characteristic rather than a numerical measurement. For example, the possible values for the variable sex are: Male, Female.

When this variable is stored in a computer program, it can be declared as a text variable whose only accepted values are those already named: Male, Female.

However, the same variable sex can be declared and stored as an integer if Male is assigned the value 1 and Female the value 2. For this reason, categorical variables are sometimes referred to as enumerated types.
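As a minimal sketch of both storage options, the following Python snippet keeps the text labels as the only accepted input and maps them to an enumerated type with the integer codes 1 and 2 mentioned above. The names `Sex`, `ALLOWED_SEX_LABELS` and `parse_sex` are assumptions made for illustration.

```python
from enum import Enum


class Sex(Enum):
    """Enumerated type: each category gets an arbitrary integer code."""
    MALE = 1
    FEMALE = 2


# Text storage: only the labels already named are accepted.
ALLOWED_SEX_LABELS = {"Male", "Female"}


def parse_sex(label: str) -> Sex:
    """Convert a text label to its enumerated value, rejecting anything else."""
    if label not in ALLOWED_SEX_LABELS:
        raise ValueError(f"unknown category: {label!r}")
    return Sex[label.upper()]


print(parse_sex("Female"))        # Sex.FEMALE
print(parse_sex("Female").value)  # 2
```

Either representation carries the same information; the integer is only a code for the label, not a quantity.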

The main characteristic of categorical variables is that, unlike other variables such as continuous and discrete ones, it is not possible to do arithmetic with them. However, statistics can be done with them, as will be seen later.

Article index

  • 1 Examples
  • 2 Classification of categorical variables
    • 2.1 Nominal categories 
    • 2.2 Ordinal categories
    • 2.3 Binary categories
  • 3 Statistics with categorical variables
    • 3.1 Graphic representation of categorical variables
  • 4 Solved exercises
    • 4.1 Exercise 1
    • 4.2 Example 2
    • 4.3 Example 3
  • 5 References

Examples

Note the following examples of categorical variables and their possible values:

-Blood type, Range of values: A, B, AB, O

-Civil status, Categorical values: Single (A), Married (B), Widower (C), Divorced (D).

-Type of material, Categories or values: 1 = Wood, 2 = Metal, 3 = Plastic

-Way to pay, Values or categories: (1) Cash, (2) Debit, (3) Transfer, (4) Credit

In the last two examples, a number has been associated with each category in a completely arbitrary way.

One might think that this arbitrary numerical association turns the variable into a discrete quantitative variable, but it does not, since arithmetic operations cannot be done with these numbers.

To illustrate the idea, in the variable Way to pay, the sum operation does not make sense:

 (1) Cash + (2) Debit will never equal (3) Transfer
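The same point can be sketched in Python, under the assumption that the categories are modeled with the standard `enum.Enum` type: the codes identify categories, but addition is simply not defined for them. The class name `WayToPay` and the helper `try_add` are illustrative names, not part of the original text.

```python
from enum import Enum


class WayToPay(Enum):
    CASH = 1
    DEBIT = 2
    TRANSFER = 3
    CREDIT = 4


def try_add(a: WayToPay, b: WayToPay) -> str:
    """Attempt to add two categories and report what happens."""
    try:
        return str(a + b)  # plain Enum members do not support "+"
    except TypeError:
        return "addition is not defined for categories"


result = try_add(WayToPay.CASH, WayToPay.DEBIT)
print(result)

# The codes still work as identifiers: looking up 3 gives Transfer.
print(WayToPay(3) is WayToPay.TRANSFER)
```

Equality and lookup are the only operations that make sense for the codes of a nominal variable.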

Classification of categorical variables

The classification is based on whether or not the categories have an implicit hierarchy, and on whether the number of possible outcomes is two or more than two.

A categorical variable with only one possible outcome is not a variable, it is a categorical constant.

Nominal categories 

These are categories that cannot be represented by a number and have no order. For example, the variable Material_type has nominal values (Wood, Metal, Plastic). They have no hierarchy or order, even when an arbitrary number is assigned to each answer or category.

Ordinal categories

Variable: Academic performance

Ordinal values: High, Medium, Low

Although the values of this variable are not numeric, they have an implicit order or hierarchy.
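A short sketch of what the implicit order allows: the levels can be compared and sorted, even though they are still not numbers. The list `PERFORMANCE_LEVELS`, the lookup `RANK`, the helper `is_higher` and the sample `grades` are hypothetical names and data chosen for illustration.

```python
# Levels listed from lowest to highest; the list position serves as the rank.
PERFORMANCE_LEVELS = ["Low", "Medium", "High"]
RANK = {level: position for position, level in enumerate(PERFORMANCE_LEVELS)}


def is_higher(a: str, b: str) -> bool:
    """Return True if level a ranks above level b."""
    return RANK[a] > RANK[b]


grades = ["High", "Low", "Medium", "Low"]  # hypothetical sample
sorted_grades = sorted(grades, key=RANK.__getitem__)

print(sorted_grades)                 # ['Low', 'Low', 'Medium', 'High']
print(is_higher("High", "Medium"))   # True
```

The rank is used only for comparison. The distance between Low and Medium is not assumed to equal the distance between Medium and High.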

Binary categories

These are nominal variables with two possible answers, for example:

-Variable: Answer

-Nominal values: True, False

Note that the variable Answer does not have an implicit hierarchy and only has two possible outcomes, so it is a binary categorical variable.

Some authors call this type a binary variable and do not consider it a categorical variable, restricting that term to variables with three or more possible categories.

Statistics with categorical variables

Statistics can be done with categorical variables, even though they are not numerical or quantitative. For example, to find the most likely value of a categorical variable, we take the mode.

The mode is, in this case, the most frequently repeated result or value of a categorical variable. For categorical variables in general, neither the mean nor the median can be calculated.

The mean cannot be computed because arithmetic cannot be done with categorical variables. The median cannot be computed either, because qualitative or categorical variables generally do not have an order or hierarchy, so it is not possible to determine a central value. Ordinal variables are the exception, since their categories can be ranked and a middle category identified.
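As an illustration, the mode of a categorical sample can be found by counting how often each value occurs. The sketch below uses Python's standard `collections.Counter`; the function name `mode_of` and the sample `blood_types` are hypothetical.

```python
from collections import Counter


def mode_of(values):
    """Return the most frequent category together with its count."""
    counts = Counter(values)
    category, count = counts.most_common(1)[0]
    return category, count


blood_types = ["O", "A", "O", "B", "AB", "O", "A"]  # hypothetical sample
print(mode_of(blood_types))  # ('O', 3)
```

If two categories are tied for the highest count, this sketch returns the one that appears first in the data; a sample can have more than one mode.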

Graphic representation of categorical variables

Given a categorical variable, the frequency, or number of times each result of that variable is repeated, can be found. If this is done for each outcome, a graph of the frequency against each category or outcome can be made.
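A minimal sketch of this procedure, assuming plain text output rather than a plotting library: count each category, then draw one bar of `#` characters per category. The functions `frequency_table` and `text_bar_chart` and the sample `materials` are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Count how many times each category occurs, sorted by category name."""
    counts = Counter(values)
    return dict(sorted(counts.items()))


def text_bar_chart(table):
    """Render one bar of '#' characters per category, followed by its count."""
    width = max(len(category) for category in table)
    lines = [
        f"{category.ljust(width)} | {'#' * count} {count}"
        for category, count in table.items()
    ]
    return "\n".join(lines)


materials = ["Wood", "Metal", "Wood", "Plastic", "Wood", "Metal"]  # hypothetical
table = frequency_table(materials)
print(text_bar_chart(table))
```

The bar lengths are proportional to the counts, which is all a bar graph of a categorical variable conveys. The order of the bars is arbitrary for nominal categories.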

Here are some examples of how categorical variables can be represented graphically.

Solved exercises

Exercise 1

A company keeps data records for 170 employees. One of the variables in these records is Civil status. This variable has four categories or possible values:

Single (A), Married (B), Widower (C), Divorced (D).

Although it is a non-numeric variable, it is possible to count how many of the records fall into each category and to represent the counts as a bar graph, as shown in the following figure:

Figure 2. Representation of the results of a categorical variable. Source: self made

Example 2

A shoe store keeps track of its sales. Among the variables in its records is the shoe color for each model. The variable:

Color_Shoe_Model_AW3

It is categorical and has five possible categories or values. For each category of this variable, the number of sales is totaled and the percentage of them is established. The results are presented in the graph of the following figure:

Figure 3. Categorical variable Color_Shoe. In this variable the mode is White. Source: self made.

It can then be said that the mode for the AW3 shoe model, that is, the color sold most frequently, is White, closely followed by Black.

It can also be said that with a 70% probability the next shoe sold of this model will be White or Black.

This information can be useful for the store when placing new orders. The store could also apply discounts to the least sold colors to reduce excess inventory.
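The reasoning in this example can be sketched in a few lines of Python. The sales counts below are hypothetical, chosen only so that White is the mode, Black follows closely, and the two together account for 70% of sales, as stated above. The three remaining color names are also assumptions, since the original figure is not reproduced here.

```python
# Hypothetical sales counts for model AW3 (not the store's real data).
sales = {"White": 37, "Black": 33, "Brown": 15, "Blue": 10, "Red": 5}


def mode_of_counts(counts):
    """Return the category with the highest count."""
    return max(counts, key=counts.get)


def probability_of(counts, categories):
    """Estimate the probability of the given categories from relative frequencies."""
    total = sum(counts.values())
    return sum(counts[c] for c in categories) / total


print(mode_of_counts(sales))                       # White
print(probability_of(sales, ["White", "Black"]))   # 0.7
```

The probability here is an estimate based on past relative frequencies, so it assumes that future sales follow the same pattern.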

Example 3

For a certain population of blood donors, we want to represent the number of people who belong to each blood group. A graphic way to visualize the results is a pictogram, which is essentially a table.

The first column represents the variable Blood type and its possible outcomes or categories. The second column has the representation in iconic or pictorial form of the number of people in each category. In our example, a red droplet is used as an icon, each of which represents 10 people.
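A text version of such a pictogram can be sketched as follows, with `*` standing in for the red droplet and each icon representing 10 people as described. The donor counts and the names `PEOPLE_PER_ICON`, `pictogram` and `donors` are hypothetical.

```python
PEOPLE_PER_ICON = 10  # each icon stands for 10 people, as in the text


def pictogram(counts, icon="*"):
    """Build a two-column table: category, then one icon per 10 people."""
    width = max(len(category) for category in counts)
    rows = []
    for category, people in counts.items():
        icons = icon * (people // PEOPLE_PER_ICON)
        rows.append(f"{category.ljust(width)} | {icons}")
    return "\n".join(rows)


donors = {"A": 40, "B": 20, "AB": 10, "O": 50}  # hypothetical counts
print(pictogram(donors))
```

This sketch uses integer division, so counts that are not multiples of 10 lose their remainder; a real pictogram would typically show a partial icon instead.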

Figure 4. Pictogram. Source: self made

References

  1. Khan Academy. Analyzing categorical data. Recovered from: khanacademy.org
  2. Universe formulas. Qualitative variable. Recovered from: univesoformulas.com
  3. Minitab. Which are categorical, discrete and continuous variables. Recovered from: support.minitab.com
  4. Excel tutorial. Characterization of variables. Recovered from: help.xlslat.com.
  5. Wikipedia. Statistical Variable. Recovered from wikipedia.com
  6. Wikipedia. Categorical variable. Recovered from wikipedia.com
