Measures of position, central tendency and dispersion

1865
Charles McCarthy

The measures of central tendency, dispersion and position, are values ​​that are used to properly interpret a set of statistical data. These can be worked directly, as they are obtained from the statistical study, or they can be organized in groups of equal frequency, facilitating the analysis..

The three best-known measures of central tendency and some of their properties. Source: F. Zapata.

Measures of central tendency

They allow knowing around which values ​​the statistical data are grouped.

Arithmetic average

It is also known as the average of the values ​​of a variable and is obtained by adding all the values ​​and dividing the result by the total number of data.

  • Arithmetic mean for ungrouped data

Let be a variable x of which we have n data without organizing or grouping, its arithmetic mean is calculated as follows:

And in summation notation:

Example

The owners of a mountain tourist inn have the intention of knowing how many days on average the visitors stay in the facilities. For this, a record of the days of permanence of 20 groups of tourists was kept, obtaining the following data:

1; 1; two; two; 1; 4; 5; 1; 3; 4; 5; 4; 3; 1; 1; two; two; 3; 4; 1

The average number of days tourists stay is:

  • Arithmetic mean for grouped data

If the data of the variable are organized in a table of absolute frequencies fi and the class centers are x1, xtwo,..., xn, the mean is calculated by:

In summation notation:

Median

The median of a group of n values ​​of the variable x is the central value of the group, provided the values ​​are ordered in increasing order. In this way, half of all values ​​are less than the mode and the other half are greater..

  • Median of ungrouped data

The following cases may occur:

-Number n of values ​​of the variable x  odd: the median is the value that is right in the middle of the group of values:

-Number n of values ​​of the variable x pair: in this case the median is calculated as the average of the two central values ​​of the data group:

Example

To find the median of the data from the tourist hostel, they are first ordered from lowest to highest:

1; 1; 1; 1; 1; 1; 1; two; two; two; two; 3; 3; 3; 4; 4; 4; 4; 5; 5

The number of data is even, therefore there are two central data: X10 and Xeleven and since both are worth 2, their average is also.

Median = 2

  • Median of pooled data

The following formula is used:

The symbols in the formula mean:

-c: width of the interval that contains the median

-BM: lower bound of the same interval

-Fm: number of observations contained in the interval to which the median belongs.

-n: total data.

-FBM: number of observations before of the interval containing the median.

fashion

The mode for ungrouped data is the value with the highest frequency, while for grouped data it is the class with the highest frequency. Fashion is considered the most representative data or class of the distribution.

Two important characteristics of this measure is that a data set can have more than one mode, and the mode can be determined for both quantitative and qualitative data..

Example

Continuing with the data of the tourist parador, the one that is repeated the most is 1, therefore, the most usual thing is that tourists stay 1 day in the parador.

Measures of dispersion

Measures of dispersion describe how clustered the data is around the central measures.

Rank

It is calculated by subtracting the largest data and the smallest data. If this difference is large, it is a sign that the data is scattered, while small values ​​indicate that the data is close to the mean..

Example

The range for the data of the tourist parador is:

Range = 5−1 = 4

Variance

  • Variance for ungrouped data

To find the variance stwo It is necessary to first know the arithmetic mean, then the squared difference between each piece of data and the mean is calculated, all of them are added and divided by the total number of observations. These differences are known as deviations.

The variance, which is always positive (or zero), indicates how far the observations are from the mean: if the variance is high, the values ​​are more dispersed than when the variance is small.

Example

The variance for the data from the tourist hostel is:

1; 1; two; two; 1; 4; 5; 1; 3; 4; 5; 4; 3; 1; 1; two; two; 3; 4; 1

  • Variance for grouped data

To find the variance of a grouped data set, we require: i) the mean, ii) the frequency fi  which is the total data in each class and iii) xi  or class value:

The standard deviation is the positive square root of the variance, so it has an advantage over the variance: it comes in the same units as the variable under study and thus you have a more direct idea of ​​how close or far the variable is from average.

  • Standard deviation for ungrouped data

It is determined simply by finding the square root of the variance for ungrouped data:

The standard deviation for the data from the tourist hostel is:

s = √ (stwo) = √1.95 = 1.40

  • Standard deviation for grouped data

It is calculated by finding the square root of the variance for grouped data:

Position measurements

Measures of position divide an ordered set of data into pieces of equal size. The median, in addition to being a measure of central tendency, is also a measure of position, since it divides the whole into two equal parts. But smaller parts can be obtained with quartiles, deciles and percentiles.

Quartiles

The quartiles divide the set into four equal parts, each containing 25% of the data. They are denoted as Q1, Qtwo and Q3 and the median is the quartile Qtwo. In this way, 25% of the data is below the Q quartile.1, 50% below the Q quartiletwo or median and 75% below the Q quartile3.

Figure 2. The quartiles divide the data set into four equal parts. Source: F. Zapata.
  • Quartiles for ungrouped data

The data is ordered and the total is divided into 4 groups with the same number of data each. The position of the first quartile is found by:

Q1 = (n + 1) / 4

Where n is the total data. If the result is an integer, the data corresponding to that position is located, but if it is decimal, the data corresponding to the integer part is averaged with the next, or for greater precision it is linearly interpolated between said data.

Example

The position of the first quartile Q1 for the data of the tourist parador is:

Q1 = (n + 1) / 4 = (20 + 1) / 4 = 5.25

This is the position of quartile 1 and since the result is decimal, the data X is searched5 and X6, which are respectively X5 = 1 and X6 = 1 and are averaged, resulting in:

First quartile = 1

1; 1; 1; 1; 1; 1; 1; two; two; two; two; 3; 3; 3; 4; 4; 4; 4; 5; 5.

The position of the second quartile Qtwo it is:

Qtwo = 2 (n + 1) / 4 = 10.5

What is the average between X10 and Xeleven and matches the median:

Second quartile = Median = 2

The position of the third quartile is calculated by:

Q3 = 3 (n + 1) / 4 = 3 (20 + 1) / 4 = 15.75

It is also decimal, therefore X is averagedfifteen and X16:

1; 1; 1; 1; 1; 1; 1; two; two; two; two; 3; 3; 3; 4; 4; 4; 4; 5; 5.

But since both are worth 4:

Third quartile = 4

The general formula for the position of quartiles in ungrouped data is:

Qk = k (n + 1) / 4

With k = 1,2,3.

  • Quartiles for grouped data

They are calculated in a similar way to the median:

The explanation of the symbols is:

-BQ: lower boundary of the interval containing the quartile

-c: width of that interval

-Fwhat: number of observations contained in the quartile interval.

-n: total data.

-FBQ: number of data before of the interval containing the quartile.

Deciles and percentiles

The deciles and percentiles divide the data set into 10 equal parts and 100 equal parts respectively, and their calculation is carried out in a similar way to that of the quartiles.

  • Deciles and percentiles for ungrouped data

The formulas are used respectively:

Dk = k (n + 1) / 10

With k = 1,2,3… 9.

Decile Dmust be equal to the median.

Pk = k (n + 1) / 100

With k = 1,2,3… 99.

The P percentilefifty must be equal to the median.

Example

In the example of the tourist inn, the position of the D3 it is:

D3 = 3 (20 + 1) / 10 = 6.3

Since it is a decimal number, X is averaged6 and X7, both equal to 1:

1; 1; 1; 1; 1; 1; 1; two; two; two; two; 3; 3; 3; 4; 4; 4; 4; 5; 5

It means that 3 tenths of the data is below X7 = 1 and the rest above.

  • Deciles and percentiles for grouped data

The formulas are analogous to those for quartiles. D is used to denote deciles and P for percentiles, and the symbols are interpreted similarly:

The empirical rule

When the data is symmetrically distributed and the distribution is unimodal, there is a rule called  empirical rule or rule 68 - 95 - 99, that groups them in the following intervals:

  • 68% of the data is in the range:

  • 95% of the data is in the range:

  • 99% of the data is in the range:

Example

In what interval is 95% of the data of the tourist parador?

They are in the interval: [2.5−1.40; 2.5 + 1.40] = [1.1; 3.9].

References

  1. Berenson, M. 1985. Statistics for management and economics. Interamericana S.A.
  2. Devore, J. 2012. Probability and Statistics for Engineering and Science. 8th. Edition. Cengage.
  3. Levin, R. 1988. Statistics for Administrators. 2nd. Edition. Prentice hall.
  4. Spiegel, M. 2009. Statistics. Schaum series. 4th Edition. Mcgraw hill.
  5. Walpole, R. 2007. Probability and Statistics for Engineering and Sciences. Pearson.

Yet No Comments