11.2.7 Median and Percentiles

The median and percentile functions described in this section operate on sorted data. For convenience we use "quantiles", measured on a scale of 0 to 1, instead of percentiles (which use a scale of 0 to 100).

median_from_sorted_data(data)
This function returns the median value of data. The elements of the array must be in ascending numerical order. There are no checks to see whether the data are sorted, so the function sort should always be used first.

When the dataset has an odd number of elements, the median is the value of element (n-1)/2. When the dataset has an even number of elements, the median is the mean of the two middle values, elements (n-1)/2 and n/2. Here n is the number of elements, indices start at zero, and the divisions are integer divisions. Since the algorithm for computing the median involves interpolation, this function always returns a floating-point number, even for integer data types.
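The index rule above can be illustrated with a minimal pure-Python sketch. The name median_sketch is hypothetical and this is not the library's implementation; raising an error on empty input is an assumption made only for the example.

```python
def median_sketch(sorted_data):
    """Median of data that is already in ascending order (illustrative only)."""
    n = len(sorted_data)
    if n == 0:
        # Assumption: the text does not say what happens for empty data.
        raise ValueError("data must not be empty")
    lower = (n - 1) // 2   # integer division, zero-based index
    upper = n // 2
    if lower == upper:
        # Odd n: a single middle element, returned as a float.
        return float(sorted_data[lower])
    # Even n: mean of the two middle elements.
    return (sorted_data[lower] + sorted_data[upper]) / 2.0


print(median_sketch(sorted([5, 1, 3])))     # 3.0
print(median_sketch(sorted([4, 1, 3, 2])))  # 2.5
```

Both calls return a float even though the input is made of integers, which matches the behaviour described above.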

quantile_from_sorted_data(data, F)
This function returns a quantile value of data. The elements of the array must be in ascending numerical order. The quantile is determined by F, a fraction between 0 and 1. For example, to compute the value of the 75th percentile, F should have the value 0.75.

There are no checks to see whether the data are sorted, so the function sort should always be used first.

The quantile is found by interpolation, using the formula

$\displaystyle \mathrm{quantile} = (1 - \delta) x_i + \delta x_{i+1}$ (11.15)

where $i$ is $\mathrm{floor}((n - 1)F)$, $\delta$ is $(n-1)F - i$, and $n$ is the number of elements in data.

Thus the minimum value of the array (data[0]) is given by F equal to zero, the maximum value (data[-1]) is given by F equal to one, and the median value is given by F equal to 0.5. Since the algorithm for computing quantiles involves interpolation, this function always returns a floating-point number, even for integer data types.
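As a rough illustration of the interpolation formula, the following pure-Python sketch computes the index i and the weight delta explicitly. The name quantile_sketch is hypothetical, and the error handling and the guard for the last element are assumptions of this example, not documented behaviour of quantile_from_sorted_data.

```python
import math


def quantile_sketch(sorted_data, f):
    """Quantile of ascending data by linear interpolation (illustrative only)."""
    n = len(sorted_data)
    if n == 0:
        # Assumption: empty input is rejected in this sketch.
        raise ValueError("data must not be empty")
    if not 0.0 <= f <= 1.0:
        # Assumption: fractions outside [0, 1] are rejected in this sketch.
        raise ValueError("f must lie between 0 and 1")
    position = (n - 1) * f
    i = math.floor(position)   # index of the lower neighbour
    delta = position - i       # interpolation weight, 0 <= delta < 1
    if i == n - 1:
        # f == 1: delta is zero, so x_{i+1} is not needed (and does not exist).
        return float(sorted_data[i])
    return (1.0 - delta) * sorted_data[i] + delta * sorted_data[i + 1]


data = sorted([4, 1, 3, 2])
print(quantile_sketch(data, 0.0))   # 1.0, the minimum
print(quantile_sketch(data, 0.5))   # 2.5, the median
print(quantile_sketch(data, 0.75))  # 3.25
print(quantile_sketch(data, 1.0))   # 4.0, the maximum
```

For the four-element example, F = 0.75 gives (n-1)F = 2.25, so i = 2 and delta = 0.25, and the result is 0.75 * 3 + 0.25 * 4 = 3.25.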