---
title: "Understanding percentile_cont()"
description: "Learn how the PostgreSQL functions percentile_cont() and percentile_disc() produce different results when calculating percentiles. Plus, examples and use cases."
section: "Postgres basics"
---

> **TimescaleDB is now Tiger Data.**

PostgreSQL has two functions to calculate the percentile for a list of values at any percentage: [`percentile_cont()`](https://www.postgresql.org/docs/14/functions-aggregate.html) and [`percentile_disc()`](https://www.postgresql.org/docs/14/functions-aggregate.html). These two functions work similarly, but they differ in how they produce the final result. Both are used with ordered-set aggregates returned by the `WITHIN GROUP` clause.

The `percentile_disc()` function returns a value from the input set that is the closest to the percentile requested. The value returned will actually exist in the set. 

Here's the `percentile_disc()` syntax:

`SELECT
	percentile_disc(<fraction double precision>) WITHIN GROUP (<sort_expression>)
FROM <table>;`

The `percentile_cont()` function returns an interpolated value between multiple values based on the distribution. It is more accurate, but it may return a fractional value between two values in the input set.

`percentile_cont()` syntax:

`SELECT
	percentile_cont(<fraction double precision>) WITHIN GROUP (<sort_expression>)
FROM <table>;`


## **Examples**

- Calculating the median
- Calculating multiple percentiles
- Calculating a series of percentiles

For the following examples, we will use this set of weather data stored in a table called `city_data:`

| day | city | temperature | precipitation |
| --- | --- | --- | --- |
| 2021-09-04 | Miami | 68.36 | 0.00 |
| 2021-09-05 | Miami | 72.50 | 0.00 |
| 2021-09-01 | Miami | 65.30 | 0.28 |
| 2021-09-02 | Miami | 64.40 | 0.79 |
| 2021-09-03 | Miami | 68.18 | 0.47 |
| 2021-09-04 | Atlanta | 67.28 | 0.00 |
| 2021-09-05 | Atlanta | 68.72 | 0.00 |
| 2021-09-01 | Atlanta | 63.14 | 0.20 |
| 2021-09-02 | Atlanta | 62.60 | 0.59 |
| 2021-09-03 | Atlanta | 62.60 | 0.39 |


### **Calculating the Median**

The median is also known as the 50th percentile. You can calculate it from the dataset with the following query:

`SELECT
	percentile_disc(0.5) WITHIN GROUP (
	ORDER BY temperature)
FROM city_data;`

The result is:

| percentile_disc |
| --- |
| 65.30 |

Because the query used `percentile_disc()`, the result is a value that exists in the dataset. If you want to find the true median, it is not a value in this data, and you have to use `percentile_cont()`. Here is the query:

`SELECT
	percentile_cont(0.5) WITHIN GROUP (
	ORDER BY temperature)
FROM city_data;`

And the result:

| percentile_cont |
| --- |
| 66.28999999999999 |

But since there are two cities, you might want to calculate the median temperature of each by adding a `GROUP BY` clause. Here is that query:

`SELECT
	city,
	percentile_cont(0.5) WITHIN GROUP (
ORDER BY
	temperature)
FROM
	city_data
GROUP BY
	city;`

The results is:

| city | percentile_cont |
| --- | --- |
| Atlanta | 63.14 |
| Miami | 68.18 |


### **Calculating Multiple Percentiles**

For this example, we are going to use a database table called conditions that contains these values:

| time | device_id | temperature | humidity |
| --- | --- | --- | --- |
| 2016-11-15 07:00:00 | weather-pro-000001 | 32.4 | 49.8 |
| 2016-11-15 07:00:00 | weather-pro-000002 | 39.800000000000004 | 50.2 |
| 2016-11-15 07:00:00 | weather-pro-000003 | 36.800000000000004 | 49.8 |
| 2016-11-15 07:00:00 | weather-pro-000004 | 71.8 | 50.1 |
| 2016-11-15 07:00:00 | weather-pro-000005 | 71.8 | 49.9 |
| 2016-11-15 07:00:00 | weather-pro-000006 | 37 | 49.8 |

Let’s say that we want to calculate various percentiles for the humidity for each device. Here is an example query:

`SELECT
	device_id,
	percentile_cont(0.25) WITHIN GROUP(
ORDER BY
	humidity) AS percentile_25,
	percentile_cont(0.50) WITHIN GROUP(
ORDER BY
	humidity) AS percentile_50,
	percentile_cont(0.75) WITHIN GROUP(
ORDER BY
	humidity) AS percentile_75,
	percentile_cont(0.95) WITHIN GROUP(
ORDER BY
	humidity) AS percentile_95
FROM
	conditions
GROUP BY
	device_id ;`

Here is a part of the result:

| device_id | percentile_25 | percentile_50 | percentile_75 | percentile_95 |
| --- | --- | --- | --- | --- |
| weather-pro-000000 | 49.29999999999999 | 50.500000000000036 | 53.10000000000007 | 54.9000000000001 |
| weather-pro-000001 | 49.09999999999999 | 50.00000000000003 | 51.60000000000005 | 55.6 |
| weather-pro-000002 | 52.500000000000036 | 53.60000000000005 | 54.00000000000006 | 54.500000000000064 |
| weather-pro-000003 | 51.100000000000016 | 51.90000000000003 | 52.90000000000004 | 53.800000000000054 |
| weather-pro-000004 | 48.60000000000001 | 49.20000000000002 | 49.60000000000002 | 50.400000000000034 |


### **Calculating a Series of Percentiles**

For this example, we are going back to our original `city_data` dataset because this query can take a long time to run on a big dataset. We are going to use the `generate_series()` to create every single whole percentage and then use those values in percentile_cont. Here is the query:

`SELECT
	city,
	percentile,
	percentile_cont(p) WITHIN GROUP (
ORDER BY
	temperature)
FROM
	city_data,
	generate_series(0.01, 1, 0.01) AS percentile
GROUP BY
	city, percentile;`

Here is a selection of the results since the query generates 200 rows of them:

| city | percentile | percentile_cont |
| --- | --- | --- |
| Atlanta | 0.25 | 62.6 |
| Atlanta | 0.26 | 62.6216 |
| Atlanta | 0.27 | 62.6432 |
| Atlanta | 0.28 | 62.6648 |
| Atlanta | 0.29 | 62.6864 |
| Atlanta | 0.30 | 62.708 |


## **Why Use the Timescale approx_percentile() Function Instead of PostgreSQL Percentile Functions?**

Calculating the percentile over large datasets, like time-series data in a Timesscale database, can involve a lot of expensive calculations. It can increase the memory footprint of the database, result in higher network costs, and make streaming data unfeasible. The aggregates are also not partializable or parallelizable.

Many times you don’t need this type of accuracy, and approximate percentile calculations will be close enough. This is why Timescale introduced the `approx_percentile()` [hyperfunction](https://www.timescale.com/learn/time-series-data-analysis-hyperfunctions). The `approx_percentile()` function implements the [UDDSketch](https://arxiv.org/pdf/2004.08604.pdf) algorithm that uses a modified histogram to approximate the shape of a distribution. This allows for calculating a “good enough” percentile without needing to use all the data or ordering it before it returns the result.

`approx_percentile()` syntax:

`approx_percentile(
    percentile DOUBLE PRECISION,
    sketch  uddsketch
) RETURNS DOUBLE PRECISION`

The second parameter is the sketch to perform the `approx_percentile` on and is usually returned from a `percentile_agg() `call. Here is an example query:

`SELECT
    approx_percentile(0.01, percentile_agg(data))
FROM generate_series(0, 100) data;`


Result:

| approx_percentile |
| --- |
| 0.999 |


## **Next Steps**

To learn more about how to use `percentile_cont()` and `percentile_disc()` in PostgreSQL, you can see the [PostgreSQL documentation](https://www.postgresql.org/docs/14/functions-aggregate.html). 

- To find out more about Timescale’s `approx_percentile()` function, you can read more about it in the [Times](https://docs.timescale.com/api/latest/hyperfunctions/percentile-approximation/approx_percentile/)[cale documentation](https://docs.timescale.com/api/latest/hyperfunctions/percentile-approximation/approx_percentile/). 

For examples of how to use these functions in your queries, see these sections of the Timescale documentation:

- [Additional approx_percentile() documentation](https://github.com/timescale/timescaledb-toolkit/blob/main/docs/percentile_approximation.md)
- [Info on aggregation and accessor functions](https://www.timescale.com/blog/how-postgresql-aggregation-works-and-how-it-inspired-our-hyperfunctions-design-2/)
- [Advanced percentile aggregation](https://docs.timescale.com/api/latest/hyperfunctions/percentile-approximation/percentile-aggregation-methods/#advanced-percentile-aggregation)
- [Percentile approximation advanced aggregation methods](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/#main-content)
- [Percentile approximation](https://www.timescale.com/blog/how-percentile-approximation-works-and-why-its-more-useful-than-averages/)


## PostgreSQL Percentile Functions FAQ

**Q: What is the difference between **`**percentile_cont()**`** and **`**percentile_disc()**`** functions in PostgreSQL?**

A: The `percentile_cont()` function returns an interpolated value that may not exist in the original dataset, providing a more accurate statistical representation. In contrast, `percentile_disc()` returns an actual value from the dataset that is closest to the requested percentile. Both functions are used with `WITHIN GROUP` to specify the ordering of values.

**Q: How do I calculate the median (50th percentile) of a dataset in PostgreSQL?**

A: You can calculate the median using either percentile function with 0.5 as the parameter. For example, S`ELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY temperature) FROM city_data;` will return the interpolated median value, while `percentile_disc(0.5)` would return an actual value from the dataset.

**Q: Can I calculate multiple percentiles in a single query?**

A: Yes, you can calculate multiple percentiles in one query by using multiple percentile function calls. For example: `SELECT percentile_cont(0.25) WITHIN GROUP(ORDER BY humidity) AS p25, percentile_cont(0.5) WITHIN GROUP(ORDER BY humidity) AS p50, percentile_cont(0.75) WITHIN GROUP(ORDER BY humidity) AS p75 FROM conditions GROUP BY device_id;` will return the 25th, 50th, and 75th percentiles.

**Q: How can I calculate percentiles by group in PostgreSQL?**

A: You can calculate percentiles for each group by adding a `GROUP BY` clause to your query. For example, `SELECT city, percentile_cont(0.5) WITHIN GROUP (ORDER BY temperature) FROM city_data GROUP BY city;` will return the median temperature for each city in the dataset.

**Q: What is the **`**approx_percentile()**`** function in Timescale, and when should I use it?**

A: The `approx_percentile()` function is a Timescale hyperfunction that approximates percentiles using the UDDSketch algorithm, which is much more efficient for large datasets. You should use it when exact precision isn't required, but performance is important, especially with time-series data, as it requires less memory and computational resources than PostgreSQL's native percentile functions.