---
title: "Understanding SQL Aggregate Functions"
description: "Aggregate functions are the backbone of data analysis in SQL. Learn about what they are, how they work, and advanced tools for aggregation."
section: "Postgres basics"
---

> **TimescaleDB is now Tiger Data.**

*Written by [Dylan Paulus](https://www.timescale.com/blog/author/dylan/)*

You may have heard that "data is the new oil." By itself, data is unrefined and not valuable, but given processing and refinement, it becomes precious. We gain insights into our products, applications, and customers by exploring our data. PostgreSQL exposes aggregate functions that give us the tools to transform and process our data to provide meaning. 

In this article, we'll take a look at how to use SQL aggregate functions, the pitfalls, and how Timescale gives us advanced tooling to aggregate time-series data.

> [Learn the basics of PostgreSQL aggregation](https://www.timescale.com/learn/data-aggregation-postgresql).


## Aggregate Functions

PostgreSQL aggregate functions allow us to pull meaning from all the data we store in our database. Aggregate functions take in a list of data (a bunch of rows) to produce a single, meaningful output. 


The best way to visualize aggregate functions is to work through an example. Let's look at `avg()`, the average function, which returns the [arithmetic mean](https://en.wikipedia.org/wiki/Arithmetic_mean) of its input values.

Let's say we have a table of products in a hypothetical store:

`-- create
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  price DECIMAL NOT NULL
);`

`-- insert
INSERT INTO products (name, price) VALUES ('pen', 2.50);
INSERT INTO products (name, price) VALUES ('paper', 1.25);
INSERT INTO products (name, price) VALUES ('hammer', 6.76);
INSERT INTO products (name, price) VALUES ('blanket', 12.45);
INSERT INTO products (name, price) VALUES ('chair', 59.99);
`

We can write a query using `avg()` to find out the average price of all our products by running:

`SELECT avg(price) FROM products;
`


Of course, PostgreSQL provides a [large list of aggregate functions](https://www.postgresql.org/docs/current/functions-aggregate.html). A few of the most commonly used include:

- `SUM()` : adds up all the input values

- `MAX()` : finds the largest of the input values

- `MIN()` : finds the smallest of the input values

- `COUNT()` : adds up the number of rows (not to be confused with `SUM()`!)
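
To see these four functions side by side on the five products above, here is a minimal sketch. It assumes Python's built-in `sqlite3` module with an in-memory SQLite database standing in for PostgreSQL (so prices are stored as `REAL` rather than `DECIMAL`), and `common_aggregates` is a hypothetical helper name.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; prices are REAL, not DECIMAL.
PRODUCTS = [("pen", 2.50), ("paper", 1.25), ("hammer", 6.76),
            ("blanket", 12.45), ("chair", 59.99)]


def common_aggregates():
    """Return (sum, max, min, count) of all product prices from a single query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO products (name, price) VALUES (?, ?)", PRODUCTS)
    row = conn.execute(
        "SELECT SUM(price), MAX(price), MIN(price), COUNT(*) FROM products"
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    total, highest, lowest, row_count = common_aggregates()
    print(f"sum={total:.2f} max={highest} min={lowest} count={row_count}")
```

Note how `SUM()` and `COUNT()` answer different questions: the first adds the prices together, while the second only counts how many rows were seen.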


### Grouping aggregates

One of the biggest sources of frustration around aggregates is intermixing aggregate functions with column data. Building on our previous product table, let's include a `category` column.

`-- recreate the table, now with a category column
DROP TABLE IF EXISTS products;
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  price DECIMAL NOT NULL,
  category TEXT
);`

`INSERT INTO products (name, price, category) VALUES ('pen', 2.50, 'office');
INSERT INTO products (name, price, category) VALUES ('paper', 1.25, 'office');
INSERT INTO products (name, price, category) VALUES ('hammer', 6.76, 'tools');
INSERT INTO products (name, price, category) VALUES ('blanket', 12.45, 'home');
INSERT INTO products (name, price, category) VALUES ('chair', 59.99, 'home');
`

Suppose that, when finding the average price of all the products, we also want the `category` column in the result, like so:

`SELECT avg(price), category FROM products;
`

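Run the SQL command and boom! PostgreSQL gives us an error: `column "products.category" must appear in the GROUP BY clause or be used in an aggregate function`.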


This is because the `price` values get [reduced](https://en.wikipedia.org/wiki/Fold_(higher-order_function)), or "smushed down," into a single value, so there is no single `category` to show next to the average price of *all* products. PostgreSQL is letting us know that we need to either 1) use `category` inside an aggregate function or 2) group the rows by `category`. Since finding the average of a string value is impossible, our best bet is option 2. We can use [SQL's `GROUP BY`](https://www.timescale.com/learn/understanding-group-by-in-postgresql-with-examples) to group the results by `category`, which gives us the average price per category.

`SELECT avg(price), category FROM products GROUP BY category;
`


Taking advantage of PostgreSQL's `GROUP BY`, we can start to see the power of aggregate functions; in this example, we have insight into the average cost of products in a given category.
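
Conceptually, `GROUP BY` first partitions the rows by the grouping column and then runs the aggregate once per partition. The sketch below models that idea in plain Python. It is an illustration of the semantics only, not of how PostgreSQL executes the query, and `average_price_by_category` is a hypothetical name.

```python
from collections import defaultdict

# The five rows from the products table: (name, price, category).
PRODUCTS = [
    ("pen", 2.50, "office"),
    ("paper", 1.25, "office"),
    ("hammer", 6.76, "tools"),
    ("blanket", 12.45, "home"),
    ("chair", 59.99, "home"),
]


def average_price_by_category(rows):
    """Model of: SELECT avg(price), category FROM products GROUP BY category."""
    # Step 1: partition the prices by category (the GROUP BY part).
    groups = defaultdict(list)
    for _name, price, category in rows:
        groups[category].append(price)
    # Step 2: reduce each partition to one value (the avg() part).
    return {category: sum(prices) / len(prices) for category, prices in groups.items()}


if __name__ == "__main__":
    for category, average in sorted(average_price_by_category(PRODUCTS).items()):
        print(f"{category}: {average:.3f}")
```

Each category ends up with exactly one output row, which is why `category` is now allowed to appear next to `avg(price)`.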


### HAVING vs. WHERE

You have probably run into the [`WHERE` clause](https://www.timescale.com/learn/understanding-where-in-postgresql-with-examples) when filtering queries, but there is another way to filter results: the less commonly used `HAVING` clause. Though they appear to behave similarly, `WHERE` and [`HAVING` clauses](https://www.timescale.com/learn/understanding-having-in-postgresql-with-examples) have distinct effects on aggregate functions. Let's take a look at both.


Both `HAVING` and `WHERE` filter the result set by a condition. If we don't want to include the average price of `home` items, we could write the query using either SQL clause:

`-- where
SELECT avg(price), category FROM products WHERE category != 'home' GROUP BY category;`

`-- having
SELECT avg(price), category FROM products GROUP BY category HAVING category != 'home';
`


Though it's a slightly different syntax, the result is the same.


Now suppose that, instead of filtering by `category`, we only want the categories whose average price is over $2. Easy enough; let's modify both queries.

`-- where (this one fails: aggregate functions are not allowed in WHERE)
SELECT avg(price), category FROM products WHERE avg(price) > 2.0 GROUP BY category;
`

`-- having
SELECT avg(price), category FROM products GROUP BY category HAVING avg(price) > 2.0;
`

Run these two queries separately, and you'll find a problem. The query using `WHERE` fails, but the query using `HAVING` succeeds. What gives? The main distinction between `WHERE` and `HAVING` is that the `WHERE` filter is applied *before* aggregation takes place. `HAVING` filters get applied *after* aggregation takes place. Since our example filters the result set using an aggregate function `avg(price) > 2.0`, we can only filter after aggregation occurs—by using `HAVING`.
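
The difference can be checked end to end with a small script. This is a sketch under stated assumptions: an in-memory SQLite database (via Python's built-in `sqlite3`) stands in for PostgreSQL, the error text SQLite produces differs from PostgreSQL's, and the helper names are hypothetical. Both engines reject an aggregate inside `WHERE`.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; prices are REAL, not DECIMAL.
PRODUCTS = [
    ("pen", 2.50, "office"),
    ("paper", 1.25, "office"),
    ("hammer", 6.76, "tools"),
    ("blanket", 12.45, "home"),
    ("chair", 59.99, "home"),
]

WHERE_QUERY = (
    "SELECT avg(price), category FROM products "
    "WHERE avg(price) > 2.0 GROUP BY category"
)
HAVING_QUERY = (
    "SELECT category FROM products "
    "GROUP BY category HAVING avg(price) > 2.0 ORDER BY category"
)


def open_products_db():
    """Create the products table in memory and load the five sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL, category TEXT)")
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", PRODUCTS)
    return conn


def where_query_fails(conn):
    """Return True if the engine rejects an aggregate used inside WHERE."""
    try:
        conn.execute(WHERE_QUERY).fetchall()
    except sqlite3.Error:
        return True
    return False


def categories_over_two_dollars(conn):
    """Return the categories that survive the HAVING filter."""
    return [category for (category,) in conn.execute(HAVING_QUERY)]


if __name__ == "__main__":
    conn = open_products_db()
    print("WHERE version fails:", where_query_fails(conn))
    print("HAVING version returns:", categories_over_two_dollars(conn))
```

The `office` category averages 1.875, so it is the only group the `HAVING` filter removes.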


### FILTER

The `FILTER` clause adds another way to limit the data aggregate functions operate on. Unlike `WHERE` or `HAVING`, which filter the result for the entire query, `FILTER` only applies to the aggregate function it is attached to. This means a single query can contain several aggregate functions, each with its own filter condition. First, let's look at an example of querying for products with a single `FILTER` clause:

`SELECT
 avg(price) FILTER (WHERE category = 'home') AS avg_home_prices
FROM products;
`


Using `FILTER`, we can include multiple aggregate functions in a query with different filtering conditions.


`SELECT
 avg(price) FILTER (WHERE category = 'home') AS avg_home_prices,
 sum(price) FILTER (WHERE category = 'office') AS sum_office_prices,
 count(*) FILTER (WHERE category = 'tools') AS total_tools
FROM products;
`
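
To see what the three filtered aggregates return for the sample rows, the query can be run as a quick sketch. The assumptions here: an in-memory SQLite database stands in for PostgreSQL, the bundled SQLite is version 3.30 or newer (the first release that accepts `FILTER` on aggregate functions), and `filtered_aggregates` is a hypothetical helper name.

```python
import sqlite3

# Assumptions: in-memory SQLite (3.30+ for FILTER) stands in for PostgreSQL;
# prices are REAL, not DECIMAL.
PRODUCTS = [
    ("pen", 2.50, "office"),
    ("paper", 1.25, "office"),
    ("hammer", 6.76, "tools"),
    ("blanket", 12.45, "home"),
    ("chair", 59.99, "home"),
]

FILTER_QUERY = """
SELECT
  avg(price) FILTER (WHERE category = 'home')   AS avg_home_prices,
  sum(price) FILTER (WHERE category = 'office') AS sum_office_prices,
  count(*)   FILTER (WHERE category = 'tools')  AS total_tools
FROM products
"""


def filtered_aggregates():
    """Run three aggregates in one query, each over its own subset of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL, category TEXT)")
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", PRODUCTS)
    row = conn.execute(FILTER_QUERY).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    avg_home, sum_office, total_tools = filtered_aggregates()
    print(f"avg_home={avg_home:.2f} sum_office={sum_office:.2f} total_tools={total_tools}")
```

All three values come back in a single row because there is no `GROUP BY`: each aggregate scans the whole table but only counts the rows that match its own condition.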


## How Aggregates Work


On the surface, aggregate functions look similar to standard functions, but there is a critical difference between the two. Aggregate functions work across a set of rows, whereas standard functions work on one row at a time. For example, a standard function like `CEIL()` rounds a value up to the nearest integer *per row*. An aggregate function like `SUM()` takes a column's values from all the input rows and produces a single result.
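
The shape of the two kinds of function can be sketched in a few lines of Python. This only models the distinction (one output per row versus one output per set of rows). The function names are hypothetical, and `math.ceil` plays the role of `CEIL()`.

```python
import math

# Prices of the five sample products, one per row.
PRICES = [2.50, 1.25, 6.76, 12.45, 59.99]


def ceil_each(values):
    """Row-wise function: produces one output for every input row."""
    return [math.ceil(value) for value in values]


def sum_all(values):
    """Aggregate function: collapses all input rows into a single output."""
    return math.fsum(values)


if __name__ == "__main__":
    print(ceil_each(PRICES))          # five inputs, five outputs
    print(round(sum_all(PRICES), 2))  # five inputs, one output
```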

Aggregation has three main components: an internal state, a state transition function, and a final function. PostgreSQL loops through the input rows and calls the `state transition function` on each one, which updates the `internal state`. Once all the rows have been looped through, the `final function` is called with the internal state to produce a final result.

Let's take, for example, the `AVG()` aggregate function with our `products` table.

- The initial state is `(0, 0)`: `total price = 0` and `count = 0`

- The `state transition function` is called for each row in the table

	- For `AVG()`, the row's price is added to the total price, and the count is incremented by one

	- `(total price + row price, count + 1)`

- Finally, the `final function` calculates the average from the `internal state`

	- `total price / count`


The same process is followed for all aggregate functions.

The separation of `state transition function` and `final function` optimizes aggregate functions by keeping state transition functions small and offloading the heavy processing until all the rows have been looped through.
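
The `AVG()` walk-through above can be sketched directly in Python. This is a model of the described steps, not PostgreSQL's internal implementation. All names are hypothetical, and `Decimal` is used so the arithmetic matches the `DECIMAL` price column.

```python
from decimal import Decimal

# Initial internal state for the AVG() model: (total price, count).
AVG_INITIAL_STATE = (Decimal("0"), 0)


def avg_transition(state, price):
    """State transition function: fold one row's price into the running state."""
    total, count = state
    return (total + price, count + 1)


def avg_final(state):
    """Final function: turn the finished state into the result."""
    total, count = state
    return total / count if count else None  # no input rows -> NULL (None)


def aggregate(rows, initial_state, transition, final):
    """Generic driver: the same loop serves every aggregate."""
    state = initial_state
    for row in rows:
        state = transition(state, row)
    return final(state)


PRICES = [Decimal(p) for p in ("2.50", "1.25", "6.76", "12.45", "59.99")]

if __name__ == "__main__":
    print(aggregate(PRICES, AVG_INITIAL_STATE, avg_transition, avg_final))
    # A SUM() model reuses the driver with a simpler state and a trivial final function.
    print(aggregate(PRICES, Decimal("0"), lambda total, price: total + price, lambda total: total))
```

Only the three pieces passed to `aggregate` change from one aggregate to the next; the loop over the rows stays the same.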


## Aggregation With TimescaleDB

TimescaleDB expands on aggregation functions over hypertables using [hyperfunction aggregates](https://docs.timescale.com/api/latest/hyperfunctions/). Hyperfunction aggregates allow us to analyze time-series data. Some hyperfunction aggregates are provided out of the box, but others require the [timescaledb_toolkit](https://docs.timescale.com/self-hosted/latest/tooling/install-toolkit/) extension to be installed.

Similarly to PostgreSQL aggregate functions, hyperfunction aggregates split the work into separate steps: an aggregate builds an intermediate state (much like the `state transition function`), an accessor reads a result from that state (much like the `final function`), and a [rollup function](https://www.timescale.com/blog/function-pipelines-building-functional-programming-into-postgresql-using-custom-operators/) combines existing aggregates into a new one. Each of these operations is separated to provide a more functional programming approach to data aggregation, and by combining different aggregates, accessors, and rollups, we can create powerful insights into our data. For example, to create a hyperfunction aggregation, we first create the aggregation (with an aggregation function like `stats_agg`), and then we pass the aggregation result to an accessor (like `average`).

To get a practical look at how this works, let's look at an example using `stats_agg`, `average`, and `time_bucket` to find an average.

First, create a `conditions` table with data:

`CREATE TABLE conditions (
   time        TIMESTAMPTZ       NOT NULL,
   location    TEXT              NOT NULL,
   device      TEXT              NOT NULL,
   temperature DOUBLE PRECISION  NULL,
   humidity    DOUBLE PRECISION  NULL
);`

`SELECT create_hypertable('conditions', by_range('time'));
`

`INSERT INTO conditions (time, location, device, temperature) VALUES (NOW(), 'home', 'omega', 72.3);
INSERT INTO conditions (time, location, device, temperature) VALUES (NOW() + interval '1 day', 'home', 'omega', 55);
INSERT INTO conditions (time, location, device, temperature) VALUES (NOW() + interval '2 day', 'home', 'omega', 65);
INSERT INTO conditions (time, location, device, temperature) VALUES (NOW() + interval '2 day', 'home', 'alpha', 82);
INSERT INTO conditions (time, location, device, temperature) VALUES (NOW(), 'home', 'alpha', 83);
INSERT INTO conditions (time, location, device, temperature) VALUES (NOW(), 'home', 'alpha', 83);
INSERT INTO conditions (time, location, device, temperature) VALUES (NOW() + interval '25 minutes', 'home', 'alpha', 90);
`

We want to find the average temperature by day. First, we need to group the [time series data](https://www.tigerdata.com/blog/time-series-introduction) into buckets of one-day intervals. Then, by using `stats_agg()` to create an aggregate, we can pass that into `average()` to calculate the average temperature per day.

`SELECT 
    time_bucket('1 day'::interval, time), 
    average(stats_agg(temperature))
FROM conditions
GROUP BY 1;
`
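
The query combines two ideas: bucketing rows by day, then aggregating in two steps (build a state, then read a result from it). The Python sketch below models that flow. It is not the TimescaleDB API: the readings use fixed, hypothetical timestamps (the article's rows use `NOW()`, so real results vary), the state holds only a count and a sum (an assumption made for illustration), and every function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical fixed readings: (time, temperature).
READINGS = [
    (datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc), 72.0),
    (datetime(2024, 1, 1, 9, 25, tzinfo=timezone.utc), 90.0),
    (datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc), 55.0),
    (datetime(2024, 1, 3, 9, 0, tzinfo=timezone.utc), 65.0),
    (datetime(2024, 1, 3, 9, 0, tzinfo=timezone.utc), 82.0),
]


def day_bucket(timestamp):
    """Model of a one-day bucket: truncate a timestamp to the start of its day."""
    return timestamp.replace(hour=0, minute=0, second=0, microsecond=0)


def build_state(values):
    """Step 1 (the role stats_agg plays): reduce values to an intermediate state."""
    values = list(values)
    return {"count": len(values), "sum": sum(values)}


def average_of(state):
    """Step 2 (the role the average accessor plays): read a result from the state."""
    return state["sum"] / state["count"] if state["count"] else None


def daily_average(readings):
    """Group readings into day buckets, then apply the two steps per bucket."""
    buckets = defaultdict(list)
    for timestamp, temperature in readings:
        buckets[day_bucket(timestamp)].append(temperature)
    return {day: average_of(build_state(temps)) for day, temps in sorted(buckets.items())}


if __name__ == "__main__":
    for day, average in daily_average(READINGS).items():
        print(day.date(), average)
```

Keeping the state separate from the accessor means the same state could feed other accessors without rescanning the rows, which is the design choice the two-step approach is built around.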


By combining different aggregates, accessors, and [rollup functions](https://www.timescale.com/blog/function-pipelines-building-functional-programming-into-postgresql-using-custom-operators/) provided by Timescale, we can gain even more power over our time-series data. If you prefer video, [watch the accompanying video](https://www.youtube.com/embed/vX8i0Bcb08I?si=FWSuXanIQEx1MJLB).


## Conclusion

PostgreSQL's aggregate functions are powerful tools for extracting meaningful insights from datasets, aiding in data-driven decision-making. But why stop there? Timescale takes these capabilities to the next level with hyperfunctions that easily give insights into your time-series data. 


- See [Timescale's documentation on hyperfunction aggregates](https://docs.timescale.com/api/latest/hyperfunctions/) to learn more about them.
- In a separate blog post, we explain how [PostgreSQL aggregation influenced the design of our hyperfunctions](https://www.timescale.com/blog/how-postgresql-aggregation-works-and-how-it-inspired-our-hyperfunctions-design/).
- To learn more about aggregate functions and their options, see the [official PostgreSQL documentation](https://www.postgresql.org/docs/current/functions-aggregate.html).


If you want to try aggregate functions and experiment with the extremely powerful hyperfunction aggregates, [create a free Timescale account](https://console.cloud.timescale.com/signup) to get started today!