Date:

Share:

Functional Dependencies in SQL GROUP BY – Java, SQL and jOOQ.

Related Articles

The SQL standard recognizes an interesting feature where you can project any functional dependency of a primary (or unique) key listed in GROUP BY Section without adding the functional dependency to GROUP BY Explicit clause.

What it means? Consider this simple scheme:

CREATE TABLE author (
  id INT NOT NULL PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE book (
  id INT NOT NULL PRIMARY KEY,
  author_id INT NOT NULL REFERENCES author,
  title TEXT NOT NULL
);

In order to count the number of books by author, we tend to write:

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY 
  a.id,  -- Required, because names aren't unique
  a.name -- Required in some dialects, but not in others

We need to group together according to something unique in this case, because if two authors are called John Doe, we still want them to produce separate groups. so GROUP BY a.id He is given.

We’re used to it too GROUP BY a.name, Especially in those dialects that require it, since we specify a.name In the SELECT section:

  • Db2
  • Derby
  • Exasol
  • fire Bird
  • Hana
  • Informix
  • prophecy
  • SQL Server

But is it really required? It’s not according to the SQL standard, because there is a Functional dependence between author.id and author.name. In other words, for each value of

But is it really required? It’s not according to the SQL standard, because there is a Functional dependence between author.id and author.name. In other words, for each value of author.id, Has exactly one possible value of author.name, Or author.name Is a function of author.id

That means it does not matter if we are GROUP BY The two columns, or just the master key. The result must be the same in both cases, so it is possible:

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY a.id

Which SQL dialects support this?

At least the following SQL dialects support this language feature:

  • Jock DB
  • H2
  • HSQLDB
  • MariaDB
  • MySQL
  • PostgreSQL
  • SQLite
  • Yugbeit

It is worth noting that MySQL used to simply ignore whether or not a column could be projected unambiguously, in the presence GROUP BY. While the following query was rejected in most dialects, it was not, in MySQL, before the entry ONLY_FULL_GROUP_BY:

SELECT author_id, title, count(*)
FROM author
GROUP BY author_id

Why should we introduce author.title, If an author wrote more than one number? This does not make sense, and yet MySQL still used to allow it, and would simply project any arbitrary value from the group.

Currently, MySQL only allows the projection of columns with a functional dependency on GROUP BY Clause, as permitted in the SQL standard.

Disadvantages and advantages

While the shorter syntax avoided by the additional columns may be easier to maintain (it is easier to project additional columns, if required), there is some risk of breaking queries in production, i.e. when basic constraints are disabled, for example for transfer. While a primary key is unlikely to be disabled in a live system, this may still be the case, and without the key, a query that was previously valid would not be valid for the same reason that the old MySQL interpretation was invalid: there is no longer a guarantee of functional dependency.

Another syntax

Starting with jOOQ 3.16, and # 11834, It will be possible to refer to tables directly in GROUP BY Section, instead of individual columns. for example:

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY a

The semantics will be:

  • If the table has a master key (composite or not), use it in GROUP BY Clause, instead
  • If the table does not have a master key, write all the columns from the table instead.

Because none of the JOOQ-supported RDBMSs currently support this syntax, it is purely jOOQ synthetic feature.

Source

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Popular Articles