Every now and then, I see folks lament the SQL syntax’s peculiar disconnect between
Most recently here in a Youtube comment reply to a recent jOOQ / kotlin talk. Let’s look at why jOOQ did not fall into this trap of trying to “fix” this, and why this is even a trap.
The English Language
SQL has a simple syntax model. All commands start with a verb in imperative form, as we “Command” the database to execute a statement. Common commands include:
All of these are verbs in imperative form. Think about adding an exclamation mark everywhere, eg
INSERT [this record]!
The Order of Operations
We can argue that natural languages are very poor inspiration for computer programming languages, which tend to be more mathematical (some more than others). A lot of criticism about the SQL language is that it does not “compose” (in its native form).
We can argue, that it would be much better for a more composable SQL language to start with
FROMwhich is the first operation in
SELECT according to the logical order of operations. Eg
FROM book WHERE book.title LIKE 'A%' SELECT book.id, book.title
Yes, that would be better in the sense that it would be more logical. First, we declare the data source, predicates, etc. and only in the end would we declare the projection. With the Java
Stream API, we would write:
books.stream() .filter(book -> book.title.startsWith("A")) .map(book -> new B(book.id, book.title))
The benefits of this would be:
- No disconnect between syntax and logic
- Hence: No confusion around syntax, specifically why you can not reference
- Better auto-completion (because you do not write stuff that is not declared yetfirst)
In a way, this ordering would be consistent with what some RDBMS implemented when
RETURNING data from DML statements, such as:
INSERT INTO book (id, title) VALUES (3, 'The Book') RETURNING id, created_at
With DML statements, the command (“imperative”) is still
DELETEie a verb that clearly tells the database what to do with the data. The “projection” is more of an afterthought. A utility that is occasionally useful, hence
RETURNING can be placed at the end.
RETURNING seems like a pragmatic choice of syntax, and is not even part of the standard. The standard defines the
<data change delta table>as implemented by Db2 and H2, whose syntax is:
SELECT id, created_at FROM FINAL TABLE ( INSERT INTO book (id, title) VALUES (3, 'The Book') ) AS book
I mean, why not. I do not have a strong preference for one or the other syntax (jOOQ supports both and emulates them into one another). SQL Server invented a third variant, whose syntax is probably the least intuitive (I always have to look up the exact location of the
INSERT INTO book (id, title) OUTPUT id, created_at VALUES (3, 'The Book')
Cypher query language
Probably worth mentioning here is that there exists a modern query language out there that is sufficiently popular to be considered for such discussions: The Cypher Query Language from neo4j. With a simple “trick”, it both:
- Maintained the language model where a verb in imperative form starts a statement (the verb is
MATCHwhich is similar to
FROMbut it is a verb), so it inherits SQL’s “strength” of being intuitive also for non-programmers.
- Reversed the logical order of operations within the reading statements, to be of the form
MATCH .. RETURNmaking
RETURNthe universal form of projecting things for all operations, not just
MATCHalso for writing operations, including
SET(which corresponds to SQL’s
While operating on a different data paradigm (the network model as opposed to the relational model), I’ve always found the Cypher Query Language to be generally superior to SQL in terms of syntax, at least on a high level. If I had to actually “fix” SQL by creating SQL 2.0, I’d take inspiration here.
Fixing this in an API like jOOQ isn’t worth it
As discussed before, SQL has some obvious shortcomings, and there exist better languages like Cypher solving the same kind of problem. But SQL is here, and it’s 50 years old, and it will stay. It will not be fixed.
That’s something that just has to be accepted:
SQL will not be fixed
It will be amended. It incorporates new ideas, including:
It always does so in an idiomatic, SQL style way. If you’re reading the SQL standard, or if you’re working with PostgreSQL, which is very close to the standard, you’ll feel that SQL is quite consistent as a language. Or, it is consistently weird, depending on your tastes.
For jOOQ, one of the main success factors has always been to be as close as possible to this vision of what SQL really is in terms of syntax. A lot of folks are very effective writing native SQL. Since Java has text blocks, it has become a lot more bearable to just copy paste a static SQL query from your SQL editor into your Java program, and eg execute it with JDBC or with jOOQ’s plain SQL templating API:
for (Record record : ctx.fetch( """ SELECT id, title FROM book WHERE title LIKE 'A%' """ )) System.out.println(record);
This approach is sufficient for very simple applications out there. If your “application” runs a total of 5 distinct SQL queries, you can do it with JDBC alone (although, once you’ve started to get a hang of jOOQ, you’ll probably use jOOQ even for those applications as well).
But jOOQ really shines when your application has 100s of queries, including many dynamic ones, and your database has 100s of tables, in case of which the type safety and model safety benefits really help. However, it can shine only when your SQL query translates 1: 1 to the jOOQ API. Randomly fixing SQL to some extent in this most important statement (
SELECT) will not do the trick.
Because: Where will you stop fixing SQL? SQL is still weird even if you switch to
FROM .. SELECT. For example, the semantics of
GROUP BY is still weird. Or the relationship between
ORDER BY. Eg this would appear to be much better at first (eg to separate
DISTINCTwhich should not be located so closely together):
FROM book WHERE book.title LIKE 'A%' SELECT book.title DISTINCT ORDER BY book.title
But the weird caveats would still not disappear, namely that you can
ORDER BY expressions that are not listed in
SELECT in the absence of
DISTINCTbut not in the presence of
DISTINCT (see our previous article about that).
Alternative syntaxes in other DSL APIs
So, where does the “fixing” of SQL stop? When will SQL be “fixed?” It will never be fixed, and as such, an API like jOOQ would be much harder to learn that it should be. Some competing APIs follow this model, eg
Both of these APIs are based on the idea that SQL needs “fixing,” and that a more “native,” a more “idiomatic” feel of the API would be somewhat better. Some examples:
Here’s an example from the getting started guide:
This corresponds to the following SQL:
SELECT max(price) FROM coffees
It’s arguably a bit more idiomatic. It looks like ordinary Scala collection API usage, removing the SQL feel from the equation. After all, the usual
map(x => y) collection methods really correspond to a SQL
SELECT clause (a “projection”).
Here’s an example from Baeldung:
StarWarsFilms .slice(StarWarsFilms.sequelId.count(), StarWarsFilms.director) .selectAll() .groupBy(StarWarsFilms.director)
The API introduces new terms, eg
slicewhich means the same thing as
SELECTthough foreign to both SQL or kotlin collection APIs
selectAllwhich corresponds to the relational algebra term “selection”corresponding to SQL
Synthetic convenience syntax instead of “fixing” SQL
jOOQ does not follow down this road and never will. SQL is what it is, and jOOQ will not be able to “fix” that. The 1: 1 mapping between SQL syntax and jOOQ API means that even if you want to use something sophisticated, like:
Even then, jOOQ will not let you down and will allow you to write exactly what you have in mind in terms of SQL feature. I mean, would it really make sense to support
CONNECT BY in Slick or Exposed? Probably not. They would have to invent their own syntax to give access to SQL recursion. But will it be complete? That’s a problem jOOQ will not have.
The only reason why some syntax is not available is because it’s not possible yet (and please do send a feature request). The example of
FOR XML is an excellent one. SQL Server invented this
FOR clause, and while it’s convenient for simple cases, it’s not very powerful for complex ones. I much prefer standard SQL / XML and SQL / JSON syntax, (which jOOQ also supports). But while I do not very much like the syntax, jOOQ will not judge. What good would a third syntax, entirely invented by jOOQ be for users? As I said before.
When will the “fixing” stop?
It will never stop. The alternatives I’ve mentioned will run into very difficult questions down the line when they start adding more features, if they start adding more features. While it is always easy to implement a simple
SELECT .. FROM .. WHERE query builder, and support that functionality using arbitrary API, claiming SQL has been “fixed,” it is much harder to evolve this API, addressing all sorts of advanced SQL use-cases. Just look at their issue trackers for feature requests like CTEs. The answer is always: “Use native SQL.”
Even “simple” SQL features, such as
UNION become more complex once basic SQL syntax is changed. The semantics is already tricky enough in SQL (and it’s entirely SQL’s fault, sure), but “fixing” these things is never as simple as it may look at first.
Now, there are 2 exceptions to this rule:
One exception is: “Synthetic syntax.” The most powerful synthetic syntax in jOOQ are implicit joins. Implicit joins aren’t “fixing” SQL, they’re “enhancing” SQL with a syntax that SQL itself might have (hopefully will have, eventually). Just like there exist SQL dialects, which “enhance” the SQL standard, eg
jOOQ is very conservative about such synthetic syntax. There are a lot of good ideas, but few are forward compatible. Each one of these syntaxes makes other SQL transformation features more complex, and each one has flaws that may not have been addressed yet (eg as of jOOQ 3.16, implicit joins are not possible in DML statements such as
DELETEeven if they make a lot of sense there as well. See issue # 7508).
Another type of improvement is what I call “convenience syntax.” For example, irrespective of the underlying RDBMS, jOOQ allows you to write:
select(someFunction()); // No FROM clause selectFrom(someTable); // No explicit SELECT list
In both cases, users can omit clauses that may be mandatory in the underlying SQL dialect, and jOOQ fills the generated SQL with a reasonable default:
FROM DUALtable declaration, or something similar
SELECT *projection declaration, or something similar
The idea that jOOQ should stick to SQL syntax on a 1: 1 basis was a gamble I took 13 years ago, when I made jOOQ. I wanted to design jOOQ in a way that everyone who already knew SQL would have no problems learning jOOQ, because everything is absolutely straightforward. The technique behind this API design is described here.
Others have attempted to “fix” SQL by either making their API very idiomatic considering the target language, or by inventing a new language.
13 years later, I have found that the 1: 1 mimicking approach is the only viable one, as I keep discovering new, arcane SQL features:
Creating a language is incredibly difficult (let’s consider an internal DSL API to be a sort of language). It’s almost impossible to design properly, if the goal is to support pretty much any underlying SQL feature, unless, the designer lets go of this dream of “fixing” things, and starts embracing the “dream” of “supporting” things. All the things.
SQL is what it is. And that means, the syntax is
SELECT .. FROMnot
FROM .. SELECT.