The exists
keyword can be used in that way, but really it's intended as a way to avoid counting:
--this statement needs to check the entire table
select count(*) from [table] where ...
--this statement is true as soon as one match is found
exists ( select * from [table] where ... )
This is most useful where you have if
conditional statements, as exists
can be a lot quicker than count
.
The in
is best used where you have a static list to pass:
select * from [table]
where [field] in (1, 2, 3)
When you have a table in an in
statement it makes more sense to use a join
, but mostly it shouldn't matter. The query optimiser should return the same plan either way. In some implementations (mostly older, such as Microsoft SQL Server 2000) in
queries will always get a nested join plan, while join
queries will use nested, merge or hash as appropriate. More modern implementations are smarter and can adjust the plan even when in
is used.
Assuming you're joining on columns with no duplicates, which is a very common case:
An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection.
An outer join of A and B gives the results of A union B, i.e. the outer parts of a Venn diagram union.
Examples
Suppose you have two tables, with a single column each, and data as follows:
A B
- -
1 3
2 4
3 5
4 6
Note that (1,2) are unique to A, (3,4) are common, and (5,6) are unique to B.
Inner join
An inner join using either of the equivalent queries gives the intersection of the two tables, i.e. the two rows they have in common.
select * from a INNER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a = b.b;
a | b
--+--
3 | 3
4 | 4
Left outer join
A left outer join will give all rows in A, plus any common rows in B.
select * from a LEFT OUTER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a = b.b(+);
a | b
--+-----
1 | null
2 | null
3 | 3
4 | 4
Right outer join
A right outer join will give all rows in B, plus any common rows in A.
select * from a RIGHT OUTER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a(+) = b.b;
a | b
-----+----
3 | 3
4 | 4
null | 5
null | 6
Full outer join
A full outer join will give you the union of A and B, i.e. all the rows in A and all the rows in B. If something in A doesn't have a corresponding datum in B, then the B portion is null, and vice versa.
select * from a FULL OUTER JOIN b on a.a = b.b;
a | b
-----+-----
1 | null
2 | null
3 | 3
4 | 4
null | 6
null | 5
Best Answer
SQL functions
... are the better choice:
For simple scalar queries. Not much to plan, better save any overhead.
For single (or very few) calls per session. Nothing to gain from plan caching via prepared statements that PL/pgSQL has to offer. See below.
If they are typically called in the context of bigger queries and are simple enough to be inlined.
For lack of experience with any procedural language like PL/pgSQL. Many know SQL well and that's about all you need for SQL functions. Few can say the same about PL/pgSQL. (Though it's rather simple.)
A bit shorter code. No block overhead.
PL/pgSQL functions
... are the better choice:
When you need any procedural elements or variables that are not available in SQL functions, obviously.
For any kind of dynamic SQL, where you build and
EXECUTE
statements dynamically. Special care is needed to avoid SQL injection. More details:When you have computations that can be reused in several places and a CTE can't be stretched for the purpose. In an SQL function you don't have variables and would be forced to compute repeatedly or write to a table. This related answer on dba.SE has side-by-side code examples for solving the same problem using an SQL function / a plpgsql function / a query with CTEs:
Assignments are somewhat more expensive than in other procedural languages. Adapt a programming style that doesn't use more assignments than necessary.
When a function cannot be inlined and is called repeatedly. Unlike with SQL functions, query plans can be cached for all SQL statements inside a PL/pgSQL functions; they are treated like prepared statements, the plan is cached for repeated calls within the same session (if Postgres expects the cached (generic) plan to perform better than re-planning every time. That's the reason why PL/pgSQL functions are typically faster after the first couple of calls in such cases.
Here is a thread on pgsql-performance discussing some of these items:
When you need to trap errors.
For trigger functions.
When including DDL statements changing objects or altering system catalogs in any way relevant to subsequent commands - because all statements in SQL functions are parsed at once while PL/pgSQL functions plan and execute each statement sequentially (like a prepared statement). See:
Also consider:
To actually return from a PL/pgSQL function, you could write:
There are other ways: