The exists
keyword can be used in that way, but really it's intended as a way to avoid counting:
--this statement needs to check the entire table
select count(*) from [table] where ...
--this statement is true as soon as one match is found
exists ( select * from [table] where ... )
This is most useful where you have if
conditional statements, as exists
can be a lot quicker than count
.
The in
is best used where you have a static list to pass:
select * from [table]
where [field] in (1, 2, 3)
When you have a table in an in
statement it makes more sense to use a join
, but mostly it shouldn't matter. The query optimiser should return the same plan either way. In some implementations (mostly older, such as Microsoft SQL Server 2000) in
queries will always get a nested join plan, while join
queries will use nested, merge or hash as appropriate. More modern implementations are smarter and can adjust the plan even when in
is used.
Assuming you're joining on columns with no duplicates, which is a very common case:
An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection.
An outer join of A and B gives the results of A union B, i.e. the outer parts of a Venn diagram union.
Examples
Suppose you have two tables, with a single column each, and data as follows:
A B
- -
1 3
2 4
3 5
4 6
Note that (1,2) are unique to A, (3,4) are common, and (5,6) are unique to B.
Inner join
An inner join using either of the equivalent queries gives the intersection of the two tables, i.e. the two rows they have in common.
select * from a INNER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a = b.b;
a | b
--+--
3 | 3
4 | 4
Left outer join
A left outer join will give all rows in A, plus any common rows in B.
select * from a LEFT OUTER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a = b.b(+);
a | b
--+-----
1 | null
2 | null
3 | 3
4 | 4
Right outer join
A right outer join will give all rows in B, plus any common rows in A.
select * from a RIGHT OUTER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a(+) = b.b;
a | b
-----+----
3 | 3
4 | 4
null | 5
null | 6
Full outer join
A full outer join will give you the union of A and B, i.e. all the rows in A and all the rows in B. If something in A doesn't have a corresponding datum in B, then the B portion is null, and vice versa.
select * from a FULL OUTER JOIN b on a.a = b.b;
a | b
-----+-----
1 | null
2 | null
3 | 3
4 | 4
null | 6
null | 5
Best Answer
From a performance standpoint, the difference is a lot more complicated than the Tim Hall tip that OMG Ponies linked to would imply. I believe that this tip is an introduction to a larger section that has been excerpted for the web-- I expect that Tim went on to make most if not all of these points in the book. Additionally, this entire discussion depends on the Oracle version you're using. I believe this is correct for 10.2, 11.1, and 11.2 but there are definitely differences if you start going back to older releases.
The particular example in the tip, first of all, is rather unrealistic. I've never seen anyone code a single-row fetch using an explicit cursor rather than a SELECT INTO. So the fact that SELECT INTO is more efficient is of very limited practical importance. If we're discussing loops, the performance we're interested in is how expensive it is to fetch many rows. And that's where the complexity starts to come in.
Oracle introduced the ability to do a BULK COLLECT of data from a cursor into a PL/SQL collection in 10.1. This is a much more efficient way to get data from the SQL engine to the PL/SQL collection because it allows you to minimize context shifts by fetching many rows at once. And subsequent operations on those collections are more efficient because your code can stay within the PL/SQL engine.
In order to take maximum advantage of the BULK COLLECT syntax, though, you generally have to use explicit cursors because that way you can populate a PL/SQL collection and then subsequently use the FORALL syntax to write the data back to the database (on the reasonable assumption that if you are fetching a bunch of data in a cursor, there is a strong probability that you are doing some sort of manipulation and saving the manipulated data somewhere). If you use an implicit cursor in a FOR loop, as OMG Ponies correctly points out, Oracle will be doing a BULK COLLECT behind the scenes to make the fetching of the data less expensive. But your code will be doing slower row-by-row inserts and updates because the data is not in a collection. Explicit cursors also offer the opportunity to set the LIMIT explicitly which can improve performance over the default of 100 for an implicit cursor in a FOR loop.
In general, assuming that you're on 10.2 or greater and that your code is fetching data and writing it back to the database,
Fastest
Slowest
On the other hand, using implicit cursors gets you quite a bit of the benefit of using bulk operations for very little of the upfront cost in refactoring old code or learning the new feature. If most of your PL/SQL development is done by developers whose primary language is something else or who don't necessarily keep up with new language features, FOR loops are going to be easier to understand and maintain than explicit cursor code that used all the new BULK COLLECT functionality. And when Oracle introduces new optimizations in the future, it's far more likely that the implicit cursor code would get the benefit automatically while the explicit code may require some manual rework.
Of course, by the time you're troubleshooting performance to the point where you really care about how much faster different variants of your looping code might be, you're often at the point where you would want to consider moving more logic into pure SQL and ditching the looping code entirely.