The following is a simplified explanation, with the details kept private - but I thought others might learn a bit from it.
The problematic code wasn't just a single statement - it was a SQL batch that resembled:
declare @var1, @var2, @var3
select @var1=literal1, @var2=literal2, @var3=literal3
select ... into #T1 from A, B, C ...
create index TI1 on #T1(C1)
select ... into #T2 from #T1, D where ...
create index TI2 on #T2(C1)
select (multiple nested subqueries) from #T1, E (index I1) where #T1.C1=E.C1 and #T1.C2 is null
Note: Table E had a clustered unique index on C1 called I1 - which was forced in the SQL
Now, this is where things get a bit fun....the real problem was elsewhere - a colleague and I had picked up on the extremely high LIOs and scan counts and were already looking elsewhere for the problem (before I even got the code). However, here are some interesting aspects to consider:
SQL Batch optimization (no statement cache)
- entire batch is compiled and optimized before execution
- @vars in queries being optimized are substituted with ASE default values for the datatype - e.g. Jan 1 1900 for datetime, and 0 for int, etc.
- #temp tables are optimized as 100 rows on 10 pages
- indexes created inside the batch are not seen at optimization time
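To make those points concrete, here is a minimal sketch (hypothetical table and column names - not the actual code) of how a batch gets costed without the statement cache:

```sql
declare @cutoff datetime
select @cutoff = '20240101'              -- at optimize time, treated as Jan 1 1900

select t.key_col, t.trade_date, t.amount
into #work
from trades t                            -- #work costed as 100 rows on 10 pages

create index iw1 on #work(key_col)       -- created mid-batch: invisible to the optimizer

select w.key_col, w.amount
from #work w
where w.trade_date > @cutoff             -- selectivity estimated from the default value
```

Every statement here is compiled up front, so the final select is costed against defaults that may be wildly different from reality.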
SQL Batch optimization (with statement cache)
- selects (and any other statement cache eligible SQL) are optimized at runtime (the first execution), with the @vars using their actual assigned values and #temp tables with known physical attributes
- #temp tables have known actual row counts and page counts
- #temp tables have auto stats (I think this was added early in 15.7 or late in 15.5 - I forget which) based on 17% (1/6th) sampling of all columns in the #temp table
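For reference, the statement cache is a server-wide setting - a sketch (the size value here is arbitrary):

```sql
-- size is in 2K memory pages; the value is illustrative
sp_configure 'statement cache size', 10000
go
-- with the cache on, the selects in the batch above are compiled at first
-- execution, seeing the @vars' real values, the #temp tables' actual row and
-- page counts, and the 17%-sampled column stats
```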
Soooo....what happened was two different query plans. Because of the large number of subqueries, it was difficult to spot unless you were looking for it...and I was. In this case, without the statement cache enabled, because one of the subqueries returned a single row via a LIKE clause based on columns in E, the optimizer chose to do the following:
SQ1 --> (#T1 --> E)
Because of the 100 rows/10 pages assumption, it table scanned #T1 (as one would assume) and then did a nested-loop join to E using the unique index - then it finished by doing the lateral join to SQ1 using the filter expression.....and everything was fine - except for one big area we will talk about later.
Now, with the statement cache enabled, the optimizer saw that #T1 was 80000+ rows on nearly 6000 pages....and table scanning that much is often not a good idea when an alternative exists. In this case it looked at the SQ1 --> E leg based on the lateral join with the LIKE expression - the expression was similar to F.COL LIKE '%' + E.C3 + E.C4 + '%', and there was a unique index on E for {C3,C4} - and then noted that #T1 had an index on the join key C1. The optimizer knew that F.COL was returning a single row, and that likely only a small number of rows would match from E due to the unique index....and then it could join on the index back to #T1, so it computed that it was less expensive to join:
SQ1 --> (E --> #T1)
than the above - despite the index force of I1 (on C1) in E, since C1 was not involved in the LIKE expression.
However, this was not the big problem. The big problem was that two of the subqueries did a self join of #T1 to itself on a non-indexed column. This is where the explosion of LIO and scan counts was coming from.
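In sketch form, the offending subqueries looked something like this (simplified; C7 is the self join column):

```sql
select ...
from #T1 a, #T1 b
where a.C7 = b.C7      -- C7 unindexed: every outer row re-scans all ~6000 pages of #T1
  and ...
```

With the real row counts in play, that is roughly 80000 scans of a 6000-page table - which lines up with the huge LIO and scan counts we were seeing.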
The fix was simple:
- remove the index on #T1.C1 - this makes the join (E --> #T1) expensive
- add an index on #T1.C7 (the self join column)
- remove the index force on E
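Against the sanitized batch above, the fix looks roughly like this (a sketch, not the actual code):

```sql
-- stop creating the C1 index, so (E --> #T1) is costed as a repeated table
-- scan and the optimizer goes back to (#T1 --> E); index the self join
-- column instead
create index TI1 on #T1(C7)

-- and drop the (index I1) force from the final select
select (multiple nested subqueries)
from #T1, E
where #T1.C1 = E.C1 and #T1.C2 is null
```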
The result was a predictable and significant reduction in execution time. Theoretically, we could have left the index on #T1.C1 to vet the join order, but in order for that lateral join to work, it would have had to do an index scan - and given the size of E, that was unlikely to be faster. BTW, the 17% stats sampling also showed that the #T1.C2 is null expression qualified ~90% of the table, so adding an index there would not likely have helped either.