You definitely should think about moving off sp122 sooner rather than later.....there are not a lot of changes between where you are currently and sp135. WRT sampling and stats, think of it this way.

<start simplified explanation>

Normally, without sampling, we scan the table (or the index, if updating stats only on a single index) and record the distinct values along with their frequency of occurrence. When finished, we construct the (default) 20 step values and compute the cell histograms. For example, let's say we have a table of 100 rows with an identity column holding values 1-100. When finished, the stats would have 20 histogram cells, each with a weight of 5/100 (0.05).

With stats sampling at 10%, we start scanning the table and read every 10th row. For each sampled row, we record the values for the index keys in a work table...so we would read rows 10, 20, 30, 40, 50, 60, 70, 80, 90 & 100. In this case we have gotten lucky, as the max value (100) was detected during the sampling....but we might not be able to create 20 steps because we only have 10 values, so we only create 10 histogram cells with a weight of 10/100 (0.10) each.

Now, let's increase to 30% sampling (or say 33% for fun, to make the arithmetic easier). Now the rows we read are 3, 6, 9, 12, 15, 18, ...., 93, 96, 99 - about 33 distinct values, from which we can construct the full 20 steps, and the weight of each histogram cell would be roughly 5/100 (0.05). Because we have more steps, queries with IN() or OR might do better, as the finer granularity is less likely to aggregate above the ~40% point at which a table scan is chosen. For example, where col in (15,25,35,45,55) might result in a tablescan with 10% sampling, because costing each of the 5 SARGs and aggregating them due to the OR logic gives 5 x 0.10 = 0.50, whereas with 30% sampling (5 x 0.05 = 0.25) it might not. However, a query with col=100 will hit the out-of-range histogram (if enabled) with 30% sampling, whereas it hits a cell with 10% sampling. In that case we were lucky that 10% sampling happened to pick up the last row - it would not have if the values were interspersed differently.

<end simplified explanation>

So, I would say that generally 30% sampling gives *better* statistics, but there may be edge cases where it is worse. I think the lower the sampling percentage, the bigger the issue with low(er) cardinality columns - particularly where the number of distinct values in the column multiplied by the sampling fraction is less than the number of histogram steps requested. For really high cardinality columns, such as names, it likely makes minimal difference. In either case, if you have issues, it may be as much the number of histogram cells (the 'using N values' clause on update index statistics) as the sampling %...... One can't say what the exact impact would be without seeing the actual stats that result as well as the query predicates.
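To make the arithmetic concrete, here is a rough Python simulation of the 100-row example above - purely illustrative and hypothetical, not ASE's actual stats-gathering code. It just shows how reading every Nth row caps the number of distinct values available, which in turn caps the histogram steps and inflates the per-step weight that an IN()/OR list aggregates over.

    # Hypothetical sketch of the 100-row example -- not SAP ASE internals.
    def sampled_histogram(rows, sample_every, requested_steps=20):
        sampled = rows[sample_every - 1::sample_every]   # read every Nth row
        distinct = sorted(set(sampled))
        steps = min(requested_steps, len(distinct))      # can't have more steps than distinct values
        return steps, 1.0 / steps                        # per-step weight (weights sum to ~1.0)

    rows = list(range(1, 101))                           # identity column, values 1-100
    for pct, every in ((10, 10), (33, 3)):
        steps, weight = sampled_histogram(rows, every)
        in_estimate = 5 * weight                         # "col in (15,25,35,45,55)" = 5 SARGs OR'd together
        plan = "tablescan likely" if in_estimate > 0.40 else "index likely"
        print(f"{pct}% sampling: {steps} steps, weight {weight:.2f}, "
              f"IN-list estimate {in_estimate:.2f} -> {plan}")

Run as-is, the 10% case prints 10 steps at 0.10 each (IN-list estimate 0.50, over the ~40% tablescan point), while the 33% case prints 20 steps at 0.05 each (estimate 0.25).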
One of the diffs is that in scanning the table, we end up creating a worktable that has to be sorted for each column before the frequency cells/histogram steps can be derived. This is where tempdb (and proc cache) get hit the hardest - and where a lot of the slowness comes from, as the worktables often end up flushed to disk and need PIO to re-read when sorting/aggregating the stats. With hash-based stats, we use an in-memory hash table for the values, so there is no tempdb worktable and no sort, hence a lot less PIO on the tempdb side. That is likely to give you better stats, since it reads the entire table, and I have found it runs 5x+ faster, which should give you the speed you were after with sampling.
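As a loose analogy (again hypothetical Python, not the actual engine code), the difference between the two approaches is sort-then-count versus a single hashed pass:

    # Loose analogy only -- not ASE internals.
    from collections import Counter
    from itertools import groupby

    def freq_by_sort(vals):
        # worktable-style: materialize the values, sort them (the tempdb/PIO-heavy
        # step), then count runs of equal values to get per-value frequencies
        return {v: len(list(g)) for v, g in groupby(sorted(vals))}

    def freq_by_hash(vals):
        # hash-style: one pass over the data, counts kept in an in-memory hash
        # table, no sort and no intermediate worktable
        return dict(Counter(vals))

    column_values = [1, 3, 3, 7, 7, 7, 42]
    assert freq_by_sort(column_values) == freq_by_hash(column_values)

Both produce the same frequencies; the hash path simply skips the sort and the intermediate spill, which is where the speedup comes from.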