Re: statistics and density

It's all about trade-offs really. Maintaining statistics and processing them during optimisation can be costly, so it's a case of picking a model that will suffice given the constraints you have to work with. It will still have its limitations, but those limitations can be minimised in some areas.

 

The relevancy/usefulness of the range cell density may depend on:

 

  • The data
  • The number of steps in the histogram
  • The sampling rate
  • The method used to gather the stats (hash-based or sort-based)

 

We'll ignore the data bit. Yes, that can be controlled, but let's not go redesigning data models (much as they might need it!).
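
The other three are all knobs on update statistics. A rough sketch (the table and column names are made up; sampling and hash-based gathering assume a server version that supports them, hashing being a 15.7-era addition):

-- ask for a specific histogram step count
update statistics big_table (order_date) using 500 values

-- gather from a sample rather than scanning the whole column
update statistics big_table (order_date) using 500 values with sampling = 10 percent

-- hash-based gathering instead of the default sort-based method
update statistics big_table (order_date) with hashing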

 

When you search on a single fixed search argument, the optimiser will use the lower of the range cell density and the individual range cell's weight (the upper limit) if that value falls within a range cell.

 

The idea here is that it picks the more selective of the two. It's not foolproof in any way: the bigger the table and the less uniform the distribution of values, the greater the chance of individual value skew within a range cell and of large discrepancies between the weights of the various range cells.

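A toy illustration with made-up numbers: say the range cell density is 0.0002 and the range cell the search value falls into has a weight of 0.0150. The optimiser takes the lower of the two, 0.0002, so on a 1,000,000 row table it expects roughly 200 qualifying rows. If that one value actually accounts for most of the cell, the real count could be closer to 15,000 rows, which is exactly the kind of skew being described here.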
 

You minimise the chance of this by increasing the number of steps.

 

By specifying a higher number of steps you increase the chances of getting frequency cells for values that previously skewed the weights within range cells. It follows that with more steps you will most likely have fewer range cells. Increasing the histogram step count on large tables can have a dramatic effect on the range cell density (and rightly so).

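Roughly speaking (this is an approximation, not the exact rule the server applies): with s steps each step covers in the order of rowcount/s rows, so on a 10,000,000 row column a value needs somewhere near 500,000 duplicates to be likely to earn its own frequency cell at 20 steps, but only around 10,000 at 1,000 steps. Every skewed value that graduates to a frequency cell no longer contributes to the range cell density, which is why the density can drop so sharply.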
 

Total density is the overall measure of the average run of duplicates, and in versions prior to 15 it was used to cost joins.

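The density figure is usually described as the sum, over every distinct value, of (rows holding that value / total rows) squared. Toy numbers: a 1,000 row column holding one value 901 times and 99 other values once each gives (901/1000)^2 + 99 x (1/1000)^2, roughly 0.812, so a join on it is costed as if each outer row matched about 812 rows; a fully unique column gives 1000 x (1/1000)^2 = 0.001, i.e. about one matching row.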
 

In versions 15 and 16, it'll still get used in the following cases (there may be others as well):

 

  • For joins when one side of the join does not have a histogram available (if there is a histogram on both sides, it'll merge the histograms)
  • Unknowns such as: select blah where column in (select column2 from table) (see the sketch after this list)
  • It'll also be used if you have compatibility_mode enabled.
  • It'll also be used to cost joins in any query with 7 or more tables during the alternative greedy costing (used to prime the search engine)

 
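To make the second bullet concrete, a sketch with made-up table and column names. At optimisation time the server has no idea which values the subquery will return, so it cannot probe the outer column's histogram and falls back on total density:

select o.order_id
from   orders o
where  o.customer_id in (select c.customer_id
                         from   customers c
                         where  c.region = 'EMEA')

Both the total density and the range cell density for a column show up in the optdiag output for the table, so it's worth checking them after changing the step count.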

In an ideal world you have statistics that tell you the weight of every value in every column and every combination of every value of every column. It doesn't take long for the storage required to be in the terabyte range and for maintenance to take weeks (it takes long enough already!).

There are other ways of representing data but at this point, ASE is what it is.

