柱状图的作用和收集的方法我在 Block Selectivity and Row Selectivity 这篇文章里面提到过一些。
如果使用了一般的统计信息,比如
exec dbms_stats.gather_table_stats (‘user’,’tab’) ;
这时并没有对indexed columns进行数据逻辑分布的柱状图分析,可以通过user_tab_col_statistics的num_buckes列来验证,这时该列为1,也就是说没有柱状图。 如果在这个列上面值很多的话,可能也只存储最大最小值,然后根据该列上值的distinct多少来取平均。比如下面的这个例子
id value Row selectivity (pr) Block selectivity (pb)
01 89.87% 100%
02 2.34% 7%
03 2.16% 6%
04 1.86% 2%
05 1.42% 1%
06 1.17% 1%
07 0.75% 1%
08 0.18% < 1%
09 0.14% < 1%
10 0.11% < 1%
TOTAL 100.00% N/A
如果没有histogram的话,那么select x=1的时候,优化器肯定认为这个1占用行的百分比10%,很有可能使用index,但是实际情形下这样效率就很低下了。
但是在使用histogram的时候可能存在问题:
It is a long story, but Oracle histogram-based optimization has a history of painful design tradeoffs. Prior to Oracle9i, the use of bind variables prohibits histogram optimization. This is a pity because, in general, an application that does not use bind variables cannot scale to large user counts. In Oracle9i, the optimizer does almost what we want: it “peeks” at the content bound to a bind variable and makes a valid histogram-based decision. However, on subsequent executions of a shared SQL statement after the first, each execution gets the plan produced by the session’s first optimization.
So if the first execution of the query uses :a1=’01’, then the second (and every other) execution will use the optimal plan for :a1=’01’, even if the subsequent execution searches for :a1=’07’.
也就是说如果使用绑定变量,那么以后的执行,无论predicates是什么,都使用第一次的执行计划,这样是相当糟糕的情况。Biti_rainy 有一篇“一个SQL导致的数据库整体性能下降的诊断及解决”上面遇到的就是这个问题。
The good news is that it is not really necessary to use bind variables for where-clause predicates that don’t have a large value domain. For example, using literal values in a sex=’m’ predicate is okay, because it makes only up to two copies of an otherwise sharable SQL statement in the library cache. However, using literal values in something like order_id=’1289942’ would be catastrophic, because it would run potentially thousands of nearly identical but nonetheless distinct SQL statements through the library cache.
在不使用绑定变量的OTAP系统里面,histogram就会非常有效了。But even in a transactional system — not *everything* is bound, you still have many literals in many queries and these will use the histograms once again upon the first hard parse to find “the best overall general plan” for that query.
