[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Chapter summary:
vpdl
is a library of probability distributions.This library contains data structures to represent univariate and multivariate probability distributions and provides algorithms to operate on them. This includes estimating a distribution from data points and sampling from a distribution.
vpdl
is built on top ofvnl
andvnl_algo
.
vpdl
is comprised of two programming paradigms:
generic programming and polymorphic inheritance.
The generic programming part is in its own sub-library called vpdt
.
vpdt
is a template library (like STL).
There is no compiled library for vpdt
,
only a collection of header files in subdirectory vpdl/vpdt
.
vpdt
works with vnl
types,
but in many cases can generalize to other types.
The goal of vpdt
is to provide generic implementations that are both time and
memory efficient when types are known at compile time.
The rest of vpdl
uses a polymorphic design and provides greater run time
flexibility and easy of use. It is limited to distributions over scalar,
vnl_vector
, and vnl_vector_fixed
types.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
vpdl
is the combination of two different design patterns
for probability distributions.
It was formed from the merger of three contrib libraries:
mul/vpdfl
, mul/pdf1d
, and brl/bbas/bsta
.
Created by Manchester, vpdfl
provided a polymorphic hierarchy
(using virtual functions) for multivariate distributions based on
vnl_vector
and vnl_matrix types
.
For univariate distributions, pdf1d
mirrored the design of vpdfl
,
but used scalar types (i.e. double
).
These libraries were very flexible at run time.
Both distribution type and, in the case of vpdfl
,
dimension could be selected at run time.
Created by Brown, bsta
provided a generic programming hierarchy
(using templates) for both univariate and multivariate distributions.
Template parameters specified scalar type (float
or double
)
and dimension.
Templates allowed the same code base to use scalars in the univariate case and
vnl_vector_fixed
and vnl_matrix_fixed
in the multivariate case.
The goal of bsta
was to be very efficient.
Many optimizations are possible by assuming types and
dimension are known at compile time.
vpdl
was designed as a core library
to meet the needs of both original designs.
The main library generalizes Manchester's polymorphic design with
a templated distribution library.
This creates polymorphic inheritance hierarchy
for each choice to template parameters.
The templates allow hierarchies based on scalar,
vnl_vector
, and vnl_vector_fixed
types.
When possible, the implementations of vpdl
distributions are wrappers
around corresponding vpdt
distributions.
The vpdt
distributions provide even more generalized implementations but
without virtual functions or inheritance.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Each polymorphic distribution is derived (directly or indirectly) from
a common templated base class called vpdl_distribution<T,n>
.
Template parameter T
specifies the scalar numeric type
(float
or double
) and n
specifies the dimension.
In particular:
n==0
uses vnl_vector<T>
and vnl_matrix<T>
(dimension specified at run time)
n==1
uses T
(scalar computations)
n>1
uses vnl_vector_fixed<T,n>
and
vnl_matrix_fixed<T,n,n>
(fixed dimension of n
)
The distribution class is an abstract base class. The most important functions it defines are:
virtual T density (const vector &pt) const=0;
Evaluate the unnormalized density at a point.
virtual T norm_const () const=0;
The normalization constant for the density.
virtual T prob_density (const vector &pt) const;
Evaluate the probability density at a point.
Defaults to density(pt) * norm_const()
.
These functions allow the density of the distribution to be evaluated
independently from the normalization constant.
This is important because the normalization constant is sometimes
expensive to compute and not required for some calculations.
Furthermore, the normalization constant is independent of the evaluation point
so it can be computed once in advance and reused.
Alternatively, the function prob_density
returns the normalized density directly.
The type vector
is a typedef defined in the class that selects the
appropriate vector class depending on T
and n
.
Similarly, the type matrix
is defined to the appropriate matrix type.
Several other functions are defined on the distribution base class including:
gradient_density
, cumulative_prob
, compute_mean
,
and compute_covar
.
All of these functions provide a polymorphic interface to probability distributions.
However, in most cases, the vpdl
classes are simply light wrappers around
generic vpdt
implementations that do the real work.
One additional component of the polymorphic wrappers is a class hierarchy.
In vpdt
, the distribution classes need not have any inheritance relations,
but in vpdl
the currently implemented distributions are arranged into
the following class hierarchy. Each top level class below is derived directly
from vpdl_distribution
(template arguments are omitted).
vpdl_gaussian_base
- a base class for all Gaussian varieties.
vpdl_gaussian
- a general Gaussian with full covariance.
vpdl_gaussian_indep
- a Gaussian with independent (i.e. diagonal) covariance.
vpdl_gaussian_sphere
- a Gaussian with spherical (i.e. scaled identity) covariance.
vpdl_muli_cmp_dist
- a base class for all multi-component distributions
vpdl_kernel_base
- a base class for kernel density distributions.
vpdl_kernel_fbw_base
- a base class for fixed bandwidth kernel densities
vpdl_kernel_gaussian_sfbw
- a fixed bandwidth spherical Gaussian kernel distribution
vpdl_kernel_vbw_base
- a base class for variable bandwidth kernel densities
vpdl_mixture
- a mixture distribution (linear combination of arbitrary distributions).
vpdl_mixture_of
- a mixture distribution where each component has the same type
(but may have different parameters).
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
vpdt
) The subdirectory vpdl/vpdt
contains the
generic programming components of this library.
vpdt
is a self-sufficient template library
that does not depended on the rest of vpdl
.
However, there is nothing to stop vpdl
distribution classes from being
used as vpdt
distribution types as long as they meet the requirements.
Whenever possible, vpdl
distributions should be wrappers around
a generic vpdt
implementation.
This strategy allows the code to be used both in generic and polymorphic designs.
Generic programming uses concepts
to implement compile time type polymorphism rather than inheritance
to implement run time object-oriented polymorphism.
Concepts are sets of requirements that a data type must meet.
To satisfy a concept a class must define the required typedefs,
static constants, member functions, or related global functions.
Classes that satisfy a concept need not be related by inheritance.
The Standard Template Library (STL) is built upon the definition of concepts.
In STL, some concepts (like Iterators) are general enough
to include built-in types (like pointers).
In vpdt
, there are also some basic concepts
which can be satisfied by built-in types.
However, the most useful concepts (like Distributions)
are a bit more specific and require a class implementation.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A collection of more basic concepts will form the building blocks of the probability distribution concept. The relevant concepts are:
vnl
data structures satisfiy these concepts and are used in practice.
However, other representations could be used
if the appropriate functions are provided.
These concepts are all interrelated by various function requirements.
The type that meets the Field concept is considered the primary type.
For each Field type there should be a unique associated
Scalar, Vector, and Matrix type
The vpdt_field_traits<F>
struct defines these types for each F
.
vpdt_field_traits<F>
needs to be specialized for each Field type.
This traits struct defines the following typedefs:
vpdt_field_traits<F>::dimension
This is actually a static const unsigned int
and indicates the
fixed dimension of the Field type
vpdt_field_traits<F>::scalar_type
The Scalar type associated with the Field type
vpdt_field_traits<F>::field_type
The Field type itself (F==field_type
).
Redundant, but provided for consistency.
vpdt_field_traits<F>::vector_type
The Vector type associated with the Field type
vpdt_field_traits<F>::matrix_type
The Matrix type associated with the Field type
In the special case of univariate distributions,
all four of these types become equal, and dimension==1
.
There is also a special typedef vpdt_field_traits<F>::type_is_scalar
defined to void
that only exists if F
is scalar.
Similarly, vpdt_field_traits<F>::type_is_vector
is defined to
void
only if F
is non-scalar.
The existence of these special types is used to disambiguate some
template specializations, and to provide faster implementations
for the scalar case.
A probability distribution is defined over some scalar field.
The Scalar concept represents this value at each point in space.
It is satisfied by floating point types.
For now this is likely to be limited to double
or float
.
Scalars must support all built in arithmetic operators (+, -, *, /, etc.)
as well as all standard cmath functions like
sqrt
, exp
, and log
.
If the type does not support these functions directly,
it must be able to implicitly cast itself to and from a type that does.
Each point in a probability space is represented by the Field concept.
When used in vpdl
distributions the Field is satisfied by either
a scalar, a vnl_vector
, or a vnl_vector_fixed
.
The Field has an associated Scalar and dimension.
For dimension n
, a Field contains n
values of its Scalar type.
A Field actually has two measure of dimension:
a fixed (compile time) dimension and a variable (run time) dimension.
These dimensions may or may not be equal.
Field types with data allocated on the stack
will have size specified at compile time.
The fixed dimension and variable dimension will both equal this fixed size.
Field types with data allocated on the heap
will have a fixed dimension of zero and a variable dimension equal to
the number of currently allocated Scalar objects.
A Field type will be equal to its Scalar type
in the case of univariate distributions.
Since Scalar types are usually built-in C++ types,
they can not be required to implement any member functions.
Instead, they are required to implement a set of overloaded global functions
to perform the required set of actions.
For the standard types used in vpdl
,
these functions are implemented in vpdl/vpdt/vpdt_access.h
.
For some types that satisfy the concept these functions are trivial or even empty.
In the table below the Field type is denoted field
and
its corresponding Scalar type is scalar
.
unsigned int vpdt_size(const field&)
Access the run time size (i.e. dimension)
void vpdt_set_size(field&, unsigned int)
Change the run time size for heap-based structures, otherwise do nothing.
void vpdt_fill(field&, const scalar&)
Fill all dimensions of the field with the scalar value
const scalar& vpdt_index(const field&, unsigned int)
Access a scalar element reference (const) by index
scalar& vpdt_index(field&, unsigned int)
Access a scalar element reference (non-const) by index
The Field type should also support the subtraction operator on two Field types. The return type for the difference of two Field types is a Vector type.
The Vector concept represents the difference between two Field values.
In the case of the default vnl
types,
the Field and Vector types are the same.
The requirements for the Vector are a superset of
the requirements for a Field.
A Vector should support all the operation of Field
plus several other arithmetic operations:
vector = vector + vector
(and equivalent for -
)
vector += vector
(and equivalent for -=
)
field = field + vector
(and equivalent for -
)
field += vector
(and equivalent for -=
)
vector = scalar + vector
(and equivalent for -
)
vector += scalar
(and equivalent for -=
)
vector = scalar * vector
vector *= scalar
Vectors should also define these global functions
scalar dot_product(const vector&, const vector&)
vector element_product(const vector&, const vector&)
matrix outer_product(const vector&, const vector&)
The Matrix concept actually refers to a square matrix.
It is used for second order statistics (like covariance).
Since it is square, the matrix also has a single (fixed or variable) dimension
like the Field and Vector.
It requires all the same vpdt_access.h
functions except
the signature of the vpdt_index
function now has two indices
const scalar& vpdt_index(const matrix&, unsigned int, unsigned int)
scalar& vpdt_index(matrix&, unsigned int, unsigned int)
A few arithmetic operations are also needed:
matrix = matrix + matrix
(and equivalent for -
)
matrix += matrix
(and equivalent for -=
)
matrix = scalar + matrix
(and equivalent for -
)
matrix += scalar
(and equivalent for -=
)
matrix = scalar * matrix
matrix *= scalar
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Distribution concept is the central concept in vpdt
.
The Distribution is a probability distribution class that is analogous
to vpdl_distribution<T,n>
.
In fact, vpdl_distribution<T,n>
and all its subclasses satisfy
the Distribution concept.
The Distribution requires a subset of the functions that are
declared virtual on vpdl_distribution<T,n>
.
The following example lays out the required API that
a Distribution is required to have.
In this example, it is assumed that a consistent set of basic types
have been defined such that T
satisfies Scalar,
F
satisfies Field, V
satisfies Vector,
and M
satisfies Matrix.
class vpdt_distribution { public: //: The field type typedef F field_type; //: Return the variable (run time) dimension unsigned int dimension() const; //: Evaluate the unnormalized density at a point // This must be multiplied by norm_const() to integrate to 1 T density(const F& pt) const; //: Compute the gradient of the density function, returned in \a g // The return value of the function is the density itself T gradient_density(const F& pt, V& g) const; //: Compute the normalization constant (independent of sample point). T norm_const() const; //: Evaluate the cumulative distribution function at a point // This is the integral of the density function from negative infinity // (in all dimensions) to the point in question T cumulative_prob(const F& pt) const; //: Compute the mean of the distribution. void compute_mean(F& m) const; //: Compute the covariance matrix of the distribution. void compute_covar(M& c) const; }; |
The other member functions that are defined in vpdl_distribution<T,n>
are not needed for a vpdt
Distribution because they can be
implemented in terms of the required functions.
These extra functions are provided as global functions instead.
For example, T vpdt_prob_density(const dist& d, const F& pt)
is a function
that returns d.density(pt) * d.norm_const()
.
These functions can be overloaded when a more efficient implementation
exists for a particular distribution class.
For example, the default vpdt_log_density
function applied to a
Gaussian distribution would compute the log of an exponential.
An overload of vpdt_log_density
for Gaussians can eliminate this
extra unnecessary step.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Rumpelstiltskin on November, 22 2009 using texi2html 1.78.