Sparse Grid Basics

This page slowly walks you through an introduction of sparse grids. Apart from mathematical knowledge at a level similar to high school, no other prerequisites are required.

What Are Sparse Grids?

Sparse grids are a numerical discretization technique that can be used to accelerate the solution of a wide range of computational problems: interpolation, regression, classification, density estimation, quadrature, uncertainty quantification, partial differential equations, and so on.

Often, these problems are solved on full grids (also called “regular grids”). While this is a feasible approach if the dimensionality $d$ of the problem is low ($d = 2$, $d = 3$), full grids become very expensive in computational sense when used for higher dimensionalities $d > 4$. This is due to the curse of dimensionality, which states that the complexity of full grids grows exponentially with $d$.

If the dimensionality cannot be reduced by dimension-reducing methods (e.g., principal component analysis), then different approaches have to be used in order to solve these problems. This is where sparse grids come into play.

Sparse grids defeat the curse of dimensionality and allow the solution of the problems with much smaller effort, at the cost of slightly deteriorated errors.

Setting and Prerequisites

The basic setting is as follows: We have a scalar-valued function $\objfun$ which maps some input parameter $\*x$ to some output value $\objfun(\*x)$.

The requirements of $\objfun$ are as follows:

  • $\objfun$ is defined on the unit hyper-cube $\clint{0, 1}^d$. This means that $\objfun(\*x)$ can be computed for all $\*x \in \clint{0, 1}^d$. If every parameter $x_t$ independently “lives” on some interval $\clint{a_t, b_t}$, then the $x_t$ can be easily transformed to be in $\clint{0, 1}$.
  • $\objfun$ can be evaluated at arbitrary points in $\clint{0, 1}^d$, so $\objfun$ is not given by a “point cloud” of values at predetermined points.

Furthermore, usually we assume that $\objfun$ is a “computationally expensive” function, which means that evaluations $\objfun(\*x)$ might take a long time.

To study $\objfun$ and to solve problems involving $\objfun$ (e.g., function optimization), we want to replace $\objfun$ with another function $\surfun\colon \clint{0, 1}^d \to \real$ that approximates $\objfun$ well, but is much cheaper to evaluate. We could then discard the original function $\objfun$ and study $\surfun$ instead.

In our setting, we construct $\surfun$ by interpolation of $\objfun$: We evaluate $\objfun$ at a (hopefully small) number of points $\*x_k$ and use the values $\objfun(\*x_k)$ to define $\surfun$. These points $\*x_k$ will later be our sparse grid.

Full Grids in One Dimension

The most common method is to use full grids for the points $\*x_k$.

To explain full grids, let’s start with one dimension $d = 1$, i.e., our domain is the unit interval $\clint{0, 1}$. Full grids subdivide this interval with equidistant points into intervals of equal size. Choosing the number of intervals as a power $2^n$ of two will later make it easier for us.

Hence, our one-dimensional (full) grid is given by

$$\fgset{n} := \{\gp{n,i} \mid i = 0, \dotsc, 2^n\},\quad \gp{n,i} := i \gs{n},\quad \gs{n} := 2^{-n},$$

where $n \ge 0$ is the level of the grid, $\gp{n,i}$ are the grid points of index $i$, and $\gs{n}$ is the grid spacing. For example, the 1D grid of level two is given by $\fgset{2} = \{0, 0.25, 0.5, 0.75, 1\}$, since $\gs{2} = 0.25$.

With this grid, we now construct our interpolant function $\surfun$ as a linear combination of basis functions, one for each grid point:

$$\surfun(x) := \sum_{i=0}^{2^n} \coeff{n,i} \basis{n,i}(x),\quad \coeff{n,i} \in \real.$$

The simplest, reasonably good basis is given by piecewise linear functions. Basically, we evaluate the original function $\objfun$ at the grid points $\gp{n,i}$ and connect the resulting dots in the plot by drawing straight lines between them.

Formally, piecewise linear basis functions are defined as

$$\basis{n,i}(\*x) := \max(1 – |2^n x – i|, 0).$$

This means that $\basis{n,i}$ is zero everywhere except between the two neighbors $\gp{n,i-1}$ and $\gp{n,i+1}$, where it increases linearly from $0$ to $1$ on $\clint{\gp{n,i-1}, \gp{n,i}}$ and then decreases back again from $1$ to $0$ on $\clint{\gp{n,i}, \gp{n,i+1}}$. The shape of $\basis{n,i}$ is why these piecewise linear functions are also called hat functions.

Since every basis function $\basis{n,i}$ vanishes at all grid points but $\gp{n,i}$, the coefficients $\coeff{n,i}$ are simply equal to the value $\objfun(\gp{n,i})$ of the original function at the grid points. This is the reason why this basis is called a nodal basis.

Full Grids in Higher Dimensions

The one-dimensional grid is generalized to higher dimensions $d > 1$ by using the so-called tensor product approach, which uses Cartesian products to construct the higher-dimensional grids and tensor products to construct higher-dimensional functions.

The level of the grid is now a vector $\*n = (n_1, \dotsc, n_d)$ of 1D levels $n_t$ ($t = 1, \dotsc, d$). The full grid of level $\*n$ is the Cartesian product

$$\fgset{\*n} := \fgset{n_1} \times \dotsb \times \fgset{n_d}$$

of the one-dimensional grids $\fgset{n_t}$. This means that $\fgset{\*n}$ consists of all possible combinations of the 1D grid points. The $d$-dimensional grid points (the elements of $\fgset{\*n}$) are defined $\vgp{\*n,\*i} := (\gp{n_1,i_1}, \dotsc, \gp{n_d,i_d})$, where the index $\*i = (i_1, \dotsc, i_d)$ is now a vector, too.

As an example, we want to determine the 2D grid of level $\*n = (2, 1)$. The 1D grid of level $n_1 = 2$ is $\fgset{2} = \{0, 0.25, 0.5, 0.75, 1\}$ and the 1D grid of level $n_2 = 1$ is $\fgset{1} = \{0, 0.5, 1\}$. The 2D grid is now the Cartesian product with all $5 \cdot 3 = 15$ possible combinations:

$$\begin{align}
\fgset{(2,1)} = \fgset{2} \times \fgset{1} = \{&(0, 0), (0.25, 0), (0.5, 0), (0.75, 0), (1, 0),\\
&(0, 0.5), (0.25, 0.5), (0.5, 0.5), (0.75, 0.5), (1, 0.5),\\
&(0, 1), (0.25, 1), (0.5, 1), (0.75, 1), (1, 1)\}.
\end{align}$$

Again, each grid point $\vgp{\*n,\*i}$ is corresponding to one basis function $\basis{\*n,\*i}$. This multivariate basis function is constructed from the univariate functions $\basis{n_t,i_t}$ as a tensor product:

$$\basis{\*n,\*i}(\*x) := \basis{n_1,i_1}(x_1) \;\cdot\; \dotsm \;\cdot\; \basis{n_d,i_d}(x_d).$$

In other words, to evaluate $\basis{\*n,\*i}(\*x)$ at some point $\*x = (x_1, \dotsc, x_d) \in \clint{0, 1}^d$, we iterate over all dimensions $t = 1, \dotsc, d$, evaluate the 1D basis function of level $n_t$ and index $i_t$ at point $x_t$, and multiply the $d$ resulting values to obtain $\basis{\*n,\*i}(\*x)$.

The corresponding interpolant function is simply

$$\surfun(\*x) := \sum_{i_1=0}^{2^{n_1}} \dotsb \sum_{i_d=0}^{2^{n_d}} \coeff{\*n,\*i} \basis{\*n,\*i}(\*x),\quad \coeff{\*n,\*i} \in \real,$$

where $\coeff{\*n,\*i} = \objfun(\vgp{\*n,\*i})$ for the piecewise linear basis (called piecewise $d$-linear in higher dimensions).

The Limits of Full Grids

Obviously, the size of $\fgset{\*n}$ will rapidly grow in higher dimensions. This is because $\fgset{\*n}$ has $2^{n_1} \dotsm 2^{n_d}$ grid points. If $\*n$ is chosen as the same number $n$ in all dimensions, this equals $2^{nd}$ grid points, which grows exponentially in $d$ (curse of dimensionality).

This is a problem, since for interpolation we have to know the values $\objfun(\vgp{\*n,\*i})$ of the objective function at each grid point $\vgp{\*n,\*i}$, and evaluations of $\objfun$ are by assumption computationally expensive. Even if evaluations of $\objfun$ are cheap, the sheer number of grid points quickly exhausts all available memory even on large computers already for moderate dimensionalities $d$ and levels $\*n$.

We therefore try to reduce the size of the grid by removing “unimportant” basis functions and their corresponding grid points. Unfortunately, in the nodal basis as introduced above, all basis functions are equally important, because they all look the same.

Thus, it is necessary to perform a change of basis, so that the basis functions and their grid points have different levels of importance, but the same functions (continuous, piecewise linear functions) can be represented by the basis. It suffices to perform this change of basis for the 1D case, since the multivariate case will be treated by the same tensor product approach as used above.

Hierarchy in One Dimension

To this end, we note that the 1D grids are nested in the sense that $\fgset{n-1}$ is contained in $\fgset{n}$. For instance, the grid $\Omega_0 = \{0, 1\}$ of level zero is contained in the grid $\fgset{1} = \{0, 0.5, 1\}$ of level one, which is in turn contained in the grid $\fgset{2} = \{0, 0.25, 0.5, 0.75, 1\}$ of level two, and so on. For each level, only the grid points with odd index $i$ (e.g., $0.25$ and $0.75$ with index $1$ and $3$) are not already contained in the grid of the previous level.

This means that the grid $\fgset{n}$ actually decomposes into $n+1$ incremental grids $\hsset{\ell}$ ($\ell = 0, \dotsc, n$):

$$\fgset{n} = \bigdcup_{\ell=0}^n \,\hsset{\ell}\, = \,\hsset{0} \,\dcup\, \hsset{1} \,\dcup\, \dotsb \,\dcup\, \hsset{n},\quad \hsset{\ell}\, := \{\gp{l,i} \mid i \in \hiset{\ell}\},$$

where $\hiset{\ell}$ only contains the odd indices from $\fgset{\ell}$ (except for level zero as a special case):

$$\hiset{\ell} := \begin{cases}\{1, 3, 5, \dotsc, 2^\ell – 1\}&\text{for }\ell \ge 1,\\\{0, 1\}&\text{for }\ell = 0.\end{cases}$$

Here, the symbol “$\dcup$” denotes the disjoint union of sets. It is the same as “$\cup$” with the additional hint that the sets being joined are pairwise disjoint (they do not have points in common).

In short, this can be written as

$$\fgset{n} = \{\gp{\ell,i} \mid 0 \le \ell \le n,\, i \in \hiset{\ell}\}.$$

From a data structure point of view, this hierarchy of grid points with different can be represented by tree, which is binary up to $\ell = 1$. (For level zero, again a special case has to be employed.) The two children of a grid point $\gp{\ell,i}$ ($\ell \ge 1$) are given by $\gp{\ell+1,2i\pm 1}$.

This grid hierarchy can be applied to the basis functions as well, if we just take the corresponding basis function for each grid point:

$$\surfun(x) := \sum_{\ell=0}^n \sum_{i \in \hiset{\ell}} \surplus{\ell,i} \basis{\ell,i}(x),\quad \surplus{\ell,i} \in \real.$$

If specific conditions are fulfilled, then the involved functions $\{\basis{\ell,i} \mid 0 \le \ell \le n,\, i \in \hiset{\ell}\}$ are also a basis (called hierarchical basis) and span the same function space as the nodal basis. One can show that these conditions are fulfilled for the piecewise linear basis. Of course, the coefficients $\surplus{\ell,i}$ (usually called hierarchical surpluses) are different than the nodal coefficients $\coeff{n,i}$ from above, but the resulting function $\surfun$ is exactly the same.

Hierarchy in Higher Dimensions

The tensor product approach from the nodal case can also be applied to the hierarchical representation of the one-dimensional grid and basis.

Similarly to the 1D case, the $d$-dimensional full grid $\fgset{\*n}$ decomposes into $(n_1 + 1) \dotsm (n_d + 1)$ incremental grids $\hsset{\*\ell}$:

$$\fgset{\*n} = \bigdcup_{\ell_1=0}^{n_1} \dotsb \bigdcup_{\ell_d=0}^{n_d} \,\hsset{\*\ell},\quad \hsset{\*\ell}\, := \,\hsset{\ell_1}\, \times \dotsb \times \,\hsset{\ell_d}.$$

Again, the incremental grids only contain the grid points with odd indices, i.e., $\hsset{\*\ell}$ can also be formulated as follows:

$$\hsset{\*\ell}\, = \{\vgp{\*\ell,\*i} \mid \*i \in \hiset{\*\ell}\},\quad \hiset{\*\ell} := \hiset{\ell_1} \times \dotsb \times \hiset{\ell_d}.$$

As an example, the grid $\fgset{(2,1)}$ from above, i.e.,

$$\begin{align}
\fgset{(2,1)} = \{&(0, 0), (0.25, 0), (0.5, 0), (0.75, 0), (1, 0),\\
&(0, 0.5), (0.25, 0.5), (0.5, 0.5), (0.75, 0.5), (1, 0.5),\\
&(0, 1), (0.25, 1), (0.5, 1), (0.75, 1), (1, 1)\},
\end{align}$$

decomposes as follows:

$$\begin{align}
\fgset{(2,1)} &= \,\hsset{(0,0)}\, \dcup \,\hsset{(0,1)}\, \dcup \,\hsset{(1,0)}\, \dcup \,\hsset{(1,1)}\, \dcup \,\hsset{(2,0)}\, \dcup \,\hsset{(2,1)},\text{ where}\\
\hsset{(0,0)}\, &= \{(0, 0), (0, 1), (1, 0), (1, 1)\},\\
\hsset{(0,1)}\, &= \{(0, 0.5), (1, 0.5)\},\\
\hsset{(1,0)}\, &= \{(0.5, 0), (0.5, 1)\},\\
\hsset{(1,1)}\, &= \{(0.5, 0.5)\},\\
\hsset{(2,0)}\, &= \{(0.25, 0), (0.25, 1), (0.75, 0), (0.75, 1)\},\\
\hsset{(2,1)}\, &= \{(0.25, 0.5), (0.75, 0.5)\}.
\end{align}$$

Data-structure-wise, the binary tree from the 1D case becomes are directed acyclic graph (DAG) in the multivariate case as grid points have multiple direct ancestors; every grid point $\vgp{\*\ell,\*i}$ has $2d$ children, two for each dimension.

The interpolant function is given by

$$\surfun(\*x) := \sum_{\*\ell = \*0}^{\*n} \sum_{\*i \in \hiset{\*\ell}} \surplus{\*\ell,\*i} \basis{\*\ell,\*i}(\*x),\quad \surplus{\*\ell,\*i} \in \real,$$

where the first sum is shorthand for $\sum_{\ell_1=0}^{n_1} \dotsb \sum_{\ell_d=0}^{n_d}$. The hierarchical surpluses $\surplus{\*\ell,\*i}$ are again different from the coefficients $c_{\*\ell,\*i}$ for the nodal basis, but the resulting interpolant function $\surfun$ is the same.

Sparse Grids

However, this different representation now enables us to assess the contribution/importance of each basis function $\basis{\*\ell,\*i}$ for the resulting $\surfun$.

To this end, we consider again the piecewise linear basis in 1D. The key observation is that the size of the support (basically the set where the function is non-zero) of $\basis{\ell,i}$ is smaller in higher levels: The size of the support of $\basis{\ell,i}$ is $\gp{\ell,i+1} – \gp{\ell,i-1} = 2 \gs{\ell} = 2^{-\ell+1}$, i.e., it halves in size for each level. Hence, basis functions of a very high level (say $\ell \ge 10$) have a very small support, which means that these functions only have little influence on the resulting linear combination $\surfun$, no matter what their coefficient $\surplus{\ell,i}$ is.

Conversely, basis function of a low level $\ell$ have a large support. The three basis functions of level $\ell \le 1$ even have global support, which means that they influence $\surfun$ globally on the whole domain $\clint{0, 1}$.

In $d$ dimensions, the area of the support of $\basis{\*\ell,\*i}$ is

$$(2\gs{\ell_1}) \dotsm (2\gs{\ell_d}) = 2^{-\normone{\ell} + d},\quad \normone{\ell} := \ell_1 + \dotsb + \ell_d,$$

where $\normone{\ell}$ is the $1$-norm of $\ell$, i.e., the area decreases with increasing $\normone{\ell}$.

Thus, it may be supposed that functions with high $\normone{\ell}$ only contribute little to the solution and can thus be omitted. This is exactly what sparse grids do: For the definition of the interpolant function, we only use the levels $\*l$ whose $1$-norm doesn’t exceed a specific threshold $n \ge 0$:

$$\surfun(\*x) := \sum_{\normone{\*\ell} \le n} \sum_{\*i \in \hiset{\*\ell}} \surplus{\*\ell,\*i} \basis{\*\ell,\*i}(\*x),\quad \surplus{\*\ell,\*i} \in \real.$$

The corresponding grid points form the regular sparse grid $\sgset{n}{d}$ of level $n$:

$$\sgset{n}{d} := \{\gp{\*\ell,\*i} \mid \normone{\*\ell} \le n,\, \*i \in \hiset{\*\ell}\}.$$

Comparison to Full Grids

For the piecewise linear basis, homogeneous boundary conditions, the full grid of level $n$ (same level in all dimensions), and the corresponding regular sparse grid of level $n$, the following estimates hold for the number of grid points and the $\Ltwo$ interpolation error $\normLtwo{\objfun – \surfun}$:

Number of grid points$\Ltwo$ interpolation error
Full grid$\landauO{2^{nd}}$$\landauO{2^{-2n}}$
Sparse grid$\landauO{2^n n^{d-1}}$$\landauO{2^{-2n} n^{d-1}}$

This means that while the number of grid points is massively reduced as the curse of dimensionality (exponential dependency on $d$) is removed, the $\Ltwo$ interpolation error is only slightly increased by a factor of $n^{d-1}$.

Compared to full grids, sparse grids enable the construction of function interpolants with much less (expensive) grid points, at the expense of a slightly increased error.

Continue reading on the next page to learn about adaptive sparse grids.