Interpolating scattered data with RBFs

What is the interpolation problem?

We want to approximate a real valued function $ f(\vec{x})$by $ s(\vec{x})$given the set of values $ \vec{f}=(f_1,\dots,f_N)$at the distinct points $ X=\{\vec{x}_1,\dots,\vec{x}_N\}\subset\ensuremath{\mathbb{R}}^d$.

What is an RBF?

We choose $ s(\vec{x})$to be a Radial Basis Function of the form
(1)

where $ p$ is a polynomial of degree at most $ k$$ \lambda_i$ is a real-valued weight, $ \vert\cdot\vert$ denotes the Euclidean norm, $ \phi$ is a basic function$ \phi:\ensuremath{\mathbb{R}}^+\to\ensuremath{\mathbb{R}}$, and$ \vert\vec{x}- \vec{x}_i\vert$ is simply a distance -- how far $ \vec{x}$ is from the point $ \vec{x}_i$.

  • An RBF is a weighted sum of translations of a radially symmetric basic function augmented by a polynomial term.

What is a basic function?

The basic function $ \phi$, in this context, is a real function of a positive real $ r$, where $ r$ is the distance (radius) from the origin. Popular choices for $ \phi$ include

  • The thin-plate spline $ \phi(r)=r^2\log(r)$ (for fitting smooth functions of two variables)
  • The Gaussian  $ \phi(r)=\exp(-cr^2)$ (mainly for neural networks)
  • The multiquadric $ \phi(r)=\sqrt{r^2+c^2}$ (for various applications, in particular fitting to topographical data)

For fitting functions of three variables, good choices include

  • The biharmonic ($ \phi(r)=r$) spline + linear polynomial
  • The triharmonic ($ \phi(r)=r^3$) spline + quadratic polynomial

These polyharmonic splines (which include the thin-plate spline) minimise certain energy semi-norms and are therefore the ``smoothest'' interpolators. Note that the associated basic functions are not compactly supported - they grow as r  increases from the origin.

Why are RBFs suited to scattered data interpolation?

RBFs are popular for interpolating scattered data as the associated system of linear equations is guaranteed to be invertible under very mild conditions on the locations of the data points. For example, the thin-plate spline only requires that the points are not co-linear while the Gaussian and multiquadric place no restrictions on the locations of the points. In particular, RBFs do not require that the data lie on any sort of regular grid.

How do we find an RBF interpolant?

The RBF interpolant  $ s(\vec{x})$ is defined by the coefficients of the polynomial $ p$  and the weights $ \lambda_i$.
Given the interpolation values $ \vec{f}=(f_1,\dots,f_N)$, we seek the weights$ \lambda_i$ so that the RBF satisfies

$\displaystyle s(\vec{x}_i) = f_i \qquad i=1,\dots,N.$ (2)

Because this gives an under-determined system, i.e., there are more parameters than data, the orthogonality or side conditions

$\displaystyle \sum_{j=1}^{N} \lambda_j q(\vec{x}_j) = 0$ for all polynomials $ p$of degree at most $ k$ . (3)

are further imposed on the coefficients $ \vec{\lambda}=(\lambda_1,\dots,\lambda_N)$.
Let $ \{p_1,\dots,p_\ell\}$be a basis for polynomials of degree at most $ k$and let $ \vec{c}=(c_1,\dots,c_\ell)$be the coefficients that give $ p$in terms of this basis. Then Equation (2) and (3) may be written in matrix form as

$\displaystyle \begin{pmatrix}A & P P{{}^{\mathsf{T}}}& 0 \end{pmatrix} \beg......c{\lambda} \vec{c}\end{pmatrix} = \begin{pmatrix}\vec{f} 0 \end{pmatrix},$ (4)

where

$\displaystyle A_{i,j}$ $\displaystyle = \phi(\vert\vec{x}_i - \vec{x_j}\vert),$ $\displaystyle i,j$ $\displaystyle = 1,\dots, N,$  
$\displaystyle P_{i,j}$ $\displaystyle = p_j(\vec{x}_i),$ $\displaystyle i$ $\displaystyle = 1,\dots, N, \quad j = 1,\dots, \ell.$  

Solving the linear system (4) determines c and $ \lambda_i$, hence$ s(\vec{x})$.

Why do I need ARANZ's FastRBFTM engine?

Basic functions, such as the polyharmonic splines, that have desirable approximation properties, tend to have non-compact support and grow arbitrarily large. This means that the matrix $ A$ in Equation (4) is not sparse and, except for symmetry, has no structure that can be exploited for solving the system. This means direct solution of Equation (4) requires $ \ensuremath{\mathcal{O}}(N^3)$ operations and  $ \ensuremath{\mathcal{O}}(N^2)$ storage. Moreover, naïve evaluation of Equation (1) at $ M$ points requires  $ \ensuremath{\mathcal{O}}(MN)$ operations. This has led many authors to conclude that RBFs are suitable for small problems with up to a few thousand points.  However, ARANZ's FastRBFTM solvers are based on new mathematical algorithms that reduce the fitting task down to just $ \ensuremath{\mathcal{O}}(N\log(N))$ operations and $ \ensuremath{\mathcal{O}}(N)$ storage, and the evaluation task down to just $ \ensuremath{\mathcal{O}}(N\log(N))$ setup then $ \ensuremath{\mathcal{O}}(M)$ operations. Furthermore, these fast methods run on standard desktop hardware, even when the number of data points exceeds one million. Previously, even storage of the system (4) has been impossible.

It should also be noted that with the naïve approach, the matrix in Equation (4) typically has poor conditioning. This means that substantial errors will easily creep into any standard numerical solution.
 

Iterative fitting of an RBF to the Buddha sculpture
Iterations from a `greedy' fit of an RBF to laser surface scan data

Greedy fitting - How does data compression come about?

The equations which define an RBF interpolant (Eq. 4) imply that the number of terms in the RBF equals the number of data points.
FastRBFTM provides a reduction option which uses a greedy algorithm to iteratively add data points to the RBF so that the data may be modelled with fewer terms to the same desired accuracy. This process is graphically illustrated above and in the associated MPEG (1.2MB) where an RBF is iteratively fitted to laser scan data from the smiling Buddha figurine. The RBF surface modelling process is described in the surfacing FAQ. The figure illustrates how the zero-valued iso-surface of the fitted RBF more closely approximates the object as more terms are added. The original data, consisting of 543,000 vertices, was modelled by 82,000 terms to within 1e-4 accuracy - a significant reduction in data.
  • The RBF is guaranteed to pass through ALL the data points to within the user-specified precision.

What is a fitting tolerance and why specify one?

FastRBFTM allows users to specify a fitting tolerance and solve the corresponding inexact interpolation problem,

$\displaystyle \vert f_i - s(\vec{x}_i) \vert \leq \epsilon, \qquad i =1, \ldots, N.$ (5)

where $\epsilon_i is the tolerance at each data point. The RBF fitted is guaranteed to be within the specified fitting tolerance. Specifying an acceptable tolerance results in much faster fitting times, even when centre reduction (greedy fitting) is not specified. A tolerance also gives some robustness to low level noise on the data. Better methods for optimally fitting to noisy data are discussed in the smoothing and approximation FAQ.

The figure below illustrates the affect of the specifying a fitting tolerance. The first image is an exact FastRBFTM fit to the data - there is no difference between the RBF surface and the original surface data. In the second image a 1mm tolerance has been specified, resulting in some deviation from the raw data, but within the 1mm tolerance. In the final image a greedy fit has been employed to reduce the number of centres required to represent the surface to within 1mm. The number of centres has reduced from 6400 to 1800. The greedy fit has utilised the maximum 1mm deviation specified by the user at several of the original mesh vertices, this deviation is exaggerated by the colour scale which indicates the 1mm extremes by the blue and red colours. The head is 225mm x 150mm and the 1mm deviation is not noticeable at this scale and consequently suitable for many graphics applications.

RBF fitting strategies
FastRBF fitting strategies with deviation from original data colour coded. (a) Exact fit to mesh vertices (6400 centres) (b) Fit to all mesh vertices to an accuracy of 1mm (6400 centres) (c) Greedy fit to an accuracy within 1mm (1800 centres).

Links