Faster, More Accurate Fits

Priming Fits

Good fits often require fit functions with several exponentials and many parameters. Such fits can be costly. One strategy that can speed things up is to use fits with fewer terms to generate estimates for the most important parameters; these estimates are then used as starting values for the full fit. The smaller fit is usually faster because it has fewer parameters, but for the same reason it is not adequate by itself. Given reasonable starting estimates for the most important parameters from the smaller fit, the full fit usually converges more quickly. Continuing with the example from the previous section, the code

import corrfitter as cf

data = make_data('mcfile')
fitter = cf.CorrFitter(models=make_models())
p0 = None
for N in [1, 2, 3, 4, 5, 6, 7, 8]:
    prior = make_prior(N)
    fit = fitter.lsqfit(data=data, prior=prior, p0=p0)
    print_results(fit, prior, data)
    p0 = fit.pmean          # best-fit means become the starting point for the next fit

does fits using fit functions with N=1...8 terms. Parameter mean-values fit.pmean from the fit with N exponentials are used as starting values p0 for the fit with N+1 exponentials, hopefully reducing the time required to find the best fit for N+1.

Positive Parameters

Priors used in corrfitter.CorrFitter assign an a priori Gaussian/normal distribution to each parameter. It is possible instead to assign a log-normal distribution, which forces the corresponding parameter to be positive. Consider, for example, energy parameters labeled by 'dE' in the definition of a model (e.g., Corr2(dE='dE',...)). To assign log-normal distributions to these parameters, include their logarithms in the prior and label the logarithms with 'log(dE)': for example, in make_prior(N) use

prior['log(dE)'] = gv.log(gv.gvar(N * ['0.25(25)']))

instead of prior['dE'] = gv.gvar(N * ['0.25(25)']). The fitter then uses the logarithms as the fit parameters. The original 'dE' parameters are recovered (automatically) inside the fit function from exponentials of the 'log(dE)' fit parameters.
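
The following is a minimal sketch of the underlying mechanism; the conversion is handled by gvar.BufferDict, which lsqfit uses for its parameter dictionaries, so storing a 'log(dE)' entry makes 'dE' available automatically:

import gvar as gv

p = gv.BufferDict()
p['log(dE)'] = gv.log(gv.gvar(2 * ['0.25(25)']))
print(p['dE'])          # exp(p['log(dE)']), generated automatically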

Using log-normal distributions where possible can significantly improve the stability of a fit. This is because otherwise the fit function typically has many symmetries that lead to large numbers of equivalent but different best fits. For example, the fit functions Gaa(t,N) and Gab(t,N) above are unchanged by exchanging a[i], b[i] and E[i] with a[j], b[j] and E[j] for any i and j. We can remove this degeneracy by using a log-normal distribution for the dE[i]s since this guarantees that all dE[i]s are positive, and therefore that E[0],E[1],E[2]... are ordered (in decreasing order of importance to the fit at large t).

Another symmetry of Gaa and Gab, which leaves both fit functions unchanged, is replacing a[i],b[i] by -a[i],-b[i]. Yet another is to add a new term to the fit functions with a[k],b[k],dE[k] where a[k]=0 and the other two have arbitrary values. Both of these symmetries can be removed by using a log-normal distribution for the a[i] priors, thereby forcing all a[i]>0.

The log-normal distributions for the a[i] and dE[i] are introduced into the code example above by changing the corresponding labels in make_prior(N), and taking logarithms of the corresponding prior values:

import gvar as gv
import corrfitter as cf

def make_models():                          # same as before
    models = [
        cf.Corr2(datatag='Gaa', tmin=2, tmax=63, a='a', b='a', dE='dE'),
        cf.Corr2(datatag='Gab', tmin=2, tmax=63, a='a', b='b', dE='dE'),
        ]
    return models

def make_prior(N):
    prior = gv.BufferDict()
    prior['log(a)'] = gv.log(gv.gvar(N * ['0.1(5)']))
    prior['b'] = gv.gvar(N * ['1(5)'])
    prior['log(dE)'] = gv.log(gv.gvar(N * ['0.25(25)']))
    return prior

This replaces the original fit parameters, a[i] and dE[i], by new fit parameters, log(a)[i] and log(dE)[i]. The a priori distributions for the logarithms are Gaussian/normal, with priors of log(0.1±0.5) and log(0.25±0.25) for the log(a)s and log(dE)s respectively.

Note that the labels are unchanged here in make_models(). It is unnecessary to change labels in the models; corrfitter.CorrFitter will automatically connect the modified terms in the prior with the appropriate terms in the models. This allows one to switch back and forth between log-normal and normal distributions without changing the models (or any other code) — only the names in the prior need be changed. corrfitter.CorrFitter also supports “sqrt-normal” distributions, and other distributions, as discussed in the lsqfit documentation.
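
For example, a minimal sketch assuming the analogous 'sqrt(...)' labeling convention: the a amplitudes could instead be given a sqrt-normal distribution, which forces them to be non-negative, by changing only their label in make_prior(N):

prior['sqrt(a)'] = gv.sqrt(gv.gvar(N * ['0.1(5)']))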

Finally, note that another option for stabilizing fits involving many sources and sinks is to generate priors for the fit amplitudes and energies using corrfitter.EigenBasis.

Marginalization

Often we care only about parameters in the leading term of the fit function, or just a few of the leading terms. The non-leading terms are needed for a good fit, but we are uninterested in the values of their parameters. In such cases the non-leading terms can be absorbed into the fit data, leaving behind only the leading terms to be fit (to the modified fit data) — non-leading parameters are, in effect, integrated out of the analysis, or marginalized. The errors in the modified data are adjusted to account for uncertainties in the marginalized terms, as specified by their priors. The resulting fit function has many fewer parameters, and so the fit can be much faster.

Continuing with the example in Priming Fits, imagine that Nmax=8 terms are needed to get a good fit, but we only care about parameter values for the first couple of terms. The code from that section can be modified to fit only the leading N terms where N<Nmax, while incorporating (marginalizing) the remaining, non-leading terms as corrections to the data:

Nmax = 8
data = make_data('mcfile')
models = make_models()
fitter = cf.CorrFitter(models=models)
prior = make_prior(Nmax)        # build priors for Nmax terms
p0 = None
for N in [1,2,3]:               # fit N terms
    fit = fitter.lsqfit(data=data, prior=prior, p0=p0, nterm=N)
    print_results(fit, prior, data)
    p0 = fit.pmean

Here the nterm parameter in fitter.lsqfit specifies how many terms are used in the fit functions. The prior specifies Nmax terms in all, but only parameters in nterm=N terms are varied in the fit. The remaining terms specified by the prior are automatically incorporated into the fit data by corrfitter.CorrFitter.

Remarkably, this method is usually as accurate with N=1 or 2 as a full Nmax-term fit to the original fit data, but it is much faster. If this is not the case, check for singular priors, whose means are much smaller than their standard deviations. These can lead to singularities in the covariance matrix for the corrected fit data. Such priors are easily fixed: for example, use gvar.gvar('0.1(1.0)') rather than gvar.gvar('0(1)'). In some situations an SVD cut (see below) can also help.

Chained Fits

Large complicated fits, where lots of models and data are fit simultaneously, can take a very long time. This is especially true if there are strong correlations in the data. Such correlations can also cause problems from numerical roundoff errors when the inverse of the data’s covariance matrix is computed for the \chi^2 function, requiring large SVD cuts which can degrade precision (see below). An alternative approach is to use chained fits. In a chained fit, each model is fit by itself in sequence, but with the best-fit parameters from each fit serving as priors for fit parameters in the next fit. All parameters from one fit become fit parameters in the next, including those parameters that are not explicitly needed by the next fit (since they may be correlated with the input data for the next fit or with its priors). Statistical correlations between data/priors from different models are preserved throughout (approximately).

The results from a chained fit are identical to a standard simultaneous fit in the limit of large statistics (that is, in the Gaussian limit), but a chained fit usually involves fitting only a single correlator at a time. Single-correlator fits are typically much faster than simultaneous multi-correlator fits, and roundoff errors (and therefore SVD cuts) are much less of a problem.

Converting to chained fits is trivial: simply replace fit = fitter.lsqfit(...) by fit = fitter.chained_lsqfit(...). The output from this function comes from the last fit in the chain, whose fit results represent the cumulative results of the entire chain of fits. Results from the different links in the chain — that is, from the fits for individual models — are displayed using print(fit.formatall()).
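
For example, a minimal sketch reusing make_data, make_models, and make_prior from the earlier sections:

import corrfitter as cf

data = make_data('mcfile')
fitter = cf.CorrFitter(models=make_models())
fit = fitter.chained_lsqfit(data=data, prior=make_prior(8))
print(fit)                  # results from the last fit in the chain
print(fit.formatall())      # results from each fit in the chain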

There are various ways of chaining fits. For example, setting

models = [m1, m2, (m3a, m3b), m4]

causes models m1, m2 and m4 to be fit separately, but fits models m3a and m3b together in a single simultaneous fit:

m1 -> m2 -> (simultaneous fit of m3a, m3b) -> m4

Simultaneous fits make sense when there is lots of overlap between the parameters for the different models.

Another option is

models = [m1, m2, [m3a,m3b], m4]

in fitter.chained_lsqfit, which causes the following chain of fits:

m1 -> m2 -> (parallel fit of m3a, m3b) -> m4

Here the output from m1 is used in the prior for fit m2, and the output from m2 is used as the prior for a parallel fit of m3a and m3b together — that is, m3a and m3b are not chained, but rather are fit in parallel with each using a prior from fit m2. The result of the parallel fit of [m3a,m3b] is used as the prior for m4. Parallel fits make sense when there is little overlap between the parameters used by the different fits.
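
For example, a minimal sketch in which m1, m2, m3a, m3b, and m4 stand for models defined elsewhere:

fitter = cf.CorrFitter(models=[m1, m2, [m3a, m3b], m4])
fit = fitter.chained_lsqfit(data=data, prior=prior)
print(fit.formatall())      # results from each fit in the chain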

Chained fits are particularly useful for combining results from 2-point correlators with those from 3-point correlators, to determine a mixing amplitude or form factor for a ground state particle. A simultaneous fit of all these correlators can be quite unwieldy for realistic applications, but the analysis falls naturally into two parts. The first part uses the 2-point correlators to determine the energies and amplitudes of the various relevant states. The second part combines this information with the 3-point correlators to extract the desired 3-point amplitudes (Vnn, Vno, etc.).

Schematically one might structure such a fit with the following models list:

models = [(2-pt correlators), dict(nterm=(1,0)), (3-pt correlators)]

where (2-pt correlators) is a tuple containing all 2-point correlators and (3-pt correlators) is a tuple containing all 3-point correlators. The dictionary dict(nterm=(1,0)) resets fit parameter nterm for subsequent fits (i.e., the 3-point fits), causing those fits to remove all but the ground state using marginalization (see Marginalization). Marginalization is particularly effective here when the 2-point fits give good estimates for the nonlinear parameters (2-point amplitudes and energies; the 3-point amplitudes are all linear). Extreme marginalization then makes the 3-point fits much faster while remaining accurate.
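
A minimal sketch of the corresponding code, where twopt_models and threept_models are hypothetical lists containing the Corr2 and Corr3 models, respectively:

models = [tuple(twopt_models), dict(nterm=(1, 0)), tuple(threept_models)]
fitter = cf.CorrFitter(models=models)
fit = fitter.chained_lsqfit(data=data, prior=prior)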

Faster Fitters

When fits take many iterations to converge (or converge to an obviously wrong result), it is worthwhile trying a different fitter. The lsqfit module, which is used by corrfitter for fitting, offers a variety of alternative fitting algorithms that can sometimes be much faster (2 or 3 times faster). These are deployed by adding extra directives for lsqfit when constructing the fitter or when doing the fit: for example,

import corrfitter as cf

fitter = cf.CorrFitter(
    models=make_models(),
    fitter='gsl_multifit', alg='subspace2D', solver='cholesky'
    )

uses the subspace2D algorithm for subsequent fits with fitter. It is also possible to reset the default algorithms for all fits:

import lsqfit

lsqfit.nonlinear_fit.set(
    fitter='gsl_multifit', alg='subspace2D', solver='cholesky'
    )

The documentation for lsqfit describes many more options.
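
The same keywords can also be supplied when doing an individual fit. A minimal sketch, assuming they are passed through fitter.lsqfit to the underlying lsqfit fit:

fit = fitter.lsqfit(
    data=data, prior=prior, p0=p0,
    fitter='gsl_multifit', alg='subspace2D', solver='cholesky',
    )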

Processed Datasets

When fitting very large data sets, it is usually worthwhile to pare the data down to the smallest subset that is needed for the fit. Ideally this is done before the Monte Carlo data are averaged, to keep the size of the covariance matrix down. One way to do this is to process the Monte Carlo data with the models, just before averaging it, by using

import gvar as gv
import corrfitter as cf

def make_pdata(filename, models):
    dset = cf.read_dataset(filename)
    return cf.process_dataset(dset, models)

in place of make_data(filename). Here models is the list of models used by the fitter (fitter.models). Function make_pdata returns processed data which is passed to fitter.lsqfit using the pdata keyword:

import corrfitter as cf

def main():
    N = 4
    models = make_models()
    pdata = make_pdata('mcfile', models)
    prior = make_prior(N)
    fitter = cf.CorrFitter(models=models)
    fit = fitter.lsqfit(pdata=pdata, prior=prior)
    print(fit)
    print_results(fit, prior, pdata)

...

if __name__ == '__main__':
    main()

Processed data can only be used with the models that created it, so parameters in those models should not be changed after the data is processed.

Accurate Fits — SVD Cuts

A key feature of corrfitter is its ability to fit multiple correlators simultaneously, taking account of the statistical correlations between correlators at different times and between different correlators. Information about the correlations typically comes from Monte Carlo samples of the correlators. Problems arise, however, when the number N_s of samples is not much larger than the number N_d of data points being fit. Specifically, the smallest eigenvalues of the correlation matrix can be substantially underestimated if N_s is not sufficiently large (10 or 100 times larger than N_d). Indeed there must be at least N_d-N_s zero eigenvalues when N_s\le N_d. The underestimated (or zero) eigenvalues lead to incorrect and large (or infinite) contributions to the fit’s \chi^2 function, invalidating the fit results.

These problems tend to show up as unexpectedly large \chi^2s, for example in fits where the \chi^2 per degree of freedom remains substantially larger than one no matter how many fit terms are employed. Such situations are usually improved by introducing an SVD cut: e.g.,

fit = fitter.lsqfit(data=data, prior=prior, p0=p0, svdcut=1e-2)

This replaces the smallest eigenvalues of the correlation matrix as needed so that no eigenvalue is smaller than svdcut times the largest eigenvalue. Introducing an SVD cut increases the effective errors and so is a conservative move.

The method gvar.dataset.svd_diagnosis() in module gvar is useful for assessing whether an SVD cut is needed, and for setting its value. One way to use it is to write a separate script that reads the fit data (from a file named 'datafile') into a dataset, which is then passed to svd_diagnosis() for analysis:

import gvar as gv
import corrfitter as cf

def main():
    dset = cf.read_dataset('datafile')
    s = gv.dataset.svd_diagnosis(dset, models=make_models())
    print('svdcut =', s.svdcut)
    s.plot_ratio(show=True)

if __name__ == '__main__':
    main()

Method make_models(), from the fit code, should also be added to the script (see Basic Fits); it returns the list of the models used by the fit (to tell svd_diagnosis() exactly which data points are going to be fit). The script prints out a suggested value for svdcut.

gv.dataset.svd_diagnosis(dset, models=...) uses a bootstrap simulation to test the reliability of the eigenvalues determined from the Monte Carlo data in dset. It places the SVD cut at the point where the bootstrapped eigenvalues fall significantly below the actual values. A plot showing the ratio of bootstrapped to actual eigenvalues is displayed by s.plot_ratio(show=True). The following are sample plots from two otherwise identical simulations of 3 correlators (63 data points in all), one with 100 configurations and the other with 10,000 configurations:

[Plots: ratio of bootstrapped to exact correlation-matrix eigenvalues, for 100 configurations (left) and 10,000 configurations (right).]

With only 100 configurations, three quarters of the eigenvalues are too small in the bootstrap simulation, and therefore also likely too small for the real data. Simulated and actual eigenvalues come into agreement around 0.07 (red dashed line), which is the suggested value for svdcut. With 10,000 configurations, all of the eigenvalues are robust and no SVD cut is needed. Both data sets produce good fits (using the appropriate svdcut for each).

On average, individual terms in the \chi^2 should contribute 1 \pm \sqrt{2/N_d}, where again N_d is the number of data points. Underestimating eigenvalues artificially increases the size of the corresponding terms in the \chi^2, and therefore eigenvalues that are smaller than 1 - \sqrt{2/N_d} times the correct value cause problems. The dotted horizontal line shows where this threshold is located. The SVD cut is placed where the blue data points (the ratios of bootstrapped to exact eigenvalues) cross that threshold.

Having determined the SVD cut, we modify the data by setting keyword argument svdcut in the fits, or by using gvar.regulate() to apply the SVD cut explicitly.
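
For example, a minimal sketch where s is the svd_diagnosis result from the script above and fitter, data, prior, and p0 come from the fit code:

import gvar as gv

# option 1: let the fitter apply the cut
fit = fitter.lsqfit(data=data, prior=prior, p0=p0, svdcut=s.svdcut)

# option 2: apply the cut to the data explicitly
data_mod = gv.regulate(data, svdcut=s.svdcut)
fit = fitter.lsqfit(data=data_mod, prior=prior, p0=p0)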

Goodness of Fit

A small \chi^2 is generally a sign of a good fit, but \chi^2 becomes unreliable as an indicator of fit quality when using significant SVD cuts. The SVD cut increases uncertainties in the fit data without increasing the fluctuations in the data’s mean values. As a result contributions to the \chi^2 from terms affected by the SVD cut tend to be much smaller than expected, artificially pulling the total \chi^2 down. A similar issue is created by using broad priors. (For more discussion, see the section on goodness of fit in the lsqfit documentation.)

To check the goodness of fit using a fit’s \chi^2 we must add random fluctuations to the data means associated with the SVD cut and the priors. This is done by refitting the data but with parameter noise=True. A good fit will still have a \chi^2 of order one or less per degree of freedom even with the added noise. The fit parameters from the noisy fit should also agree within errors with the best-fit parameters from the original fit.
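
For example, a minimal sketch continuing the fit with the SVD cut from above:

noisy_fit = fitter.lsqfit(
    data=data, prior=prior, p0=fit.pmean, svdcut=1e-2, noise=True,
    )
print(noisy_fit)        # chi**2 per degree of freedom should still be of order one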

See the Case Studies for examples.