# Input Data Format

There are three components to the input data format:

- A specification of the 3D grid used to sample the correlation function
- A correlation function data vector specified on the grid
- An estimate of the data vector covariance

The 3D grid is specified via runtime configuration options that are normally saved in a file but can also be provided on the command line (see also the `custom-grid` option described below). The data vector and covariance matrix are stored in files using one of the following extensions:

| Extension | Meaning |
|---|---|
| `.data` | Unweighted data vector (default) |
| `.wdata` | Inverse covariance weighted data vector (use `--load-wdata` option) |
| `.cov` | Covariance matrix (default) |
| `.icov` | Inverse covariance matrix (use `--load-icov`) |

The data and covariance can be sparse, and can also be split into subsamples for resampling methods.

The grid used to sample the correlation function consists of 3 axes:

- Separation along the line of sight, e.g., r∥, Δv, log(λ1/λ2).
- Separation transverse to the line of sight, e.g., r⊥, μ, Δθ, multipole index.
- Average pair distance, e.g., z, D(z).

The following combinations are currently supported via the `data-format` command-line/config option, but other conventions are easy to add:

| data-format | Axis 1 | Axis 2 | Axis 3 | Notes |
|---|---|---|---|---|
| comoving-cartesian | r(par) in Mpc/h | r(perp) in Mpc/h | redshift | use --axis(1,2,3)-bins options |
| comoving-polar | r in Mpc/h | mu = r(par)/r | redshift | use --axis(1,2,3)-bins options |
| comoving-multipole | r in Mpc/h | multipole ell | redshift | use --axis(1,2,3)-bins options |
| quasar | Δlog(λ) | Δθ in arcmins | redshift | use --axis(1,2,3)-bins options |
| cosmolib | Δlog(λ) | Δθ in arcmins | redshift | use cosmolib options (BOSS legacy mode) |

The values along each axis do not need to be equally spaced and are specified via command-line or configuration file options, for example:

    # use 50 equally spaced bins for the first axis, covering 0-200
    axis1-bins = [0:200]*50
    # use the specified bin centers for the second axis
    axis2-bins = {0.1,0.3,0.4,0.45,0.5,0.6,0.8}
    # use a single bin for the third axis, with the specified bin center
    axis3-bins = {2.35}

Read the documentation for `createBinning` in AbsBinning.h for details.
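To make the two binning forms above concrete, here is a minimal Python sketch that expands a binning string into bin centers. The function name `bin_centers` is hypothetical and this is not the `createBinning` implementation: it assumes only the `[lo:hi]*n` and `{c1,c2,...}` forms shown in the example, and the real parser may accept more.

```python
import re


def bin_centers(spec):
    """Return bin centers for a binning string (illustrative helper only).

    Assumption: only '[lo:hi]*n' (n equal-width bins covering lo..hi) and
    '{c1,c2,...}' (explicit bin centers) are handled here.
    """
    spec = spec.strip()
    uniform = re.fullmatch(r"\[([^:\]]+):([^:\]]+)\]\*(\d+)", spec)
    if uniform:
        lo = float(uniform.group(1))
        hi = float(uniform.group(2))
        n = int(uniform.group(3))
        if n <= 0:
            raise ValueError("number of bins must be positive")
        width = (hi - lo) / n
        # The center of bin k sits half a bin width past its lower edge.
        return [lo + (k + 0.5) * width for k in range(n)]
    if spec.startswith("{") and spec.endswith("}"):
        return [float(c) for c in spec[1:-1].split(",")]
    raise ValueError("unrecognized binning specification: " + spec)
```

With the example options, `bin_centers("[0:200]*50")` yields 50 centers spaced 4 apart, and `bin_centers("{2.35}")` yields the single center of the third axis.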

The grid must be rectangular but can be sparsely populated. Any 3D grid point `(i1,i2,i3)` is uniquely specified by its global index `j`:

    0 <= i1 < N1 , 0 <= i2 < N2 , 0 <= i3 < N3
    j = (i1*N2+i2)*N3+i3
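The index formula above can be sketched in Python as follows, together with its inverse. The function names `global_index` and `axis_indices` are hypothetical helpers chosen for illustration; only the formula itself comes from this document.

```python
def global_index(i1, i2, i3, n1, n2, n3):
    """Map per-axis bin indices to the global index j = (i1*N2 + i2)*N3 + i3."""
    if not (0 <= i1 < n1 and 0 <= i2 < n2 and 0 <= i3 < n3):
        raise IndexError("bin index out of range")
    return (i1 * n2 + i2) * n3 + i3


def axis_indices(j, n1, n2, n3):
    """Invert global_index: recover (i1, i2, i3) from the global index j."""
    if not 0 <= j < n1 * n2 * n3:
        raise IndexError("global index out of range")
    i12, i3 = divmod(j, n3)   # the third axis varies fastest
    i1, i2 = divmod(i12, n2)  # then the second, then the first
    return i1, i2, i3
```

For a grid with N1 = 50, N2 = 7, and N3 = 1, the point `(1, 2, 0)` has global index `(1*7 + 2)*1 + 0 = 9`, and `axis_indices(9, 50, 7, 1)` recovers `(1, 2, 0)`.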

Optionally, a 3D custom grid of non-uniform sampling points is supported via the `custom-grid` option and read from a text input file with the extension ".grid" consisting of columns (global index, axis1 bin center, axis2 bin center, axis3 bin center):

    j1 axis1bin(j1) axis2bin(j1) axis3bin(j1)
    j2 axis1bin(j2) axis2bin(j2) axis3bin(j2)
    j3 axis1bin(j3) axis2bin(j3) axis3bin(j3)
    ...

Entries can appear in any order and the custom grid does not need to be rectangular. A default rectangular grid defining the data format always has to be specified (but will not be used explicitly).

The data vector is specified by a text input file using the ".data" extension and consisting of (global index, correlation estimate) pairs:

    j1 xi(j1)
    j2 xi(j2)
    j3 xi(j3)
    ...

Entries can appear in any order and a missing entry implies that there is no information available (rather than zero correlation). Values of Cinv.xi can be provided instead of xi (this is flagged via a command-line option).
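As an illustration of the data vector layout, the following Python sketch reads (global index, correlation estimate) pairs into a dictionary, so that a missing index simply means "no information". The name `read_data_vector` is hypothetical, and treating a repeated index as an error is an assumption of this sketch rather than documented behavior.

```python
def read_data_vector(lines):
    """Parse '.data'-style lines into a {global index: xi} dictionary.

    Indices absent from the result carry no information; they are NOT
    treated as zero correlation.
    """
    data = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("line %d: expected 'index value'" % lineno)
        j = int(fields[0])
        if j in data:
            # Assumption of this sketch: a repeated index is a mistake.
            raise ValueError("line %d: duplicate index %d" % (lineno, j))
        data[j] = float(fields[1])
    return data
```

Any iterable of strings works as input, for example an open file: `with open(path) as f: xi = read_data_vector(f)`, where `path` names a ".data" file.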

The covariance matrix is specified by a separate text input file using the ".cov" or ".icov" extension and consisting of triplets (global index 1, global index 2, covariance estimate):

    j1 j2 cov(j1,j2)
    j1 j3 cov(j1,j3)
    j4 j5 cov(j4,j5)
    ...

Entries can appear in any order and a missing entry implies that the corresponding covariance is zero. Because the matrix is symmetric, only one of the entries (j1,j2) and (j2,j1) needs to be specified. It is an error for the covariance to refer to global indices that are not present in the data vector. Values of inverse covariance can be provided instead of covariance (in which case files should use the ".icov" extension and the "load-icov" option should be included on the command line or in the config file).
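The covariance rules above (zero for missing entries, symmetry, and rejection of indices absent from the data vector) can be sketched in Python as shown here. The name `read_covariance` is hypothetical, and the dense list-of-lists output is a choice made for this illustration only.

```python
def read_covariance(lines, data_indices):
    """Parse '.cov'-style triplets into a dense symmetric matrix.

    Returns (order, cov): 'order' lists the data-vector global indices in
    ascending order, and cov[a][b] is the covariance between order[a] and
    order[b]. Entries never mentioned in the input stay at zero.
    """
    order = sorted(data_indices)
    position = {j: k for k, j in enumerate(order)}
    n = len(order)
    cov = [[0.0] * n for _ in range(n)]
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError("line %d: expected 'index1 index2 value'" % lineno)
        j1, j2, value = int(fields[0]), int(fields[1]), float(fields[2])
        if j1 not in position or j2 not in position:
            raise ValueError("line %d: index not present in data vector" % lineno)
        a, b = position[j1], position[j2]
        cov[a][b] = value
        cov[b][a] = value  # fill the mirrored element as well
    return order, cov
```

The same sketch applies unchanged to ".icov" triplets, in which case the returned matrix holds inverse covariance values.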

It is often useful to divide input data into independent datasets (e.g., based on subregions of the sky) for the purposes of bootstrapping, etc. In this case, an additional "plate list" file is used to specify the dataset file names (without the ".data" or ".cov" / ".icov" extensions), with one file name per line. Multiple datasets must all use the same binning, including the set of unused bins (if any). In practice, small variations between datasets in which bins have valid correlation function estimates can be accommodated in two ways: either take the intersection of all valid bins (dropping bins which are only sometimes invalid) or else use the union and assign a large error to invalid bins. When using many observations, it is more efficient to provide ".icov" files than ".cov" files.
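The intersection strategy described above can be sketched in Python as follows, assuming each dataset has already been loaded as a dictionary mapping global index to correlation estimate. The names `common_bins` and `restrict` are hypothetical helpers, not part of the tool.

```python
def common_bins(datasets):
    """Return the sorted global indices that are present in every dataset."""
    index_sets = [set(d) for d in datasets]
    if not index_sets:
        return []
    return sorted(set.intersection(*index_sets))


def restrict(dataset, keep):
    """Keep only the entries of one dataset whose index is listed in 'keep'."""
    return {j: dataset[j] for j in keep}
```

After restricting every dataset to `common_bins(datasets)`, all datasets share the same set of used bins. The matching covariance entries would need the same restriction, which this sketch does not cover.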

As an analysis option, a distortion matrix describing a correction to the continuum fitting broadband distortion may be provided as an additional data product. The distortion matrix is specified by a separate text input file using the ".dmat" extension, with a file path that may be independent of the compulsory data components, and consisting of triplets (global index 1, global index 2, distortion estimate):

    j1 j1 distortion(j1,j1)
    j1 j2 distortion(j1,j2)
    j1 j3 distortion(j1,j3)
    ...

Entries have to appear in order and a missing entry implies that the corresponding distortion is zero.