next up previous [pdf]

Next: Finding the prediction-error filter Up: PEF ESTIMATION WITH MISSING Previous: PEF ESTIMATION WITH MISSING

Internal boundaries to multidimensional convolution

Sometimes we deal with small patches of data. In order that boundary phenomena not dominate the calculation intended in the central region, we need to take care that input data is not assumed to be zero beyond the interval that the data is given.

The two little triangular patches of zeros in the convolution matrix in equation (29) describe end conditions where it is assumed that the data $ y_t$ vanishes before $ t=1$ and after $ t=6$ . Alternately we might not wish to make that assumption. Thus the triangles filled with zeros could be regarded as missing data. In this one-dimensional example, it is easy to see that the filter, say yy->mis should be set to true at the ends so no output would ever be computed there. We would like to find a general multidimensional algorithm to correctly specify yy->mis around the multidimensional boundaries. This proceeds like the missing data algorithm, i.e. we apply a filter of all ones to a data space template that is taken all zeros except ones at the locations of missing data, in this case $ y_0,y_{-1}$ and $ y_7,y_8$ . This amounts to surrounding the original data set with some missing data. We need padding the size of the filter on all sides. The padded region would be filled with ones (designating missing inputs). Where the convolution output is nonzero, there yy->mis is set to true denoting an output with missing inputs.

The two-dimensional case is a little more cluttered than the 1-D case but the principle is about the same. Figure 14 shows a larger input domain, a $ 5\times 3$ filter, and a smaller output domain.

Figure 14.
Domain of inputs and outputs of a two-dimensional filter like a PEF.
[pdf] [png] [xfig]

There are two things to notice. First, sliding the filter everywhere inside the outer box, we get outputs (under the 1 location) only in the inner box. Second, (the adjoint idea) crosscorrelating the inner and outer boxes gives us the $ 3\times 5$ patch of information we use to build the filter coefficients. We need to be careful not to assume that signals vanish outside the region where they are defined. In a later chapter we will break data spaces into overlapping patches, separately analyze the patches, and put everything back together. We do this because crosscorrelations change with time and they are handled as constant in short time windows. There we must be particularly careful that zero signal values not be presumed outside of the small volumes; otherwise the many edges and faces of the many small volumes can overwhelm the interior that we want to study.

In practice, the input and output are allocated equal memory, but the output residual is initialized to zero everywhere and then not computed except where shown in figure 14. Below is module bound to build a selector for filter outputs that should never be examined or even computed (because they need input data from outside the given data space). Inputs are a filter aa and the size of its cube na = (na(1),na(2),...). Also input are two cube dimensions, that of the data last used by the filter nold and that of the filter's next intended use nd. (nold and nd are often the same). Module bound begins by defining a bigger data space with room for a filter surrounding the original data space nd on all sides. It does this by the line nb=nd+2*na. Then we allocate two data spaces xx and yy of the bigger size nb and pack many ones in a frame of width na around the outside of xx. The filter aa is also filled with ones. The filter aa must be regridded for the bigger nb data space (regridding merely changes the lag values of the ones). Now we filter the input xx with aa getting yy. Wherever the output is nonzero, we have an output that has been affected by the boundary. Such an output should not be computed. Thus we allocate the logical mask aa->mis (a part of the helix filter definition in module helix [*]) and wherever we see a nonzero value of yy in the output, we designate the output as depending on missing inputs by setting aa->mis to true.

void bound (int dim         /* number of dimensions */, 
	    const int *nold /* old data coordinates [dim] */, 
	    const int *nd   /* new data coordinates [dim] */, 
	    const int *na   /* filter box size [dim] */, 
	    const sf_filter aa /* helix filter */) 
/*< Mark helix filter outputs where input is off data. >*/
    int iy, my, ib, mb, i, nb[SF_MAX_DIM], ii[SF_MAX_DIM];
    float *xx, *yy;
    my = mb = 1;
    for (i=0; i < dim; i++) {
	nb[i] = nd[i] + 2*na[i]; /* nb is a bigger space. */   
	mb *= nb[i];
	my *= nd[i];

    xx = sf_floatalloc(mb); yy = sf_floatalloc(mb);

    for (ib=0; ib < mb; ib++) { 
	sf_line2cart(dim, nb, ib, ii);    
	xx[ib] = 0.;  	
	for (i=0; i < dim; i++) 
	    if(ii[i]+1 <= na[i] ||  ii[i]+1 > nb[i]-na[i]) {
		xx[ib] = 1.; 
    sf_helicon_init( aa);		  
    regrid(dim, nold, nb, aa);  
    for (i=0; i < aa->nh; i++) aa->flt[i] = 1.;		
    /* apply filter */
    sf_helicon_lop(false, false, mb, mb, xx, yy); 
    regrid(dim, nb, nd, aa);  
    for (i=0; i < aa->nh; i++) aa->flt[i] = 0.;

    aa->mis = sf_boolalloc(my); /* attach missing designation */
    for (iy = 0; iy < my; iy++) {  /* map to padded space */
	sf_line2cart(dim, nd, iy, ii);
	for (i=0; i < dim; i++) ii[i] += na[i];
	ib = sf_cart2line(dim, nb, ii);
	aa->mis[iy] = (bool) (yy[ib] > 0.);  
    free (xx); free (yy);

In reality one would set up the boundary conditions with module bound before identifying locations of missing data with module misinput. Both modules are based on the same concept, but the boundaries are more cluttered and confusing which is why we examined them later.

next up previous [pdf]

Next: Finding the prediction-error filter Up: PEF ESTIMATION WITH MISSING Previous: PEF ESTIMATION WITH MISSING