
Relaxation

Algorithm 3 is the Relaxation kernel. It takes care of finding the smallest traveltime to each vertex up to the current iteration. This kernel is launched for each vertex, so, as with the kernel function that precalculates the weights, it starts by obtaining the indices $(col,row)$ of the current vertex. It then iterates over each neighbor $(i,k)$ of vertex $(col,row)$, calculating during each iteration the following expression:


$\displaystyle newtt = tt[i][k] + w[col][row][i][k]$ (1)

Figure 3 shows this calculation schematically. It is the current smallest traveltime from the source $(si,sk)$ to the neighbor vertex $(i,k)$, plus the traveltime from that vertex to vertex $(col,row)$. This is a candidate smallest traveltime from the source vertex to $(col,row)$ passing through $(i,k)$. If this traveltime is smaller than the current smallest traveltime from the source vertex to vertex $(col,row)$, we update its value and record that this new smallest traveltime is reached through vertex $(i,k)$.

Figure 3.
Traveltime compared during the relaxation kernel.

It should be noted that during this operation each kernel thread, although it reads information from many different vertices, only modifies information of its own vertex $(col,row)$, and no other kernel thread modifies this vertex. This is the main difference from the shortest-path algorithm of Harish and Narayanan (2007), which at this point calculates the expression:


$\displaystyle newtt = tt[col][row] + w[col][row][i][k],$ (2)

i.e., the sum of the traveltime from the source to vertex $(col,row)$ and the traveltime from this vertex to vertex $(i,k)$. Figure 4 displays this calculation schematically. This value is a tentative smallest traveltime from the source to vertex $(i,k)$ passing through $(col,row)$. If it is smaller than the current smallest traveltime to vertex $(i,k)$, the thread modifies the traveltime and predecessor values of vertex $(i,k)$. The problem is that another kernel thread might concurrently be attempting to modify information of the same vertex $(i,k)$ (because vertices usually belong to many neighborhoods), and this can lead to race conditions.
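The write pattern of equation (2) can be sketched in plain C (a host-side sketch; the array sizes, the square neighborhood of half-width `HALF`, and all names here are hypothetical, since the original is GPU kernel code). Each call plays the role of the thread for vertex $(col,row)$ and writes to each of its neighbors $(i,k)$:

```c
#include <float.h>

#define NCOL 3
#define NROW 3
#define HALF 1  /* hypothetical neighborhood half-width */

/* Hypothetical host-side sketch of the scatter-style update of equation
 * (2): the thread for (col,row) writes to each NEIGHBOR (i,k).  On a
 * GPU, several threads may reach the same (i,k) concurrently, so the
 * check and the two writes below race unless they are made atomic. */
void relax_scatter(int col, int row,
                   double tt[NCOL][NROW],
                   int pr[NCOL][NROW][2],
                   double w[NCOL][NROW][NCOL][NROW])
{
    for (int i = col - HALF; i <= col + HALF; i++) {
        for (int k = row - HALF; k <= row + HALF; k++) {
            if (i < 0 || i >= NCOL || k < 0 || k >= NROW) continue;
            if (i == col && k == row) continue;
            /* equation (2): source -> (col,row) -> (i,k) */
            double newtt = tt[col][row] + w[col][row][i][k];
            if (newtt < tt[i][k]) {   /* check...            */
                tt[i][k]  = newtt;    /* ...then unprotected  */
                pr[i][k][0] = col;    /* writes to a vertex   */
                pr[i][k][1] = row;    /* owned by no thread   */
            }
        }
    }
}
```

Run sequentially this is harmless, but with many concurrent threads two of them can interleave between the comparison and the writes to the same $(i,k)$, which is the race condition described above.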

Figure 4.
Traveltime compared in Harish and Narayanan (2007).


\begin{algorithm}
\caption{Relaxation($tt$,$auxtt$,$w$,$pr$,\dots)}
\begin{algorithmic}
\FORALL{neighbors $(i,k)$ of vertex $(col,row)$}
\STATE $newtt = tt[i][k] + w[col][row][i][k]$
\IF{$newtt < auxtt[col][row]$}
\STATE $auxtt[col][row] = newtt$
\STATE $auxpr[col][row] = (i,k)$
\ENDIF
\ENDFOR
\end{algorithmic}\end{algorithm}
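The gather pattern of the relaxation step can likewise be sketched in host-side C (again with hypothetical array sizes, a hypothetical square neighborhood, and assumed names; the actual kernel is GPU code). The function for vertex $(col,row)$ reads its neighbors but writes only $auxtt[col][row]$ and $auxpr[col][row]$, so no two threads write the same location:

```c
#include <float.h>

#define NCOL 3
#define NROW 3
#define HALF 1  /* hypothetical neighborhood half-width */

/* Hypothetical host-side sketch of the Relaxation kernel body: each
 * call plays the role of one GPU thread and modifies only the entries
 * belonging to its own vertex (col,row). */
void relax_vertex(int col, int row,
                  double tt[NCOL][NROW],
                  double auxtt[NCOL][NROW],
                  int auxpr[NCOL][NROW][2],
                  double w[NCOL][NROW][NCOL][NROW])
{
    for (int i = col - HALF; i <= col + HALF; i++) {
        for (int k = row - HALF; k <= row + HALF; k++) {
            if (i < 0 || i >= NCOL || k < 0 || k >= NROW) continue;
            if (i == col && k == row) continue;
            /* equation (1): source -> (i,k) -> (col,row) */
            double newtt = tt[i][k] + w[col][row][i][k];
            if (newtt < auxtt[col][row]) {
                auxtt[col][row] = newtt;  /* candidate smallest traveltime */
                auxpr[col][row][0] = i;   /* reached through vertex (i,k)  */
                auxpr[col][row][1] = k;
            }
        }
    }
}
```

Because the only written cells are indexed by $(col,row)$, which is unique per thread, no atomics or mutexes are needed here.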

The solution of Harish and Narayanan (2007) is to use an atomic function that compares and, if needed, stores the smaller traveltime in $tt[i][k]$ in one single instruction. This strategy avoids the race condition but has two drawbacks. First, atomic functions slow down kernel execution. Second, an atomic operation only allows writing a single value, but we need to write two values: the traveltime and the predecessor. To solve this last problem it is possible to implement mutexes and use them to perform all the updates without race conditions, but this slows down kernel execution even more, as mutexes are usually implemented using atomic functions.
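As an illustration, an atomicMin-style update can be sketched with C11 atomics and integer traveltimes (a hypothetical simplification; the function name and integer encoding are assumptions, not the original code). The compare-and-swap covers only the traveltime; any predecessor written afterwards is a second, unprotected store, which is exactly the two-value problem noted above:

```c
#include <stdatomic.h>

/* Hypothetical sketch of an atomicMin-style update: the compare-and-
 * swap loop installs newtt in *tt in one indivisible step, returning 1
 * if it won.  Only this ONE value is written atomically: a predecessor
 * stored by the caller after a win is a separate write and can still
 * end up paired with a traveltime written later by another thread. */
int atomic_min_tt(_Atomic int *tt, int newtt)
{
    int cur = atomic_load(tt);
    while (newtt < cur) {
        /* on failure, cur is refreshed with the current value of *tt */
        if (atomic_compare_exchange_weak(tt, &cur, newtt))
            return 1;  /* traveltime installed atomically */
    }
    return 0;  /* someone else already holds a smaller traveltime */
}
```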

Another detail to take into account is that during the traveltime and predecessor updates we set the auxiliary arrays $auxtt$ and $auxpr$ instead of the arrays $tt$ and $pr$. The reason is that in the current kernel iteration other threads are still using the current values of $tt$ and $pr$, and for consistency we only want to use the new values in the next invocation of this kernel.
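The role of the auxiliary arrays can be sketched as a plain copy performed between kernel launches (a hypothetical helper with assumed names; it only illustrates the double-buffering idea). Each iteration reads the old buffer and writes the auxiliary one, and only after every thread has finished are the new values committed:

```c
/* Hypothetical sketch of committing an iteration: copy the auxiliary
 * traveltimes back so that the NEXT kernel invocation sees a consistent
 * snapshot of the values computed in this one. */
void commit_iteration(double *tt, const double *auxtt, int n)
{
    for (int j = 0; j < n; j++)
        tt[j] = auxtt[j];  /* new values become current for the next launch */
}
```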



2013-10-09