![]() |
![]() |
![]() |
![]() | Shortest path ray tracing on parallel GPU devices | ![]() |
![]() |
Algorithm 3 is the kernel Relaxation. It takes
care of finding the smaller traveltime to
each vertex up to the current iteration. This kernel is
launched for each vertex, so in a similar way to the kernel
function that precalculates the weights, it starts by obtaining
the current vertex indices . The following step is
to iterate each neighbor
of vertex
. During
each iteration it calculates the following expression:
![]() |
![]() |
![]() |
(1) |
Figure 3 shows schematically this calculation.
It is the current smaller traveltime from source to
neighbor vertex
plus the traveltime from this vertex
to vertex
. This is a candidate smaller traveltime
from source vertex to
passing through
.
If this traveltime is smaller than the current smaller
traveltime from source vertex to vertex
, we have
to actualize its value and record that this new smaller
traveltime is reached through vertex
![]() |
---|
tt+w1
Figure 3. Traveltime compared during relaxation kernel. |
![]() ![]() |
It should be noted that during this operation each
kernel thread, although reads information from many
differente vertices, only modifies information of its
own vertex and no other kernel thread modify
this vertex. This is the main difference with
the shortest path algorithm of
Harish and Narayanan (2007), that at this
point calculates the expression:
![]() |
![]() |
![]() |
(2) |
i.e. the sum of the traveltime from source to vertex
and the traveltime from this vertex to vertex
. Figure
4 displays schematically this calculation. This
value is a tentative smaller traveltime from source to vertex
passing through
. If this is small that
the current smaller traveltime to vertex
it modifies
traveltime and predecessor values for vertex
. The
problem is that another kernel thread might be, concurrently,
attempting to modify information of the same vertex
(because vertices usually belong to many neighborhoods)
and this can lead to race conditions.
![]() |
---|
tt+w2
Figure 4. Traveltime compared in Harish and Narayanan (2007). |
![]() ![]() |
The solution of Harish and Narayanan (2007) is
to use an atomic function that compares and possibly stores
in one single instruction the smaller traveltime in .
This strategy avoids the race condition but has two drawbacks.
First, atomic functions slow down kernel execution. And second,
it only allows to write a single value in an atomic operation, but we
need to write two values: traveltime and predecessor. To solve this
last problem it is possible to implement and use mutexes
to do all the actualizations avoiding the race
conditions, but this only slows down even more the kernel
execution, as mutexes are usually implemented using atomic
functions.
Another detail to take into account is that during the traveltime
and predecesor actualization we have been setting the auxiliary
arrays and
instead of the arrays
and
.
The reason is that in the current kernel iteration other threads
are still using the current values of
and
and we
only want to use the new values in the next invocation of this
kernel for consistency.
![]() |
![]() |
![]() |
![]() | Shortest path ray tracing on parallel GPU devices | ![]() |
![]() |