At the 4th IPC (ICAPS 2004, Whistler, Canada), not all planners used
the same notion of plan quality. Some planners adopted the number of
plan actions, while others used the plan metric specified in the
PDDL2.2 problem description (e.g., the plan makespan for temporal
planning problems). LPG-td used the number of plan actions for simple
STRIPS problems, and the specified plan metric in all the other
variants of the test problems. We believe that, for temporal or
numerical domains, the specified plan metric is much more natural and
practically useful than the number of plan actions. Overall, however,
the solutions produced by LPG-td.quality, and especially by
LPG-td.bestquality, are better than the solutions produced by the
other IPC4 planners in terms of both the number of actions and the
specified plan metric.
At IPC4 some domains had different formalizations (ADL or STRIPS),
which the competitors were free to choose for their planner. The
official results of IPC4, which are available from the IPC4 website,
compared planners that addressed different formalizations of the same
domain. Here we do the same, with the sole exception of the temporal
variant of the Airport domain, because the problems in this domain
have optimal solutions of different quality in the STRIPS and ADL
formalizations. We note, however, that the results of our analysis are
similar when, for every domain, the planners are compared only if they
use the same domain formalization.
As plan quality indexes, we use the following differences, which are
computed considering only the problems solved by both of the compared
planners. Positive values indicate a better performance of LPG-td.
- % of problems for which LPG-td found a solution with quality better
than that of the corresponding solution found by the other planner,
minus % of problems for which LPG-td found a solution with quality
worse than that of the corresponding solution found by the other
planner. The qualities of two solutions are considered different if
their values differ by more than 1%;
- % of problems for which LPG-td found a solution with quality much
better than that of the corresponding solution found by the other
planner, minus % of problems for which LPG-td found a solution with
quality much worse than that of the corresponding solution found by
the other planner. A solution X produced by a planner is much better
(much worse) than the solution Y produced by another planner when the
quality of X is at least 50% better (worse) than the quality of Y. For
example, since lower values are better for both the number of actions
and the makespan, if X has quality 100 and Y has quality 180, then X
is much better than Y (Y is much worse than X).
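
The two indexes defined above can be sketched in a few lines of Python. This is only an illustrative sketch, not the script used to produce the tables: the function name `quality_indexes`, the dictionary-based input, and the sample problem names are hypothetical. It also assumes that lower quality values are better (as for number of actions or makespan) and that the relative difference between two values is measured against the smaller of the two, which the text does not state explicitly.

```python
from typing import Dict, Tuple


def quality_indexes(
    lpg: Dict[str, float],
    other: Dict[str, float],
    diff_threshold: float = 0.01,   # "differ by more than 1%"
    much_threshold: float = 0.50,   # "at least 50% better (worse)"
) -> Tuple[float, float]:
    """Return (% better minus % worse, % much better minus % much worse).

    Each dict maps a problem name to the quality of the plan found for it.
    ASSUMPTIONS: lower values are better, and the relative gap is taken
    with respect to the smaller (better) of the two values.
    """
    # Only problems solved by both planners are compared.
    common = sorted(set(lpg) & set(other))
    if not common:
        raise ValueError("no problems solved by both planners")

    better = worse = much_better = much_worse = 0
    for problem in common:
        a, b = lpg[problem], other[problem]
        base = min(a, b)
        if a == b:
            gap = 0.0
        elif base > 0:
            gap = abs(a - b) / base
        else:
            gap = float("inf")  # any gap from a zero-valued plan counts

        if gap > diff_threshold:
            if a < b:
                better += 1
            else:
                worse += 1
        if gap >= much_threshold:
            if a < b:
                much_better += 1
            else:
                much_worse += 1

    n = len(common)
    return 100.0 * (better - worse) / n, 100.0 * (much_better - much_worse) / n


# Hypothetical plan qualities, for illustration only.
lpg_td = {"p01": 10, "p02": 20, "p03": 30, "p04": 100, "p05": 7}
rival = {"p01": 10, "p02": 25, "p03": 29, "p04": 180, "p06": 3}
print(quality_indexes(lpg_td, rival))
```

In this made-up sample, four problems are solved by both planners: LPG-td is better on two of them (much better on one), worse on one, and tied on one, so both indexes come out at 25%.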
The first and the second of the following four tables compare the
performance of LPG-td.quality with that of the other IPC4 planners.
The third and the fourth tables compare the performance of
LPG-td.bestquality with that of the other IPC4 planners.
Comparison of LPG-td.quality and the other planners of IPC4
(plan quality = problem-specified plan metric)

| LPG-td.quality versus | Problems Solved by LPG-td / IPC4-Planner / Both | % Better Quality Plans minus % Worse Quality Plans | % Much Better Quality Plans minus % Much Worse Quality Plans |
|---|---|---|---|
| SGPlan | 845 / 1090 / 771 | 57.6% | 36.6% |
| Crickey | 845 / 364 / 293 | 45.4% | 36.2% |
| Downward-diagonally | 845 / 380 / 305 | 49.2% | 28.5% |
| Downward | 845 / 360 / 296 | 41.2% | 29.4% |
| Marvin | 845 / 224 / 211 | 36.5% | 28.4% |
| Yahsp | 845 / 255 / 210 | 71.4% | 47.1% |
| Macro-FF | 845 / 189 / 138 | 84.8% | 44.9% |
| Til-Sapa | 845 / 63 / 63 | 82.5% | 3.2% |
| P-mep | 845 / 98 / 91 | 70.3% | 39.6% |
| Roadmapper | 845 / 52 / 51 | 84.3% | 70.6% |
| FAP | 845 / 81 / 28 | 71.4% | 50.0% |

Comment: Overall these results show that, using the problem-specified
plan metric (which for STRIPS problems was the "Graphplan plan length",
or number of time steps), LPG-td.quality performs much better than the
other planners.
Comparison of LPG-td.quality and the other planners of IPC4
(plan quality = number of plan actions)

| LPG-td.quality versus | Problems Solved by LPG-td / IPC4-Planner / Both | % Better Quality Plans minus % Worse Quality Plans | % Much Better Quality Plans minus % Much Worse Quality Plans |
|---|---|---|---|
| SGPlan | 845 / 1090 / 771 | 5.7% | 6.1% |
| Crickey | 845 / 364 / 293 | 40.6% | 22.2% |
| Downward-diagonally | 845 / 380 / 305 | 15.4% | -2.0% |
| Downward | 845 / 360 / 296 | 6.8% | -0.7% |
| Yahsp | 845 / 255 / 210 | 30.5% | 14.3% |
| Macro-FF | 845 / 189 / 138 | 22.5% | 0.7% |
| Roadmapper | 845 / 52 / 51 | 56.9% | 15.7% |
| FAP | 845 / 81 / 28 | 39.3% | 35% |

Comment: In this table we consider only the planners that did not
attempt to optimize the problem-specified plan metric. As their plan
quality metric, these planners used the number of actions in the plan,
or they simply returned any solution they could find with no attempt
to optimize plan quality. In terms of number of actions, the Downward
planner found a few solutions that are much better than those found by
LPG-td.quality; however, when the comparison also includes plans with
small differences in quality, LPG-td.quality generally performed
better than Downward.
Comparison of LPG-td.bestquality and the other planners of IPC4
(plan quality = problem-specified plan metric)

| LPG-td.bestquality versus | Problems Solved by LPG-td / IPC4-Planner / Both | % Better Quality Plans minus % Worse Quality Plans | % Much Better Quality Plans minus % Much Worse Quality Plans |
|---|---|---|---|
| SGPlan | 845 / 1090 / 771 | 65.6% | 41.0% |
| Crickey | 845 / 364 / 293 | 56.3% | 43.0% |
| Downward-diagonally | 845 / 380 / 305 | 61.3% | 36.1% |
| Downward | 845 / 360 / 296 | 55.7% | 36.8% |
| Marvin | 845 / 224 / 211 | 46.0% | 36.5% |
| Yahsp | 845 / 255 / 210 | 84.8% | 54.3% |
| Macro-FF | 845 / 189 / 138 | 87.7% | 49.3% |
| Til-Sapa | 845 / 63 / 63 | 82.5% | 3.2% |
| P-mep | 845 / 98 / 91 | 70.3% | 41.8% |
| Roadmapper | 845 / 52 / 51 | 88.2% | 72.5% |
| FAP | 845 / 81 / 28 | 71.4% | 53.6% |

Comment: Overall these results show that, using the problem-specified
plan metric (which for STRIPS problems was the "Graphplan plan length",
or number of time steps), LPG-td.bestquality performs much better than
the other planners.
Comparison of LPG-td.bestquality and the other planners of IPC4
(plan quality = number of plan actions)

| LPG-td.bestquality versus | Problems Solved by LPG-td / IPC4-Planner / Both | % Better Quality Plans minus % Worse Quality Plans | % Much Better Quality Plans minus % Much Worse Quality Plans |
|---|---|---|---|
| SGPlan | 845 / 1090 / 771 | 12.5% | 8.9% |
| Crickey | 845 / 364 / 293 | 51.5% | 27.3% |
| Downward-diagonally | 845 / 380 / 305 | 27.5% | 4.6% |
| Downward | 845 / 360 / 296 | 22.3% | 6.4% |
| Yahsp | 845 / 255 / 210 | 41.9% | 20.5% |
| Macro-FF | 845 / 189 / 138 | 31.9% | 4.3% |
| Roadmapper | 845 / 52 / 51 | 62.7% | 19.6% |
| FAP | 845 / 81 / 28 | 42.9% | 35.7% |

Comment: In this table we consider only the planners that did not
attempt to optimize the problem-specified plan metric. As their plan
quality metric, these planners used the number of actions in the plan,
or they simply returned any solution they could find with no attempt
to optimize plan quality. Overall these results show that, using the
number of plan actions, LPG-td.bestquality performs much better than
the other planners.