Quality of the Plans Generated by LPG-td at IPC4

At the 4th IPC (ICAPS 2004, Whistler, Canada), not all planners used the same notion of plan quality. Some planners adopted the number of plan actions, while others used the plan metric specified in the PDDL2.2 problem description (e.g., the plan makespan for temporal planning problems). LPG-td used the number of plan actions for simple STRIPS problems, and the specified plan metric in all the other variants of the test problems. We believe that, for temporal or numerical domains, the specified plan metric is much more natural and practically useful than the number of plan actions. Overall, however, the solutions produced by LPG-td.quality, and especially by LPG-td.bestquality, are better than the solutions produced by the other IPC4 planners in terms of both the number of actions and the specified plan metric.

At IPC4, some domains had different formalizations (ADL or STRIPS), which the competitors were free to choose for their planner. The official results of IPC4, which are available from the IPC4 website, compared planners that addressed different formalizations of the same domain. Here we do the same, with the sole exception of the temporal variant of the Airport domain, because the problems in this domain have optimal solutions of different quality in the STRIPS and ADL formalizations.
We note, however, that the results of our analysis are similar when, for every domain, the planners are compared only if they use the same domain formalization.

As plan quality indexes, we use two differences, both computed considering only the problems solved by both of the compared planners: the percentage of better quality plans minus the percentage of worse quality plans, and the percentage of much better quality plans minus the percentage of much worse quality plans. Positive values indicate a better performance of LPG-td.
The first and second of the following four tables compare the performance of LPG-td.quality with that of the other IPC4 planners. The third and fourth tables compare the performance of LPG-td.bestquality with that of the other IPC4 planners.
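
The two indexes can be illustrated with a minimal sketch. It is not the script used to produce the tables. It assumes that lower metric values are better (as for makespan or number of actions), that percentages are taken over the problems solved by both planners, and that "much better" means better by at least a factor `much_ratio`; that threshold and all names below are hypothetical, since the criterion for "much better" is not stated here.

```python
from typing import Dict, Tuple


def quality_differences(
    lpg: Dict[str, float],
    other: Dict[str, float],
    much_ratio: float = 1.5,  # hypothetical threshold for "much better"
) -> Tuple[int, float, float]:
    """Compare two planners on the problems solved by both.

    `lpg` and `other` map problem names to plan quality values,
    where a lower value is assumed to mean a better plan.
    Returns (number of common problems,
             % better minus % worse,
             % much better minus % much worse) from LPG-td's viewpoint.
    """
    common = sorted(set(lpg) & set(other))
    if not common:
        raise ValueError("no problem was solved by both planners")

    better = sum(lpg[p] < other[p] for p in common)
    worse = sum(lpg[p] > other[p] for p in common)
    # "Much" better/worse: strictly better and by at least the assumed ratio.
    much_better = sum(
        lpg[p] < other[p] and lpg[p] * much_ratio <= other[p] for p in common
    )
    much_worse = sum(
        other[p] < lpg[p] and other[p] * much_ratio <= lpg[p] for p in common
    )

    n = len(common)
    return (
        n,
        100.0 * (better - worse) / n,
        100.0 * (much_better - much_worse) / n,
    )


# Made-up quality values, for illustration only.
example_lpg = {"p01": 10, "p02": 20, "p03": 30, "p04": 12, "p06": 8}
example_other = {"p01": 20, "p02": 22, "p03": 15, "p04": 12, "p05": 7}
print(quality_differences(example_lpg, example_other))
```

In this made-up example four problems are solved by both planners; LPG-td is better on two and worse on one, so the first index is 25%, while one much better plan and one much worse plan cancel out in the second index.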


Comparison of LPG-td.quality and the other planners of IPC4
 (plan quality = problem-specified plan metric)

LPG-td.quality         Problems Solved by         % Better Quality Plans   % Much Better Quality Plans
versus                 LPG-td/IPC4-Planner/Both   minus                    minus
                                                  % Worse Quality Plans    % Much Worse Quality Plans

SGPlan                 845 / 1090 / 771           57.6%                    36.6%
Crickey                845 / 364 / 293            45.4%                    36.2%
Downward-diagonally    845 / 380 / 305            49.2%                    28.5%
Downward               845 / 360 / 296            41.2%                    29.4%
Marvin                 845 / 224 / 211            36.5%                    28.4%
Yahsp                  845 / 255 / 210            71.4%                    47.1%
Macro-FF               845 / 189 / 138            84.8%                    44.9%
Til-Sapa               845 / 63 / 63              82.5%                    3.2%
P-mep                  845 / 98 / 91              70.3%                    39.6%
Roadmapper             845 / 52 / 51              84.3%                    70.6%
FAP                    845 / 81 / 28              71.4%                    50.0%

Comment
Overall, these results show that, using the problem-specified plan metric (which for STRIPS problems was the "Graphplan plan length", i.e., the number of time steps), LPG-td.quality performs much better than the other planners.




Comparison of LPG-td.quality and the other planners of IPC4
 (plan quality = number of plan actions)

LPG-td.quality         Problems Solved by         % Better Quality Plans   % Much Better Quality Plans
versus                 LPG-td/IPC4-Planner/Both   minus                    minus
                                                  % Worse Quality Plans    % Much Worse Quality Plans

SGPlan                 845 / 1090 / 771           5.7%                     6.1%
Crickey                845 / 364 / 293            40.6%                    22.2%
Downward-diagonally    845 / 380 / 305            15.4%                    -2.0%
Downward               845 / 360 / 296            6.8%                     -0.7%
Yahsp                  845 / 255 / 210            30.5%                    14.3%
Macro-FF               845 / 189 / 138            22.5%                    0.7%
Roadmapper             845 / 52 / 51              56.9%                    15.7%
FAP                    845 / 81 / 28              39.3%                    35

Comment
In this table we consider only the planners that did not attempt to optimize the problem-specified plan metric. As their plan quality metric, these planners used the number of actions in the plan, or they simply returned any solution they could find, with no attempt to optimize plan quality. In terms of number of actions, the Downward planner found a few solutions that are much better than the solutions found by LPG-td.quality; however, if the comparison also includes plans with small differences in plan quality, then in general LPG-td.quality performed better than Downward.



Comparison of LPG-td.bestquality and the other planners of IPC4
 (plan quality = problem-specified plan metric)

LPG-td.bestquality     Problems Solved by         % Better Quality Plans   % Much Better Quality Plans
versus                 LPG-td/IPC4-Planner/Both   minus                    minus
                                                  % Worse Quality Plans    % Much Worse Quality Plans

SGPlan                 845 / 1090 / 771           65.6%                    41.0%
Crickey                845 / 364 / 293            56.3%                    43.0%
Downward-diagonally    845 / 380 / 305            61.3%                    36.1%
Downward               845 / 360 / 296            55.7%                    36.8%
Marvin                 845 / 224 / 211            46.0%                    36.5%
Yahsp                  845 / 255 / 210            84.8%                    54.3%
Macro-FF               845 / 189 / 138            87.7%                    49.3%
Til-Sapa               845 / 63 / 63              82.5%                    3.2%
P-mep                  845 / 98 / 91              70.3%                    41.8%
Roadmapper             845 / 52 / 51              88.2%                    72.5%
FAP                    845 / 81 / 28              71.4%                    53.6%

Comment
Overall, these results show that, using the problem-specified plan metric (which for STRIPS problems was the "Graphplan plan length", i.e., the number of time steps), LPG-td.bestquality performs much better than the other planners.



Comparison of LPG-td.bestquality and the other planners of IPC4
 (plan quality = number of plan actions)

LPG-td.bestquality     Problems Solved by         % Better Quality Plans   % Much Better Quality Plans
versus                 LPG-td/IPC4-Planner/Both   minus                    minus
                                                  % Worse Quality Plans    % Much Worse Quality Plans

SGPlan                 845 / 1090 / 771           12.5%                    8.9%
Crickey                845 / 364 / 293            51.5%                    27.3%
Downward-diagonally    845 / 380 / 305            27.5%                    4.6%
Downward               845 / 360 / 296            22.3%                    6.4%
Yahsp                  845 / 255 / 210            41.9%                    20.5%
Macro-FF               845 / 189 / 138            31.9%                    4.3%
Roadmapper             845 / 52 / 51              62.7%                    19.6%
FAP                    845 / 81 / 28              42.9%                    35.7%

Comment
In this table we consider only the planners that did not attempt to optimize the problem-specified plan metric. As their plan quality metric, these planners used the number of actions in the plan, or they simply returned any solution they could find, with no attempt to optimize plan quality. Overall, these results show that, using the number of plan actions, LPG-td.bestquality performs much better than the other planners.