PDDLFuse: A Tool for Generating Diverse Planning Domains
Published in GenPlan Workshop, AAAI 2025
PDDLFuse generates structurally novel, solvable planning domains by fusing base PDDL domains and introducing controlled probabilistic variations.
Why it matters
- Planning models are typically trained and evaluated on a limited set of manually created or reconstructed domains.
- Limited domain diversity restricts generalization and encourages overfitting to benchmark distributions (for example, IPC-style domains).
- Domain reconstruction with LLMs often reproduces known domains rather than creating structurally novel ones.
- Reinforcement learning shows domain randomization improves robustness, but planning lacks an equivalent systematic mechanism.
- Robust evaluation of planners and planning foundation models requires scalable generation of diverse and structurally complex domains.
What we did
- Introduced PDDLFuse, a tool that generates new PDDL domains by fusing two existing domains.
- Applied probabilistic modifications to action preconditions and effects (add, remove, and negate predicates) to control structural variation.
- Generated solvable problem instances by executing random action sequences from initial states.
- Evaluated solvability across fusion depths 0 to 5: solved instances drop from 10/10 to 4/10 for LPG and from 9/10 to 2/10 for GOOSE, while the AVI-based model still solves 7/10 at Depth 5.
- Measured structural complexity using the k-WL test, where at Depth 5, 55% of domains require 3-WL and 20% require greater than 3-WL.
| Depth | FD(FF) | LPG | GOOSE | AVI-based |
|---|---|---|---|---|
| 0 | 9/10 | 10/10 | 9/10 | 10/10 |
| 1 | 10/10 | 10/10 | 10/10 | 9/10 |
| 2 | 10/10 | 8/10 | 8/10 | 9/10 |
| 3 | 9/10 | 7/10 | 5/10 | 8/10 |
| 4 | 8/10 | 5/10 | 3/10 | 8/10 |
| 5 | 8/10 | 4/10 | 2/10 | 7/10 |
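As shown in Table 1 (instances solved out of 10 at each fusion depth), increasing fusion depth challenges both traditional planners and foundation models: LPG and GOOSE degrade the most, while FD(FF) and the AVI-based model stay at or above 7/10.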
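The problem-generation step listed above (executing random action sequences from an initial state) can be illustrated with a minimal sketch. It assumes a propositional, STRIPS-like encoding; the `Action` class and `random_walk_problem` function are hypothetical names for illustration and are not taken from PDDLFuse. Because the goal is a state actually reached by applying actions, the recorded action sequence is a valid plan, so the instance is solvable by construction.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Illustrative propositional action (not the PDDLFuse data model)."""
    name: str
    pre_pos: frozenset  # facts that must hold
    pre_neg: frozenset  # facts that must not hold
    add: frozenset      # facts made true
    delete: frozenset   # facts made false

    def applicable(self, state):
        return self.pre_pos <= state and not (self.pre_neg & state)

    def apply(self, state):
        return (state - self.delete) | self.add


def random_walk_problem(init, actions, steps, rng):
    """Walk randomly from `init`; return the reached state and the plan used."""
    state = frozenset(init)
    plan = []
    for _ in range(steps):
        options = [a for a in actions if a.applicable(state)]
        if not options:
            break  # dead end: keep the state reached so far as the goal
        action = rng.choice(options)
        state = action.apply(state)
        plan.append(action.name)
    return state, plan


# Toy usage: a single block that can be picked up and put down.
toy_actions = [
    Action("pick", frozenset({"hand-empty", "on-table"}), frozenset(),
           frozenset({"holding"}), frozenset({"hand-empty", "on-table"})),
    Action("drop", frozenset({"holding"}), frozenset(),
           frozenset({"hand-empty", "on-table"}), frozenset({"holding"})),
]
toy_goal, toy_plan = random_walk_problem(
    {"hand-empty", "on-table"}, toy_actions, steps=5, rng=random.Random(0)
)
```

In practice a generator would likely keep only a subset of the reached facts as the goal; this sketch returns the full state for simplicity.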
How it works
- Domain fusion: merge objects, predicates, and actions from two base domains with systematic renaming to prevent overlap.
- Probabilistic modification: add, remove, and negate predicates in preconditions and effects using configurable probabilities.
- Problem generation: execute random action sequences to derive solvable goal states.
- Complexity analysis: measure structural expressivity using k-WL tests across increasing depth levels.
- Planner evaluation: evaluate FD(FF), LPG, GOOSE, and an AVI-based model that searches with Batch Weighted A*.
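The fusion and probabilistic-modification steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PDDLFuse implementation: predicates are propositional (no parameters, types, or objects), literals are `(predicate, is_positive)` pairs, and the names `Domain`, `Action`, `fuse`, `modify`, `perturb`, and the `p_add`, `p_remove`, `p_negate` parameters are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    pre: set  # set of (predicate, is_positive) literals
    eff: set  # set of (predicate, is_positive) literals


@dataclass
class Domain:
    name: str
    predicates: set
    actions: list


def rename(domain, prefix):
    """Prefix every predicate and action name so two domains cannot collide."""
    def tag(s):
        return f"{prefix}_{s}"

    def tag_literals(literals):
        return {(tag(p), positive) for p, positive in literals}

    return Domain(
        domain.name,
        {tag(p) for p in domain.predicates},
        [Action(tag(a.name), tag_literals(a.pre), tag_literals(a.eff))
         for a in domain.actions],
    )


def fuse(d1, d2):
    """Merge two domains after renaming each with its own prefix."""
    a, b = rename(d1, "d1"), rename(d2, "d2")
    return Domain(f"{d1.name}-{d2.name}",
                  a.predicates | b.predicates,
                  a.actions + b.actions)


def modify(literals, predicates, rng, p_add, p_remove, p_negate):
    """Randomly remove or negate existing literals, then maybe add a new one."""
    out = set()
    for pred, positive in sorted(literals):
        if rng.random() < p_remove:
            continue
        if rng.random() < p_negate:
            positive = not positive
        out.add((pred, positive))
    if rng.random() < p_add:
        unused = sorted(predicates - {p for p, _ in out})
        if unused:
            out.add((rng.choice(unused), True))
    return out


def perturb(domain, rng, **probs):
    """Apply `modify` to the preconditions and effects of every action."""
    preds = domain.predicates
    return Domain(
        domain.name,
        set(preds),
        [Action(a.name,
                modify(a.pre, preds, rng, **probs),
                modify(a.eff, preds, rng, **probs))
         for a in domain.actions],
    )
```

Because predicates from either base domain can be added to any action, the perturbation step is what couples the two halves of a fused domain; with all probabilities at zero, the fused domain is just two independent sub-domains side by side.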
Key contributions
- A configurable tool for generating novel PDDL domains via domain fusion rather than reconstruction.
- Empirical evidence that increasing fusion depth reduces solvability for domain-independent planners (for example, LPG declines from 10/10 at Depth 1 to 4/10 at Depth 5).
- Demonstration that training on PDDLFuse-generated domains improves generalization for an AVI-based model compared to GOOSE at higher depths.
- WL-based expressivity analysis showing that at Depth 5, 55% of domains require 3-WL and 20% require greater than 3-WL, indicating increasing structural complexity.

Figure 1 shows that deeper fused domains increasingly require higher-order WL tests.
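To give some intuition for what "requiring a higher-order WL test" means, the sketch below implements only the basic 1-WL color refinement on plain undirected graphs. It does not reproduce the paper's analysis: how PDDLFuse encodes domains as graphs is not described here, and the function name `wl_1_equivalent` is a hypothetical choice. The classic example of two disjoint triangles versus a 6-cycle shows two non-isomorphic graphs that 1-WL cannot tell apart, which is the kind of case where a higher-order test is needed.

```python
from collections import Counter


def wl_1_equivalent(g1, g2):
    """True if 1-WL color refinement cannot tell g1 and g2 apart.

    Each graph maps a node to a list of its neighbors.
    """
    # Refine both graphs jointly so colors are comparable across them.
    graph = {("a", v): [("a", u) for u in nbrs] for v, nbrs in g1.items()}
    graph.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in g2.items()})

    colors = {v: 0 for v in graph}
    for _ in range(len(graph)):
        # A node's signature is its own color plus the sorted neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in graph[v])))
                for v in graph}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        if len(palette) == len(set(colors.values())):
            break  # no class was split: the partition is stable
        colors = {v: palette[sigs[v]] for v in graph}

    def histogram(side):
        return Counter(c for (s, _), c in colors.items() if s == side)

    return histogram("a") == histogram("b")


two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(wl_1_equivalent(two_triangles, hexagon))  # prints True
```

Every node in both graphs has exactly two neighbors, so refinement never splits the initial color class and the two color histograms coincide even though the graphs are not isomorphic.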
Recommended citation: Vedant Khandelwal, Amit Sheth, and Forest Agostinelli. (2025). "PDDLFuse: A Tool for Generating Diverse Planning Domains." GenPlan Workshop, AAAI 2025.
