Specifying goals to deep neural networks with answer set programming

Published in International Conference on Automated Planning and Scheduling (ICAPS), 2024

This paper shows how logical goal specifications can condition a learned heuristic, so that a single trained deep neural network (DNN) can solve many goal variants without retraining.

Why it matters

DNN-based heuristics for planning typically assume a fixed, pre-defined goal.
Changing the goal often requires retraining or enumerating all acceptable goal states.
Many real problems require specifying properties of a goal rather than a single exact state.
Existing approaches offer no formal mechanism for expressing logical goal constraints directly to a trained DNN.
In domains with many valid target states (for example, Rubik's cube patterns), enumerating all goals is computationally burdensome.

What we did

Introduced a goal-conditioned DQN, Q(s, a, G), that estimates cost-to-go for a set of goal states.
Represented goals as sets of ground atoms in first-order logic.
Used Answer Set Programming (ASP) to generate stable models that define goal specifications.
Trained with random walks (100-200 moves for Rubik's cube starts) and subsampled logical goal atoms.
Combined the learned heuristic with weighted batched Q* search (batch size 10,000; weight 0.6).
Demonstrated diverse Rubik's cube and Sokoban goals without retraining the DNN.
Showed that broader goal specifications can reduce solve time (for example, Cross6: 218.45 s vs. canonical: 625.62 s).
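
The training idea of pairing random walks with subsampled goal atoms can be illustrated on a toy domain. The following is a minimal sketch, not the paper's implementation: the swap puzzle, the atom format ("at", position, token), the walk lengths, and every function name are assumptions made for illustration only.

```python
import random


def to_atoms(state):
    """Convert a toy state (a tuple of tokens) into a set of ground atoms."""
    return frozenset(("at", i, tok) for i, tok in enumerate(state))


def swap_adjacent(state, i):
    """Toy action: swap the tokens at positions i and i + 1."""
    s = list(state)
    s[i], s[i + 1] = s[i + 1], s[i]
    return tuple(s)


def random_walk(state, num_steps, rng):
    """Apply num_steps uniformly random toy actions to the state."""
    for _ in range(num_steps):
        state = swap_adjacent(state, rng.randrange(len(state) - 1))
    return state


def sample_training_pair(start, rng, min_steps=1, max_steps=5):
    """Return (start, goal_atoms, end_state) for one hypothetical training example.

    The goal is a random non-empty subset of the atoms that hold in the
    state reached by the walk, so it is reachable from `start` by construction.
    """
    steps = rng.randint(min_steps, max_steps)
    end = random_walk(start, steps, rng)
    atoms = sorted(to_atoms(end))          # sorted for reproducible sampling
    keep = rng.randint(1, len(atoms))      # how many atoms to retain
    goal = frozenset(rng.sample(atoms, keep))
    return start, goal, end


rng = random.Random(0)
start, goal, end = sample_training_pair(("a", "b", "c", "d"), rng)
print(sorted(goal))
```

Dropping atoms makes the goal less specific, so one sampled end state stands in for every state that agrees with it on the retained atoms. This is what lets a single network see both fully specified and partially specified goals during training.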

How it works

Goal-conditioned Q-learning: learn Q(s, a, G) where G is a set of ground atoms.
Goal generation: convert a state to logical atoms and remove subsets to create generalized goals.
ASP specification: use stable models from an ASP program to represent goal sets.
Model refinement: if a stable model is not a valid goal model, iteratively expand it.
Search: use batched weighted Q* search to reach states satisfying the logical goal.
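
The goal test and the search step above can be sketched on a toy swap puzzle. This is a simplified illustration under stated assumptions, not the authors' code: the domain, the atom format, the hand-written toy_q function (a stand-in for the trained DNN), the small batch size, and the priority rule weight * g + Q(s, a, G) are all assumptions chosen to mirror the description of batched weighted Q* search.

```python
import heapq
from itertools import count


def to_atoms(state):
    """Convert a toy state (a tuple of tokens) into a set of ground atoms."""
    return frozenset(("at", i, tok) for i, tok in enumerate(state))


def is_goal(state, goal_atoms):
    """A state satisfies the goal if every goal atom holds in it."""
    return goal_atoms <= to_atoms(state)


def step(state, action):
    """Toy action: swap the tokens at positions action and action + 1."""
    s = list(state)
    s[action], s[action + 1] = s[action + 1], s[action]
    return tuple(s)


def toy_q(state, action, goal_atoms):
    """Stand-in for the DNN: step cost 1 plus unsatisfied goal atoms in the child."""
    return 1 + len(goal_atoms - to_atoms(step(state, action)))


def batched_weighted_q_search(start, goal_atoms, q_fn, weight=0.6,
                              batch_size=4, max_iters=10_000):
    """Return a list of actions reaching a goal state, or None if none is found."""
    if is_goal(start, goal_atoms):
        return []
    tie = count()                      # tie-breaker so the heap never compares states
    open_heap, best_g = [], {start: 0}

    def push(state, g, path):
        # Queue (state, action) pairs; children are generated only when popped.
        for a in range(len(state) - 1):
            priority = weight * g + q_fn(state, a, goal_atoms)
            heapq.heappush(open_heap, (priority, next(tie), state, a, g, path))

    push(start, 0, [])
    for _ in range(max_iters):
        if not open_heap:
            return None
        n = min(batch_size, len(open_heap))
        batch = [heapq.heappop(open_heap) for _ in range(n)]
        for _, _, state, a, g, path in batch:
            child, child_g, child_path = step(state, a), g + 1, path + [a]
            if is_goal(child, goal_atoms):
                return child_path
            if child_g < best_g.get(child, float("inf")):
                best_g[child] = child_g
                push(child, child_g, child_path)
    return None


goal = frozenset({("at", 0, "a"), ("at", 3, "b")})   # positions 1 and 2 are left free
plan = batched_weighted_q_search(("c", "a", "b", "d"), goal, toy_q)
print(plan)
```

Because the goal is a set of atoms rather than one exact state, the search stops at the first state that contains all of them. In a real system the per-pair calls to q_fn would be replaced by one batched forward pass of the network over all popped entries, which is the motivation for popping a whole batch at a time.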

Figure 1. Overview of training the goal-conditioned DQN and specifying goals via ASP.

Figure 1 illustrates how logical specifications are converted into ground atoms and passed into the DQN.

Key contributions

Formalizes goal specification to DNN heuristics using first-order logic and ASP.
Introduces a training procedure that generalizes across unseen goals without retraining.
Demonstrates diverse Rubik's cube goals (for example, Cross6, CupSpot, Checkers) and Sokoban goals.
Empirically shows that broader goal sets can reduce solve time and path cost (for example, Cross6 vs canonical).

Recommended citation: Forest Agostinelli, Rojina Panta, and Vedant Khandelwal. (2024). "Specifying goals to deep neural networks with answer set programming." Proceedings of the ICAPS, vol. 34, pp. 2-10.