We propose a novel approach to learning relational policies for classical planning based on learning to rank actions. We introduce a new graph representation that explicitly captures action information and propose a Graph Neural Network (GNN) architecture augmented with Gated Recurrent Units (GRUs) to learn action rankings. Unlike value-function based approaches, which must learn a globally consistent function, our action-ranking method only needs to learn a locally consistent ranking. Our model is trained on data generated from small problem instances that are easily solved by planners and is applied to significantly larger instances where planning is computationally prohibitive. Experimental results across standard planning benchmarks demonstrate that our action-ranking approach not only generalizes better to problems larger than those used in training but also outperforms multiple baseline methods, both value-function and action-ranking based, in success rate and plan quality.
Traditional planners rely on heuristic search. While they can find optimal solutions, they scale poorly: on large, complex problems they become prohibitively expensive or too slow.
Whereas standard Reinforcement Learning can require millions of examples, the relational structure of PDDL (Planning Domain Definition Language) allows Graph Neural Networks (GNNs) to learn generalizable policies from just thousands of examples generated from small, easily solved problems.
The core idea is to train a model on small problem instances (e.g., 6-9 block problems) and then apply that learned policy to solve significantly larger instances (e.g., 40-block problems) where classical planners fail.
Many approaches try to learn a value function, V(s), that estimates the cost-to-goal from any state. This is extremely difficult: the function must be globally consistent across the entire state space, and since optimal planning is NP-hard, accurate value functions are hard to learn and often generalize poorly to larger problems.
Models like Action Schema Networks (ASNets) use a fixed-depth receptive field. This design limits their ability to reason about long-range dependencies between objects and actions, a capability that is crucial in complex problems.
Previous GNN approaches (like GPL) focus on state representations but do not explicitly model how actions themselves relate to objects. This misses critical information about action applicability and relationships.
Instead of learning a complex, globally consistent value function V(s) over all states, our approach, GABAR, learns a simpler, locally consistent function that only ranks the applicable actions in the *current* state. This is a more tractable and more generalizable learning problem.
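To make the contrast concrete, here is a minimal PyTorch sketch comparing the two objectives. The hinge-style ranking loss is illustrative, not necessarily the paper's exact loss; the point is that it only constrains scores among actions applicable in a single state, whereas value regression needs targets that are consistent everywhere.

```python
import torch
import torch.nn.functional as F

def value_regression_loss(pred_values, target_costs):
    # Value-function baseline: regress cost-to-goal for every state.
    # Targets must be globally consistent across the whole state space.
    return F.mse_loss(pred_values, target_costs)

def pairwise_ranking_loss(scores, best_idx, margin=1.0):
    # Ranking objective: the expert action need only outscore the other
    # actions applicable in the *same* state (a purely local constraint).
    # scores: (num_applicable_actions,); best_idx: expert action's index.
    others = torch.cat([scores[:best_idx], scores[best_idx + 1:]])
    return F.relu(margin - (scores[best_idx] - others)).mean()
```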
Our novel graph representation includes 'action nodes' that are explicitly connected to the 'object nodes' they take as arguments. This directly encodes action applicability and relational structure into the GNN.
A Gated Recurrent Unit (GRU)-based decoder constructs the final action step-by-step. It first selects an action schema (e.g., 'stack') and then sequentially selects each parameter (e.g., 'blockA', then 'blockB'), conditioning each choice on the previous ones. This handles complex dependencies between parameters.
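A minimal sketch of such a decoder, with hypothetical layer choices (a `GRUCell` recurrence and a bilinear object-scoring head) standing in for the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ActionDecoder(nn.Module):
    """Illustrative GRU decoder: pick a schema, then each parameter in turn."""

    def __init__(self, hidden_dim, num_schemas):
        super().__init__()
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)
        self.schema_head = nn.Linear(hidden_dim, num_schemas)
        self.object_head = nn.Bilinear(hidden_dim, hidden_dim, 1)
        self.schema_emb = nn.Embedding(num_schemas, hidden_dim)

    def forward(self, global_emb, object_embs, arity):
        # Step 1: choose the action schema from the global embedding.
        h = global_emb                                   # (1, hidden_dim)
        schema = self.schema_head(h).argmax(dim=-1)      # greedy for brevity
        h = self.gru(self.schema_emb(schema), h)

        # Step 2: choose each parameter; the recurrent state h carries
        # the schema and all previously chosen objects, so later picks
        # condition on earlier ones. `arity` comes from the chosen schema.
        params = []
        for _ in range(arity):
            scores = self.object_head(
                h.expand(object_embs.size(0), -1), object_embs).squeeze(-1)
            idx = scores.argmax()
            params.append(idx)
            h = self.gru(object_embs[idx].unsqueeze(0), h)
        return schema, params
```

Feeding each selected object's embedding back through the GRU is what lets a choice like the destination block in 'stack' depend on which block was chosen first.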
The graph consists of object nodes, predicate nodes, action nodes, and a single global node. Edges represent predicate-object and action-object relationships, encoding the current state, the goal, and all potential actions.
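The sketch below illustrates this encoding with networkx; the grounded-task interface (`task.objects`, `task.atoms`, `task.ground_actions`) is hypothetical, and the paper's exact node and edge attributes may differ.

```python
import networkx as nx

def build_graph(task):
    g = nx.Graph()
    g.add_node("global", kind="global")          # one global node

    for obj in task.objects:                     # object nodes
        g.add_node(obj, kind="object")
        g.add_edge("global", obj, kind="global-object")

    for atom in task.atoms:                      # predicate nodes (state + goal)
        node = f"atom:{atom}"
        g.add_node(node, kind="predicate", in_goal=atom in task.goal)
        for pos, obj in enumerate(atom.args):    # argument position on the edge
            g.add_edge(node, obj, kind="pred-object", position=pos)

    for action in task.ground_actions:           # action nodes
        node = f"act:{action}"
        g.add_node(node, kind="action", schema=action.schema)
        for pos, obj in enumerate(action.args):  # parameter position on the edge
            g.add_edge(node, obj, kind="action-object", position=pos)
    return g
```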
The GNN performs L rounds of message passing (we use 9) where nodes and edges update their embeddings. Attention mechanisms weight the importance of different messages, and the global node aggregates graph-level information for rapid propagation.
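A single round might look like the simplified sketch below, which substitutes a sigmoid gate for the paper's full attention mechanism and shares one GRU update between nodes and the global embedding:

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One of the L=9 rounds: nodes aggregate gated messages from
    neighbors; the global node reads a summary of all nodes."""

    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.att = nn.Linear(2 * dim, 1)
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, h, edges, g):
        # h: (N, dim) node embeddings; edges: (E, 2) long index pairs;
        # g: (1, dim) global embedding.
        src, dst = edges[:, 0], edges[:, 1]
        pair = torch.cat([h[src], h[dst]], dim=-1)
        m = self.msg(pair) * torch.sigmoid(self.att(pair))  # gated messages
        agg = torch.zeros_like(h).index_add_(0, dst, m)     # sum per receiver
        h = self.upd(agg, h)                                # GRU node update
        g = self.upd(h.mean(dim=0, keepdim=True), g)        # global readout
        return h, g
```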
The final global embedding is fed into a GRU decoder. It first selects the highest-scoring action schema, then autoregressively selects each object parameter. We use a beam search (width k=2) to explore multiple high-scoring actions in parallel.
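A generic beam-search skeleton over partial actions (schema first, then each parameter), with hypothetical `expand_fn`/`score_fn` callbacks supplied by the decoder:

```python
import heapq

def beam_search_decode(score_fn, expand_fn, initial, width=2, steps=3):
    """Keep the `width` best partial actions at every decoding step.
    `steps` is 1 (schema) plus the schema's arity; `expand_fn` yields
    candidate extensions of a partial action, `score_fn` scores one."""
    beam = [(0.0, initial)]
    for _ in range(steps):
        candidates = []
        for logp, partial in beam:
            for ext in expand_fn(partial):
                candidates.append((logp + score_fn(ext), ext))
        beam = heapq.nlargest(width, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]  # best complete action
```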
GABAR's coverage (success rate) drops minimally as problems get harder: 95.5% on Easy, 92.2% on Medium, and 89.2% on Hard. In contrast, baseline methods like GPL and GRAPL see their coverage collapse to 6.5% and 22.1% on hard problems, respectively.
In the Visitall domain, the model was trained on problems with 9-36 cells (up to 49 in validation). It successfully solved test problems with up to 400 cells, more than 8 times larger than the largest training instances, demonstrating powerful generalization.
On hard test instances, LLMs like OpenAI-O3 and Gemini-2.5-Pro solve almost no problems (0.4% and 1.5% coverage, respectively). GABAR's architecture, which explicitly learns structural relationships, maintains 89.2% coverage, highlighting the limitations of pure pattern matching for complex reasoning.
Coverage (C↑, % of test problems solved) and plan quality ratio (P↑) on Easy (E), Medium (M), and Hard (H) instances. GPL, ASNets, and GRAPL are baselines; GABAR-ACT, GABAR-CD, and GABAR-RANK are ablations of GABAR.

| Domain | Diff | GPL C↑ | GPL P↑ | ASNets C↑ | ASNets P↑ | GRAPL C↑ | GRAPL P↑ | GABAR C↑ | GABAR P↑ | GABAR-ACT C↑ | GABAR-ACT P↑ | GABAR-CD C↑ | GABAR-CD P↑ | GABAR-RANK C↑ | GABAR-RANK P↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blocks | E | 100 | 1.1 | 100 | 1.6 | 64 | 0.65 | 100 | 1.5 | 44 | 0.65 | 100 | 0.92 | 29 | 0.79 |
| | M | 45 | 0.68 | 100 | 1.5 | 48 | 0.44 | 100 | 1.6 | 14 | 0.49 | 92 | 0.81 | 21 | 0.71 |
| | H | 10 | 0.33 | 92 | 1.4 | 38 | 0.28 | 100 | 1.7 | 4 | 0.35 | 81 | 0.80 | 9 | 0.61 |
| Miconic | E | 97 | 0.97 | 100 | 1.0 | 68 | 0.56 | 100 | 1.0 | 35 | 0.55 | 97 | 0.88 | 42 | 0.67 |
| | M | 37 | 0.56 | 100 | 0.98 | 65 | 0.54 | 100 | 0.97 | 18 | 0.33 | 94 | 0.86 | 29 | 0.37 |
| | H | 19 | 0.29 | 90 | 0.92 | 60 | 0.49 | 100 | 0.95 | 2 | 0.27 | 88 | 0.83 | 16 | 0.29 |
| Spanner | E | 73 | 1.1 | 78 | 0.86 | 22 | 0.65 | 94 | 1.1 | 31 | 0.65 | 87 | 0.98 | 57 | 0.82 |
| | M | 42 | 0.56 | 60 | 0.69 | 5 | 0.55 | 93 | 0.99 | 11 | 0.27 | 81 | 0.93 | 42 | 0.77 |
| | H | 3 | 0.18 | 42 | 0.61 | 0 | - | 89 | 0.91 | 0 | - | 62 | 0.79 | 12 | 0.45 |
| Gripper | E | 100 | 1.0 | 78 | 0.98 | 26 | 0.95 | 100 | 1.1 | 31 | 0.56 | 95 | 1.0 | 55 | 0.58 |
| | M | 56 | 0.85 | 54 | 0.91 | 12 | 0.67 | 100 | 0.99 | 23 | 0.40 | 92 | 0.93 | 43 | 0.41 |
| | H | 21 | 0.74 | 42 | 0.88 | 0 | - | 100 | 0.96 | 9 | 0.28 | 87 | 0.86 | 21 | 0.33 |
| Visitall | E | 69 | 1.3 | 94 | 0.96 | 92 | 1.1 | 93 | 1.1 | 72 | 1.2 | 91 | 1.1 | 52 | 0.64 |
| | M | 15 | 0.76 | 86 | 0.93 | 88 | 1.0 | 91 | 1.0 | 64 | 0.93 | 89 | 1.1 | 46 | 0.56 |
| | H | 0 | 0 | 64 | 0.81 | 78 | 0.99 | 88 | 1.1 | 44 | 0.67 | 83 | 1.2 | 39 | 0.54 |
| Grid | E | 74 | 0.89 | 52 | 0.81 | 20 | 0.38 | 100 | 0.91 | 21 | 0.56 | 79 | 0.87 | 17 | 0.54 |
| | M | 17 | 0.61 | 45 | 0.66 | 3 | 0.28 | 97 | 0.85 | 8 | 0.46 | 71 | 0.65 | 12 | 0.28 |
| | H | 0 | 0 | 21 | 0.60 | 0 | - | 92 | 0.74 | 0 | - | 54 | 0.53 | 0 | - |
| Logistics | E | 56 | 0.61 | 39 | 0.71 | 32 | 0.81 | 90 | 0.75 | 12 | 0.64 | 31 | 0.86 | 41 | 0.65 |
| | M | 7 | 0.21 | 22 | 0.55 | 9 | 0.45 | 76 | 0.65 | 3 | 0.49 | 25 | 0.54 | 21 | 0.49 |
| | H | 0 | 0 | 4 | 0.39 | 0 | - | 71 | 0.59 | 0 | - | 6 | 0.35 | 0 | - |
| Rovers | E | 64 | 0.99 | 67 | 0.96 | 21 | 0.35 | 87 | 1.0 | 22 | 0.75 | 44 | 0.81 | 33 | 0.67 |
| | M | 9 | 0.32 | 56 | 0.87 | 5 | 0.19 | 82 | 0.96 | 6 | 0.66 | 37 | 0.63 | 9 | 0.56 |
| | H | 0 | 0 | 31 | 0.64 | 0 | - | 77 | 0.97 | 0 | - | 19 | 0.57 | 0 | - |
| Combined | E | 79.1 | 0.98 | 76 | 0.98 | 43.5 | 0.67 | 95.5 | 1.04 | 33.5 | 0.69 | 78 | 0.93 | 40.2 | 0.67 |
| | M | 28.5 | 0.56 | 65.4 | 0.88 | 29.3 | 0.51* | 92.2 | 1.01 | 18.4 | 0.50 | 72.7 | 0.80 | 27.8 | 0.51 |
| | H | 6.5 | 0.39* | 48.5 | 0.78 | 22.1 | 0.58* | 89.2 | 0.99 | 7.4 | 0.39* | 60 | 0.73 | 12.1 | 0.44* |
Coverage (C↑) and plan quality (P↑) of LLM baselines versus GABAR on three domains.

| Domain | Diff | OpenAI-O3 C↑ | OpenAI-O3 P↑ | Gemini-2.5-Pro C↑ | Gemini-2.5-Pro P↑ | GABAR C↑ | GABAR P↑ |
|---|---|---|---|---|---|---|---|
| Blocks | E | 73 | 1.03 | 81 | 1.1 | 100 | 1.5 |
| | M | 41 | 0.95 | 47 | 0.86 | 100 | 1.6 |
| | H | 4 | 0.61 | 12 | 0.81 | 100 | 1.7 |
| Miconic | E | 56 | 0.81 | 79 | 0.86 | 100 | 1.0 |
| | M | 12 | 0.69 | 36 | 0.58 | 100 | 0.97 |
| | H | 0 | - | 12 | 0.51 | 100 | 0.95 |
| Spanner | E | 38 | 0.81 | 42 | 0.75 | 94 | 1.1 |
| | M | 13 | 0.77 | 10 | 0.64 | 93 | 0.99 |
| | H | 0 | - | 0 | - | 89 | 0.91 |
| Combined | E | 33.4 | 0.85 | 44.0 | 0.8 | 95.5 | 1.04 |
| | M | 11.6 | 0.77* | 17.1 | 0.68* | 92.2 | 1.01 |
| | H | 0.4 | 0.61* | 1.5 | 0.51* | 89.2 | 0.99 |
The table below shows that GABAR not only solves *more* problems (Coverage, C) but also finds solutions that are high quality (Plan Quality Ratio, P) and efficient (Plan Length, PL). Baselines that solve fewer problems often appear to have low plan lengths simply because they only solve the easiest instances.
| Domain | Diff | GPL C↑ | GPL P↑ | GPL PL↓ | ASNets C↑ | ASNets P↑ | ASNets PL↓ | GRAPL C↑ | GRAPL P↑ | GRAPL PL↓ | GABAR C↑ | GABAR P↑ | GABAR PL↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blocks | E | 100 | 1.1 | 55 | 100 | 1.6 | 38 | 64 | 0.65 | 79 | 100 | 1.5 | 41 |
| | M | 45 | 0.68 | 137 | 100 | 1.5 | 79 | 48 | 0.44 | 214 | 100 | 1.6 | 76 |
| | H | 10 | 0.33 | 423 | 92 | 1.4 | 156 | 38 | 0.28 | 585 | 100 | 1.7 | 125 |
| Miconic | E | 97 | 0.97 | 167 | 100 | 1.0 | 162 | 68 | 0.56 | 252 | 100 | 1.0 | 160 |
| | M | 37 | 0.56 | 235 | 100 | 0.98 | 180 | 65 | 0.54 | 280 | 100 | 0.97 | 181 |
| | H | 19 | 0.29 | 448 | 90 | 0.92 | 209 | 60 | 0.49 | 329 | 100 | 0.95 | 202 |
| Spanner | E | 73 | 1.1 | 27 | 78 | 0.86 | 36 | 22 | 0.65 | 35 | 94 | 1.1 | 31 |
| | M | 42 | 0.56 | 61 | 60 | 0.69 | 55 | 5 | 0.55 | 50 | 93 | 0.99 | 44 |
| | H | 3 | 0.18 | 208 | 42 | 0.61 | 79 | - | - | - | 89 | 0.91 | 67 |
| Gripper | E | 100 | 1.0 | 82 | 78 | 0.98 | 76 | 26 | 0.95 | 61 | 100 | 1.1 | 78 |
| | M | 56 | 0.85 | 128 | 54 | 0.91 | 118 | 12 | 0.67 | 128 | 100 | 0.99 | 133 |
| | H | 21 | 0.74 | 139 | 42 | 0.88 | 131 | - | - | - | 100 | 0.96 | 156 |
| Visitall | E | 69 | 1.3 | 79 | 94 | 0.96 | 121 | 92 | 1.1 | 106 | 93 | 1.1 | 103 |
| | M | 15 | 0.76 | 188 | 86 | 0.93 | 194 | 88 | 1.0 | 181 | 91 | 1.0 | 179 |
| | H | - | - | - | 64 | 0.81 | 333 | 78 | 0.99 | 272 | 88 | 1.1 | 243 |
| Grid | E | 74 | 0.89 | 41 | 52 | 0.81 | 47 | 20 | 0.38 | 77 | 100 | 0.91 | 45 |
| | M | 17 | 0.61 | 66 | 45 | 0.66 | 73 | 3 | 0.28 | 143 | 97 | 0.85 | 63 |
| | H | - | - | - | 21 | 0.60 | 101 | - | - | - | 92 | 0.74 | 98 |
| Logistics | E | 56 | 0.61 | 117 | 39 | 0.71 | 101 | 32 | 0.81 | 88 | 90 | 0.75 | 127 |
| | M | 7 | 0.21 | 305 | 22 | 0.55 | 138 | 9 | 0.45 | 169 | 76 | 0.65 | 159 |
| | H | - | - | - | 4 | 0.39 | 217 | - | - | - | 71 | 0.59 | 232 |
| Rovers | E | 64 | 0.99 | 17 | 67 | 0.96 | 19 | 21 | 0.35 | 44 | 87 | 1.0 | 21 |
| | M | 9 | 0.32 | 78 | 56 | 0.87 | 36 | 5 | 0.19 | 125 | 82 | 0.96 | 33 |
| | H | - | - | - | 31 | 0.64 | 55 | - | - | - | 77 | 0.97 | 45 |
| Combined | E | 79.1 | 0.98 | 69 | 76.0 | 0.98 | 73 | 43.5 | 0.67 | 91 | 95.5 | 1.04 | 76 |
| | M | 28.5 | 0.56 | 148 | 65.4 | 0.88 | 111 | 29.3 | 0.51 | 172 | 92.2 | 1.01 | 108 |
| | H | 6.5 | 0.39* | 265 | 48.5 | 0.78 | 158 | 22.1 | 0.58* | 276 | 89.2 | 0.99 | 147 |
Our GABAR-ACT ablation removed the explicit action nodes from the graph. This caused a catastrophic performance drop, with coverage on hard problems falling from 89.2% to just 7.4%. This confirms that explicitly modeling action-object relationships is essential.
The GABAR-CD ablation selected all action parameters independently instead of sequentially. This failed to capture dependencies (e.g., in Logistics), and coverage on hard problems dropped from 89.2% to 60.0%.
The GABAR-RANK ablation used our graph but swapped the ranking objective for a traditional value-learning objective. Performance collapsed to just 12.1% on hard problems, indicating that learning to rank is the more robust and generalizable strategy.
In our GABAR-G ablation (results in paper appendix), removing the global node (which aggregates graph-wide information) caused coverage on hard problems to drop from 89.2% to 42.5%. The global node is critical for rapid information propagation in large graphs.
We generate data by solving small problems (e.g., 6-9 blocks, 5-15 balls) with an optimal planner. The first action of the optimal plan is used as the training label. Training takes 1-2 hours per domain on an RTX 3080.
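A hedged sketch of that pipeline, where `generate_problem` and `solve_optimal` are hypothetical wrappers around a PDDL problem generator and an optimal planner:

```python
import random

def generate_dataset(domain, num_problems, size_range):
    # Each training example pairs a state with the first action of an
    # optimal plan from that state. Since suffixes of an optimal plan
    # are themselves optimal, every state along one plan yields a label.
    examples = []
    for _ in range(num_problems):
        size = random.randint(*size_range)        # e.g. 6-9 blocks
        problem = generate_problem(domain, size)  # hypothetical helper
        plan = solve_optimal(domain, problem)     # small, so fast to solve
        state = problem.initial_state
        for action in plan:
            examples.append((state, action))      # (state, expert action)
            state = state.apply(action)
    return examples
```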
Node features are one-hot encodings for type (object, predicate, action) and specific schema/predicate. Edge features encode type (predicate-object, action-object) and argument/parameter position.
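As an illustration, a node feature vector could be assembled as follows (the vocabularies shown are example Blocksworld entries, not the paper's exact encoding order):

```python
import torch

def encode_node_features(node_kind, schema_or_pred, kinds, vocab):
    # Concatenate a type one-hot with a schema/predicate one-hot.
    type_oh = torch.zeros(len(kinds))
    type_oh[kinds.index(node_kind)] = 1.0
    id_oh = torch.zeros(len(vocab))
    if schema_or_pred is not None:        # object/global nodes have none
        id_oh[vocab.index(schema_or_pred)] = 1.0
    return torch.cat([type_oh, id_oh])

# Example: an action node for the 'stack' schema in Blocksworld.
kinds = ["object", "predicate", "action", "global"]
vocab = ["on", "clear", "holding", "pick-up", "put-down", "stack", "unstack"]
feat = encode_node_features("action", "stack", kinds, vocab)
```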
We use a hidden dimensionality of 64, 9 rounds of GNN message passing, and a batch size of 16. The decoder uses a beam width of 2. We use the Adam optimizer with a learning rate of 0.0005.
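Collected as a configuration sketch (field names are ours; the values are as reported):

```python
from dataclasses import dataclass

@dataclass
class GabarConfig:
    hidden_dim: int = 64          # node/edge embedding size
    num_rounds: int = 9           # L rounds of GNN message passing
    batch_size: int = 16
    beam_width: int = 2           # decoder beam search width k
    learning_rate: float = 5e-4   # Adam optimizer
```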
We generate between 3,348 (Blocks) and 6,880 (Grid) training examples (state-action pairs) per domain. All problems are generated using standard PDDL generators.
GABAR enables solving large, real-world-sized planning problems by executing a learned policy directly, bypassing the expensive search required by traditional planners.
Our results show that learning a simple, local action-ranking policy is a more tractable and scalable approach than the dominant paradigm of learning complex, global value functions.
This work demonstrates that combining the explicit relational structure of PDDL with the right GNN architecture can achieve sample efficiency and generalization that pure, data-hungry RL methods often cannot.
@inproceedings{mangannavargraph,
  title     = {Graph Neural Network Based Action Ranking for Planning},
  author    = {Mangannavar, Rajesh Devaraddi and Lee, Stefan and Fern, Alan and Tadepalli, Prasad},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2025}
}