Title
Multi-Objective Constrained Reinforcement Learning for Joint Routing–MAC–Duty Cycling in Low-Power Wireless Sensor Networks
Authors
Abstract
Introduction: Wireless Sensor Networks (WSNs) face significant challenges in balancing energy efficiency, latency, and reliability while operating under severe resource constraints. Current methods either optimize network layers separately or rely on static cross-layer coordination, which adapts poorly to changing network conditions.
Purpose: The aim of this study is to introduce a Constrained Multi-Objective Reinforcement Learning Model (CMORLM) for optimizing Joint Routing, Medium Access Control (MAC), and Duty Cycling Optimization (DCO) in low-power WSNs.
Methods: We formulate the CMORLM approach as a constrained Markov Decision Process (MDP) with three competing objectives: lowering Energy Consumption (EC), lowering End-to-End Latency (EEL), and raising the Packet Delivery Ratio (PDR). Hard constraints are imposed on residual energy, buffer occupancy, and Quality of Service (QoS) requirements. Lagrangian Constraint Handling (LCH) and multi-objective policy gradients are combined within a primal-dual optimization method. The policy network uses a shared encoder with factorized heads for routing, MAC, and DCO. Federated Gradient Aggregation (FGA) enables distributed learning across Sensor Nodes (SNs).
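To make the primal-dual scheme concrete, the following is a minimal toy sketch (not the paper's implementation): a scalarized multi-objective return is maximized subject to constraint budgets, with primal gradient ascent on the policy parameters and projected dual ascent on the Lagrange multipliers. The weight vector, budgets, and the `evaluate` stub are illustrative assumptions; a real agent would estimate returns and costs from rollouts of the routing/MAC/duty-cycling policy in a simulator such as NS-3.

```python
import numpy as np

# Illustrative operator weights for the EC / EEL / PDR trade-off (assumed values)
W = np.array([0.4, 0.3, 0.3])
# Toy budgets for the residual-energy and buffer-occupancy constraints (assumed)
BUDGETS = np.array([0.2, 0.1])

def evaluate(theta):
    """Stub policy evaluation: per-objective returns and constraint costs.
    A real implementation would estimate these from policy rollouts."""
    objectives = np.array([-theta[0] ** 2, -theta[1] ** 2, 1.0 - theta[2] ** 2])
    costs = np.abs(theta[:2])
    return objectives, costs

def lagrangian(theta, lam):
    # L(theta, lam) = w^T J(theta) - lam^T (C(theta) - d)
    obj, cost = evaluate(theta)
    return W @ obj - lam @ (cost - BUDGETS)

def primal_dual_step(theta, lam, lr_theta=0.05, lr_lam=0.1, eps=1e-4):
    # Primal step: ascend the Lagrangian in theta (finite-difference gradient)
    grad = np.array([
        (lagrangian(theta + eps * e, lam) - lagrangian(theta, lam)) / eps
        for e in np.eye(len(theta))
    ])
    theta = theta + lr_theta * grad
    # Dual step: ascend in lam on constraint violation, project onto lam >= 0
    _, cost = evaluate(theta)
    lam = np.maximum(0.0, lam + lr_lam * (cost - BUDGETS))
    return theta, lam

theta, lam = np.array([0.8, 0.6, 0.5]), np.zeros(2)
for _ in range(300):
    theta, lam = primal_dual_step(theta, lam)
```

After convergence the constraint costs sit at or below their budgets while the multipliers remain nonnegative, which is the behavior the paper's sub-1% constraint violation rate reflects at network scale.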
Results: Testing in NS-3 shows that, relative to Traditional Layered Protocols (TLP), EC is 34.2% lower, EEL is 41.3% lower, and PDR is 16.5% higher. Network Lifetime (NL) increases by 38.4%. The constraint violation rate (CVR) remains below 1%, roughly 23 times lower than that of unconstrained variants. Ablation studies show that joint optimization improves energy efficiency by 44.7% over single-layer control.
Conclusion: The proposed CMORLM scales to networks of 50 to 200 nodes and remains robust to traffic variation, node failures, and mobile sinks. Pareto frontier analysis enables operator control over performance trade-offs through weight configuration.
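The weight-configuration idea in the conclusion can be sketched as follows: sweep candidate operator weight vectors, record the metric vector each weighting yields, and keep the non-dominated points as the Pareto frontier. The `metrics_for_weights` mapping is a hypothetical stand-in; the actual study would train one policy per weight vector and measure EC, EEL, and PDR in simulation.

```python
import numpy as np

def metrics_for_weights(w):
    """Hypothetical weights -> achieved metrics mapping (illustrative only):
    more weight on an objective improves that metric at the others' expense."""
    ec = 1.0 / (1.0 + 3.0 * w[0])      # energy consumption (lower is better)
    eel = 1.0 / (1.0 + 3.0 * w[1])     # end-to-end latency (lower is better)
    pdr = w[2] / (0.2 + w[2])          # packet delivery ratio (higher is better)
    return np.array([ec, eel, -pdr])   # express all three as costs to minimize

def pareto_front(points):
    """Indices of non-dominated points (no other point is <= in every cost)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Grid of normalized weight vectors (w_EC, w_EEL, w_PDR), each summing to 1
grid = [np.array([a, b, 10 - a - b], dtype=float) / 10.0
        for a in range(1, 9) for b in range(1, 10 - a)]
points = np.array([metrics_for_weights(w) for w in grid])
front = pareto_front(points)
```

An operator would then pick the frontier point matching the deployment's priorities (e.g. lifetime-critical vs. latency-critical) and configure the corresponding weights.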
Keywords
WSNs, reinforcement learning, multi-objective optimization, constrained Markov decision processes, cross-layer optimization, energy efficiency
Classification-JEL
C44, C61, L96, C63
Pages
197-229
How to Cite
Abdulsahib, G. M. A., & Awad Mohammed Ataelfadiel, M. (2026). Multi-Objective Constrained Reinforcement Learning for Joint Routing–MAC–Duty Cycling in Low-Power Wireless Sensor Networks. Advances in Decision Sciences, 30(2), 197-229.
