Monotone optimal policies for left-skip-free Markov decision processes

S. Stidham and R.R. Weber. In Applied Probability and Stochastic Processes, Kluwer Academic Publishers, J. George Shanthikumar and Ushio Sumita eds, 13, 191-202, 1999.

In a previous paper (Stidham and Weber, 1989) we considered a variety of models for optimal control of the service rate in a queueing system, in which the objective is to minimize the limiting average expected cost per unit time. By standard techniques, we showed how to convert such a problem into an equivalent problem in which the objective is to minimize the expected total (undiscounted) cost until the first entrance into state zero. Under weak assumptions on the one-stage (service plus holding) costs and transition probabilities, we showed that an optimal policy is monotonic, that is, a larger service rate is used in larger states. In contrast to previous models in the literature on control of queues, we assumed that the holding cost was non-decreasing, but not necessarily convex, in the state. A common assumption in all the models was that the state transitions were skip free to the left: a one-step transition from state $i$ to a state $j<i-1$ is impossible. Many queueing models have this property, including all birth-death models, as well as a variety of M/GI/1-type models, including models with batch arrivals, phase-type service times, and LCFS-PR queue discipline.

Julian Keilson introduced the concept of skip-free transitions in a Markov chain in Keilson (1962). The significance of left-skip-free transitions in the queueing control models of \cite{Sti89} is that the problem of optimally moving the system from state $i$ to state $0$ can be decomposed into two separate problems: first, to move the system optimally from state $i$ to state $i-1$, and then to move the system optimally from state $i-1$ to state $0$. This decomposition was exploited in two ways in Stidham and Weber (1989) to prove monotonicity of an optimal policy: (1) to facilitate simple coupling or pairwise-switch arguments in the case of additive transitions; and (2) to construct a proof based on downward induction on the state variable in the case of non-additive transitions.

In the present paper we extend the analysis in Stidham and Weber (1989) to a general class of Markov decision processes with left-skip-free transitions.

back to list of papers