Behavior Modification Techniques
Lesson Five
REINFORCEMENT SCHEDULES
At the turn of the century in Germany, a professor exhibited a horse named
Hans that appeared to solve arithmetic problems, pawing out the answers with a
hoof. A panel of scientists set out to explain the feat.
Perhaps, they thought, the questions that the professor asked were ones
that Hans had already solved. So they posed problems of their own. Hans pawed
out the answers.
They next concluded that somehow the professor must be telling the horse
the answers. The panel scrutinized the professor as the problems were presented
to Hans. The professor did nothing. Hans pawed out the answer. Still not
convinced, the panel asked the professor to move to a position where Hans
could not see him. A problem was given to Hans. The horse began to paw but he did not
stop at the correct number. Another problem was posed. Hans missed it again.
When Hans could not see the professor, he could not solve the mathematical
problems.
The men of science knew they were on to something. Unbeknownst to the
old professor, he had been communicating the correct answer to the horse. At
last the secret became obvious. When Hans pawed the correct number of times, the
professor's eyebrow twitched. Reading this almost invisible facial cue, Hans
learned that if he stopped pawing a lump of sugar would be his.
A pretty remarkable feat indeed, and even more amazing when you consider
the consistency of Hans's performance. Trial after trial, demonstration
after demonstration, there was Hans supposedly pawing out the solutions to math
problems. We all should be so lucky to get our horses to behave in such a
consistent manner.
What system of behavioral manipulation did the professor unwittingly
use? Remember that Hans was rewarded with a lump of sugar after pawing the
correct number of times. The number of times Hans had to paw to solve each
problem and earn the reward varied from problem to problem. For Hans,
each paw might be the one which would trigger the eyebrow and get the sugar.
This powerful reward schedule is known as variable ratio.
There
are basically two types of reward schedules: those that reward after a number
of attempts (ratio schedules), and those that reward after intervals of time
(interval schedules).
The research on this subject of reward schedules has been exhaustive and
enlightening.
Ratio Schedules: Reinforcement, or rewards for performance, is
delivered after the behavior has been attempted a number of times.
Interval Schedules: Reinforcement, or rewards for performance, is
delivered after determined time intervals.
When training horses we are concerned primarily with ratio schedules. The
use of ratio reinforcement schedules to train and maintain behavior is the
secret to getting the most out of a horse.
There are two basic types of ratio schedules: fixed ratio schedules and variable ratio
schedules.
Rewarding on a fixed ratio schedule means that you determine the number
of times the behavior must be performed before the reward is presented. For
example, you may reward the horse every third time he performs the behavior
correctly.
Using a variable ratio schedule, the reinforcement occurs after an
average number of responses, but there is some unpredictability or variability
in the exact number needed. For example, you may decide that the horse should
be rewarded, on average, after 5 successful maneuvers. On a variable ratio
schedule you would then reward in a random fashion: after 3 tries, then 6
tries, and so on, so that the average works out to 5.
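The difference between the two ratio schedules can be sketched in a few lines of Python. This is a hypothetical illustration, not part of the original lesson; the schedule values match the examples above.

```python
import random

def fixed_ratio(n):
    """Reward after every n-th correct response - perfectly predictable."""
    count = 0
    while True:
        count += 1
        yield count % n == 0  # True means the reward is delivered

def variable_ratio(mean, spread=2):
    """Reward after a random number of responses averaging `mean`.

    The exact requirement is redrawn after each reward, so no single
    response can be predicted to pay off (assumes mean - spread >= 1).
    """
    target = random.randint(mean - spread, mean + spread)
    count = 0
    while True:
        count += 1
        if count >= target:
            yield True  # reward delivered; draw a new random requirement
            target = random.randint(mean - spread, mean + spread)
            count = 0
        else:
            yield False

# Fixed ratio of 3: the reward pattern repeats exactly.
fr = fixed_ratio(3)
print([next(fr) for _ in range(9)])  # rewards on responses 3, 6, 9

# Variable ratio averaging 5: the horse cannot tell which try pays off.
vr = variable_ratio(5)
print([next(vr) for _ in range(15)])
```

Both schedules deliver roughly the same number of rewards over a long session; only the predictability differs, and that is what changes the animal's behavior.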
Several years ago, an experiment with a three-year-old Morgan mare named
Brilliance demonstrated some important principles of these reward schedules.
The mare was placed in a twelve-foot by twelve-foot stall with solid
walls and doors. A feed chute descended from the loft to a rubber bucket
attached to the wall. To one side of the bucket was a block of wood hinged over
a doorbell button. The first step was to teach Brilliance that flipping the
wood block - which would activate the electric doorbell button - would make
alfalfa pellets drop into the bucket.
In the beginning of her training whenever the mare turned in the
direction of the wood block, pellets would be given. Then she had to touch the
block with her nose. Then she had to flip it. Within forty-five minutes,
Brilliance could get pellets upon demand.
This is called shaping the behavior.
But, more about that in the next lesson.
Once the behavior of flipping the block to get the pellets was shaped,
the experiment was ready to begin.
For the first three days, the mare was rewarded for flipping the block
using a fixed ratio reinforcement schedule set at two. The goal was to determine
how many times during a fifteen-minute lab session Brilliance would flip the
wood block if she got pellets after every second flip. Brilliance did so
twenty-four times, averaging twelve handfuls of pellets each daily session.
For the next three days the ratio was set at seven: seven flips for one
reward. During the daily fifteen-minute lab sessions Brilliance worked the
hinged block about eleven times each day. She received one handful of
pellets after the seventh flip, but the mare would not flip the lever an
additional seven times to get another reward.
The final three days the ratio was set at twelve. Brilliance only
flipped the lever about four times each session, and therefore, did not receive
any pellets. She had passed the point at which she was willing to work for
pellets.
This decreasing interest in maintaining a behavior that delivers a
reward is a well-known phenomenon associated with fixed reward schedules. When
an animal knows what it takes to get the reward, it can decide whether the work
is worth the effort. Obviously, Brilliance thought alfalfa pellets were worth
two flips, but the cost became too great as the fixed ratio was increased.
Her judgment also was based on the value of the reward. To measure this
value, the experiment was repeated using a sugar cube as the reward. With a
fixed ratio of two, the mare got twenty-three sugar lumps in fifteen minutes.
At a ratio of seven, she would work for three. (The switch broke down before
the final ratio schedule of twelve was tested.) Sugar was definitely worth more
work.
When training horses, however, we do not want them to have the ability
to decide how much a reward is worth or how many times they will work to get
it. To avoid these problems, most horse
training involves variable ratio schedules during which a reward is given after
a varying number of tries.
Brilliance's behavior supports this approach. The mare was returned to the
stall and for three days was allowed to flip the block for alfalfa pellets on
a variable ratio schedule. In other words, the behavior was rewarded with
pellets after a randomized number of attempts. Brilliance flipped the block of wood up to
twenty times in order to get her reward of a handful of pellets. No telling how many times she would have
flipped the block to get the more desirable lump of sugar!
The major advantage of this reward schedule is that horses like
Brilliance and Clever Hans do not know which attempt will be rewarded. Unable
to judge the relationship between the reward and the work, the horse will make
more attempts to get the reward. Behavior will be maintained for longer periods
between rewards.
This concept is vividly played out on the human species in casinos. Slot
machines work on a variable reward schedule. It is difficult to walk away when
the next quarter may trigger a winning row of apples, oranges, and pears. To
make this system even more powerful, the slot machines tease by
occasionally giving us a little taste of the reward - a handful of quarters.
Horse trainers should strive to reward their horses for proper behavior
on a completely randomized variable ratio schedule. Unfortunately, many
attempts at using this system are thwarted when the reward system
subconsciously lapses into a sequence ratio, a variation of the fixed ratio.
In a sequence ratio, the reward is presented after a particular sequence
of tries. For example, a horse may be
rewarded after 2 correct responses, and then it will take 3, then 4 correct
responses to get the reward.
A favorite sequence of many horsemen seems to be 1, 2, then 1, 2, 3, or 2, 3, then 2, 3, 4. This kind of
cha-cha-cha sequence is very noticeable when a horse is being worked on a side
pass.
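The sequence-ratio trap can be made concrete with a short sketch (hypothetical Python, not from the lesson): a requirement that climbs 2, 3, 4 in a fixed order is completely predictable, while re-shuffling the same requirements each cycle restores the unpredictability of a true variable ratio.

```python
import random

def sequence_ratio(steps):
    """Reward after a fixed, repeating sequence of requirements,
    e.g. 2 correct responses, then 3, then 4 - a learnable pattern."""
    while True:
        for need in steps:
            for i in range(need):
                yield i == need - 1  # True means the reward is delivered

def shuffled_ratio(steps):
    """Same requirements and same long-run average as above, but the
    order is re-shuffled each cycle so no pattern can be learned."""
    steps = list(steps)
    while True:
        random.shuffle(steps)
        for need in steps:
            for i in range(need):
                yield i == need - 1

# The cha-cha-cha sequence: rewards always land in the same places.
sr = sequence_ratio([2, 3, 4])
print([next(sr) for _ in range(9)])  # rewards on responses 2, 5, 9

# Shuffled: same three rewards per nine responses, unpredictable spots.
vr = shuffled_ratio([2, 3, 4])
print([next(vr) for _ in range(9)])
```

Note that both generators pay out exactly the same number of rewards per cycle; the only thing the shuffle changes is whether the horse can anticipate the payoff.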
As a horse develops proficiency in the lateral crossover, many trainers
unknowingly develop a pattern. They may ask the horse to make three correct
lateral leg movements to the right before a pause is given. Then four more
steps to the right are accomplished before another break is presented. When
they change direction, six steps are taken to the left. Once
the horse has achieved that pattern, that part of the training session is
completed. If this patterned
sequence is done more than a couple of times, chances are the horse will know
what you are going to do before you do it.
If you always pause at a certain point during
training, but it is
not possible to pause at that place during competition, the horse may pause
anyway. If you always change strides in the same place, the horse may start to
change stride without any cue being given.
For the performance horse, learning patterns can be a definite drawback
to his potential. A horse trained by sequences of movements becomes less
pliable. He anticipates what is going to come next. If a horse knows
that every time he is run down the middle of the arena you are going to haul
back on his mouth for a killing stop, he may start scotching or jumping in
anticipation of the pain.
Horses are creatures of habit, and as trainers we can use this to our
advantage. There are times early in training when it is useful for a horse to
know what is expected of him, and to know that he will be rewarded for every
correct response. Once the behavior is shaped, however, the possibility of
winning in competition is enhanced when a horse wants to work and perform up to
the limits of his own ability. To make a horse all that he can be, horsemen
must understand how to effectively use reward systems. We must constantly be on
guard that we do not slip into a pattern distinguishable by the horse.
You now have the second key.