Behavior Modification Techniques

By Dr. Jim and Lynda McCall

Copyright © 2003

 


Lesson Five

 

REINFORCEMENT SCHEDULES

 

          At the turn of the twentieth century in Europe, a professor set out to see if he could teach arithmetic to a horse. Clever Hans, his star pupil, gained worldwide recognition as the smartest horse alive. As his reputation grew, the scientific community sent some of the best minds of the day to examine and test Hans' ability to do math. The professor put on a good show. He would pose a problem, and the horse would paw out the answer. Hans was rewarded with a lump of sugar. The scientists were baffled.

 

          Perhaps, they thought, the questions that the professor asked were ones that Hans had already solved. So they posed the problems themselves. Hans pawed out the answers.

 

          They next concluded that somehow the professor must be telling the horse the answers. The panel scrutinized the professor as the problems were presented to Hans. The professor did nothing. Hans pawed out the answer. Still unconvinced, the scientists asked the professor to move to a position where Hans could not see him. A problem was given to Hans. The horse began to paw, but he did not stop at the correct number. Another problem was posed. Hans missed it again. When Hans could not see the professor, he could not solve the mathematical problems.

 

          The men of science knew they were on to something. Unbeknownst to the old professor, he had been communicating the correct answer to the horse. At last the tell became obvious: when Hans pawed the correct number of times, the professor's eyebrow twitched. Reading this almost invisible facial cue, Hans learned that if he stopped pawing, a lump of sugar would be his.

 

          A pretty remarkable feat indeed, but even more amazing if you think about the consistent performance by Hans. Trial after trial, demonstration after demonstration, there was Hans supposedly pawing out the solutions to math problems. We all should be so lucky to get our horses to behave in such a consistent manner.

 

          What system of behavioral manipulation did the professor unwittingly use? Remember that Hans was rewarded with a lump of sugar after pawing the correct number of times. The number of times Hans had to paw to solve the problem and get the reward was randomized. For Hans, each paw might be the one that would trigger the eyebrow and earn the sugar. This powerful reward schedule is known as variable ratio.

 

          There are basically two types of reward schedules: those that reward after a number of attempts (ratio schedules), and those that reward after intervals of time (interval schedules).

 


           The research on the subject of reward schedules has been exhaustive and enlightening. Again, the two basic types are:

 

                    Ratio Schedules:  Reinforcement or rewards for performance are delivered after the behavior has been attempted a number of times.

 

                    Interval Schedules:  Reinforcement or rewards for performance are delivered after determined time intervals.

 


          When training horses we are concerned primarily with ratio schedules. The use of ratio reinforcement schedules to train and maintain behavior is the secret to getting the most out of a horse.

 

          There are two basic types of ratio schedules:  fixed ratio schedules and variable ratio schedules.

 

          Rewarding on a fixed ratio schedule means that you determine the number of times the behavior must be performed before the reward is presented. For example, you may reward the horse every third time he performs the behavior correctly.

 

          Using a variable ratio schedule, the reinforcement occurs after an average number of responses, but there is some unpredictability or variability in the exact number needed. For example, you may decide that the horse should be rewarded, on average, after 5 successful maneuvers. On a variable ratio schedule you would reward in a random fashion after 3 tries, then 6 tries, then 5, and so on, so that the average works out to about 5.
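          For readers who like to see the mechanics spelled out, the two ratio schedules can be sketched as simple counters. This is a hypothetical illustration, not part of the lesson: the function names and the particular randomizing rule (a uniform draw centered on the target average) are our own choices.

```python
import random

def fixed_ratio(n):
    """Reward after every n-th correct response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # deliver the reward
        return False
    return respond

def variable_ratio(mean_n):
    """Reward after a random number of responses averaging mean_n."""
    # A uniform draw from 1 to (2 * mean_n - 1) has a mean of mean_n.
    target = random.randint(1, 2 * mean_n - 1)
    count = 0
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = random.randint(1, 2 * mean_n - 1)  # new, unpredictable goal
            return True
        return False
    return respond

# A fixed-ratio-2 schedule, like Brilliance's first three days, pays on
# every second try; the animal can learn exactly what the work costs:
fr2 = fixed_ratio(2)
print([fr2() for _ in range(6)])  # [False, True, False, True, False, True]

# A variable-ratio-5 schedule pays at unpredictable points that only
# average out to one reward per five tries:
vr5 = variable_ratio(5)
print([vr5() for _ in range(10)])
```

          The point of the sketch is that with `fixed_ratio` the animal (or a person reading the output) can predict exactly which try pays, while with `variable_ratio` every try might be the one, just as each paw might have been the one to twitch the professor's eyebrow.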

 

          Several years ago, an experiment with a three-year-old Morgan mare named Brilliance demonstrated some important principles of these reward schedules.

 

          The mare was placed in a twelve-foot by twelve-foot stall with solid walls and doors. A feed chute descended from the loft to a rubber bucket attached to the wall. To one side of the bucket was a block of wood hinged over a doorbell button. The first step was to teach Brilliance that flipping the wood block - which would activate the electric doorbell button - would make alfalfa pellets drop into the bucket.

 

          In the beginning of her training whenever the mare turned in the direction of the wood block, pellets would be given. Then she had to touch the block with her nose. Then she had to flip it. Within forty-five minutes, Brilliance could get pellets upon demand.  This is called shaping the behavior.  But, more about that in the next lesson.

 

          Once the behavior of flipping the block to get the pellets was shaped, the experiment was ready to begin.

 

          For the first three days, the mare was rewarded for flipping the block on a fixed ratio reinforcement schedule set at two. The goal was to determine how many times during a fifteen-minute lab session Brilliance would flip the wood block if she got pellets after every second flip. Brilliance did so twenty-four times, averaging twelve handfuls of pellets each daily session.

 

          For the next three days the ratio was set at seven: seven flips for one reward. During the daily fifteen-minute lab sessions Brilliance worked the hinged block about eleven times each day. She therefore received one handful of pellets after the seventh flip, but the mare would not flip the block an additional seven times to get another reward.

 

          For the final three days the ratio was set at twelve. Brilliance flipped the block only about four times each session and therefore did not receive any pellets. She had passed the point at which she was willing to work for pellets.

 

          This decreasing interest in maintaining behavior that will deliver a reward is a well-known phenomenon associated with fixed reward schedules. When an animal knows what it takes to get the reward, it can decide whether the work is worth the effort. Obviously, Brilliance thought alfalfa pellets were worth two flips, but the cost became too great as the fixed ratio was increased.

 

          Her judgment was also based on the value of the reward. To measure this value, the experiment was repeated using a sugar cube as the reward. With a fixed ratio of two, the mare earned twenty-three sugar lumps in fifteen minutes. At a ratio of seven, she worked for three. (The switch broke down before the final ratio of twelve could be tested.) Sugar was definitely worth more work.

 

          When training horses, however, we do not want them to be able to decide how much a reward is worth or how many times they will work to get it. To avoid these problems, most horse training uses variable ratio schedules, in which a reward is given after a varying number of tries.

 

          Brilliance's behavior supports this theory. The mare was returned to the stall and for three days was allowed to flip the block for alfalfa pellets on a variable ratio schedule. In other words, the behavior was rewarded with pellets after a randomized number of attempts. Brilliance flipped the block of wood up to twenty times in order to get her reward of a handful of pellets. No telling how many times she would have flipped the block to get the more desirable lump of sugar!

 

          The major advantage of this reward schedule is that horses like Brilliance and Clever Hans do not know which attempt will be rewarded. Unable to judge the relationship between the reward and the work, the horse will make more attempts to get the reward. Behavior will be maintained for longer periods between rewards.

 

          This concept is vividly played out in our own species in casinos. Slot machines work on a variable reward schedule. It is difficult to walk away when the next quarter may trigger a winning line of apples, oranges, and pears. To make the system even more powerful, the slot machines tease by occasionally giving us a little taste of the reward: a handful of quarters.

 


          Horse trainers should strive to reward their horses for proper behavior on a completely randomized variable ratio schedule. Unfortunately, many attempts at using this system are thwarted when the reward system subconsciously falls into a sequence ratio, a take-off on the fixed ratio. In a sequence ratio, the reward is presented after a particular repeating sequence of tries. For example, a horse may be rewarded after 2 correct responses, then after 3, then after 4 correct responses.
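          The trap can be made explicit with a short sketch (hypothetical Python; the function name and the 2, 3, 4 sequence are our own illustration). A sequence ratio looks varied from try to try, but because the cycle repeats, the rewards always land on the same tries, and that is exactly the kind of regularity a horse can learn.

```python
def sequence_ratio(sequence):
    """Reward after sequence[0] tries, then sequence[1], ... repeating forever."""
    i = 0       # position in the sequence
    count = 0   # correct responses since the last reward
    def respond():
        nonlocal i, count
        count += 1
        if count == sequence[i]:
            count = 0
            i = (i + 1) % len(sequence)  # the trap: the cycle starts over
            return True
        return False
    return respond

# With a 2, 3, 4 sequence, the rewarded tries form a fixed, learnable pattern:
seq = sequence_ratio([2, 3, 4])
print([n for n in range(1, 19) if seq()])  # [2, 5, 9, 11, 14, 18]
```

          Run the same schedule again and the rewards land on exactly the same tries; a truly randomized variable ratio schedule never repeats a predictable cycle like this.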

 

          A favorite sequence of many horsemen seems to be 1, 2, then 1, 2, 3, or 2, 3, then 2, 3, 4. This kind of cha-cha-cha sequence is very noticeable when a horse is being worked on a side pass.

 

          As a horse develops proficiency in the lateral crossover, many trainers unknowingly develop a pattern. They may ask the horse to make three correct lateral leg movements to the right before a pause is given. Then four more steps to the right are accomplished before another break is presented. When they change direction, six steps are taken to the left. Once the pattern is finished, that part of the training session is completed. If this patterned sequence is repeated more than a couple of times, chances are the horse will know what you are going to do before you do it.

 

 

          For the performance horse, learning patterns can be a definite drawback to his potential. A horse trained by sequences of movements becomes less pliable: he anticipates what is going to come next. If you always pause at a certain point during training, but it is not possible to pause at that place during competition, the horse may pause anyway. If you always change strides in the same place, the horse may start to change stride without any cue being given. If a horse knows that every time he is run down the middle of the arena you are going to haul back on his mouth for a killing stop, he may start scotching or jumping in anticipation of the pain.

 

          Horses are creatures of habit, and as trainers we can use this to our advantage. There are times early in training when it is useful for a horse to know what is expected of him, and to know that he will be rewarded for every correct response. Once the behavior is shaped, however, the possibility of winning in competition is enhanced when a horse wants to work and perform up to the limits of his own ability. To make a horse all that he can be, horsemen must understand how to effectively use reward systems. We must constantly be on guard that we do not slip into a pattern distinguishable by the horse. 

 

          You now have the second key.

 

Click here to take Quiz 5