Operant Conditioning: What B.F. Skinner's Experiments Revealed About Learning Reinforcement

Operant conditioning is a theory of learning in behavioral psychology which emphasises the role of reinforcement in conditioning. It describes the effect that rewards and punishments for specific behaviors can have on a person’s future actions. The theory was developed by the American psychologist B. F. Skinner following experiments beginning in the 1930s, which involved the use of an operant conditioning chamber. Operant and classical conditioning remain important theories in our understanding of how humans and other animals learn new forms of behavior.

Early Developments in Conditioning: Pavlov’s Dogs

Early research into conditioning was conducted by the Russian physiologist Ivan Pavlov. During studies of digestion in dogs, he noticed that his subjects would salivate when a researcher fed them. After the researcher had opened a door, entered the room and fed the dogs a few times, the animals began to associate the door opening with food, and would begin to salivate whenever they heard the door. Through associative learning, the dogs had linked a neutral stimulus (the door opening) with an unconditioned stimulus (food). Repeated classical conditioning had led to the door becoming a conditioned stimulus, which prompted the dogs to salivate.

Conditioning

Pavlov conducted additional research, known as the ‘Pavlov’s dog’ experiments, in which he further investigated classical conditioning as a form of learning.

Exposing dogs to a variety of stimuli before feeding them, he discovered that the animals could be conditioned to salivate in response to different types of event, such as the ringing of a buzzer or the sounding of a metronome (Pavlov, 1927).

Thorndike’s Law of Effect

In 1905, American psychologist Edward Thorndike proposed a ‘law of effect’, which formed the basis of our modern understanding of operant conditioning. Thorndike’s research focussed on learning processes and he conducted experiments to discover how cats learn new forms of behavior.

He would place a cat in a puzzle box, where the animal would remain until it learnt to press a lever. Initially, the cats would be trapped in the box for a long period of time, roaming it before inadvertently pressing the lever, which opened a door and allowed them to escape. However, once the cats learnt to associate operating the lever with a positive outcome - being able to leave the box - they wasted less and less time before using it to escape. Through instrumental learning, the cats had learnt to associate pressing the lever with the reward of freedom (Thorndike, 1898).

Thorndike drew on these findings when developing his law of effect. He argued that the effect of one’s action - whether it is rewarded or punished - influences whether an individual will be likely to repeat such behavior in the future.

B. F. Skinner

Burrhus Frederic Skinner (1904-1990) was an influential American psychologist, writer and inventor. Born in Susquehanna, Pennsylvania, he studied at Hamilton College in New York, where he graduated in 1926 with plans to pursue a career in writing. However, a lack of success as an author, and his discovery of the theories of Ivan Pavlov, prompted an interest in psychology. He enrolled at Harvard University, where he completed his master’s degree and, in 1931, a doctorate. Skinner remained in a teaching position at Harvard whilst continuing his research. In 1938, he outlined a theory of learning involving operant conditioning.

Aside from his work in psychology, Skinner was also a keen inventor. During the Second World War, he took part in Project Pigeon, a failed attempt to create a missile controlled by pigeons. Amongst his more successful inventions was the air crib, a temperature-controlled environment for babies, which he used with one of his own children.

Skinner retired from Harvard University in 1974. He died from leukemia in 1990.

Operant Conditioning Chamber

When B. F. Skinner began studying psychology, the theories and ideas of the behaviorist school dominated the discipline. Many psychologists agreed with the proposals made by John B. Watson (1878-1958). In 1913, Watson published “Psychology as the Behaviorist Views It”, an article now cited as the “behaviorist manifesto”. He argued that the human mind could be most effectively understood by looking at a person’s observable behavior, rather than his or her cognitive processes, which he believed were more difficult to observe and quantify.

As a fellow behaviorist, Skinner believed that conditioning played a significant role in the learning process. He studied Thorndike’s law of effect using a piece of experimental apparatus now known as an operant conditioning chamber, or ‘Skinner box’. An animal is placed in a box, which contains a reward mechanism such as a hopper to dispense food pellets. A researcher can observe the animal whilst administering rewards. Punishments can also be imposed using the electrified base of the box to deliver electric shocks. A light and a speaker built into the side of the chamber allow a signal to be communicated to the subject, and the animal is given a lever to press.

In 1938, Skinner published The Behavior of Organisms, in which he described the functions of operant conditioning. Whilst experimenting with an operant conditioning chamber, he had found that animals behaving in a particular manner would either repeat or avoid that behavior depending on whether they were subsequently rewarded or punished.

In one experiment, Skinner (1948) observed the behavior of pigeons in the box. The birds were free to move around the box, turning full circle and moving their heads. Meanwhile, the hopper fed the subjects at regular timed intervals, regardless of their behavior. Skinner found that when a bird’s particular movement was coincidentally but repeatedly followed by food, the bird would interpret the behavior as having caused the hopper to offer a reward. A variety of superstitious behaviors, including twists and full-circle turns, were adopted by the birds in the expectation that food would follow.

Skinner explained the pigeons’ behavior in terms of operant conditioning. The food served as a positive reward for the birds’ behavior, leading them to repeat a particular movement more often when they found that it was subsequently rewarded.

The reinforcements and punishments which influence behavior take a number of forms. A positive reinforcement or punishment describes the imposition of a stimulus in a situation. Depending on the stimulus, this may either promote or discourage an individual’s behavior. Conversely, negative reinforcements and punishments involve the removal of a stimulus, whether an unpleasant one or a benefit. Again, these consequences may influence a person’s future actions.
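The four types discussed in the following sections can be read as a two-by-two grid: whether a stimulus is imposed or removed, and whether the behavior is encouraged or discouraged as a result. The short Python sketch below illustrates that grid. The names `Change`, `Effect` and `classify` are hypothetical labels chosen for this illustration and do not come from Skinner's work.

```python
from enum import Enum


class Change(Enum):
    """What happens to a stimulus after the behavior (illustrative labels)."""
    ADDED = "stimulus imposed"
    REMOVED = "stimulus removed"


class Effect(Enum):
    """How the consequence influences future behavior (illustrative labels)."""
    ENCOURAGES = "behavior becomes more likely"
    DISCOURAGES = "behavior becomes less likely"


def classify(change: Change, effect: Effect) -> str:
    """Name the type of consequence from the two dimensions of the grid."""
    # "Positive" and "negative" refer to adding or removing a stimulus,
    # not to whether the consequence is pleasant.
    sign = "positive" if change is Change.ADDED else "negative"
    # Reinforcement encourages a behavior; punishment discourages it.
    kind = "reinforcement" if effect is Effect.ENCOURAGES else "punishment"
    return f"{sign} {kind}"


# A sticker is given and calm behavior at the dentist is encouraged.
print(classify(Change.ADDED, Effect.ENCOURAGES))
# A privilege is taken away and the behavior that caused it is discouraged.
print(classify(Change.REMOVED, Effect.DISCOURAGES))
```

Keeping the two dimensions separate makes it harder to confuse negative reinforcement (removing something unpleasant) with punishment.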

Positive Reinforcement

A positive reinforcement is the provision of a reward or other benefit following a desirable action. This encourages a person or animal to repeat a particular behavior in future, in the hope that the reinforcement will be repeated.

Examples of positive reinforcements include:

A dentist gives a boy a sticker after he remains calm throughout a dental check-up. The child will be encouraged to behave well at the dentist’s practice in future, expecting that he will receive more stickers.
A trainer rewards a dog with a treat after it has successfully completed a training maneuver when rehearsing for a dog show.
In some food courts, electronically-operated waste bins contain a sensor and a speaker. When the bin senses a person emptying waste into the receptacle, the speaker emits a recorded voice which thanks the user for using the bin, instead of choosing to leave their litter. This appreciation may lead the user to seek gratification again by using the waste bin in the future.

Negative Reinforcement

Negative reinforcement is the removal of an undesirable or uncomfortable stimulus from a situation. Such reinforcement may involve the ceasing of punishment when a person’s behavior conforms to a demand. In order to avoid future punishment, an individual may change his or her behavior. For example:

A girl who regularly fights with her sister is told by her parents that she is to be grounded on the days that she misbehaves. On days when the girl changes her behavior, the punishment is lifted, and she learns to act more amicably towards her sister.
A person climbing into a hot bath is burnt and quickly climbs out of the water. Subsequently, they learn to wait for the bath to cool before entering the water in order to avoid being burnt again.
A man attends a music concert. The band is uncomfortably loud and he leaves the concert hall to find a quieter environment. In future, he declines invitations to watch bands in order to avoid the loud music, which operated as a negative reinforcement.

Positive Punishment

A positive punishment is a stimulus imposed on a person when they behave in a particular way. Over time, the person learns to avoid the positive punishment by altering their behavior.

Examples of positive punishment:

A child is sent to his room when he is impolite to his mother. The boy, who wants to play with his toys downstairs, begins to be more polite to his parents.
An internet service provider limits users’ usage to a set amount of data, after which the user’s internet speed is severely reduced for the remainder of the month. Users learn to avoid slow download speeds by using less of their data allowance.
A convict flouts the rules of a prison. He is placed in solitary confinement as a form of positive punishment, and eventually chooses to follow the rules to avoid further isolation.

Negative Punishment

Negative punishment is the removal of a benefit or privilege in response to undesirable behavior. A person wants to retain the benefits that they previously enjoyed, and avoids behavior which may lead to their rights being revoked.

Negative punishment examples include:

A child is prevented from attending a football game after failing to clean their room. The threat of further punishment leads them to complete their assigned chores.
A dog owner shouts at their pet after it runs away in a park. The dog, wanting to avoid being shouted at, learns to stay close to its owner whilst in the park.
A man strains his eyes after reading without his glasses. Although he dislikes wearing spectacles, he wears them to avoid straining his eyes.

As with its classical counterpart, operant conditioning depends on the repetition of a stimulus in order to maintain the association between behavior and a reinforcement. Initial conditioning is repeated in order to create an association, and must then be periodically repeated so that the link between the two is not lost. If, after initial conditioning, the reinforcement is removed (e.g. a treat is no longer given when a dog behaves), the subject will eventually ‘unlearn’ the association. Extinction can result in the person or animal resuming their original behavior.
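The rise and fall of an association can be illustrated with a toy numerical model. The sketch below assumes a simple linear update rule, in which the strength of the association moves a fixed fraction of the way towards 1 after a reinforced trial and towards 0 after an unreinforced one. This rule, the `rate` parameter and the function names are illustrative assumptions rather than a model taken from Skinner's experiments.

```python
def update_strength(strength: float, reinforced: bool, rate: float = 0.2) -> float:
    """Move the association strength towards 1 if reinforced, otherwise towards 0."""
    target = 1.0 if reinforced else 0.0
    return strength + rate * (target - strength)


def run_trials(reinforced_trials: list[bool], rate: float = 0.2) -> list[float]:
    """Return the association strength after each trial, starting from 0."""
    strength = 0.0
    history = []
    for reinforced in reinforced_trials:
        strength = update_strength(strength, reinforced, rate)
        history.append(strength)
    return history


# Ten trials with a treat (conditioning), then ten without (extinction).
history = run_trials([True] * 10 + [False] * 10)
for trial, strength in enumerate(history, start=1):
    print(f"trial {trial:2d}: strength {strength:.2f}")
```

Under these assumptions the strength climbs during the reinforced trials and decays once the treat is withheld, which mirrors the description of extinction above.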

Schedules of Reinforcement

Skinner was curious to find out what variables affected the effectiveness of operant conditioning. He conducted research into the effect of timing on conditioning with Charles B. Ferster, a fellow behavioral psychologist who worked at the Yerkes Laboratories of Primate Biology in Florida. Ferster and Skinner (1957) found that schedules of reinforcement - the rate at which a reinforcement is repeated - can greatly influence operant conditioning.

A number of types of schedules of reinforcement have been proposed by Skinner, Ferster and others, including:

Continuous Reinforcement Schedules (CRF)

A reward or punishment is provided every time an individual exhibits a particular mode of behavior. Through continuous reinforcement, the subject learns that the result of their actions will always be the same. However, the dependability of continuous reinforcement can lead to it becoming too predictable. A subject may learn that a reward will always be provided for a type of behavior, and only carry out the desired action when they need the reward. For instance, a rat may learn that pushing a lever will always lead to food being provided. Given the security that this schedule of reinforcement provides, the rat may decide to save energy by only pressing the lever when it is sufficiently hungry.

Partial Reinforcement Schedules (PR)

Instead of responding every time a person behaves in a particular way, partial reinforcement involves rewarding behavior only on some occasions. A subject must then work harder to receive a reinforcement and may take longer to learn using this type of operant conditioning.

Partial reinforcement can be used following a period of initial continuous reinforcement to prolong the effects of operant conditioning. For example, an animal trainer might give a treat to a dog every time it sits on command. Once the animal has learnt that a reward is provided for obeying the trainer, partial reinforcement may be used. The dog may receive a treat only every fifth time it obeys a command, but the conditioned behavior continues to be reinforced and extinction is avoided.

Partial reinforcement modifies the ratio between the conditioned response and reinforcement, or the interval between reinforcements:

Fixed-interval schedules
A reinforcement is only given at a set interval. For instance, an employer rewards company employees with an annual bonus to reward their work. The interval of one year is fixed, and the employees anticipate a reinforcement annually.
Variable-interval schedules
Reinforcements are provided at intervals which the subject cannot predict. Instead of paying an annual bonus, an employer might pay smaller bonuses, sometimes monthly, other times every 2, 3 or 4 months. The employee is unaware when the reinforcement will be given and is encouraged to work harder with the knowledge that bonuses could be decided at any time.
Fixed-ratio schedules
Fixed-ratio schedules require a subject to provide the conditioned response a predetermined number of times before a reinforcement is given. An example of a fixed-ratio schedule is an amusement arcade game which rewards the player with a toy on every 10th attempt.
Variable-ratio schedules
A variable-ratio schedule reinforces behavior depending on the number of responses made, but this ratio changes constantly. The amusement game described above might instead reward the 2nd, 6th, 20th and 21st attempts.
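
The four partial schedules can be compared with a short Python sketch. It assumes that ratio schedules count responses and that, on interval schedules, the first response made after the interval has elapsed is the one reinforced. The function names and the example numbers (other than the arcade game figures used above) are hypothetical.

```python
from itertools import accumulate


def fixed_ratio(n_responses: int, ratio: int) -> list[int]:
    """Return the response numbers that are reinforced: every `ratio`-th one."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def variable_ratio(gaps: list[int]) -> list[int]:
    """Return reinforced response numbers, given the responses required before each reward."""
    return list(accumulate(gaps))


def interval_schedule(response_times: list[float], intervals: list[float]) -> list[float]:
    """Return the times of the reinforced responses.

    Equal values in `intervals` give a fixed-interval schedule; unequal
    values give a variable-interval schedule.
    """
    reinforced = []
    last_reward = 0.0
    remaining = iter(intervals)
    current = next(remaining, None)
    for t in sorted(response_times):
        if current is None:
            break  # no further rewards are scheduled
        if t - last_reward >= current:
            # The interval has elapsed, so this response is reinforced.
            reinforced.append(t)
            last_reward = t
            current = next(remaining, None)
    return reinforced


attempts = list(range(1, 10))  # one response per time unit
print(fixed_ratio(30, 10))                      # every 10th attempt
print(variable_ratio([2, 4, 14, 1]))            # the 2nd, 6th, 20th and 21st attempts
print(interval_schedule(attempts, [3, 3, 3]))   # fixed interval of 3 time units
print(interval_schedule(attempts, [1, 4, 2]))   # variable intervals
```

In this sketch the ratio schedules depend only on how many responses the subject makes, whereas the interval schedules depend on when the responses occur.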

Differences from Classical Conditioning

Although classical and operant conditioning share similarities in the way that they influence behavior and assist in the learning process, there are important differences between the two types of conditioning.

During classical conditioning, a person learns by observation, associating two stimuli with each other. A neutral stimulus is presented in conjunction with another, unconditioned, stimulus. Through repetition, the person learns to associate the first, seemingly unrelated stimulus with the second.

In contrast, operant conditioning involves learning through the consequences of one’s actions. It is the reinforcement that follows behavior which informs a person’s future actions. A person behaves in a particular manner and is subsequently rewarded or punished. They eventually learn to associate their original behavior with the reinforcement, and either increase, maintain or avoid their behavior in future in order to achieve the most desirable outcome.

Evaluation

Skinner’s theory of operant conditioning played a key role in helping psychologists to understand how behavior is learnt. It explains why reinforcements can be used so effectively in the learning process, and how schedules of reinforcement can affect the outcome of conditioning. Skinner’s research also addressed the use of behavioral shaping, whereby successive approximations of an expected response are also reinforced, leading a subject gradually towards the desired type of behavior.

An advantage of operant conditioning is its ability to explain learning in real-life situations. From an early age, parents nurture their children’s behavior using rewards. Praise following an achievement (e.g. crawling or taking a first step) reinforces such behavior. When a child misbehaves, punishments in the form of verbal discouragement or the removal of privileges are used to dissuade them from repeating their actions.

Operant conditioning can also be observed in its applications across a range of learning environments. Teachers reward students’ achievements with high grades, words of encouragement and star-shaped stickers on homework - all examples of positive reinforcement. Positive punishments - detention, exclusion or parents grounding their children until their behavior changes - serve to further influence behavior using the principles of operant conditioning. And its uses are not limited to influencing human behavior: dog trainers use reinforcements to shape behavior in animals and to encourage obedience.

Skinner’s theory has, however, been criticised for its oversimplification of the complex nature of human behavior. Operant conditioning is based on the idea that behavior is ‘learnt’ simply through the process of reinforcement. However, it neglects individual differences and the cognitive processes that influence behavior. This has led critics to label Skinner’s ideas as deterministic: operant conditioning assumes that environmental factors beyond a person’s control are responsible for their behavior. It fails to account for people’s ability to reason and to decide their actions according to their own free will.