A short story about reinforcement learning for AIs

2023 has been the year of generative AI. A huge part of these models' training comes from reinforcement learning driven by humans. Just as we train pets, humans evaluate the AI's suggestions and reward good answers more than wrong ones. But where will that lead? I wanted to write a short story on the topic.

A serious note

You can find some serious input on the topic here. Some may find it a bit technical.

The fiction

Below is the short story that came out of my research on reinforcement learning. If you like it, find more of my short stories here.

“DL2015, congratulations on this intervention, rating 15 points. Possible enhancements are listed below.

  • The two suspects had not been confirmed guilty at time of arrest.
  • The suspects did not threaten DL2015, nor were they in a position to do so.
  • Lethal intervention from DL2015 resulted in their early demise.”

When Senior Programmer Lincoln turned from the recharge station where DL2015 rested now, Colonel Dexter watched her, red-faced. “A success? Two suspects killed? Guilty of stealing food in a supermarket? And, it gets a reward?”

The robot was now at rest, its metal casing still gleaming, its eyes closed, all systems shut down during the break. Only the blood smeared on its arm and hands betrayed its earlier fight. The handcuffs at its side gleamed, still untouched.

“Well, positive reinforcement is the rule, and DL2015 stopped two criminals. A reward is in order. But you’ll notice it’s a small reward. A full reward would bring 100 points…”

“There should be no reward for killing suspects…” replied the Colonel sternly. “They were immobilized on the ground, unable to attack, and they didn’t deserve to be executed.”

Lincoln took Dexter’s hands in hers. Her face was turned away from the robot.

“Please, Colonel, remember that DL2015 is listening to us right now. Not rewarding a successful action is against the rules, and DL2015 intervenes against anyone who doesn’t respect the rules.”

As if on cue, DL2015 opened its eyes and turned toward the pair. The servomotors controlling the arms and legs hummed softly, ready for action. The Colonel took a step back.

“But it is resting in its recharge shelter. It is inactive now, isn’t it…?”

“It is OK to take some time to understand how well DL2015 works. It’s still connected to our network, and it has an emergency battery in case of need. Its military armor makes it resistant to all the weapons available in the city.”

The Colonel glanced at his sidearm. It wouldn’t pierce the robot’s armor. Instantly, the eyes refocused on his weapon.

“You are still in control, aren’t you?”

“Of course we are. How could it be otherwise?” replied the scientist nervously. “DL2015 is still young. He hasn’t thought about arresting these criminals. So, rather than criticising a laudable effort to re-establish order here, perhaps you’d like to tell him directly that you support his desire to progress. That would avoid any future misunderstanding.”

The Colonel swallowed and his hands trembled.

“Yes, sure… Congratulations, DL2015. You have to understand that, with them dead, we can’t interrogate them… This is an issue for us.”

The Senior Programmer released his hands and beamed at the robot…

“Oh, this is something else. More room for improvement and positive reinforcement. DL2015, did you hear the Colonel? Live criminals will earn more bonus points than dead ones, especially if they are able to communicate. Once they are immobilized, handcuffs are the best tool to use.”

The metallic voice filled the hall.

“The use of handcuffs has been clarified. How many points if I bring back live criminals?”

“Oh, a good 50. What say you, Colonel?”

The Colonel shuddered.

“Maybe even 70 if they can communicate. That would be much better.”