Typical learning approaches rely on a dataset whose outputs are curated by humans, an expensive and limiting process. Given a set of inputs and outputs, a traditional learning model (e.g., a decision tree, a support vector machine, or a baseline or convolutional neural network) can infer the relationship between the two. Because of this human-in-the-loop limitation, a popular state-of-the-art alternative is reinforcement learning, which is used by Alphabet's DeepMind and other groups;
the recent success of AlphaGo is built on a reinforcement learning approach. Reinforcement learning is
based on learning from examples and collecting feedback through a self-supervising mechanism. Trial-and-error learning gathers more data to improve
the model, while unsupervised learning refers to learning that does not require human-generated training data. The differentiating factor of
reinforcement learning is that another, independent model attempts to evaluate success or failure as a human would.
For example, consider playing Super Mario: one model controls the video console inputs and plays the game, learning by trying again and again. A completely different
model looks at the screen (or other input sources) and decides whether the first model is succeeding or failing (based on points or a "game over"
message). That second model represents the "supervision" part. Both models must be accurate, because if the supervision is incorrect the first
model will learn poorly, yet the two models are entirely independent. However, for some tasks it may not be possible to replace human input with a model, particularly because the generating model must learn the distribution of the data (i.e., a description that allows more data to be generated).
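The two-model arrangement in the Super Mario example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation taken from the text: the names `Player`, `Evaluator`, and `toy_game`, the two-action game, and the simple value-update rule are all hypothetical and chosen only for brevity.

```python
import random


class Evaluator:
    """Hypothetical second model: judges the outcome only from what it observes."""

    def judge(self, observation):
        # Assumption: the "screen" exposes a score and a game-over flag.
        if observation["game_over"]:
            return -1.0
        return 1.0 if observation["score"] > 0 else 0.0


class Player:
    """Hypothetical first model: chooses an action and learns from feedback."""

    def __init__(self, actions, learning_rate=0.1, explore=0.1, seed=0):
        self.values = {action: 0.0 for action in actions}
        self.learning_rate = learning_rate
        self.explore = explore
        self.rng = random.Random(seed)

    def act(self):
        # Occasionally try a random action; otherwise pick the best-valued one.
        if self.rng.random() < self.explore:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def learn(self, action, feedback):
        # Move the action's estimated value toward the evaluator's feedback.
        self.values[action] += self.learning_rate * (feedback - self.values[action])


def toy_game(action):
    """Stand-in for the game: 'jump' scores points, 'walk' ends the game."""
    if action == "jump":
        return {"score": 100, "game_over": False}
    return {"score": 0, "game_over": True}


def train(episodes=200):
    player = Player(["walk", "jump"])
    evaluator = Evaluator()
    for _ in range(episodes):
        action = player.act()                    # first model plays
        observation = toy_game(action)           # what appears on the "screen"
        feedback = evaluator.judge(observation)  # second model supervises
        player.learn(action, feedback)           # first model learns by trial and error
    return player


player = train()
print(player.values)
```

The player never inspects the game rules directly; it only receives the evaluator's verdict. If `Evaluator.judge` were wrong (for instance, rewarding "game over"), the player would learn the wrong behavior, which is the dependency the paragraph above points out.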
Once the model is trained, inference tends to be a relatively separate step that uses the previously constructed model. Even in reinforcement learning, after multiple stages of back-and-forth training, a final model is generated, which then produces the inferred output. In our approach we take a more integrated and holistic method, combining inference and training into a continuous process.
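One generic way to interleave inference and training is online learning, in which each incoming sample is first used for a prediction and then immediately used to update the model. The sketch below illustrates only that general idea and should not be read as the specific method described here; the `OnlineLinearModel` class, the learning rate, and the synthetic data stream are assumptions made for the example.

```python
class OnlineLinearModel:
    """Hypothetical 1-D linear model y = w * x + b, updated one sample at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One stochastic gradient step on the squared error for this sample.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def run_stream(model, stream):
    """Predict on each sample first, then train on it; return the absolute errors."""
    errors = []
    for x, y in stream:
        prediction = model.predict(x)       # inference on the new input
        errors.append(abs(prediction - y))
        model.update(x, y)                  # training on the same sample
    return errors


# Synthetic stream following y = 2x + 1 (illustrative data only).
stream = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(10)] * 200
model = OnlineLinearModel()
errors = run_stream(model, stream)
print(model.w, model.b, errors[0], errors[-1])
```

There is no separate "training phase" followed by a frozen model in this loop: every prediction is made by the model as it currently stands, and every observed outcome changes the model used for the next prediction.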