Introduction to Generative AI
What Are AI and ML?

Artificial Intelligence (AI)
Theory and development of computer systems able to perform tasks normally requiring human intelligence.
AI refers to the field of computer science that is focused on creating systems capable of performing tasks that typically require human intelligence. These tasks include things like understanding natural language, recognizing patterns, solving problems, and making decisions. The goal of AI is to create systems that can function independently and adaptively, meaning they can learn from experience and adjust to new inputs.
Machine Learning (ML)
Gives computers the ability to learn without explicit programming.
Machine Learning is a subset of AI, and it’s essentially a method of training algorithms to learn and make predictions or decisions based on data. Rather than manually coding software routines with a specific set of instructions to accomplish a particular task, the machine is “trained” using large amounts of data and algorithms that give it the ability to learn how to perform the task. This learning process could be supervised (where the correct answers are provided to guide the learning), unsupervised (where the algorithm learns patterns and information without a specific output in mind), or reinforced (where the algorithm learns via trial and error).
Example
For instance, an AI model could be programmed to recognize cats in images by using machine learning algorithms. Instead of explicitly programming the AI to recognize specific features of a cat (like whiskers, a tail, or ears), a machine learning model would be “trained” on a large dataset of cat images. Over time, the model would “learn” the common patterns and features associated with cats, and it would become proficient at recognizing cats in images it has never seen before.
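The "trained on examples instead of hand-coded rules" idea can be sketched with a deliberately tiny classifier. This is not how real image models work; the feature values below are invented (e.g. "ear sharpness"), and the method (averaging each class's features into a centroid) is just the simplest thing that learns from examples:

```python
# A minimal sketch of learning from examples rather than explicit rules.
# Features and values are made up purely for illustration.

def train_centroids(examples):
    """Average the feature vectors seen for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Pick the label whose centroid is closest to the new example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Toy features: [ear sharpness, snout length]
training = [
    ([0.9, 0.2], "cat"), ([0.8, 0.3], "cat"),
    ([0.3, 0.9], "dog"), ([0.2, 0.8], "dog"),
]
model = train_centroids(training)
print(classify(model, [0.85, 0.25]))  # an unseen, cat-like example → "cat"
```

Nothing here lists whiskers or tails explicitly; the "knowledge" is entirely in the averaged numbers, which is the point of the paragraph above.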
Deep Learning
Imagine trying to learn a new recipe. At first, you’d focus on basic elements, like identifying ingredients and prepping them. As you get better, you start understanding more complex aspects, like how flavors blend and how to adjust cooking times for the best result. Deep learning is similar; it’s a type of machine learning that breaks down tasks into simpler, smaller parts that are easier to understand. Just like you learned to cook, deep learning models learn to solve problems by building up a layered understanding of the problem, from the simple to the complex.
Further Explanation about Deep Learning
Deep Learning is a subset of Machine Learning, which in turn is a subset of Artificial Intelligence. The ‘deep’ in deep learning isn’t a reference to any kind of deeper understanding achieved by the approach; rather, it stands for the idea of successive layers of representations. How deep (complex) these layers go depends on the model.
Imagine you’re trying to recognize what’s in an image. A simple way to do this would be to directly compare the image to a reference. But there are too many variables – light, size, orientation, etc. – making this approach impractical.
This is where deep learning shines. A deep learning model learns to identify features from the data itself. Going back to our image example, the first layer of a deep learning model might learn to understand pixels. The next layer could identify edges and corners by combining pixels. A layer after that could recognize simple geometric shapes, and so on. Each layer learns to extract increasingly complex features.
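The pixels-to-edges-to-shapes idea can be sketched as a stack of layers. The weights below are random, so the "features" are meaningless; only the layered structure (each layer consuming the previous layer's output) is the point:

```python
import numpy as np

# A minimal sketch of "successive layers of representations". Weights are
# random here, so the features mean nothing; in a real model they would be
# learned from data. Only the layered structure is illustrated.
rng = np.random.default_rng(0)

def layer(x, n_out):
    """One fully connected layer followed by a ReLU nonlinearity."""
    w = rng.normal(size=(x.shape[-1], n_out))
    return np.maximum(0.0, x @ w)

pixels = rng.random(64)      # layer 0: raw pixel intensities
edges = layer(pixels, 32)    # layer 1: edge-like combinations of pixels
shapes = layer(edges, 16)    # layer 2: shape-like combinations of edges
objects = layer(shapes, 4)   # layer 3: object-level features

print(pixels.shape, edges.shape, shapes.shape, objects.shape)
```

Notice that each representation is smaller and more abstract than the last, which mirrors the "increasingly complex features" described above.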
This layered approach allows deep learning models to handle complex data and tasks. For instance, they can power advanced artificial intelligence applications like voice-enabled TV remotes, voice assistants, autonomous vehicles, and more.
While the principles of deep learning are relatively simple, training deep learning models requires large amounts of data and computing power. This is because these models learn directly from raw data and need to process a lot of information to draw accurate conclusions.
In a nutshell, deep learning provides a way for machines to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined through its relation to simpler ones.
Artificial Neural Networks (ANNs)
These are the ‘brains’ behind deep learning. Imagine a network of workers in a factory, each doing a small part of a larger job. Like these workers, an artificial neural network has many interconnected ‘nodes’ or ‘neurons’. Each node takes in information, does a little job on it, and passes it on. Just like each worker in a factory contributes to the final product, each node in the network contributes to the final decision or prediction. The “layers” of nodes allow the network to learn from simple to complex patterns.
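A single "worker" node from the analogy can be written in a few lines: it takes in information, does one small job (a weighted sum plus a bias), and passes the result on through an activation function. The weights here are hand-picked for illustration, not learned:

```python
import math

# One node ("neuron") from the factory analogy: weighted sum + bias,
# squashed through a sigmoid activation. Weights are illustrative only.

def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid squashes to (0, 1)

out = neuron([1.0, 0.5], [0.4, -0.2], 0.1)
print(round(out, 3))  # → 0.599
```

A network is many of these wired together, with each node's output becoming another node's input.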
Semi-supervised Learning
Imagine learning to ride a bike with an instructor initially guiding you. But once you’ve grasped the basics, you continue learning on your own by simply riding around. Semi-supervised learning is like this. A model is first trained on a small amount of “labeled” data (data with answers or correct outcomes), like learning with an instructor. Then, it continues learning from a large amount of “unlabeled” data (data without answers), like learning on your own. This method helps the model to understand the fundamentals from the labeled data and then improve its skills and adapt to a variety of situations using the unlabeled data.
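The bike-riding analogy corresponds to a technique often called self-training: fit on the small labeled set first (the instructor phase), then pseudo-label the unlabeled set and refit (the riding-on-your-own phase). The data and the one-dimensional threshold classifier below are made up for illustration:

```python
# A minimal self-training sketch with an invented 1-D dataset.

def fit_threshold(points):
    """1-D classifier: the midpoint between the two class means."""
    a = [x for x, y in points if y == "a"]
    b = [x for x, y in points if y == "b"]
    return (sum(a) / len(a) + sum(b) / len(b)) / 2

def predict(threshold, x):
    return "a" if x < threshold else "b"

labeled = [(1.0, "a"), (2.0, "a"), (8.0, "b"), (9.0, "b")]
unlabeled = [1.5, 2.5, 3.0, 7.0, 7.5, 8.5]

t = fit_threshold(labeled)                        # learn with the instructor
pseudo = [(x, predict(t, x)) for x in unlabeled]  # label the unlabeled data
t = fit_threshold(labeled + pseudo)               # keep learning on your own
print(round(t, 2))
```

The second fit uses five times as many points per class as the first, which is exactly the payoff the paragraph describes: a little labeled data bootstraps learning from a lot of unlabeled data.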
Generative AI (Subset of Deep Learning)
Think of Generative AI as a talented artist that can create new, original works. But instead of paint and a canvas, it uses data and algorithms. It learns from a set of data, understands the patterns, and then creates new data that is similar to what it has learned.
Large Language Models (Subset of Deep Learning)
These are like very good writers or translators. They have been trained on a huge amount of text data, so they know a lot about language, grammar, facts about the world, and can generate human-like text. Imagine having read thousands of books and then being able to write a new, unique story that sounds like it could come from one of those books.
Deep learning models, or machine learning models in general, can be divided into two types: generative and discriminative.
Generative model
This is a type of model in machine learning that can generate new data. For example, after seeing hundreds of pictures of dogs, a generative model could create a new image that looks like a dog, even though it’s not an actual picture of a real dog.
Discriminative model
This is another type of model in machine learning. But instead of creating new data, it’s good at telling data apart. If you show it a picture, it can tell you whether the picture is of a dog or a cat. It’s like a very good judge that can make distinctions based on what it has learned.
Simple Example
Imagine you have a box full of pictures of cats and dogs, and you pick one picture randomly.
Discriminative model:
Let’s say you show this picture to a friend (who represents our discriminative model). If your friend has seen many pictures of cats and dogs before, they can probably tell you whether the picture you’re holding is a cat or a dog. That’s what a discriminative model does – given some input (the picture), it learns to predict the output (is it a cat or a dog?). It does this by understanding the relationship between the input and output – for example, dogs usually have longer noses, cats have sharper ears, etc.
Generative model:
Now, imagine you have another friend who’s an artist (representing the generative model). If you tell this friend, “Draw me a dog”, they can create a brand new image of a dog based on their understanding of what dogs look like. That’s what a generative model does – it understands the structure and patterns in the data it’s seen (pictures of dogs and cats), and it can generate new data that fits those patterns (a new picture of a dog). It does this by learning the joint probability of the input and output – for instance, if it’s a dog, what are the chances it has long ears, or a wagging tail, etc.
Further Example
In the world of machine learning, “x” usually refers to the input data and “y” to the output or the label we want to predict.
Discriminative model:
The discriminative model learns the conditional probability distribution P(y|x). This means it learns to estimate the probability of a label (y) given some input data (x).
For example, let’s say x is a bunch of features describing an animal, such as the shape of its ears, the length of its tail, whether it barks or meows, and so on. And let’s say y is the label “dog” or “cat”. The discriminative model learns from a bunch of such labeled examples (x, y pairs) and figures out how to differentiate between dogs and cats based on the features x. So if we feed the model a new set of features (a new animal), it can predict whether it’s a dog or a cat.
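Estimating P(y|x) can be made concrete by counting: given a feature value x, how often does each label y co-occur with it? The (x, y) pairs below are invented, with a single feature standing in for the whole feature bundle:

```python
from collections import Counter

# A minimal sketch of estimating P(y | x) from labeled pairs.
# The counts are invented: 10 animals that bark, 8 that meow.

pairs = [("barks", "dog")] * 9 + [("meows", "cat")] * 8 + [("barks", "cat")] * 1

def conditional(pairs, x):
    """P(y | x): how often each label co-occurs with the feature x."""
    matching = Counter(y for feat, y in pairs if feat == x)
    total = sum(matching.values())
    return {y: n / total for y, n in matching.items()}

print(conditional(pairs, "barks"))  # → {'dog': 0.9, 'cat': 0.1}
```

Note the model says nothing about what barking animals look like overall; it only answers "given this input, which label?", which is the discriminative question.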
Generative model:
On the other hand, the generative model learns the joint probability distribution P(x, y). This means it understands how the input data and the labels co-occur together.
Going back to our example, the generative model doesn’t just learn to differentiate between dogs and cats. It learns what a dog typically “looks like” in terms of the features (x) and what a cat typically “looks like”. Given a label y (“dog” or “cat”), it can generate a plausible set of features x (create a new description of an animal that aligns with typical descriptions of dogs or cats). In other words, it can create new data.
So, if the generative model has learned well, and you ask it for a “dog”, it could generate a set of features that describe a dog even though this specific dog was not in the training data. It’s generating new data based on what it has learned about the structure of the input and output.
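The "ask it for a dog, get a new description" idea can be sketched with a naive-Bayes-style model: per-label feature probabilities (the numbers below are made up) from which we can sample a fresh feature set given only the label y:

```python
import random

# A minimal generative sketch: per-label feature probabilities with
# invented numbers. Given only a label y, it samples a new feature set x.

model = {
    "dog": {"long ears": 0.7, "wagging tail": 0.9, "whiskers": 0.2},
    "cat": {"long ears": 0.1, "wagging tail": 0.1, "whiskers": 0.95},
}

def generate(label, seed=0):
    """Sample a feature set that fits what the model learned for `label`."""
    rng = random.Random(seed)
    return {feat for feat, p in model[label].items() if rng.random() < p}

print(generate("dog"))  # e.g. a description including a wagging tail
```

Each call with a different seed produces a different but plausible "dog", none of which need match any training example, which is the generative property described above.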
Additional Info
It is not generative AI when the output “y” is a:
- number
- discrete class
- probability

It is generative AI when the output “y” is:
- natural language
- an image
- audio

More Details about Generative AI
What is Generative AI?
- GenAI is a type of Artificial Intelligence that creates new content based on what it has learned from existing content.
- The process of learning from existing content is called training and results in the creation of a statistical model.
- When given a prompt, GenAI uses this statistical model to predict what an expected response might be, and this generates new content.
Two Kinds of Generative Models
Generative Language Models:
Generative language models learn about patterns in language through training data.
Then, given some text, they predict what comes next.
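"Learn patterns from text, then predict what comes next" can be shown at toy scale with a bigram model: count which word follows which in a tiny invented corpus, then predict the most frequent follower. Real language models use neural networks over vast corpora, but the prediction task is the same shape:

```python
from collections import Counter, defaultdict

# A toy next-word predictor: bigram counts over an invented corpus.

corpus = "the cat sat on the mat the cat ate the fish".split()

follows = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1][w2] += 1

def predict_next(word):
    """Most frequent word seen after `word` during training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # → "cat" ("cat" follows "the" most often here)
```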
Generative Image Models:
Generative image models learn the distribution of images in their training data, using techniques like diffusion.
Then, given a prompt or related imagery, they transform random noise into new images.
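The forward half of diffusion, gradually destroying an image with noise, is simple to sketch; a real diffusion model then learns to run this process in reverse. Here an "image" is just a short list of pixel values, and the mixing schedule is invented:

```python
import random

# Forward diffusion sketch: repeatedly mix pixel values with Gaussian
# noise until the original signal is gone. A trained model learns the
# reverse: turning noise back into an image, step by step.

def add_noise(pixels, amount, rng):
    return [(1 - amount) * p + amount * rng.gauss(0, 1) for p in pixels]

rng = random.Random(0)
image = [0.2, 0.8, 0.5, 0.9]   # a tiny made-up "image"
noisy = image
for _ in range(10):            # each step destroys a bit more signal
    noisy = add_noise(noisy, 0.3, rng)
print([round(p, 2) for p in noisy])
```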
Transformers
Let’s think of Transformers as a smart robot translator in a United Nations conference. This translator doesn’t just translate sentences word by word. Instead, it understands the entire context of a sentence or a paragraph before making translations, which helps in maintaining the actual meaning of the content. This way, it can deal with the complexity and nuances of different languages. This is a shift from traditional translation methods, and it’s why the Transformer architecture revolutionized natural language processing.
Encoder and Decoder
In our translator robot, the “encoder” and “decoder” are two main components. The encoder is like the “listening ear” of the robot. It takes in the sentence in the original language, understands the full context, and creates a kind of “summary” that captures the meaning.
The decoder is like the “speaking mouth” of the robot. It takes the “summary” created by the encoder and constructs the translation in the target language, word by word, while keeping the overall context in mind.
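The two-stage shape of the robot can be sketched as two functions, with a word-for-word dictionary standing in for the decoder. This toy deliberately ignores the context-awareness described above (real Transformers encode context into vectors); it only shows the encode-then-decode split, and the vocabulary is invented:

```python
# Toy encoder/decoder split. The "summary" is just the token list here;
# in a real Transformer it would be context-aware vector representations.

vocab = {"le": "the", "chat": "cat", "dort": "sleeps"}  # invented dictionary

def encode(sentence):
    """'Listening ear': turn the source sentence into an internal summary."""
    return sentence.lower().split()

def decode(summary):
    """'Speaking mouth': build the target sentence from the summary."""
    return " ".join(vocab.get(tok, "<unk>") for tok in summary)

print(decode(encode("Le chat dort")))  # → "the cat sleeps"
```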
Hallucinations
Sometimes, our translator robot can make mistakes. For example, it might start inventing phrases or words that don’t make sense in the translation, similar to a person “hallucinating” things that aren’t real.
These hallucinations could happen due to various reasons:
- Not enough training: The robot has not heard enough sentences to fully learn the languages.
- Noisy data: The sentences the robot learned from were full of errors or slang, confusing the robot.
- Lack of context: The robot didn’t get enough information to understand the sentence.
- Lack of constraints: The robot was allowed too much freedom in its translations.
These hallucinations can lead to funny, nonsensical, or even incorrect translations, which is why researchers are constantly working on ways to reduce them.