Deep Learning (DL)
Deep Learning is a subset of Machine Learning that uses neural networks with multiple layers (deep architectures) to automatically learn representations from data. It is widely used for computer vision, natural language processing, speech recognition, and reinforcement learning.
Key Concepts in Deep Learning
- Neuron: Basic computation unit of a neural network.
- Activation Function: Determines output of a neuron (e.g., ReLU, Sigmoid, Tanh).
- Layers: Input, Hidden, Output layers forming a network.
- Forward Propagation: Input passes through layers to produce output.
- Loss Function: Measures error between predicted and actual output.
- Backpropagation: Algorithm to update weights using gradients to minimize loss.
- Optimizer: Method to adjust weights (e.g., SGD, Adam).
- Epochs & Batch Size: Number of complete passes through training data and subset of samples processed at a time.
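These concepts can be tied together in a few lines of plain NumPy. The layer sizes, random weights, and one-hot target below are made up purely for illustration; a real network would learn the weights via backpropagation rather than sample them randomly.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)            # activation function

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

# One forward pass through a tiny 4 -> 5 -> 3 network
x = rng.normal(size=4)                            # input layer
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)     # hidden-layer weights/biases
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)     # output-layer weights/biases

h = relu(W1 @ x + b1)                  # hidden layer: weighted sum + activation
y_pred = softmax(W2 @ h + b2)          # output layer: class probabilities

y_true = np.array([1.0, 0.0, 0.0])     # one-hot target
loss = -np.sum(y_true * np.log(y_pred))  # cross-entropy loss
print(y_pred, loss)
```

Backpropagation would now compute the gradient of this loss with respect to W1, b1, W2, b2, and the optimizer would nudge them to reduce it.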
Deep Learning Workflow
- Prepare dataset (images, text, audio, etc.).
- Preprocess data (normalize, encode, resize).
- Define neural network architecture (layers, activation functions).
- Choose loss function and optimizer.
- Train model using forward and backward propagation.
- Evaluate model on validation/test data.
- Tune hyperparameters or improve architecture if needed.
Example: Simple Deep Neural Network (MNIST)
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)
# Build model
model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
# Compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train model
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)
# Evaluate model
loss, acc = model.evaluate(x_test, y_test)
print("Test Accuracy:", acc)
Deep Learning Architectures
- Feedforward Neural Network (FNN/MLP): One-directional flow, used for tabular data.
- Convolutional Neural Network (CNN): Extracts spatial features from images.
- Recurrent Neural Network (RNN/LSTM): Captures sequential dependencies in text, speech, time-series.
- Autoencoders: Feature learning, dimensionality reduction, anomaly detection.
- GANs: Two-network system for generating realistic data.
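Of these architectures, the autoencoder is compact enough to sketch here. This is a minimal dense autoencoder in Keras; the 784/32 layer sizes are illustrative, and the random training data is only there to show the API (a real run would use actual images).

```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

# Encoder compresses the 784-dim input to a 32-dim bottleneck;
# the decoder tries to reconstruct the original input from it.
inputs = Input(shape=(784,))
encoded = Dense(32, activation='relu')(inputs)        # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)   # reconstruction

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# An autoencoder is trained to reproduce its own input
x = np.random.rand(64, 784).astype('float32')
autoencoder.fit(x, x, epochs=1, batch_size=16, verbose=0)
recon = autoencoder.predict(x[:1], verbose=0)
print(recon.shape)
```

The learned 32-dim bottleneck is what makes this useful for dimensionality reduction and anomaly detection (anomalies reconstruct poorly).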
Summary
- Deep Learning uses multi-layer neural networks to learn representations automatically.
- Feedforward networks for general tasks, CNNs for images, RNNs/LSTMs for sequential data.
- Training involves forward propagation, loss calculation, backpropagation, and optimization.
- Advanced models: Autoencoders, GANs, Transformers for NLP.
Neural Networks (NN)
Neural Networks are computing systems inspired by the human brain. They consist of neurons (nodes) organized in layers to process input data, learn patterns, and make predictions.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Simple NN for binary classification
model = Sequential([
    Dense(8, input_dim=4, activation='relu'),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Convolutional Neural Networks (CNN)
CNNs are specialized neural networks for processing grid-like data such as images. They use convolutional layers to automatically detect spatial hierarchies of features.
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Sequential
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    MaxPooling2D((2,2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
Recurrent Neural Networks (RNN)
RNNs are used for sequential data such as text, time series, or speech. They have loops to maintain memory of previous inputs.
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras.models import Sequential
model = Sequential([
    SimpleRNN(50, input_shape=(10,1), activation='tanh'),
    Dense(1, activation='sigmoid')
])
Generative Adversarial Networks (GANs)
GANs consist of two neural networks: a Generator that creates fake data and a Discriminator that tries to distinguish fake from real data. They compete in a game to improve data generation.
# Pseudocode Example
# Generator creates images, Discriminator evaluates them
# Training updates both networks iteratively
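One discriminator update from that loop can be sketched in Keras. The layer sizes, latent dimension, and random "real" data below are stand-ins for illustration, not a complete GAN training script.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

latent_dim = 16

# Generator: noise vector -> fake sample (here a flat 784-dim "image")
generator = Sequential([
    Dense(64, activation='relu', input_shape=(latent_dim,)),
    Dense(784, activation='sigmoid')
])

# Discriminator: sample -> probability that it is real
discriminator = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(1, activation='sigmoid')
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# One discriminator training step on real vs. generated data
real = np.random.rand(32, 784).astype('float32')   # stand-in for real images
noise = np.random.normal(size=(32, latent_dim)).astype('float32')
fake = generator.predict(noise, verbose=0)

x = np.concatenate([real, fake])
y = np.concatenate([np.ones((32, 1)), np.zeros((32, 1))])  # 1 = real, 0 = fake
discriminator.train_on_batch(x, y)

# The generator is then trained (through a frozen discriminator) to make
# the discriminator output 1 for its fakes; alternating the two steps is
# the adversarial "game".
```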
Reinforcement Learning (RL)
RL teaches agents to make decisions in an environment to maximize cumulative rewards. The agent learns from trial and error.
import gym

env = gym.make("CartPole-v1")
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random policy: sample an action
    state, reward, done, info = env.step(action)
env.close()
# Note: in gym>=0.26 (and gymnasium), reset() returns (obs, info) and
# step() returns (obs, reward, terminated, truncated, info).
Natural Language Processing (NLP)
NLP is a branch of Artificial Intelligence that enables computers to understand, interpret, and generate human language. It combines linguistics, computer science, and machine learning.
NLP is widely used in:
- Text classification (spam detection, sentiment analysis)
- Machine translation (Google Translate)
- Chatbots and virtual assistants (Alexa, Siri, ChatGPT)
- Speech-to-text and text-to-speech applications
- Named Entity Recognition (NER) and question answering
NLP Workflow
- Collect text or speech data.
- Text preprocessing (cleaning, tokenization, stemming, lemmatization).
- Feature extraction (Bag of Words, TF-IDF, Word Embeddings).
- Choose ML/DL algorithm (Naive Bayes, SVM, RNN, Transformer).
- Train the model and evaluate performance.
- Deploy model for prediction or analysis.
Text Preprocessing
Before using text in ML models, we clean and convert it into a numerical format.
Example: Tokenization and Lowercasing
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')
text = "Natural Language Processing is amazing!"
tokens = word_tokenize(text.lower())
print(tokens)
# Output: ['natural', 'language', 'processing', 'is', 'amazing', '!']
Example: Stopword Removal
from nltk.corpus import stopwords
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
filtered_tokens = [w for w in tokens if w not in stop_words]
print(filtered_tokens)
# Output: ['natural', 'language', 'processing', 'amazing', '!']
Example: Lemmatization
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
lemmatized = [lemmatizer.lemmatize(w) for w in filtered_tokens]
print(lemmatized)
# Output: ['natural', 'language', 'processing', 'amazing', '!']
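The workflow above also mentions stemming, a cruder rule-based alternative to lemmatization. A quick comparison using NLTK's PorterStemmer (which needs no downloads):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["running", "studies", "easily", "processing"]
print([stemmer.stem(w) for w in words])
# Stemming chops suffixes by rule, so results need not be dictionary
# words (e.g. "studies" -> "studi"), unlike lemmatization.
```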
Feature Extraction
Convert text into numeric vectors to use in ML algorithms.
1. Bag of Words (BoW)
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["I love AI", "AI is the future"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())
print(X.toarray())
2. TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # same corpus as the BoW example
print(vectorizer.get_feature_names_out())
print(X.toarray())
3. Word Embeddings (Word2Vec / GloVe)
from gensim.models import Word2Vec

sentences = [["natural", "language", "processing"], ["deep", "learning", "is", "fun"]]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, workers=1)
print(model.wv['natural'])  # vector representation of 'natural'
Popular NLP Algorithms
1. Naive Bayes
Used for text classification (spam detection, sentiment analysis).
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["I love AI", "AI is boring", "I enjoy NLP", "I hate bugs"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
model = MultinomialNB()
model.fit(X, labels)
print(model.predict(vectorizer.transform(["I love NLP"])))
2. Support Vector Machine (SVM)
Works well for text classification with TF-IDF features.
from sklearn.svm import SVC
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["I love AI", "AI is boring", "I enjoy NLP", "I hate bugs"]
labels = [1, 0, 1, 0]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
model = SVC(kernel='linear')
model.fit(X, labels)
print(model.predict(vectorizer.transform(["AI is amazing"])))
3. RNN / LSTM
Used for sequential NLP tasks like sentiment analysis, text generation.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
model = Sequential([
    Embedding(input_dim=5000, output_dim=64, input_length=10),
    LSTM(64),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
4. Transformers (BERT, GPT)
State-of-the-art models for NLP tasks: text classification, Q&A, summarization.
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
result = classifier("I love learning NLP")
print(result)
# Output: [{'label': 'POSITIVE', 'score': 0.999}]
Computer Vision (CV)
Computer Vision is a field of Artificial Intelligence that enables computers to interpret and understand visual data such as images and videos. The goal is to automatically extract meaningful information from this data and act on it.
Applications of Computer Vision
- Image classification (e.g., recognize cats, dogs, objects)
- Object detection (e.g., detect pedestrians, cars)
- Image segmentation (e.g., medical image analysis)
- Face recognition (security systems, authentication)
- Autonomous vehicles (self-driving cars)
- Augmented reality & robotics
- Optical Character Recognition (OCR) - convert images of text to digital text
Computer Vision Workflow
- Collect image or video data.
- Preprocess images (resize, normalize, grayscale).
- Feature extraction (edges, shapes, textures, or embeddings).
- Choose model (traditional CV algorithms or deep learning models like CNN).
- Train, evaluate, and optimize the model.
- Deploy for real-world tasks (detection, classification, segmentation).
Image Preprocessing Examples
Preprocessing is crucial to improve model performance and reduce noise.
1. Reading and displaying images (OpenCV)
import cv2
# Load image
img = cv2.imread('image.jpg')
# Resize image
img_resized = cv2.resize(img, (224,224))
# Convert to grayscale
gray = cv2.cvtColor(img_resized, cv2.COLOR_BGR2GRAY)
cv2.imshow("Grayscale Image", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
2. Image normalization for deep learning
import numpy as np

img_array = np.array(img_resized, dtype='float32') / 255.0
print(img_array.shape, img_array.min(), img_array.max())
3. Edge Detection (Canny)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imshow("Edges", edges)
cv2.waitKey(0)
cv2.destroyAllWindows()
Popular Computer Vision Algorithms
1. Image Classification (CNN)
Classify images into predefined categories.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(64,64,3)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')  # 10 classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
2. Object Detection (YOLO / SSD)
Locate and classify multiple objects in an image.
# Using pre-trained YOLOv5 with PyTorch
import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
img_path = 'image.jpg'
results = model(img_path)
results.show() # Display detection
3. Image Segmentation (U-Net)
Assign each pixel in an image to a class (useful for medical images, autonomous driving).
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate
from tensorflow.keras.models import Model

inputs = Input((128,128,1))
c1 = Conv2D(16, (3,3), activation='relu', padding='same')(inputs)
p1 = MaxPooling2D((2,2))(c1)
c2 = Conv2D(32, (3,3), activation='relu', padding='same')(p1)
u1 = UpSampling2D((2,2))(c2)
m1 = concatenate([u1, c1])  # skip connection, the defining feature of U-Net
outputs = Conv2D(1, (1,1), activation='sigmoid')(m1)
model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
4. Face Recognition (OpenCV + Dlib)
import dlib
from imutils import face_utils
import cv2
detector = dlib.get_frontal_face_detector()
img = cv2.imread('face.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = detector(gray)
for rect in faces:
    (x, y, w, h) = face_utils.rect_to_bb(rect)
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
cv2.imshow("Faces", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
5. Optical Character Recognition (OCR)
import pytesseract
from PIL import Image
img = Image.open('text_image.jpg')
text = pytesseract.image_to_string(img)
print(text)
Speech Recognition (SR)
Speech Recognition is a field of Artificial Intelligence and Computer Science that allows machines to understand, process, and interpret human speech. The goal is to convert spoken language into text or take actions based on voice commands.
Applications of Speech Recognition
- Voice assistants (Siri, Alexa, Google Assistant)
- Transcription services (convert speech to text)
- Voice-controlled devices and home automation
- Call center automation & voice commands
- Language learning apps
- Medical dictation software
Speech Recognition Workflow
- Collect audio data (speech recordings).
- Preprocess audio (denoise, normalize, convert to spectrograms).
- Feature extraction (MFCC, spectrograms, chroma features).
- Choose model (traditional HMM/GMM or deep learning models like RNN/LSTM/Transformer).
- Train, evaluate, and optimize the model.
- Convert speech to text or perform actions based on recognition.
Audio Preprocessing Examples
Preprocessing is crucial to clean audio and extract meaningful features.
1. Loading and playing audio (Librosa)
import librosa
import librosa.display
import matplotlib.pyplot as plt
audio_path = 'speech.wav'
y, sr = librosa.load(audio_path, sr=None) # y = waveform, sr = sampling rate
plt.figure(figsize=(10,4))
librosa.display.waveshow(y, sr=sr)
plt.title("Waveform")
plt.show()
2. Extracting MFCC features
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfccs.shape)
plt.figure(figsize=(10,4))
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.colorbar()
plt.title("MFCC")
plt.show()
3. Convert audio to spectrogram
spect = librosa.stft(y)
spect_db = librosa.amplitude_to_db(abs(spect))
plt.figure(figsize=(10,4))
librosa.display.specshow(spect_db, sr=sr, x_axis='time', y_axis='hz')
plt.colorbar()
plt.title("Spectrogram")
plt.show()
Popular Speech Recognition Algorithms
1. Hidden Markov Models (HMM)
HMM models speech as a sequence of states and probabilities of transitions.
- Traditional method for continuous speech recognition.
- Works well with phonemes and small vocabularies.
- Often combined with Gaussian Mixture Models (GMM-HMM).
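The core of an HMM recognizer is the forward algorithm, which computes the likelihood of an observation sequence. It can be sketched in plain NumPy; the two hidden states, three observation symbols, and all probabilities below are toy values, not a real acoustic model.

```python
import numpy as np

# Toy HMM: 2 hidden states, 3 possible observation symbols
pi = np.array([0.6, 0.4])            # initial state probabilities
A = np.array([[0.7, 0.3],            # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # P(observation | state)
              [0.1, 0.3, 0.6]])

def forward(obs):
    """Likelihood of an observation sequence under the HMM."""
    alpha = pi * B[:, obs[0]]            # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate states, re-weight by emission
    return alpha.sum()

print(forward([0, 1, 2]))  # P(observing symbol sequence 0, 1, 2)
```

In a GMM-HMM recognizer, the rows of B would be replaced by Gaussian mixture densities over acoustic features such as MFCCs.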
2. Deep Neural Networks (DNN / CNN / RNN)
Use deep learning to extract features and model temporal dependencies in speech.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
# Placeholder shapes for illustration: 100 time steps, 13 MFCC features, 10 classes
timesteps, features, num_classes = 100, 13, 10
model = Sequential([
    LSTM(128, input_shape=(timesteps, features), return_sequences=True),
    LSTM(64),
    Dense(num_classes, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
3. Connectionist Temporal Classification (CTC)
CTC loss is used for training models where alignment between input (audio) and output (text) is unknown.
# Example: LSTM + CTC for speech-to-text
# Outputs probability distribution over characters at each time step
# Loss function: tf.keras.backend.ctc_batch_cost
4. Transformer Models
Use self-attention to model long-range dependencies in audio sequences.
- Examples: Wav2Vec2, Whisper (OpenAI)
- State-of-the-art performance in large vocabulary speech recognition
5. End-to-End Deep Speech Models
- Input: raw audio or spectrograms
- Output: text transcription directly
- DeepSpeech (Mozilla), Jasper, QuartzNet are popular implementations
Python Example: Speech-to-Text (Google Speech Recognition)
import speech_recognition as sr
r = sr.Recognizer()
with sr.AudioFile('speech.wav') as source:
    audio = r.record(source)

# Recognize speech using Google Web Speech API
try:
    text = r.recognize_google(audio)
    print("Transcription:", text)
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results; {0}".format(e))
Image Recognition (IR)
Image Recognition is a field of Computer Vision that allows machines to identify objects, people, scenes, or patterns in images. It converts visual data into actionable information, enabling computers to "see" and interpret the visual world.
Applications of Image Recognition
- Face recognition (security & authentication)
- Object detection in autonomous vehicles
- Medical imaging (tumor detection, X-ray analysis)
- Product search in e-commerce (image-based search)
- Social media tagging (automatic photo tagging)
- Industrial inspection (defect detection)
Image Recognition Workflow
- Collect image dataset (e.g., CIFAR-10, MNIST, ImageNet).
- Preprocess images (resize, normalize, augment).
- Extract features (manually or via CNN).
- Choose model (traditional ML or deep learning CNNs).
- Train the model on labeled images.
- Evaluate performance using metrics (accuracy, precision, recall).
- Deploy model for prediction on new images.
Image Preprocessing Examples
Preprocessing images improves model performance and generalization.
1. Loading and displaying images (Matplotlib & OpenCV)
import cv2
import matplotlib.pyplot as plt
image = cv2.imread('cat.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image_rgb)
plt.title("Original Image")
plt.show()
2. Resize and normalize images
image_resized = cv2.resize(image_rgb, (224, 224)) # resize to 224x224
image_normalized = image_resized / 255.0 # normalize pixels to [0,1]
plt.imshow(image_normalized)
plt.title("Resized & Normalized Image")
plt.show()
3. Data augmentation (Keras)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)
# Apply augmentation on a single image
image_batch = image_normalized.reshape((1, 224, 224, 3))
aug_iter = datagen.flow(image_batch)
aug_image = next(aug_iter)[0]
plt.imshow(aug_image)
plt.title("Augmented Image")
plt.show()
Popular Image Recognition Algorithms
1. Convolutional Neural Networks (CNNs)
CNNs automatically extract spatial features from images using convolution layers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(224,224,3)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')  # for 10-class classification
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
2. Transfer Learning
Use pre-trained models (like VGG16, ResNet, Inception) to recognize images efficiently.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224,224,3))
base_model.trainable = False  # freeze pre-trained weights for feature extraction
x = Flatten()(base_model.output)
x = Dense(128, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
3. Object Detection
Identifies and localizes objects in images with bounding boxes.
- YOLO (You Only Look Once) – real-time detection
- Faster R-CNN – region-based CNN detection
- SSD (Single Shot Detector) – efficient single-shot detection
4. Semantic Segmentation
Classifies each pixel in an image into categories.
- U-Net – popular for medical imaging
- DeepLab – captures multi-scale context
5. Image Classification Pipelines
# Example: predict digit using MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train/255.0, X_test/255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5)