Deep Learning (DL)
Deep Learning is a subset of Machine Learning that uses neural networks with multiple layers (deep architectures) to automatically learn representations from data. It is widely used for computer vision, natural language processing, speech recognition, and reinforcement learning.
Key Concepts in Deep Learning
- Neuron: Basic computation unit of a neural network.
- Activation Function: Determines output of a neuron (e.g., ReLU, Sigmoid, Tanh).
- Layers: Input, Hidden, Output layers forming a network.
- Forward Propagation: Input passes through layers to produce output.
- Loss Function: Measures error between predicted and actual output.
- Backpropagation: Algorithm to update weights using gradients to minimize loss.
- Optimizer: Method to adjust weights (e.g., SGD, Adam).
- Epochs & Batch Size: Number of complete passes through training data and subset of samples processed at a time.
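These concepts can be tied together in a few lines of plain NumPy. The layer sizes, random weights, and one-hot target below are made up purely for illustration; a real network would learn the weights via backpropagation rather than sample them randomly.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)            # activation function

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

# One forward pass through a tiny 4 -> 5 -> 3 network
x = rng.normal(size=4)                            # input layer
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)     # hidden-layer weights/biases
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)     # output-layer weights/biases

h = relu(W1 @ x + b1)                  # hidden layer: weighted sum + activation
y_pred = softmax(W2 @ h + b2)          # output layer: class probabilities

y_true = np.array([1.0, 0.0, 0.0])     # one-hot target
loss = -np.sum(y_true * np.log(y_pred))  # cross-entropy loss
print(y_pred, loss)
```

Backpropagation would now compute the gradient of this loss with respect to W1, b1, W2, b2, and the optimizer would nudge them to reduce it.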
Deep Learning Workflow
- Prepare dataset (images, text, audio, etc.).
- Preprocess data (normalize, encode, resize).
- Define neural network architecture (layers, activation functions).
- Choose loss function and optimizer.
- Train model using forward and backward propagation.
- Evaluate model on validation/test data.
- Tune hyperparameters or improve architecture if needed.
Example: Simple Deep Neural Network (MNIST)
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)
# Build model
model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
# Compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train model
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)
# Evaluate model
loss, acc = model.evaluate(x_test, y_test)
print("Test Accuracy:", acc)
Deep Learning Architectures
- Feedforward Neural Network (FNN/MLP): One-directional flow, used for tabular data.
- Convolutional Neural Network (CNN): Extracts spatial features from images.
- Recurrent Neural Network (RNN/LSTM): Captures sequential dependencies in text, speech, time-series.
- Autoencoders: Feature learning, dimensionality reduction, anomaly detection.
- GANs: Two-network system for generating realistic data.
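Of these architectures, the autoencoder is compact enough to sketch here. This is a minimal dense autoencoder in Keras; the 784/32 layer sizes are illustrative, and the random training data is only there to show the API (a real run would use actual images).

```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

# Encoder compresses the 784-dim input to a 32-dim bottleneck;
# the decoder tries to reconstruct the original input from it.
inputs = Input(shape=(784,))
encoded = Dense(32, activation='relu')(inputs)        # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)   # reconstruction

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# An autoencoder is trained to reproduce its own input
x = np.random.rand(64, 784).astype('float32')
autoencoder.fit(x, x, epochs=1, batch_size=16, verbose=0)
recon = autoencoder.predict(x[:1], verbose=0)
print(recon.shape)
```

The learned 32-dim bottleneck is what makes this useful for dimensionality reduction and anomaly detection (anomalies reconstruct poorly).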
Summary
- Deep Learning uses multi-layer neural networks to learn representations automatically.
- Feedforward networks for general tasks, CNNs for images, RNNs/LSTMs for sequential data.
- Training involves forward propagation, loss calculation, backpropagation, and optimization.
- Advanced models: Autoencoders, GANs, Transformers for NLP.
Neural Networks (NN)
Neural Networks are computing systems inspired by the human brain. They consist of neurons (nodes) organized in layers to process input data, learn patterns, and make predictions.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Simple NN for binary classification
model = Sequential([
    Dense(8, input_dim=4, activation='relu'),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Convolutional Neural Networks (CNN)
CNNs are specialized neural networks for processing grid-like data such as images. They use convolutional layers to automatically detect spatial hierarchies of features.
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Sequential
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    MaxPooling2D((2,2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
Recurrent Neural Networks (RNN)
RNNs are used for sequential data such as text, time series, or speech. They have loops to maintain memory of previous inputs.
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras.models import Sequential
model = Sequential([
    SimpleRNN(50, input_shape=(10,1), activation='tanh'),
    Dense(1, activation='sigmoid')
])
Generative Adversarial Networks (GANs)
GANs consist of two neural networks: a Generator that creates fake data and a Discriminator that tries to distinguish fake from real data. They compete in a game to improve data generation.
# Pseudocode Example
# Generator creates images, Discriminator evaluates them
# Training updates both networks iteratively
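One discriminator update from that loop can be sketched in Keras. The layer sizes, latent dimension, and random "real" data below are stand-ins for illustration, not a complete GAN training script.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

latent_dim = 16

# Generator: noise vector -> fake sample (here a flat 784-dim "image")
generator = Sequential([
    Dense(64, activation='relu', input_shape=(latent_dim,)),
    Dense(784, activation='sigmoid')
])

# Discriminator: sample -> probability that it is real
discriminator = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(1, activation='sigmoid')
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# One discriminator training step on real vs. generated data
real = np.random.rand(32, 784).astype('float32')   # stand-in for real images
noise = np.random.normal(size=(32, latent_dim)).astype('float32')
fake = generator.predict(noise, verbose=0)

x = np.concatenate([real, fake])
y = np.concatenate([np.ones((32, 1)), np.zeros((32, 1))])  # 1 = real, 0 = fake
discriminator.train_on_batch(x, y)

# The generator is then trained (through a frozen discriminator) to make
# the discriminator output 1 for its fakes; alternating the two steps is
# the adversarial "game".
```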
Reinforcement Learning (RL)
RL teaches agents to make decisions in an environment to maximize cumulative rewards. The agent learns from trial and error.
import gym

env = gym.make("CartPole-v1")
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random policy: sample an action
    state, reward, done, info = env.step(action)
env.close()
# Note: in gym>=0.26 (and gymnasium), reset() returns (obs, info) and
# step() returns (obs, reward, terminated, truncated, info).
Natural Language Processing (NLP)
NLP is a branch of Artificial Intelligence that enables computers to understand, interpret, and generate human language. It combines linguistics, computer science, and machine learning.
NLP is widely used in:
- Text classification (spam detection, sentiment analysis)
- Machine translation (Google Translate)
- Chatbots and virtual assistants (Alexa, Siri, ChatGPT)
- Speech-to-text and text-to-speech applications
- Named Entity Recognition (NER) and question answering
NLP Workflow
- Collect text or speech data.
- Text preprocessing (cleaning, tokenization, stemming, lemmatization).
- Feature extraction (Bag of Words, TF-IDF, Word Embeddings).
- Choose ML/DL algorithm (Naive Bayes, SVM, RNN, Transformer).
- Train the model and evaluate performance.
- Deploy model for prediction or analysis.
Text Preprocessing
Before using text in ML models, we clean and convert it into a numerical format.
Example: Tokenization and Lowercasing
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')
text = "Natural Language Processing is amazing!"
tokens = word_tokenize(text.lower())
print(tokens)
# Output: ['natural', 'language', 'processing', 'is', 'amazing', '!']
Example: Stopword Removal
from nltk.corpus import stopwords
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
filtered_tokens = [w for w in tokens if w not in stop_words]
print(filtered_tokens)
# Output: ['natural', 'language', 'processing', 'amazing', '!']
Example: Lemmatization
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
lemmatized = [lemmatizer.lemmatize(w) for w in filtered_tokens]
print(lemmatized)
# Output: ['natural', 'language', 'processing', 'amazing', '!']
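The workflow above also mentions stemming, a cruder rule-based alternative to lemmatization. A quick comparison using NLTK's PorterStemmer (which needs no downloads):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["running", "studies", "easily", "processing"]
print([stemmer.stem(w) for w in words])
# Stemming chops suffixes by rule, so results need not be dictionary
# words (e.g. "studies" -> "studi"), unlike lemmatization.
```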
Feature Extraction
Convert text into numeric vectors to use in ML algorithms.
1. Bag of Words (BoW)
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["I love AI", "AI is the future"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())
print(X.toarray())
2. TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # same corpus as the BoW example
print(vectorizer.get_feature_names_out())
print(X.toarray())
3. Word Embeddings (Word2Vec / GloVe)
from gensim.models import Word2Vec

sentences = [["natural", "language", "processing"], ["deep", "learning", "is", "fun"]]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, workers=1)
print(model.wv['natural'])  # vector representation of 'natural'
Popular NLP Algorithms
1. Naive Bayes
Used for text classification (spam detection, sentiment analysis).
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["I love AI", "AI is boring", "I enjoy NLP", "I hate bugs"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
model = MultinomialNB()
model.fit(X, labels)
print(model.predict(vectorizer.transform(["I love NLP"])))
2. Support Vector Machine (SVM)
Works well for text classification with TF-IDF features.
from sklearn.svm import SVC
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["I love AI", "AI is boring", "I enjoy NLP", "I hate bugs"]
labels = [1, 0, 1, 0]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
model = SVC(kernel='linear')
model.fit(X, labels)
print(model.predict(vectorizer.transform(["AI is amazing"])))
3. RNN / LSTM
Used for sequential NLP tasks like sentiment analysis, text generation.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
model = Sequential([
    Embedding(input_dim=5000, output_dim=64, input_length=10),
    LSTM(64),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
4. Transformers (BERT, GPT)
State-of-the-art models for NLP tasks: text classification, Q&A, summarization.
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
result = classifier("I love learning NLP")
print(result)
# Output: [{'label': 'POSITIVE', 'score': 0.999}]
Computer Vision (CV)
Computer Vision is a field of Artificial Intelligence that enables computers to interpret and understand visual data such as images and videos. The goal is to automatically extract meaningful information from this data and act on it.
Applications of Computer Vision
- Image classification (e.g., recognize cats, dogs, objects)
- Object detection (e.g., detect pedestrians, cars)
- Image segmentation (e.g., medical image analysis)
- Face recognition (security systems, authentication)
- Autonomous vehicles (self-driving cars)
- Augmented reality & robotics
- Optical Character Recognition (OCR) - convert images of text to digital text
Computer Vision Workflow
- Collect image or video data.
- Preprocess images (resize, normalize, grayscale).
- Feature extraction (edges, shapes, textures, or embeddings).
- Choose model (traditional CV algorithms or deep learning models like CNN).
- Train, evaluate, and optimize the model.
- Deploy for real-world tasks (detection, classification, segmentation).
Image Preprocessing Examples
Preprocessing is crucial to improve model performance and reduce noise.
1. Reading and displaying images (OpenCV)
import cv2
# Load image
img = cv2.imread('image.jpg')
# Resize image
img_resized = cv2.resize(img, (224,224))
# Convert to grayscale
gray = cv2.cvtColor(img_resized, cv2.COLOR_BGR2GRAY)
cv2.imshow("Grayscale Image", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
2. Image normalization for deep learning
import numpy as np

img_array = np.array(img_resized, dtype='float32') / 255.0
print(img_array.shape, img_array.min(), img_array.max())
3. Edge Detection (Canny)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imshow("Edges", edges)
cv2.waitKey(0)
cv2.destroyAllWindows()
Popular Computer Vision Algorithms
1. Image Classification (CNN)
Classify images into predefined categories.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(64,64,3)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')  # 10 classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
2. Object Detection (YOLO / SSD)
Locate and classify multiple objects in an image.
# Using pre-trained YOLOv5 with PyTorch
import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
img_path = 'image.jpg'
results = model(img_path)
results.show() # Display detection
3. Image Segmentation (U-Net)
Assign each pixel in an image to a class (useful for medical images, autonomous driving).
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate
from tensorflow.keras.models import Model

inputs = Input((128,128,1))
c1 = Conv2D(16, (3,3), activation='relu', padding='same')(inputs)
p1 = MaxPooling2D((2,2))(c1)
c2 = Conv2D(32, (3,3), activation='relu', padding='same')(p1)
u1 = UpSampling2D((2,2))(c2)
m1 = concatenate([u1, c1])  # skip connection, the defining feature of U-Net
outputs = Conv2D(1, (1,1), activation='sigmoid')(m1)
model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
4. Face Recognition (OpenCV + Dlib)
import dlib
from imutils import face_utils
import cv2
detector = dlib.get_frontal_face_detector()
img = cv2.imread('face.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = detector(gray)
for rect in faces:
    (x, y, w, h) = face_utils.rect_to_bb(rect)
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
cv2.imshow("Faces", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
5. Optical Character Recognition (OCR)
import pytesseract
from PIL import Image
img = Image.open('text_image.jpg')
text = pytesseract.image_to_string(img)
print(text)
Speech Recognition (SR)
Speech Recognition is a field of Artificial Intelligence and Computer Science that allows machines to understand, process, and interpret human speech. The goal is to convert spoken language into text or take actions based on voice commands.
Applications of Speech Recognition
- Voice assistants (Siri, Alexa, Google Assistant)
- Transcription services (convert speech to text)
- Voice-controlled devices and home automation
- Call center automation & voice commands
- Language learning apps
- Medical dictation software
Speech Recognition Workflow
- Collect audio data (speech recordings).
- Preprocess audio (denoise, normalize, convert to spectrograms).
- Feature extraction (MFCC, spectrograms, chroma features).
- Choose model (traditional HMM/GMM or deep learning models like RNN/LSTM/Transformer).
- Train, evaluate, and optimize the model.
- Convert speech to text or perform actions based on recognition.
Audio Preprocessing Examples
Preprocessing is crucial to clean audio and extract meaningful features.
1. Loading and playing audio (Librosa)
import librosa
import librosa.display
import matplotlib.pyplot as plt
audio_path = 'speech.wav'
y, sr = librosa.load(audio_path, sr=None) # y = waveform, sr = sampling rate
plt.figure(figsize=(10,4))
librosa.display.waveshow(y, sr=sr)
plt.title("Waveform")
plt.show()
2. Extracting MFCC features
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfccs.shape)
plt.figure(figsize=(10,4))
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.colorbar()
plt.title("MFCC")
plt.show()
3. Convert audio to spectrogram
spect = librosa.stft(y)
spect_db = librosa.amplitude_to_db(abs(spect))
plt.figure(figsize=(10,4))
librosa.display.specshow(spect_db, sr=sr, x_axis='time', y_axis='hz')
plt.colorbar()
plt.title("Spectrogram")
plt.show()
Popular Speech Recognition Algorithms
1. Hidden Markov Models (HMM)
HMM models speech as a sequence of states and probabilities of transitions.
- Traditional method for continuous speech recognition.
- Works well with phonemes and small vocabularies.
- Often combined with Gaussian Mixture Models (GMM-HMM).
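The core of an HMM recognizer is the forward algorithm, which computes the likelihood of an observation sequence. It can be sketched in plain NumPy; the two hidden states, three observation symbols, and all probabilities below are toy values, not a real acoustic model.

```python
import numpy as np

# Toy HMM: 2 hidden states, 3 possible observation symbols
pi = np.array([0.6, 0.4])            # initial state probabilities
A = np.array([[0.7, 0.3],            # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # P(observation | state)
              [0.1, 0.3, 0.6]])

def forward(obs):
    """Likelihood of an observation sequence under the HMM."""
    alpha = pi * B[:, obs[0]]            # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate states, re-weight by emission
    return alpha.sum()

print(forward([0, 1, 2]))  # P(observing symbol sequence 0, 1, 2)
```

In a GMM-HMM recognizer, the rows of B would be replaced by Gaussian mixture densities over acoustic features such as MFCCs.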
2. Deep Neural Networks (DNN / CNN / RNN)
Use deep learning to extract features and model temporal dependencies in speech.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
# Placeholder shapes for illustration: 100 time steps, 13 MFCC features, 10 classes
timesteps, features, num_classes = 100, 13, 10
model = Sequential([
    LSTM(128, input_shape=(timesteps, features), return_sequences=True),
    LSTM(64),
    Dense(num_classes, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
3. Connectionist Temporal Classification (CTC)
CTC loss is used for training models where alignment between input (audio) and output (text) is unknown.
# Example: LSTM + CTC for speech-to-text
# Outputs probability distribution over characters at each time step
# Loss function: tf.keras.backend.ctc_batch_cost
4. Transformer Models
Use self-attention to model long-range dependencies in audio sequences.
- Examples: Wav2Vec2, Whisper (OpenAI)
- State-of-the-art performance in large vocabulary speech recognition
5. End-to-End Deep Speech Models
- Input: raw audio or spectrograms
- Output: text transcription directly
- DeepSpeech (Mozilla), Jasper, QuartzNet are popular implementations
Python Example: Speech-to-Text (Google Speech Recognition)
import speech_recognition as sr
r = sr.Recognizer()
with sr.AudioFile('speech.wav') as source:
    audio = r.record(source)

# Recognize speech using Google Web Speech API
try:
    text = r.recognize_google(audio)
    print("Transcription:", text)
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results; {0}".format(e))
Image Recognition (IR)
Image Recognition is a field of Computer Vision that allows machines to identify objects, people, scenes, or patterns in images. It converts visual data into actionable information, enabling computers to "see" and interpret the visual world.
Applications of Image Recognition
- Face recognition (security & authentication)
- Object detection in autonomous vehicles
- Medical imaging (tumor detection, X-ray analysis)
- Product search in e-commerce (image-based search)
- Social media tagging (automatic photo tagging)
- Industrial inspection (defect detection)
Image Recognition Workflow
- Collect image dataset (e.g., CIFAR-10, MNIST, ImageNet).
- Preprocess images (resize, normalize, augment).
- Extract features (manually or via CNN).
- Choose model (traditional ML or deep learning CNNs).
- Train the model on labeled images.
- Evaluate performance using metrics (accuracy, precision, recall).
- Deploy model for prediction on new images.
Image Preprocessing Examples
Preprocessing images improves model performance and generalization.
1. Loading and displaying images (Matplotlib & OpenCV)
import cv2
import matplotlib.pyplot as plt
image = cv2.imread('cat.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image_rgb)
plt.title("Original Image")
plt.show()
2. Resize and normalize images
image_resized = cv2.resize(image_rgb, (224, 224)) # resize to 224x224
image_normalized = image_resized / 255.0 # normalize pixels to [0,1]
plt.imshow(image_normalized)
plt.title("Resized & Normalized Image")
plt.show()
3. Data augmentation (Keras)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)
# Apply augmentation on a single image
image_batch = image_normalized.reshape((1, 224, 224, 3))
aug_iter = datagen.flow(image_batch)
aug_image = next(aug_iter)[0]
plt.imshow(aug_image)
plt.title("Augmented Image")
plt.show()
Popular Image Recognition Algorithms
1. Convolutional Neural Networks (CNNs)
CNNs automatically extract spatial features from images using convolution layers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(224,224,3)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')  # for 10-class classification
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
2. Transfer Learning
Use pre-trained models (like VGG16, ResNet, Inception) to recognize images efficiently.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224,224,3))
base_model.trainable = False  # freeze pre-trained weights for feature extraction
x = Flatten()(base_model.output)
x = Dense(128, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
3. Object Detection
Identifies and localizes objects in images with bounding boxes.
- YOLO (You Only Look Once) – real-time detection
- Faster R-CNN – region-based CNN detection
- SSD (Single Shot Detector) – efficient single-shot detection
4. Semantic Segmentation
Classifies each pixel in an image into categories.
- U-Net – popular for medical imaging
- DeepLab – captures multi-scale context
5. Image Classification Pipelines
# Example: predict digit using MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train/255.0, X_test/255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential([
    Flatten(input_shape=(28,28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5)