Glossary

What is Kullback–Leibler Divergence (KL Divergence)?

Kullback–Leibler Divergence (KL Divergence) is a fundamental concept in information theory and statistics that quantifies how one probability distribution differs from a second, reference distribution. It is widely used in fields such as machine learning, statistics, and information retrieval. The smaller the KL Divergence, the more similar the two distributions are; conversely, a larger value indicates greater divergence.


The formula for KL Divergence between two discrete probability distributions P and Q is defined as:
D_{KL}(P || Q) = ∑_i P(i) log(P(i)/Q(i)). KL Divergence is always non-negative (Gibbs' inequality) and equals zero if and only if P and Q are identical. A notable property of KL Divergence is its asymmetry: in general, D_{KL}(P || Q) is not equal to D_{KL}(Q || P), so it is not a true distance metric.
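

As a minimal sketch of the discrete formula above, the following NumPy snippet computes D_{KL}(P || Q) directly; the arrays p and q are illustrative examples, and terms where P(i) = 0 are treated as contributing zero by the usual convention:

```python
import numpy as np

def kl_divergence(p, q):
    """Compute D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)) for discrete distributions.

    Assumes p and q are valid probability vectors of equal length and that
    q is strictly positive wherever p is positive.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # 0 * log(0) is taken to be 0, so zero-probability terms are skipped
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.3, 0.5, 0.2])

print(kl_divergence(p, q))  # small positive value
print(kl_divergence(q, p))  # a different value: KL Divergence is asymmetric
print(kl_divergence(p, p))  # 0.0: identical distributions
```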


In practice, KL Divergence is commonly used for model evaluation, training generative models, and data compression. For instance, optimization algorithms in machine learning often minimize KL Divergence to align the model's predicted distribution with the observed data distribution; when the data distribution is fixed, this is equivalent to minimizing the familiar cross-entropy loss.
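

As an illustrative, hedged sketch of this idea (the target distribution p_data, the softmax parameterization, and the learning rate are all made up for the example), the snippet below fits a categorical model to a target distribution by gradient descent on D_{KL}(p_data || q_model):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Hypothetical target distribution the model should learn to match.
p_data = np.array([0.7, 0.2, 0.1])

# Model: a categorical distribution parameterized by logits.
logits = np.zeros(3)
learning_rate = 0.5

for _ in range(200):
    q_model = softmax(logits)
    # For a softmax parameterization, the gradient of D_KL(p_data || q_model)
    # with respect to the logits is simply (q_model - p_data).
    logits -= learning_rate * (q_model - p_data)

print(softmax(logits))                         # close to p_data
print(kl_divergence(p_data, softmax(logits)))  # close to zero
```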


Looking ahead, as deep learning and big data technologies advance, KL Divergence may be combined with other information-theoretic measures to build richer models for high-dimensional data, as it already is in constructions such as the Jensen–Shannon divergence and the variational objectives of generative models.


The advantages of KL Divergence include its mathematical simplicity and ease of computation. However, it is sensitive to zero-probability events: if Q assigns zero probability to an outcome that P assigns positive probability, the divergence becomes infinite, which can make results unstable in practice. Care must be taken to ensure that the input probability distributions are valid, for example by smoothing away zero entries.
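

A minimal sketch of this failure mode and one common workaround (additive smoothing; the arrays and the epsilon value are illustrative):

```python
import numpy as np

def kl_divergence(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.9, 0.0, 0.1])  # q gives zero probability to an outcome p allows

# Direct computation divides by zero, so the divergence blows up to infinity.
with np.errstate(divide="ignore"):
    print(kl_divergence(p, q))  # inf

# One common workaround: add a small constant and renormalize (additive smoothing).
eps = 1e-6
q_smoothed = (q + eps) / (q + eps).sum()
print(kl_divergence(p, q_smoothed))  # large but finite
```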