Demystifying Hidden Markov Models (HMM): How AI Predicts the Unseen!

Hello! Let's understand this excellent lecture on Hidden Markov Models (HMMs) using simple language and real-life examples. Don't worry, like a senior, I am breaking down every topic for you. Let's get started!

Introduction and Applications of HMMs (Slides 1–2)

In these slides, Professor Steven Salzberg introduces us to Hidden Markov Models (HMMs) and showcases their amazing real-world applications.

What it is: In simple terms, an HMM is a statistical model that represents systems where the actual conditions or States of the system cannot be seen directly (i.e., they remain Hidden), but we can observe certain outputs or symptoms generated from those states from the outside.
How it works: Imagine a friend is inside a house, and instead of coming outside, they only show a colored card through the window. You don't see when they move between rooms (like the bedroom or kitchen) (Hidden States), but by looking at the cards they show (Observations), you try to guess where they might be. HMM solves this exact logic mathematically.
Why it matters: This is incredibly important in real life. Some major applications shown in the slides are:
1. Real-time Continuous Speech Recognition: When we give voice commands to our phones (like Siri or Google Assistant), our voice or sound waves are the Observations, and the words we intended to say are the Hidden States. The world's biggest speech recognition software is built on HMMs.
2. Gene Finding: In bioinformatics, HMMs are used to find hidden 'genes' within DNA sequences. Famous tools like GENSCAN and GlimmerHMM run based on this.
3. Prediction of Protein Structure: It helps in predicting what the 3D structure of a protein will look like.

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Hidden Markov Models (HMMs) & Applications Key Point 1: HMM is a statistical tool used to model processes where the true states are hidden, but they emit visible observations. Key Point 2: Widely used in computational biology (Gene finding via GENSCAN, TwinScan) and computer science (Speech recognition). Advantage: Excellent for modeling sequential or time-series data where underlying factors are unobservable. Disadvantage: Highly dependent on historical data and assumes the future state depends only on the current state.

What is an HMM & HMM Notation (Slides 3–4)

Here, the core components of an HMM and its mathematical Notations are discussed.

What it is: An HMM primarily consists of two things—a set of States and a set of Transitions (paths to move from one state to another).
How it works:
- Transition Probability ($a_{ij}$): There is a specific probability of moving from one state to another. For example, if it rains today, what is the probability of having a sunny day tomorrow?
- Emission/Output Probability ($b_{ij}(k)$): The probability of generating a specific output or symbol while being in a specific state. For example, if someone is sad, they are more likely to cry, and if they are happy, they are more likely to laugh. Here 'sad/happy' is the State and 'cry/laugh' is the Output Symbol.
Notation:
- ${s}$: Set of all states.
- $S_I$: Initial States.
- $S_F$: Final States.
- $a_{ij}$: Probability of transitioning from state $i$ to state $j$.
- $b_{ij}(k)$: Probability of emitting symbol $k$ while transitioning from state $i$ to $j$.
Why it matters: Without these components, an HMM has no mathematical existence. When solving real-life problems, we must use these notations to set up the equations.

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Components and Mathematical Notation of HMM Key Point 1: Comprises state transition probabilities ($a_{ij}$) and symbol emission probabilities ($b_{ij}(k)$). Key Point 2: Emissions can be mathematically attached either to the transitions between states or directly to the states themselves. Advantage: Standardizes complex real-world sequential problems into clear parameters (States, Transitions, Emissions). Disadvantage: As the number of states increases, the notation matrix grows quadratically ($N^2$), increasing computational overhead.

HMM Example - Casino Coin & DNA (Slides 5–12)

To understand HMMs in the easiest way, the slides provide excellent examples involving a Casino Coin and a DNA Sequence.

[ Fair Coin State ]   --- Transition Probabilities --->   [ Unfair/Biased Coin State ]
      |                                                                 |
Emits: Heads (0.5) / Tails (0.5)                                  Emits: Heads (0.8) / Tails (0.2)

What it is: Imagine gambling at a casino. The dealer has two coins—one Fair Coin (where Heads and Tails have an equal 50% chance) and one Unfair/Biased Coin (where the chance of getting Heads is much higher to cheat, say 80%). The dealer secretly switches coins under the table, so you cannot see which coin is being used. You only observe the output sequence: $HTHHTTT...$
How it works:
- The States are two: Fair (F) and Unfair (U). These are hidden, hence Hidden States.
- The Observation Symbols are: $H$ (Heads) and $T$ (Tails).
- The dealer might start with the Fair coin, then suddenly switch to the Unfair one. The mechanism of moving from one coin to another is the State Transition.
- The same rule applies in bioinformatics. Suppose we have a DNA sequence $AAACCC$. HMM can determine which biological state (like exon or intron) these nucleotides are coming from.
Why it matters: The main motivation of this casino example is—just by looking at the sequence of Heads and Tails and applying mathematics, we can tell exactly when the casino dealer cheated (i.e., when they used the Unfair coin)!

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: The Dishonest Casino & DNA Sequence Representation Key Point 1: The choices of coins (Fair vs. Unfair) act as the Hidden States, while the game results ($H$ or $T$) act as the Emissions/Observations. Key Point 2: In DNA applications, nucleotide tokens (A, C, G, T) represent observations, and gene structures (coding vs. non-coding regions) represent hidden states. Advantage: Makes abstract probability theories tangible and directly models the concept of "unobservable intent/cause." Disadvantage: Real-world scenarios often have highly complex noise, making it harder to distinguish subtle changes between states.

Properties of an HMM (Slide 13)

This slide discusses some incredibly important Properties of HMMs. An HMM is essentially a First-order Markov process.

What it is: A first-order Markov process means—the probability of transitioning to a future state depends only on the current state, and not on any past history.
How it works: In terms of equations, what state $s_t$ will be at time $t$ depends solely on $s_{t-1}$ (what the state was just before). Suppose whether you will go to watch a movie tomorrow depends entirely on your mood today; it has no connection to what your mood was over the past week. Additionally, time in HMMs is considered Discrete (e.g., time steps 1, 2, 3...).
Why it matters: Because of this property, complex HMM calculations become much easier. If we had to remember the entire past history, computer memory and processing power would be exhausted. This assumption allows us to run massive algorithms easily.

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Markov Property & Constraints in HMM Key Point 1: First-order Markov Chain Rule: The probability of state $s_t$ depends exclusively on the immediate prior state $s_{t-1}$. Key Point 2: Time progresses in discrete steps, meaning transitions happen at distinct intervals ($t, t+1$). Advantage: Simplifies probability computations drastically, avoiding combinatorial explosion. Disadvantage: "Memoryless" property limits its use in scenarios where long-term historical context deeply alters future outputs.

Three Classic HMM Problems & Basic Facts (Slides 14–20)

This section discusses the 3 core classic problems of HMMs and some Basic Facts. Whenever you design any HMM system, you must face these 3 problems.

The 3 Classic Problems:
1. Evaluation: We have a model and a specific output sequence. The question is, what is the probability or score of this output sequence being generated by this model? (This is solved using the Forward Algorithm). Example: Checking how closely a new sequence matches a protein family via a score.
2. Decoding: We have a model and an output sequence. Now we must find, what was the most likely state sequence in the background that produced this output? Example: Identifying exactly where a gene begins and ends in a DNA sequence.
3. Learning (Training): We only have data (output sequence), but the model probabilities are unknown (Untrained HMM). Tuning or training the model's parameters by observing the data is called learning.
Basic Facts:
- The sum of probabilities for all transition lines or edges exiting from a single state must always be 1.
- The sum of all output probabilities associated with a link/edge will also always be 1.
- $a_{ij}$ is a conditional probability. It means, given that the model was in state $i$ at time $t$, what is the probability of moving to state $j$ at time $t+1$.
Why it matters: Knowing these problems and rules is crucial because they are the foundation of HMM applications. When we do speech or DNA recognition, we are essentially coding solutions to these exact problems.

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: The Three Fundamental HMM Problems & Foundational Probability Axioms Key Point 1: The three problems are Evaluation (Scoring matching probability), Decoding (Uncovering the hidden sequence path), and Learning (Training weights from raw outputs). Key Point 2: Sum of all exiting state transition probabilities must equal 1 ($\sum_j a_{ij} = 1$). Advantage: Provides a complete structural framework to solve scoring, pattern finding, and machine learning tasks. Disadvantage: Learning parameters from completely untrained HMMs often runs into local optima issues (e.g., in Baum-Welch algorithm).

Solving Evaluation: The Forward Algorithm & Trellis (Slides 21–25)

Now we will see the solution to the first problem—which is Evaluation. To solve this, the Forward Algorithm is used, and a grid called a Trellis is built to perform calculations.

What it is: The Forward Algorithm is a dynamic programming method that calculates the total probability of obtaining a specific output from an HMM model.
How it works:
- In the slide's example, there are two states: $S_1$ (Initial state) and $S_2$ (Final state).
- Over time, a grid or Trellis is formed. According to the slide's diagram, at each step or time, the probabilities of reaching a state are multiplied and added to move forward.
- Step 1 (Time 1): Calculating early probabilities, two paths yielded values of 0.48 and 0.20 respectively.
- Step 2 (Time 2): Using these values and multiplying them with transition and emission probabilities, the values for the next state are found. As shown in the slide:
  - One path's value is: $0.0576 + 0.018 = 0.0756$
  - Another path's value is: $0.126 + 0.096 = 0.222$
- In this way, by using the results of the previous step at every step, the total probability of the final output is determined.
Why it matters: If we did not use this grid or Trellis and instead tried to manually calculate all separate paths, a large sequence would cause millions of path calculations, crashing the computer. The Forward Algorithm uses dynamic programming to do this flawlessly in the blink of an eye.

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Forward Algorithm and Trellis Execution Key Point 1: Utilizes a matrix structure called a Trellis to recursively accumulate sub-path emission and transition probabilities over time. Key Point 2: Solves the evaluation problem in polynomial time $O(N^2T)$ instead of exponential time $O(N^T)$ where $N$ is states and $T$ is sequence length. Advantage: Extremely fast and optimal for scoring data compatibility against a specific HMM profile. Disadvantage: Suffers from numerical underflow if probabilities are multiplied iteratively without log-scale transformations.

Forward Algorithm: Equations & Recursive Formulation (Slides 26–30)

In this final part of the lecture, the mathematical equations of the Forward Algorithm and its Recursive Formulation are shown.

What it is: The total probability of a sequence $y$ (of length $T$) being emitted by an HMM is—the sum of probabilities of all possible paths that could produce that sequence.
How it works:
- Disjoint Paths Fact: Since we can only walk down one specific path at a time, all possible paths are mutually exclusive (Disjoint). Because of this, we can simply add their probabilities.
- Markov Assumptions: To simplify calculations, two important assumptions are used:
  1. Transition Probability Rule: The decision to move to the next state depends solely on the current state.
  2. Output Probability Rule: Generating an output at any time depends solely on the transition taken at that time.
- Recursive Step: At every step, we only need to keep track of 3 things—current output ($y_t$), current state ($x_t$), and next state ($x_{t+1}$). With these 3 variables, the entire calculation is completed recursively.
Why it matters: Because of this recursive formula, we can implement the whole algorithm in code by just writing a simple for loop or recursive function. This is the key to converting mathematical theory into software code.

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Mathematical Formulae & Recursion in Forward Algorithm Key Point 1: Total sequence emission probability is the sum of all individual path execution probabilities because all paths are disjoint events. Key Point 2: At any recursive iteration step $t$, the formula uniquely requires tracking only three token variables: $y_t$, $x_t$, and $x_{t+1}$. Advantage: Allows elegant programmatic implementation using standard iterative loops, saving enormous operational memory. Disadvantage: Requires strict alignment with the Markov assumptions; if the physical system defies these assumptions, the mathematical output fails.

We have successfully finished the entire lecture! I hope Hidden Markov Models are now clear to you.

হ্যালো! Hidden Markov Models (HMMs) এর ওপর এই চমৎকার লেকচারটি সহজ ভাষায় এবং বাস্তব জীবনের উদাহরণের মাধ্যমে বুঝে নেওয়া যাক। একদম চিন্তা কোরো না, একজন সিনিয়রের মতো আমি তোমাকে প্রতিটি টপিক ভেঙে ভেঙে বুঝিয়ে দিচ্ছি। চলো শুরু করা যাক!

Introduction and Applications of HMMs (Slides 1–2)

এই স্লাইডগুলোতে প্রফেসর স্টিভেন সালজবার্গ আমাদের Hidden Markov Models (HMMs) এর সাথে পরিচয় করিয়ে দিয়েছেন এবং বাস্তব দুনিয়ায় এর দারুণ সব ব্যবহার দেখিয়েছেন।

HMM কী? (What it is): সহজ কথায়, HMM হলো একটি পরিসংখ্যানগত বা স্ট্যাটিস্টিক্যাল মডেল (statistical model), যা এমন সব সিস্টেমকে রিপ্রেজেন্ট করে যেখানে সিস্টেমের আসল অবস্থা বা States গুলো সরাসরি দেখা যায় না (অর্থাৎ লুকানো বা Hidden থাকে), কিন্তু সেই অবস্থা থেকে তৈরি হওয়া কিছু আউটপুট বা লক্ষণ আমরা বাইরে থেকে দেখতে পাই।
এটি কীভাবে কাজ করে? (How it works): ধরো, তোমার এক বন্ধু ঘরের ভেতর আছে এবং সে বাইরে বের না হয়ে জানালা দিয়ে শুধু একটি করে রঙিন কার্ড দেখাচ্ছে। সে কখন কোন ঘরের ভেতর যাচ্ছে (বেডরুম নাকি কিচেন) তা তুমি দেখছ না (Hidden States), কিন্তু তার দেখানো কার্ডগুলো দেখে (Observations) তুমি আন্দাজ করার চেষ্টা করছ সে এখন কোথায় থাকতে পারে। HMM ঠিক এই লজিকটি গাণিতিকভাবে সমাধান করে।
এর গুরুত্ব কী? (Why it matters): বাস্তব জীবনে এটি অসম্ভব রকমের গুরুত্বপূর্ণ। স্লাইডে এর কিছু প্রধান ব্যবহার দেখানো হয়েছে:
1. রিয়েল-টাইম কন্টিনিউয়াস স্পিচ রিকগনিশন (Real-time Continuous Speech Recognition): আমরা যখন ফোনে ভয়েস কমান্ড দিই (যেমন Siri বা Google Assistant), তখন আমাদের মুখের আওয়াজ বা সাউন্ড ওয়েভ হলো Observation, আর আমরা কোন শব্দটা বলতে চেয়েছি সেটা হলো Hidden State। পৃথিবীর বড় বড় স্পিচ রিকগনিশন সফটওয়্যার HMM-এর ওপর ভিত্তি করে তৈরি।
2. জিন ফাইন্ডিং (Gene Finding): বায়োইনফরমেটিক্সে DNA সিকোয়েন্সের ভেতর লুকিয়ে থাকা 'জিন' (Genes) খুঁজে বের করতে HMM ব্যবহার করা হয়। GENSCAN, GlimmerHMM এর মতো বিখ্যাত টুলগুলো এর ওপর ভিত্তি করেই চলে।
3. প্রোটিন স্ট্রাকচার প্রেডিকশন (Prediction of Protein Structure): কোনো প্রোটিনের থ্রি-ডি গঠন কেমন হবে তা অনুমান করতে এটি সাহায্য করে।

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Hidden Markov Models (HMMs) & Applications Key Point 1: HMM is a statistical tool used to model processes where the true states are hidden, but they emit visible observations. Key Point 2: Widely used in computational biology (Gene finding via GENSCAN, TwinScan) and computer science (Speech recognition). Advantage: Excellent for modeling sequential or time-series data where underlying factors are unobservable. Disadvantage: Highly dependent on historical data and assumes the future state depends only on the current state.

What is an HMM & HMM Notation (Slides 3–4)

এখানে HMM-এর মূল উপাদান এবং এর গাণিতিক সংকেত বা Notation গুলো আলোচনা করা হয়েছে।

HMM-এর মূল উপাদান (What it is): একটি HMM মূলত দুটি জিনিস নিয়ে গঠিত—একগুচ্ছ States (অবস্থা) এবং একগুচ্ছ Transitions (এক অবস্থা থেকে অন্য অবস্থায় যাওয়ার পথ)।
এটি কীভাবে কাজ করে? (How it works):
- Transition Probability ($a_{ij}$): এক স্টেট থেকে অন্য স্টেটে যাওয়ার একটি নির্দিষ্ট সম্ভাবনা বা প্রোবাবিলিটি থাকে। যেমন, আজকে বৃষ্টি হলে কালকে রোদ হওয়ার সম্ভাবনা কতটুকু?
- Emission/Output Probability ($b_{ij}(k)$): একটি নির্দিষ্ট স্টেটে থাকা অবস্থায় কোনো একটা আউটপুট বা সিম্বল তৈরি করার সম্ভাবনা। যেমন, মন খারাপ থাকলে মানুষের কান্নার সম্ভাবনা বেশি, আর মন ভালো থাকলে হাসার সম্ভাবনা বেশি। এখানে 'মন খারাপ/ভালো' হলো State আর 'কান্না/হাসি' হলো Output Symbol।
গাণিতিক সংকেত (Notation):
- ${s}$: সমস্ত স্টেটের সেট।
- $S_I$: শুরুর স্টেট বা Initial States।
- $S_F$: শেষের স্টেট বা Final States।
- $a_{ij}$: স্টেট $i$ থেকে স্টেট $j$-তে যাওয়ার সম্ভাবনা।
- $b_{ij}(k)$: স্টেট $i$ থেকে $j$-তে যাওয়ার সময় $k$ নামক সিম্বলটি আউটপুট দেওয়ার সম্ভাবনা।
এর গুরুত্ব কী? (Why it matters): এই উপাদানগুলো ছাড়া HMM-এর কোনো গাণিতিক অস্তিত্ব নেই। আমরা যখন কোনো রিয়েল-লাইফ প্রবলেম সলভ করব, তখন এই নোটেশনগুলো ব্যবহার করেই সমীকরণ (Equations) সাজাতে হবে।

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Components and Mathematical Notation of HMM Key Point 1: Comprises state transition probabilities ($a_{ij}$) and symbol emission probabilities ($b_{ij}(k)$). Key Point 2: Emissions can be mathematically attached either to the transitions between states or directly to the states themselves. Advantage: Standardizes complex real-world sequential problems into clear parameters (States, Transitions, Emissions). Disadvantage: As the number of states increases, the notation matrix grows quadratically ($N^2$), increasing computational overhead.

HMM Example - Casino Coin & DNA (Slides 5–12)

HMM-কে সবচেয়ে সহজে বোঝার জন্য স্লাইডে একটি চমৎকার Casino Coin (ক্যাসিনোর কয়েন) এবং DNA Sequence এর উদাহরণ দেওয়া হয়েছে।

[ Fair Coin State ]   --- Transition Probabilities --->   [ Unfair/Biased Coin State ]
      |                                                                 |
Emits: Heads (0.5) / Tails (0.5)                                  Emits: Heads (0.8) / Tails (0.2)

উদাহরণটি কী? (What it is): ধরো একটি ক্যাসিনোতে জুয়া খেলা হচ্ছে। ডিলারের কাছে দুটি কয়েন আছে—একটি Fair Coin (যেখানে হেড এবং টেইল পড়ার সম্ভাবনা সমান, ৫০%) এবং একটি Unfair/Biased Coin (যেখানে চিটিং করার জন্য হেড পড়ার সম্ভাবনা অনেক বেশি, ধরা যাক ৮০%)। ডিলার টেবিলের নিচে লুকিয়ে কয়েন পরিবর্তন করে, তাই তুমি দেখতে পাচ্ছ না সে কখন কোন কয়েন ব্যবহার করছে। তুমি শুধু দেখতে পাচ্ছ আউটপুট সিকোয়েন্স: $HTHHTTT...$
এটি কীভাবে কাজ করে? (How it works):
- এখানে States হলো দুটি: Fair (F) এবং Unfair (U)। এগুলো লুকানো থাকে, তাই এগুলো Hidden States।
- Observation Symbols হলো: $H$ (Heads) এবং $T$ (Tails)।
- ডিলার হয়তো Fair কয়েন দিয়ে খেলা শুরু করল, তারপর হঠাৎ চাল চেলে Unfair কয়েনে চলে গেল। এই এক কয়েন থেকে অন্য কয়েনে যাওয়ার ব্যবস্থাপনাই হলো State Transition।
- বায়োইনফরমেটিক্সের ক্ষেত্রেও একই নিয়ম খাটে। ধরো DNA সিকোয়েন্স আছে $AAACCC$। এই নিউক্লিওটাইডগুলো কোন বায়োলজিক্যাল স্টেট (যেমন: এক্সন বা ইনট্রন) থেকে আসছে, তা HMM বের করতে পারে।
এর গুরুত্ব কী? (Why it matters): ক্যাসিনোর এই উদাহরণের মূল উদ্দেশ্য বা মোটিভেশন হলো—শুধু মাত্র হেড আর টেইলের সিকোয়েন্স দেখে গণিত খাটিয়ে বলে দেওয়া যে, ঠিক কোন কোন সময়ে ক্যাসিনোর ডিলারটি চিটিং করেছিল (অর্থাৎ কখন সে Unfair কয়েন ব্যবহার করেছিল)!

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: The Dishonest Casino & DNA Sequence Representation Key Point 1: The choices of coins (Fair vs. Unfair) act as the Hidden States, while the game results ($H$ or $T$) act as the Emissions/Observations. Key Point 2: In DNA applications, nucleotide tokens (A, C, G, T) represent observations, and gene structures (coding vs. non-coding regions) represent hidden states. Advantage: Makes abstract probability theories tangible and directly models the concept of "unobservable intent/cause." Disadvantage: Real-world scenarios often have highly complex noise, making it harder to distinguish subtle changes between states.

Properties of an HMM (Slide 13)

এই স্লাইডে HMM-এর কিছু অত্যন্ত গুরুত্বপূর্ণ বৈশিষ্ট্য বা Properties আলোচনা করা হয়েছে। HMM মূলত একটি First-order Markov process।

বৈশিষ্ট্যটি কী? (What it is): ফার্স্ট-অর্ডার মার্কভ প্রসেস মানে হলো—ভবিষ্যতের কোনো স্টেটে যাওয়ার সম্ভাবনা বর্তমান স্টেটের ওপর নির্ভর করে, কিন্তু তার অতীতের কোনো ইতিহাসের ওপর নির্ভর করে না।
এটি কীভাবে কাজ করে? (How it works): সমীকরণের ভাষায়, $s_t$ (সময় $t$-তে স্টেট কী হবে) তা শুধুমাত্র $s_{t-1}$ (তার ঠিক আগের স্টেট কী ছিল) এর ওপর নির্ভর করে। ধরা যাক, তুমি আগামীকাল সিনেমা দেখতে যাবে কি না, তা শুধুমাত্র আজকে তোমার মুড কেমন তার ওপর নির্ভর করছে, গত এক সপ্তাহে তোমার মুড কেমন ছিল তার সাথে এর কোনো সম্পর্ক নেই। এছাড়া, HMM-এ সময়কে Discrete বা বিচ্ছিন্ন ধরা হয় (যেমন: টাইম স্টেপ ১, ২, ৩...)।
এর গুরুত্ব কী? (Why it matters): এই বৈশিষ্ট্যের কারণে HMM-এর জটিল হিসাব-নিকাশ অনেক সহজ হয়ে যায়। যদি আমাদের পুরো অতীতের ইতিহাস মনে রাখতে হতো, তবে কম্পিউটারের মেমোরি ও প্রসেসিং পাওয়ার শেষ হয়ে যেত। এই অনুমিতি বা অ্যাসাম্পশনের কারণেই আমরা বড় বড় অ্যালগরিদম সহজে চালাতে পারি।

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Markov Property & Constraints in HMM Key Point 1: First-order Markov Chain Rule: The probability of state $s_t$ depends exclusively on the immediate prior state $s_{t-1}$. Key Point 2: Time progresses in discrete steps, meaning transitions happen at distinct intervals ($t, t+1$). Advantage: Simplifies probability computations drastically, avoiding combinatorial explosion. Disadvantage: "Memoryless" property limits its use in scenarios where long-term historical context deeply alters future outputs.

Three Classic HMM Problems & Basic Facts (Slides 14–20)

এই বিভাগে HMM-এর ৩টি মূল ক্লাসিক সমস্যা এবং কিছু মৌলিক সত্য (Basic Facts) নিয়ে আলোচনা করা হয়েছে। যেকোনো HMM সিস্টেম ডিজাইন করতে গেলে এই ৩টি সমস্যার মুখোমুখি হতেই হবে।

৩টি ক্লাসিক সমস্যা (The 3 Classic Problems):
1. Evaluation (মূল্যায়ন): আমাদের কাছে একটি মডেল এবং একটি নির্দিষ্ট আউটপুট সিকোয়েন্স আছে। এখন প্রশ্ন হলো, এই মডেলটি থেকে ওই আউটপুট সিকোয়েন্সটি তৈরি হওয়ার সম্ভাবনা বা স্কোর কত? (এটি সমাধান করা হয় Forward Algorithm দিয়ে)। যেমন: কোনো প্রোটিন ফ্যামিলির সাথে একটি নতুন সিকোয়েন্স কতটুকু মিলছে তা স্কোরের মাধ্যমে দেখা।
2. Decoding (ডিকোডিং): আমাদের কাছে মডেল এবং আউটপুট সিকোয়েন্স আছে। এখন আমাদের বের করতে হবে, এই আউটপুটটি তৈরি করার জন্য ব্যাকগ্রাউন্ডে স্টেটের সবচেয়ে সম্ভাব্য সিকোয়েন্স (Most likely state sequence) কোনটি ছিল? যেমন: DNA সিকোয়েন্সে ঠিক কোথায় জিন শুরু আর কোথায় শেষ হয়েছে তা চিহ্নিত করা।
3. Learning (শিক্ষা/ট্রেনিং): আমাদের কাছে শুধু ডেটা (আউটপুট সিকোয়েন্স) আছে, কিন্তু মডেলের প্রোবাবিলিটিগুলো জানা নেই (Untrained HMM)। ডেটা দেখে দেখে মডেলের প্যারামিটারগুলো টিউন বা ট্রেন করাই হলো লার্নিং।
মৌলিক কিছু নিয়ম (Basic Facts):
- যেকোনো একটি স্টেট থেকে যতগুলো ট্রানজিশন লাইন বা এজ (Edge) বের হয়ে যাবে, তাদের সম্ভাবনার যোগফল সবসময় ১ (1) হতে হবে।
- একটি এজ বা লিঙ্কের সাথে যুক্ত সমস্ত আউটপুট প্রোবাবিলিটির যোগফলও সবসময় ১ (1) হবে।
- $a_{ij}$ হলো একটি কন্ডিশনাল প্রোবাবিলিটি বা শর্তাধীন সম্ভাবনা। এর মানে হলো, সময় $t$-তে মডেলটি $i$ স্টেটে ছিল এই শর্তে $t+1$ সময়ে $j$ স্টেটে যাওয়ার সম্ভাবনা কত।
এর গুরুত্ব কী? (Why it matters): এই সমস্যাগুলো এবং নিয়মগুলো জানা জরুরি কারণ এগুলোই HMM অ্যাপ্লিকেশনের মূল ভিত্তি। আমরা যখন স্পিচ বা ডিএনএ রিকগনিশন করি, তখন মূলত এই সমস্যাগুলোরই কোডিং সমাধান বের করি।

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: The Three Fundamental HMM Problems & Foundational Probability Axioms Key Point 1: The three problems are Evaluation (Scoring matching probability), Decoding (Uncovering the hidden sequence path), and Learning (Training weights from raw outputs). Key Point 2: Sum of all exiting state transition probabilities must equal 1 ($\sum_j a_{ij} = 1$). Advantage: Provides a complete structural framework to solve scoring, pattern finding, and machine learning tasks. Disadvantage: Learning parameters from completely untrained HMMs often runs into local optima issues (e.g., in Baum-Welch algorithm).

Solving Evaluation: The Forward Algorithm & Trellis (Slides 21–25)

এখন আমরা প্রথম সমস্যাটির সমাধান দেখব—যা হলো Evaluation। এটি সমাধান করার জন্য ব্যবহার করা হয় Forward Algorithm এবং এই অ্যালগরিদমে হিসাব নিকাশ করার জন্য একটি গ্রিড বা জালি তৈরি করা হয়, যাকে বলে Trellis।

এটি কী? (What it is): ফরওয়ার্ড অ্যালগরিদম হলো একটি ডাইনামিক প্রোগ্রামিং (Dynamic Programming) পদ্ধতি, যা একটি HMM মডেল থেকে কোনো নির্দিষ্ট আউটপুট পাওয়ার মোট সম্ভাবনা হিসাব করে।
এটি কীভাবে কাজ করে? (How it works):
- স্লাইডের উদাহরণে দুটি স্টেট দেওয়া আছে: $S_1$ (শুরুর স্টেট) এবং $S_2$ (শেষের স্টেট)।
- সময়ের সাথে সাথে একটি গ্রিড বা Trellis তৈরি করা হয়। স্লাইডের ডায়াগ্রাম অনুযায়ী, প্রতিটি স্টেপ বা টাইম-এ স্টেটে পৌঁছানোর প্রোবাবিলিটিগুলো গুণ এবং যোগ করে সামনে এগিয়ে নেওয়া হয়।
- ধাপ ১ (Time 1): শুরুর দিকের সম্ভাবনা হিসাব করে দেখা গেল দুটি পাথের মান এসেছে যথাক্রমে 0.48 এবং 0.20।
- ধাপ ২ (Time 2): এই মানগুলোকে ব্যবহার করে এবং ট্রানজিশন ও এমিশন প্রোবাবিলিটি গুণ করে পরবর্তী স্টেটের মান বের করা হয়। যেমন স্লাইডে দেখানো হয়েছে:
  - একটি পাথের মান আসে: $0.0576 + 0.018 = 0.0756$
  - অন্য পাথের মান আসে: $0.126 + 0.096 = 0.222$
- এভাবে প্রতিটি ধাপে পূর্ববর্তী ধাপের ফলাফল ব্যবহার করে চূড়ান্ত আউটপুটের মোট সম্ভাবনা বের করা হয়।
এর গুরুত্ব কী? (Why it matters): যদি আমরা এই গ্রিড বা Trellis ব্যবহার না করে ম্যানুয়ালি সব আলাদা আলাদা রাস্তার (Paths) হিসাব করতে যেতাম, তবে সিকোয়েন্স বড় হলে কোটি কোটি পাথের হিসাব করতে গিয়ে কম্পিউটার ক্র্যাশ করত। ফরওয়ার্ড অ্যালগরিদম ডাইনামিক প্রোগ্রামিং ব্যবহার করে চোখের পলকে এই হিসাব নিখুঁতভাবে করে ফেলে।

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Forward Algorithm and Trellis Execution Key Point 1: Utilizes a matrix structure called a Trellis to recursively accumulate sub-path emission and transition probabilities over time. Key Point 2: Solves the evaluation problem in polynomial time $O(N^2T)$ instead of exponential time $O(N^T)$ where $N$ is states and $T$ is sequence length. Advantage: Extremely fast and optimal for scoring data compatibility against a specific HMM profile. Disadvantage: Suffers from numerical underflow if probabilities are multiplied iteratively without log-scale transformations.

Forward Algorithm: Equations & Recursive Formulation (Slides 26–30)

লেকচারের এই শেষ অংশে ফরওয়ার্ড অ্যালগরিদমের গাণিতিক সমীকরণ এবং এর Recursive Formulation (পুনরাবৃত্তিমূলক সূত্র) দেখানো হয়েছে।

সমীকরণগুলো কী? (What it is): কোনো একটি সিকোয়েন্স $y$ (যার দৈর্ঘ্য $T$) একটি HMM দ্বারা নির্গত হওয়ার মোট সম্ভাবনা হলো—এমন সমস্ত সম্ভাব্য পাথের (Paths) প্রোবাবিলিটির যোগফল, যা ওই সিকোয়েন্সটি তৈরি করতে পারে।
এটি কীভাবে কাজ করে? (How it works):
- Disjoint Paths Fact: যেহেতু আমরা একবারে কেবল একটিমাত্র নির্দিষ্ট পথ ধরেই হাঁটতে পারি, তাই সমস্ত সম্ভাব্য পথগুলো পরস্পরের থেকে আলাদা (Disjoint)। আর এই কারণে আমরা সরলভাবে তাদের প্রোবাবিলিটিগুলো যোগ করতে পারি।
- Markov Assumptions: হিসাব সহজ করার জন্য দুটি গুরুত্বপূর্ণ অনুমিতি ব্যবহার করা হয়:
  1. Transition Probability Rule: পরবর্তী স্টেটে যাওয়ার সিদ্ধান্ত কেবল বর্তমান স্টেটের ওপর নির্ভরশীল।
  2. Output Probability Rule: যেকোনো সময়ে আউটপুট তৈরি হওয়াটা কেবল সেই সময়ে নেওয়া ট্রানজিশনের ওপর নির্ভর করে।
- Recursive Step: প্রতিটি ধাপে আমাদের কেবল ৩টি জিনিস মাথায় রাখতে হয়—বর্তমান আউটপুট ($y_t$), বর্তমান স্টেট ($x_t$), এবং পরবর্তী স্টেট ($x_{t+1}$)। এই ৩টি ভ্যারিয়েবল দিয়ে রিকার্সিভলি পুরো হিসাবটি সম্পন্ন করা হয়।
এর গুরুত্ব কী? (Why it matters): এই রিকার্সিভ ফর্মুলার কারণেই আমরা কোডিংয়ে একটি সিম্পল for লুপ বা রিকার্সিভ ফাংশন লিখে পুরো অ্যালগরিদমটি ইমপ্লিমেন্ট করে ফেলতে পারি। এটিই গাণিতিক তত্ত্বকে সফটওয়্যার কোডে রূপান্তর করার চাবিকাঠি।

[!NOTE] IMPORTANT NOTES FOR NOTEBOOK Concept: Mathematical Formulae & Recursion in Forward Algorithm Key Point 1: Total sequence emission probability is the sum of all individual path execution probabilities because all paths are disjoint events. Key Point 2: At any recursive iteration step $t$, the formula uniquely requires tracking only three token variables: $y_t$, $x_t$, and $x_{t+1}$. Advantage: Allows elegant programmatic implementation using standard iterative loops, saving enormous operational memory. Disadvantage: Requires strict alignment with the Markov assumptions; if the physical system defies these assumptions, the mathematical output fails.

পুরো লেকচারটি আমরা সফলভাবে শেষ করলাম! আশা করি Hidden Markov Model এখন তোমার কাছে পরিষ্কার।

কোনো topic নিয়ে আরো বিস্তারিত জানতে চাও?

Demystifying Hidden Markov Models (HMM): How AI Predicts the Unseen! Hidden Markov Models (HMM): এআই কীভাবে অদেখা জিনিস অনুমান করে?

Introduction and Applications of HMMs (Slides 1–2)

What is an HMM & HMM Notation (Slides 3–4)

HMM Example - Casino Coin & DNA (Slides 5–12)

Properties of an HMM (Slide 13)

Three Classic HMM Problems & Basic Facts (Slides 14–20)

Solving Evaluation: The Forward Algorithm & Trellis (Slides 21–25)

Forward Algorithm: Equations & Recursive Formulation (Slides 26–30)

Introduction and Applications of HMMs (Slides 1–2)

What is an HMM & HMM Notation (Slides 3–4)

HMM Example - Casino Coin & DNA (Slides 5–12)

Properties of an HMM (Slide 13)

Three Classic HMM Problems & Basic Facts (Slides 14–20)

Solving Evaluation: The Forward Algorithm & Trellis (Slides 21–25)

Forward Algorithm: Equations & Recursive Formulation (Slides 26–30)

Demystifying Hidden Markov Models (HMM): How AI Predicts the Unseen!

Hidden Markov Models (HMM): এআই কীভাবে অদেখা জিনিস অনুমান করে?