BI - LSTM ‘Long Short-Term Memory’

Questions

What are Long Short-Term Memory networks (LSTMs)?
- Long Short-Term Memory networks (LSTMs) are a type of recurrent neural network architecture designed to mitigate the vanishing/exploding gradient problem and improve the network’s ability to capture long-term dependencies in sequential data.
- At their core, LSTMs are similar to traditional recurrent neural networks in that they use feedback connections to maintain an internal memory of previous inputs.
  ==However, LSTMs include several additional components, including memory cells and gating mechanisms, that allow the network to selectively store and retrieve information over multiple time steps==.
- The key component of an LSTM is the memory cell, which is a self-contained unit that can store information over time.
  The memory cell is controlled by three gating mechanisms: the input gate, the forget gate, and the output gate.
  - ==The input gate controls how much new information is added to the memory cell==.
  - ==The forget gate controls how much information is retained from the previous memory state==.
  - ==Finally, the output gate controls how much information is read out from the memory cell to generate the network’s output==.
- Together, these gating mechanisms allow LSTMs to selectively store and retrieve information over multiple time steps, making them well-suited to tasks that require the network to remember long-term dependencies in sequential data.
- LSTMs have been used successfully in a wide range of applications, including speech recognition, natural language processing, and time series prediction.
  Their ability to capture long-term dependencies and mitigate the vanishing/exploding gradient problem has made them a popular choice for modeling complex sequential data in deep learning.
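
To make the gating idea concrete, here is a minimal sketch of the memory-cell update with hand-set scalar gate values. It is only an illustration: in a real LSTM the gates are learned and computed from the input at every step, and the function name `run_cell` and all numbers are hypothetical.

```python
def run_cell(c0, forget, write, candidate, steps):
    """Repeatedly apply c <- forget * c + write * candidate.

    `forget` plays the role of the forget gate, `write` the input gate,
    and `candidate` the new information offered to the cell.
    All values are fixed by hand here (an assumption for illustration).
    """
    c = c0
    for _ in range(steps):
        c = forget * c + write * candidate
    return c

# Forget gate fully open, input gate closed: the stored value survives 100 steps.
kept = run_cell(c0=1.0, forget=1.0, write=0.0, candidate=0.5, steps=100)

# Forget gate slightly below 1: the stored value fades geometrically (0.9 ** 100).
decayed = run_cell(c0=1.0, forget=0.9, write=0.0, candidate=0.5, steps=100)

# Forget gate closed, input gate open: the old value is replaced by the candidate.
overwritten = run_cell(c0=1.0, forget=0.0, write=1.0, candidate=0.5, steps=1)

print(kept, decayed, overwritten)
```

When the forget gate stays close to 1, the cell carries its content across many time steps, which is what lets the network remember long-term dependencies.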

—————————————————————

Online Resources

YouTube: ‘Long Short-Term Memory (LSTM), Clearly Explained’ by StatQuest

—————————————————————

Slides with Notes

NOTE: The previous hidden state $h_{t-1}$ and the input $x_{t}$ are stacked into a single vector, which is then passed to three sigmoid layers and one tanh layer. The GIF makes this fairly clear, but to avoid confusion we state it explicitly: $h_{t-1}$ and $x_{t}$ are stacked together.
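
The stacking described in the note can be sketched as a plain concatenation. The snippet below also checks that applying one weight row to the stacked vector gives the same result as applying separate weights to $h_{t-1}$ and $x_{t}$ and summing. The sizes and all numeric values are hypothetical, and plain Python lists are used only to keep the sketch dependency-free.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))

h_prev = [0.1, -0.2, 0.3]   # hypothetical previous hidden state h_{t-1}, size 3
x_t = [1.0, 0.5]            # hypothetical input x_t, size 2

# Stacking is concatenation: the result has size 3 + 2 = 5.
x_bar = h_prev + x_t

# One hypothetical weight row acting on the stacked vector.
w_row = [0.2, -0.1, 0.4, 0.3, -0.5]

stacked = dot(w_row, x_bar)
# The same row split into a part for h_{t-1} and a part for x_t.
separate = dot(w_row[:3], h_prev) + dot(w_row[3:], x_t)

print(len(x_bar), stacked, separate)
```

Both values agree up to floating-point rounding, so stacking is a notational convenience rather than a different computation.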

FORMULAS:
- $\bar{x}_{t} = [h_{t-1}, x_{t}]$ : complete input vector
- $f_{t} \propto \sigma(\bar{x}_{t})$ : forget gate
- $i_{t} \propto \sigma(\bar{x}_{t})$ : input gate
- $o_{t} \propto \sigma(\bar{x}_{t})$ : output gate
- $\bar{c}_{t} \propto \tanh(\bar{x}_{t})$ : candidate cell state (candidate gate or read gate)

These are not the exact formulas (the symbol $\propto$ only marks a dependency): at each of these steps the LSTM cell multiplies the input $\bar{x}_{t}$ by its own parameters $\bar{\theta}$ and adds a bias, for example $f_{t} = \sigma(W_{f} \bar{x}_{t} + b_{f})$.

Instead, these are exact formulas:
- $c_{t} = f_{t} \odot c_{t-1} + i_{t} \odot \bar{c}_{t}$ : new cell state (where $\odot$ is element-wise multiplication)
- $h_{t} = o_{t} \odot \tanh(c_{t})$ : new hidden state
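
Putting the formulas together, one LSTM time step can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: each gate is assumed to have its own weight matrix and bias (the names `W_f`, `b_f`, and so on are hypothetical), the sigmoid in the hidden-state formula is treated as the output gate, and weights are random rather than trained. Plain Python lists keep it dependency-free.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, bias, vec):
    """Return weights @ vec + bias, with weights given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; returns the new hidden state and the new cell state."""
    x_bar = h_prev + x_t  # stack h_{t-1} and x_t into one vector
    f_t = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], x_bar)]      # forget gate
    i_t = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], x_bar)]      # input gate
    o_t = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], x_bar)]      # output gate
    c_bar = [math.tanh(z) for z in affine(params["W_c"], params["b_c"], x_bar)]  # candidate
    # c_t = f_t * c_{t-1} + i_t * c_bar_t (element-wise)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_bar)]
    # h_t = o_t * tanh(c_t) (element-wise)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    """Random weights and zero biases for the four layers (illustrative only)."""
    rng = random.Random(seed)
    n = hidden_size + input_size
    params = {}
    for gate in "fioc":
        params["W_" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                               for _ in range(hidden_size)]
        params["b_" + gate] = [0.0] * hidden_size
    return params

# Usage: run a short hypothetical sequence through the cell.
params = init_params(input_size=2, hidden_size=3)
h, c = [0.0] * 3, [0.0] * 3
for x_t in ([1.0, 0.0], [0.0, 1.0], [0.5, -0.5]):
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

The same `params` are reused at every time step; only `h` and `c` change, which is how the cell carries memory along the sequence.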
