182. Attention Head
A single query-key-value unit within the attention mechanism of a transformer model. Each head projects the input into its own query, key, and value spaces and computes scaled dot-product attention over the sequence; running several heads in parallel (multi-head attention) lets the model capture different kinds of relationships, such as syntactic or positional dependencies, at the same time.
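A minimal sketch of one attention head in NumPy, assuming illustrative dimensions (`d_model`, `d_head`) and random weights rather than any particular trained model. It shows the core computation: project the input to queries, keys, and values, score query-key similarity, normalize with a softmax, and return a weighted mix of values.

```python
import numpy as np

def attention_head(x, W_q, W_k, W_v):
    """Scaled dot-product attention for a single head.

    x:             (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_head) learned projection matrices
    """
    Q = x @ W_q  # queries: what each position is looking for
    K = x @ W_k  # keys: what each position offers
    V = x @ W_v  # values: the content to aggregate
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)  # pairwise similarity, scaled
    # Softmax over keys so each row of weights sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # per-position weighted combination of values

# Usage with hypothetical sizes and random weights.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = attention_head(x, W_q, W_k, W_v)
print(out.shape)  # (4, 2)
```

In multi-head attention, several such heads run in parallel on the same input with independent projection matrices, and their outputs are concatenated and linearly projected back to `d_model`.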