200. Self-Attention

A mechanism that allows a model to weigh the importance of each token in a sequence relative to every other token, making it crucial for capturing long-range dependencies and serving as the core building block of Transformer architectures.

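A minimal NumPy sketch of the underlying computation (scaled dot-product attention, as popularized by the Transformer); the function name, matrix shapes, and random weights below are illustrative assumptions, not part of any specific library.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    X:             (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices (assumed given)
    Returns:       (seq_len, d_k) context vectors
    """
    Q = X @ W_q                      # queries: what each token is looking for
    K = X @ W_k                      # keys: what each token offers for matching
    V = X @ W_v                      # values: the content to be mixed together
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise relevance of every token to every other
    # Softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output is a weighted sum of all values

# Hypothetical usage: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one context vector per token
```

Because the attention weights compare every token against every other, a token at the start of the sequence can directly attend to one at the end, which is how the mechanism captures long-range dependencies.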