A neural network architecture that uses self-attention mechanisms, allowing models to weigh the importance of different parts of the input data, particularly effective in processing sequential data like text.
Last updated 9 months ago