
Scaled Dot-Product Attention

Vaswani et al.: Attention(Q, K, V) = softmax(QK^T / √d_k) V. The softmax factor is the attention-weight matrix; the full expression is the single-head primitive underlying multi-head attention.
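
A minimal NumPy sketch of the single-head computation; the function name, shapes, and the explicit softmax are illustrative, not from the source:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """softmax(QK^T / sqrt(d_k)) V for a single head.

    Illustrative shapes: q (n_q, d_k), k (n_k, d_k), v (n_k, d_v).
    """
    d_k = q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep dot-product magnitudes stable as d_k grows.
    scores = q @ k.T / np.sqrt(d_k)
    # Row-wise softmax (numerically stabilized) yields the attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Attention output: each query's weighted sum over the value rows.
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((3, 4))
k = rng.standard_normal((5, 4))
v = rng.standard_normal((5, 8))
out = scaled_dot_product_attention(q, k, v)  # shape (3, 8)
```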
