The position of each token in a sequence is encoded with the following formula, and the result is added element-wise to the token's embedding vector. $$PE_{(pos, 2i)} = \sin(pos ...
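
To make the encode-then-add step concrete, here is a minimal sketch in plain Python. It assumes the formula is the standard sinusoidal encoding from the original Transformer architecture: even dimensions use a sine, odd dimensions use a cosine at the same frequency, and the base is 10000. The function names, the `base` parameter, and the list-of-lists representation are illustrative choices, not taken from the text.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table of position encodings.

    Assumed form (standard Transformer encoding):
        PE[pos][2i]     = sin(pos / base ** (2i / d_model))
        PE[pos][2i + 1] = cos(pos / base ** (2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for k in range(d_model):
            # Dimensions 2i and 2i+1 share one frequency, so round k
            # down to the nearest even index before scaling.
            exponent = (k - k % 2) / d_model
            angle = pos / (base ** exponent)
            pe[pos][k] = math.sin(angle) if k % 2 == 0 else math.cos(angle)
    return pe


def add_positional_encoding(embeddings, base=10000.0):
    """Add the position encoding element-wise to each token embedding.

    `embeddings` is a list of equal-length vectors, one per token,
    ordered by position in the sequence.
    """
    seq_len = len(embeddings)
    d_model = len(embeddings[0])
    pe = sinusoidal_positional_encoding(seq_len, d_model, base)
    return [
        [e + p for e, p in zip(emb_row, pe_row)]
        for emb_row, pe_row in zip(embeddings, pe)
    ]
```

Under these assumptions, position 0 always encodes to alternating zeros and ones, because sin(0) = 0 and cos(0) = 1, and the first pair of dimensions oscillates fastest, with later pairs varying more and more slowly. In practice the table would usually be computed once as an array and reused for every sequence rather than rebuilt per call.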