Abstract: Processing-In-Memory (PIM) architectures alleviate the memory-bandwidth bottleneck in the decode phase of large language model (LLM) inference by performing operations such as GEMV and Softmax directly in memory.
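(A brief arithmetic-intensity sketch, added here as an illustration under standard assumptions rather than taken from the abstract itself: for a weight matrix $W \in \mathbb{R}^{n \times n}$, one decode-step GEMV $y = Wx$ performs about $2n^2$ FLOPs while reading all $n^2$ weights once, so

\[
\text{intensity}(y = Wx) \;\approx\; \frac{2n^2 \ \text{FLOPs}}{n^2 \ \text{weights}} \;=\; 2 \ \text{FLOPs per weight},
\]

far below the compute-to-bandwidth ratio of modern accelerators, which is on the order of $10^2$ FLOPs per byte; this is why decode-phase GEMV is bandwidth-bound and why executing it in memory helps.)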