Subquadratic claims it has overcome a critical architecture/hardware bottleneck limiting the scaling and speed of Large Language Models

https://www.technologyreview.com/2026/06/19/1139313/a-startup-claims-it-broke-through-a-bottleneck-thats-holding-back-llms/

"Quadratic attention makes LLMs increasingly slow, expensive, and power-hungry as context lengths grow; overcome by replacing dense attention with dynamic sparse attention that selectively tracks only the most relevant token relationships, on the fly... up to 56X faster at handling data-heavy tasks with competitive accuracy... processes massive context windows of up to 12 million tokens (12X greater capacity) at a tiny fraction of the cost... needs validation: built by swapping attention mechanisms on top of an existing open-source model"
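
For anyone wondering why "quadratic" matters here, a rough back-of-the-envelope sketch: dense attention scores every token against every other token, so the work grows with the square of the context length, while a sparse scheme that keeps a fixed number of keys per query grows linearly. This is only an illustration of the scaling argument, not Subquadratic's actual method. The function names and the per-query budget `k = 4096` are my own assumptions and do not come from the article.

```python
def dense_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by dense attention: every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Pairs scored if each query attends to at most `keys_per_query` keys.

    `keys_per_query` is a hypothetical budget chosen for illustration.
    """
    return n_tokens * min(keys_per_query, n_tokens)


if __name__ == "__main__":
    k = 4096  # assumed per-query budget, NOT a figure from the article
    for n in (1_000_000, 12_000_000):
        dense = dense_pairs(n)
        sparse = sparse_pairs(n, k)
        print(f"n={n:,}: dense={dense:.2e} pairs, sparse={sparse:.2e} pairs, "
              f"ratio={dense / sparse:,.0f}x")
```

Note that this counts only the attention scores themselves. It ignores the cost of deciding which keys are "most relevant" for each query, which is presumably where the hard part (and the need for independent validation) lies.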
