Subquadratic claims it has overcome critical architecture/hardware bottleneck limiting the scaling and speed of Large Language Models
"quadratic attention makes LLMs increasingly slow, expensive, and power-hungry as context lengths grow; overcome by replacing dense attention with dynamic sparse attention that selectively tracks only the most relevant token relationships, on the fly... up to 56X faster at handling data-heavy tasks, with competitive accuracy... processes massive context windows of up to 12 million tokens (12X greater capacity) at a tiny fraction of the cost... needs validation: built by swapping attention mechanisms on top of an existing open-source model"
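To make the dense-versus-sparse distinction concrete, here is a minimal sketch in plain Python. It is a generic top-k illustration, not Subquadratic's actual method, which the summary does not describe; the function names and the top-k selection rule are assumptions made for this example. Note that this toy still scores every key before picking the top k, so a real system would need a cheaper way to choose which tokens to keep.

```python
import math


def _softmax_weighted_sum(scores, values):
    """Softmax over `scores`, then the matching weighted average of `values`."""
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def _scores(query, keys):
    """Scaled dot product of one query against every key."""
    scale = math.sqrt(len(query))
    return [sum(a * b for a, b in zip(query, key)) / scale for key in keys]


def dense_attention(query, keys, values):
    """Dense baseline: the query attends to every key."""
    return _softmax_weighted_sum(_scores(query, keys), values)


def topk_sparse_attention(query, keys, values, k):
    """Hypothetical sparse variant: attend only to the k best-scoring keys."""
    scores = _scores(query, keys)  # assumption: toy selector, still scans all keys
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    return _softmax_weighted_sum([scores[i] for i in keep],
                                 [values[i] for i in keep])


def pair_counts(n, k):
    """Token pairs entering the softmax: dense n*n versus sparse n*k."""
    return n * n, n * min(k, n)
```

For a 1,000-token context with 10 tokens kept per query, `pair_counts(1000, 10)` returns `(1000000, 10000)`: the dense count grows with the square of the context length, while the sparse count grows linearly for a fixed k.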