Long Context Language Model

Cryptopolitan on MSN

DeepSeek V4 rumored to outperform ChatGPT and Claude in long-context coding

February, is rumored to outperform ChatGPT and Claude in long-context coding, targeting elite-level coding tasks.

New Llama 4 AI Model 10 Million Token Context Window

Meta has unveiled Llama 4, its latest artificial intelligence model, designed to redefine the boundaries of AI technology. This advanced model comes in two distinct variants—Maverick and Scout—each ...

VentureBeat

DeepSeek drops open-source model that compresses text 10x through images, defying conventions

DeepSeek, the Chinese artificial intelligence research company that has repeatedly challenged assumptions about AI development costs, has released a new model that fundamentally reimagines how large ...

TechCrunch

DeepSeek releases ‘sparse attention’ model that cuts API costs in half

Researchers at DeepSeek on Monday released a new experimental model called V3.2-exp, designed to have dramatically lower inference costs when used in long-context operations. DeepSeek announced the ...

InfoQ

Gemma 3 Supports Vision-Language Understanding, Long Context Handling, and Improved Multilinguality

A monthly overview of things you need to know as an architect or aspiring architect. Unlock the full InfoQ experience by logging in! Stay updated with your favorite authors and topics, engage with ...

9to5Mac

Apple trained a large language model to efficiently understand long-form video

Apple researchers have developed an adapted version of the SlowFast-LLaVA model that beats larger models at long-form video analysis and understanding. Here’s what that means. Very basically, when an ...

TechCrunch

Anthropic’s Claude AI model can now handle longer prompts

Anthropic is increasing the amount of information that enterprise customers can send to Claude in a single prompt, part of an effort to attract more developers to the company’s popular AI coding ...

Semiconductor Engineering

HW-Aligned Sparse Attention Architecture For Efficient Long-Context Modeling (DeepSeek et al.)

A new technical paper titled “Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention” was published by DeepSeek, Peking University and University of Washington.

Some results have been hidden because they may be inaccessible to you

Show inaccessible results