MiniMax-M3 Launches with 1-Million-Token Context Window and Multimodal Capabilities

MiniMax has released its latest model, MiniMax-M3, which offers a context window of 1 million tokens and support for multiple data types. The model is designed to handle long-form documents and complex queries that mix text, images, and other inputs in a single request.

A 1-Million-Token Context Window

The most prominent feature of MiniMax-M3 is its ability to process up to one million tokens at once. That's roughly the equivalent of three full-length novels in a single pass, allowing the model to maintain coherence across very long inputs without losing track of earlier information. For tasks like legal document review, scientific paper analysis, or long chat histories, this kind of capacity could be a major advantage.

Sparse Attention for Efficiency

To make that large context window practical, MiniMax-M3 uses sparse attention. Dense attention computes relationships between every pair of tokens, so its cost grows quadratically with sequence length; at one million tokens that is on the order of a trillion token pairs per attention pass, which would be prohibitively expensive. Sparse attention instead focuses on the most relevant connections. This selective approach cuts computation while still capturing the essential dependencies across the sequence, so the model can run faster and use fewer resources than it would with a dense attention mechanism.
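The announcement does not say which sparsity pattern MiniMax-M3 uses, so the following is only a minimal sketch of one well-known pattern, a causal sliding window, to show where the savings come from. The function names and the 4,096-token window are hypothetical choices for illustration, not details of the model.

```python
def dense_pairs(n: int) -> int:
    """Query-key pairs scored by dense (full) attention over n tokens."""
    return n * n


def sliding_window_pairs(n: int, window: int) -> int:
    """Pairs scored when each token attends only to itself and up to
    `window - 1` preceding tokens (a causal sliding window).
    This pattern is an assumption for illustration, not MiniMax-M3's design."""
    w = min(window, n)
    # The first w tokens see 1, 2, ..., w keys; every later token sees exactly w.
    return w * (w + 1) // 2 + (n - w) * w


def sliding_window_mask(n: int, window: int) -> list[list[bool]]:
    """Boolean mask: mask[i][j] is True when query i may attend to key j."""
    return [[0 <= i - j < window for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    n, window = 1_000_000, 4096  # window size is a hypothetical value
    dense = dense_pairs(n)
    sparse = sliding_window_pairs(n, window)
    print(f"dense pairs:  {dense:,}")
    print(f"sparse pairs: {sparse:,}")
    print(f"reduction:    {dense / sparse:.0f}x")
```

Under these assumptions the sparse count grows linearly with sequence length rather than quadratically, which is the property that makes very long contexts tractable. Real systems often combine several patterns, and which ones apply here is not stated in the announcement.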

Together AI's Role in Deployment

MiniMax didn't build the inference stack alone. Together AI provided optimizations to make large-scale deployment efficient. Their infrastructure is tuned to the demands of a 1-million-token model, taking care of memory management and parallelization so that users can run MiniMax-M3 without needing a supercomputer. The partnership means the model is ready for production use from the start, not just a research demonstration.

What Multimodality Means for Users

Beyond the long context, MiniMax-M3 also handles multiple modalities: text, images, and likely other formats. That means a user could feed it a long report with charts, diagrams, and written analysis, and get a single coherent response that takes all of those parts into account. Because the model can reason across different types of data in the same session, it opens up use cases in design review, medical imaging combined with patient records, or technical documentation that includes screenshots.

MiniMax-M3 is now available through the company's platform and through Together AI's inference service. The model's performance on standard benchmarks hasn't been disclosed yet, but the combination of massive context and multimodality positions it as a contender in the increasingly crowded field of large language models.
