Nucleus-Image goes open source: 17B parameters with only 2B active at inference, surpassing Imagen4 on benchmarks without any post-training


News report, April 16 (UTC+8): according to BlockBeats, the Nucleus AI team has released the text-to-image model Nucleus-Image, open-sourcing the model weights, training code, and training dataset under the Apache 2.0 license, which permits commercial use. The model uses a sparse mixture-of-experts (MoE) diffusion transformer architecture with 17 billion total parameters distributed across 64 routed experts per layer. Only about 2 billion parameters are activated during inference, significantly reducing inference cost compared with dense models of similar size.
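The sparse-MoE idea above can be sketched in a few lines: a gating network scores all 64 experts per token, but only a small top-k subset actually runs, so most expert weights are skipped on every forward pass. This is a minimal illustrative sketch with assumed names (`moe_forward`, `gate_w`, `expert_ws`) and an assumed top-2 routing; the release does not state the actual k.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Sketch of a sparse MoE layer: each token runs only its top-k experts."""
    logits = x @ gate_w                               # (tokens, n_experts) gate scores
    topk = np.argsort(logits, axis=-1)[:, -k:]        # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)   # their logits
    w = np.exp(sel - sel.max(-1, keepdims=True))      # softmax over selected experts only
    w /= w.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for i, (experts, weights) in enumerate(zip(topk, w)):
        for e, wt in zip(experts, weights):           # only k of 64 expert matmuls run
            out[i] += wt * (x[i] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
d, n_experts = 16, 64
x = rng.standard_normal((4, d))                       # 4 tokens
gate_w = rng.standard_normal((d, n_experts))
expert_ws = rng.standard_normal((n_experts, d, d))
y = moe_forward(x, gate_w, expert_ws)
```

With top-2 of 64 experts, only 1/32 of the expert parameters participate per token, which is how a 17B-parameter model can activate only ~2B at inference.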

On three standard benchmarks, Nucleus-Image matches or surpasses leading closed-source models: a GenEval score of 0.87, on par with Qwen-Image, with its spatial-position sub-score (0.85) ranking first among all compared models; a DPG-Bench score of 88.79, first overall; and a OneIG-Bench score of 0.522, ahead of Google Imagen4 (0.515) and Recraft V3 (0.502). All of these results were achieved with pretraining alone, without DPO, reinforcement learning, or human-preference tuning.

Nucleus AI states that this is “the first fully open-source MoE diffusion model at this quality level.” The training data was crawled at scale from the internet, then passed through multiple rounds of filtering, deduplication, and aesthetic scoring, yielding 700 million images and 1.5 billion image-text pairs. Training proceeded in three stages, gradually raising resolution from 256×256 to 1024×1024, for a total of 1.7 million steps.
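The filter-then-deduplicate pipeline described above can be sketched as follows. All names (`filter_corpus`, the `phash` field for near-duplicate detection, the score threshold) are illustrative assumptions, not the team's actual tooling:

```python
def filter_corpus(records, aesthetic_score, min_score=5.0):
    """Keep one copy per near-duplicate group, then apply an aesthetic cutoff."""
    seen_hashes = set()
    kept = []
    for rec in records:
        if rec["phash"] in seen_hashes:        # perceptual-hash match: drop repeat
            continue
        seen_hashes.add(rec["phash"])
        if aesthetic_score(rec) >= min_score:  # aesthetic-model filter
            kept.append(rec)
    return kept

corpus = [
    {"phash": "a1", "quality": 6.2},
    {"phash": "a1", "quality": 6.2},   # near-duplicate, dropped
    {"phash": "b2", "quality": 3.1},   # below aesthetic bar, dropped
    {"phash": "c3", "quality": 7.8},
]
clean = filter_corpus(corpus, lambda r: r["quality"])
# → keeps the two unique, high-scoring records ("a1" and "c3")
```

At the reported scale, real pipelines batch this over distributed workers, but the pass structure is the same: dedup first so the aesthetic model is not run on repeats.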

The text encoder is Qwen3-VL-8B-Instruct, called via the diffusers library, with a built-in text KV cache shared across denoising steps: because the prompt embedding does not change during sampling, its cross-attention keys and values can be computed once and reused, further reducing inference overhead.
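A minimal sketch of that caching idea, with assumed names (`make_cross_kv`, `cross_attention`) and toy dimensions: the text-side keys/values are projected once before the loop, while the latent-side queries change at every denoising step.

```python
import numpy as np

def make_cross_kv(text_emb, wk, wv):
    # Prompt embedding is fixed for the whole sampling run,
    # so its cross-attention K/V can be computed once and cached.
    return text_emb @ wk, text_emb @ wv

def cross_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(scores - scores.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)                 # softmax over text tokens
    return p @ v

rng = np.random.default_rng(1)
d = 32
text_emb = rng.standard_normal((77, d))           # encoded prompt, 77 tokens
wk, wv = rng.standard_normal((d, d)), rng.standard_normal((d, d))
k_cache, v_cache = make_cross_kv(text_emb, wk, wv)  # computed once, before the loop

for step in range(4):                             # denoising loop reuses the cache
    q = rng.standard_normal((256, d))             # latent queries change each step
    out = cross_attention(q, k_cache, v_cache)
```

The saving scales with the number of denoising steps: the text-side projections are paid once instead of once per step.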

For developers who need local image generation, the 17B-total / 2B-active design means per-step compute is roughly that of a 2B dense model, putting consumer-grade GPUs within reach (the full 17B weights must still be stored, so quantization or offloading may be needed on smaller cards). A complete open-source release (weights + training code + dataset) is rare: most open-source image models release only weights, keeping datasets and training details closed, which remains one of the main bottlenecks for reproducible text-to-image research.

(Source: BlockBeats)
