Nucleus-Image goes open source: 17B parameters with only 2B active at inference, surpassing Imagen4 on benchmarks without any post-training


News report, April 16 (UTC+8): according to BlockBeats, the Nucleus AI team has released the text-to-image model Nucleus-Image, open-sourcing the model weights, training code, and training dataset under the Apache 2.0 license, which permits commercial use. The model uses a sparse mixture-of-experts (MoE) diffusion transformer architecture with 17 billion total parameters distributed across 64 routed experts per layer. Only about 2 billion parameters are activated during inference, significantly reducing inference cost compared with dense models of similar size.
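The sparse-MoE idea above can be sketched in a few lines: a gating network scores all 64 experts per token, but only a small top-k subset actually runs, so most expert weights are skipped on every forward pass. This is a minimal illustrative sketch with assumed names (`moe_forward`, `gate_w`, `expert_ws`) and an assumed top-2 routing; the release does not state the actual k.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Sketch of a sparse MoE layer: each token runs only its top-k experts."""
    logits = x @ gate_w                               # (tokens, n_experts) gate scores
    topk = np.argsort(logits, axis=-1)[:, -k:]        # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)   # their logits
    w = np.exp(sel - sel.max(-1, keepdims=True))      # softmax over selected experts only
    w /= w.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for i, (experts, weights) in enumerate(zip(topk, w)):
        for e, wt in zip(experts, weights):           # only k of 64 expert matmuls run
            out[i] += wt * (x[i] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
d, n_experts = 16, 64
x = rng.standard_normal((4, d))                       # 4 tokens
gate_w = rng.standard_normal((d, n_experts))
expert_ws = rng.standard_normal((n_experts, d, d))
y = moe_forward(x, gate_w, expert_ws)
```

With top-2 of 64 experts, only 1/32 of the expert parameters participate per token, which is how a 17B-parameter model can activate only ~2B at inference.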

On three standard benchmarks, Nucleus-Image matches or surpasses leading closed-source models: a GenEval score of 0.87, on par with Qwen-Image, with its spatial-position sub-score (0.85) ranking first among all compared models; a DPG-Bench score of 88.79, first overall; and a OneIG-Bench score of 0.522, ahead of Google Imagen4 (0.515) and Recraft V3 (0.502). All of these results were achieved with pretraining alone, without DPO, reinforcement learning, or human-preference tuning.

Nucleus AI states that this is “the first fully open-source MoE diffusion model at this quality level.” The training data was crawled at scale from the internet, then passed through multiple rounds of filtering, deduplication, and aesthetic scoring, yielding 700 million images and 1.5 billion image-text pairs. Training proceeded in three stages, gradually raising resolution from 256×256 to 1024×1024, for a total of 1.7 million steps.
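The filter-then-deduplicate pipeline described above can be sketched as follows. All names (`filter_corpus`, the `phash` field for near-duplicate detection, the score threshold) are illustrative assumptions, not the team's actual tooling:

```python
def filter_corpus(records, aesthetic_score, min_score=5.0):
    """Keep one copy per near-duplicate group, then apply an aesthetic cutoff."""
    seen_hashes = set()
    kept = []
    for rec in records:
        if rec["phash"] in seen_hashes:        # perceptual-hash match: drop repeat
            continue
        seen_hashes.add(rec["phash"])
        if aesthetic_score(rec) >= min_score:  # aesthetic-model filter
            kept.append(rec)
    return kept

corpus = [
    {"phash": "a1", "quality": 6.2},
    {"phash": "a1", "quality": 6.2},   # near-duplicate, dropped
    {"phash": "b2", "quality": 3.1},   # below aesthetic bar, dropped
    {"phash": "c3", "quality": 7.8},
]
clean = filter_corpus(corpus, lambda r: r["quality"])
# → keeps the two unique, high-scoring records ("a1" and "c3")
```

At the reported scale, real pipelines batch this over distributed workers, but the pass structure is the same: dedup first so the aesthetic model is not run on repeats.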

The text encoder is Qwen3-VL-8B-Instruct, called via the diffusers library, with a built-in text KV cache shared across denoising steps: because the prompt embedding does not change during sampling, its cross-attention keys and values can be computed once and reused, further reducing inference overhead.
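A minimal sketch of that caching idea, with assumed names (`make_cross_kv`, `cross_attention`) and toy dimensions: the text-side keys/values are projected once before the loop, while the latent-side queries change at every denoising step.

```python
import numpy as np

def make_cross_kv(text_emb, wk, wv):
    # Prompt embedding is fixed for the whole sampling run,
    # so its cross-attention K/V can be computed once and cached.
    return text_emb @ wk, text_emb @ wv

def cross_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(scores - scores.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)                 # softmax over text tokens
    return p @ v

rng = np.random.default_rng(1)
d = 32
text_emb = rng.standard_normal((77, d))           # encoded prompt, 77 tokens
wk, wv = rng.standard_normal((d, d)), rng.standard_normal((d, d))
k_cache, v_cache = make_cross_kv(text_emb, wk, wv)  # computed once, before the loop

for step in range(4):                             # denoising loop reuses the cache
    q = rng.standard_normal((256, d))             # latent queries change each step
    out = cross_attention(q, k_cache, v_cache)
```

The saving scales with the number of denoising steps: the text-side projections are paid once instead of once per step.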

For developers who need local image generation, the 17B-total / 2B-active design means per-step compute is roughly that of a 2B dense model, putting consumer-grade GPUs within reach (the full 17B weights must still be stored, so quantization or offloading may be needed on smaller cards). A complete open-source release (weights + training code + dataset) is rare: most open-source image models release only weights, keeping datasets and training details closed, which remains one of the main bottlenecks for reproducible text-to-image research.

(Source: BlockBeats)
