Diversity-Rewarded CFG Distillation


Abstract

Generative models are transforming creative domains such as music generation, with inference-time strategies like Classifier-Free Guidance (CFG) playing a crucial role. However, CFG doubles inference cost while limiting originality and diversity across generated content. In this paper, we introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations. Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt. By finetuning, we learn model weights with the ability to generate high-quality and diverse outputs, without any inference overhead. This also unlocks the potential of weight-based model merging strategies: by interpolating between the weights of two models (the first focusing on quality, the second on diversity), we can control the quality-diversity trade-off at deployment time, and even further boost performance. We conduct extensive experiments on the MusicLM (Agostinelli et al., 2023) text-to-music generative model, where our approach surpasses CFG in terms of quality-diversity Pareto optimality. According to human evaluators, our finetuned-then-merged model generates samples with a better quality-diversity trade-off than the base model augmented with CFG.

Geoffrey Cideron, Andrea Agostinelli, Johan Ferret, Sertan Girgin, Romuald Elie, Olivier Bachem, Sarah Perrin* and Alexandre Ramé*

Google DeepMind

* Equal advisory contribution

Corresponding author: gcideron@google.com

Comparing the diversity of generations across different models.

BASE: MusicRL-R, an RL-finetuned checkpoint of MusicLM (Cideron et al., 2024).
CFG: BASE + CFG at inference time; improves quality at the expense of diversity and inference cost.
BETA 0: CFG-distilled with a focus on quality; provides the same quality as applying CFG at inference time, without the additional inference cost.
BETA 15: CFG-distilled + diversity reward; provides more diversity than applying CFG at inference time.
LERP 50/50: Our best model; achieves the quality of applying CFG at inference time, while also offering greater diversity and lower inference cost.
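
To make the CFG and LERP entries above concrete, here is a minimal sketch of the two operations they refer to. It is an illustration under stated assumptions, not the authors' implementation: `cfg_logits` uses a common formulation of CFG for autoregressive models (extrapolating from unconditional towards conditional logits with a guidance scale `gamma`), and `lerp_weights` linearly interpolates two checkpoints parameter by parameter. The function names, the flat dict-of-lists weight format, and the reading of "50/50" as an interpolation coefficient of 0.5 are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def cfg_logits(cond: List[float], uncond: List[float], gamma: float) -> List[float]:
    """Combine conditional and unconditional logits with guidance scale gamma.

    Assumed (common) formulation: uncond + gamma * (cond - uncond).
    gamma = 1 recovers the conditional logits. Both inputs require a
    forward pass, which is why CFG doubles inference cost.
    """
    if len(cond) != len(uncond):
        raise ValueError("logit vectors must have the same length")
    return [u + gamma * (c - u) for c, u in zip(cond, uncond)]


def lerp_weights(quality: Weights, diversity: Weights, lam: float) -> Weights:
    """Linearly interpolate two checkpoints with identical parameter names.

    lam = 0 returns the quality-focused weights, lam = 1 the
    diversity-focused weights, and lam = 0.5 an equal mix.
    """
    if quality.keys() != diversity.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: Weights = {}
    for name, q in quality.items():
        d = diversity[name]
        if len(q) != len(d):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [(1.0 - lam) * qi + lam * di for qi, di in zip(q, d)]
    return merged


if __name__ == "__main__":
    # Toy stand-ins for a quality-focused and a diversity-focused checkpoint.
    quality_ckpt = {"layer.w": [0.0, 2.0]}
    diversity_ckpt = {"layer.w": [2.0, 4.0]}
    print(lerp_weights(quality_ckpt, diversity_ckpt, lam=0.5))
    print(cfg_logits(cond=[1.0, 2.0], uncond=[0.5, 0.5], gamma=3.0))
```

The merged checkpoint is a single set of weights, so sampling from it needs one forward pass per step, whereas `cfg_logits` needs both a conditional and an unconditional pass.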

Caption	BASE	CFG	BETA 0	BETA 15	LERP 50/50