Key to synthetic data: pair a synthetic part with a real part.

E.g. DALL-E 3 was trained on real images with generated captions, or real captions with generated images (see interview with James Betker).
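
A minimal sketch of that pairing idea, to make it concrete. Everything here is hypothetical: `Pair`, `pair_real_images`, `pair_real_captions`, and the toy captioner/generator are stand-ins for illustration, not how DALL-E 3 was actually built. The point is only that each training example keeps one real side and one synthetic side.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Pair:
    """One training example: one side is real, the other is generated."""
    image: str           # path or ID of the image (hypothetical representation)
    caption: str         # text paired with the image
    synthetic_side: str  # "caption" or "image"


def pair_real_images(images: Iterable[str],
                     captioner: Callable[[str], str]) -> List[Pair]:
    """Real images + generated captions."""
    return [Pair(img, captioner(img), "caption") for img in images]


def pair_real_captions(captions: Iterable[str],
                       generator: Callable[[str], str]) -> List[Pair]:
    """Real captions + generated images."""
    return [Pair(generator(cap), cap, "image") for cap in captions]


# Toy stand-ins for real models (assumptions, for illustration only).
def toy_captioner(image: str) -> str:
    return f"generated caption for {image}"


def toy_generator(caption: str) -> str:
    return f"generated_image[{caption}]"


dataset = (
    pair_real_images(["cat.jpg", "dog.jpg"], toy_captioner)
    + pair_real_captions(["a red bicycle"], toy_generator)
)
```

Tracking `synthetic_side` per example makes it easy to check later that no pair ended up fully synthetic.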

Reminds me of the grounding term from this paper: ^84b57c