Specify the genre, tempo, mood, instruments, language, and voice type. If you provide image or video inputs, include clear instructions on how the system should interpret the visual elements.

Step 3: ...