AI Music Glossary

A comprehensive reference for the terms used in AI music production and distribution in 2026. Ordered alphabetically.

Audio Language Model: A type of AI model that generates audio by predicting sequences of audio tokens, similar to how text language models predict word sequences. Suno and Udio are widely believed to use architectures of this kind, although neither has published full technical details. These models are well suited to producing complete, structured compositions including melody, harmony, rhythm, and vocals.
Audio Token: A discrete numerical representation of a small segment of audio, analogous to a word token in text models. Audio language models convert music into sequences of tokens, train to predict the next token, and decode token sequences back into audio at generation time.
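The idea of turning audio into discrete tokens and back can be illustrated with a toy example. The sketch below uses 8-bit mu-law quantisation as a stand-in tokenizer; this is an assumption made for illustration only, since production audio language models generally use learned neural codecs rather than a fixed formula. The function names encode and decode are hypothetical.

```python
import math

MU = 255  # 8-bit mu-law: 256 possible token ids (0..255)

def encode(sample: float) -> int:
    """Map one audio sample in [-1.0, 1.0] to an integer token id."""
    sample = max(-1.0, min(1.0, sample))
    # Compress the amplitude logarithmically, keeping the sign.
    compressed = math.copysign(math.log1p(MU * abs(sample)) / math.log1p(MU), sample)
    # Rescale from [-1, 1] to [0, 255] and round to the nearest id.
    return int(round((compressed + 1.0) / 2.0 * MU))

def decode(token: int) -> float:
    """Map a token id back to an approximate sample value."""
    compressed = 2.0 * token / MU - 1.0
    return math.copysign(math.expm1(abs(compressed) * math.log1p(MU)) / MU, compressed)

# A few samples of a 440 Hz sine wave at 8 kHz stand in for real audio.
samples = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8)]
tokens = [encode(s) for s in samples]          # audio -> token sequence
reconstructed = [decode(t) for t in tokens]    # token sequence -> audio
```

Decoding is lossy: each reconstructed sample is close to the original but not identical, which is the trade-off any discrete representation makes.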
BPM (Beats Per Minute): The tempo measurement of music: how many beats occur in one minute. Typical ranges: EDM 120–140 BPM; techno 130–145 BPM; trap 120–160 BPM; lo-fi hip-hop 65–85 BPM. Ambient music often has no fixed tempo, so a BPM value may not apply. Specifying BPM in AI music prompts gives the model an explicit tempo target.
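BPM converts directly into time: one beat lasts 60 / BPM seconds. The sketch below shows that arithmetic; the function names are hypothetical and the default of four beats per bar assumes 4/4 time.

```python
def beat_seconds(bpm: float) -> float:
    """Duration of one beat in seconds at the given tempo."""
    if bpm <= 0:
        raise ValueError("BPM must be positive")
    return 60.0 / bpm

def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Duration of one bar in seconds (4/4 time assumed by default)."""
    return beats_per_bar * beat_seconds(bpm)

# At 120 BPM a beat lasts half a second and a 4/4 bar lasts two seconds.
half_second = beat_seconds(120)
two_seconds = bar_seconds(120)
```

This is useful when planning section lengths: a 16-bar intro at 120 BPM runs 16 × 2 = 32 seconds.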
Chaining (Prompt Chaining): A technique for building longer AI tracks by generating shorter segments sequentially, each based on the previous one. Rather than generating a full 4-minute track in one prompt, you generate an intro, extend it into a build, add a drop, and continue — giving structural control not possible in single generations.
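The control flow of chaining can be sketched as a simple loop in which each segment is generated with the previous one as context. The generate_segment function here is a hypothetical stub, not any platform's real API; in practice it would be replaced by a platform's own extend or continue feature.

```python
from typing import List, Optional

def generate_segment(prompt: str, previous: Optional[dict]) -> dict:
    """Hypothetical stand-in for a platform's generate/extend call.

    It only records which segment it continues from, so the chaining
    logic can be followed without calling a real service.
    """
    return {
        "prompt": prompt,
        "continues": previous["prompt"] if previous else None,
    }

def chain(prompts: List[str]) -> List[dict]:
    """Generate segments in order, passing each one to the next call."""
    segments: List[dict] = []
    previous: Optional[dict] = None
    for prompt in prompts:
        segment = generate_segment(prompt, previous)
        segments.append(segment)
        previous = segment
    return segments

track = chain([
    "sparse intro, filtered pads",
    "build, rising snare roll",
    "drop, full bass and drums",
    "outro, elements fading out",
])
```

Because each step is a separate generation, a weak segment can be regenerated on its own without discarding the rest of the track.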
Commercial Rights: The legal permission to use AI-generated music for commercial purposes: Spotify distribution, YouTube monetisation, sync licensing, advertising, and other revenue-generating uses. Suno and Udio grant commercial rights at paid tiers. Free tiers typically permit personal use only.
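Content ID: YouTube's automated copyright enforcement system. It matches uploaded audio against a database of registered tracks. A track you generate has no registration until someone registers it, so Content ID will not normally match against it. Claims can still occur if the track, or a near-identical generation, is later registered by a distributor or another user, so AI music is lower-risk for YouTube content rather than risk-free.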
Demucs: An open-source AI-based audio source separation tool developed by Meta. Takes a mixed track as input and separates it into stems: vocals, drums, bass, and other instruments. Used by professional AI music producers to extract and edit individual elements from AI-generated tracks.
Diffusion Model: An AI model architecture that generates audio by starting from random noise and iteratively denoising it toward a target signal, guided by a text prompt. Stable Audio uses a latent diffusion approach. Meta's AudioCraft is built mainly around MusicGen, which is an autoregressive token model, and uses diffusion only in its optional MultiBand Diffusion decoder. Strong for atmospheric textures, electronic music, and ambient soundscapes.
Distribution (Music Distribution): The process of delivering music to streaming platforms. Distributors like DistroKid, TuneCore, and CD Baby accept AI-generated music (with disclosure where required) and distribute to Spotify, Apple Music, Amazon Music, YouTube Music, and 50+ other platforms. Costs are typically in the region of $15–$25, charged either as an annual subscription or as a one-time fee per release, depending on the distributor.
GAN (Generative Adversarial Network): An older AI architecture in which two networks compete: a generator that creates content and a discriminator that tries to identify it as fake. Earlier AI music tools used GANs. Current state-of-the-art tools (Suno, Udio) are generally understood to use transformer-based audio language models or diffusion models, which tend to produce longer and more coherent output.
Latent Space: The mathematical space in which AI models represent audio or music. Different points in latent space correspond to different sounds or styles. Some AI tools allow navigation of latent space (interpolating between styles) for more controlled generation.
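Interpolating between styles amounts to blending two latent vectors. The sketch below shows plain linear interpolation on short made-up vectors; real latent vectors have many more dimensions, and whether a given tool exposes them at all is an assumption here.

```python
from typing import List

def interpolate(a: List[float], b: List[float], t: float) -> List[float]:
    """Linear interpolation between latent vectors a and b.

    t = 0.0 returns a, t = 1.0 returns b, values in between blend the two.
    """
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

# Made-up 2-dimensional "style" vectors, purely for illustration.
style_a = [0.0, 0.0]
style_b = [2.0, 4.0]
midpoint = interpolate(style_a, style_b, 0.5)
```

A decoder would then turn each interpolated vector back into audio, producing a gradual morph from one style to the other.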
LUFS (Loudness Units relative to Full Scale): The standard measurement for audio loudness in streaming. Spotify and YouTube normalise playback to around -14 LUFS; Apple Music targets around -16 LUFS. Professional AI music producers master their tracks to platform-appropriate loudness levels. Raw AI output varies in loudness from one generation to the next and usually benefits from mastering.
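Once a track's integrated loudness has been measured, the gain needed to reach a target is simple arithmetic. The sketch below assumes the measurement itself comes from a loudness meter (not shown); the function names are hypothetical.

```python
def normalisation_gain_db(measured_lufs: float, target_lufs: float) -> float:
    """Gain in dB that moves a track from its measured loudness to the target."""
    return target_lufs - measured_lufs

def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)

# A track measured at -9 LUFS needs 5 dB of attenuation to reach -14 LUFS.
gain = normalisation_gain_db(measured_lufs=-9.0, target_lufs=-14.0)
multiplier = db_to_linear(gain)  # amplitude scale factor below 1.0
```

A positive result means the track is quieter than the target. Applying positive gain can push peaks into clipping, which is one reason a limiter is normally part of mastering.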
Mastering: The final audio processing step that brings a track to professional loudness, frequency balance, and clarity for distribution. AI-generated tracks typically require mastering before streaming platform distribution. Tools: LANDR (AI mastering), iZotope Ozone, or manual mastering in a DAW.
MusicGen: An open-source audio generation model developed by Meta's AI research team as part of the AudioCraft framework. Generates clips of up to about 30 seconds from text prompts, optionally conditioned on a melody. Freely available on Hugging Face; can run locally or on Google Colab.
PRO (Performing Rights Organisation): Organisations (ASCAP, BMI in the US; PRS in the UK; SOCAN in Canada) that collect royalties when music is played publicly (radio, streaming, live venues). AI-generated music can be registered with PROs, but royalty collection depends on copyright status in the relevant jurisdiction.
Prompt Engineering: The practice of designing effective text prompts to guide AI models toward desired outputs. In AI music, prompt engineering involves specifying genre, BPM, instruments, mood, energy, structure, and reference anchors so that results are more consistent and closer to the intended sound. Prompt quality is one of the main differentiators between amateur and professional AI music production.
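One way to keep prompts consistent is to assemble them from named fields rather than writing free text each time. The sketch below is a hypothetical helper, not a format required by any platform; the comma-separated layout is an assumption.

```python
from typing import List

def build_prompt(genre: str, bpm: int, instruments: List[str],
                 mood: str, structure: str) -> str:
    """Join the prompt fields into one comma-separated string.

    Empty fields are skipped so the result never contains stray commas.
    """
    parts = [genre, f"{bpm} BPM", ", ".join(instruments), mood, structure]
    return ", ".join(part for part in parts if part)

prompt = build_prompt(
    genre="melodic techno",
    bpm=128,
    instruments=["analog bass", "hi-hats"],
    mood="dark",
    structure="long intro",
)
```

Keeping the fields separate makes it easy to change one variable at a time, for example the mood, and compare the resulting generations.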
Stable Audio: A family of AI audio generation models developed by Stability AI. Stable Audio Open is a freely downloadable open-weights model that generates up to 47 seconds of audio from text prompts. Strong for electronic and ambient music. Stable Audio Open is available on Hugging Face; Stability AI also offers Stable Audio as a commercial API.
Stem: An individual isolated track element — vocals, drums, bass, melody — separated from a mixed recording. Stems allow post-production editing of individual components. AI tools like Demucs extract stems from generated tracks; some AI tools (Udio) offer stem export directly.
Streaming Royalty: Payment from streaming platforms per play of a track. Spotify pays approximately $0.003–$0.005 per stream; Apple Music approximately $0.007–$0.01. These figures apply equally to AI-generated and human-made music. Distribution, not origin, determines royalty eligibility.
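The per-stream ranges above translate into a rough payout estimate by multiplication. The sketch below uses the approximate rates quoted in this entry as inputs; actual payouts vary by platform, listener region, and distributor terms, and the function name is hypothetical.

```python
from typing import Tuple

def estimate_payout(streams: int, rate_low: float, rate_high: float) -> Tuple[float, float]:
    """Return a (low, high) payout estimate in dollars for a stream count."""
    return streams * rate_low, streams * rate_high

# 100,000 streams at the approximate Spotify range quoted in this entry.
spotify_low, spotify_high = estimate_payout(100_000, 0.003, 0.005)

# The same stream count at the approximate Apple Music range.
apple_low, apple_high = estimate_payout(100_000, 0.007, 0.01)
```

At these rates 100,000 streams come to roughly $300–$500 on Spotify and $700–$1,000 on Apple Music, before any distributor fees.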
Suno: One of the leading AI music generation platforms (suno.com). Generates complete songs including vocals, melody, and production from text prompts. Known for accessibility, speed, and strong performance in pop, hip-hop, and EDM genres. Paid tiers include commercial rights.
Timbre: The characteristic quality of a sound that distinguishes it from other sounds of the same pitch. A piano and a guitar playing the same note have different timbres. In AI music prompting, instrument names and descriptions guide the model's timbral choices.
Transformer (Architecture): The neural network architecture underlying most modern AI models, including audio language models. Transformers process sequences of tokens using attention mechanisms, allowing them to capture long-range dependencies in music — structure, tension, and resolution across a full track.
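The attention mechanism can be illustrated with a minimal scaled dot-product attention in plain Python. This is a teaching sketch on tiny hand-written vectors: it omits learned projections, multiple heads, and masking, all of which real transformers use.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries: List[Vector], keys: List[Vector],
              values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query gets a weighted mix of values."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Two identical keys receive equal weight, so the output is the mean of the values.
mixed = attention(queries=[[1.0, 0.0]],
                  keys=[[1.0, 0.0], [1.0, 0.0]],
                  values=[[0.0], [2.0]])
```

Because every position can attend to every other position, a token late in a track can draw directly on material from the opening, which is what allows long-range structure to be modelled.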
Udio: An AI music generation platform (udio.com) known for high-quality outputs and strong genre precision, particularly in electronic music. Generates complete tracks from text prompts with extension and stem separation tools. Paid tiers include commercial rights.