Designing for "Vibes"
Text-To-Music AI Generation Interface Offers New Options for Content Creators

Anyone who has ever worked to carefully craft a video, whether to commemorate a family event or immortalize an amusing moment with a pet, knows that getting all of the elements just right can be a challenge. While a 30-second TikTok might be able to take advantage of trending sounds, many content creators struggle with the amount of time it takes to find background music that matches the tone they’re looking for.
New work led by researchers from Carnegie Mellon University’s Center for Transformational Play may eliminate the need for time-consuming music searches across multiple data fields and instead allow editors to search by “vibes.”
“Video content creators really care about the music they use in their videos, even if it’s background music. The problem is the music search process in existing platforms can be complicated,” said Noor Hammad. Hammad is a PhD student studying Human-Computer Interaction at CMU and recently collaborated with Adobe on research developing a new kind of text-to-music artificial intelligence interface.
“In existing music search options, it’s difficult to articulate music preferences in terms of specific music descriptors, such as genre, instruments, tempo, or the components of a song,” Hammad said. “Also, a lot of the music creators want is not freely available for use; they would have to purchase it. Stock music platforms have a limited library. These problems result in people reusing music or just settling for something, even if they don’t want that song.”
A formative study with video creators indicated that all of the participants faced challenges in articulating and iterating on musical preferences and described music as “vibes,” rather than with explicit musical vocabulary. Guided by these insights, Hammad and her collaborators developed a creative assistant for music generation nicknamed “VibeClip.” This AI-supported interface allows content creators to express music in terms of the emotional resonance or “vibe” that they’re looking for.
The assistant begins with part of a video clip selected by a creator. The system uses the video’s transcript and visual data as the initial input. Creators can then modify or rewrite the description as they wish.
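The article doesn’t describe how this seeding step is implemented. As a rough illustration only, a minimal Python sketch of combining a clip’s transcript and visual data into an editable starting description might look like the following; ClipContext and draft_description are hypothetical names, and the real system presumably relies on speech-to-text and vision models rather than simple string joining.

```python
from dataclasses import dataclass

@dataclass
class ClipContext:
    transcript: str      # speech-to-text output for the selected clip (assumed upstream)
    visual_summary: str  # caption-style summary of sampled frames (assumed upstream)

def draft_description(ctx: ClipContext) -> str:
    """Combine transcript and visual cues into an editable starting description.

    Placeholder for whatever summarization the real system performs; here the
    two signals are simply joined so the creator has a seed they can rewrite.
    """
    return f'A clip showing {ctx.visual_summary}, where someone says: "{ctx.transcript}"'

# The creator sees the draft and is free to modify or replace it entirely.
ctx = ClipContext(
    transcript="We finally made it to the coast!",
    visual_summary="two friends walking along a beach at sunset",
)
print(draft_description(ctx))
```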
“For example, rather than specifying you’re looking for a Lo-Fi song with 30 beats per minute and a guitar, you could type: ‘I want a song that makes me feel chill, like I’m on a beach somewhere,’ and the system would figure out what that description means and what genres and other information to search for,” Hammad said.
The “vibe” search then returns several AI-generated music tracks that can be previewed alongside the video to determine which one best meets the creator’s needs.
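The article doesn’t detail VibeClip’s internals, so the following is only a minimal sketch, assuming the vibe prompt is first translated into structured music descriptors and then used to generate several candidate tracks; MusicSpec, vibe_to_spec, and generate_candidates are hypothetical names, and the keyword lookup and placeholder track IDs stand in for the language model and text-to-music generator a real system would call.

```python
from dataclasses import dataclass

@dataclass
class MusicSpec:
    genre: str
    tempo_bpm: int
    instruments: list[str]
    mood: str

def vibe_to_spec(vibe_prompt: str) -> MusicSpec:
    """Map a free-form "vibe" description to concrete music descriptors.

    A real system would hand this to a language model; a tiny keyword
    lookup stands in for that step here.
    """
    text = vibe_prompt.lower()
    if "chill" in text or "beach" in text:
        return MusicSpec("lo-fi", 80, ["guitar", "soft drums"], "relaxed")
    return MusicSpec("ambient", 90, ["piano"], "neutral")

def generate_candidates(spec: MusicSpec, n: int = 3) -> list[str]:
    """Request several candidate tracks from a text-to-music generator.

    Returns placeholder track IDs; the actual generation call is assumed.
    """
    return [f"track_{spec.genre}_{spec.tempo_bpm}bpm_{i}" for i in range(n)]

vibe = "I want a song that makes me feel chill, like I'm on a beach somewhere"
spec = vibe_to_spec(vibe)
for track in generate_candidates(spec):
    print(track)  # each candidate would be previewed alongside the video clip
```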

The Center for Transformational Play’s Director, Jessica Hammer, served as a faculty advisor on the project. Hammer said she knows that how generative AI is used raises ethical questions, so she appreciates that the research team considered the ethics of both the data the system uses and how the assistant is designed to augment creatives.
“This is an example of AI being a partner for a human creator. It’s not trying to replace what the human creator is doing but rather to help them realize their vision,” Hammer said.
Hammad and her co-researchers presented their work last month in Portugal at the 2025 ACM Designing Interactive Systems (DIS) conference. The international event is focused on the future of interaction design and human-computer interaction. Hammad said the reception to the team’s work was extremely positive and generated a lot of interest among creatives who are excited about the possibility of one day having a tool like VibeClip integrated into their editing platforms.
“The goal of this project isn’t to replace stock musicians. It’s to offer content creators a means of controllable music specification,” Hammad said. “Even if you’re not a music expert, you’re still opinionated about the kind of music you want. With VibeClip we’re able to give people the vocabulary to express that opinion.”
Watch the 2:27 trailer about VibeClip
For More Information
CMU Center for Transformational Play
Maila Rible | 336-906-0103 | mrible@cmu.edu