Prim, Tadi | KEEP

Description


This work combines multiple technologies to produce a scalable, full-stack music generation and sharing application intended for deployment to a cloud environment while keeping operating costs as low as possible. The key feature of the app is that it lets users generate tracks from scratch by providing a text description, or customize existing tracks by supplying both an audio file and a track description. Users can share these tracks with other users through the app, so they can collaborate and jumpstart their creative process, allowing creators to produce more content for their fans. The resulting web app, Contak, requires a database, a REST API, object storage, music generation artificial intelligence models, and a web application with a graphical user interface (GUI) for interacting with the user. To select the best music generation model, a small exploratory study was conducted to compare the quality of three models: MusicGen, MusicLM, and Riffusion. The study ranked the three models on how well each one adhered to a text description of a track. Its purpose was to test the hypothesis that MusicGen produces higher-quality music that adheres to text descriptions better than the other models because it encodes audio at a higher sample rate (32 kHz). The results showed that MusicGen, the model selected for this work, outperformed both MusicLM and Riffusion. While the web app generates high-quality tracks with above-average text adherence, the main limitation of this work is the response time needed to generate tracks from existing audio on the currently available backend infrastructure, which can take up to 7 minutes. In the future, the app can be deployed to a cloud environment with GPU acceleration to improve response times and throughput.
Additionally, input methods beyond text and audio, such as MIDI instructions processed with the Magenta music model, could be implemented to give advanced music creators with MIDI experience more precise control over track generation.

Date Created

2023-12

Agent

Author (aut): Zamora, Michael
Thesis director: Chavez Echeagaray, Maria
Committee member: Prim, Tadi
Committee member: Day, Kimberly
Contributor (ctb): Barrett, The Honors College
Contributor (ctb): Computer Science and Engineering Program