Full metadata
Title
Full-stack web application that utilizes AI to create music from a short text description.
Description
This work combines multiple technologies to produce a scalable, full-stack music generation and sharing application meant to be deployed to a cloud environment while keeping operating costs as low as possible. The key feature of the app is that it lets users generate tracks from scratch by providing a text description, or customize existing tracks by supplying both an audio file and a track description. Users can share these tracks with other users through the app, so that they can collaborate and jumpstart their creative process, allowing creators to produce more content for their fans. The resulting web app, Contak, requires a database, a REST API, object storage, music generation artificial intelligence (AI) models, and a web application (GUI) to interact with the user. To select the best music generation model, a small exploratory study was conducted to compare the quality of three music generation models: MusicGen, MusicLM, and Riffusion. The study ranked the three models by how well each one adhered to a text description of a track. Its purpose was to test the hypothesis that MusicGen produces higher-quality music that adheres to text descriptions better than other models because it encodes audio at a higher sample rate (32 kHz). The results showed that MusicGen, the model selected for this work, outperformed both MusicLM and Riffusion. While the web app generates high-quality tracks with above-average text adherence, the main limitation of this work is the response time needed to generate tracks from existing audio on the currently available backend infrastructure, which can take up to 7 minutes. In the future, the app can be deployed to a cloud environment with GPU acceleration to improve response times and throughput.
Additionally, input methods beyond text and audio could be implemented, such as MIDI instructions processed with the Magenta music model, giving advanced music creators with MIDI experience more precise control over track generation.
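The two generation modes described above (text only, or text plus an existing audio file) imply a routing step in the backend. The following is a minimal sketch under stated assumptions: it supposes a request carries a text description and an optional reference to an uploaded audio file in object storage. The names `GenerationRequest`, `audio_key`, `choose_pipeline`, and the pipeline labels are hypothetical and are not taken from Contak's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical request shape; field names are illustrative only.
@dataclass
class GenerationRequest:
    description: str                 # text description of the desired track
    audio_key: Optional[str] = None  # assumed object-storage key of an uploaded track


def choose_pipeline(req: GenerationRequest) -> str:
    """Return the name of a hypothetical generation pipeline for this request."""
    text = req.description.strip()
    if not text:
        # Both modes in the description require a text description.
        raise ValueError("a text description is required")
    if req.audio_key:
        # Text plus existing audio: customize the supplied track (the slower path).
        return "audio_conditioned"
    # Text only: generate a new track from scratch.
    return "text_to_music"
```

For example, `choose_pipeline(GenerationRequest("calm piano"))` selects the text-only path, while adding `audio_key="uploads/track.wav"` selects the audio-conditioned path. Keeping this decision in one small function makes it easy to send the slower audio-conditioned requests to different (for example, GPU-backed) workers later.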
Date Created
2023-12
Contributors
- Zamora, Michael (Author)
- Chavez Echeagaray, Maria (Thesis director)
- Prim, Tadi (Committee member)
- Day, Kimberly (Committee member)
- Barrett, The Honors College (Contributor)
- Computer Science and Engineering Program (Contributor)
Extent
26 pages
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Series
Academic Year 2023-2024
Handle
https://hdl.handle.net/2286/R.2.N.190271
System Created
- 2023-11-17 07:13:34
System Modified
- 2023-11-21 05:39:03