You’ve Been Listening to AI Music for 25 Years

Stylized cover image for the article. — Photo by Florian Schmetz on Unsplash.

As end-of-year song lists circulated, I checked the industry-standard Billboard Hot 100 to see what topped 2025. I listened to the top five, and all but one sounded absolutely perfect to my trained ears. So perfect that something felt not-quite-right. The exception was Shaboozey’s “A Bar Song (Tipsy),” a good old-fashioned drinking anthem melding country and hip-hop.

2025 Billboard Hot 100 Top 5

No.	Title	Artist(s)
1	Die with a Smile	Lady Gaga and Bruno Mars
2	Luther	Kendrick Lamar and SZA
3	A Bar Song (Tipsy)	Shaboozey
4	Lose Control	Teddy Swims
5	Birds of a Feather	Billie Eilish

So I brought each song into Apple’s Logic Pro and ran a tempo analysis. Four of the five top songs of 2025 showed zero tempo variation across the entire song. None. I thought Logic’s tempo algorithm was broken until I ran the same test on the top five songs from 50 years ago.

1975 Billboard Hot 100 Top 5

No.	Title	Artist(s)
1	Love Will Keep Us Together	Captain & Tennille
2	Rhinestone Cowboy	Glen Campbell
3	Philadelphia Freedom	Elton John
4	Before the Next Teardrop Falls	Freddy Fender
5	My Eyes Adored You	Frankie Valli

Every single song in the 1975 cohort showed substantial tempo variation. The number one hit, Captain & Tennille’s “Love Will Keep Us Together” featuring legendary session drummer Hal Blaine, registered 101 tempo transitions across the track. The tempo ranged from 124 to 142 BPM, and only 31% of the song stayed within 1 BPM of its mean.

Hal Blaine’s herky-jerky 16th-note snare kick pattern at the end of each chorus would NEVER make it into a final mix of a pop song in 2025. Not even debatable. It would be “corrected,” aligned, and flattened. Why is temporal complexity now treated as error, and machine time as the norm?

Compare that to the top song of 2025: Lady Gaga and Bruno Mars’s “Die With a Smile” has zero tempo transitions. The tempo sits at 79 BPM for the entire song. There is no temporal variation at all. It is “perfect,” and that perfection is the problem.

Not only is every articulation from every instrument, voices included, perfectly aligned to the grid; every pixel of the song’s music video has also been rendered lifeless. I call this Taxidermy Chic: embalmed images to accompany embalmed music, which tries in vain to represent an age when music was still embodied. Dead with a smile indeed.

Chart comparing tempo deviation in ‘Die With a Smile’ (flat line) and ‘Love Will Keep Us Together’ (oscillating line) across the duration of each song. — Tempo deviation reveals the difference between embodied performance and grid-based “correction” and assembly.
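For readers who want to compute figures like these themselves (transition counts, tempo range, share of the song near its mean tempo), here is a minimal Python sketch. It is not Logic Pro's algorithm and not the method behind the numbers in this essay. It assumes you already have a list of tempo readings in BPM, one per segment of the song. The function name `tempo_stats` and the sample readings are made up for illustration. The "share" here is counted per reading, which only matches a share of song duration if the segments are equally long.

```python
from statistics import mean


def tempo_stats(tempos_bpm, tolerance_bpm=1.0):
    """Summarize a sequence of per-segment tempo readings (in BPM).

    Hypothetical helper: assumes readings were exported from some
    tempo-analysis tool and that segments are of equal length.
    """
    if not tempos_bpm:
        raise ValueError("need at least one tempo reading")
    avg = mean(tempos_bpm)
    # A "transition" here is any change between consecutive readings.
    transitions = sum(
        1 for a, b in zip(tempos_bpm, tempos_bpm[1:]) if a != b
    )
    # Count readings that stay within the tolerance of the mean tempo.
    within = sum(1 for t in tempos_bpm if abs(t - avg) <= tolerance_bpm)
    return {
        "min_bpm": min(tempos_bpm),
        "max_bpm": max(tempos_bpm),
        "mean_bpm": avg,
        "transitions": transitions,
        "share_within_tolerance": within / len(tempos_bpm),
    }


# Made-up readings for illustration only, not the essay's measurements.
gridded = [79.0] * 8
played = [124.0, 128.0, 131.0, 131.0, 135.0, 142.0, 133.0, 132.0]

print(tempo_stats(gridded))  # 0 transitions, every reading at the mean
print(tempo_stats(played))   # 6 transitions, half the readings within 1 BPM of the 132 BPM mean
```

A gridded track collapses to a single value on every measure, which is why its deviation chart is a flat line.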

The Modern Condition

For most people, it does not feel like society is functioning particularly well right now. Public life feels tense, brittle even. Small frictions escalate quickly. Patience for disagreement, delay, or adjustment seems in shorter and shorter supply. I believe a big part of the reason lies not with the usual suspects (social media, cable news, or politics) but with something far more ordinary: the music we listen to every day.

That claim may sound implausible. I entered the music industry in the early 1990s as a session drummer, played on hundreds of albums, including a number one hit with Sheryl Crow, and lived through the transition to digital recording. I also spent 17 years as a music analyst at Pandora working on the Music Genome Project. But it wasn’t until I left the industry, earned a master’s in public administration, and started working in city hall that I recognized what I’d been seeing all along: the same negotiation that disappears when you lock music to a grid is the negotiation that democratic life requires.

This argument may seem to be about taste, nostalgia, or whether music used to be better than it is today. However, I am not interested in genre, lyrics, stylistic trends, or questions of artistic merit. The claim I am making is structural. It concerns the formal conditions under which music is produced and the kind of temporal experience those conditions make available.

The Rationality of Music

To understand why this matters, we need a way of thinking about music not as “mere” entertainment, which is how the logical positivists of the Vienna Circle treated it, but as a form of knowledge. This is where the philosopher Susanne K. Langer becomes essential.

Langer argued that music is a presentational symbol. Unlike language, which points to external objects (“that is a tree”), music presents the forms of lived experience itself. She described this with a phrase that deserves to be taken literally: music presents the morphology of felt life. Feeling, for Langer, is epistemological, a legitimate mode of knowing that is not opposed to rationality.

Because music is a public object, and because it uniquely presents the unfolding forms of feeling that structure our experience over time, it allows private interior life to appear in shared form, available for contemplation, where it can be recognized, resisted, affirmed, or negotiated with others. In Langer’s formulation, music objectifies felt life and subjectivizes our experience of the world.

Beat Detective and Our Flattened World

Langer was writing before digital audio workstations became ubiquitous, before producers and engineers could, with a few clicks, analyze an entire performance, slice it into hundreds of micro-segments, and algorithmically realign those fragments to an abstract, machine-legible grid. Tools like Pro Tools’ Beat Detective, introduced 25 years ago, are positioned as instruments to “correct” performance irregularities. But they do more than correct. They integrate the logics of efficiency, standardization, and quantification directly into the artistic process.

This confusion of correction with care flattens and erases the embodied, coordinated negotiations through which humans inhabit time together, replacing them with a mechanically idealized model of perfection.

This is the crisis of gridification.
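To make the mechanics concrete, here is a minimal Python sketch of generic nearest-grid quantization. It is not Beat Detective's actual algorithm, which is not described in this essay. It assumes a constant tempo and a list of detected note onsets in seconds. The function name `quantize_onsets` and the `strength` parameter are hypothetical names chosen for illustration.

```python
def quantize_onsets(onsets_sec, bpm, subdivisions_per_beat=4, strength=1.0):
    """Move each onset toward the nearest line of a fixed tempo grid.

    Illustrative only: assumes a constant tempo and pre-detected onsets.
    strength=1.0 snaps fully to the grid; 0.0 leaves the timing untouched.
    """
    if bpm <= 0 or subdivisions_per_beat <= 0:
        raise ValueError("bpm and subdivisions_per_beat must be positive")
    # Seconds between adjacent grid lines (e.g. 16th notes when there
    # are 4 subdivisions per quarter-note beat).
    step = 60.0 / bpm / subdivisions_per_beat
    quantized = []
    for t in onsets_sec:
        target = round(t / step) * step  # nearest grid line
        quantized.append(t + strength * (target - t))
    return quantized


# Made-up onsets: a drummer playing slightly ahead of and behind the beat.
performed = [0.01, 0.49, 1.03, 1.52]
snapped = quantize_onsets(performed, bpm=120)

# How far each hit was moved, in milliseconds.
shifts_ms = [round((s - p) * 1000, 1) for p, s in zip(performed, snapped)]
print(shifts_ms)
```

Every value in `shifts_ms` is a small decision the player made about where to place a note relative to the others. Full-strength quantization sets all of those decisions to zero.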

Coordination Without Negotiation Is Compliance

The long-standing intuition that music functions as a universal language, something that reaches across cultures, histories, and social difference, rests precisely on what Langer identified: its capacity to present the morphology of felt life in a shared, public form.

But we are not machines. We do not experience the world, or learn how to live together, through the logics of machinic optimization. We are embodied beings whose lives unfold through overlapping, imperfect, and negotiated temporal processes.

When popular music consistently presents coordination without negotiation, it no longer objectifies lived temporal experience. Under these conditions, listeners are trained, track by track, to experience deviation as error and difference as failure. That training does not remain confined to music. It carries over into how we encounter other people, shaping our tolerance for delay, disagreement, and the ongoing work of mutual adjustment that democratic coordination requires.

A recent study by the music streaming service Deezer found that only 3% of listeners could reliably distinguish AI-generated music from human-produced tracks. This result should not be surprising. For more than two decades, the professional music industry’s adoption of grid-based, algorithmic production tools has been creating the conditions under which machine-generated music could seamlessly replace human performance.

A Bar Song (Tipsy)

Intrigued by why Shaboozey’s “A Bar Song (Tipsy)” was the lone exception in my tempo analysis, I went looking for how it was made. A detailed production account in Sound On Sound offers a revealing answer. The song was first assembled as a rough voice memo: Shaboozey singing over a guitar part, captured in real time. When the producers later attempted to re-record the vocal and guitar, they found that the timing and feel of the original performance could not be reproduced. As Sean Cook recounts in Sound On Sound, later takes “didn’t have the same magic,” so the original vocal and guitar were preserved, and the rest of the track was built around that live moment.

Shaboozey — “A Bar Song (Tipsy)”. The lone exception in the 2025 top five.

Rather than forcing the performance into mechanical alignment, the production worked outward from it, layering guitars, handclaps, gang vocals, room sounds, and incidental non-musical textures (like opening a can of White Claw and recording the fizz and the pour) as they unfolded together in shared time. The result was not looseness for its own sake, but the recovery of something largely absent from contemporary pop production: multiple, concurrent layers of temporal negotiation preserved rather than “corrected.”

The first time the song was played publicly, at a small showcase at Winston House in Venice Beach using unfinished backing tracks, something telling happened. As the producers recount, the energy in the room shifted almost immediately. People stopped what they were doing, moved toward the stage, and asked, “What song is this?”

What drew people in was the sudden presence of something largely absent from contemporary popular music: lived, negotiated time made audible. In Langer’s terms, the song worked because it presented the morphology of felt life. Listeners were not being guided toward a prescribed emotion; they were encountering the forms of their own feeling objectified before them. Responses of that kind, immediate, bodily, and shared, are precisely what gridification forecloses. They are also what democratic coordination depends on.

Coda: Two and a Half Decades of AI Music?

At this point, some readers may object that none of the songs I’ve described were made by AI. That is true in a narrow sense and beside the point in every other one.

The relevant transformation did not begin when machines started generating songs, and I am not talking about artists who deliberately work with sequencers or fully electronic outfits. It began when musical performance was subjected to the logics of modern instrumental rationality, treated as something to be analyzed, discretized, optimized, and rebuilt according to abstract ideals of precision and control. This became possible because twentieth-century thought had already reduced music to “mere emotion,” stripping it of epistemic validity as a symbolic form. Once feeling was no longer understood as a mode of knowledge, performance ceased to be action unfolding in time and revealing aspects of reality that discursive language cannot articulate, and instead became data to be corrected. At that point, the distinction between a human performance and a machine-generated one became aesthetic rather than structural.

Grid-based production of human performances, formalized with the introduction of Beat Detective in Pro Tools TDM 5.1 in 2001, established the algorithmic logic that generative AI now simply automates at scale.

This is what I mean by having listened to AI music for 25 years. Not that machines were secretly composing pop songs in the early 2000s, but that, at meaningful scale, human performance was increasingly subjected to the machine logic of the grid. In the process, embodied temporal action was lifted out of lived duration and reconstituted within an idealized temporal model that no human being actually inhabits and no lived reality can sustain.

For Langer, music is the paradigmatic presentational symbol through which we come to know the logic of feeling and the shape of lived time. When grid-based production erases the conditions under which that symbol emerges, music loses its epistemic force, and with it one of the primary ways we learn to recognize complexity, contingency, and one another.

The Raw Data

The complete tempo measurements and methodology supporting this essay are published as a dataset on the Gridification of Popular Music page.