Saturday, September 28, 2024

Virtual Eye vs. Virtual Ear

Crescent Moon
Lately, I’ve been self-indulgently going back through old blog posts to take a look at their structure… and (true confession time) to have an LLM generate a podcast on them. I’ve come to the conclusion that spoken analysis of some of my blog posts from an LLM feels less virtual than written analysis. Also, I’d say that there’s still some garbage-in/garbage-out going on in the analysis: since I’d fed it fifty different posts, the LLM doesn’t quite know how to deal with the wide-ranging topics, and defaults to statements about personal blogs being like a diary or a lexical exploration of how to live one’s life. My sense is that if I focused the source documents around just one topic, the output would be less general.

The virtual podcast is fairly amusing: the male voice sounds a little like A Martínez from NPR, the female voice sounds like the character Roz Doyle (from the sitcom Frasier), and the script seems like it was lifted from a Radiolab show. Occasionally, filler phrases like “totally,” “one-hundred percent,” “exactly,” “absolutely,” “of course,” “fer sure,” and “I’m here for it” become obtrusive. Probably the more jarring moments are when one virtual host will talk about “clicking through the posts,” which you know couldn’t have happened because they don’t have a body.

I’d also say that two virtual hosts speaking about what I’ve written (at least when they aren’t looping through the same script for the third time) somehow feels more validating than reading a generated analysis. I’m pretty sure the hosts have been primed to offer emotional judgements (e.g., “he’s so vulnerable writing that”) over textual analysis. And the synthetic voices have fairly good (if sometimes glitchy) tone and inflection.

Jane Yolen was right: the ear and the eye are different audiences.
