this post was submitted on 01 Apr 2025

66 points (94.6% liked)

Technology

70395 readers

3861 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws

The case against conversational interfaces « julian.digital (julian.digital)

submitted 1 month ago by excel24@feddit.org to c/technology@lemmy.world

23 comments fedilink hide all child comments

top 23 comments

sorted by: hot top controversial new old

[–] aesthelete@lemmy.world 9 points 1 month ago* (last edited 1 month ago) (2 children)

The obsession with conversational interfaces likely stems from two places: sci-fi and CEOs (and other executive, businessy types) who are used to ordering people around.

[–] paraphrand@lemmy.world 5 points 1 month ago

“Do this, do that, and read between the lines!”

[–] taladar@sh.itjust.works 2 points 1 month ago (1 children)

Maybe the tendency for LLMs to shower the user with praise for their prompts also makes them attractive to egocentric CEO type of personalities?

[–] aesthelete@lemmy.world 2 points 1 month ago* (last edited 1 month ago)

When I used those things, I absolutely understood why a CEO would want those to be the future. It's everything they're looking for: a strident, confident yes man machine who will produce without consternation any kind of spin (unethical or not) to any kind of gibberish content requested.

[–] sugar_in_your_tea@sh.itjust.works 4 points 1 month ago

It's a great article, actually click through and read it if you haven't already.

My favorite example of truly effortless communication is a memory I have of my grandparents. At the breakfast table, my grandmother never had to ask for the butter – my grandfather always seemed to pass it to her automatically, because after 50+ years of marriage he just sensed that she was about to ask for it. It was like they were communicating telepathically.

That is the type of relationship I want to have with my computer!

The author's point is that natural language is a slow way to communicate, and it's not even our preferred way, so why are we pushing so hard for it?

One of the best productivity tools for me is my CLI shell, which predicts what I'm about to type based on what I've done in the past. There's no AI here, just simple history search. It turns out i do the same thing a lot.

None of this is to say that LLMs aren’t great. I love LLMs. I use them all the time. In fact, I wrote this very essay with the help of an LLM.

The author argues that LLMs are an augmentation to existing tools, not a replacement. Just like the mouse didn't replace the keyboard (my example), LLMs won't replace existing workflows, it'll merely help in the knowledge retrieval stage.

For this future to become an actual reality, AI needs to work at the OS level. It’s not meant to be an interface for a single tool, but an interface across tools.

This is where I partially disagree.

Yes, I think some level of AI makes sense at the OS layer, but its function should be to find the right tool, not to be a tool. For example, "open my budget" would know from context which file that is (family, company, client, etc), which program (GNUCash, Excel, or a URL in a browser), and then pass on context to the app-specific AI, which would know which part to open and be ready for context-relevant questions (is it payday? Was I just looking at concert tickets? Is someone's birthday coming up?).

But even then, the usefulness of a system-wide AI is pretty limited. Most people can efficiently navigate to what they want. Indexes work well to find files (and full text search is feasible), file extensions work well to open the right application, and applications remembering what they were last doing is usually sufficient.

So I see it as more of an accessibility feature at the system level instead of an actual, useful system in itself. However, I really like the idea of different models passing context in some standard way to each other so I can seamlessly move between apps.

But I absolutely agree with the main point here: AI should be seen as an add-on, not a replacement.

[–] taladar@sh.itjust.works 4 points 1 month ago (1 children)

Every couple of years a shiny new AI development emerges and people in tech go “This is it! The next computing paradigm is here! We’ll only use natural language going forward!”. But then nothing actually changes and we continue using computers the way we always have, until the debate resurfaces a few years later.

Reminds me a bit of graphical programming. Every couple of years someone comes up with the idea of replacing textual programming with some kind of graphical interface with arrows between nested boxed of various shapes and it inevitably fails.

[–] jorm1s@sopuli.xyz 2 points 1 month ago (1 children)

Except there's Simulink, which has been around since the 80's, and is anything but a failure. For a few specific usecases, like modeling complex physical systems and developing control algorithms for them, it's far better than any traditional text based language. Especially when it comes to maintainability of that code.

Though I have to admit that if you try to use it as a general programming language, you'll learn that while that's possible, it's also very painfull. And even while implementing said control algorithms you'll occasionally run on to some bits of logic that prove to be annoyingly difficult to implement with it compared to any text based language.

[–] taladar@sh.itjust.works 2 points 1 month ago (1 children)

I think the problem is that you can't create new abstractions very well in graphical languages. It works for something like fixed domains (e.g. Blender node editor or your example) but for a general purpose language you need the ability to define abstractions that never existed before.

The other problem is that you can't really apply any of the tooling to it that works with other languages, e.g. version control, formatters, linters,...

[–] jorm1s@sopuli.xyz 1 points 1 month ago

I have to agree. I guess the only reasonable application for graphical languages is domain specific languages, and even then they need to provide a significant benefit over any text based alternative to outweight the tooling incompatibilities you mentioned.

[–] Mbourgon@lemmy.world 4 points 1 month ago (2 children)

An interesting/useful article, if only because I was unaware of the Mac equivalence of Launchy (Linear, Superhuman,etc). The biggest problem is that a Conversational interface could excel at certain tasks, but just using it Willy-Nilly is asking for disappointment, plus the voice recognition still needs to get better, especially with context clues, which would require more integration (watching whatever you’re watching, hearing whatever you’re hearing, what page are you looking at, etc. )

But if you’re in front of a keyboard, then directing the computer to do something has got to require less context switching then just bringing up on the keyboard. Even if it’s only 60 words a minute, if all you’re doing is typing in a handful of strokes, then it’s probably faster than coming up with all the keywords necessary to tell the computer to do it.

[–] paraphrand@lemmy.world 2 points 1 month ago* (last edited 1 month ago) (1 children)

Quicksilver for Mac OS X was the original one of these apps AFAIR. And it appears it predates all of those (launchy, etc)

I prefer Alfred these days myself.

You can even do similar with just Spotlight on macOS and the Start Menu in Windows 10+.

[–] darkkite@lemmy.ml 2 points 1 month ago

i replaced my windows start menu with the offical powertools run. which is pretty much the alfred/spotlight solution now

[–] taladar@sh.itjust.works 2 points 1 month ago

As a primarily CLI user on Linux I wouldn't even think of most commands as "words per minute" unless I am composing a complex pipeline or run a command with dozens of parameters at which point typing speed is not my bottleneck limiting the speed of input anyway and a free conversational interface would be totally fucked trying to figure out what I want it to do.

[–] TheUniverseandNetworks@lemmy.world 3 points 1 month ago (1 children)

Interesting article, I agree with his analysis, not sure (yet) that I agree with his conclusions. My brain needs to think about it in the background for a bit (just the way mine works).

TLDR: we should expect conversational interfaces to be an addition to the workflows we currently use.

[–] drspod@lemmy.ml 5 points 1 month ago (1 children)

Computer upvote this post. I mean comment. No, I meant the comment. Computer remove the upvote from the post. Computer upvote the comment.

Computer compose reply.

Dear Aunt, let's set so double the killer delete select all

[–] Imgonnatrythis@sh.itjust.works 1 points 1 month ago* (last edited 1 month ago) (1 children)

In my own real world usage I estimate a comprehension rate of about 92% with voice agents. I'm no linguist, but I'd guess that you'd need to achieve at least 98% comprehension to not feel like a conversation is frustrating. I'm also instantly irritated if my computer is delayed and nothing happens when I click on something, or if I go to use someone else's computer and they have double-clicking enabled for some reason (why?!) so my tolerance is probably on the low end.

Anyway, I thought this was an insightful read and the key to me is that the bar is pretty high now for Man-machine interfaces, so any implementation of newer tech needs to be both thoughtful and bug-free as possible in this realm.

[–] taladar@sh.itjust.works 1 points 1 month ago* (last edited 1 month ago) (1 children)

In my own real world usage I estimate a comprehension rate of about 92% with voice agents.

For me it feels more like 9.2% most of the time, and that is just the voice-to-text part, not even the interpretation of the resulting text as a command.

[–] Imgonnatrythis@sh.itjust.works 1 points 1 month ago (1 children)

Does feel like that, I agree, but if you spoke to someone who randomly completely misunderstood 8 out of every 100 words you said and had next to zero dead reckoning ability to figure out what that missing word was, I think you'd feel pretty frustrated.

[–] taladar@sh.itjust.works 1 points 1 month ago (1 children)

I thought about it some more since I wrote my comment and I am genuinely unsure any voice recognition system I have ever used managed to transcribe a full sentence to text successfully without making at least one mistake.

On the other hand with a keyboard I am reasonably sure I get problems such as network filesystems being unable to reach the server or broken hard drives more often than having to worry about mistyping a command I commonly use. Granted, part of that is thanks to tab completion but that is part of the issue with voice input, no easy way to correct what it got wrong.

[–] Imgonnatrythis@sh.itjust.works 1 points 1 month ago (1 children)

In English? Do you have an accent? Dragon is one of the better ones and it seems with many accents it does remarkably well. Google seems to have one of the worst I've come across.

[–] taladar@sh.itjust.works 1 points 1 month ago

Both in English and my native German. I probably do have an accent in English but that is difficult to judge myself. Certainly nothing that prevents other people from understanding me though.

[–] DemonVisual@lemm.ee 2 points 1 month ago* (last edited 1 month ago)

I always refer back to this video, he has a lot of excellent points.

https://youtu.be/AItTqnTsVjA

Of course all of this varies depending on the persons workflow on the given device. Personally I don't know if it's actually be able to write 36 words per minute on a traditional phone keyboard, so I agree on the sentiment that a phone generally is not a replacement for a computer, but a Swype enabled keyboard I might actually get close.. I really like the idea of a pie launcher for the desktop, I use one for the phone, and while it's limited to a single "layer" it's still faster when doing routine tasks on the phone (music,maps, notes, internet).

Again, phones seem to be the forerunner for these concepts to be implemented, since AI is creeping in quickly - I haven't really found a great use case, but maybe actually listening to the sales pitches; "create a calendar note", "compose short message of ETA" could be something that can accelerate the day to day, maybe even run locally on the phone or computer?

For coding, AI acceleration is great. Sometimes you just want something that fills in a gap and can be replaced later. It doesn't replace the need for critical thinking in system architecture design but it's a great accelerator for prototyping.

[–] tynansdtm@lemmy.ml 1 points 1 month ago

A fascinating read. It inspired me to look further into the StarCraft voice integration. Other games have tried it, using voice commands to direct computer companions as an additional layer of realism. But I'm not aware of any game that's done it well. Might be nice for applications like picking from a long list, sometimes "build unit X" is way faster than paging through buttons, but again we have keyboard shortcuts for that. Keyboard shortcuts wouldn't work for dynamic menus though, and voice commands do.

Sorry for the stream of consciousness.