99.999% would be fantastic.
90% is not good enough for a primary feature that discourages you from checking its output (like a naive chatbot).
What we have now is, I dunno, anywhere from <1% to maybe 80%, depending on your use case and how you define accuracy.
I haven't used Samsung's stuff specifically. Some web search engines do cite their sources, and I find that to be a nice little time-saver. With the prevalence of SEO spam, most results have like one meaningful sentence buried in 10 paragraphs of nonsense. When the AI can effectively extract that tiny morsel of information, it's great.
Ideally, I don't ever want to hear an AI's opinion, and I don't ever want information that's baked into the model from training. I want it to process text with an awareness of complex grammar, syntax, and vocabulary. That's what LLMs are actually good at.
In case anyone is unfamiliar, Aaron Swartz downloaded a bunch of academic journals from JSTOR. This wasn't for training AI, though. Swartz was an advocate for open access to scientific knowledge. Many papers are "open access" and yet are not readily available to the public.
Much of what he downloaded was open-access, and he had legitimate access to the system via his university affiliation. The entire case was a sham. They charged him with wire fraud, unauthorized access to a computer system, breaking and entering, and a host of other trumped-up charges, because he...opened an unlocked wiring closet and plugged a laptop into the network there. The fucking Secret Service was involved.
https://en.wikipedia.org/wiki/Aaron_Swartz#Arrest_and_prosecution
Nothing Swartz did is anywhere close to the abuse by OpenAI, Meta, etc., who openly admit they pirated all their shit.