If you need to learn the vocab first, then the book is too advanced. Pick easier works to read. As a beginner you have no choice but to look words up, but it shouldn't take too long before you can find something you can understand without doing that.
JPDB.io does something like this for Japanese. Not sure you can really import books, but it basically combines some kind of parser in with a dictionary API, example sentence corpus, and its own spaced repetition system.
Gotta be something along those lines out there for most languages, but I can't say I know of the tools. Honestly, the breaking-down-into-a-base-word part of it is probably in the dictionary's domain. If you give it a conjugated verb it should usually be able to tell you the base form. But then some ambiguities need context, and I'm not sure how to account for that.
AnkiConnect lets you tap into the Anki APIs, and Wiktionary or (from a quick search) Collins should have a dictionary API available for French-English. If the dictionary APIs are good then you could probably get pretty far with basic sentence parsing.
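To make the AnkiConnect part a bit more concrete, here is a minimal sketch of pushing one card into Anki. It assumes Anki is running with the AnkiConnect add-on on its default local address, that a deck with the (made-up) name `French::Vocab` already exists, and that the stock "Basic" note type with `Front` and `Back` fields is available (the field names can differ on non-English installs). The helper names `build_add_note` and `send` are mine, not part of any library.

```python
import json
import urllib.request

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def build_add_note(word, definition, deck="French::Vocab", model="Basic"):
    """Build an AnkiConnect `addNote` request body for one vocabulary card.

    The deck name and the "prelearn" tag are placeholders; adjust to taste.
    """
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": model,
                "fields": {"Front": word, "Back": definition},
                "tags": ["prelearn"],
            }
        },
    }


def send(payload, url=ANKI_CONNECT_URL):
    """POST the payload to a running Anki instance and return its result."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # AnkiConnect replies with {"result": ..., "error": ...}
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]
```

Something like `send(build_add_note("chat", "cat"))` should then add the card, with the definition coming from whichever dictionary source you end up using.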
But yeah, feels like there's gotta be something ready made for it, wish I knew and could point you in a direction.
I've only done enough programming to know this is very possible. A word count is probably all I'd need to do this manually. Just wondering if this is one of those things I do instead of learning, so the less time I spend on it, the better I'll feel.
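For what it's worth, the manual word-count route could look roughly like this: count how often each word shows up in the book, skip the ones you already know, and study the most frequent leftovers first. This is only a sketch. The function names and the `known_words` set are made up for illustration, and there is no lemmatization, so "mange" and "mangent" count as different words and "l'homme" splits into "l" and "homme".

```python
import re
from collections import Counter

# Runs of letters (accents included), optionally joined by hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize(text):
    """Lowercase the text and split it into rough word tokens."""
    return [word.lower() for word in WORD_RE.findall(text)]


def words_to_prelearn(text, known_words, top_n=20):
    """Return the top_n most frequent words that are not in known_words."""
    counts = Counter(w for w in tokenize(text) if w not in known_words)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "Le chat dort. Le chat mange. Le chien dort."
    for word, count in words_to_prelearn(sample, known_words={"le"}, top_n=3):
        print(word, count)
```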
Was messing around with Jiten.moe (spiritual successor to jpdb, again boasts the utility of ingesting a book or subtitle file and creating anki cards) and it made me think of this question. (And Jiten is actually open-source, so the repo's there with how they do it... but I'm pretty sure it's mostly just wrapping a bunch of Japanese-specific tools.)
Did a little looking. Tried checking https://github.com/keon/awesome-nlp and didn't see anything French-specific, but did come across https://github.com/french-ai/french-nlp which might have useful stuff. It sounds like a library called spaCy could be useful.
But then I ran across this tool, which might be pretty close to what you'd need? https://github.com/FreeLanguageTools/vocabsieve
VocabSieve is a companion program for language learning with Anki. Its primary function is sentence mining, in which sentences with vocabulary words are collected and added into Anki for long term retention. It aims to help intermediate learners gain vocabulary efficiently by allowing card creation with minimal friction. Possible use cases include sentence mining from videos, texts, asynchronously from ereader highlights, and even completely automatically from books or subtitles.
I haven't looked into exactly how the 'automatically from books' stuff would work or anything, but seems promising.
And I guess elephant in the room, NLP is the kind of task LLMs are actually pretty good at, so there's also always that lazy-ish route: convert the book to text, feed it through an LLM and ask it to identify important vocabulary words.
Thanks! VocabSieve looks perfect (though experimental), and it works with KOReader, too. Fuck me, I'm running out of excuses.
I feel like you're approaching this incorrectly. Do you have graded readers?
An A2 graded reader assumes you know all A2-level words and provides definitions for the B1+ / B2 (or beyond) words in the text.
So instead of making software that does the work of making a graded reader, it is probably better to just start by using graded readers (where all this work has already been done).
I feel like it's not that much work and the benefit is that it gives me a lot more freedom to read what appeals to me.
For instance, I found an unseeded torrent of 600 French epubs. Imagine being able to do something as simple as sorting them by lexical complexity: that is, do a unique word count for each book and rank them from lowest to highest. Trivially simple to do and would yield me books that are constantly in my range of proximal learning.
But, yes, thank you for the suggestion! I'll look into some readers, depending on whether I feel more lazy than broke.
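A rough sketch of that ranking, using only the standard library. It leans on the fact that an EPUB is a zip archive of (X)HTML files; the function names are made up, and the text extraction is crude (it also picks up navigation pages and any inline style text). Raw unique-word counts also grow with book length, so a long easy book can outrank a short hard one unless you compare same-sized samples.

```python
import re
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def epub_text(path):
    """Concatenate the text of every (X)HTML file inside an EPUB."""
    chunks = []
    with zipfile.ZipFile(path) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextExtractor()
                parser.feed(book.read(name).decode("utf-8", errors="ignore"))
                parser.close()  # flush any buffered trailing text
                chunks.extend(parser.chunks)
    return " ".join(chunks)


def unique_word_count(text):
    """Number of distinct lowercase word forms (no lemmatization)."""
    return len({w.lower() for w in re.findall(r"[^\W\d_]+", text)})


def rank_by_vocabulary(texts):
    """Given {title: text}, return titles from fewest to most unique words."""
    return sorted(texts, key=lambda title: unique_word_count(texts[title]))
```

For a folder of books, you would build the mapping with something like `{p.name: epub_text(p) for p in pathlib.Path("books").glob("*.epub")}` (the folder name is a placeholder) and pass it to `rank_by_vocabulary`.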