Comment by Sidhant Panda on "Looking for an ML idea"

Hey Mario,

First and foremost, you have to establish whether or not the paragraphs are relevant to the question asked. To find out what the paragraphs are about, you could start with a simple check of these features:

The topic - The title of the text is, by definition, supposed to tell us what it is about. But the title alone is not sufficient to understand the gist.
The starting sentences - An article or text often has its most relevant sentences at the beginning, which set the direction for the rest of the document.
Ending sentences - An article or text usually summarizes what has been discussed in its last few sentences.
Frequent words - Words that are repeated throughout the text signal what it considers important.
Words related to the topic - Words that appear in the topic itself or in other important sentences carry more weight, so sentences using those words are probably more important to the context at hand.
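
To make the checks above concrete, here is a minimal sketch of how they could be combined into one relevance score. Everything in it is an assumption for illustration: the function names, the tiny stopword list, the choice of two opening and two closing sentences, and the idea of scoring by the fraction of question words found among the key terms. Treat it as a starting point, not a finished method.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; use a fuller list in practice).
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "for",
    "on", "are", "as", "with", "this", "be", "by", "or", "what", "how",
    "why", "do", "does",
}

def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def key_terms(title, text, n_edge=2, n_frequent=5):
    """Collect words from the title, the first/last sentences, and the most frequent words."""
    sentences = split_sentences(text)
    terms = set(tokenize(title))
    for sentence in sentences[:n_edge] + sentences[-n_edge:]:
        terms.update(tokenize(sentence))
    counts = Counter(tokenize(text))
    terms.update(word for word, _ in counts.most_common(n_frequent))
    return terms

def relevance_score(question, title, text):
    """Fraction of the question's words that appear among the text's key terms (0.0 to 1.0)."""
    question_words = set(tokenize(question))
    if not question_words:
        return 0.0
    return len(question_words & key_terms(title, text)) / len(question_words)
```

Calling relevance_score(question, title, text) returns a number between 0 and 1, where higher means more of the question's words show up in the title, the opening and closing sentences, or the frequent words.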

Now that you have matched the context, you can proceed to find how difficult or easy it is to read the text. Here is a list of popular tests (in no particular order):

Flesch-Kincaid readability tests:
- Flesch Reading Ease
- Flesch-Kincaid Grade Level
Automated Readability Index (ARI)
Coleman-Liau Index
Gunning-Fog Index
SMOG (Simple Measure Of Gobbledygook)
LIX
Accelerated Reader ATOS
Dale-Chall Readability Formula
Fry Readability Formula
Lexile Framework for Reading
Linsear Write
Raygor Estimate Graph
Spache Readability Formula
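
As an example of what these tests look like in code, here is a rough sketch of the two Flesch-Kincaid formulas from the top of the list. The coefficients are the standard published ones, but the syllable counter is only a vowel-group heuristic I am assuming for illustration (real implementations use a pronunciation dictionary or better rules), and the function names are made up.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups and drop a likely silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def text_stats(text):
    """Return (sentence count, word count, syllable count) for the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables

def flesch_reading_ease(text):
    """Higher score means easier text."""
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)

def flesch_kincaid_grade(text):
    """Approximate US school grade needed to read the text."""
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

Short sentences with short words push the Reading Ease score up and the Grade Level down, so the two numbers move in opposite directions for the same text.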

It would help to build a word ontology for similar words; you could crawl Wikipedia or a thesaurus for this purpose.

The failed essays would help establish the threshold scores you are looking for, both for context relevance and for ease of reading.
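
One simple way to pick such a threshold, sketched under the assumption that you already have a score for each passed and each failed essay: try every observed score as a cutoff and keep the one that classifies the most essays correctly. The function name and the "pass when score >= cutoff" rule are my own choices for illustration.

```python
def best_threshold(passed_scores, failed_scores):
    """Return the cutoff that classifies the most essays correctly.

    An essay is predicted to pass when its score is >= the cutoff.
    Returns None if no scores are given.
    """
    candidates = sorted(set(passed_scores) | set(failed_scores))
    best_cut, best_correct = None, -1
    for cut in candidates:
        correct = sum(s >= cut for s in passed_scores) + sum(s < cut for s in failed_scores)
        if correct > best_correct:  # on ties, the lowest cutoff wins
            best_cut, best_correct = cut, correct
    return best_cut
```

You would run this once for the relevance scores and once for the readability scores, since each needs its own cutoff.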

This is really basic, and we haven't dived into NLP for this problem yet. However, I feel this might just do the job.

PS: I worked on something similar during my undergraduate studies!
