Some tough love:

There will always be a gap between your judgement and the judgement baked into a model. Worse yet, if the model is very general and oriented toward cheap computation and away from expensive people, it will have vague and contradictory judgements inside it that make the results meaningless.

That is the language of failure; the structure of success looks like the following.

(1) The system works like a "magic magic marker": you mark up a lot of text (say 20,000 sentences) the way you think it should be marked up. This might be character-at-a-time or word-at-a-time. Character-at-a-time is real and eternal; word-at-a-time is not real, because there is not really such a thing as a "word". (E.g. "red ball" can fill slots that take "ball", you can smash subwords together to make words, people violate punctuation rules ("Amazon.com announced that..."), people call themselves n3pg34r, ...) So if you segment the text up front and segment it the wrong way, you may throw out essential information and choose to fail. (There's a sketch of character-level markup at the bottom of this comment.)

(2) You need some system to mark up the text manually and efficiently. It is a lot of work: a typical person can make about 2,000 up/down judgements a day, and if a sentence costs about 10 decisions, that's maybe 200 sentences a day (arithmetic worked out below). If you can get students to do it and teachers to review it, you might make short work of it.

This annotator

http://brat.nlplab.org/

ticks the requirements, but most people find it terribly hard to use and wind up building "easy to use" systems that don't align things right at level (1) and... fail.

Assuming you do (1) and (2) the odds are in your favor, but now you have to

(3) Build models. It does not matter whether the model is a bunch of rules you cobbled together, a hidden Markov model, an LSTM, or a convolutional net. Off the top of my head I would train an LSTM to predict the next character on maybe 100M characters of text, then stick a simple model on top that takes the LSTM state as input and labels characters at the output (could be an SVM, random forest, logit, or 3-layer NN). Sketch below.

(4) Accept that the system is not going to be perfect, but have the ability to manually patch wrong results and improve the training data over time (sketch below). I'd say this practice matters more than any particular approach to (3).

Some tool could give you (1)-(4) tied up in a bow;

https://www.tagtog.net/

claims to. But (2) involves elbow grease that 90% of people aren't going to do. Some of the 10% of people who do that elbow grease will succeed; the other 90% will fail.
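The promised sketches, in Python. First, what I mean by character-at-a-time markup in (1): store annotations as character offsets and tag every character, so no segmentation decision is baked in up front. The "ORG" label and the helper name are mine, just for illustration:

    # Minimal character-level BIO tagging: annotations are character
    # spans, so nothing depends on a word segmenter's opinion.
    def char_bio_tags(text, spans):
        """spans: list of (start, end, label) character offsets."""
        tags = ["O"] * len(text)
        for start, end, label in spans:
            tags[start] = "B-" + label
            for i in range(start + 1, end):
                tags[i] = "I-" + label
        return tags

    text = "Amazon.com announced that..."
    # The entity keeps its internal dot; a tokenizer that splits on
    # punctuation up front would have destroyed this span.
    tags = char_bio_tags(text, [(0, 10, "ORG")])
    for ch, tag in zip(text[:12], tags[:12]):
        print(repr(ch), tag)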
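The annotation budget in (2), worked out from the rough numbers above, which is why you want students doing it:

    judgements_per_day = 2000        # up/down decisions one person can make
    decisions_per_sentence = 10      # rough cost of annotating one sentence
    corpus_sentences = 20000

    sentences_per_day = judgements_per_day // decisions_per_sentence  # 200
    person_days = corpus_sentences / sentences_per_day                # 100.0
    print(sentences_per_day, "sentences/day ->", person_days, "person-days")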
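A sketch of (3). I'm using PyTorch as one concrete choice (nothing above commits to a framework), and the vocabulary size, dimensions, and 3-label tag set are placeholders: pretrain a character LSTM on next-character prediction, then bolt a simple labeler onto its states.

    import torch
    import torch.nn as nn

    VOCAB, EMB, HID, N_LABELS = 256, 64, 512, 3   # bytes in, 3 tag classes out

    class CharLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, EMB)
            self.lstm = nn.LSTM(EMB, HID, batch_first=True)
            self.next_char = nn.Linear(HID, VOCAB)   # pretraining head

        def forward(self, x):                    # x: (batch, seq) of byte ids
            states, _ = self.lstm(self.emb(x))   # (batch, seq, HID)
            return self.next_char(states), states

    lm = CharLM()
    # --- pretraining on ~100M characters of raw text (loop elided) ---
    # logits, _ = lm(batch[:, :-1])
    # loss = F.cross_entropy(logits.reshape(-1, VOCAB),
    #                        batch[:, 1:].reshape(-1))

    # --- labeling stage: a simple model on top of the LSTM state ---
    labeler = nn.Linear(HID, N_LABELS)       # could equally be SVM/RF/logit/MLP
    x = torch.randint(0, VOCAB, (1, 40))     # one fake 40-character input
    with torch.no_grad():
        _, states = lm(x)                    # reuse pretrained states as features
    tag_logits = labeler(states)             # (1, 40, N_LABELS): one tag per char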
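Finally, the patching practice in (4) can be as dumb as an overrides table that beats the model and feeds the next retrain. Everything here is made up for illustration; `model` is whatever callable came out of (3):

    overrides = {}          # text -> hand-corrected tags
    training_data = []      # grows over time, feeds the next retrain

    def predict(text, model):
        if text in overrides:           # a manual patch wins over the model
            return overrides[text]
        return model(text)

    def record_correction(text, correct_tags):
        overrides[text] = correct_tags               # fixes the output today
        training_data.append((text, correct_tags))   # improves the model tomorrow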