LeadGuit  ·  3215 days ago  ·  link  ·    ·  parent  ·  post: Scientists of hubski, what science do you science?

Computational Linguistics, at the moment especially corpus-based analysis of a broad set of websites to analyse the use of words, sentence structures and so on.

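A minimal sketch of what that kind of corpus counting can look like, assuming the website text has already been scraped into plain-text files (stdlib Python only, made-up directory layout; real pipelines use proper tokenizers, taggers and much larger corpora):

```python
import re
from collections import Counter
from pathlib import Path

word_counts = Counter()
sentence_lengths = []

for path in Path("corpus").glob("*.txt"):        # hypothetical folder of scraped pages
    text = path.read_text(encoding="utf-8")
    for sentence in re.split(r"[.!?]+", text):   # very rough sentence split
        tokens = re.findall(r"\w+", sentence.lower())
        if tokens:
            word_counts.update(tokens)
            sentence_lengths.append(len(tokens))

print("most frequent words:", word_counts.most_common(10))
if sentence_lengths:
    print("average sentence length:", sum(sentence_lengths) / len(sentence_lengths))
```
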
Two pretty interesting parts of CL I have worked in were:

Analysing a company through its internal documents (somewhat - depending on what's classified) - namely all the revisions. That way you can see how a company functions (who reviewed what and when, what changes they made, both linguistically and concerning the content) - and therefore, e.g., be able to give advice to the company's leadership.

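To illustrate the revision part: a toy Python sketch that uses difflib to pull the changed lines out of two revisions of the same document. The file names, authors and dates are invented; the actual work is in what you then do with those changes linguistically.

```python
import difflib

# hypothetical revision files, annotated with (invented) author and date
rev_a = open("report_rev3.txt", encoding="utf-8").read().splitlines()
rev_b = open("report_rev4.txt", encoding="utf-8").read().splitlines()

diff = difflib.unified_diff(
    rev_a, rev_b,
    fromfile="rev3 (Alice, 2016-02-01)",
    tofile="rev4 (Bob, 2016-02-05)",
    lineterm="",
)
for line in diff:
    # keep only added/removed lines, skip the +++/--- file headers
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
        print(line)
```
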
Machine translation for minor languages, e.g. Spanish -> Quechua. That was pretty interesting since I'm not fluent in either language.

DC-3  ·  3215 days ago  ·  link  ·  
LeadGuit  ·  3215 days ago  ·  link  ·  

This XKCD hangs right on the wall in the office of a colleague ;-)

DC-3  ·  3215 days ago  ·  link  ·  

Haha, that's great.

hogwild  ·  3214 days ago  ·  link  ·  

What do you use for language modeling? Are you learning a formal grammar, building a lexicon, or doing something weird with LSTMs?

LeadGuit  ·  3214 days ago  ·  link  ·  

Depends on the job ;-)

For all non-computational linguists I'll go ELI5 on this:

For machine translation I work mostly with the Moses toolkit [1], with IRSTLM as the language model. This toolkit is made for statistical machine translation - so you need a lot of data and a parallel corpus (identical texts in both languages, e.g. the stuff you find in those little "phrases for travelling" booklets). If you're running a Unix system it's pretty easy to get started (baseline) - you can use the Europarl corpus for some nice experiments (what about your own Portuguese-German translator? ;-)

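To show the core idea behind "statistical MT needs a parallel corpus" (this is not what Moses actually runs, just the textbook intuition, roughly IBM Model 1): estimate word translation probabilities from aligned sentence pairs with a few EM iterations. The corpus below is a made-up English-Spanish toy.

```python
from collections import defaultdict

# tiny made-up parallel corpus: (source sentence, target sentence) pairs
corpus = [
    ("the house".split(),       "la casa".split()),
    ("the green house".split(), "la casa verde".split()),
    ("the book".split(),        "el libro".split()),
]

src_vocab = {w for src, _ in corpus for w in src}
# t[(f, e)] = P(target word f | source word e), initialised uniformly
t = defaultdict(lambda: 1.0 / len(src_vocab))

for _ in range(10):                               # a few EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for src, tgt in corpus:                       # E-step: expected alignment counts
        for f in tgt:
            norm = sum(t[(f, e)] for e in src)
            for e in src:
                count[(f, e)] += t[(f, e)] / norm
                total[e] += t[(f, e)] / norm
    for (f, e), c in count.items():               # M-step: re-estimate t
        t[(f, e)] = c / total[e]

for (f, e), p in sorted(t.items(), key=lambda kv: -kv[1])[:6]:
    print(f"P({f} | {e}) = {p:.2f}")
```
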
There is also a pretty nice toolkit named "apertium" [2] - this one is about rule-based MT, so you don't need a lot of data, but you do need a comprehensive grammar (consisting of a lexicon and grammar rules).

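And the rule-based counterpart as a toy contrast: a hand-written bilingual lexicon plus a single transfer rule. Apertium's real machinery is far richer than this; the sketch only shows why you need a grammar instead of masses of data.

```python
# made-up English -> Spanish lexicon fragment, each entry tagged with a part of speech
lexicon = {
    "the":   ("la",    "DET"),
    "green": ("verde", "ADJ"),
    "house": ("casa",  "NOUN"),
}

def translate(sentence):
    # 1. lexical transfer: look every word up in the bilingual lexicon
    tagged = [lexicon.get(w, (w, "UNK")) for w in sentence.lower().split()]
    # 2. structural transfer rule: English ADJ NOUN -> Spanish NOUN ADJ
    out, i = [], 0
    while i < len(tagged):
        if i + 1 < len(tagged) and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            out += [tagged[i + 1][0], tagged[i][0]]
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)

print(translate("the green house"))   # -> "la casa verde"
```
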
For the other stuff I do there are tons of different methods and approaches - from formal/functional grammar up to machine learning/deep learning techniques (Naïve Bayes classifiers, support vector machines etc.).

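On the machine-learning side, a Naïve Bayes text classifier is about the simplest example; here is a hypothetical scikit-learn sketch (invented sentences and labels, no proper evaluation):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# invented training data: tiny sentiment-style toy set
train_texts  = ["great product, works fine", "terrible, broke after a day",
                "really happy with it", "awful quality, do not buy"]
train_labels = ["pos", "neg", "pos", "neg"]

# bag-of-words features + multinomial Naive Bayes
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

print(clf.predict(["works great, very happy"]))   # -> ['pos']
```
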
If you (or others) are interested, I could post some interesting links for Natural Language Processing (maybe a new tag for that?)

[1] http://statmt.org/moses

[2] https://www.apertium.org

hogwild  ·  3214 days ago  ·  link  ·  

On Twitter, people use #nlproc to avoid the "neuro-linguistic programming" hypnosis cranks. If one of us posts something to that tag, I'll follow it!