The annual Cognitive Science conference is coming up at the end of the month, so I thought I'd comb through the papers and see what looks good. This is going to be a stream-of-consciousness post with thoughts on a few papers (warning: these aren't going to be very clear or readable summaries!). Without further ado, here's my list of favorites so far, in no particular order:
1. Degen, Tessler, & Goodman. Wonky worlds: Listeners revise world knowledge when utterances are odd. This has to be my favorite paper of the conference. They provide evidence that people draw unusual inferences when an utterance describes an unlikely event: people are more likely to interpret a sentence like "Some of the marbles sank" non-literally. Plus, they show that this has something to do with relaxing the subjective prior probability of marbles sinking -- that is, it seems like people relax their assumptions about the physical world in order to accommodate the "some but not all" inference you get from the word "some" (I've put a toy sketch of this idea after the list). My one gripe with this paper is that they don't actually fit their model to the data; they just sort of wave their hands and say "it looks like it's qualitatively capturing the pattern, so good enough for now". It seems like they ran out of time near the submission deadline. I expect that'll be Degen et al. (2016).
2. Bennett & Goodman. Extremely costly intensifiers are stronger than quite costly ones. Noah Goodman's lab has great stuff at the conference this year. Here the authors show that the strength of adverbial intensifiers probably has more to do with how costly they are to produce than with their actual semantics. Words like "extremely" are both longer and lower-frequency than words like "very", which makes them slightly more costly to produce in a psycholinguistic sense. This costliness leads to a pragmatic inference on the part of the listener which basically goes, "Well, this speaker chose to use a costly form, so they must be trying to really emphasize the importance of whatever they're talking about". They test this hypothesis in two price estimation experiments. (There's a tiny cost sketch after the list.)
3. Beckage, Mozer, & Colunga. Predicting a Child's Trajectory of Lexical Acquisition. The authors develop a model based on a child's characteristics (age, sex, etc.) and current vocabulary, and it does a pretty good job of predicting whether or not the child will learn a particular word in a particular month! (A sketch of the general setup is below the list.)
4. Morgan & Levy. Modeling idiosyncratic preferences: How generative knowledge and expression frequency jointly determine language structure. This paper kicks ass; I love it. Basically, the authors try to model the distribution of binomial expressions we see in corpora. Some phrases, like "bread and butter", are strongly polarized towards one order over the other (compare "bread and butter" with "butter and bread"), while others have more balanced ordering preferences. They find that a Bayesian beta-binomial model can account for these idiosyncratic preferences. One part of the model captures generative preferences, i.e., our a priori ordering preference for a binomial in the absence of frequency evidence. The other part says that each binomial's ordering preference is drawn from a beta distribution whose mean is the generative preference estimate, with the concentration of the distribution varying by expression frequency. When you look at the parameters of this model after training, you find that very high-frequency expressions get quite low concentration values, so most of them become strongly polarized toward their generatively preferred order, a very few become polarized in the opposite direction, and almost none land in between. Lower-frequency expressions, on the other hand, get higher concentration values and stay close to the generative preference mean. (I've put a toy simulation of this after the list.) Really cool implications for both historical linguistics and psycholinguistics.
5. del Prado Martin & Du Bois. Syntactic Alignment is an Index of Affective Alignment: An Information-Theoretical Study of Natural Dialogue. Super cool paper. Syntactic priming has always been thought of as something of an automatic process that applies equally in all situations, but this seems not to be the case: syntactic priming is actually stronger when speakers are acting more positively towards one another! I haven't had a chance to read it all the way through yet, but it seems like a well-done study.
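Since I find it easier to think in code, here are a few toy sketches of the models above. First, the idea in (1). This is my own rational-speech-acts-style toy, not the authors' actual model -- the ten-marble setup, the "wonky" uniform prior, and the speaker rationality parameter are all made-up numbers, just to show the direction of the effect. A pragmatic listener who hears "some" jointly infers how many marbles sank and whether to keep trusting the usual marbles-sink prior.

```python
import numpy as np

N = 10                                 # marbles in the toy scenario
states = np.arange(N + 1)              # number of marbles that sank: 0..10
utterances = ["none", "some", "all"]

# Literal semantics: which states make each utterance true.
meaning = {
    "none": states == 0,
    "some": states >= 1,
    "all":  states == N,
}

# Two candidate priors over states: a "normal" world where marbles essentially
# always sink, and a "wonky" world with no such expectation (toy numbers).
normal_prior = np.full(N + 1, 0.01)
normal_prior[N] = 1.0
normal_prior /= normal_prior.sum()
wonky_prior = np.ones(N + 1) / (N + 1)

def literal_listener(prior):
    # P(state | utterance), proportional to truth value times the prior.
    L0 = np.array([meaning[u] * prior for u in utterances])
    return L0 / L0.sum(axis=1, keepdims=True)

def speaker(prior, alpha=3.0):
    # P(utterance | state): prefers utterances that lead the literal listener
    # to put high probability on the true state.
    S1 = literal_listener(prior).T ** alpha    # rows: states, cols: utterances
    return S1 / S1.sum(axis=1, keepdims=True)

# Pragmatic listener hears "some" and jointly infers (state, wonky or not).
p_wonky = 0.5
u = utterances.index("some")
joint = []
for is_wonky, prior in [(False, normal_prior), (True, wonky_prior)]:
    weight = p_wonky if is_wonky else 1 - p_wonky
    joint.append(speaker(prior)[:, u] * prior * weight)
joint = np.array(joint)                # shape (2, N + 1): wonky? x state
joint /= joint.sum()

print("P(wonky world | 'some') =", round(float(joint[1].sum()), 3))
print("P(all 10 sank | 'some') =", round(float(joint[:, N].sum()), 3))
print("prior P(all 10 sank)    =", round(float(0.5 * normal_prior[N] + 0.5 * wonky_prior[N]), 3))
```

In this toy, hearing "some" both raises the posterior on the wonky prior and pulls probability away from "all ten sank" relative to the prior, which is the flavor of the effect described above.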
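Next, the cost intuition behind (2): word length plus a frequency-based surprisal term makes "extremely" costlier than "very". The per-million frequencies and the weights here are toy values I made up, not anything from the paper.

```python
import math

# Assumed per-million frequencies -- toy values, not corpus counts.
freq_per_million = {"very": 1000.0, "quite": 300.0, "extremely": 30.0}

def cost(word, length_weight=0.1, surprisal_weight=1.0):
    # Longer and rarer words are costlier: length plus -log probability.
    surprisal = -math.log(freq_per_million[word] / 1e6)
    return length_weight * len(word) + surprisal_weight * surprisal

for word in sorted(freq_per_million, key=cost):
    print(f"{word:9s} cost = {cost(word):.2f}")
# very < quite < extremely in cost, which (on this account) is the ordering
# that drives the inferred strength ordering.
```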
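For (3), here's how I picture the prediction setup: one row per (child, word, month) with child covariates and vocabulary features, and a binary label for whether the word is learned by the next month. The features, the synthetic data, and the plain logistic regression are all stand-ins of mine, not the authors' actual model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# One row per (child, word, month). Everything below is synthetic.
age_months    = rng.uniform(16, 30, n)
is_female     = rng.integers(0, 2, n)
vocab_size    = rng.integers(0, 400, n)        # words the child already produces
word_freq     = rng.lognormal(3, 1, n)         # frequency of the candidate word
knows_related = rng.integers(0, 2, n)          # already knows a related word?

X = np.column_stack([age_months, is_female, vocab_size, word_freq, knows_related])
# Synthetic "ground truth": learning is more likely with age, vocabulary size,
# word frequency, and knowing a related word.
logit = -7 + 0.15 * age_months + 0.005 * vocab_size + 0.02 * word_freq + 0.8 * knows_related
y = rng.random(n) < 1 / (1 + np.exp(-logit))   # learned by next month?

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 3))
```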
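Finally, a toy simulation of the mechanism in (4), as I understand it. Assume the generative part of the model has already produced a preference of 0.7 for one order; each expression then gets its own ordering preference drawn from a beta with that mean, and the concentration depends on expression frequency. The specific preference and concentration values here are made up -- the real model fits them to corpus data.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.7      # generative preference for the order "X and Y" (assumed)

def expression_preferences(p, concentration, n_expressions=10_000):
    # Beta parameterized by its mean p and a concentration parameter.
    return rng.beta(p * concentration, (1 - p) * concentration, size=n_expressions)

for label, concentration in [("high-frequency (low concentration)", 0.5),
                             ("low-frequency (high concentration)", 30.0)]:
    theta = expression_preferences(p, concentration)
    print(label)
    print("  strongly fixed in preferred order, theta > 0.9:", round(float(np.mean(theta > 0.9)), 2))
    print("  strongly fixed in the other order, theta < 0.1:", round(float(np.mean(theta < 0.1)), 2))
    print("  somewhere in the middle, 0.3 < theta < 0.7:    ", round(float(np.mean((theta > 0.3) & (theta < 0.7))), 2))
```

With the low concentration, almost every expression ends up near 0 or 1 (mostly 1, since the mean is 0.7); with the high concentration, almost everything sits near 0.7.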
Linguistics isn't my favourite part of cogsci, but thanks for the write-up! This stuff is pretty fascinating.
Wow, (4) sounds really cool! Putting it on my Zotero reading list. I don't understand your walk-through just yet, though. The generative part of the model learns from the actual occurrence of "bread and butter" vs. "butter and bread". Is the binomial now drawn per usage instance of that phrase, or per user, or per context in some other way? How would you even encounter a situation where the polarity of the beta distribution is flipped? If each binomial is drawn iid from the beta, how would they all gravitate towards the opposite polarity? And isn't the beta learned from the actual text, so shouldn't it learn to place a high weight on the final correct polarity? Maybe I just need to sit down with the statistics for a while to understand.

(2) is a really neat example of how we develop a consensus on language through recursive modeling! I don't know that much about pragmatics or this sort of maxim, coming from a computational background, but I did get really excited by a NIPS paper a couple of years ago on a model of consensus-building in language learning, also from Goodman's lab.
> The generative part of the model learns from the actual occurrence of "bread and butter" vs. "butter and bread". Is the binomial now drawn per usage instance of that phrase, or per user, or per context in some other way?

So the way this works is actually a bit more complicated than that. The background is that a bunch of people in the past have tried to account for binomial ordering preferences through constraints like "short words come before long words", "words with more general meanings come first", etc. This model formalizes that. They selected a set of the most agreed-upon constraints and coded a large number of binomials (in their preferred order) for whether they follow a particular constraint or not (or whether the constraint doesn't apply). Then they use a logistic regression model to learn weights for each of the constraints. So for a new binomial, the model can spit out a number between 0 and 1 based on the weights of the various constraints.
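Here's roughly the shape of that first stage, with made-up constraint names, codings, and weights -- in the paper the weights are fit to corpus data; these numbers are purely for illustration.

```python
import math

# Made-up constraint weights; the real ones come from a fitted logistic regression.
weights = {"short_before_long": 1.2, "frequent_word_first": 0.8, "iconic_sequencing": 2.0}

def generative_preference(codings):
    # codings: +1 if the constraint favors the order "A and B",
    #          -1 if it favors "B and A", 0 if it doesn't apply.
    score = sum(weights[c] * v for c, v in codings.items())
    return 1 / (1 + math.exp(-score))    # preference for "A and B", in (0, 1)

# e.g. a binomial where only the short-before-long constraint applies and it
# favors the attested order (codings made up for illustration):
print(round(generative_preference({"short_before_long": 1,
                                   "frequent_word_first": 0,
                                   "iconic_sequencing": 0}), 2))   # ~0.77
```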
> How would you even encounter a situation where the polarity of the beta distribution is flipped? If each binomial is drawn iid from the beta, how would they all gravitate towards the opposite polarity? And isn't the beta learned from the actual text, so shouldn't it learn to place a high weight on the final correct polarity? Maybe I just need to sit down with the statistics for a while to understand.

Yeah, this was hard for me to understand at first too. (I actually know the author of the paper, and I had to have her explain it to me in person :)) The way it works is that you have some estimate of the preference from the first part of the model -- say 0.7. That will be the mean of the beta distribution you're drawing from. However, the concentration parameter varies with the overall frequency of the binomial in the corpus. Look at the red line in this graph: https://upload.wikimedia.org/wikipedia/commons/f/f3/Beta_distribution_pdf.svg See how it's U-shaped, so you mostly end up drawing from the extremes and not much in between? That's how they found the very frequent binomials to behave -- they get polarized one way or the other, but nowhere in between (although it's not exactly the function shown there). The binomials that are really infrequent are more like the orange line -- you mostly end up drawing near the mean, and they don't get polarized one way or the other. I hope this helps somewhat -- I'm not totally clear on the implementation details, but I think I've got the conceptual bits.
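If you'd rather see the contrast as numbers than squint at the Wikipedia figure, here's the same mean with two different concentrations. The concentration values are toy choices of mine; the real ones are fit to the corpus.

```python
from scipy.stats import beta

mean = 0.7
for label, concentration in [("very frequent   (low concentration, U-shaped)", 0.5),
                             ("very infrequent (high concentration, peaked) ", 20.0)]:
    a, b = mean * concentration, (1 - mean) * concentration
    dist = beta(a, b)
    near_extremes = dist.cdf(0.05) + (1 - dist.cdf(0.95))
    near_the_mean = dist.cdf(0.8) - dist.cdf(0.6)
    print(f"{label}: P(<0.05 or >0.95) = {near_extremes:.2f}, P(0.6-0.8) = {near_the_mean:.2f}")
```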
> (2) is a really neat example of how we develop a consensus on language through recursive modeling! I don't know that much about pragmatics or this sort of maxim, coming from a computational background, but I did get really excited by a NIPS paper a couple of years ago on a model of consensus-building in language learning, also from Goodman's lab.

Yes, they do amazing work! There's been a trend in some areas of linguistics lately toward more computational work, and I love it. I think Noah Goodman actually has a PhD in math -- we need more interdisciplinary people like him. That reminds me: there was another paper at this year's CogSci on joint inference of words and concepts that is somewhat similar to Goodman's work: Mollica & Piantadosi, Towards semantically rich and recursive word learning models. (Side note: it's really awesome to have someone on this site with similar interests :D)