Sunday, 4 May 2014

Philosophy of Mind and Psychology Reading Group -- Hohwy Replies to Siegel

I am posting this on behalf of Jakob Hohwy, author of the book we are currently discussing, The Predictive Mind (OUP 2013). It is a reply to Susanna Siegel's post on chapter 3.

Jakob Hohwy
Thanks to Susanna for this set of really nice, and challenging, questions. I’ll try to address them here. I do hope others might have insights to share on some of these issues!

First question: What are proper grounds for low confidence in a signal?

The signal can buck the trend in different ways. Importantly, a signal can change such that even if it has the same mean, it loses precision (e.g., a sound becomes blurry). This bucks a precision trend, and is sufficient for decreasing the trust in the signal. This is a reasonable response (and is seen in the ventriloquist effect). We build up expectations that deal with such state-dependent changes in uncertainty. Sometimes we will appeal to higher-level, slower regularities that interact with the low-level expectations in question (my headphones are running out of battery, that’s why the sound has become blurry). This will often help in a particular case, but might not (I might rather learn I have developed otosclerosis, or I might need to adjust to new levels of irreducible noise). But if I do rely on the battery hypothesis then I downgrade the trust in the current signal. A similar story can be told for a change in mean, as when the location of the sound-source seems to change. Here I might appeal to an expectation about interfering causes that push the sound-source around. But if I assume the sound shifts due to some kind of echo chamber, then I’d be justified in trusting the signal less. The underlying story here is that the brain in the long run will end up apportioning noise to the right sources, and thus be able to make the right decision about any given signal.

Noise comes in different guises. There is irreducible noise, which the brain can’t do anything about (risk). There is expected uncertainty, variability in the signal that we know about and expect. And there is unexpected uncertainty, variability that we haven’t predicted (due to state-dependent levels of noise). Much of the work concerns unexpected uncertainty. The assumption is that often such uncertainty arises because there are interacting factors, which can be inferred.

All this is taken care of by the brain in unconscious perceptual inference, mechanistically by plasticity at various time scales. The somewhat elaborate illustration with the dam-plugging was meant to capture this mechanistic aspect (and how it ensures representation). If the brain is able to anticipate the ‘inflow’ of sensory input both at fast and slower time-scales, then it’ll minimize its prediction error (on average and in the long run), and if it does that then it is guaranteed to approximate Bayesian inference. Of course, there is much buried here in terms of how something like the brain can implement Bayes.

Second question: When there’s a discrepancy between the expectation and the signal, is that a prediction error?

There are a few different things to respond to in this question. I think it is best not to have anything but prediction error being ascending signals in the brain (so there is nothing that is ‘discrepancy’ which is different from prediction error). Prediction error can pertain both to the mean and the precision. So a new mean (from yellow to grey, say) can be a prediction error, but so can a new precision (from clearly yellow to a blurry yellow, say). If I expect to go into an environment with lots of blurriness (e.g., dusk), then the brain somewhat inhibits the prediction error I get there, which lets colour perception be more driven by priors. Of course, if my predictions are truly perfect (in whatever environment), then there is no prediction error (then sensory input is completely attenuated by the model). But this situation never arises, due to irreducible and state-dependent noise. Crucially, it makes sense to speak of expected prediction error (this is going to be central for understanding action, in ch4). Organisms strive to avoid prediction error, but they also strive to avoid bad, imprecise prediction error in so far as they cannot entirely avoid prediction error (or rather, as they integrate out uncertainty about what kind of prediction error they expect, as per the free energy principle).

In this way there is only one functional role for second-order statistics in the brain: control the gain on prediction error. Sometimes, the gain is high, when the signal is trusted, and sometimes it is low, when the signal is not trusted. This gain-control relates to expected precisions. It is all part of Bayes, but remember Bayes is empirical, hierarchical Bayes, which automatically has room for higher-level hyperparameters (expectations for what the next mean should be) and higher-level hyperpriors (expectations for what the next precision should be). In this sense, first and second order statistics are already there in the Bayesian framework. For the case of the grey banana, we have to ask if the ‘grey’-signal is precise or not. If precise, then it is likely it will drive perceptual inference, so we see the banana as grey (since a precise, narrow signal around the ‘grey’ mean makes it very unlikely that there is any evidence for a ‘yellow’ mean). If the signal is instead imprecise then it will be inhibited (proportional to its imprecision) and will exert less influence on the perceptual inference; this makes sense since even an imprecise signal has a mean (‘grey’) and this mean should not be allowed to determine perception, because it is embedded in a poor signal. At the same time, since it is an imprecise signal, it will (as we assume here) carry some evidence for a ‘yellow’ mean. Combined with a strong prior for yellow bananas, we might see it as yellow-ish (as indeed recent studies show).
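For readers who like numbers, the precision-weighted trade-off between a ‘grey’ signal and a ‘yellow’ prior can be sketched with two Gaussians. This is a toy illustration only; the hue scale and all the numbers are invented, not taken from the banana studies:

```python
# Precision-weighted Gaussian updating: a toy sketch of the grey-banana case.
# All numbers are invented for illustration; precision = 1 / variance.

def posterior(mu_prior, pi_prior, mu_signal, pi_signal):
    """Posterior mean and precision for a Gaussian prior and Gaussian likelihood."""
    pi_post = pi_prior + pi_signal
    mu_post = (pi_prior * mu_prior + pi_signal * mu_signal) / pi_post
    return mu_post, pi_post

# Encode hue on a line: 0.0 = grey, 1.0 = yellow.
# Strong prior that bananas are yellow:
mu_prior, pi_prior = 1.0, 4.0

# Precise 'grey' signal: the signal drives inference, percept is near grey.
mu1, _ = posterior(mu_prior, pi_prior, 0.0, 36.0)

# Imprecise 'grey' signal: it is inhibited in proportion to its imprecision,
# so the yellow prior dominates and the percept is yellow-ish.
mu2, _ = posterior(mu_prior, pi_prior, 0.0, 1.0)

print(round(mu1, 2))  # 0.1 -> seen as (almost) grey
print(round(mu2, 2))  # 0.8 -> seen as yellow-ish
```

The point of the sketch is just that the same ‘grey’ mean has a very different influence on the posterior depending on the gain (precision) assigned to it.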

Third question: Does the signal come in the form of a probability distribution?

I am not entirely sure I get the full intention of this question. I think it is safe to say that the signal, or the prediction error, comes as a probability distribution - it has a mean and variance. We might think of it as a set of samples (a certain firing pattern of a population of neurons when confronted with the banana; they each ask “yellow?”, and get a set of more or less consistent answers back). I think there is computational neuroscience out there that treats such firing patterns as encoding a probability distribution, independently of this particular Bayesian story. It makes good sense, I think: on a given occasion, the prediction error is a set of samples which cluster imperfectly around the ‘yellow’ mean (e.g., a few samples will be towards the ‘red’ mean).
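This sampling picture can be sketched very simply. The sketch below is purely illustrative: the population size, tuning, and noise level are all made up:

```python
# A toy 'population' of units, each giving a noisy answer to "yellow?".
# The set of answers implicitly encodes a distribution: a mean and a spread
# (and hence a precision). All numbers here are invented for illustration.
import random
import statistics

random.seed(1)
# 1.0 = 'yellow', 0.0 = 'red'; each unit reports a noisy sample.
samples = [random.gauss(0.9, 0.2) for _ in range(50)]

mean = statistics.mean(samples)               # clusters imperfectly around 'yellow'
precision = 1 / statistics.variance(samples)  # tighter clustering = higher precision
strays = sum(s < 0.5 for s in samples)        # a few samples lean towards 'red'
```

Nothing here depends on this particular encoding; it only illustrates how a firing pattern can carry first- and second-order statistics at once.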

It is worth noting that with a hierarchical structure to rely on, these sensory samples can often afford to be very sparse and yet inference will be reliable; I just need a few high-precision cues that a given pattern is instantiated and then I’ll know. Given all my prior knowledge, the sensory input need not embody all the information right here and now. (Illusions probably exploit this propensity to ‘jump’ to conclusions to create false inference, e.g. the pac-man corners in the Kanizsa triangle.)

Fourth question: What is the impact of second-order perceptual inference on perceptual processing, and when and how does it differ functionally from first-order perceptual inference?

The banana-dialogues very nicely set out the statistical decisions one can engage in when confronted with different types of mean/precision combinations combined with different types of prior expectations for means and precisions. Some of my answers to the previous questions begin to address these. Option 1: in general, if there is great precision in a signal then it’ll tend to drive inference, even against long-held priors. Much of the book goes on about this, e.g. for our experiments on the rubber hand illusion, where priors about the body are pushed aside. But the brain is shifty; it might try some subtle compromise (seeing the grey banana as slightly yellow-ish). Option 2: I discuss this above too: this seems a likely scenario for a weak signal. Option 3: here we can appeal to the idea that precision optimization is just attention (an idea which comes up in ch9 especially), and that attention comes in endogenous and exogenous forms. So I might have an expectation about a signal’s precision that sets the gain at a low ‘2’ (=I endogenously attend away from it) but later its intrinsic signal strength increases (someone changes the batteries) and with it its precision improves. This means that even if the gain is set at 2 for the old, weak signal, more of this new, strong signal will filter through (=I exogenously attend to it).

There is, as far as I know, no certain answer to what the brain does in these kinds of cases. Much depends on prior learning, and – we suspect – on individual differences. Given a new and unexpected level of imprecision the brain needs to decide whether to recruit higher levels or whether to trust the signal more, or less. Only quite subtle high-level and long-term (meta-)learning can really give the answers here (e.g., over the long term it may be best in terms of prediction error minimization to explain away uncertainty hierarchically). We have a fair bit of experimental evidence that these kinds of differences play out along the autism spectrum. Well-functioning individuals can differ somewhat in the extent to which they engage the hierarchy when there is unexpected uncertainty, with individuals low on autism traits more likely to do so than individuals high on autism traits (perhaps the former are more likely to be found in diverse, changing environments). But clinical individuals seem to be even more hierarchically restricted, which may be counterproductive. I mention this to highlight that we are doing experiments that speak to all this, and that we should not expect univocal answers to how well individuals approximate Bayes in the long run. (I discuss these issues at length throughout later parts of the book.)

Fifth question: Where can we locate second-order inferences in Bayes’ theorem?

I address some of this above too. Noise and uncertainty are both integral parts of Bayes, and especially of empirical, hierarchical Bayes. There is no separate noise-‘monitor’; there is just learning of means and precisions all the way up the hierarchy. Changes in precisions are encoded and learned in hierarchical Bayes because, without this, Bayes won’t work in a world with state-dependent noise (its learning rule is inappropriate). If we compress everything into Bayes’ rule, I think we get that a low precision prediction error will give us a low likelihood. I don’t think the posterior (p(h|e)) comes into it, except of course in the shape of the ‘old’ posterior, which is today’s prior.

Sixth question: Do you think experience takes the form of a probability distribution, like any other signal does in PEM? If not then what makes it special?

There is probably more in this question than I am able to address here. (There are for example elements here of a nice question that Michael Madary asked of Clark, in Frontiers). One answer is that perception is often driven by precise prediction errors, which dampens down the presence of alternative possible experiences (action serves this kind of function, ‘deepening’ the selected hypothesis at the expense of others; hence without action there should be more of the sea of probability distributions, and there is some evidence of this in binocular rivalry when introspective reports are not required). When there is imprecision, perceptual inference begins to waver and becomes ambiguous (e.g., some phases of the rubber hand illusion). Another answer is that when one hypothesis wins, other hypotheses are ‘explained away’ (this is the classic example from Pearl of rain/sprinkler given wet grass): conditional on the new evidence E, Hi increases its probability and Hj decreases its, even though they were independent before the new evidence came in. This speaks to some kind of lateral reciprocal inhibition in the brain. A further answer is that the brain ‘cleans up’ its own hypotheses in the light of its assessment of noise (there are some comments on this at the end of Ch 5). None of these mechanisms relate to phenomenality; it’s all just more Bayes. I do speculate about phenomenality later, namely that selection of a content into consciousness happens when there is enough evidence for Hi for active inference to kick in and for the sensory consequences of Hi to be generated, given action (ch 10).
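Pearl's rain/sprinkler case can be made concrete with a tiny enumeration over a two-cause net. This is only a sketch: the priors are invented, and ‘wet’ is made deterministic for simplicity:

```python
# Toy version of Pearl's explaining-away example (all numbers invented).
# Rain and Sprinkler are independent causes; grass is wet iff either holds.
from itertools import product

p_rain, p_sprinkler = 0.2, 0.3

def joint(rain, sprinkler):
    """Joint prior probability of the two independent causes."""
    return ((p_rain if rain else 1 - p_rain) *
            (p_sprinkler if sprinkler else 1 - p_sprinkler))

def p_rain_given(wet=None, sprinkler=None):
    """P(rain | evidence), by enumerating and renormalising the joint."""
    num = den = 0.0
    for r, s in product([True, False], repeat=2):
        w = r or s  # deterministic 'wet' for simplicity
        if wet is not None and w != wet:
            continue
        if sprinkler is not None and s != sprinkler:
            continue
        den += joint(r, s)
        if r:
            num += joint(r, s)
    return num / den

print(round(p_rain_given(), 2))                          # 0.2  prior
print(round(p_rain_given(wet=True), 2))                  # 0.45 wet grass raises P(rain)
print(round(p_rain_given(wet=True, sprinkler=True), 2))  # 0.2  sprinkler explains it away
```

The two causes are independent a priori, but become coupled given the shared evidence: learning the sprinkler was on ‘flattens’ the rain hypothesis back to its prior.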

I probably have not answered all the questions fully or satisfactorily here. The questions are really nice to think about however, and allow me to highlight the crucial role of precisions in the PEM framework.


  1. Hi Jakob,

    Thanks a lot for your replies to Susanna. Here are a few more questions:

    1. With regards to your response to Susanna's first question, it seems like you are saying that when we get some deviation from the expected signal, there are multiple ways we could interpret this deviation, and one is as a decrease in precision which leads to a downgrade in trust of the incoming signal and a greater reliance on predictions. Another would be as a first-order prediction error. But the question (or at least a question I am still worried about) is how does the mind/brain decide whether to downgrade the signal or revise the prediction to match the change in the signal in any given case, without an independent way to decide between equally compatible hypotheses? It seems like you're invoking "plasticity at multiple time scales" to explain this. Is the difference between first and second order predictions supposed to supervene solely on what happens over long ranges of time (i.e. after any individual prediction error is detected), or is there anything at a given moment that could decide between the two?
    2. On Q4 about bananas, it seems like this explanation of the color memory effect only works if the conditions are such that the subjects expect an imprecise signal (so for example blurry vision or poor lighting conditions or something like that). But in the experimental setup (Hansen 2006, Olkkonen 2008, Witzel 2011) it is not like that at all. Subjects are looking at a clearly presented image of a banana shape on a computer screen, with nothing that I can think of that would be an indication of a poor incoming signal. So there should be a high precision prediction and the system should trust the incoming signal and end up with a grey perceptual experience, not yellow. The two ways I could see making any sense of this on your picture are a) if the priors for banana-shaped things being yellow are so high that the prediction error from the grey signal doesn't do enough to revise them into generating a totally grey perception, or b) there is some sort of error in the precision predictions, which causes the system to treat the incoming signal as unreliable, and so rely mostly on the priors. But it seems like this would be a malfunctioning of the system because the incoming signal is actually good in this case. Given the actual reliability of the incoming signal, which of these two ways would you like to treat the color memory effects data? Or is there some alternative that I am not thinking of?

    3. Your answer to Susanna's Q6 still leaves me wondering a few things. It seems you are saying that conscious perceptual experience is not a probability distribution. It also seems like you are saying that at least a lot of unconscious perception is not a probability distribution either, or at least not as full a distribution as is represented at some stages, because the other hypotheses may be explained away through lateral inhibition. It seems like there are two separate questions here: 1) Does phenomenal experience have one determinate non-probability distribution content and how does that happen, and 2) is there always selection of one hypothesis among the many in a perceptual process, whether that content is experienced consciously or not, and how does that happen? It seems like you want to say yes at least to an extent to both, but I'm not sure what form the relationship between your replies to these two questions is taking. You could answer yes to both and say the answer to 2 explains the answer to 1, or you could answer yes to 1 and no to 2, and offer an independent explanation of how 1 happens.

    Thanks a lot and looking forward to hearing what you have to say.

  2. Hi Zoe - sorry about the delay in trying to answer these questions:
    1. When confronted with a new prediction error, the system just has to go with what the priors dictate at that moment, and how the likelihoods fall out. This will sometimes lead to prediction error minimization that misrepresents the world. For example, I might wrongly minimize error in such a way that I perceive a sheep as a dog when really I should have minimized error by turning down the gain on the error (due to the dusky conditions) and relied on my priors (that there are many sheep around here). In the long run, such errors can be corrected (e.g., as I explore the surrounding fields over a couple of days); this is discussed somewhat in Ch8. So, at a given moment there is something I can do: rely on priors (for both means and precisions) and go with the hypothesis that has the highest likelihood given those priors; over time, learning (more stable, less uncertain updating of priors) will increase the quality of new inferences. The remark about plasticity does not explain the inference but hints at how this is realized in the brain: immediate suppression of prediction error through fast-acting synaptic plasticity, and learning through slower-acting associative plasticity; both instantiate Bayesian inference but over different time-scales. Perhaps one way to think of this is by reflecting on empirical Bayes, which is a hybrid of Bayes and frequentism: over the longer run the priors come to reflect the true frequencies in the environment, but through repeated Bayesian updating.

    2. I think the interpretation of the banana-type study is different: the colour is very etiolated when the participants get down close to the grey, which means the signal strength is weak, which goes with a turned-down gain and increased weighting of priors; notice also that there is a violation of prior expectations of the typical colour of bananas. In most studies of this type there is also the context of the experiment, where participants have learned through many trials that the colours vary a great deal, so they expect variability - and often that the colour will move around their threshold.

    3. There is no doubt that we are only trading in probability distributions. The winning hypothesis is a probability distribution, and we have to assume that this is what determines phenomenal experience (somehow). This is clear from the fact that we are operating with (empirical, hierarchical) Bayesian inference. I am unsure why you think that perhaps experience is not determined by a probability distribution, or why you think I am suggesting this. In the explaining away case, for example, it is a matter of which hypothesis ends up being most probable (i.e., we think of it as a probability distribution) and thereby explains away (“flattens”) the others. There are many such distributions in the system, each with more or less evidence in its favour (e.g., one for ‘it’s raining’, one for ‘the sprinkler is on’); the winning one is the one that has the most evidence in its favour, given priors. Similarly, in the ventriloquist case we have two separate distributions, which are later either integrated or not to form a new one - in either case, this has an impact on perception (experiencing the doll speaking or not). Distributions are partly determined (hyperparameterised) by distributions higher in the hierarchy, so percepts are many-layered in terms of their temporal ‘texture’. The assumption is that the distributions are Gaussian, so bimodal distributions are not in play, and similarly that there is always only one winning hypothesis for particular evidence (this presumably has to do with maximum entropy).
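The ventriloquist case, read as integration of two Gaussian cues, can be sketched in precision-weighted terms (a toy example; the locations and precisions are invented):

```python
# Toy sketch of ventriloquist-style cue integration (numbers invented).
# Two independent Gaussian location estimates; if they are integrated, the
# combined estimate is the precision-weighted average of the two means,
# pulled towards the more precise cue.

def integrate(mu_a, pi_a, mu_b, pi_b):
    """Combine two Gaussian estimates; returns (mean, precision)."""
    pi = pi_a + pi_b
    return (pi_a * mu_a + pi_b * mu_b) / pi, pi

# Auditory cue: the voice seems to come from 0 degrees, but imprecisely.
# Visual cue: the doll's moving mouth at 10 degrees, very precisely.
mu, pi = integrate(0.0, 1.0, 10.0, 9.0)
print(round(mu, 1))  # 9.0 -> the voice is 'captured' by the doll
```

If the two distributions are instead kept separate (no integration), the voice stays at its auditory location and the doll is not experienced as speaking.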