Cross-linguistic exploration of phonemic representations

Kaya, Zeynep Gokcen

All languages around the world have their own vast sound inventories. Understanding each other through verbal communication requires, first of all, understanding each other’s phonemes. This often overlooked constraint is non-trivial already among native speakers of the same language, given the variability with which we all articulate our phonemes. It becomes even more challenging when interacting with non-native speakers, who have developed neural representations of different sets of phonemes. How can the brain make sense of such diversity? It is remarkable that the sounds produced by the vocal tract, which have evolved to serve as symbols in natural languages, fall almost neatly into two classes with very different characteristics: consonants and vowels. Consonants are complex in nature: beyond acoustically defined formant (resonant) frequencies, additional physical parameters such as formant transitions, the delay period in those transitions, energy bursts, the vibrations of the vocal cords occurring before and during the consonant burst, and the length of those vibrations are needed to identify them. Surprisingly, consonants are nonetheless categorized very quickly, through a quite mysterious form of invariant feature extraction. In contrast, vowels can be represented in a simple and transparent manner because, amazingly, only two analog dimensions within a continuous space are essentially enough to characterize a vowel. The first dimension corresponds to the degree to which the vocal tract is open when producing the vowel, and the second is the location of the main occlusion. Surprisingly, these anatomically defined production modes match very precisely the first two acoustically defined formant frequencies, F1 and F2. While some languages need additional features to specify a vowel, such as its length or roundedness, whose nature may be more discrete, for many others F1 and F2 are all there is to it.
In this thesis, we use both behavioral measures (phoneme confusion frequencies) and neural measures (the spatio-temporal distribution of phoneme-evoked neural activation) to study the cross-linguistic organization of phoneme perception. In Chapter 2, we study the perception of consonants by replicating and extending a classical study on the sub-phonemic features underlying perceptual differences between phonemes. Comparing the responses of native listeners to those of Italian, Turkish, Hebrew, and (Argentinian) Spanish listeners to a range of American English consonants, we look at the specific patterns of errors that speakers of different languages make, using the metric content index, which was previously used in entirely different contexts, with either discrete representations, e.g. in face space, or continuous ones, e.g. of the spatial environment. Beyond the analysis of percent correct scores and transmitted information, we frame the problem in terms of ‘place attractors’, in analogy to those that have been well studied in spatial memory. Through our experimental paradigm, we try to access distinct attractors in different languages. In the same chapter, we provide auditory evoked potentials for some consonant-vowel syllables, which hint at transparent processing of the vowels, regulated by the first two formants that characterize them; accordingly, we then turn to investigating vowel trajectories in the vowel manifold. We start our exploration of the vowel space in Chapter 3 by addressing a third dimension that is perceptually important for native Turkish speakers, namely rounding. Can native Turkish speakers better navigate vowel trajectories in which the second formant changes over a short time, to reflect rounding, compared to native Italian speakers, who are not required to make such fine discriminations on this dimension? We found no mother tongue effects.
We have found, however, that rounding in vowels can be represented with similar efficiency either by fine differences in an F2 peak frequency that is constant in time or by inverting the temporal dynamics of a changing F2, which makes vowels not mere points in the space but rather continuous trajectories. We walk through phoneme trajectories on a timescale of tens of milliseconds; it comes to us as naturally as walking in a room, if not more. Similar to spatial trajectories, in Chapter 4 we create equidistant continuous vowel trajectories on a vowel wheel positioned in the central region of the two-dimensional vowel space, where some languages, like Italian, have no standard vowel categories and others, like English, do. Is the central region in languages like Italian to be regarded as a flat empty space with no attractors? Or does it retain some trace of the listeners' own phoneme memories? We ask whether this central region is flat, or can at least be flattened through extensive training. If so, would we then find a neural substrate that modulates perception in the 2D vowel plane, similar to the grid cell representation involved in the spatial navigation of empty 2D arenas? Our results are not suggestive of a grid-like representation, but rather point to a modulation of the neural signal by the position of Italian vowels around the outer contour of the wheel. Therefore, in Chapter 5, we ask how our representation of the vowel space, not only in the central region but in the entirety of its linguistically relevant portion, is deformed by the presence of the standard categories of our vowel repertoire. We use ‘belts’, short stretches along which formant frequencies are varied quasi-continuously, to determine the local metric that best describes, for each language, the vowel manifold as a non-flat space constructed in our brain.
As opposed to the ‘consonant planes’ that we constructed in Chapter 2, which appear to share a similar structure to a great extent, we find that the vowel plane is subjective and language dependent. In light of language-specific transformations of the vowel plane, we wonder whether native bilinguals simultaneously hold multiple maps available and use one or the other to interpret linguistic sources depending on context. Or, alternatively, do they construct and use a fusion of the two original maps, which allows them to efficiently discriminate the vowel contrasts that have to be discriminated in either language? The neural mechanisms underlying physical map switching, known as remapping, have been well studied in the rodent hippocampus; is vowel map alternation governed by similar principles? We compare perceptual vowel maps and show that those of native Norwegian speakers, who are not bilingual but are fluent in English, are unique, probably sculpted by their long-term memory codes; we leave the curious case of bilinguals for future studies. Overall, we attempt to investigate phoneme perception in a framework different from the one prevailing in the literature; the topic has been of interest to a large community for many years, but its study has remained largely disconnected from that of cortical computation. Our aim is to demonstrate that insights into persisting questions in the field may be reached from another well-explored part of cognition.

Cross-linguistic exploration of phonemic representations / Kaya, Zeynep Gokcen. - (2018 Dec 17).