Prosodic constraints on statistical strategies in segmenting fluent speech

Shukla, Mohinish

Learning a spoken language is, in part, an input-driven process. However, the relevant units of speech like words or morphemes are not clearly marked in the speech input. This thesis explores two main strategies for segmenting fluent speech. The first involves computing the distributional properties of the input stream. Previous research has established that adults and infants can use the transition probabilities (TPs) between syllables to segment speech. Specifically, researchers have found a preference for syllabic sequences which have relatively high average transition probabilities between the constituent syllables. The second strategy relies on the prosodic organization of speech. In particular, larger phrasal constituents of speech are invariably aligned with the boundaries of words. Thus, any sensitivity to the edges of such phrases will serve to place additional constraints on possible words. The main goal of this thesis is to understand how different strategies conspire to provide a rich set of cues to segment speech. In particular, we explore how prosodic boundaries influence distributional strategies in segmenting fluent speech. The primary methodology employed is behavioral studies with Italian-speaking adults. In the initial experimental chapters, a novel paradigm is described for studying distributional strategies in segmenting artificial, fluent speech streams. This paradigm uses artificial speech containing syllabic noise, defined as the presence of syllables that do not comprise the target nonce words, but occur at random at comparable frequencies. It is shown that the presence of syllabic noise does not affect segmentation. This suggests that statistical computations are robust. We find that, although the presence of the noise syllables does not affect TP computations, the placement of nonce words with respect to each other does.
In particular, 'words' with a clumped distribution are better segmented than 'words' with an even spacing. This suggests that even the process of statistical segmentation itself is constrained. The syllabic noise paradigm is utilized to create speech streams as sequences of frames: syllabic sequences of fixed length. 'Words' can be placed at arbitrary positions with respect to these frames; the remaining positions are occupied by noise syllables. By adding pitch and length characteristics of Intonational Phrases (IPs, which are large phrasal constituents) from the native language, the frames can be turned into prosodic 'phrases'. Thus, nonce words can be placed at different positions with respect to such 'phrases'. It is found that 'words' that straddle such 'phrases' are not preferred over non-words, while 'phrase'-internal 'words' are. Removing the prosodic aspects from the frames abrogates this effect. These initial experiments suggest that prosody carves speech streams into smaller constituents. Presumably, participants infer the edges of these 'phrases' as being edges of words, as in natural speech. It is well known that edge positions are salient. This suggests that 'words' at the edges of the 'phrases' should be better recognized than 'words' in the middle. The subsequent experiments show such an edge effect of prosody. The previous results are ambiguous as to whether prosody blocks the computation of TPs across phrasal boundaries, or acts at a later stage to suppress the outcome of TP computations. It is seen that prosody does not block TP computations: under certain conditions one can find evidence that participants compute TPs for both 'phrase'-medial and 'phrase'-straddling 'words'. These results suggest that prosody acts as a filter against statistically cohesive 'words' that straddle prosodic boundaries. Based on these results, the prosodic filtering model is proposed. Next, we examine the generality of the prosodic filtering effect.
It will be shown that a foreign prosody causes a similar perception of 'phrasal' edges; the edge effect and the filtering effect are both observed even with foreign IPs. Phonologists have proposed that IPs are universally marked by similar acoustic cues. Thus, the results with foreign prosody suggest that these universal cues play a role in the perception of phrases in fluent speech. Such cues include final lengthening and final pitch decline; further experiments show that, at least in the experimental paradigm used in this thesis, pitch decline plays the primary role in the perception of 'phrases'. Finally, we consider the possible bases for the perception of prosodic edges in otherwise fluent speech. It is suggested that this capacity is not purely linguistic, but arises from acoustic perception: we will see that time-reversed IPs, which maintain pitch breaks at 'phrasal' boundaries, can still induce the filtering effect. In an annex, the question of how time-reversed (backward) speech is perceived in neonates is addressed. In a brain imaging (OT) study with neonates, we find evidence that forward speech is processed differently from backward speech, replicating previous results. In conclusion, the task of finding word boundaries in fluent speech is highly constrained. These constraints can be understood as the natural limitations that ensue when multiple cognitive systems interact in solving particular tasks.
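
As an illustration of the distributional strategy described in the abstract, the following is a minimal sketch of how transition probabilities between adjacent syllables might be computed and averaged over a candidate sequence. It assumes the usual forward definition, TP(x -> y) = frequency of the pair xy divided by the frequency of x; the abstract itself does not spell out the formula. The function names and the toy syllable stream are hypothetical and are not taken from the thesis materials.

```python
from collections import Counter


def transition_probabilities(syllables):
    """Forward TP for each adjacent pair: count(x, y) / count(x as first of a pair)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable never starts a pair, so it is excluded from the denominator.
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def mean_tp(sequence, tps):
    """Average TP over the adjacent syllable pairs of a candidate sequence."""
    pairs = list(zip(sequence, sequence[1:]))
    # Pairs never seen in the stream are treated as having TP 0 (an assumption).
    return sum(tps.get(pair, 0.0) for pair in pairs) / len(pairs)


# Hypothetical toy stream built from two nonce 'words', pu-ra-ki and be-lo-du.
stream = "pu ra ki be lo du be lo du pu ra ki pu ra ki be lo du".split()
tps = transition_probabilities(stream)

word_score = mean_tp(["pu", "ra", "ki"], tps)       # within-'word' sequence
part_word_score = mean_tp(["ki", "be", "lo"], tps)  # sequence straddling two 'words'
```

In this toy stream the within-'word' sequence receives a higher average TP than the straddling sequence, which is the kind of contrast the segmentation preference described above relies on. The sketch covers only the statistical computation; it does not model syllabic noise, frames, or the prosodic filtering proposed in the thesis.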

Prosodic constraints on statistical strategies in segmenting fluent speech / Shukla, Mohinish. - (2006 Nov 21).