Pitch and Tone for Conlangs

Some time ago I was in the company of my girlfriend’s teenage son. At one point he responded to a question with a non-committal grunt. Not to be outdone, I grunted back. There then followed about five minutes of “conversation” using only grunts, shrugs and other occasional gestures. It was quite surprising what nuances of emotion and information could be communicated by such means.
There is a hypothesis that before true language existed, early hominids communicated with body language, gestures and vocalizations. Possibly one of the earliest “sentences” was something that meant “look at me, pay attention to what I am showing”. Modern greetings effectively do the same: “Give me your attention/Acknowledge me”.
Below is a report of an interesting experiment that had subjects trying to communicate a variety of concepts with grunts. What is notable is that subjects chose rising noises for concepts like “big” or “high” and falling ones for “small” or “low”. If one concept used a rising tone, its paired opposite would get a falling one. Would we see the same results with non-English speakers? What if the experiment were run with speakers of tonal languages such as Mandarin? Is there a correlation between concepts and the tones used for them in languages such as Cantonese and Mandarin?
Interesting stuff, and of relevance to Conlang creators too. Hence Diinlang uses “ta” and “up” and pairs them with tonal opposites “ko” and “loh”.
Human Language May Have Started Differently than Thought
Caption: The plots show the acoustic characteristics of each of the 18 meanings. The five variables are represented on the x-axis: D, duration; H, harmonics to noise ratio; I, intensity; P, pitch; C, pitch change. All values are normalized (z-scored) for each of the five measures. The red line shows the median and the blue box spans the first and third quartiles. The up and down arrows indicate variables that differed reliably between antonymic meanings. For example, vocalizations for bad differed from those for good by having a lower harmonics to noise ratio and pitch. The variables marked with arrows were the basis for the iconic template of each meaning. Credit: Royal Society Open Science, DOI: 10.1098/rsos.150152
A trio of researchers, two with the University of Wisconsin, the other with the University of California, has conducted a study, the results of which suggest that maybe humans did not get a start on language using only hand gestures as many scientists have theorized. Instead, as Marcus Perlman, Rick Dale and Gary Lupyan note in their paper published in Royal Society Open Science, it may have been a result of both noise-making and gesturing.
Nobody can say for sure how it was that we humans first began speaking to one another—surely it was a gradual process with different groups and individuals using various signals such as eye contact, body language, gesturing with arms, hands and fingers, or as the researchers with this new effort suggest, noises that were meant to convey some degree of meaning.
To come to this conclusion, the research trio conducted a study whereby volunteers were asked to make noises to convey the meaning of different words, without using body language or even facial expressions. Nine pairs of volunteers were asked to play what amounted to vocal charades, taking turns trying to get their partner to understand which of 18 contrasting word ideas (up, down, big, small, etc.) were being expressed. The researchers recorded their efforts and then compared the results among the different pairs. In so doing, they found that there was a discernible pattern—people attempting to convey the idea of "up" for example tended to use a rising pitch, whereas they did the opposite for "down." The researchers discovered that the pairs tended to improve when going multiple rounds, eventually getting to a point where the partners could figure out which word idea was being expressed on average 82.2 percent of the time. It also carried over to a non-lab environment. When the voice sounds were played for anonymous people over a crowd-sourced site, listeners were able to guess correctly on average 35.6 percent of the time, far better than chance would suggest.
These findings, the researchers claim, suggest that it appears more likely that our ancestors used both hand-signals and noises to convey meaning, which over a long period of time, evolved into more complex sounds that came to be associated with common ideas among multiple people.
Studies of gestural communication systems find that they originate from spontaneously created iconic gestures. Yet, we know little about how people create vocal communication systems, and many have suggested that vocalizations do not afford iconicity beyond trivial instances of onomatopoeia. It is unknown whether people can generate vocal communication systems through a process of iconic creation similar to gestural systems. Here, we examine the creation and development of a rudimentary vocal symbol system in a laboratory setting. Pairs of participants generated novel vocalizations for 18 different meanings in an iterative 'vocal' charades communication game. The communicators quickly converged on stable vocalizations, and naive listeners could correctly infer their meanings in subsequent playback experiments. People's ability to guess the meanings of these novel vocalizations was predicted by how close the vocalization was to an iconic 'meaning template' we derived from the production data. These results strongly suggest that the meaningfulness of these vocalizations derived from iconicity. Our findings illuminate a mechanism by which iconicity can ground the creation of vocal symbols, analogous to the function of iconicity in gestural communication systems.
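For conlang creators who want to play with the "meaning template" idea, here is a minimal Python sketch of the two steps the caption and abstract mention: z-scoring the five acoustic measures, then checking how close a vocalization sits to a template. It is not the authors' analysis code. The numbers, the `up_template` values and the choice of Euclidean distance are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Order mirrors the caption's five measures: D, H, I, P, C.
FEATURES = ("duration", "harmonicity", "intensity", "pitch", "pitch_change")

def z_score_columns(rows):
    """Normalize each feature column to mean 0 and standard deviation 1."""
    normalized = []
    for column in zip(*rows):
        mu, sigma = mean(column), pstdev(column)
        normalized.append([(v - mu) / sigma if sigma else 0.0 for v in column])
    # Transpose back so each row is one vocalization again.
    return [list(row) for row in zip(*normalized)]

def template_distance(vector, template):
    """Euclidean distance over only the features the template marks.

    `template` maps a feature index to a target z-score (an assumption,
    not the paper's exact measure).
    """
    return sum((vector[i] - target) ** 2 for i, target in template.items()) ** 0.5

# Made-up measurements: two "up"-like and two "down"-like vocalizations.
raw = [
    [0.8, 12.0, 70.0, 220.0, 40.0],
    [0.9, 11.0, 72.0, 240.0, 60.0],
    [0.7, 10.0, 68.0, 140.0, -50.0],
    [0.6, 11.0, 66.0, 120.0, -50.0],
]
z = z_score_columns(raw)

# Hypothetical template for "up": high pitch (index 3), rising pitch (index 4).
up_template = {3: 1.0, 4: 1.0}
distances = [template_distance(v, up_template) for v in z]

for label, d in zip(("up-like 1", "up-like 2", "down-like 1", "down-like 2"), distances):
    print(f"{label}: {d:.2f}")
```

With this toy data the two rising, high-pitched vocalizations land much closer to the "up" template than the two falling, low-pitched ones. That is the kind of relationship the abstract says predicted how well listeners guessed the meanings.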