Because the AI model can transcribe audio into text, the researchers behind the study could map the brain activity that occurs during everyday conversations more accurately than with conventional models that encode specific features of language structure, such as phonemes (the simple sounds that make up words) and parts of speech (such as nouns, verbs and adjectives).
The model used in the study, called Whisper, instead takes audio files and their text transcripts, which are used as training data to map the audio to the text. It then "learns" to predict text from new audio files it has not heard before, using that mapping.
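For a concrete sense of what this audio-to-text mapping looks like in use, here is a minimal sketch with the open-source openai-whisper Python package; the model size ("base") and the file name conversation.wav are illustrative assumptions, not details from the study:

```python
# A minimal sketch of transcribing audio with the open-source Whisper package.
# The checkpoint size ("base") and the audio file name are assumptions.
import whisper

# Load a pretrained Whisper checkpoint. It was trained on paired
# (audio, transcript) data, with no hand-coded phoneme or grammar rules.
model = whisper.load_model("base")

# Transcribe an audio file the model has never heard before.
result = model.transcribe("conversation.wav")
print(result["text"])
```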
As such, Whisper works purely from these data, without any features of language structure encoded into its original settings. Yet in the study, the scientists showed that once the model was trained, those structures still emerged in it.
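One common way researchers demonstrate that such structure has emerged is to train a simple "probe" classifier on a model's internal embeddings and check whether it can recover linguistic labels. The sketch below illustrates that general idea with random placeholder data; the embedding size, labels and probing setup are assumptions for illustration, not the study's actual method:

```python
# A minimal sketch of "probing" internal representations for linguistic
# structure. The arrays below are random stand-ins, not real embeddings:
# with real Whisper embeddings, above-chance accuracy would suggest that
# part-of-speech information is present in the representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512))  # one 512-d vector per word (placeholder)
pos_tags = rng.integers(0, 4, size=1000)   # 0=noun, 1=verb, 2=adjective, 3=other (placeholder)

X_train, X_test, y_train, y_test = train_test_split(embeddings, pos_tags, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```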
The study sheds light on how this type of AI model, called a large language model (LLM), works. But the research team is more interested in the insight it offers into human language and cognition. Identifying similarities between how the model develops its language-processing capabilities and how people develop these skills could be useful for engineering devices that help people communicate.