Man yin Mandy Wong for Tencent Cloud

GME 3D Voice Technology: High-Precision HRTF + Distance Attenuation Model

3D voice gives players additional auditory information, helping them identify the positions of teammates and enemies by voice and sense their presence much as in the physical world. This makes gameplay more intuitive and fun.

Many game developers may ask: How does 3D voice work? How do I add it to my games? Below is a quick guide to 3D voice technology.

1. How do we determine sound source positions?

We can determine the position of a sound source mainly because the sound reaches the left and right ears at different times and with different intensities. Specifically, we identify the horizontal position from the interaural differences in time, level, and timbre between the two ears' signals. The auricle acts as a comb filter that helps identify the vertical position of a complex sound source. Sound localization also depends on factors such as sound level, spectrum, and personal experience.
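To get a feel for the interaural time difference (ITD) mentioned above, here is a minimal sketch using Woodworth's classic spherical-head approximation. The function name, head radius, and speed of sound are illustrative assumptions, not part of GME:

```python
import math

def itd_woodworth(azimuth_deg, head_radius=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (seconds) for a distant
    source at the given azimuth, using Woodworth's spherical-head model:
    ITD = (a / c) * (sin(theta) + theta)."""
    theta = math.radians(azimuth_deg)
    return (head_radius / speed_of_sound) * (math.sin(theta) + theta)

# A source straight ahead (0 degrees) produces no ITD; a source directly
# to one side (90 degrees) produces the maximum ITD, roughly 0.66 ms
# for an average-sized head.
```

The ITD grows monotonically as the source moves from straight ahead toward the side, which is exactly the cue the brain uses for horizontal localization.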

2. How are the voice positions of players simulated? How does Tencent Cloud GME work?

A head-related transfer function (HRTF) is needed to do so. It can be regarded as the combined filtering that sound signals undergo on the way from the source to both ears: filtering by the air, reverberation in the ambient environment, and scattering and reflection off the body (torso, head, auricle), etc.
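In the time domain, applying an HRTF boils down to convolving the source signal with a per-ear head-related impulse response (HRIR). The sketch below only illustrates that convolution step; the toy HRIRs are made-up placeholders, whereas a real engine such as GME selects measured HRIRs based on the source's direction:

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with per-ear head-related impulse responses.
    Real engines use measured, direction-dependent HRIRs; the toy HRIRs
    below only illustrate the convolution step."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return left, right

# Toy example: the right ear receives a delayed, attenuated copy,
# mimicking a source on the listener's left.
mono = np.array([1.0, 0.5, 0.25])
hrir_l = np.array([1.0])             # direct path, unfiltered
hrir_r = np.array([0.0, 0.0, 0.6])   # 2-sample delay, 60% amplitude
left, right = render_binaural(mono, hrir_l, hrir_r)
```

The delay between the two output channels recreates the ITD, and the amplitude difference recreates the interaural level difference.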

Implementing real-time 3D voice is not merely a matter of calling an HRTF. It also entails mapping the game's virtual space to a real-world listening environment and performing these computations at high frequency. The process can be summarized as follows. Assume N players in a game are connected to the mic. Given the stringent real-time requirements of gaming, each player's terminal should receive at least (N-1) packets containing voice data and relative position data every 20 ms to ensure a smooth experience. Based on the relative position data, the high-precision HRTF model in the 3D audio algorithm processes the voice data, combined with information about obstacles in the way, ambient sounds in the game (such as running water or the echo of a room), and so on. In this way, realistic real-time 3D sound is rendered on each player's device.
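A quick back-of-envelope calculation shows how the per-client packet rate scales with the number of speakers under the 20 ms frame interval described above (the function name is illustrative):

```python
def packets_per_second(n_players, frame_ms=20):
    """Each client receives one voice+position packet per remote speaker
    per frame, i.e. (N - 1) packets every frame_ms milliseconds."""
    frames_per_second = 1000 / frame_ms
    return (n_players - 1) * frames_per_second

# With 8 players on mic, each client handles 7 * 50 = 350 packets
# per second, and each packet must still be spatially rendered.
```

This linear growth in both network traffic and rendering work is what makes the per-source compute cost so important on low/mid-end devices.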

The entire process is compute-intensive, and some low/mid-end devices may be unable to handle it. Minimizing resource usage on players' devices while ensuring a smooth gaming experience remains an industry challenge. In addition, some HRTF libraries cause serious attenuation at certain frequencies in audio signals, most notably musical instrument sounds, which have diverse frequency components. This not only affects the accuracy of sound localization but also dulls the instrument sounds in the rendered ambient audio.

Tencent Cloud Game Multimedia Engine (GME) launched the 3D voice feature in partnership with Tencent Ethereal Audio Lab, a top-notch audio technology team. Through the high-precision HRTF model and the distance attenuation model, the feature gives players a highly immersive gaming experience in the virtual world. Thanks to optimized terminal rendering algorithms, the computing efficiency increases by nearly 50%, and the real-time spatial rendering time of a single sound source is around 0.5 ms, so that most low/mid-end devices can sustain real-time 3D sound rendering. To address the problem of signal attenuation in the rendering process, GME improves the 3D rendering effect through its proprietary audio signal equalization techniques, making ambient sounds crystal clear.
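GME's exact distance attenuation curve is not published in this article, but a common shape for such models is inverse-distance rolloff with a hard cutoff at the receive range (the range that `UpdateAudioRecvRange` configures). The parameter values below are illustrative assumptions, not GME defaults:

```python
def distance_gain(distance, min_distance=1.0, recv_range=40.0):
    """Illustrative attenuation model: full volume inside min_distance,
    1/d rolloff beyond it, and silence past the receive range.
    min_distance and recv_range are made-up example values."""
    if distance >= recv_range:
        return 0.0
    if distance <= min_distance:
        return 1.0
    return min_distance / distance
```

Combined with the HRTF rendering, a curve like this is what makes a teammate's voice fade naturally as they run away, then drop out entirely once they leave the receive range.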

3. How do we integrate 3D voice?

There are two 3D voice integration methods available. You can choose a suitable method based on the characteristics of your game.

Method 1: For non-VR games

How it works:

3D voice requires calculations based on the positions of and distances to sound sources, so position coordinates are the key input. From the coordinates, we can locate each source in the virtual space, calculate its distance from the listener, and derive the position information needed for rendering.

GME streamlines the overall integration. You only need to pass the local coordinate and position information to GME through its API. GME then aggregates the data and computes the coordinates and positions of everyone in the room to produce the 3D voice information.

With each speaker's position in the virtual world known, the position information travels to the receiving client together with the audio streams. Without position information, the sound would play back flat, as in an ordinary phone or conference call. With it, GME's local 3D voice engine renders a true 3D sound effect.
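The "position information" above ultimately reduces to each speaker's coordinates relative to the listener. A minimal sketch of that reduction in the horizontal plane, with illustrative function and parameter names:

```python
import math

def relative_position(listener_xy, listener_yaw_deg, source_xy):
    """Distance and azimuth of a source relative to a listener facing
    listener_yaw_deg (0 = +x axis, counterclockwise, degrees)."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    azimuth = math.degrees(math.atan2(dy, dx)) - listener_yaw_deg
    azimuth = (azimuth + 180) % 360 - 180   # wrap to [-180, 180)
    return distance, azimuth

# A teammate at (3, 4) seen by a listener at the origin facing +x
# is 5 units away, about 53 degrees to the left.
```

The distance feeds the attenuation model, and the azimuth (plus elevation, in 3D) selects the HRTF used for rendering.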

Integration steps:

Prerequisites:

The "EnterRoom" API has been called, and the result in the room entry callback is successful room entry.

On the premise of successful connection to the voice chat service, you can integrate 3D voice as instructed below:

Call "InitSpatializer" to initialize the 3D sound effect engine.

Call "EnableSpatializer" to enable 3D voice.

Call "UpdateAudioRecvRange" to set the attenuation range.

Call "UpdateSelfPosition" to update the position information in real time.

Integration Guide: https://cloud.tencent.com/document/product/607/18218

Method 2: For VR games

There is a dedicated integration method for VR games. VR users have high expectations for refresh rate, sound responsiveness, and the spatial perception of sound. In VR scenarios that emphasize real-time interaction and deep immersion, a premium low-latency 3D voice experience is paramount. However, the traditional RTC voice call and 3D voice solutions on the market fall short of players' expectations for accuracy, latency, and more.

How it works:

We further optimized the 3D voice feature in GME SDK 2.9.2. You can call the 3D audio model directly and pass in 3D position information in real time to achieve a real-time 3D sound effect.

Integration steps:

Prerequisites:

The "EnterRoom" API has been called, and the result in the room entry callback is successful room entry.

On the premise of successful connection to the voice chat service, you can integrate 3D voice as instructed below:

Call "InitSpatializer" to initialize the 3D sound effect engine.

Call "EnableSpatializer" to enable 3D voice.

Call "UpdateAudioRecvRange" to set the attenuation range.

Call "UpdateSelfPosition" to update the position information in real time.

Call "UpdateOtherPosition" to update in real time the position information of others in the room (which can be obtained at the business layer).

Read more at: https://www.tencentcloud.com/dynamic/blogs/sample-article/100365
