Draft: Add many audio sources (including voice) #5870

rom1v · 2025-02-22T12:00:03Z

The existing audio sources were:

output (default): forwards the whole audio output, and disables playback on the device (mapped to REMOTE_SUBMIX).
playback: captures the audio playback (Android apps can opt-out, so the whole output is not necessarily captured).
mic: captures the microphone (mapped to MIC).

This PR adds:

mic-unprocessed: captures the microphone unprocessed (raw) sound (mapped to UNPROCESSED).
mic-camcorder: captures the microphone tuned for video recording, with the same orientation as the camera if available (mapped to CAMCORDER).
mic-voice-recognition: captures the microphone tuned for voice recognition (mapped to VOICE_RECOGNITION).
mic-voice-communication: captures the microphone tuned for voice communications; for instance, it takes advantage of echo cancellation or automatic gain control if available (mapped to VOICE_COMMUNICATION).
voice-call: captures the voice call (mapped to VOICE_CALL).
voice-call-uplink: captures only the voice call uplink (mapped to VOICE_UPLINK).
voice-call-downlink: captures only the voice call downlink (mapped to VOICE_DOWNLINK).
voice-performance: captures audio meant to be processed for live performance (karaoke), includes both the microphone and the device playback (mapped to VOICE_PERFORMANCE).
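As a rough illustration of how these option names might map to the integer constants of android.media.MediaRecorder.AudioSource, here is a minimal Java sketch. It follows the approach described in the "Refactor audio sources" commit (the integer is stored in the enum, or -1 if not relevant). The enum name AudioSourceSketch and the fromName() helper are hypothetical. The integer values are copied from the Android SDK so that the sketch compiles without android.jar. Using -1 for playback is an assumption, based on that source going through a separate capture API.

```java
/**
 * Hypothetical sketch: maps --audio-source names to the integer values of
 * android.media.MediaRecorder.AudioSource (copied here so that this compiles
 * without the Android SDK). -1 means "no MediaRecorder source" (assumption).
 */
enum AudioSourceSketch {
    OUTPUT("output", 8),                                   // REMOTE_SUBMIX
    PLAYBACK("playback", -1),                              // separate playback capture API (assumption)
    MIC("mic", 1),                                         // MIC
    MIC_UNPROCESSED("mic-unprocessed", 9),                 // UNPROCESSED
    MIC_CAMCORDER("mic-camcorder", 5),                     // CAMCORDER
    MIC_VOICE_RECOGNITION("mic-voice-recognition", 6),     // VOICE_RECOGNITION
    MIC_VOICE_COMMUNICATION("mic-voice-communication", 7), // VOICE_COMMUNICATION
    VOICE_CALL("voice-call", 4),                           // VOICE_CALL
    VOICE_CALL_UPLINK("voice-call-uplink", 2),             // VOICE_UPLINK
    VOICE_CALL_DOWNLINK("voice-call-downlink", 3),         // VOICE_DOWNLINK
    VOICE_PERFORMANCE("voice-performance", 10);            // VOICE_PERFORMANCE

    private final String name;  // value accepted by --audio-source
    private final int source;   // MediaRecorder.AudioSource value, or -1

    AudioSourceSketch(String name, int source) {
        this.name = name;
        this.source = source;
    }

    String getName() {
        return name;
    }

    int getSource() {
        return source;
    }

    /** Returns the source matching a command-line name, or null if unknown. */
    static AudioSourceSketch fromName(String name) {
        for (AudioSourceSketch s : values()) {
            if (s.name.equals(name)) {
                return s;
            }
        }
        return null;
    }
}
```

For example, AudioSourceSketch.fromName("voice-call-uplink").getSource() returns 2 (VOICE_UPLINK). On a device, the real code would reference the MediaRecorder.AudioSource constants directly, as the diff later in this thread does.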

Discontinuities

The existing audio sources always produce a continuous audio stream. A major issue is that some new audio sources (like the "voice call" source) do not produce any packets during silence (they only capture during a voice call).

The audio regulator (the component responsible for maintaining a constant latency) assumed that the input audio stream was continuous. In this PR, it now detects discontinuities based on the input PTS and adjusts its behavior accordingly. This only works correctly if the input PTS are "correct".
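The following Java sketch shows one way PTS-based discontinuity detection could work, assuming microsecond PTS and a fixed sample rate. Each packet lets us predict the PTS of the next one, and a packet arriving noticeably later than predicted marks a gap. This is not the scrcpy implementation (the audio regulator is part of the client, which is written in C). The class name and the tolerance parameter are hypothetical.

```java
/**
 * Hypothetical sketch of PTS-based discontinuity detection: a packet whose
 * PTS is later than the PTS predicted from the previous packet (plus a
 * tolerance for jitter) indicates that packets are missing, e.g. silence.
 */
final class DiscontinuityDetectorSketch {
    private final int sampleRate;        // samples per second, e.g. 48000
    private final long toleranceUs;      // assumed tolerance for PTS jitter
    private long expectedNextPtsUs = -1; // -1: no packet received yet

    DiscontinuityDetectorSketch(int sampleRate, long toleranceUs) {
        this.sampleRate = sampleRate;
        this.toleranceUs = toleranceUs;
    }

    /**
     * Processes one packet and returns the gap (in microseconds) detected
     * before it, or 0 if the stream is continuous.
     */
    long onPacket(long ptsUs, int samples) {
        long gapUs = 0;
        if (expectedNextPtsUs >= 0) {
            long deltaUs = ptsUs - expectedNextPtsUs;
            if (deltaUs > toleranceUs) {
                gapUs = deltaUs;
            }
        }
        // The next packet should start right after the last sample of this one
        expectedNextPtsUs = ptsUs + samples * 1_000_000L / sampleRate;
        return gapUs;
    }
}
```

With 48 kHz packets of 960 samples (20 ms), a packet at PTS 1_040_000 µs following one at 20_000 µs reports a gap of 1_000_000 µs, i.e. one second of silence during which no packets were produced. As stated above, this only works if the input PTS are correct.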

Another major problem is that, even if the capture timestamps are correct, some encoders (OPUS) rewrite the PTS based on the number of samples, ignoring the input PTS. As a consequence, when encoding in OPUS, the timings are broken: they represent a continuous audio stream from which the silences have been removed. This breaks the discontinuity detection in the audio regulator (we could work around the problem by relying on the current receive time, since real-time playback itself does not depend on PTS). But the most important problem is that it breaks recording timings. For example:

```
scrcpy --audio-source=voice-call --record=file.mp4
```

If the voice call does not start immediately, the recorded audio will not be played at the correct time.

With the AAC encoder, it works (the encoder on the device does not rewrite the PTS based only on the number of samples):

```
scrcpy --audio-source=voice-call --record=file.mp4 --audio-codec=aac
```
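To make the OPUS problem concrete, here is a hypothetical Java sketch of PTS rewritten from the cumulative number of samples. It assumes the stream starts at PTS 0, microsecond units and a fixed sample rate. It only models the behavior described above; it is not the encoder's actual code.

```java
/**
 * Hypothetical model of an encoder that derives each output PTS from the
 * number of samples emitted so far, ignoring the capture PTS entirely.
 */
final class SampleCountPtsSketch {
    /** Returns, for each packet, PTS = (samples before it) / sampleRate, in microseconds. */
    static long[] rewriteFromSampleCount(int[] samplesPerPacket, int sampleRate) {
        long[] pts = new long[samplesPerPacket.length];
        long totalSamples = 0;
        for (int i = 0; i < samplesPerPacket.length; i++) {
            pts[i] = totalSamples * 1_000_000L / sampleRate;
            totalSamples += samplesPerPacket[i];
        }
        return pts;
    }
}
```

With three packets of 960 samples at 48 kHz captured at 0, 20_000 and 1_040_000 µs (one second without packets between the second and the third), the rewritten PTS are 0, 20_000 and 40_000 µs: the gap disappears, so the third packet would be recorded about one second too early.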

This PR is in draft due to this unsolved issue.

Aims to fix #5670 and #5412.

Victor239 · 2025-02-25T06:11:08Z

Can there also be an option to capture no sound? When using multiple virtual display windows and playing audio, it currently plays on all windows, with no way to disable it except through the OS sound settings.

rom1v · 2025-02-25T08:46:39Z

> Can there also be an option to capture no sound?

https://github.com/Genymobile/scrcpy/blob/master/doc/audio.md#no-audio

rom1v · 2025-03-02T16:18:01Z

> Another major problem is that, even if the capture timestamps are correct, some encoders (OPUS) rewrite the PTS based on the number of samples (ignoring the input PTS). As a consequence, when encoding in OPUS, the timings are broken: they represent a continuous audio stream where the silences are removed. This breaks the discontinuity detection in the audio regulator (we could work around the problem by relying on the current recv date, since the real time playback itself does not depend on PTS). But the most important problem is that it breaks recording timings.

> This PR is in draft due to this unsolved issue.

Should be fixed by the commit "Fix PTS produced by the default OPUS encoder" in this PR (the SHA1 will change on rebase, but currently it's 63d848f).

Please review/test/check.

LaptopDev · 2025-03-03T10:50:37Z

ref: So, because VOICE_UPLINK is restricted for third-party apps, the microphone source cannot be passed from the computer to the phone during calls?

yNEX · 2025-03-05T15:20:34Z

I tested the changes from this PR using a private fork and built the project by using the GitHub Action. For my testing scenario, I received a WhatsApp call from a second phone. I tried both the --audio-source=voice-call-downlink option and voice-call-uplink and in both cases, the audio was transferred regardless of which phone was muted.

Additionally, with the regular --audio-source=playback option, the audio is no longer played back on the device. Is it possible to extend this behavior to voice calls as well?

I am using a Pixel 8 Pro (Android 15) and the Windows client.

rom1v · 2025-03-05T15:27:18Z

> I tried both the --audio-source=voice-call-downlink option and voice-call-uplink and in both cases, the audio was transferred

👍 Thank you for the test.

Fixed:

```diff
diff --git a/server/src/main/java/com/genymobile/scrcpy/audio/AudioSource.java b/server/src/main/java/com/genymobile/scrcpy/audio/AudioSource.java
index 6689611ad..d16b5e387 100644
--- a/server/src/main/java/com/genymobile/scrcpy/audio/AudioSource.java
+++ b/server/src/main/java/com/genymobile/scrcpy/audio/AudioSource.java
@@ -13,8 +13,8 @@ public enum AudioSource {
     MIC_VOICE_RECOGNITION("mic-voice-recognition", MediaRecorder.AudioSource.VOICE_RECOGNITION),
     MIC_VOICE_COMMUNICATION("mic-voice-communication", MediaRecorder.AudioSource.VOICE_COMMUNICATION),
     VOICE_CALL("voice-call", MediaRecorder.AudioSource.VOICE_CALL),
-    VOICE_CALL_UPLINK("voice-call-uplink", MediaRecorder.AudioSource.VOICE_CALL),
-    VOICE_CALL_DOWNLINK("voice-call-downlink", MediaRecorder.AudioSource.VOICE_CALL),
+    VOICE_CALL_UPLINK("voice-call-uplink", MediaRecorder.AudioSource.VOICE_UPLINK),
+    VOICE_CALL_DOWNLINK("voice-call-downlink", MediaRecorder.AudioSource.VOICE_DOWNLINK),
     VOICE_PERFORMANCE("voice-performance", MediaRecorder.AudioSource.VOICE_PERFORMANCE);
 
     private final String name;
```

> Additionally, with the regular --audio-source=playback option, the audio is no longer played back on the device. Is it possible to extend this behavior to voice calls as well?

The playback audio source uses a specific API, which lets us choose whether to duplicate the audio or not (--audio-dup). For the other sources, we have no control: Android determines the behavior.

yNEX · 2025-03-05T16:03:29Z

Thanks for the quick response! 👌🏼

My idea was to use scrcpy to transfer both game audio and voice chat from Call of Duty Mobile to my PC for streaming with OBS. While everything works fine for the most part, I’m encountering an issue with voice call audio. When headphones are connected directly to the phone, the game sound and voice chat are bundled together. However, since I’m using the headphones on my PC, the audio streams remain separated on the phone.

Do you have any suggestions for this use case? Unfortunately, a capture card isn’t an option as it reduces the refresh rate from 120 Hz to 60 Hz. If it’s more convenient, we could discuss this privately to avoid cluttering the PR comment section.

rom1v · 2025-03-05T16:13:46Z

> When headphones are connected directly to the phone, the game sound and voice chat are bundled together. However, since I’m using the headphones on my PC, the audio streams remain separated on the phone.

See #4084 #4087. Scrcpy has no control over this behavior.

davidsmith91 · 2025-03-07T07:32:09Z

@rom1v Hello, I have a phone in country A with a local SIM card, and a PC in country B. I can already use scrcpy from PC-B to control phone-A, going through the adb server on PC-A.

What I need to do is: be able to make phone calls.

From what I'm reading here, I should be able to receive the audio of the call using voice-call-downlink.
But any idea on how to send my voice into the call (as "uplink"?)? I think I could just call PC-A from PC-B (or stream the microphone in some other low-latency way) and turn the speakers on, so that the phone captures my voice from the PC. It's not the best solution, but it can work: a few days ago I made a call with someone simply putting one phone in front of another, using Signal and a normal phone call.

Anyway, how can I use voice-call-downlink? I'm on a MacBook, and I use Homebrew to install scrcpy.
I don't know if I understood correctly, but did you say that the phone cannot use the mic while the audio is forwarded?

So, I couldn't hear what the person in the call is saying through voice-call-downlink and, at the same time, have the phone's mic record what PC-A is playing through its speakers?

Thanks very much.

rom1v · 2025-03-07T09:40:18Z

> But any idea on how to send my voice into the call

See discussions in #3880.

davidsmith91 · 2025-03-07T17:22:17Z

> > But any idea on how to send my voice into the call
>
> See discussions in #3880.

@rom1v what about getting the voice downlink while using the phone microphone? Is it possible? If my question is unclear, please reread it above: you answered the part about sending audio to the microphone, but not the other part. Thanks

rom1v · 2025-03-07T17:43:21Z

> what about getting the voice downlink while using phone microphone? Is it possible?

I don't know; I just expose the audio sources from the Android API. How the Android implementation behaves with and without the mic has to be tested on each device; I have no control over that.

Commit "Fix PTS produced by the default OPUS encoder": the default OPUS encoder on Android rewrites the PTS so that it exactly matches the number of samples. As a consequence:
- audio clock drift is not compensated;
- hard silences are ignored.
To fix this behavior, recreate the PTS based on the current time (after encoding) and the packet duration.
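A minimal Java sketch of the fix described above: recompute the PTS as the time at which the encoded packet becomes available, minus the packet duration. The class and method names are hypothetical. The sketch assumes the encoder outputs a packet as soon as its last input sample has been captured; later in this thread, rom1v explains why this does not hold exactly for the OPUS encoder.

```java
/**
 * Hypothetical sketch: ignore the PTS produced by the encoder and recompute
 * it from the time the encoded packet is dequeued and the packet duration.
 */
final class OutputSidePtsSketch {
    private final int sampleRate; // samples per second, e.g. 48000

    OutputSidePtsSketch(int sampleRate) {
        this.sampleRate = sampleRate;
    }

    /**
     * @param nowUs   monotonic time (microseconds) at which the encoded packet was dequeued
     * @param samples number of samples contained in the packet
     * @return the recomputed PTS, in microseconds (time of the first sample of the packet)
     */
    long recomputePts(long nowUs, int samples) {
        long durationUs = samples * 1_000_000L / sampleRate;
        return nowUs - durationUs;
    }
}
```

On a device, nowUs could come from a monotonic clock such as System.nanoTime() / 1000. For a 960-sample packet at 48 kHz dequeued at 1_000_000 µs, the recomputed PTS is 980_000 µs.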

rom1v · 2025-03-07T20:01:12Z

> Should be fixed by commit Fix PTS produced by the default OPUS encoder

In fact, there are still several problems.

Firstly, the resulting audio stream is broken in VLC and Firefox (it works "fine" with warnings in mpv).

Secondly, the "fixed" PTS is not correct: we push blocks of 960 samples, but the OPUS encoder outputs blocks of 1024 samples. So after 960 samples, it waits for the next 960 samples before producing an output packet, and fixing the PTS on the output side therefore adds noise to the timestamps.
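The timestamp noise described above can be modeled with a small Java sketch. It assumes 960-sample input blocks, 1024-sample output packets, an encoder that emits a packet as soon as enough input is available (no processing delay), and times expressed in samples since the start of the capture. All names are hypothetical.

```java
/**
 * Hypothetical model: input is pushed in blocks of INPUT_BLOCK samples, the
 * encoder emits packets of OUTPUT_BLOCK samples as soon as enough input is
 * available, and the PTS is "fixed" on the output side as
 * (emission time - packet duration). Times are in samples.
 */
final class OpusPtsJitterSketch {
    static final int INPUT_BLOCK = 960;
    static final int OUTPUT_BLOCK = 1024;

    /** Error (in samples) between the rewritten PTS and the ideal PTS of packet n (1-based). */
    static long ptsErrorInSamples(long n) {
        // Total input needed before packet n can be emitted
        long needed = OUTPUT_BLOCK * n;
        // Index of the input push that completes packet n (ceiling division)
        long k = (needed + INPUT_BLOCK - 1) / INPUT_BLOCK;
        // Emission time: when the k-th input block has been captured
        long emittedAt = k * INPUT_BLOCK;
        // Output-side "fix": PTS = emission time - packet duration
        long rewrittenPts = emittedAt - OUTPUT_BLOCK;
        // Ideal PTS: position of the first sample of packet n
        long idealPts = OUTPUT_BLOCK * (n - 1);
        return rewrittenPts - idealPts;
    }
}
```

Under these assumptions, the error is 896 samples (about 18.7 ms at 48 kHz) for the first packet, decreases by 64 samples per packet down to 0 for the 15th, then jumps back to 896 for the 16th. This sawtooth pattern, rather than a constant offset, is the noise in the timestamps.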

I don't know how to record a correct file while still compensating for clock drift (so that the video and audio remain synchronized) or handling "missing" silence packets.

davidsmith91 · 2025-03-08T06:02:01Z

> > what about getting the voice downlink while using phone microphone? Is it possible?
>
> I don't know, I just expose the audio sources from the Android API. How the Android implementation behaves with and without mic is to be tested by device, I have no control over that.

Okay okay, but how can I use the voice downlink source? Can you ship it to homebrew so I can install from there? Or even as a release here on github, or a pre-release. Or am I forced to build from source? I need it on macos arm64

yNEX · 2025-03-08T11:52:01Z

> > > what about getting the voice downlink while using phone microphone? Is it possible?
> >
> > I don't know, I just expose the audio sources from the Android API. How the Android implementation behaves with and without mic is to be tested by device, I have no control over that.
>
> Okay okay, but how can I use the voice downlink source? Can you ship it to homebrew so I can install from there? Or even as a release here on github, or a pre-release. Or am I forced to build from source? I need it on macos arm64

I actually tested both voice-downlink and voice-uplink with a WhatsApp call. However, I noticed that both audio sources seemed to include both up- and downlink audio. This could be due to WhatsApp potentially using the Android API differently than other applications (this is just my hypothesis - perhaps someone can confirm?). You'll need to test different scenarios as this feature is still under development.

I've created a fork that includes these changes. It's publicly available on my GitHub and built with the GitHub Action provided by this repo :)

davidsmith91 · 2025-03-08T13:44:48Z

> > > > what about getting the voice downlink while using phone microphone? Is it possible?
> > >
> > > I don't know, I just expose the audio sources from the Android API. How the Android implementation behaves with and without mic is to be tested by device, I have no control over that.
> >
> > Okay okay, but how can I use the voice downlink source? Can you ship it to homebrew so I can install from there? Or even as a release here on github, or a pre-release. Or am I forced to build from source? I need it on macos arm64
>
> I actually tested both voice-downlink and voice-uplink with a WhatsApp call. However, I noticed that both audio sources seemed to include both up- and downlink audio. This could be due to WhatsApp potentially using the Android API differently than other applications (this is just my hypothesis - perhaps someone can confirm?). You'll need to test different scenarios as this feature is still under development.
>
> I've created a fork that includes these changes. It's publicly available on my GitHub and built with the GitHub Action provided by this repo :)

I would need to use it for normal phone calls on an Android 14 or 15 phone.
At least, I hope I can get the voice-downlink. Then I have to find a way to get my voice into the uplink.
Maybe some Bluetooth setup is the only solution, but I haven't found anything worth pursuing.

yNEX · 2025-03-08T14:52:06Z

> > > > > what about getting the voice downlink while using phone microphone? Is it possible?
> > > >
> > > > I don't know, I just expose the audio sources from the Android API. How the Android implementation behaves with and without mic is to be tested by device, I have no control over that.
> > >
> > > Okay okay, but how can I use the voice downlink source? Can you ship it to homebrew so I can install from there? Or even as a release here on github, or a pre-release. Or am I forced to build from source? I need it on macos arm64
> >
> > I actually tested both voice-downlink and voice-uplink with a WhatsApp call. However, I noticed that both audio sources seemed to include both up- and downlink audio. This could be due to WhatsApp potentially using the Android API differently than other applications (this is just my hypothesis - perhaps someone can confirm?). You'll need to test different scenarios as this feature is still under development.
> > I've created a fork that includes these changes. It's publicly available on my GitHub and built with the GitHub Action provided by this repo :)
>
> I would need to use it for normal phone calls using Android 14 or 15 phone. At least, I hope I can get the voice-downlink. Then I have to find a way to make my voice go into the uplink. Maybe some bluetooth thing is the only solution, but I haven't found anything worth pursuing

Maybe Bluetooth over TCP/IP is a solution for you.

davidsmith91 · 2025-03-10T06:44:51Z

> Maybe Bluetooth over TCP/IP is a solution for you

@yNEX any idea/example?

yNEX · 2025-03-10T09:29:35Z

> > Maybe Bluetooth over TCP/IP is a solution for you
>
> @yNEX any idea/example?

I couldn't find any direct solution online for this, so you might have to do some research. I don't know whether USB-over-IP solutions like VirtualHere could help you: they could make a USB Bluetooth adapter available over the Internet as if it were locally attached to another PC. Just an idea; I don't know if it works.

rom1v mentioned this pull request on Feb 22, 2025:

Draft: Add many audio sources (including voice) #5869 (closed)

rom1v changed the base branch from master to dev February 22, 2025 12:00

rom1v force-pushed the audio_sources branch from 6ac3915 to 5d933c8 on March 2, 2025 16:17

rom1v added 6 commits March 7, 2025 20:58

Disable audio regulator underflow logs

6dbb0e6

Only enable them if SC_AUDIO_REGULATOR_DEBUG is set, as they may spam the output.

Report underflow samples in verbose mode

53096fd

Report the number of silence samples inserted due to underflow every second, along with the other metrics.

Handle audio stream discontinuities

433f6a9

The audio regulator assumed a continuous audio stream. But some audio sources (like the "voice call" audio source) do not produce any packets on silence, breaking this assumption. Use PTS to detect such discontinuities.

Refactor audio sources

5d90621

Store the target audio source integer (one of the constants in android.media.MediaRecorder.AudioSource) in the AudioSource enum (or -1 if not relevant). This will simplify adding new audio sources.

Add more audio sources

fba3cd4

Expose more audio sources from MediaRecorder.AudioSource. Refs <https://developer.android.com/reference/android/media/MediaRecorder.AudioSource>

rom1v force-pushed the audio_sources branch from 5d933c8 to fba3cd4 on March 7, 2025 20:00
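The "Report underflow samples in verbose mode" commit above reports, every second, the number of silence samples inserted due to underflow. Below is a minimal Java sketch of such a per-second counter. The class and method names are hypothetical, and the actual reporting presumably lives in the C client alongside the other regulator metrics.

```java
/**
 * Hypothetical sketch: accumulate the silence samples inserted on underflow
 * and report the total once per period (one second).
 */
final class UnderflowStatsSketch {
    private static final long REPORT_PERIOD_US = 1_000_000L;

    private long periodStartUs;  // start of the current reporting period
    private long silenceSamples; // silence samples inserted during this period

    UnderflowStatsSketch(long startUs) {
        this.periodStartUs = startUs;
    }

    /** Called whenever the regulator pads its output with silence. */
    void addSilence(int samples) {
        silenceSamples += samples;
    }

    /**
     * Called periodically (e.g. when the other metrics are computed).
     * Returns the silence samples inserted during the elapsed period and
     * starts a new period, or -1 if less than one second has elapsed.
     */
    long pollReport(long nowUs) {
        if (nowUs - periodStartUs < REPORT_PERIOD_US) {
            return -1;
        }
        long report = silenceSamples;
        silenceSamples = 0;
        periodStartUs = nowUs;
        return report;
    }
}
```

Polling from the same place as the other metrics, rather than on each underflow, keeps the output to at most one line per second, which matters since underflow logs "may spam the output".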
