What is audio capture in Android?
Audio capture refers to the process of recording audio from a microphone or other input source on an Android device. Audio can be captured by Android apps to enable features like voice memos, audio notes, game audio, and adding audio tracks to videos.
The Android platform provides APIs and frameworks to allow apps to record audio using the built-in microphone or other audio sources like Bluetooth headsets. Android also enables processing of the captured audio data, applying effects, encoding/decoding, and playback.
Some common use cases for audio capture on Android include:
- Voice memos and audio notes – Recording short voice messages and notes.
- Game audio – Recording game sound effects and background audio.
- Video audio – Adding voiceover narration or background music to videos.
- VoIP and chat apps – Enabling voice chat and audio calling features.
- Audio monitoring – Continuously capturing audio for security or ambient monitoring purposes.
The rest of this guide will provide an overview of the key APIs, capabilities, and best practices for handling audio capture in Android apps.
Android Audio Capture APIs
Android provides several APIs for capturing audio, including:
- MediaRecorder – Records audio from sources such as the microphone and encodes it into container formats like 3GPP and MPEG-4, using codecs such as AMR and AAC. Android recommends MediaRecorder for most straightforward audio capture needs.
- MediaProjection – Used together with the playback capture API (Android 10 and later) to record the audio that other apps are playing on the device, for example during screen recording.
- AudioRecord – Captures raw PCM audio from an input source into a buffer in memory, which is useful for processing audio in real time.
Together, these APIs support capturing audio from the microphone and from device playback, encoding to common formats, real-time access via buffers, and finer control over the audio pipeline.
Recording Audio
The MediaRecorder API is the primary API on Android for recording audio. It provides a high-level interface that handles audio sources, encoding, and output automatically.
To record audio, you first set the audio source using MediaRecorder.setAudioSource(). Common sources include the device's microphone (MediaRecorder.AudioSource.MIC) and the voice-call-optimized source (MediaRecorder.AudioSource.VOICE_COMMUNICATION). The audio source determines how the recorder captures audio data from the device.
Next, set the container format and codec using MediaRecorder.setOutputFormat() and MediaRecorder.setAudioEncoder(), for example MPEG-4 with AAC, or 3GPP with AMR. The recorder handles the encoding automatically.
Finally, call MediaRecorder.prepare() to finalize the recorder's state, then MediaRecorder.start() to begin recording. Audio data is written to the configured output file while recording. When finished, call MediaRecorder.stop() and then MediaRecorder.release() to free the recorder's resources.
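The steps above can be sketched in Java as follows. This is a minimal sketch, not a drop-in implementation: outputPath is a placeholder, and a real app must have the RECORD_AUDIO permission granted before calling it.

```java
import android.media.MediaRecorder;
import java.io.IOException;

// Minimal sketch of the recording steps above. outputPath is a placeholder;
// RECORD_AUDIO must already be granted at runtime.
MediaRecorder startRecording(String outputPath) throws IOException {
    MediaRecorder recorder = new MediaRecorder();
    recorder.setAudioSource(MediaRecorder.AudioSource.MIC);      // capture from the mic
    recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4); // MP4 container
    recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC);    // AAC codec
    recorder.setOutputFile(outputPath);
    recorder.prepare();  // finalize configuration
    recorder.start();    // begin writing to the file
    return recorder;     // caller invokes stop() and release() when done
}
```

The caller is responsible for calling stop() and release() when recording ends, for example from the activity's lifecycle callbacks.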
MediaRecorder thus offers a straightforward way to record audio in common formats on Android devices.
Processing Audio
The Android AudioRecord API allows processing audio data at a lower level for tasks like applying audio effects. To process audio with AudioRecord, you first need to initialize an AudioRecord instance by specifying key parameters like the audio source, sample rate, channel configuration, etc.
Once initialized, you can read audio data from the AudioRecord instance by calling read() in a loop. This gives you access to the raw PCM data as a byte or short buffer that you can manipulate and process as needed. For example, you could apply effects like gain or echo by modifying the buffer data before playing it back.
Some key aspects of using AudioRecord for audio processing include:
- Properly sizing the read byte buffer to avoid data loss
- Reading fast enough to keep up with the incoming audio data rate
- Applying effects efficiently in native code for good performance
With AudioRecord, you have low-level control to process and transform audio as you read it from the recording device. This allows creating customized audio experiences and effects for your app.
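As a concrete illustration of modifying buffer data, here is a simple gain (volume) effect applied to 16-bit PCM samples such as those returned by AudioRecord.read(short[], ...). The class and method names are illustrative; the key detail is clamping to the 16-bit range to avoid wrap-around distortion:

```java
// Applies a gain factor to 16-bit PCM samples, clamping to the valid
// range so amplified samples do not wrap around and distort.
public class PcmGain {
    public static short[] applyGain(short[] samples, double gain) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            int v = (int) Math.round(samples[i] * gain);
            if (v > Short.MAX_VALUE) v = Short.MAX_VALUE; // clamp positive overflow
            if (v < Short.MIN_VALUE) v = Short.MIN_VALUE; // clamp negative overflow
            out[i] = (short) v;
        }
        return out;
    }
}
```

The same read-modify-write pattern extends to effects like echo, which mixes in delayed copies of earlier samples from the buffer.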
Playing Audio
The Android AudioTrack API is used for audio playback in Android. It streams PCM audio buffers to the audio hardware. Some key capabilities of AudioTrack include:
- Streaming playback – AudioTrack supports streaming by writing audio data to the track in a loop, which allows playing back arbitrary-length audio streams.
- Volume control – Volume can be set with setVolume(); the older setStereoVolume() is deprecated in its favor. Fades can be implemented by ramping the volume value over time.
- Audio focus – Managing audio focus allows multiple apps to share the audio output. Focus is requested and abandoned through AudioManager.requestAudioFocus(), not on the track itself.
- Playback position – Methods like getPlaybackHeadPosition() allow monitoring the playback position for synchronization.
Key steps for audio playback with AudioTrack include creating an AudioTrack instance, writing audio data to it in a loop, handling volume and audio focus, and finally releasing the resources when done.
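A minimal playback sketch, assuming 16-bit mono PCM at 44.1 kHz in a short[] that has already been filled (for example from AudioRecord); it uses the classic stream-type constructor and omits error handling for brevity:

```java
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

// Plays a buffer of 16-bit mono PCM samples at 44.1 kHz.
// A sketch only: production code should check states and handle errors.
void playPcm(short[] pcm) {
    int sampleRate = 44100;
    int minBuf = AudioTrack.getMinBufferSize(sampleRate,
            AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
    AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
            AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
            minBuf, AudioTrack.MODE_STREAM);
    track.play();
    track.write(pcm, 0, pcm.length); // blocking write streams the samples
    track.stop();
    track.release();                 // free the native audio resources
}
```

For longer streams, the write() call would sit inside a loop that refills the buffer as data becomes available.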
Recording System Audio
One key capability when capturing audio on Android is the ability to record system audio using the MediaProjection API. This allows an app to capture the audio streams being played on a device, enabling features like screen recording with audio.
To capture system audio, the app first obtains a MediaProjection token by launching the intent returned from MediaProjectionManager.createScreenCaptureIntent() and letting the user approve the capture in the system dialog. On Android 10 and later, the granted projection can then be wrapped in an AudioPlaybackCaptureConfiguration and attached to an AudioRecord instance, which lets the app read the audio other apps are playing.
Only playback that other apps allow to be captured is available: media-style streams are capturable by default, while voice calls and apps that opt out are excluded. Capture runs alongside normal playback, which makes it a natural fit for screen recording with sound.
There are also options to optimize for low audio latency, choosing an appropriate audio codec, and handling audio permissions to enable high quality audio capture alongside screen recordings.
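On Android 10 and later, that configuration can be expressed as follows; this is a hedged sketch that assumes a user-approved MediaProjection is already in hand:

```java
import android.media.AudioAttributes;
import android.media.AudioFormat;
import android.media.AudioPlaybackCaptureConfiguration;
import android.media.AudioRecord;
import android.media.projection.MediaProjection;

// Builds an AudioRecord that captures other apps' media playback.
// Requires API level 29+ and a MediaProjection granted by the user.
AudioRecord buildPlaybackCapture(MediaProjection projection) {
    AudioPlaybackCaptureConfiguration config =
            new AudioPlaybackCaptureConfiguration.Builder(projection)
                    .addMatchingUsage(AudioAttributes.USAGE_MEDIA) // capture media streams
                    .build();
    AudioFormat format = new AudioFormat.Builder()
            .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
            .setSampleRate(44100)
            .setChannelMask(AudioFormat.CHANNEL_IN_STEREO)
            .build();
    return new AudioRecord.Builder()
            .setAudioFormat(format)
            .setAudioPlaybackCaptureConfig(config)
            .build();
}
```

The returned AudioRecord is then read in a loop just like a microphone capture, so the same processing and encoding code can be reused.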
Optimizing for Audio Latency
Latency is the delay between making a request and getting a response in audio playback or capture. Lower latency means a more responsive system. There are two types of audio latency that are important for Android apps:
Audio output latency is the delay from when an app makes a request to play an audio frame, to when that frame is actually played through the audio hardware. This is important for apps doing audio playback. To minimize output latency, developers can use the AAudio API rather than OpenSL ES and create a high priority audio thread using setPriority().
Audio input latency is the delay from when sound enters the audio input hardware to when it reaches the app processing the audio. This is important for apps doing audio recording or processing. To reduce input latency, Google recommends running the recording thread at THREAD_PRIORITY_URGENT_AUDIO.
For audio apps needing the lowest latency on Android, the AAudio API (or the Oboe library that wraps it) gives the best results. Achievable round-trip latency varies widely between devices, so careful thread prioritization and testing on target devices is key.
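Buffer size is the main lever an app controls: each buffer of N frames adds N / sampleRate seconds of delay. A small helper (illustrative names) makes the trade-off concrete:

```java
// Converts an audio buffer size in frames into the delay it adds, in
// milliseconds. Example: a 256-frame buffer at 48 kHz adds about 5.3 ms.
public class BufferLatency {
    public static double bufferLatencyMs(int frames, int sampleRateHz) {
        return frames * 1000.0 / sampleRateHz;
    }
}
```

Halving the buffer halves this delay, at the cost of more frequent callbacks and a higher risk of underruns on slower devices.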
Common Audio Codecs
Android supports a variety of audio codecs out of the box for recording, processing, and playback of audio. The most commonly used codecs include:
- AAC – Advanced Audio Coding is an industry standard lossy compression format used by default in MP4 and M4A files. AAC provides good audio quality at smaller file sizes. Android natively supports AAC in mono or stereo from 8kbps to 320kbps.
- OPUS – An open and royalty-free lossy audio coding format optimized for both speech and music at bitrates from 6kbps to 510kbps. Provides lower latency and better quality over Bluetooth compared to AAC or SBC. Supported in Android 5.0+.
- FLAC – Free Lossless Audio Codec is a lossless compressed format that preserves all data from the original audio source. Results in larger file sizes but allows for perfect reconstruction of audio. Supported in Android 3.1+.
Over Bluetooth, Android devices fall back to the mandatory SBC codec, with AAC and Qualcomm's aptX available on many devices for improved audio quality. Certain OEM devices also support enhanced codecs like aptX HD, LDAC, and LHDC for high-definition wireless audio.
Which codec is actually used depends on what both the phone and the connected headphones support; on many devices the active Bluetooth codec can be inspected and changed under Developer options.
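When choosing a recording bitrate for a lossy codec such as AAC or Opus, it helps to estimate the resulting file size: compressed audio size is roughly bitrate times duration. A small helper (names illustrative; container overhead ignored):

```java
// Rough size of a compressed recording: bitrate in kilobits per second
// times duration in seconds, converted to bytes. Ignores container and
// metadata overhead, so real files are slightly larger.
public class AudioSize {
    public static long estimatedBytes(int bitrateKbps, int seconds) {
        return (long) bitrateKbps * 1000L / 8L * seconds;
    }
}
```

For example, one minute of 128 kbps AAC comes to roughly 960 KB, which is why voice memos are often recorded at much lower bitrates than music.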
Audio Permissions
Android uses runtime permissions to allow users to control access to sensitive data or device capabilities, like audio recording. Android apps must request permission from the user to access the microphone and record audio.
Starting with Android Marshmallow (6.0), app permissions are categorized as normal or dangerous. The ability to record audio is considered dangerous, so apps must explicitly request the RECORD_AUDIO permission at runtime before accessing the microphone.
Before making the request, an app should explain why the permission is needed. Once granted, the user can still revoke or change the permission in the Settings app, which gives users transparency and control over access to audio capture.
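Using the AndroidX ActivityResult API, one common approach to the request looks like the following sketch; the launcher field name and callback bodies are illustrative:

```java
import android.Manifest;
import androidx.activity.result.ActivityResultLauncher;
import androidx.activity.result.contract.ActivityResultContracts;

// Registered as a field in an Activity or Fragment; the callback runs
// with the user's answer after the system permission dialog closes.
ActivityResultLauncher<String> micPermissionLauncher =
        registerForActivityResult(new ActivityResultContracts.RequestPermission(),
                granted -> {
                    if (granted) {
                        // safe to start MediaRecorder / AudioRecord here
                    } else {
                        // degrade gracefully; recording is unavailable
                    }
                });

// Later, just before the user starts a recording:
micPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO);
```

Requesting at the moment of use, rather than at app startup, gives the user context for why the microphone is needed.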
Conclusion
In summary, Android provides several APIs for handling audio capture and playback. Developers can use MediaRecorder for straightforward recording, AudioRecord for low-level audio sampling, and AudioTrack or MediaPlayer for playback. Optimizing for low audio latency is important for real-time audio apps.
Key capabilities covered here included:
- Recording encoded audio with MediaRecorder and raw PCM with AudioRecord
- Processing and analyzing captured audio data
- Playing back audio via AudioTrack and MediaPlayer
- Capturing system/application audio with MediaProjection
- Reducing audio latency for real-time apps
- Supporting common codecs like AAC, Opus, and FLAC
- Requesting the necessary audio permissions
- Requesting necessary audio permissions
For more information, refer to the official Android developer documentation on media APIs and audio capture. There are also open source libraries that can assist with common audio tasks.