How do I use audio sample data from Java Sound?


This question is usually asked as part of another question, but the answer turns out to be long. I decided to answer it here so that I can link to it from elsewhere.

Although I'm not aware of a way for Java to produce audio samples for the programmer at this time, if that changes in the future, this can be a place for it. I know that JavaFX is starting to have things like this, for example AudioSpectrumListener.

I'm using javax.sound.sampled for playback and/or recording, but I'd like to do something with the audio.

Perhaps I'd like to display it visually or process it in some way.

How do I access the audio sample data to do that with Java Sound?

See also:

Java Sound Tutorials (Official)
Java Sound Resources (Unofficial)

This is how you get the actual sample data from the sound that is currently playing. The other excellent answer will tell you what the data means. I haven't tried it on any OS other than my Windows 10 machine, so YMMV. For me it pulls from the current system default recording device. On Windows, set that device to "Stereo Mix" instead of "Microphone" to capture playback sound. You may have to toggle "Show Disabled Devices" to see "Stereo Mix".

    import javax.sound.sampled.*;

    public class SampleAudio {

        // Sign-extends the low bitsPerSample bits of temp to a full 64-bit long.
        private static long extendSign(long temp, int bitsPerSample) {
            int extensionBits = 64 - bitsPerSample;
            return (temp << extensionBits) >> extensionBits;
        }

        public static void main(String[] args) throws LineUnavailableException {
            float sampleRate = 8000;
            int sampleSizeBits = 16;
            int numChannels = 1; // Mono
            // Signed, big-endian PCM.
            AudioFormat format = new AudioFormat(sampleRate, sampleSizeBits, numChannels, true, true);
            TargetDataLine tdl = AudioSystem.getTargetDataLine(format);
            tdl.open(format);
            tdl.start();
            if (!tdl.isOpen()) {
                System.exit(1);
            }
            byte[] data = new byte[(int)sampleRate*10];
            int read = tdl.read(data, 0, (int)sampleRate*10);
            if (read > 0) {
                for (int i = 0; i < read-1; i = i + 2) {
                    // Join two big-endian bytes into one 16-bit value.
                    long val = ((data[i] & 0xffL) << 8L) | (data[i + 1] & 0xffL);
                    long valf = extendSign(val, 16);
                    System.out.println(i + "\t" + valf);
                }
            }
            tdl.close();
        }
    }

Well, the simplest answer is that at the moment Java can't produce sample data for the programmer. Playback with javax.sound.sampled largely acts as a bridge between the file and the sound device. The bytes are read in from the file and sent off.

Don't assume the bytes are meaningful audio samples! Unless you happen to have an 8-bit AIFF file, they aren't. (On the other hand, if the samples are definitely 8-bit signed, you can do arithmetic with them.)

So instead, I'll enumerate the types of AudioFormat.Encoding and describe how to decode them yourself. This answer will not cover how to encode them, but encoding is included in the complete code example at the bottom. Encoding is mostly just the decoding process in reverse.

This is a long answer, but I wanted to give as complete an overview as I could.

A little about digital audio

Generally, when digital audio is explained, we're referring to Linear Pulse-Code Modulation (LPCM).

A continuous sound wave is sampled at regular intervals and the amplitudes are quantized to integers of some scale.

Shown here is a sine wave sampled and quantized to 4-bit:

Notice that in two's complement representation the most positive value is 1 smaller in magnitude than the most negative value (for 4 bits, +7 versus -8). This is a minor detail to be aware of. For example, if you're clipping a waveform and forget this, the positive clips will overflow.
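As a rough illustration of the quantization and clipping described above, here is a minimal sketch. The class name `QuantizeSketch` and the method `quantize` are hypothetical names made up for this example, and rounding to the nearest integer is an assumption (other rounding modes are possible).

```java
final class QuantizeSketch {
    /**
     * Quantizes a sample in the range [-1, 1] to a signed integer of the
     * given bit depth, clipping to the asymmetric two's complement range.
     * (Hypothetical helper for illustration only.)
     */
    static int quantize(double sample, int bits) {
        int fullScale = 1 << (bits - 1); // 8 for 4-bit audio
        int max = fullScale - 1;         // most positive value: +7
        int min = -fullScale;            // most negative value: -8
        long q = Math.round(sample * fullScale);
        // Clip to [min, max]; forgetting that max is fullScale - 1
        // is what makes positive peaks overflow.
        return (int) Math.max(min, Math.min(max, q));
    }

    public static void main(String[] args) {
        int n = 16; // samples per cycle, chosen arbitrarily
        for (int k = 0; k < n; k++) {
            double s = Math.sin(2.0 * Math.PI * k / n);
            System.out.println(k + "\t" + quantize(s, 4));
        }
    }
}
```

A full-scale positive input of 1.0 would map to +8 without the clip, which does not fit in 4 bits, so it is held at +7.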

When we have audio on the computer, we have an array of these samples. A sample array is what we want to turn the byte array into. To decode PCM samples we don't care much about the sample rate or the number of channels, so I won't cover them here.

Some assumptions

All of the code examples will assume the following declarations:

byte[] bytes; The byte array, read from the InputStream.
float sample; The sample that we're working on.
long temp; An interim value used for general manipulation.
int i; The position in the byte array where the current sample's data starts.

All of the encodings will be scaled in the float[] array to the range of -1f <= sample <= 1f. All of the floating-point formats I've seen come this way and it's also the most useful.

Scaling is simple, just:

    sample = sample / fullScale(bitsPerSample);

Where fullScale is 2^(bitsPerSample - 1).
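To make the scaling concrete, here is a small sketch for 16-bit values. `ScaleSketch` and `scale` are hypothetical names for this example; `fullScale` follows the definition given above.

```java
final class ScaleSketch {
    /** 2^(bitsPerSample - 1), as defined in the text above. */
    static double fullScale(int bitsPerSample) {
        return Math.pow(2.0, bitsPerSample - 1);
    }

    /** Scales a signed integer sample to the -1f..1f range. */
    static float scale(long value, int bitsPerSample) {
        return (float) (value / fullScale(bitsPerSample));
    }

    public static void main(String[] args) {
        System.out.println(scale(-32768, 16)); // -1.0
        System.out.println(scale(16384, 16));  // 0.5
        // The most positive 16-bit value lands just below 1.0,
        // because the two's complement range is asymmetric.
        System.out.println(scale(32767, 16));
    }
}
```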

How do I coerce the byte array into meaningful data?

The byte array contains the sample frames split up and all in a line. This is actually very straightforward except for something called endianness, which is the ordering of the bytes in each sample packet.

Here's a diagram. This sample (packed into a byte array) holds the decimal value 9999:

    24-bit sample as big-endian:

    bytes[i]   bytes[i + 1] bytes[i + 2]
    ┌──────┐   ┌──────┐     ┌──────┐
    00000000   00100111     00001111

    24-bit sample as little-endian:

    bytes[i]   bytes[i + 1] bytes[i + 2]
    ┌──────┐   ┌──────┐     ┌──────┐
    00001111   00100111     00000000

They hold the same binary values; however, the byte orders are reversed.

In big-endian, the more significant bytes come before the less significant bytes.
In little-endian, the less significant bytes come before the more significant bytes.

WAV files are stored in little-endian order and AIFF files are stored in big-endian order. Endianness can be obtained from AudioFormat.
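As a small sketch of reading the byte order from an AudioFormat: `EndianSketch` and `describe` are hypothetical names, and the two formats are constructed by hand here purely for illustration (in practice the format would usually come from an AudioInputStream or a line).

```java
import javax.sound.sampled.AudioFormat;

final class EndianSketch {
    /** Summarizes the properties needed to decode the bytes of a format. */
    static String describe(AudioFormat fmt) {
        return fmt.getEncoding() + ", "
            + fmt.getSampleSizeInBits() + "-bit, "
            + (fmt.isBigEndian() ? "big-endian" : "little-endian");
    }

    public static void main(String[] args) {
        // Arguments: sampleRate, sampleSizeInBits, channels, signed, bigEndian.
        AudioFormat wavLike  = new AudioFormat(44100f, 16, 2, true, false);
        AudioFormat aiffLike = new AudioFormat(44100f, 16, 2, true, true);
        System.out.println(describe(wavLike));
        System.out.println(describe(aiffLike));
    }
}
```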

To concatenate the bytes and put them into our temp variable, we:

Bitwise AND each byte with the mask 0xFF (which is 0b1111_1111) to avoid sign extension when the byte is automatically promoted. (char, byte and short are promoted to int when arithmetic is performed on them.)
Bit shift each byte into position.
Bitwise OR the bytes together.

Here's a 24-bit example:

    if (isBigEndian) {
        temp = (
              ((bytes[i    ] & 0xffL) << 16L)
            | ((bytes[i + 1] & 0xffL) <<  8L)
            |  (bytes[i + 2] & 0xffL)
        );
    } else {
        temp = (
               (bytes[i    ] & 0xffL)
            | ((bytes[i + 1] & 0xffL) <<  8L)
            | ((bytes[i + 2] & 0xffL) << 16L)
        );
    }

Notice that the shift order is reversed depending on endianness.

This process can also be generalized into a loop (which is included in the complete code), although it looks more esoteric.
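Here is one way such a loop might look, applied to the 9999 example from the diagram. `UnpackSketch` and `unpack` are hypothetical names for this sketch; it assumes the sample occupies a whole number of bytes and at most 8 of them.

```java
final class UnpackSketch {
    /**
     * Concatenates bytesPerSample bytes starting at index i into a long,
     * following the mask / shift / OR steps described above.
     */
    static long unpack(byte[] bytes, int i, int bytesPerSample, boolean isBigEndian) {
        long temp = 0L;
        for (int b = 0; b < bytesPerSample; b++) {
            // Big-endian: the first byte gets the largest shift.
            // Little-endian: the first byte gets no shift.
            int shift = isBigEndian ? 8 * (bytesPerSample - b - 1) : 8 * b;
            temp |= (bytes[i + b] & 0xffL) << shift;
        }
        return temp;
    }

    public static void main(String[] args) {
        byte[] big    = { 0x00, 0x27, 0x0f }; // 9999 as 24-bit big-endian
        byte[] little = { 0x0f, 0x27, 0x00 }; // 9999 as 24-bit little-endian
        System.out.println(unpack(big, 0, 3, true));     // 9999
        System.out.println(unpack(little, 0, 3, false)); // 9999
    }
}
```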

Now that we have the bytes concatenated together, we can convert them into a sample.

How do I decode `Encoding.PCM_SIGNED`?

The two's complement sign must be extended. This means that if the most significant bit (MSB) is set to 1, we fill all of the bits above it with 1s. The arithmetic right shift (>>) does this filling for us automatically if the sign bit is set, so I usually do it this way:

    int extensionBits = bitsPerLong - bitsPerSample;
    sample = (temp << extensionBits) >> extensionBits;

(Where bitsPerLong is 64.)

To understand how this works, here's a diagram of sign-extending 8-bit to 16-bit:

    This is the byte value -1 but the upper bits of the short are 0.
    Shift the byte's MSB in to the MSB position of the short.

      0000 0000 1111 1111
    <<                  8
    ─────────────────────
      1111 1111 0000 0000

    Shift it back and the right-shift fills all the upper bits with a 1.
    We now have the short value of -1.

      1111 1111 0000 0000
    >>                  8
    ─────────────────────
      1111 1111 1111 1111

Positive values (which had a 0 in the MSB) are left unchanged. This is a nice property of the arithmetic right shift.
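A short sketch of the sign extension applied to a few concatenated values. The class name `SignExtendSketch` is hypothetical; the `extendSign` body mirrors the two lines shown above with bitsPerLong fixed at 64.

```java
final class SignExtendSketch {
    /** Sign-extends the low bitsPerSample bits of temp to 64 bits. */
    static long extendSign(long temp, int bitsPerSample) {
        int extensionBits = 64 - bitsPerSample;
        // Left shift moves the sample's MSB into the long's sign bit;
        // the arithmetic right shift then copies it back down.
        return (temp << extensionBits) >> extensionBits;
    }

    public static void main(String[] args) {
        System.out.println(extendSign(0xffL, 8));      // -1
        System.out.println(extendSign(0x7fL, 8));      // 127
        System.out.println(extendSign(0x8000L, 16));   // -32768
        System.out.println(extendSign(0xffffffL, 24)); // -1
    }
}
```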

Then scale it.

How do I decode `Encoding.PCM_UNSIGNED`?

We turn it into a signed number. Unsigned samples are simply offset so that, for example:

An unsigned value of 0 corresponds to the most negative signed value.
An unsigned value of 2^(bitsPerSample - 1) corresponds to the signed value 0.
An unsigned value of 2^bitsPerSample - 1 corresponds to the most positive signed value.

So this turns out to be pretty simple, just subtract the offset:

    sample = temp - fullScale(bitsPerSample);

Then scale it.
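For instance, the offset and scaling for 8-bit unsigned samples could be sketched like this. `UnsignedSketch` and `decodeUnsigned` are hypothetical names, and computing the offset with a shift assumes bitsPerSample is less than 64.

```java
final class UnsignedSketch {
    /** Converts an unsigned PCM value to a float in the -1f..1f range. */
    static float decodeUnsigned(long temp, int bitsPerSample) {
        long fullScale = 1L << (bitsPerSample - 1); // the offset, e.g. 128 for 8-bit
        long signed = temp - fullScale;             // remove the offset
        return (float) signed / fullScale;          // then scale
    }

    public static void main(String[] args) {
        System.out.println(decodeUnsigned(0, 8));   // -1.0
        System.out.println(decodeUnsigned(128, 8)); // 0.0
        System.out.println(decodeUnsigned(255, 8)); // 0.9921875
    }
}
```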

How do I decode `Encoding.PCM_FLOAT`?

This is new since Java 7.

In practice, floating-point PCM is typically either IEEE 32-bit or IEEE 64-bit and is already scaled to the range of ±1.0. The samples can be obtained with the utility methods Float#intBitsToFloat and Double#longBitsToDouble.

    // IEEE 32-bit
    sample = Float.intBitsToFloat((int) temp);
    // IEEE 64-bit
    sample = (float) Double.longBitsToDouble(temp);
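Putting the byte concatenation and the bit conversion together for the 32-bit case, a sketch might look like the following. `FloatPcmSketch` and `decodeFloat32` are hypothetical names; the test bytes are the IEEE 754 bit patterns of 0.5f (0x3F000000) and -1.0f (0xBF800000).

```java
final class FloatPcmSketch {
    /** Reads four bytes starting at index i as one IEEE 32-bit float sample. */
    static float decodeFloat32(byte[] bytes, int i, boolean isBigEndian) {
        int bits = 0;
        for (int b = 0; b < 4; b++) {
            int shift = isBigEndian ? 8 * (3 - b) : 8 * b;
            bits |= (bytes[i + b] & 0xff) << shift;
        }
        // Reinterpret the bit pattern; no numeric conversion happens here.
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        byte[] bigHalf    = { 0x3f, 0x00, 0x00, 0x00 };
        byte[] littleHalf = { 0x00, 0x00, 0x00, 0x3f };
        System.out.println(decodeFloat32(bigHalf, 0, true));     // 0.5
        System.out.println(decodeFloat32(littleHalf, 0, false)); // 0.5
    }
}
```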

How do I decode `Encoding.ULAW` and `Encoding.ALAW`?

These are companding compression codecs that are more common in telephones and such. They're supported by javax.sound.sampled, I assume because they're used by Sun's Au format. (However, they are not limited to this type of container. For example, WAV can contain these encodings.)

You can conceptualize A-law and μ-law as if they were a floating-point format. These are PCM formats, but the range of values is non-linear.

There are two ways to decode them. I'll show the way that uses the mathematical formula. You can also decode them by manipulating the binary directly, which is described in this blog post, but it's more esoteric-looking.

For both, the compressed data is 8-bit. Standard A-law is 13-bit when decoded and μ-law is 14-bit when decoded; however, applying the formula yields a range of ±1.0.

Before you can apply the formula, there are three things to do:

Some of the bits are inverted for storage as a matter of standard, for an archaic reason involving data integrity.
They're stored as sign and magnitude rather than two's complement.
The formula also expects a range of ±1.0, so the 8-bit value has to be scaled.

For μ-law, all of the bits are inverted, so:

    temp = temp ^ 0xffL; // 0xff == 0b1111_1111

For A-law, every other bit is inverted, so:

    temp = temp ^ 0x55L; // 0x55 == 0b0101_0101

(XOR can be used to do inversion. See "How do you set, clear and toggle a bit?")

To convert from sign and magnitude to two's complement, we:

Check to see if the sign bit is set.
If so, clear the sign bit and negate the number.

    // 0x80 == 0b1000_0000
    if ((temp & 0x80L) == 0x80L) {
        temp = temp ^ 0x80L;
        temp = -temp;
    }
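To see the sign-and-magnitude conversion on a few concrete 8-bit patterns, here is a minimal sketch. `SignMagnitudeSketch` and `toTwosComplement` are hypothetical names; the body is the same check-clear-negate step as above.

```java
final class SignMagnitudeSketch {
    /** Converts an 8-bit sign-and-magnitude value to an ordinary signed long. */
    static long toTwosComplement(long temp) {
        if ((temp & 0x80L) == 0x80L) { // sign bit set?
            temp = temp ^ 0x80L;       // clear the sign bit, leaving the magnitude
            temp = -temp;              // negate
        }
        return temp;
    }

    public static void main(String[] args) {
        System.out.println(toTwosComplement(0x05L)); // 5
        System.out.println(toTwosComplement(0x85L)); // -5
        System.out.println(toTwosComplement(0xffL)); // -127
        // Sign and magnitude has a "negative zero"; it collapses to 0 here.
        System.out.println(toTwosComplement(0x80L)); // 0
    }
}
```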

Then scale the encoded numbers, the same way as described previously:

    sample = temp / fullScale(8);

Now we can apply the expansion.

The μ-law formula translated to Java is then:

    sample = (float) (
        signum(sample)
            *
        (1.0 / 255.0)
            *
        (pow(256.0, abs(sample)) - 1.0)
    );
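Chaining the μ-law steps described above (invert the bits, convert from sign and magnitude, scale, expand) for a single byte could be sketched as follows. `MuLawSketch` and `decode` are hypothetical names, and this follows the formula-based approach of this answer rather than a table-driven decoder.

```java
final class MuLawSketch {
    /** Decodes one mu-law byte to a float in the -1f..1f range. */
    static float decode(byte b) {
        long temp = b & 0xffL;          // mask to avoid sign extension
        temp ^= 0xffL;                  // mu-law: all bits are inverted
        if ((temp & 0x80L) == 0x80L) {  // sign and magnitude -> signed
            temp = -(temp ^ 0x80L);
        }
        float sample = temp / 128f;     // scale by fullScale(8)
        // Expansion formula with mu = 255.
        return (float) (
            Math.signum(sample)
                * (1.0 / 255.0)
                * (Math.pow(256.0, Math.abs(sample)) - 1.0)
        );
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0xff)); // 0.0 (silence)
        System.out.println(decode((byte) 0x80)); // largest positive value
        System.out.println(decode((byte) 0x00)); // largest negative value
    }
}
```

Because the scaled magnitude tops out at 127/128, the expanded output approaches but does not reach ±1.0.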

The A-law formula translated to Java is then:

    float signum = signum(sample);
    sample = abs(sample);

    if (sample < (1.0 / (1.0 + log(87.7)))) {
        sample = (float) (
            sample * ((1.0 + log(87.7)) / 87.7)
        );
    } else {
        sample = (float) (
            exp((sample * (1.0 + log(87.7))) - 1.0) / 87.7
        );
    }

    sample = signum * sample;
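The same chain for A-law, again as a sketch that follows the steps in this answer (including its sign convention) rather than a table-driven decoder. `ALawSketch` and `decode` are hypothetical names; the constant 87.7 is the one used in the formula above.

```java
final class ALawSketch {
    static final double A = 87.7;                     // A-law constant from the text
    static final double ONE_PLUS_LN_A = 1.0 + Math.log(A);

    /** Decodes one A-law byte to a float in the -1f..1f range. */
    static float decode(byte b) {
        long temp = (b & 0xffL) ^ 0x55L; // A-law: every other bit is inverted
        if ((temp & 0x80L) == 0x80L) {   // sign and magnitude -> signed
            temp = -(temp ^ 0x80L);
        }
        float sample = temp / 128f;      // scale by fullScale(8)
        float sign = Math.signum(sample);
        double x = Math.abs(sample);
        double y;
        if (x < 1.0 / ONE_PLUS_LN_A) {
            y = x * ONE_PLUS_LN_A / A;                // linear segment
        } else {
            y = Math.exp(x * ONE_PLUS_LN_A - 1.0) / A; // exponential segment
        }
        return (float) (sign * y);
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0x55)); // 0.0 (silence)
        System.out.println(decode((byte) 0x2a)); // largest positive magnitude
        System.out.println(decode((byte) 0xaa)); // largest negative magnitude
    }
}
```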

Here's the complete example code for the SimpleAudioConversion class.

    package mcve.audio;

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioFormat.Encoding;

    import static java.lang.Math.ceil;
    import static java.lang.Math.pow;
    import static java.lang.Math.signum;
    import static java.lang.Math.abs;
    import static java.lang.Math.log;
    import static java.lang.Math.exp;

    /**
     * Performs rudimentary audio format conversion.
     *
     * Example usage:
     *
     * <pre>{@code
     * AudioInputStream ais = ... ;
     * SourceDataLine  line = ... ;
     * AudioFormat      fmt = ... ;
     *
     * // do prep
     *
     * for (int blen = 0; (blen = ais.read(bytes)) > -1;) {
     *     int slen;
     *     slen = SimpleAudioConversion.unpack(bytes, samples, blen, fmt);
     *
     *     // do something with samples
     *
     *     blen = SimpleAudioConversion.pack(samples, bytes, slen, fmt);
     *     line.write(bytes, 0, blen);
     * }
     * }</pre>
     *
     * @author Radiodef
     * @see <a href="http://.com/a/26824664/2891664">Overview on .com</a>
     */
    public final class SimpleAudioConversion {
        private SimpleAudioConversion() {}

        /**
         * Converts:
         * <ul>
         * <li>from a byte array ({@code byte[]})
         * <li>to an audio sample array ({@code float[]}).
         * </ul>
         *
         * @param bytes   the byte array, filled by the {@code InputStream}.
         * @param samples an array to fill up with audio samples.
         * @param blen    the return value of {@code InputStream.read}.
         * @param fmt     the source {@code AudioFormat}.
         *
         * @return the number of valid audio samples converted.
         *
         * @throws NullPointerException
         *  if {@code bytes}, {@code samples} or {@code fmt} is {@code null}
         * @throws ArrayIndexOutOfBoundsException
         *  if {@code (bytes.length < blen)}
         *  or {@code (samples.length < blen / bytesPerSample(fmt.getSampleSizeInBits()))}.
         */
        public static int unpack(byte[]      bytes,
                                 float[]     samples,
                                 int         blen,
                                 AudioFormat fmt) {
            int   bitsPerSample = fmt.getSampleSizeInBits();
            int  bytesPerSample = bytesPerSample(bitsPerSample);
            boolean isBigEndian = fmt.isBigEndian();
            Encoding   encoding = fmt.getEncoding();
            double    fullScale = fullScale(bitsPerSample);

            int i = 0;
            int s = 0;
            while (i < blen) {
                long temp = unpackBits(bytes, i, isBigEndian, bytesPerSample);
                float sample = 0f;

                if (encoding == Encoding.PCM_SIGNED) {
                    temp = extendSign(temp, bitsPerSample);
                    sample = (float) (temp / fullScale);

                } else if (encoding == Encoding.PCM_UNSIGNED) {
                    temp = signUnsigned(temp, bitsPerSample);
                    sample = (float) (temp / fullScale);

                } else if (encoding == Encoding.PCM_FLOAT) {
                    if (bitsPerSample == 32) {
                        sample = Float.intBitsToFloat((int) temp);
                    } else if (bitsPerSample == 64) {
                        sample = (float) Double.longBitsToDouble(temp);
                    }
                } else if (encoding == Encoding.ULAW) {
                    sample = bitsToMuLaw(temp);

                } else if (encoding == Encoding.ALAW) {
                    sample = bitsToALaw(temp);
                }

                samples[s] = sample;

                i += bytesPerSample;
                s++;
            }

            return s;
        }

        /**
         * Converts:
         * <ul>
         * <li>from an audio sample array ({@code float[]})
         * <li>to a byte array ({@code byte[]}).
         * </ul>
         *
         * @param samples an array of audio samples to encode.
         * @param bytes   an array to fill up with bytes.
         * @param slen    the return value of {@code unpack}.
         * @param fmt     the destination {@code AudioFormat}.
         *
         * @return the number of valid bytes converted.
         *
         * @throws NullPointerException
         *  if {@code samples}, {@code bytes} or {@code fmt} is {@code null}
         * @throws ArrayIndexOutOfBoundsException
         *  if {@code (samples.length < slen)}
         *  or {@code (bytes.length < slen * bytesPerSample(fmt.getSampleSizeInBits()))}
         */
        public static int pack(float[]     samples,
                               byte[]      bytes,
                               int         slen,
                               AudioFormat fmt) {
            int   bitsPerSample = fmt.getSampleSizeInBits();
            int  bytesPerSample = bytesPerSample(bitsPerSample);
            boolean isBigEndian = fmt.isBigEndian();
            Encoding   encoding = fmt.getEncoding();
            double    fullScale = fullScale(bitsPerSample);

            int i = 0;
            int s = 0;
            while (s < slen) {
                float sample = samples[s];
                long temp = 0L;

                if (encoding == Encoding.PCM_SIGNED) {
                    temp = (long) (sample * fullScale);

                } else if (encoding == Encoding.PCM_UNSIGNED) {
                    temp = (long) (sample * fullScale);
                    temp = unsignSigned(temp, bitsPerSample);

                } else if (encoding == Encoding.PCM_FLOAT) {
                    if (bitsPerSample == 32) {
                        temp = Float.floatToRawIntBits(sample);
                    } else if (bitsPerSample == 64) {
                        temp = Double.doubleToRawLongBits(sample);
                    }
                } else if (encoding == Encoding.ULAW) {
                    temp = muLawToBits(sample);

                } else if (encoding == Encoding.ALAW) {
                    temp = aLawToBits(sample);
                }

                packBits(bytes, i, temp, isBigEndian, bytesPerSample);

                i += bytesPerSample;
                s++;
            }

            return i;
        }

        /**
         * Computes the block-aligned bytes per sample of the audio format,
         * with {@code (int) ceil(bitsPerSample / 8.0)}.
         *
         * This is generally equivalent to the optimization
         * {@code ((bitsPerSample + 7) >>> 3)}. (Except for
         * the invalid argument {@code bitsPerSample <= 0}.)
         *
         * Round towards the ceiling because formats that allow bit depths
         * in non-integral multiples of 8 typically pad up to the nearest
         * integral multiple of 8. So for example, a 31-bit AIFF file will
         * actually store 32-bit blocks.
         *
         * @param  bitsPerSample the return value of {@code AudioFormat.getSampleSizeInBits}.
         * @return The block-aligned bytes per sample of the audio format.
         */
        public static int bytesPerSample(int bitsPerSample) {
            return (int) ceil(bitsPerSample / 8.0);
        }

        /**
         * Computes the largest magnitude representable by the audio format,
         * with {@code pow(2.0, bitsPerSample - 1)}.
         *
         * For {@code bitsPerSample < 64}, this is generally equivalent to
         * the optimization {@code (1L << (bitsPerSample - 1L))}. (Except for
         * the invalid argument {@code bitsPerSample <= 0}.)
         *
         * The result is returned as a {@code double} because, in the case that
         * {@code bitsPerSample == 64}, a {@code long} would overflow.
         *
         * @param  bitsPerSample the return value of {@code AudioFormat.getSampleSizeInBits}.
         * @return the largest magnitude representable by the audio format.
         */
        public static double fullScale(int bitsPerSample) {
            return pow(2.0, bitsPerSample - 1);
        }

        private static long unpackBits(byte[]  bytes,
                                       int     i,
                                       boolean isBigEndian,
                                       int     bytesPerSample) {
            switch (bytesPerSample) {
                case  1: return unpack8Bit(bytes, i);
                case  2: return unpack16Bit(bytes, i, isBigEndian);
                case  3: return unpack24Bit(bytes, i, isBigEndian);
                default: return unpackAnyBit(bytes, i, isBigEndian, bytesPerSample);
            }
        }

        private static long unpack8Bit(byte[] bytes, int i) {
            return bytes[i] & 0xffL;
        }

        private static long unpack16Bit(byte[]  bytes,
                                        int     i,
                                        boolean isBigEndian) {
            if (isBigEndian) {
                return (
                      ((bytes[i    ] & 0xffL) << 8L)
                    |  (bytes[i + 1] & 0xffL)
                );
            } else {
                return (
                       (bytes[i    ] & 0xffL)
                    | ((bytes[i + 1] & 0xffL) << 8L)
                );
            }
        }

        private static long unpack24Bit(byte[]  bytes,
                                        int     i,
                                        boolean isBigEndian) {
            if (isBigEndian) {
                return (
                      ((bytes[i    ] & 0xffL) << 16L)
                    | ((bytes[i + 1] & 0xffL) <<  8L)
                    |  (bytes[i + 2] & 0xffL)
                );
            } else {
                return (
                       (bytes[i    ] & 0xffL)
                    | ((bytes[i + 1] & 0xffL) <<  8L)
                    | ((bytes[i + 2] & 0xffL) << 16L)
                );
            }
        }

        private static long unpackAnyBit(byte[]  bytes,
                                         int     i,
                                         boolean isBigEndian,
                                         int     bytesPerSample) {
            long temp = 0L;

            if (isBigEndian) {
                for (int b = 0; b < bytesPerSample; b++) {
                    temp |= (bytes[i + b] & 0xffL) << (
                        8L * (bytesPerSample - b - 1L)
                    );
                }
            } else {
                for (int b = 0; b < bytesPerSample; b++) {
                    temp |= (bytes[i + b] & 0xffL) << (8L * b);
                }
            }

            return temp;
        }

        private static void packBits(byte[]  bytes,
                                     int     i,
                                     long    temp,
                                     boolean isBigEndian,
                                     int     bytesPerSample) {
            switch (bytesPerSample) {
                case  1: pack8Bit(bytes, i, temp);
                         break;
                case  2: pack16Bit(bytes, i, temp, isBigEndian);
                         break;
                case  3: pack24Bit(bytes, i, temp, isBigEndian);
                         break;
                default: packAnyBit(bytes, i, temp, isBigEndian, bytesPerSample);
                         break;
            }
        }

        private static void pack8Bit(byte[] bytes, int i, long temp) {
            bytes[i] = (byte) (temp & 0xffL);
        }

        private static void pack16Bit(byte[]  bytes,
                                      int     i,
                                      long    temp,
                                      boolean isBigEndian) {
            if (isBigEndian) {
                bytes[i    ] = (byte) ((temp >>> 8L) & 0xffL);
                bytes[i + 1] = (byte) ( temp         & 0xffL);
            } else {
                bytes[i    ] = (byte) ( temp         & 0xffL);
                bytes[i + 1] = (byte) ((temp >>> 8L) & 0xffL);
            }
        }

        private static void pack24Bit(byte[]  bytes,
                                      int     i,
                                      long    temp,
                                      boolean isBigEndian) {
            if (isBigEndian) {
                bytes[i    ] = (byte) ((temp >>> 16L) & 0xffL);
                bytes[i + 1] = (byte) ((temp >>>  8L) & 0xffL);
                bytes[i + 2] = (byte) ( temp          & 0xffL);
            } else {
                bytes[i    ] = (byte) ( temp          & 0xffL);
                bytes[i + 1] = (byte) ((temp >>>  8L) & 0xffL);
                bytes[i + 2] = (byte) ((temp >>> 16L) & 0xffL);
            }
        }

        private static void packAnyBit(byte[]  bytes,
                                       int     i,
                                       long    temp,
                                       boolean isBigEndian,
                                       int     bytesPerSample) {
            if (isBigEndian) {
                for (int b = 0; b < bytesPerSample; b++) {
                    bytes[i + b] = (byte) (
                        (temp >>> (8L * (bytesPerSample - b - 1L))) & 0xffL
                    );
                }
            } else {
                for (int b = 0; b < bytesPerSample; b++) {
                    bytes[i + b] = (byte) ((temp >>> (8L * b)) & 0xffL);
                }
            }
        }

        private static long extendSign(long temp, int bitsPerSample) {
            int extensionBits = 64 - bitsPerSample;
            return (temp << extensionBits) >> extensionBits;
        }

        private static long signUnsigned(long temp, int bitsPerSample) {
            return temp - (long) fullScale(bitsPerSample);
        }

        private static long unsignSigned(long temp, int bitsPerSample) {
            return temp + (long) fullScale(bitsPerSample);
        }

        // mu-law constant
        private static final double MU = 255.0;
        // A-law constant
        private static final double A = 87.7;
        // reciprocal of A
        private static final double RE_A = 1.0 / A;
        // natural logarithm of A
        private static final double LN_A = log(A);
        // if values are below this, the A-law exponent is 0
        private static final double EXP_0 = 1.0 / (1.0 + LN_A);

        private static float bitsToMuLaw(long temp) {
            temp ^= 0xffL;
            if ((temp & 0x80L) == 0x80L) {
                temp = -(temp ^ 0x80L);
            }

            float sample = (float) (temp / fullScale(8));

            return (float) (
                signum(sample)
                    *
                (1.0 / MU)
                    *
                (pow(1.0 + MU, abs(sample)) - 1.0)
            );
        }

        private static long muLawToBits(float sample) {
            double sign = signum(sample);
            sample = abs(sample);

            sample = (float) (
                sign * (log(1.0 + (MU * sample)) / log(1.0 + MU))
            );

            long temp = (long) (sample * fullScale(8));

            if (temp < 0L) {
                temp = -temp ^ 0x80L;
            }

            return temp ^ 0xffL;
        }

        private static float bitsToALaw(long temp) {
            temp ^= 0x55L;
            if ((temp & 0x80L) == 0x80L) {
                temp = -(temp ^ 0x80L);
            }

            float sample = (float) (temp / fullScale(8));

            float sign = signum(sample);
            sample = abs(sample);

            if (sample < EXP_0) {
                sample = (float) (sample * ((1.0 + LN_A) / A));
            } else {
                sample = (float) (exp((sample * (1.0 + LN_A)) - 1.0) / A);
            }

            return sign * sample;
        }

        private static long aLawToBits(float sample) {
            double sign = signum(sample);
            sample = abs(sample);

            if (sample < RE_A) {
                sample = (float) ((A * sample) / (1.0 + LN_A));
            } else {
                sample = (float) ((1.0 + log(A * sample)) / (1.0 + LN_A));
            }

            sample *= sign;

            long temp = (long) (sample * fullScale(8));

            if (temp < 0L) {
                temp = -temp ^ 0x80L;
            }

            return temp ^ 0x55L;
        }
    }
