* update audio doc
lxowalle committed Jan 24, 2025
1 parent 49f0381 commit e668ae3
Showing 11 changed files with 177 additions and 175 deletions.
15 changes: 1 addition & 14 deletions docs/doc/en/audio/play.md
@@ -118,17 +118,4 @@ Steps:
- `p.play(bytes(ctx))` plays the audio, `p` is the opened player object, `ctx` is the `PCM` data converted to type bytes
- `time.sleep_ms(10)` is called in a loop to wait for playback to complete. Playback runs asynchronously, so if the program exits early, the audio may not be played completely.

4. Done

### Other

The `Player` and `Recorder` modules have some bugs that are still being worked out. Make sure they are created before other modules (the `Camera` module, the `Display` module, etc.). For example:

```python
# Create Player and Recorder first.
p = audio.Player()
r = audio.Recorder()

# Then create the Camera
c = camera.Camera()
```
4. Done
115 changes: 62 additions & 53 deletions docs/doc/en/audio/record.md
@@ -5,34 +5,38 @@ update:
author: lxowalle
version: 1.0.0
content: Initial document
- date: 2025-01-24
author: lxowalle
version: 1.0.1
content:
- Update the usage instructions for the audio module.
---

## Introduction

This document provides the usage of audio recording and supports recording audio in `PCM` and `WAV` formats.
This document provides instructions on how to use the recording feature, supporting the recording of audio in both `PCM` and `WAV` formats.

`PCM (Pulse Code Modulation)` is a digital audio encoding format used to convert analog audio signals into digital signals. It is also the commonly required format for general hardware processing.

`WAV (Waveform Audio File Format)` is a popular audio file format. It is typically used to store uncompressed `PCM` audio data but also supports other encoding formats.

The `MaixCAM` has a microphone on board, so you can use the recording function directly.
The MaixCAM board comes with a built-in microphone, so you can directly use the recording feature.

### How to use

#### Getting `PCM` data
#### Record an Audio File in `PCM`/`WAV` Format

If you don't pass `path` when constructing a `Recorder` object, it will only record audio and not save it to a file, but you can save it to a file manually.

```python
from maix import audio, time, app
from maix import audio

r = audio.Recorder()
r.volume(12)
print("sample_rate:{} format:{} channel:{}".format(r.sample_rate(), r.format(), r.channel()))

while not app.need_exit():
    data = r.record()
    print("data size", len(data))
r = audio.Recorder("/root/test.wav")
r.volume(100)
print(f"channel: {r.channel()}")
print(f"sample rate: {r.sample_rate()}")

    time.sleep_ms(10)

print("record finish!")
r.record(3000)
```

Steps:
@@ -46,73 +50,78 @@ Steps:
2. Initialize Recorder

```python
r = audio.Recorder()
r.volume(12)
r = audio.Recorder("/root/test.wav")
r.volume(100)
```

- Note that the default sample rate is 48000 Hz, the sample format is signed 16-bit little-endian, and the number of channels is 1. You can also customize the parameters like this: `r = audio.Recorder(sample_rate=48000, format=audio.Format.FMT_S16_LE, channel=1)`. So far, only a sample rate of 48000, the `FMT_S16_LE` format, and 1 channel have been tested.
- Note that the default sample rate is 48000 Hz, the sample format is signed 16-bit little-endian, and the number of channels is 1. You can also customize the parameters like this: `r = audio.Recorder(sample_rate=48000, format=audio.Format.FMT_S16_LE, channel=1)`. So far, only sample rates of 16000 and 48000, the `FMT_S16_LE` format, and 1 channel have been tested.

- `r.volume(12)` is used to set the volume, the volume range is [0,100]
- `r.volume(100)` is used to set the volume, the volume range is [0,100]

3. Start recording

```python
data = r.record()
r.record(3000)
```

- `data` is `bytes` type data in `PCM` format that holds the currently recorded audio. The `PCM` format is set when initializing the `Recorder` object, see step 2. Note that if `record` is called too frequently and the audio buffer has no data yet, it may return an empty `bytes` object.
- Record audio for 3000 milliseconds.

- This function will block until the recording is complete.

4. Done, you can do voice processing on the `PCM` data returned by `r.record()` when doing your own applications.
4. Done
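
As a rough sanity check on the blocking example above, the amount of raw `PCM` data produced by a recording can be estimated from the parameters listed in step 2. The sketch below is plain Python and does not call the `maix` API. The helper name `pcm_size_bytes` is hypothetical, and it assumes 2 bytes per sample for `FMT_S16_LE`.

```python
def pcm_size_bytes(duration_ms, sample_rate=48000, channels=1, bytes_per_sample=2):
    """Estimate the raw PCM size of a recording (hypothetical helper, not part of maix)."""
    # One frame holds one sample for every channel.
    frames = duration_ms * sample_rate // 1000
    return frames * channels * bytes_per_sample


if __name__ == "__main__":
    # 3000 ms at the documented defaults: 48000 Hz, 1 channel, signed 16-bit
    print(pcm_size_bytes(3000))  # 288000
```

A `.wav` file of the same recording is slightly larger than this estimate because it also contains a header.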

#### Records audio and saves it in `WAV` format.
#### Record an Audio File in `PCM`/`WAV` Format (Non-blocking)

If you pass `path` when constructing a `Recorder` object, the recorded audio is saved to the file at `path`, and you can still get the currently recorded `PCM` data via the `record` method. `path` only supports the `.pcm` and `.wav` suffixes. When recording to `.wav`, the `record` method returns only `PCM` data, without the `WAV` header.
When developing applications, if you need to record audio but do not want the recording call to block the rest of your application, you can enable non-blocking mode.

```python
from maix import audio, time, app
from maix import audio, app, time

r = audio.Recorder("/root/output.wav")
r.volume(12)
print("sample_rate:{} format:{} channel:{}".format(r.sample_rate(), r.format(), r.channel()))
r = audio.Recorder("/root/test.wav", block=False)
r.volume(100)
r.reset(True)

while not app.need_exit():
    data = r.record()
    print("data size", len(data))
    data = r.record(50)
    # Your application
    time.sleep_ms(50)

    time.sleep_ms(10)

print("record finish!")
print("finish!")
```

The code means basically the same as above.
**Notes:**

#### Record audio and save to `WAV` format (blocking)
1. In non-blocking recording, you need to use the `reset(True)` function to enable the audio stream and the `reset(False)` function to stop the audio stream.

If the `record_ms` parameter is set during recording, recording audio will block until the time set by `record_ms` is reached, unit: ms.
2. The length of the audio data returned by `record` may not match the input time. For example, if you request to record `50ms` of audio but only `20ms` of data is ready in the audio buffer, then `record(50)` will only return `20ms` of audio data.

```python
from maix import audio, time, app
3. If you want the audio data returned by `record()` to match the input parameter, you can wait until the buffer has enough audio data before reading.

r = audio.Recorder("/root/output.wav")
r.volume(12)
print("sample_rate:{} format:{} channel:{}".format(r.sample_rate(), r.format(), r.channel()))

r.record(5000)

print("record finish!")
```
```python
remaining_frames = r.get_remaining_frames()
need_frames = 50 * r.sample_rate() / 1000
if remaining_frames > need_frames:
    data = r.record(50)
```

The above example records for `5000` ms and saves the result in `WAV` format. The call blocks inside the `record` method for the whole recording. Note that `record` does not return `PCM` data when `record_ms` is set.
Use the `get_remaining_frames()` function to get the number of remaining frames in the receive buffer. Note that this returns the number of frames, not bytes. Use `sample_rate()` to get the audio sample rate and calculate the actual number of frames to read.
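
Notes 2 and 3 can also be handled by buffering: collect whatever `record` returns and only pass data on once a full chunk has accumulated. The sketch below is a minimal illustration in plain Python. The `ChunkAssembler` class is hypothetical and not part of `maix`, and it assumes 16-bit mono samples (2 bytes per frame).

```python
class ChunkAssembler:
    """Collect variable-length PCM reads into fixed-size chunks (hypothetical helper)."""

    def __init__(self, chunk_ms, sample_rate, bytes_per_frame=2):
        # Bytes in one full chunk, e.g. 50 ms at 48000 Hz mono S16_LE -> 4800 bytes
        self.chunk_bytes = chunk_ms * sample_rate // 1000 * bytes_per_frame
        if self.chunk_bytes <= 0:
            raise ValueError("chunk size must be positive")
        self._buf = bytearray()

    def feed(self, data):
        """Add newly recorded bytes and return a list of complete chunks."""
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self.chunk_bytes:
            chunks.append(bytes(self._buf[:self.chunk_bytes]))
            del self._buf[:self.chunk_bytes]
        return chunks
```

In the non-blocking loop, something like `for chunk in assembler.feed(r.record(50)): ...` would then see only full 50 ms chunks. This assumes `record` returns a `bytes`-like object, as described above.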

### Other
#### Obtain Real-time `PCM` Audio Stream

The `Player` and `Recorder` modules have some bugs that are still being worked out. Make sure they are created before other modules (the `Camera` module, the `Display` module, etc.). For example:
When developing applications that need to process audio data, you may not need to save files but only require the raw `PCM` stream. To achieve this, simply do not provide a path when creating the `Recorder`. Of course, you can also enable non-blocking mode.

```python
# Create Player and Recorder first.
p = audio.Player()
r = audio.Recorder()
from maix import audio, app, time

r = audio.Recorder(block=False)
r.volume(100)
r.reset(True)

# Then create the Camera
c = camera.Camera()
while not app.need_exit():
    data = r.record(50)
    print(f'record {len(data)} bytes')
    # Your application
    time.sleep_ms(50)
```

The code logic is essentially the same as above.
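
When only the raw `PCM` stream is obtained, the data can still be saved to a file by hand. One possible way, sketched below, is to wrap the collected `PCM` bytes in a `WAV` container using Python's standard `wave` module. The function name `save_pcm_as_wav` is hypothetical, the defaults assume the documented format (48000 Hz, 1 channel, signed 16-bit little-endian), and the availability of the `wave` module in the MaixPy runtime is also an assumption.

```python
import wave


def save_pcm_as_wav(pcm_data, path, sample_rate=48000, channels=1, sample_width=2):
    """Write raw little-endian PCM bytes to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes per sample for signed 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_data)
```

The `pcm_data` argument would be the concatenation of the `bytes` objects returned by `record` during the loop.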
15 changes: 0 additions & 15 deletions docs/doc/zh/audio/play.md
@@ -122,18 +122,3 @@ print("play finish!")
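- `time.sleep_ms(10)` is called in a loop to wait for playback to complete. Playback runs asynchronously, so if the program exits early, the audio may not be played completely.

4. Done

### Other

The `Player` and `Recorder` modules have some bugs that are still being worked out. Make sure they are created before other modules (the `Camera` module, the `Display` module, etc.). For example:

```python
# Create Player and Recorder first
p = audio.Player()
r = audio.Recorder()

# Then create the Camera
c = camera.Camera()
```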
123 changes: 65 additions & 58 deletions docs/doc/zh/audio/record.md
@@ -4,118 +4,125 @@ update:
- date: 2024-05-20
author: lxowalle
version: 1.0.0
content: Initial document
content:
- Initial document
- date: 2025-01-24
author: lxowalle
version: 1.0.1
content:
- Update the usage instructions for the audio module
---

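## Introduction

This document describes how to use the recording feature, which supports recording audio in `PCM` and `WAV` formats.

- `PCM (Pulse Code Modulation)` is a digital audio encoding format used to convert analog audio signals into digital signals. It is also the format most commonly required for hardware processing.
- `WAV (Waveform Audio File Format)` is a common audio file format. It is typically used to store uncompressed `PCM` audio data but also supports other encoding formats.

The `MaixCAM` has an onboard microphone, so you can use the recording feature directly.

### How to use

#### Getting `PCM` data
#### Record an Audio File in `PCM`/`WAV` Format

If you don't pass `path` when constructing a `Recorder` object, audio is only recorded and not saved to a file, but you can still save it to a file manually.
If you pass `path` when creating a `Recorder` object, the recorded audio is saved to the file at `path`, and you can also get the currently recorded `PCM` data via the `record` method. `path` only supports the `.pcm` and `.wav` suffixes. When recording to `.wav`, the `record` method returns only `PCM` data, without the `WAV` header.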

```python
from maix import audio, time, app

r = audio.Recorder()
r.volume(12)
print("sample_rate:{} format:{} channel:{}".format(r.sample_rate(), r.format(), r.channel()))
from maix import audio

while not app.need_exit():
    data = r.record()
    print("data size", len(data))

    time.sleep_ms(10)
r = audio.Recorder("/root/test.wav")
r.volume(100)
print(f"channel: {r.channel()}")
print(f"sample rate: {r.sample_rate()}")

print("record finish!")
r.record(3000)
```

Steps:

1. Import the audio, time and app modules
1. Import the audio module

```python
from maix import audio, time, app
from maix import audio
```

2. Initialize Recorder

```python
r = audio.Recorder()
r.volume(12)
r = audio.Recorder("/root/test.wav")
r.volume(100)
```

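- Note that the default sample rate is 48000 Hz, the sample format is signed 16-bit little-endian, and the number of channels is 1. You can also customize the parameters like this: `r = audio.Recorder(sample_rate=48000, format=audio.Format.FMT_S16_LE, channel=1)`. So far, only a sample rate of 48000, the `FMT_S16_LE` format, and 1 channel have been tested.
- The audio file is saved to `/root/test.wav`

- Note that the default sample rate is 48000 Hz, the sample format is signed 16-bit little-endian, and the number of channels is 1. You can also customize the parameters like this: `r = audio.Recorder(sample_rate=48000, format=audio.Format.FMT_S16_LE, channel=1)`. So far, only sample rates of 16000 and 48000, the `FMT_S16_LE` format, and 1 channel have been tested.

- `r.volume(12)` sets the volume; the volume range is [0,100]
- `r.volume(100)` sets the volume; the volume range is [0,100]

3. Start recording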

```python
data = r.record()
r.record(3000)
```

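- `data` is `bytes` type data in `PCM` format that holds the currently recorded audio. The `PCM` format is set when initializing the `Recorder` object, see step 2. Note that if `record` is called too frequently and the audio buffer has no data yet, it may return an empty `bytes` object.
- Record `3000` milliseconds of audio
- This function blocks until the recording is complete

4. Done. When building your own application, you can do voice processing on the `PCM` data returned by `r.record()`.
4. Done

#### Record an Audio File in `PCM`/`WAV` Format (Non-blocking)


#### Record audio and save it in `WAV` format

If you pass `path` when constructing a `Recorder` object, the recorded audio is saved to the file at `path`, and you can also get the currently recorded `PCM` data via the `record` method. `path` only supports the `.pcm` and `.wav` suffixes. When recording to `.wav`, the `record` method returns only `PCM` data, without the `WAV` header.
When developing applications, if you need to record audio but do not want the recording call to block the rest of your application, you can enable non-blocking mode.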

```python
from maix import audio, time, app
from maix import audio, app, time

r = audio.Recorder("/root/output.wav")
r.volume(12)
print("sample_rate:{} format:{} channel:{}".format(r.sample_rate(), r.format(), r.channel()))
r = audio.Recorder("/root/test.wav", block=False)
r.volume(100)
r.reset(True)

while not app.need_exit():
    data = r.record()
    print("data size", len(data))

    time.sleep_ms(10)
    data = r.record(50)
    # Your application
    time.sleep_ms(50)

print("record finish!")
print("finish!")
```

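The code means basically the same as above.

Notes:

1. In non-blocking recording, you need to use the `reset(True)` function to enable the audio stream and the `reset(False)` function to stop the audio stream.

#### Record audio and save to `WAV` format (blocking)
2. The length of the audio data returned by `record` may not match the requested time. For example, if you request `50ms` of audio but only `20ms` of data is ready in the audio buffer, `record(50)` returns only `20ms` of audio data.

If the `record_ms` parameter is set when recording, the call blocks until the time set by `record_ms` has elapsed. Unit: ms.
3. If you want the audio data returned by `record()` to match the requested duration, you can wait until the buffer holds enough audio data before reading.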

```python
from maix import audio, time, app

r = audio.Recorder("/root/output.wav")
r.volume(12)
print("sample_rate:{} format:{} channel:{}".format(r.sample_rate(), r.format(), r.channel()))

r.record(5000)
print("record finish!")
```
```python
remaining_frames = r.get_remaining_frames()
need_frames = 50 * r.sample_rate() / 1000
if remaining_frames > need_frames:
    data = r.record(50)
```

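The above example records for `5000` ms and saves the result in `WAV` format. The call blocks inside the `record` method for the whole recording. Note that `record` does not return `PCM` data when `record_ms` is set.
Use the `get_remaining_frames()` function to get the number of frames remaining in the receive buffer. Note that it returns a number of frames, not bytes. Use `sample_rate()` to get the audio sample rate and calculate the actual number of frames to read.

### Other
#### Obtain Real-time `PCM` Audio Stream

The `Player` and `Recorder` modules have some bugs that are still being worked out. Make sure they are created before other modules (the `Camera` module, the `Display` module, etc.). For example:
When developing applications that need to process audio data, you may only need the raw `PCM` stream rather than a saved file. To do this, simply do not pass a path when creating the `Recorder`. You can also enable non-blocking mode.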

```python
# Create Player and Recorder first
p = audio.Player()
r = audio.Recorder()
from maix import audio, app, time

r = audio.Recorder(block=False)
r.volume(100)
r.reset(True)

# Then create the Camera
c = camera.Camera()
while not app.need_exit():
    data = r.record(50)
    print(f'record {len(data)} bytes')
    # Your application
    time.sleep_ms(50)
```

The code means basically the same as above.