adrianSRoman/LAM

Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach

arXiv Platform Python CC BY 4.0

LAM Architecture

Installation

See installation instructions.

Datasets

Dataset                       | Format     | Type      | URL
EigenScape                    | em32       | real      | Link
STARSS23                      | mic & em32 | real      | Link
LOCATA                        | em32       | real      | Link
SpatialScaper Simulated Audio | mic & em32 | synthetic | Link

Generate dataset

See more details on how to generate the HDF dataset.

Training

Use train.py to train the model.

  • -h, display help information
  • -C, --config, specify the configuration file required for training
  • -R, --resume, continue training from the checkpoint of the last saved model

Please refer to config/train/README to understand how to set up your training config.

Example:

# The configuration file used to train the model is "config/train/train.json"
python train.py -C config/train/train.json

# continue training from the last saved model checkpoint
python train.py -C config/train/train.json -R
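The flags above follow a common pattern for JSON-configured training scripts. The following is a minimal sketch of how such a command line could be parsed. It is not the actual contents of train.py: the helper names `build_parser` and `load_config` are hypothetical, and only the `-C/--config` and `-R/--resume` flags are taken from this README.

```python
import argparse
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the -C / -R flags documented above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of a train.py-style CLI")
    parser.add_argument("-C", "--config", required=True, type=Path,
                        help="configuration file required for training (*.json)")
    parser.add_argument("-R", "--resume", action="store_true",
                        help="continue training from the last saved checkpoint")
    return parser


def load_config(path: Path) -> dict:
    """Read a JSON config file into a dictionary (hypothetical helper)."""
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


# Parse an explicit argument list, equivalent to:
#   python train.py -C config/train/train.json -R
args = build_parser().parse_args(["-C", "config/train/train.json", "-R"])
print(f"config={args.config}, resume={args.resume}")
```

Using `action="store_true"` makes `-R` a plain switch, so resuming is opt-in and a fresh run needs only `-C`.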

Inference

Use infer.py to run inference with a pre-trained model.

  • -h, display help information
  • -D, --device, GPU index to use (0 for a single GPU, which is the default)
  • -C, --config, Configuration for k-means inference (*.json).

Please refer to config/infer/README to understand how to set up your inference config.

python infer.py -C /path/to/config/inference.json -D 0

Example:

python infer.py -C config/inference/inference.json -D 0

DoA Metrics from Inferred K-means Output

python doa_metrics.py -C /path/to/config/inference.json
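For orientation, a common way to score a DoA estimate is the great-circle angle between the estimated and reference directions. The sketch below illustrates that general idea only. It is an assumption that doa_metrics.py reports something comparable, and the function name `angular_error_deg` and the azimuth/elevation convention in degrees are hypothetical choices for this example.

```python
import math


def angular_error_deg(az_ref: float, el_ref: float,
                      az_est: float, el_est: float) -> float:
    """Great-circle angle (degrees) between two directions given as
    azimuth/elevation pairs in degrees. Illustrative helper, not the repo's API."""
    az1, el1, az2, el2 = map(math.radians, (az_ref, el_ref, az_est, el_est))
    # Spherical law of cosines, with elevation measured from the horizontal plane.
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_d = max(-1.0, min(1.0, cos_d))
    return math.degrees(math.acos(cos_d))


# Two directions on the horizon, a quarter turn apart: the error is about 90 degrees.
print(angular_error_deg(0.0, 0.0, 90.0, 0.0))
```

Working on the sphere rather than differencing azimuth and elevation separately avoids wrap-around problems near 0/360 degrees and overweighting azimuth errors near the poles.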

Sound Event Localization using LAM

Use LAM's spherical acoustic maps (SAMs) as input features for a DCASE-style SELD network. Please refer to the seld directory, where you can batch-extract SAM features and then train a network to perform DoA estimation on datasets such as STARSS23 or LOCATA.

Visualization

# Run tensorboard pointing to your directory of logs generated during training
tensorboard --logdir train

# You can use --port to specify the port of the tensorboard static server
tensorboard --logdir train --port <port> --bind_all

Pre-trained Models

Model | Input      | Checkpoint
UpLAM | 4-channel  | UpLAM.pth
LAM   | 32-channel | LAM.pth
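Since the two checkpoints differ in the number of input channels they expect, a script can pick one from the channel count of the recording. The following is a small sketch of that lookup. The model names, channel counts, and file names come from the table above, while the function `select_checkpoint` and the dictionary layout are hypothetical and not part of the repository.

```python
from typing import Tuple

# Input channel count -> (model name, checkpoint file), as listed in the table above.
CHECKPOINTS = {
    4: ("UpLAM", "UpLAM.pth"),
    32: ("LAM", "LAM.pth"),
}


def select_checkpoint(num_channels: int) -> Tuple[str, str]:
    """Return (model name, checkpoint file) for a given input channel count.

    Illustrative helper: raises ValueError for channel counts with no
    pre-trained model listed in this README.
    """
    try:
        return CHECKPOINTS[num_channels]
    except KeyError:
        supported = ", ".join(str(n) for n in sorted(CHECKPOINTS))
        raise ValueError(
            f"No pre-trained model for {num_channels} channels; supported: {supported}"
        ) from None


model_name, checkpoint_file = select_checkpoint(4)
print(model_name, checkpoint_file)  # prints "UpLAM UpLAM.pth"
```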

Citation

If you find our work useful, please cite our paper:

@article{roman2025latent,
  title={Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach},
  author={Roman, Adrian S and Roman, Iran R and Bello, Juan P},
  journal={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics},
  year={2025}
}
