This repository contains a customized version of OpenCompass 0.4.2, tailored for model evaluation in our business scenarios.

It provides a comprehensive framework for evaluating language models with OpenCompass, using custom configurations and datasets specific to our use cases. The evaluation environment is containerized with Docker to ensure consistency and reproducibility.
- Docker with GPU support (i.e. the NVIDIA Container Toolkit installed; a quick check follows this list)
- Git
- NVIDIA GPU drivers
- Sufficient disk space for models and datasets
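
Before building the image, it's worth confirming that Docker can actually see the GPU. The one-liner below is a common sanity check; the CUDA image tag is only an example:

```bash
# Should print the GPU table via the NVIDIA driver from inside a container
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi
```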
Clone the repository:

```bash
git clone [email protected]:YoctoHan/opencompass.git
cd opencompass
```
Navigate to the `docker` directory and build the evaluation environment:

```bash
cd docker
docker build -t aix-opencompass-eval-250524:latest .
```
Use the provided script to create and start a new container:
```bash
cd ../scripts
./launch_container.sh
```
This script will (a rough equivalent is sketched after this list):
- Create a new container with GPU support
- Mount the parent directory as a workspace
- Set up the necessary network configurations
- Activate the pre-configured OpenCompass environment
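
If you need to adapt the launch for a different machine, it boils down to a `docker run` along these lines. This is a sketch only: the container name, mount path, and extra flags are assumptions, so treat `scripts/launch_container.sh` as the source of truth.

```bash
#!/usr/bin/env bash
# Hypothetical equivalent of launch_container.sh; names, paths, and flags are assumptions.
#   --gpus all        expose the host GPUs to the container
#   --network host    share the host network stack
#   -v ..:/workspace  mount the repository's parent directory as the workspace
docker run -d --name aix-opencompass-eval \
  --gpus all \
  --network host \
  --shm-size 16g \
  -v "$(pwd)/..:/workspace" \
  aix-opencompass-eval-250524:latest sleep infinity

# Attach an interactive shell with the pre-configured environment
docker exec -it aix-opencompass-eval bash
```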
Once inside the container, run the data preparation script:
```bash
./workspace/scripts/prepare_data.sh
```
This script will download and organize all required datasets for evaluation.
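
For reference, OpenCompass data preparation typically amounts to downloading a dataset archive and unpacking it into a `data/` directory next to the code. Below is a minimal sketch with a placeholder archive URL; the real `prepare_data.sh` is authoritative for which datasets are fetched and where they land.

```bash
#!/usr/bin/env bash
# Minimal sketch only; the URL is a placeholder and the paths are assumptions.
set -euo pipefail

DATA_DIR=/workspace/data                                   # assumed to match the mounted workspace
ARCHIVE_URL="https://example.com/OpenCompassData-core.zip" # placeholder, not the real source

mkdir -p "${DATA_DIR}"
wget -O /tmp/opencompass-data.zip "${ARCHIVE_URL}"
unzip -q /tmp/opencompass-data.zip -d "${DATA_DIR}"
rm /tmp/opencompass-data.zip
```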
After data preparation is complete, you're ready to start model evaluations. The OpenCompass environment is pre-configured and activated by default.
Example evaluation command:
```bash
python run.py evaluations/eval_aixcoder_debug.py
```
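
`run.py` also accepts the standard OpenCompass CLI options alongside the config path. The flags below exist in upstream OpenCompass 0.4.x, but double-check with `python run.py --help` inside the container:

```bash
# Write results to a dedicated work dir and limit concurrent tasks
python run.py evaluations/eval_aixcoder_debug.py -w outputs/aixcoder_debug --max-num-workers 4

# Resume the latest run, reusing finished predictions, with verbose logging
python run.py evaluations/eval_aixcoder_debug.py -r latest --debug
```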
Repository layout:

```
opencompass/
├── docker/        # Docker configuration files
├── scripts/       # Utility scripts
├── configs/       # Evaluation configurations
├── data/          # Dataset storage (created after preparation)
├── outputs/       # Evaluation results
└── evaluations/   # Evaluation scripts
```
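
Each run writes predictions, per-dataset results, and a summary table under `outputs/`, typically in a timestamped subdirectory of the work dir. The layout below is typical of OpenCompass 0.4.x rather than guaranteed:

```bash
# Typical layout: <work_dir>/<timestamp>/{predictions,results,summary}/
ls outputs/*/*/summary/
cat outputs/*/*/summary/summary_*.csv   # the aggregated score table
```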
Environment details:

- Base Image: PyTorch 2.6.0 with CUDA 12.6 and cuDNN 9
- Python Version: 3.10
- OpenCompass Version: 0.4.2
- Pre-configured proxy settings for package installation (these can be overridden at build time; see the example below)
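
If the baked-in proxy does not match your network, Docker's predefined proxy build arguments can override it when building the image; the address below is a placeholder:

```bash
# Override the proxy used during the image build (placeholder address);
# http_proxy/https_proxy are predefined Docker build args and are not persisted in the image.
docker build \
  --build-arg http_proxy=http://proxy.example.com:8080 \
  --build-arg https_proxy=http://proxy.example.com:8080 \
  -t aix-opencompass-eval-250524:latest .
```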
- For questions or issues related to this evaluation framework, please open an issue in the repository.
- This project follows the licensing terms of the original OpenCompass project. Please refer to the LICENSE file for more details.