update-2024-07-29_00:12:47
liguodongiot committed Jul 28, 2024
1 parent da75365 commit 31c75e9
Showing 34 changed files with 365 additions and 2,778 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -196,10 +196,10 @@
 ### LLM Inference Optimization Techniques

 - [Overview of LLM Inference Optimization Techniques]()
-- [LLM Inference Optimization Techniques: KV Cache](https://www.zhihu.com/question/653658936/answer/3569365986)
-- Continuous Batching
 - FlashAttention
 - PagedAttention
+- Continuous Batching
+- [LLM Inference Optimization Techniques: KV Cache](https://www.zhihu.com/question/653658936/answer/3569365986)
 - Flash Decoding
 - FlashDecoding++

@@ -228,7 +228,7 @@
 - [LLM Quantization Principles: AWQ, AutoAWQ](https://zhuanlan.zhihu.com/p/681578090)
 - [LLM Quantization Principles: SpQR](https://zhuanlan.zhihu.com/p/682871823)
 - [LLM Quantization Principles: ZeroQuant Series](https://zhuanlan.zhihu.com/p/683813769)
-- [LLM Quantization Principles: FP8]()
+- [LLM Quantization Principles: FP8](https://juejin.cn/post/7392071348480917515)
 - [LLM Quantization Principles: FP6]()
 - [LLM Quantization Principles: FP4]()
 - [LLM Quantization Principles: Summary]()
7 changes: 7 additions & 0 deletions ai-infra/ai-hardware/CUDA.md
@@ -20,3 +20,10 @@ CUDA CURAND library: the CUDA random number library, for generating variously distributed random
- https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
- https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id4
CUDA Toolkit and Corresponding Driver Versions




- CUDA Programming Guide (Chinese edition): https://github.com/HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese


1 change: 0 additions & 1 deletion ai-infra/ai-hardware/cuda镜像.md
@@ -8,7 +8,6 @@ https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/12.1.1/centos
nvcr.io/nvidia/cuda:12.1.0-cudnn8-runtime-centos7
```


File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 6 additions & 0 deletions ai-infra/算力/GPU工作原理.md
@@ -0,0 +1,6 @@



- [How GPUs Work: An Analysis](https://zhuanlan.zhihu.com/p/697694330)
- [The Relationship Between GPU Architecture and CUDA](https://zhuanlan.zhihu.com/p/697746975)

@@ -1,8 +1,6 @@





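Nvidia's downstream markets fall into four categories: gaming, professional visualization, data center, and automotive. The key products in each market are as follows:

Gaming: GeForce RTX/GTX series GPUs (PCs), GeForce NOW (cloud gaming), SHIELD (game console);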
File renamed without changes.
File renamed without changes.
5 changes: 5 additions & 0 deletions docs/llm-base/scenes/README.md
@@ -12,9 +12,14 @@
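Relation Extraction: extracts the relationships or connections between entities from text.
Information Extraction: extracts structured information, such as entities, relations, and attributes, from unstructured text.
Sentence Similarity: measures the semantic similarity or relatedness between two sentences.
Translation: converts text in one language into another language.
Natural Language Inference (NLI): determines the logical relationship between a given premise and hypothesis, such as entailment, contradiction, or neutral.
Sentiment Classification: classifies text into sentiment categories such as positive, negative, or neutral.
Portrait Matting: accurately separates the human subject from the background in an image.
Universal Matting: accurately separates the target object from the background in an image; not limited to portraits.
Human Detection: detects the positions of human bodies in images or video.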
41 changes: 41 additions & 0 deletions llm-application/应用场景.md
@@ -0,0 +1,41 @@




Text-to-image:
- Stable Diffusion
- Wenxin Yige (文心一格): https://yige.baidu.com/creation?mode=0

Image-to-text:
- BLIP-2






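Digital humans:
- Baidu AI Cloud Xiling digital human: https://xiling.cloud.baidu.com/main/plaza/portrait



AI teaching and research platform


Music generation model: Suno V3 Alpha

The drawback is that Suno can generate at most 2 minutes of music, so a track cuts off abruptly at the end. Even so, it is already much better than V2.

The audio quality, enunciation, and rhythm arrangement are also far better.

https://app.suno.ai/

To generate music, the first step is to write a prompt; the second step (skipped for instrumental music) is to write the lyrics.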








25 changes: 25 additions & 0 deletions llm-localization/ascend/FAQ.md
@@ -0,0 +1,25 @@





docker: Error response from daemon: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/579418211a825ef5c7fcf5becdbe90804f0ed7862d9c59663995f9dd463937b4/log.json: no such file or directory): /usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime did not terminate successfully: exit status 1: 2024/07/24 09:59:29 owner not right /usr/bin/runc 1000




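The error indicates that the ownership of /usr/bin/runc is incorrect: the file is not owned by root (the trailing 1000 in the message appears to be the UID of its current owner). Docker needs the runc binary to create and run containers, so if its ownership or permissions are wrong, Docker cannot start the container.


Solution:


Check the ownership:

ls -lah /usr/bin/runc


Fix the ownership: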

sudo chown root:root /usr/bin/runc
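
The check-and-fix steps above can also be scripted. The following is a minimal sketch, assuming GNU `stat` (the `-c` option is not available in BSD/macOS `stat`). `owner_ids` and `check_root_owned` are hypothetical helper names introduced here for illustration; they are not part of Docker or the Ascend Docker Runtime. The sketch reports the numeric `uid:gid`, which matches the numeric `1000` style of the error message, and only suggests the `chown` when the owner is not `0:0`.

```shell
#!/usr/bin/env bash

# Print the numeric "uid:gid" that owns a path (GNU coreutils stat).
owner_ids() {
    stat -c '%u:%g' "$1"
}

# Hypothetical helper: report whether a file is owned by root (0:0).
# Returns 0 if owned by root, 1 on a mismatch, 2 if the path cannot be read.
check_root_owned() {
    local path=$1 owner
    owner=$(owner_ids "$path") || return 2
    if [ "$owner" = "0:0" ]; then
        echo "OK: $path is owned by $owner"
    else
        echo "MISMATCH: $path is owned by $owner; consider: sudo chown root:root $path"
        return 1
    fi
}
```

For example, `check_root_owned /usr/bin/runc` prints a `MISMATCH` line when the binary is owned by UID 1000. The function only reports; the actual change is left as an explicit `sudo chown` so that nothing is modified silently.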

17 changes: 17 additions & 0 deletions llm-localization/ascend/ascend-docker-runtime.md
@@ -0,0 +1,17 @@



The Ascend Docker Runtime repository. When Ascend NPUs are used inside Docker containers, it provides a simpler way to mount devices and dependency paths.


https://gitee.com/ascend/ascend-docker-runtime



Installation: https://www.hiascend.com/document/detail/zh/mindx-dl/300/dluserguide/clusterscheduling/dlug_installation_02_000025.html


Ascend Docker Runtime component reference:

https://www.hiascend.com/document/detail/zh/mindx-dl/300/dluserguide/clusterscheduling/dlug_installation_02_000036.html

1 change: 1 addition & 0 deletions llm-localization/ascend/mindie/README.md
@@ -112,6 +112,7 @@ docker save -o mindie-1.0.tar ascendhub.huawei.com/public-ascendhub/mindie:1.0.R
scp [email protected]:/root/mindie-1.0.tar .
# Resume an interrupted transfer
rsync -P --rsh=ssh -r [email protected]:/root/mindie-1.0.tar .
```
69 changes: 69 additions & 0 deletions llm-localization/ascend/mindie/config-1.0.RC1.json
@@ -0,0 +1,69 @@
{
"OtherParam":
{
"ResourceParam" :
{
"cacheBlockSize" : 128,
"preAllocBlocks" : 8
},
"LogParam" :
{
"logLevel" : "Info",
"logPath" : "/logs/mindservice.log"
},
"ServeParam" :
{
"ipAddress" : "0.0.0.0",
"port" : 1025,
"maxLinkNum" : 300,
"httpsEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/mindie_server_key_pwd.txt",
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"tlsCrl" : "security/certs/server_crl.pem"
}
},
"WorkFlowParam":
{
"TemplateParam" :
{
"templateType": "Standard",
"templateName" : "Standard_llama",
"pipelineNumber" : 1
}
},
"ModelDeployParam":
{
"maxSeqLen" : 2560,
"npuDeviceIds" : [[$npuids]],
"ModelParam" : [
{
"modelInstanceType": "Standard",
"modelName" : "$model_name",
"modelWeightPath" : "$model_weight_path",
"worldSize" : $world_size,
"cpuMemSize" : 5,
"npuMemSize" : $npu_mem_size,
"backendType": "atb"
}
]
},
"ScheduleParam":
{
"maxPrefillBatchSize" : 192,
"maxPrefillTokens" : 12000,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,
"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,
"maxBatchSize" : 256,
"maxIterTimes" : 1024,
"maxPreemptCount" : 200,
"supportSelectBatch" : true,
"maxQueueDelayMicroseconds" : 5000
}
}
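
Because of the `$npuids`, `$model_name`, `$model_weight_path`, `$world_size`, and `$npu_mem_size` placeholders, this file is a template and is not valid JSON until they are substituted. The commit does not show how this particular file is filled in, so the following is only one possible approach: a minimal sketch that substitutes the placeholders with `sed`. `render_config` is a hypothetical helper, not part of MindIE, and it assumes the values contain no `|`, `&`, or `\` characters.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a copy of a config template with the five
# $placeholders replaced by concrete values.
# Usage: render_config TEMPLATE NPUIDS MODEL_NAME MODEL_WEIGHT_PATH WORLD_SIZE NPU_MEM_SIZE
render_config() {
    local template=$1 npuids=$2 model_name=$3
    local model_weight_path=$4 world_size=$5 npu_mem_size=$6
    # "|" is used as the sed delimiter because paths contain "/".
    # "\\\$" reaches sed as "\$", i.e. a literal dollar sign.
    sed -e "s|\\\$model_weight_path|${model_weight_path}|g" \
        -e "s|\\\$model_name|${model_name}|g" \
        -e "s|\\\$npu_mem_size|${npu_mem_size}|g" \
        -e "s|\\\$world_size|${world_size}|g" \
        -e "s|\\\$npuids|${npuids}|g" \
        "$template"
}
```

A call such as `render_config config-1.0.RC1.json "0,1,2,3" default /workspace/model 4 8 > config.json` would produce a concrete configuration for four devices. The result is written to standard output so that the template itself is never overwritten.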
142 changes: 142 additions & 0 deletions llm-localization/ascend/mindie/llm-server.sh
@@ -0,0 +1,142 @@
#!/bin/bash

echo "Arguments:" "$@"

# Parse --key=value arguments; cut -f 2- keeps values that themselves contain "=".
for a in "$@"; do
    #echo $a
    if [[ $(echo "$a" | grep "^--model_name=") ]]; then
        model_name=$(echo "$a" | grep "^--model_name=" | cut -d '=' -f 2-)
    fi
    if [[ $(echo "$a" | grep "^--model_weight_path=") ]]; then
        model_weight_path=$(echo "$a" | grep "^--model_weight_path=" | cut -d '=' -f 2-)
    fi
    if [[ $(echo "$a" | grep "^--world_size=") ]]; then
        world_size=$(echo "$a" | grep "^--world_size=" | cut -d '=' -f 2-)
    fi
    if [[ $(echo "$a" | grep "^--npu_mem_size=") ]]; then
        npu_mem_size=$(echo "$a" | grep "^--npu_mem_size=" | cut -d '=' -f 2-)
    fi
done

# Fall back to defaults for any argument that was not supplied.
if [ -z "$model_name" ]; then
    model_name="default"
fi

if [ -z "$model_weight_path" ]; then
    model_weight_path="/workspace/model"
fi

if [ -z "$world_size" ]; then
    world_size=4
fi

if [ -z "$npu_mem_size" ]; then
    npu_mem_size=8
fi

echo "Platform arguments: model_name: $model_name, model_weight_path: $model_weight_path, world_size: $world_size, npu_mem_size: $npu_mem_size"


# Build the comma-separated NPU device ID list: 0,1,...,world_size-1
npuids=""
card_num=$((world_size - 1))
for i in $(seq 0 "$card_num"); do
    if [[ $i == "$card_num" ]]; then
        npuids=$npuids$i
    else
        npuids=$npuids$i","
    fi
done


echo "$npuids"


# DEPLOYMENT_CONF_PATH="/home/guodong.li/workspace/config.json"

DEPLOYMENT_CONF_PATH="/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json"

cat <<EOF > $DEPLOYMENT_CONF_PATH
{
"OtherParam":
{
"ResourceParam" :
{
"cacheBlockSize" : 128,
"preAllocBlocks" : 8
},
"LogParam" :
{
"logLevel" : "Info",
"logPath" : "/logs/mindservice.log"
},
"ServeParam" :
{
"ipAddress" : "0.0.0.0",
"port" : 1025,
"maxLinkNum" : 300,
"httpsEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/mindie_server_key_pwd.txt",
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"tlsCrl" : "security/certs/server_crl.pem"
}
},
"WorkFlowParam":
{
"TemplateParam" :
{
"templateType": "Standard",
"templateName" : "Standard_llama",
"pipelineNumber" : 1
}
},
"ModelDeployParam":
{
"maxSeqLen" : 2560,
"npuDeviceIds" : [[$npuids]],
"ModelParam" : [
{
"modelInstanceType": "Standard",
"modelName" : "$model_name",
"modelWeightPath" : "$model_weight_path",
"worldSize" : $world_size,
"cpuMemSize" : 5,
"npuMemSize" : $npu_mem_size,
"backendType": "atb"
}
]
},
"ScheduleParam":
{
"maxPrefillBatchSize" : 256,
"maxPrefillTokens" : 8192,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,
"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,
"maxBatchSize" : 256,
"maxIterTimes" : 1024,
"maxPreemptCount" : 200,
"supportSelectBatch" : true,
"maxQueueDelayMicroseconds" : 50000
}
}
EOF

echo "Deployment config: $DEPLOYMENT_CONF_PATH"
cat "$DEPLOYMENT_CONF_PATH"

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/mindie/set_env.sh
source /usr/local/Ascend/llm_model/set_env.sh

export PYTHONPATH=/usr/local/Ascend/llm_model:$PYTHONPATH
cd /usr/local/Ascend/mindie/latest/mindie-service/bin

./mindieservice_daemon
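
As an alternative to the `grep` pipeline used above for argument parsing and the `seq` loop used for the device list, the same results can be obtained with shell built-ins alone. The following is a minimal sketch under that assumption; `parse_kv` and `build_npu_ids` are hypothetical names introduced here for illustration and do not appear in the script or in MindIE.

```shell
#!/usr/bin/env bash

# Print the value of a "--key=value" argument, or return 1 if the key does
# not match. Everything after the first "=" is kept, so a value may itself
# contain "=".
parse_kv() {
    local key=$1 arg=$2
    case $arg in
        "--$key="*) printf '%s\n' "${arg#*=}" ;;
        *) return 1 ;;
    esac
}

# Print "0,1,...,N-1" for a world size N.
build_npu_ids() {
    local n=$1 ids="" i=0
    while [ "$i" -lt "$n" ]; do
        # ${ids:+,} expands to "," only when ids is already non-empty,
        # which avoids a leading or trailing comma.
        ids="${ids}${ids:+,}${i}"
        i=$((i + 1))
    done
    printf '%s\n' "$ids"
}
```

Inside the argument loop this could be used as `v=$(parse_kv world_size "$a") && world_size=$v`, and the device list as `npuids=$(build_npu_ids "$world_size")`. Avoiding `grep` and `seq` removes several subprocesses per argument, although for a launch script that runs once the difference is mostly one of readability.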
4 changes: 2 additions & 2 deletions llm-localization/ascend/mindie/mindie-api.md
@@ -19,7 +19,7 @@ curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -
 "content": "How can I stay healthy?"
 }
 ]
-}' http://127.0.0.1:1025/v1/chat/completions
+}' http://127.0.0.1:1125/v1/chat/completions
@@ -130,7 +130,7 @@ curl "http://127.0.0.1:1025/v1/chat/completions" \
 ----
-curl "http://127.0.0.1:1025/v1/chat/completions" \
+curl "http://127.0.0.1:1125/v1/chat/completions" \
 -H "Content-Type: application/json" \
 -d '{
 "model": "qwen1.5-14b",