
NVIDIA RTX 5090D Graphics Card: Hands-On Test and Usage Guide

Uploaded: 2025-05-29T00:25:32.406Z


2025-03-11 21:24:38

Reproduction without the author's permission is prohibited.


Installation

- Download and install the latest driver:
    https://cn.download.nvidia.com/XFree86/Linux-x86_64/570.124.04/NVIDIA-Linux-x86_64-570.124.04.run
- Use the PyTorch nightly build:
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

Flash attention is not yet compatible with Blackwell chips, so it has to be disabled in code:

    model.config.use_flash_attention = False

Without flash attention, LLM throughput drops sharply. In testing, a 14B FP16 model at batch size 1 reached only 24 token/s. The good news is that PyTorch can load the model across two cards without trouble. The bad news is that the NCCL library does not support the 5090D, so communication between the cards is slow: at batch size 1 the two-card setup reaches about 18 token/s, which is slower than the 24 token/s of a single card.

To unlock the 5090D's real performance, an inference framework is still needed. So far vllm/sglang are not compatible with Blackwell either, so the only option for testing is NVIDIA's own TensorRT-LLM. The usage guide follows.

TensorRT-LLM test procedure

- Install the dependencies:
    apt-get update && apt-get -y install git git-lfs
    git lfs install
- Download the project source code:
    git clone https://github.com/NVIDIA/TensorRT-LLM.git
    cd TensorRT-LLM
    git submodule update --init --recursive
    git lfs pull
- Build TensorRT-LLM:
    make -C docker release_build
- Start the docker environment:
    make -C docker release_run
- Install the dependencies inside docker:
    pip install -r requirements.txt
- Convert the model format.
  Single GPU:
    python convert_checkpoint.py --model_dir /mnt/models/DeepSeek-R1-Distill-Qwen-7B --output_dir /mnt/models/trt/DeepSeek-R1-Distill-Qwen-7B-Trt --dtype float16
    trtllm-build --checkpoint_dir /mnt/models/trt/DeepSeek-R1-Distill-Qwen-7B-Trt --output_dir /code/tensorrt_llm/models/DeepSeek-R1-Distill-Qwen-7B-engine --gemm_plugin float16
  Multiple GPU:
    python convert_checkpoint.py --model_dir /mnt/models/DeepSeek-R1-Distill-Qwen-14B --output_dir /mnt/models/trt/DeepSeek-R1-Distill-Qwen-14B-Trt --dtype float16 --tp_size 2
    trtllm-build --checkpoint_dir /mnt/models/trt/DeepSeek-R1-Distill-Qwen-14B-Trt --output_dir /code/tensorrt_llm/models/DeepSeek-R1-Distill-Qwen-14B-engine --gemm_plugin float16
- Run the TensorRT-LLM inference engine. Use the built-in summarize.py script for the test, with the following command:
    python3 ../summarize.py --test_trt_llm --hf_model_dir /mnt/models/DeepSeek-R1-Distill-Qwen-7B --data_type fp16 --max_input_length 2048 --output_len 2048 --engine_dir /code/tensorrt_llm/models/DeepSeek-R1-Distill-Qwen-7B-engine
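The token/s figures quoted above (24 on one card, about 18 on two) are throughput numbers, but the post does not say how they were measured. The following is a minimal sketch of one way to take such a measurement, not the author's method. The function name `tokens_per_second` and its `generate` and `clock` parameters are hypothetical names introduced for illustration; `generate` stands in for whatever call produces the new tokens for one request.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[], Sequence[int]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run generate() once and return newly generated tokens per wall-clock second.

    generate is a hypothetical stand-in: it should return only the tokens
    produced by the model, not the prompt tokens.
    """
    start = clock()
    new_tokens = generate()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return len(new_tokens) / elapsed


if __name__ == "__main__":
    # Stand-in workload for demonstration only: pretend to emit 12 tokens.
    def fake_generate() -> list:
        time.sleep(0.05)
        return list(range(12))

    print(f"{tokens_per_second(fake_generate):.1f} token/s")
```

For a fair comparison between the single-card and two-card runs, the same prompt, output length, and batch size should be used, and a warm-up request before timing helps avoid counting one-time setup costs.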
