Description
OpenVINO Version
Latest, built from source in OpenVINO Model Server
Operating System
Other (Please specify in description)
Device used for inference
HETERO
Framework
None
Model used
Qwen/Qwen3-32B, converted to OV format using export_model.py.
Issue description
When attempting to load a converted version of Qwen/Qwen3-32B (converted to INT4, with u8 KV cache precision and a cache size of 6 GB) in OpenVINO Model Server, the entire instance crashes. The issue is not unique to Model Server: the same problem also occurs in a separate project, OpenArc, which is built on top of the OpenVINO backend. The issue was not present some months ago, when this EXACT model with the same config (converted with the same command) worked perfectly. The system is an 11600KF with 3 x Intel Arc A770s running Ubuntu Server 24.04.3.
Step-by-step reproduction
- Install client GPU drivers on a fresh installation of Ubuntu Server 24.04.3 as per documentation.
- Build OVMS from source.
- Use the export_model.py script to convert the Qwen/Qwen3-32B model with the following command:
python export_model.py text_generation --source_model Qwen/Qwen3-32B --config_file_path /media/models/config.json --weight-format int4 --overwrite_models --target_device HETERO:GPU.0,GPU.1 --extra_quantization_params "--awq --group-size 128" --kv_cache_precision u8 --cache_size 6 --model_repository_path /media/models --reasoning_parser qwen3
- Load OVMS as per instructions.
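Since the same crash reportedly occurs outside Model Server, it may help to check whether the failure reproduces with the OpenVINO runtime alone. The following is a minimal sketch, not the path OVMS itself takes: it only calls `Core.compile_model` on the exported IR and does not apply the KV cache precision or cache size settings from the export command. The file name `openvino_model.xml` is an assumption about what the export wrote, and `hetero_device` and `try_compile` are hypothetical helper names.

```python
from pathlib import Path


def hetero_device(gpus):
    """Build a HETERO device string such as 'HETERO:GPU.0,GPU.1'."""
    if not gpus:
        raise ValueError("at least one device is required")
    return "HETERO:" + ",".join(gpus)


def try_compile(model_dir, device):
    """Compile the exported IR with the OpenVINO runtime only, bypassing OVMS."""
    # Imported lazily so hetero_device() can be used without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    print("available devices:", core.available_devices)
    # Assumption: the export wrote the IR as openvino_model.xml in model_dir.
    model_xml = Path(model_dir) / "openvino_model.xml"
    return core.compile_model(str(model_xml), device)
```

Calling `try_compile("/media/models/Qwen/Qwen3-32B", hetero_device(["GPU.0", "GPU.1"]))` would exercise the same device string as the export command. If this also fails with the `ProgramBuilder build failed!` exception, the problem is likely in the GPU plugin rather than in the serving layer.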
Relevant log output
[2025-11-25 01:58:36.163][45639][serving][error][servable_initializer.cpp:214] Error during llm node initialization for models_path: /media/models/Qwen/Qwen3-32B/./ exception: Exception from src/inference/src/cpp/core.cpp:114:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/hetero/src/compiled_model.cpp:36:
Standard exception from compilation library: Exception from src/inference/src/dev/plugin.cpp:53:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS
Issue submission checklist
- I'm reporting an issue. It's not a question.
- I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- There is reproducer code and related data files such as images, videos, models, etc.
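For triage, the nested exception chain in the log output above can be reduced to its source locations and the failing OpenCL call. This is a small sketch using only the Python standard library; `summarize` is a hypothetical helper, the regular expressions are assumptions fitted to the log lines shown in this report, and the code table is only a small subset of the status codes defined in `CL/cl.h`.

```python
import re

# Subset of OpenCL status codes from CL/cl.h.
CL_ERRORS = {
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -52: "CL_INVALID_KERNEL_ARGS",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}

# Matches "Exception from <path>:<line>" and "failed at <path>:<line>".
FRAME = re.compile(r"(?:Exception from|failed at) (src/[\w/.]+):(\d+)")
# Matches "[GPU] <OpenCL call>, error code: <number>".
CL_CALL = re.compile(r"\[GPU\] (\w+), error code: (-?\d+)")


def summarize(log):
    """Return (frames, cl_error) extracted from an OpenVINO error log."""
    frames = [(path, int(line)) for path, line in FRAME.findall(log)]
    cl_error = None
    match = CL_CALL.search(log)
    if match:
        code = int(match.group(2))
        cl_error = (match.group(1), code, CL_ERRORS.get(code, "unknown"))
    return frames, cl_error
```

Applied to the log in this report, the last frame is `ocl_common.hpp:40` and the failing call is `clEnqueueNDRangeKernel` with code -52, i.e. the kernel was enqueued with arguments that were not all set or valid.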