[Bug]: HETERO config is broken for loading larger models in any branch of OpenVINO

### OpenVINO Version

Latest, built from source in OpenVINO Model Server

### Operating System

Other (Please specify in description)

### Device used for inference

HETERO

### Framework

None

### Model used

Qwen/Qwen3-32B, converted to OV format using export_model.py.

### Issue description

Loading a converted version of Qwen/Qwen3-32B (INT4 weights, u8 KV cache precision, 6 GB cache size) in OpenVINO Model Server crashes the entire instance. The problem is not specific to Model Server: it also occurs in a separate project, OpenArc, which is built on top of the OpenVINO backend. The issue was not present some months ago, when this EXACT model with the same config (converted with the same command) worked perfectly. The system is an 11600KF with 3 x Intel Arc A770s running Ubuntu Server 24.04.3.

### Step-by-step reproduction

1. Install client GPU drivers on a fresh installation of Ubuntu Server 24.04.3 as per documentation.
2. Build OVMS from source.
3. Use the `export_model.py` script to convert the Qwen/Qwen3-32B model with the following command:
   `python export_model.py text_generation --source_model Qwen/Qwen3-32B --config_file_path /media/models/config.json --weight-format int4 --overwrite_models --target_device HETERO:GPU.0,GPU.1 --extra_quantization_params "--awq --group-size 128" --kv_cache_precision u8 --cache_size 6 --model_repository_path /media/models --reasoning_parser qwen3`
4. Load OVMS as per instructions.
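
Since the same failure reportedly occurs outside OVMS, a standalone check against the OpenVINO Python API may help narrow it down. The following is a minimal sketch, not a confirmed reproducer: it assumes the `openvino` Python package is installed and that the exported model directory contains a file named `openvino_model.xml` (an assumption about the export layout). The helper names `hetero_device` and `try_compile` are hypothetical, and it has not been verified that this path triggers the same error.

```python
from pathlib import Path


def hetero_device(gpu_indices):
    """Build a HETERO device string such as 'HETERO:GPU.0,GPU.1'."""
    if not gpu_indices:
        raise ValueError("at least one GPU index is required")
    return "HETERO:" + ",".join(f"GPU.{i}" for i in gpu_indices)


def try_compile(model_dir, device):
    """Compile the exported model on `device`; return (ok, error_text)."""
    # Imported lazily so hetero_device() stays usable without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    print("OpenVINO version :", ov.get_version())
    print("Available devices:", core.available_devices)

    # ASSUMPTION: the export wrote 'openvino_model.xml'; adjust if it did not.
    model_xml = Path(model_dir) / "openvino_model.xml"
    try:
        core.compile_model(str(model_xml), device)
    except RuntimeError as exc:
        # OpenVINO surfaces its C++ exceptions to Python as RuntimeError.
        return False, str(exc)
    return True, ""
```

Calling `try_compile("/media/models/Qwen/Qwen3-32B", hetero_device([0, 1]))` mirrors the `--target_device HETERO:GPU.0,GPU.1` setting from step 3. If it fails with the same `ProgramBuilder build failed!` message, the problem would appear to sit in the HETERO/GPU plugins rather than in OVMS itself.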

### Relevant log output

```shell
[2025-11-25 01:58:36.163][45639][serving][error][servable_initializer.cpp:214] Error during llm node initialization for models_path: /media/models/Qwen/Qwen3-32B/./ exception: Exception from src/inference/src/cpp/core.cpp:114:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/hetero/src/compiled_model.cpp:36:
Standard exception from compilation library: Exception from src/inference/src/dev/plugin.cpp:53:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS
```
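
For triage, the nested exception above can be reduced to its chain of source locations and the final OpenCL status code. The sketch below is only an illustration: the function name `summarize` and the regular expressions are assumptions fitted to this particular log, not part of OpenVINO. The numeric codes in `CL_ERRORS` are taken from the OpenCL headers.

```python
import re

# A few OpenCL status codes (values as defined in CL/cl.h).
CL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -52: "CL_INVALID_KERNEL_ARGS",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}

# Matches "Exception from src/...:LINE:" and "failed at src/...:LINE:".
SOURCE_RE = re.compile(r"(?:Exception from|failed at) (src/\S+?):(\d+):")
# Matches "[GPU] <OpenCL call>, error code: <N>".
CL_RE = re.compile(r"\[GPU\] (\w+), error code: (-?\d+)")

# Tail of the log from this report, copied verbatim.
LOG = """\
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/hetero/src/compiled_model.cpp:36:
Standard exception from compilation library: Exception from src/inference/src/dev/plugin.cpp:53:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS
"""


def summarize(log_text):
    """Return (frames, cl_error): source locations outermost-first, and the OpenCL failure."""
    frames = [(path, int(line)) for path, line in SOURCE_RE.findall(log_text)]
    match = CL_RE.search(log_text)
    cl_error = None
    if match:
        code = int(match.group(2))
        cl_error = (match.group(1), code, CL_ERRORS.get(code, "unknown"))
    return frames, cl_error


frames, cl_error = summarize(LOG)
print("innermost frame:", frames[-1])
print("OpenCL failure :", cl_error)
```

Read this way, the log shows the failure originating in the GPU plugin's OpenCL layer (`ocl_common.hpp`) during `clEnqueueNDRangeKernel`, and propagating up through the HETERO plugin's `compiled_model.cpp`. The status is `CL_INVALID_KERNEL_ARGS` rather than one of the allocation-related codes listed in the table.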

### Issue submission checklist

- [x] I'm reporting an issue. It's not a question.
- [x] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- [x] There is reproducer code and related data files such as images, videos, models, etc.


[Bug]: HETERO config is broken for loading larger models in any branch of OpenVINO #33012

OpenVINO Version

Operating System

Device used for inference

Framework

Model used

Issue description

Step-by-step reproduction

Relevant log output

Issue submission checklist

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Bug]: HETERO config is broken for loading larger models in any branch of OpenVINO #33012

Description

OpenVINO Version

Operating System

Device used for inference

Framework

Model used

Issue description

Step-by-step reproduction

Relevant log output

Issue submission checklist

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions