fix(builder): handle both dict and string formats for core_entities in knowledge unit extractor #717
🎯 Summary
This PR fixes the AttributeError: 'dict' object has no attribute 'split' error that occurs during knowledge extraction, as reported in issue #714.
🐛 Problem
The knowledge unit extractor failed when the core_entities field arrived from the LLM in dictionary format, because the code expected only a string and called .split(",") on the value.
Root Cause
The LLM can return core_entities in two different formats depending on the language and prompt:
String format: "核心实体": "火电发电量,同比增长率,2019年" (Chinese prompt; the key means "Core Entities" and the values are "thermal power generation, year-on-year growth rate, 2019")
Dictionary format: "Core Entities": {"T.I.": "Person", "No Mediocre": "Culture and Entertainment"}
The code at kag/builder/component/extractor/knowledge_unit_extractor.py:587 handled only the string format.
Error Stack Trace
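The failure mode can be reproduced in isolation. The sketch below assumes a string-only parser shaped like the one described above; parse_core_entities_old is a hypothetical name chosen for illustration, not the function in the repository, and the captured message is what Python raises, not the trace from issue #714.

```python
def parse_core_entities_old(value):
    # Hypothetical string-only parser: assumes `value` is a
    # comma-separated string and calls .split(",") unconditionally.
    return [name.strip() for name in value.split(",")]


# String format (as returned for the Chinese prompt): works.
string_result = parse_core_entities_old("火电发电量,同比增长率,2019年")

# Dictionary format (as returned for the English prompt): dict has no
# .split method, so the call raises AttributeError.
try:
    parse_core_entities_old(
        {"T.I.": "Person", "No Mediocre": "Culture and Entertainment"}
    )
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```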
✅ Solution
Modified the assemble_knowledge_unit method in knowledge_unit_extractor.py to handle both formats gracefully.
🧪 Testing
Experiment Scripts: Created test scripts in the experiments/ directory to verify that the fix handles all scenarios.
Unit Tests: Added test_knowledge_unit_core_entities.py, with test coverage for all core_entities formats.
Code Quality: All changes pass flake8 validation.
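The dual-format handling described under Solution can be sketched roughly as follows. This is an illustration under assumptions, not the code changed in this PR: normalize_core_entities is a hypothetical helper, and DEFAULT_TYPE is an invented placeholder for entities that arrive without a type.

```python
from typing import Dict, Union

# Assumption: placeholder type for names given in string format,
# which carries no type information.
DEFAULT_TYPE = "Others"


def normalize_core_entities(
    value: Union[str, Dict[str, str], None]
) -> Dict[str, str]:
    """Return an entity-name -> entity-type mapping for either input format."""
    if isinstance(value, dict):
        # Dictionary format: keys are entity names, values are entity types.
        return {
            str(name).strip(): str(etype).strip()
            for name, etype in value.items()
            if str(name).strip()
        }
    if isinstance(value, str):
        # String format: comma-separated entity names, no types.
        return {
            name.strip(): DEFAULT_TYPE
            for name in value.split(",")
            if name.strip()
        }
    # None or any other unexpected type: nothing to extract.
    return {}
```

Checking the type with isinstance before calling .split(",") keeps the string path unchanged while letting dictionary input pass through, and returning an empty mapping for anything else avoids raising on unexpected LLM output.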
📝 Changes
kag/builder/component/extractor/knowledge_unit_extractor.py - Added type checking and handling for both dict and string formats
tests/unit/builder/component/test_knowledge_unit_core_entities.py - Unit tests for the fix
experiments/test_core_entities_handling.py - Experiment script demonstrating the issue and fix
experiments/test_fix.py - Verification script for all scenarios
🔗 Related Issues
Fixes #714
🤖 Generated with Claude Code