@unidel2035 unidel2035 commented Nov 1, 2025

🎯 Summary

This PR fixes the error AttributeError: 'dict' object has no attribute 'split', which is raised during knowledge extraction, as reported in issue #714.

🐛 Problem

The knowledge unit extractor failed whenever the LLM returned the core_entities field as a dictionary, because the code expected a string and called .split(",") on the value.

Root Cause

The LLM can return core_entities in two different formats depending on the language and prompt:

  1. String format (Chinese example): "核心实体": "火电发电量,同比增长率,2019年"
  2. Dict format (English example): "Core Entities": {"T.I.": "Person", "No Mediocre": "Culture and Entertainment"}

The code at kag/builder/component/extractor/knowledge_unit_extractor.py:587 only handled the string format:

for item in knowledge_value.get("core_entities", "").split(","):
    # This fails when core_entities is a dict!

Error Stack Trace

AttributeError: 'dict' object has no attribute 'split'
  File "/kag/builder/component/extractor/knowledge_unit_extractor.py", line 587, in assemble_knowledge_unit
    for item in knowledge_value.get("core_entities", "").split(","):
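The failure can be reproduced outside the extractor with a few lines. The sketch below is only an illustration: the function name split_core_entities and the two payload variables are hypothetical, and the loop body mirrors the string-only logic quoted in this PR rather than the full assemble_knowledge_unit method.

```python
def split_core_entities(knowledge_value):
    """Mimic the pre-fix logic, which assumes core_entities is always a string."""
    core_entities = {}
    for item in knowledge_value.get("core_entities", "").split(","):
        if not item.strip():
            continue
        core_entities[item.strip()] = "Others"
    return core_entities


# String format (Chinese prompt): works as intended.
string_payload = {"core_entities": "火电发电量,同比增长率,2019年"}
print(split_core_entities(string_payload))

# Dict format (English prompt): a dict has no .split, so this raises.
dict_payload = {
    "core_entities": {"T.I.": "Person", "No Mediocre": "Culture and Entertainment"}
}
try:
    split_core_entities(dict_payload)
except AttributeError as exc:
    error_message = str(exc)
    print(f"AttributeError: {error_message}")
```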

✅ Solution

Modified the assemble_knowledge_unit method in knowledge_unit_extractor.py to handle both formats gracefully:

# Start from an empty mapping so core_entities is defined on every branch
core_entities = {}
core_entities_raw = knowledge_value.get("core_entities", "")

# Handle both string and dict formats for core_entities
if isinstance(core_entities_raw, dict):
    # Dict format: {entity_name: entity_type}
    core_entities = core_entities_raw
elif isinstance(core_entities_raw, str):
    # String format: comma-separated values
    for item in core_entities_raw.split(","):
        if not item.strip():
            continue
        core_entities[item.strip()] = "Others"
else:
    # Handle unexpected types gracefully with logging
    logger.warning(
        f"Unexpected type for core_entities: {type(core_entities_raw)}, "
        f"expected str or dict. Value: {core_entities_raw}"
    )
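The same branching can also be read as a small standalone helper. This is a minimal sketch under stated assumptions, not the code merged in this PR: the name normalize_core_entities and the default_type parameter are hypothetical, and the helper returns an empty dict for unexpected types.

```python
import logging

logger = logging.getLogger(__name__)


def normalize_core_entities(raw, default_type="Others"):
    """Return core_entities as a {entity_name: entity_type} dict.

    Hypothetical helper: accepts the dict form, the comma-separated
    string form, and falls back to an empty dict for anything else.
    """
    if isinstance(raw, dict):
        # Dict format: already {entity_name: entity_type}; copy so the
        # caller's object is not mutated later.
        return dict(raw)
    if isinstance(raw, str):
        # String format: comma-separated names, blanks skipped.
        return {
            name.strip(): default_type
            for name in raw.split(",")
            if name.strip()
        }
    logger.warning(
        "Unexpected type for core_entities: %s, expected str or dict. Value: %r",
        type(raw),
        raw,
    )
    return {}


print(normalize_core_entities("火电发电量,同比增长率,2019年"))
print(normalize_core_entities({"T.I.": "Person"}))
print(normalize_core_entities(["not", "supported"]))
```

Copying the dict is a design choice of this sketch; the snippet above assigns the LLM's dict directly, which is equivalent as long as the original value is not reused elsewhere.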

🧪 Testing

  1. Experiment Scripts: Added test scripts in the experiments/ directory that exercise the fix across these scenarios:

    • String format (Chinese)
    • Dict format (English)
    • Empty strings
    • Missing fields
    • Invalid types (with proper logging)
  2. Unit Tests: Added test_knowledge_unit_core_entities.py with comprehensive test coverage for all core_entities formats

  3. Code Quality: All changes pass flake8 validation
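The scenarios listed above could be checked roughly as follows. This is a hedged sketch, not the contents of test_knowledge_unit_core_entities.py: parse_core_entities is a hypothetical stand-in for the extractor logic, and the logger name core_entities_sketch is invented for the example.

```python
import logging
import unittest

logger = logging.getLogger("core_entities_sketch")  # hypothetical logger name


def parse_core_entities(knowledge_value):
    """Hypothetical stand-in for the core_entities handling in the extractor."""
    raw = knowledge_value.get("core_entities", "")
    if isinstance(raw, dict):
        return dict(raw)
    if isinstance(raw, str):
        return {s.strip(): "Others" for s in raw.split(",") if s.strip()}
    logger.warning("Unexpected type for core_entities: %s", type(raw))
    return {}


class CoreEntitiesTest(unittest.TestCase):
    def test_string_format(self):
        result = parse_core_entities({"core_entities": "火电发电量,2019年"})
        self.assertEqual(result, {"火电发电量": "Others", "2019年": "Others"})

    def test_dict_format(self):
        result = parse_core_entities({"core_entities": {"T.I.": "Person"}})
        self.assertEqual(result, {"T.I.": "Person"})

    def test_empty_string(self):
        self.assertEqual(parse_core_entities({"core_entities": ""}), {})

    def test_missing_field(self):
        self.assertEqual(parse_core_entities({}), {})

    def test_invalid_type_logs_warning(self):
        # assertLogs fails the test if no WARNING is emitted on this logger.
        with self.assertLogs("core_entities_sketch", level="WARNING") as captured:
            result = parse_core_entities({"core_entities": 42})
        self.assertEqual(result, {})
        self.assertIn("Unexpected type", captured.output[0])


# Run the suite explicitly so the script does not call sys.exit().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CoreEntitiesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```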

📝 Changes

  • Modified: kag/builder/component/extractor/knowledge_unit_extractor.py - Added type checking and handling for both dict and string formats
  • Added: tests/unit/builder/component/test_knowledge_unit_core_entities.py - Unit tests for the fix
  • Added: experiments/test_core_entities_handling.py - Experiment script demonstrating the issue and fix
  • Added: experiments/test_fix.py - Verification script for all scenarios

🔗 Related Issues

Fixes #714


🤖 Generated with Claude Code

unidel2035 and others added 2 commits November 1, 2025 16:12
Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: undefined
…n knowledge unit extractor

Fix AttributeError when LLM returns core_entities as dict instead of string.
The code now handles both formats:
- String format (Chinese): "entity1,entity2,entity3"
- Dict format (English): {"entity1": "Type1", "entity2": "Type2"}

This resolves the issue where knowledge extraction would fail with:
AttributeError: 'dict' object has no attribute 'split'

Fixes OpenSPG#714

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@unidel2035 unidel2035 changed the title from "[WIP] Knowledge extraction has run into a problem, how can it be resolved" to "fix(builder): handle both dict and string formats for core_entities in knowledge unit extractor" Nov 1, 2025
@unidel2035 unidel2035 marked this pull request as ready for review November 1, 2025 16:22
@unidel2035

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

📎 Log file uploaded as GitHub Gist (359KB)
🔗 View complete solution draft log


The working session has now ended. Feel free to review the solution draft and add any feedback.

Development

Successfully merging this pull request may close these issues.

Knowledge extraction has run into a problem, how can it be resolved
