AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.
SE-Agent is a self-evolution framework for LLM code agents. It performs trajectory-level evolution, exchanging information across reasoning paths via Revision, Recombination, and Refinement to expand the search space and escape local optima. On SWE-bench Verified, it achieves state-of-the-art performance.
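The trajectory-level operators can be pictured as a small evolutionary loop. The sketch below is only an illustration under assumed names; `Trajectory`, `revise`, `recombine`, `refine`, and `evolve` are hypothetical stand-ins, not SE-Agent's actual API, and the stub bodies stand in for LLM calls and test runs.

```python
# Illustrative sketch of trajectory-level evolution; all names and stub
# bodies are assumptions, not SE-Agent's real implementation.
from dataclasses import dataclass
import random

@dataclass
class Trajectory:
    steps: list        # reasoning/action steps taken so far
    score: float       # e.g. fraction of tests passing after the patch

def revise(t: Trajectory) -> Trajectory:
    # Stub: would re-prompt the LLM with the trajectory's own failure feedback.
    return Trajectory(t.steps + ["revised"], t.score)

def recombine(a: Trajectory, b: Trajectory) -> Trajectory:
    # Stub: splice steps from two different reasoning paths so they exchange information.
    cut = len(a.steps) // 2
    return Trajectory(a.steps[:cut] + b.steps[cut:], max(a.score, b.score))

def refine(t: Trajectory) -> Trajectory:
    # Stub: final polish of the best candidate before submitting a patch.
    return Trajectory(t.steps + ["refined"], t.score)

def evolve(pool: list, generations: int = 3) -> Trajectory:
    for _ in range(generations):
        children = [revise(t) for t in pool]
        children += [recombine(*random.sample(pool, 2)) for _ in range(len(pool))]
        # Re-scoring would normally rerun the repository's tests; scores kept here.
        pool = sorted(pool + children, key=lambda t: t.score, reverse=True)[:len(pool)]
    return refine(pool[0])
```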
An LLM council that reviews your coding agent's every move
Benchmark suite for evaluating LLMs and SLMs on coding and software-engineering tasks. Features HumanEval, MBPP, SWE-bench, and BigCodeBench with an interactive Streamlit UI. Supports cloud APIs (OpenAI, Anthropic, Google) and local models via Ollama. Tracks pass rates, latency, token usage, and costs.
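As a rough illustration of the metrics such a harness tracks, the record and aggregation below are assumptions about one possible schema, not the project's actual code.

```python
# Hypothetical per-task result record and aggregation; the harness is assumed
# to log pass/fail, wall-clock latency, and token counts for each solution.
from dataclasses import dataclass

@dataclass
class RunResult:
    task_id: str
    passed: bool          # all unit tests for the task passed
    latency_s: float      # wall-clock time for the model call
    prompt_tokens: int
    completion_tokens: int

def aggregate(results: list, usd_per_1k_tokens: float = 0.01) -> dict:
    n = len(results)
    tokens = sum(r.prompt_tokens + r.completion_tokens for r in results)
    return {
        "pass_rate": sum(r.passed for r in results) / n,
        "mean_latency_s": sum(r.latency_s for r in results) / n,
        "total_tokens": tokens,
        "est_cost_usd": tokens / 1000 * usd_per_1k_tokens,  # assumed flat rate
    }
```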
This project explores how Large Language Models (LLMs) perform on real-world software engineering tasks, inspired by the SWE-bench benchmark. Using locally hosted models like Llama 3 via Ollama, the tool evaluates code repair capabilities on Python repositories through custom test cases and a lightweight scoring framework.
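A minimal sketch of that kind of evaluation loop, assuming Ollama's local REST endpoint and a hypothetical `run_tests` helper; this is illustrative only, not the project's actual code.

```python
# Rough sketch: ask a locally hosted model (via Ollama's REST API) for a fix,
# then score it by running the repository's tests. `run_tests` is a
# hypothetical callable returning the fraction of passing custom test cases.
import requests

def propose_fix(broken_code: str, model: str = "llama3") -> str:
    prompt = f"Fix the bug in this Python code and return only the code:\n{broken_code}"
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]

def score_repair(broken_code: str, run_tests) -> float:
    candidate = propose_fix(broken_code)
    return run_tests(candidate)  # e.g. 0.0-1.0 fraction of tests passed
```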