Lately I've been reading a paper that really matches something I've been thinking about: AI does not necessarily need retraining to become more capable.
Instead, we may be able to extract workflows, logic, commands, rules, and guardrails from open-source code, then package them into reusable skills that agents can load when needed.
The Core Insight from the Research
This paper (arxiv.org/html/2603.11808v2) argues that instead of focusing entirely on scaling models, we should pay more attention to how we design skill systems. An agent equipped with the right skills can outperform a larger model without them.
This doesn't mean models don't matter - models are the foundation. But skills are what turn AI from something that can "talk" into something that can actually "do."
Build a Skill System, Not Isolated Tools
That idea feels very aligned with where I think this is going. Instead of building isolated tools, it may be more valuable to build a reusable skill system. The real value is not just the tool itself, but:
- The workflow design behind it
- Prompt structure
- Command logic
- Guardrails and safety constraints
- Output control
These can be extracted, tested independently, and shared. No retraining. No fine-tuning. Just load the right skill.
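To make "load the right skill" concrete, here is a minimal sketch of what a packaged skill might look like as a data structure. The schema and field names (`workflow`, `prompt_template`, `guardrails`, and so on) are my own assumptions, not something defined in the paper:

```python
from dataclasses import dataclass, field

# Hypothetical schema: one way a reusable "skill" could be packaged so an
# agent can load it at runtime - no retraining, no fine-tuning.
@dataclass
class Skill:
    name: str
    workflow: list[str]                                # ordered workflow steps
    prompt_template: str                               # prompt structure
    commands: list[str] = field(default_factory=list)  # command logic
    guardrails: list[str] = field(default_factory=list)
    output_format: str = "markdown"                    # output control

    def render_prompt(self, task: str) -> str:
        # Compose the skill into a system prompt for a given task.
        steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(self.workflow))
        rails = "\n".join(f"- {g}" for g in self.guardrails)
        return self.prompt_template.format(task=task, steps=steps, guardrails=rails)

# Example skill, with illustrative content only:
code_review = Skill(
    name="code-review",
    workflow=["Read the diff", "Check tests", "Flag security issues"],
    prompt_template="Task: {task}\nFollow these steps:\n{steps}\nConstraints:\n{guardrails}",
    guardrails=["Never suggest force-pushing", "Cite line numbers"],
)
print(code_review.render_prompt("Review PR #42"))
```

Because the skill is just data, it can be versioned, diffed, tested, and shared exactly like any other artifact in a repository.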
Taking This Further
I've been thinking about pushing this further: building a system that reads entire GitHub repositories and other sources, extracts system prompts, commands, rules, guardrails, and workflow logic, then runs them inside a virtual environment or sandbox to test how they actually perform.
The pipeline could look like this:
- Ingest: Crawl GitHub repos, CLAUDE.md files, cursor rules, MCP server configs, agent frameworks
- Extract: Parse out system prompts, workflow steps, commands, guardrails, output formats
- Test: Run in a sandbox and measure quality, safety, and reproducibility
- Store: Save to shared registry with full metadata
- Share: Publish with guidance on when to use it, how to set it up, what the output looks like, and what risks it carries
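The stages above can be sketched end to end. This is an illustrative toy, not a real implementation: the "ingested" file is inlined rather than crawled, and the quality gate is a placeholder where a real sandbox run would go. All function and field names are assumptions:

```python
import re

def ingest() -> dict[str, str]:
    # Placeholder ingest stage: a real version would crawl GitHub repos,
    # CLAUDE.md files, cursor rules, and MCP server configs.
    return {
        "repo-a/CLAUDE.md": (
            "## Rules\n"
            "- Always run tests before committing\n"
            "- Never edit generated files"
        ),
    }

def extract(files: dict[str, str]) -> list[dict]:
    # Parse out rule lines; a real extractor would also pull prompts,
    # workflow steps, commands, and output formats.
    patterns = []
    for path, text in files.items():
        for rule in re.findall(r"^- (.+)$", text, flags=re.M):
            patterns.append({"source": path, "rule": rule})
    return patterns

def passes_quality_gate(pattern: dict) -> bool:
    # Placeholder test stage: a real version would execute the pattern in a
    # sandbox and score quality, safety, and reproducibility.
    return len(pattern["rule"]) > 10

def store(patterns: list[dict]) -> list[dict]:
    # Keep only patterns that pass testing, tagged with metadata.
    return [dict(p, tested=True) for p in patterns if passes_quality_gate(p)]

registry = store(extract(ingest()))
for entry in registry:
    print(entry)
```

The useful property of structuring it this way is that each stage can be swapped out independently: a better sandbox, a smarter extractor, or a different registry backend changes one function, not the pipeline.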
If the results are good, everything could be stored in a shared repo or a website with detailed guidance.
Models Are the Foundation. Skills Are the Real Leverage.
The more I think about it, the more I believe the future of AI may not just be bigger models, but better skill ecosystems.
A useful analogy: a model is like a talented engineer - they understand everything and learn quickly. But to work effectively in a company, they need processes, checklists, coding standards, review guidelines. That's exactly what skills are.
Great engineer + good process = excellent results. Great engineer without process = inconsistent, unpredictable output.
Why This Matters Right Now
Because we're at an interesting inflection point:
- Foundation models are already capable enough for most tasks
- Inference costs are dropping fast
- The open-source ecosystem is exploding with thousands of agent patterns, CLAUDE.md files, cursor rules
- But this knowledge is scattered, not well organized or thoroughly tested
If there's a system to extract and curate the best patterns from the community, we could significantly accelerate AI agent capabilities without waiting for the next model release.
Next Steps I'm Thinking About
- Build a simple crawler that reads CLAUDE.md and cursor rules from top GitHub repos
- Classify them by domain: coding, security, review, deployment...
- Test each pattern against a basic benchmark set
- Publish results in a reusable format
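For the classification step, even something naive would be a useful starting point. Here is a keyword-based sketch for sorting extracted rules into the domains listed above; the keyword lists are assumptions of mine, not a validated taxonomy:

```python
# Hypothetical keyword map for routing extracted rules to a domain.
DOMAIN_KEYWORDS = {
    "security": ["secret", "credential", "injection", "sanitize"],
    "review": ["review", "diff", "approve"],
    "deployment": ["deploy", "rollback", "release"],
    "coding": ["test", "lint", "refactor", "style"],
}

def classify(rule: str) -> str:
    # First matching domain wins; anything unmatched falls back to "general".
    text = rule.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(k in text for k in keywords):
            return domain
    return "general"

print(classify("Never commit secrets to the repo"))
print(classify("Request a review for every diff"))
```

A real version would likely replace the keyword map with an embedding-based classifier, but this is enough to start bucketing crawled patterns for benchmarking.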
If you're thinking along these lines or have experience building skill systems for AI agents, I'd love to hear your thoughts.
Source paper: arxiv.org/html/2603.11808v2