AI-Driven Metadata Enhancement for apcore-toolkit¶
This document outlines the strategy for using Small Language Models (SLMs) like Qwen 1.5 (0.6B - 1.7B) to enhance the metadata extracted by apcore-toolkit-python.
1. Goal¶
The toolkit's primary mission is to make existing code "AI-Perceivable". While static analysis (regex, AST) is efficient, it often fails to:
- Generate meaningful description and documentation for legacy code.
- Create effective ai_guidance for complex error handling.
- Infer input_schema for functions using *args or **kwargs.
Using a local SLM allows the toolkit to "understand" the code logic and fill these gaps with high speed and zero cost.
2. Architecture: Local LLM Provider (Option B)¶
To keep apcore-toolkit-python lightweight, we DO NOT bundle model weights. Instead, we use an OpenAI-compatible local API provider (e.g., Ollama, vLLM, LM Studio).
Configuration via Environment Variables¶
The AI enhancement feature is controlled by the following environment variables:
| Variable | Description | Default |
|---|---|---|
APCORE_AI_ENABLED |
Whether to enable SLM-based metadata enhancement. | false |
APCORE_AI_ENDPOINT |
The URL of the OpenAI-compatible local API. | http://localhost:11434/v1 |
APCORE_AI_MODEL |
The model name to use (e.g., qwen:0.6b). |
qwen:0.6b |
APCORE_AI_THRESHOLD |
Confidence threshold for AI-generated metadata (0-1). | 0.7 |
3. Recommended Setup (Ollama)¶
For the best developer experience, we recommend using Ollama:
- Install Ollama.
- Pull the recommended model:
ollama run qwen:0.6b - Configure environment:
export APCORE_AI_ENABLED=true export APCORE_AI_MODEL="qwen:0.6b"
4. Enhancement Workflow¶
When APCORE_AI_ENABLED is set to true, the Scanner will:
- Extract static metadata from docstrings and type hints.
- Identify missing fields (e.g., empty
descriptionor missingai_guidance). - Send code snippets to the local SLM with a structured prompt.
- Merge the AI-generated metadata into the final
ScannedModule, marking them with ax-generated-by: "slm"tag for human audit.
5. Security and Privacy¶
- No Data Leakage: Since the model runs locally, your source code never leaves your machine.
- Auditability: All AI-generated fields MUST be reviewed by the developer before committing the generated
apcore.yaml.