Linting tools catch missing semicolons and stray tabs—but they choke on nuanced team conventions.
- Should helper functions live below public ones?
- How many parameters before we demand a dataclass?
This post is for engineering leads and tooling nerds who want a machine-learning model that understands their codebase and enforces style without endless regex rules.
We’ll build a Python pipeline that learns from your main branch, converts the model to ONNX for fast inference, and drops inline comments during CI—so developers see actionable fixes while the coffee’s still hot.
Environment Setup
| Layer | Tool / Service | Why it Matters |
|---|---|---|
| Parser | Tree-sitter or LibCST | Generates robust ASTs per file |
| Feature builder | Python 3.11 + Pandas + scikit-learn | Extracts metrics and n-gram features |
| Model | PyTorch → ONNX | Train in PyTorch, export for fast runtime use |
| Inference | onnxruntime 1.x | Blazing-fast checks in CI pipelines |
| Storage | Parquet + MinIO | Cheap, scalable artifact versioning |
| CI trigger | GitHub Actions | Runs on every pull request |
| IDE plugin | VS Code extension (optional) | Highlights issues locally before CI kicks in |
1. Collecting Style Labels from History
Supervised models need labeled examples.
We mine the last six months of commit history to generate our dataset:
- Positive samples – files as they stand after code-owner review (merged state).
- Negative samples – pre-squash diffs that reviewers flagged for style issues.
A `GitPython` script walks through commit pairs, aligns line numbers, and tags offending lines:

```python
inconsistent_style = 1  # label used during model training
```
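A minimal sketch of that mining pass, assuming the repository is cloned locally; `is_style_fix` is a hypothetical helper that decides whether a change was flagged for style, and the line-number alignment step is omitted here:

```python
from git import Repo  # GitPython

repo = Repo(".")  # assumes the repository is checked out locally

samples = []
# Walk merge commits on main from the last six months, newest first
for commit in repo.iter_commits("main", since="6 months ago", merges=True):
    for parent in commit.parents:
        for diff in parent.diff(commit, create_patch=True):
            if not (diff.b_path or "").endswith(".py"):
                continue
            samples.append({
                "path": diff.b_path,
                "patch": diff.diff.decode("utf-8", errors="ignore"),
                # 1 = pre-squash change flagged for style, 0 = clean post-review code
                "inconsistent_style": int(is_style_fix(commit)),  # hypothetical helper
            })
```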
2. Engineering Features Worth Learning
Move beyond raw tokens. Add structural cues:
- Indentation depth per AST node
- Function length quartiles
- Parameter count (bucketed)
- Decorator presence (e.g., `@staticmethod`)
- Sibling order (helper vs. public functions)
Tokenize using Hugging Face's `tokenizers` with a 30k BPE vocabulary, then concatenate the numeric features and feed everything into a shallow Transformer encoder. That gives enough model capacity to learn style without frying your GPU.
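A sketch of how the two feature streams could come together, assuming `source_files` is a list of `.py` paths from the repo and the per-node AST metrics have already been computed (the feature names below are illustrative):

```python
import numpy as np
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a 30k-token BPE vocabulary on the repository's own source
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=30_000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=source_files, trainer=trainer)

def featurize(snippet: str, node: dict) -> tuple[np.ndarray, np.ndarray]:
    """Return (token_ids, numeric_features) for one code chunk."""
    ids = tokenizer.encode(snippet).ids[:512]
    numeric = np.array([
        node["indent_depth"],        # indentation depth of the AST node
        node["func_len_quartile"],   # function length quartile (0-3)
        node["param_bucket"],        # bucketed parameter count
        node["has_decorator"],       # 1 if decorated (e.g. @staticmethod)
        node["sibling_order"],       # helper-before-public ordering flag
    ], dtype=np.float32)
    return np.asarray(ids, dtype=np.int64), numeric
```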
3. Training and Exporting to ONNX

```python
import torch
import pytorch_lightning as pl

model = CodeStyleEncoder(vocab_size, hidden_dim=256)  # the shallow encoder from step 2
trainer = pl.Trainer(max_epochs=5)
trainer.fit(model, dataloader)

# Export to ONNX
dummy = torch.randint(0, vocab_size, (1, 512))
torch.onnx.export(model, dummy, "style_enforcer.onnx", opset_version=17)
```
- Resulting `.onnx` file: ~7 MB
- Inference time: ~7 ms on CPU per 512-token chunk
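A quick onnxruntime smoke test confirms the exported graph loads and runs on CPU; the input name and shape follow from the dummy tensor above:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("style_enforcer.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

tokens = np.random.randint(0, vocab_size, size=(1, 512), dtype=np.int64)
logits = sess.run(None, {input_name: tokens})[0]
print(logits.shape)  # e.g. (1, 2) for a clean/inconsistent classification head
```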
4. CI Integration That Developers Don’t Hate
A GitHub Actions workflow snippet:

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: '3.11'
  - run: pip install onnxruntime tree-sitter-generic
  - run: python ci_style_check.py $GITHUB_EVENT_PATH
```
Your `ci_style_check.py` script should (a minimal sketch follows this list):

- Parse the changed files
- Chunk and vectorize their tokens
- Run predictions via `onnxruntime`
- Post inline GitHub review comments through the REST API
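A hedged sketch of that script; `changed_python_files` and `chunk_and_encode` are hypothetical helpers (the latter wrapping the featurizer from step 2), and the 0.8 threshold is an assumed cutoff:

```python
import json
import sys

import numpy as np
import onnxruntime as ort

THRESHOLD = 0.8  # assumed probability cutoff for flagging a chunk

def main(event_path: str) -> None:
    event = json.load(open(event_path))  # the GitHub event payload
    sess = ort.InferenceSession("style_enforcer.onnx",
                                providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name

    findings = []
    for path in changed_python_files(event):        # hypothetical helper
        for chunk in chunk_and_encode(path):         # hypothetical helper from step 2
            logits = sess.run(None, {input_name: chunk.tokens[None, :]})[0][0]
            probs = np.exp(logits) / np.exp(logits).sum()
            if probs[1] > THRESHOLD:                 # assumed: index 1 = "inconsistent style"
                findings.append((path, chunk.start_line, float(probs[1])))

    post_review_comments(findings)                   # see the REST call below

if __name__ == "__main__":
    main(sys.argv[1])
```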
Example comment:

> style-bot: Consider moving `helper_slugify` below the public API (`fileutils/strings.py:44`).
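Posting that comment is one REST call. A sketch of the `post_review_comments` helper above, using `requests`; the repo, commit SHA, and token come from the Actions environment, and `PR_NUMBER` is assumed to be exported by the workflow:

```python
import os
import requests

def post_review_comments(findings) -> None:
    """Create one inline PR review comment per flagged chunk."""
    repo = os.environ["GITHUB_REPOSITORY"]   # e.g. "org/repo", set by GitHub Actions
    token = os.environ["GITHUB_TOKEN"]       # must be exposed to the step in the workflow
    pr_number = os.environ["PR_NUMBER"]      # assumed: exported by the workflow
    commit_sha = os.environ["GITHUB_SHA"]

    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}/comments"
    headers = {"Authorization": f"Bearer {token}",
               "Accept": "application/vnd.github+json"}

    for path, line, prob in findings:
        payload = {
            "body": f"style-bot: possible style inconsistency (confidence {prob:.2f}).",
            "commit_id": commit_sha,
            "path": path,
            "line": line,
            "side": "RIGHT",
        }
        requests.post(url, headers=headers, json=payload, timeout=10)
```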
5. Local Developer Experience
Bundle the ONNX model into a VS Code extension.
- Leverage Python Language Server
- Run checks on file save
- Underline violations in-editor
- Empower devs to fix before pushing to CI
Result: Fewer noisy pull requests.
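One way to wire up the server half, e.g. with `pygls`; the VS Code extension is then a thin client that launches this process, and `run_style_check` is a hypothetical wrapper around the onnxruntime session that returns LSP diagnostics:

```python
from lsprotocol import types as lsp
from pygls.server import LanguageServer

server = LanguageServer("style-enforcer", "v0.1")

@server.feature(lsp.TEXT_DOCUMENT_DID_SAVE)
def on_save(ls: LanguageServer, params: lsp.DidSaveTextDocumentParams) -> None:
    doc = ls.workspace.get_text_document(params.text_document.uri)
    diagnostics = run_style_check(doc.source)   # hypothetical: ONNX inference -> lsp.Diagnostic list
    ls.publish_diagnostics(doc.uri, diagnostics)

if __name__ == "__main__":
    server.start_io()
```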
6. Retraining Without Boiling the Ocean
Automate retraining with a nightly cron job:
- Pull newly merged PRs
- Extract new positive/negative diffs
- Fine-tune model for 1 epoch
- Export ONNX and tag it `style_enforcer_v{date}.onnx`
- Update CI config via GitHub workflow dispatch
Training time: ~4 minutes on 10k samples (no GPU needed).
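A hedged outline of that cron job, assuming the step-3 training code is factored into reusable pieces; `build_dataset_since` is a hypothetical helper that pulls the newly merged diffs:

```python
from datetime import date

import pytorch_lightning as pl
import torch

def nightly_retrain() -> str:
    """Fine-tune the latest checkpoint on yesterday's merges and export a dated ONNX file."""
    dataloader = build_dataset_since(days=1)                # hypothetical helper
    model = CodeStyleEncoder.load_from_checkpoint("latest.ckpt")

    trainer = pl.Trainer(max_epochs=1)                      # one epoch keeps the nightly job short
    trainer.fit(model, dataloader)
    trainer.save_checkpoint("latest.ckpt")

    out_path = f"style_enforcer_v{date.today():%Y%m%d}.onnx"
    dummy = torch.randint(0, model.hparams.vocab_size, (1, 512))  # assumes vocab_size is saved in hparams
    torch.onnx.export(model, dummy, out_path, opset_version=17)
    return out_path  # upload to MinIO and trigger the workflow dispatch next
```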
Best Practices
Freeze your vocabulary – avoid breaking token IDs; retrain from scratch only when necessary.
Explainability – use Integrated Gradients to show which tokens influenced a prediction, and include the top five in your review comment.
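A sketch using Captum's layer-wise variant of Integrated Gradients (one common implementation); it assumes the encoder exposes its token embedding as `model.embedding`, and that `encoded_chunk` and `pad_token_id` come from the tokenizer setup in step 2:

```python
import torch
from captum.attr import LayerIntegratedGradients

# Attribute the "inconsistent style" class (index 1) back to input tokens
lig = LayerIntegratedGradients(model, model.embedding)   # assumed attribute name

tokens = encoded_chunk.unsqueeze(0)                      # (1, 512) int64 token ids
baseline = torch.full_like(tokens, pad_token_id)         # all-PAD baseline sequence

attributions = lig.attribute(tokens, baselines=baseline, target=1)
scores = attributions.sum(dim=-1).squeeze(0)             # one score per token position
top5 = scores.abs().topk(5).indices.tolist()             # positions to surface in the comment
```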
Version control – store ONNX artifacts in a MinIO or S3 bucket, and pin the SHA in CI to prevent silent drift.
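In CI, fetching the pinned artifact and checking its digest is a few lines; the endpoint, bucket, and environment-variable names here are placeholders for your own setup:

```python
import hashlib
import os

from minio import Minio

client = Minio("minio.internal:9000",                     # placeholder endpoint
               access_key=os.environ["MINIO_ACCESS_KEY"],
               secret_key=os.environ["MINIO_SECRET_KEY"],
               secure=True)
client.fget_object("style-models",                        # placeholder bucket
                   os.environ["STYLE_MODEL_OBJECT"],      # e.g. the style_enforcer_v{date}.onnx name
                   "style_enforcer.onnx")

digest = hashlib.sha256(open("style_enforcer.onnx", "rb").read()).hexdigest()
expected = os.environ["STYLE_MODEL_SHA256"]               # the SHA pinned in the workflow file
assert digest == expected, "Model artifact does not match the pinned SHA"
```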
Team calibration – host a “style linter party” to label borderline samples; this massively improves your dataset.
Conclusion
Hand-rolling linter rules is a game of whack-a-mole.
With ONNX, you get language-agnostic, low-latency inference that integrates easily into any CI pipeline.
The result?
- Consistent code style
- Fewer nit-picks in PRs
- More time debating architecture instead of indentation
You’re not just enforcing style. You’re scaling it—with confidence and no drama.