{"id":4178,"date":"2026-02-22T14:03:37","date_gmt":"2026-02-22T14:03:37","guid":{"rendered":"https:\/\/www.gubatron.com\/blog\/?p=4178"},"modified":"2026-02-22T14:03:37","modified_gmt":"2026-02-22T14:03:37","slug":"mining-git-history-to-build-developer-agent-personas","status":"publish","type":"post","link":"https:\/\/www.gubatron.com\/blog\/mining-git-history-to-build-developer-agent-personas\/","title":{"rendered":"Mining Git History to Build Developer Agent Personas"},"content":{"rendered":"<div class=\"longform-unstyled\" data-block=\"true\" data-editor=\"26ibm\" data-offset-key=\"cqhan-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"cqhan-0-0\">\n<p>A new software engineering practice for the age of agentic teams and an honest look at what it costs<\/p>\n<p>There is a new kind of software team forming inside repositories everywhere. It is not made of humans alone. It is made of humans and AI agents working together, agents that browse code, write tests, fix bugs, open pull requests, and respond to feedback at any hour.<\/p>\n<p>You pay them as they go (unless you have your own expensive hardware with 256GB RAM per agent) and you don&#8217;t need to fire them.<\/p>\n<p>The practice of configuring and directing these agents is itself becoming a software engineering discipline.<br \/>\nBut most teams configure their agents as if they were generic tools.<br \/>\nThey write a single `AGENTS.md` or system prompt that says &#8220;use camelCase&#8221; and &#8220;always run tests before committing.&#8221;<\/p>\n<p>That is table stakes. What comes next is more interesting, and more uncomfortable.<\/p>\n<p>What if the agents on your team actually *think* like the senior engineers who built the codebase?<\/p>\n<h2>The Exercise<\/h2>\n<p>A git log is one of the most honest documents a programmer ever produces. 
It records what a person noticed, what they fixed, how they explained themselves, and what they considered worth doing.<br \/>\nOver thousands of commits spanning years of work, a developer&#8217;s values, instincts, and style crystallize into the history.<\/p>\n<p>The practice is simple to describe:<\/p>\n<p><strong>Read a project&#8217;s git history as an ethnographer would. Identify the major contributors. Extract their patterns. Write those patterns into agent persona files that an AI agent can load and reason from.<\/strong><\/p>\n<p>The result is a file \u2014 typically placed at `.claude\/agents\/&lt;handle&gt;.md` or the equivalent in your agentic framework \u2014 that describes a developer not as a rulebook but as a character. Not &#8220;always use the logger&#8221; but &#8220;you are bothered by raw `e.printStackTrace()` calls \u2014 the logger is right there, and raw stack traces are production noise.&#8221;<\/p>\n<p>A rulebook tells an agent what to do.<\/p>\n<p>A persona tells it what kind of engineer it is, which means it makes correct decisions in situations the rulebook never anticipated.<br \/>\n&#8212;<\/p>\n<h2>What the Git Log Actually Tells You<\/h2>\n<p>When you read commits looking for a person rather than a patch, different things become visible.<br \/>\nConsider two developers on the same long-lived project.<\/p>\n<p>One writes commit messages that function as post-mortems: the message contains the diagnosis, the failure mode, the reasoning behind the fix, and the test conditions.<\/p>\n<p>The other writes messages that are a single noun phrase<\/p>\n<blockquote><p>&#8220;avoid copy of peer info list&#8221;<\/p><\/blockquote>\n<p>and trusts the diff to speak.<\/p>\n<p>These are not two styles of writing. 
They are two philosophies of communication, two different theories of what a commit is <strong>for<\/strong>.<\/p>\n<p>Or consider what kind of changes each person makes.<\/p>\n<p>One developer&#8217;s refactors are net-positive on lines of code: new abstractions, new executors, new scoring algorithms.<\/p>\n<p>The other&#8217;s refactors are net-negative: inner classes made `static final`, `public` fields made package-private, unnecessary list copies deleted, `printStackTrace()` calls removed.<\/p>\n<p>The first developer tends to add structure. The second tends to remove it.<br \/>\nNeither pattern is better. Both are coherent. And both can be taught to an agent: not through rules, but through characterization.<br \/>\n&#8212;<\/p>\n<h2>The Process<\/h2>\n<p><strong>1. Identify the major contributors<\/strong><\/p>\n<p><code>git log --all --format=\"%ae\" | sort | uniq -c | sort -rn | head -10<\/code><\/p>\n<p>Look for the two to five people who have shaped the codebase most deeply. These are the candidates.<br \/>\nFocus on engineers who have made hundreds or thousands of commits, not dozens. You need enough signal to distinguish style from accident.<\/p>\n<p><strong>2. Sample commits across categories<\/strong><\/p>\n<p>Filter by author and sample across different types of work:<br \/>\n<code>git log --all --author=\"&lt;email&gt;\" --format=\"%H %s\" | head -100<\/code><\/p>\n<p>Then read full diffs for a selection of them:<\/p>\n<p><code>git show &lt;hash&gt; -p<\/code><\/p>\n<p>Sample from: bug fixes, refactors, new features, dependency updates, documentation changes, and test additions. Each category tends to reveal different facets of how a developer thinks.<\/p>\n<p><strong>3. Look for patterns across three dimensions<\/strong><\/p>\n<p><strong>Commit message style<\/strong><br \/>\nLength, structure, use of rationale vs. 
bare description, whether they explain the *why* or just the *what*, whether they name what was wrong before they describe what they changed.<\/p>\n<p><strong>Diff character<\/strong><br \/>\nDo their changes grow the codebase or shrink it? Do they touch adjacent code or limit scope precisely? Do they tend toward new abstractions or toward simplification of existing ones? How often do they update tests? Documentation?<\/p>\n<p><strong>Code style within the change<\/strong><br \/>\nNaming conventions, exception handling posture, which data structures they reach for, how they handle null, whether they write comments and what kind, how they structure class visibility.<\/p>\n<p><strong>4. Write the persona in second person<\/strong><\/p>\n<p>The output should be a character description, not a style guide. Write it as if describing a person to another person:<\/p>\n<p><strong>&#8220;You are bothered by methods that take a wide object just to call two methods on a nested field. You refactor the signature to take the minimum required type, remove the transitive dependency, and clean the imports. You don&#8217;t comment on this in the commit message; the diff says enough.&#8221;<\/strong><\/p>\n<p>That kind of description transfers. An agent that has internalized it will make decisions consistently in new situations, not because it looked up a rule, but because it knows who it is.<\/p>\n<p><strong>5. Place the file canonically<\/strong><br \/>\nIn Claude Code: <code>.claude\/agents\/&lt;handle&gt;.md<\/code><\/p>\n<p>In other frameworks, this is typically a system prompt file loaded at agent startup. 
The filename should be the developer&#8217;s identifier \u2014 their GitHub handle, their name, whatever makes the reference clear.<br \/>\n&#8212;<\/p>\n<h2>What Changes When Agents Have Personas<\/h2>\n<p>When you run an agentic team, multiple AI agents working concurrently on different parts of a codebase, the agents without personas are interchangeable.<\/p>\n<p>They apply whatever global rules exist and move on.<\/p>\n<p>They produce code that compiles, passes tests, and solves the stated problem.<\/p>\n<p>But it is generic code. It does not have the fingerprints of the codebase.<\/p>\n<p>An agent running with a persona will, when it encounters a method that takes a full domain object just to reach a nested value, feel the friction and refactor the signature. Not because a rule said to. Because that is what this developer does.<\/p>\n<p>An agent running with a different persona will, when it catches an exception from a parser and has to surface it to the user, write a message that makes sense to a human, add a defensive check before the operation that caused the crash, explain in a comment why the check is there, and write a commit message that teaches the next developer what the failure mode was.<\/p>\n<p>These are not the same agent. They have different intuitions derived from different patterns. And when working together in an agentic team, they produce code that resembles the coherent output of a human team rather than the averaged-out output of a generic tool.<br \/>\n&#8212;<\/p>\n<h2>The Software Dark Factory<\/h2>\n<p>The term &#8220;dark factory&#8221; comes from manufacturing: a fully automated production facility that needs no lights because no humans are present. Robots build things. Sensors monitor them. 
Nothing requires a person to be in the room.<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/images.yourstory.com\/cs\/2\/e35953e0c10a11eeaef14be6ff40ae87\/Image06m1-1745678387936.jpg?w=640&#038;ssl=1\" alt=\"Xiaomi's Robot Factory Works 24\/7 Without Lights, Breaks, or People | YourStory\" \/><br \/>\nThe software equivalent is arriving. It is not fully dark: humans still make architectural decisions, review output, and define goals. But increasingly, the routine work of software maintenance and feature development is being performed by agents that run continuously, pick up tasks, write code, and submit it for review.<\/p>\n<p><strong>Some organizations are already measuring engineering output in terms of how much human time is consumed per shipped feature, with the goal of driving that number down.<\/strong><\/p>\n<p>Developer persona files are how you make a dark factory coherent rather than merely productive. They are also, perhaps, a way to convince a naysayer to let go of some of their ego and delegate the grunt work to someone who codes in a style that cares about the same things they do.<\/p>\n<p>They encode the judgment that makes a codebase readable: the accumulated instincts about what belongs in a commit message, when to copy a list and when not to, how to communicate a failure mode to a user.<\/p>\n<p>They distribute that judgment to agents that work without the original engineer present.<\/p>\n<p>This is useful. It is also worth being honest about what it means.<\/p>\n<h2>The Part Nobody Wants to Say<\/h2>\n<p>This practice will be used to replace people.<\/p>\n<p>Today our <strong>agent.md<\/strong> files, based on commits, are a low-resolution suggestion for mimicking our coding styles and wisdom. Future models that parse every commit, every email, and many other data sources showing how you actually work will make a much better agent clone of a developer.<\/p>\n<p>Not immediately, and not crudely. 
But the logic is clear: if you can extract a senior engineer&#8217;s coding style and judgment into a persona file, and then run that persona at scale across an agentic team, the argument for keeping the senior engineer on payroll weakens.<\/p>\n<p>Not because their persona file is them (it isn&#8217;t) but because it is close enough for most of the work that person was doing.<\/p>\n<p>This is not an argument against the practice. It is going to happen regardless, wherever organizations want to cut costs and see that the agents are just as good, if not better, and far more productive.<\/p>\n<hr \/>\n<h2>Caveats and Honest Limitations<\/h2>\n<p><strong>The persona is a model, not the person.<\/strong> A 100-line document cannot capture the full depth of an engineer&#8217;s judgment. <strong>It is a useful caricature, not a replacement.<\/strong><\/p>\n<p><strong>Git history has selection bias.<\/strong> You see what was committed, not what was argued about in review, not what was deleted before it shipped, not the hours of investigation before the one-line fix. The commit is the output; the thinking behind it is partially invisible.<\/p>\n<p><strong>Style is not wisdom.<\/strong> A developer&#8217;s patterns are a proxy for their values, not identical to them. An agent that mimics a deletion-heavy refactoring style might delete something that should not be deleted. The persona needs to be paired with human review, especially on changes that are consequential.<\/p>\n<p><strong>Personas go stale.<\/strong> People grow and change. A developer who wrote code in 2018 with one philosophy may have different instincts in 2026. Treat the persona file as a living document, not a permanent artifact.<\/p>\n<hr \/>\n<h2>Getting Started<\/h2>\n<p>You need three things:<br \/>\n<strong>1. A git repository with meaningful history.<\/strong> Real commit messages, written by actual people who cared about what they were saying. 
If your team uses squash-merge with auto-generated messages, the signal-to-noise ratio drops significantly.<\/p>\n<p>This is, incidentally, another argument for writing commit messages carefully: they compound into something valuable over time.<\/p>\n<p><strong>2. An LLM capable of synthesis.<\/strong> Any current frontier model (as of the third week of February 2026) can do this. Feed it thirty to fifty commits per author (full diffs, not just messages) and ask it to characterize the developer across multiple dimensions.<\/p>\n<p>The prompt can be simple:<br \/>\n<strong>&#8220;Read the following commits. Write a character study of this engineer, not a style guide, but a persona. What do they notice? What bothers them? What do they reach for first? What do they refuse to do? Write in second person so it can be used as an agent persona.&#8221;<\/strong><\/p>\n<p><strong>3. A canonical location in your project.<\/strong> For Claude Code, <code>.claude\/agents\/&lt;handle&gt;.md<\/code>. For other frameworks, wherever system prompts for named agents live. The filename should reference the developer unambiguously.<\/p>\n<p>Review the output against your own knowledge of the person. Correct what is wrong. Add what is missing, especially the tacit knowledge that never made it into a commit message. 
Commit the file as you would any other team configuration.<\/p>\n<hr \/>\n<h2>Closing Thought<\/h2>\n<p>Software has always been a human artifact, shaped not just by requirements but by the people who made it.<\/p>\n<p>The architecture of a system reflects the mental models of its architects.<\/p>\n<p>The idioms in the codebase reflect the habits of whoever wrote most of it.<\/p>\n<p>The comments reflect what the authors thought was worth saying.<\/p>\n<p>That human fingerprint is what makes a codebase coherent rather than merely functional.<\/p>\n<p>It is what lets a new developer read old code and understand not just what it does but why it does it that way.<\/p>\n<p>If we are building agentic teams that maintain and extend these codebases, one of the most important things we can do is give those agents something like taste, derived not from abstraction but from evidence.<\/p>\n<p>The git log is that evidence. It has been accumulating for years.<\/p>\n<p>The dark factory does not have to be anonymous. The machines can work in someone&#8217;s style.<\/p>\n<p>The question each engineer should be sitting with is: <strong>what aspects of how I work cannot be captured from a log?<\/strong>\u00a0Those are the parts worth investing in.<\/p>\n<h2>Enjoy Human-Error-Free Code<\/h2>\n<p>For now, <strong>it&#8217;s time to enjoy the fact that we can spend more time thinking about solutions, and that the code and the bugs due to human error are of no concern<\/strong>. It&#8217;s time to build faster and better every day, and to be creative about what our products can do.<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>A new software engineering practice for the age of agentic teams and an honest look at what it costs There is a new kind of software team forming inside repositories everywhere. It is not made of humans alone. 
It is made of humans and AI agents working together, agents that browse code, write tests, fix [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4179,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1666,15],"tags":[],"class_list":["post-4178","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-code"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/02\/download.jpeg?fit=1168%2C784&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5Unzf-15o","jetpack-related-posts":[{"id":4168,"url":"https:\/\/www.gubatron.com\/blog\/cloudllm-v0-10-from-a-simple-llm-wrapper-to-a-multi-agent-orchestration-framework\/","url_meta":{"origin":4178,"position":0},"title":"CloudLLM v0.10: from a simple LLM wrapper to a Multi-Agent Orchestration framework","author":"gubatron","date":"February 11, 2026","format":false,"excerpt":"February 11th, 2026 \u00a0 CloudLLM has evolved dramatically over three consecutive releases (v0.8.0 through v0.10.0) into a\u00a0comprehensive, production-ready platform for building autonomous multi-agent systems. 
What began as an LLM wrapper request-response pattern library has grown into a sophisticated orchestration engine with seven distinct collaboration modes, real-time event observability, atomic task\u2026","rel":"","context":"In &quot;AI&quot;","block_context":{"text":"AI","link":"https:\/\/www.gubatron.com\/blog\/category\/ai\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/02\/agent_orchestra.jpg?fit=1096%2C657&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/02\/agent_orchestra.jpg?fit=1096%2C657&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/02\/agent_orchestra.jpg?fit=1096%2C657&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/02\/agent_orchestra.jpg?fit=1096%2C657&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.gubatron.com\/blog\/wp-content\/uploads\/2026\/02\/agent_orchestra.jpg?fit=1096%2C657&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":593,"url":"https:\/\/www.gubatron.com\/blog\/ssh-add-l-cannot-connect-to-your-agent\/","url_meta":{"origin":4178,"position":1},"title":"ssh-add -l -> Cannot connect to your agent.","author":"gubatron","date":"September 21, 2007","format":false,"excerpt":"keychain not working for ya... you run ssh-agent but ssh-add won't add the keys. This is probably because your SSH_AGENT_PID and SSH_AUTH_SOCK variables are incorrect... 
so I recommend you put something like this on your .bashrc to initialize your ssh-agent correctly: export SSH_AGENT_PID= export SSH_AUTH_SOCK= #make sure no old agents\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/www.gubatron.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":489,"url":"https:\/\/www.gubatron.com\/blog\/temboo-needs-a-gui-dev-work-with-gubatron\/","url_meta":{"origin":4178,"position":2},"title":"Temboo needs a GUI dev, work with Gubatron","author":"gubatron","date":"March 31, 2007","format":false,"excerpt":"Reply to: job-298333217@craigslist.org My name is Mitsu Hadeishi. I\u00e2\u20ac\u2122m looking for an experienced Software Engineer with an interest in advanced GUI development to join our team. Over the last fifteen years, I've led a series of engineering teams to design and implement innovative, successful software products. My past projects have\u2026","rel":"","context":"In &quot;Geeklife&quot;","block_context":{"text":"Geeklife","link":"https:\/\/www.gubatron.com\/blog\/category\/geeklife\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":306,"url":"https:\/\/www.gubatron.com\/blog\/dreamconspiracy-theory-nsa-google-human-genome-project-1\/","url_meta":{"origin":4178,"position":3},"title":"Dream+Conspiracy Theory: NSA + Google + Human Genome Project = 1","author":"gubatron","date":"April 29, 2006","format":false,"excerpt":"It's been a while I don't wake up and write down what I just dreamed, let's see how well I remember the dream I had this morning. 
It seems the episode of south park where the elderly people take over the town to recover their driver's licenses, in a funny\u2026","rel":"","context":"In &quot;Diary&quot;","block_context":{"text":"Diary","link":"https:\/\/www.gubatron.com\/blog\/category\/diary\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2467,"url":"https:\/\/www.gubatron.com\/blog\/today-starts-the-stanford-online-ai-course\/","url_meta":{"origin":4178,"position":4},"title":"Today starts the Stanford online AI Course","author":"gubatron","date":"October 10, 2011","format":false,"excerpt":"I just wanted to say that I have high hopes for the impact of this Stanford initiative. AI is one powerful discipline of computer science, one that many software engineers never experience first hand (this will be my chance to formally do AI for the first time, and I've been\u2026","rel":"","context":"In &quot;Geeklife&quot;","block_context":{"text":"Geeklife","link":"https:\/\/www.gubatron.com\/blog\/category\/geeklife\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":235,"url":"https:\/\/www.gubatron.com\/blog\/open-sourcing-since-the-early-days\/","url_meta":{"origin":4178,"position":5},"title":"Open Sourcing since the early days","author":"gubatron","date":"January 11, 2006","format":false,"excerpt":"Back in 1998 I was on my first year of Software Engineering in UCAB, our Algorithms and Programming I (by Prof. Omar Mendez and Alvaro Reb\u00f3n) course was dictated using a functional language which at the time sounded esoteric to us, Haskell. 
(I'm glad I started with Haskell, We knew\u2026","rel":"","context":"In &quot;Gubatron&quot;","block_context":{"text":"Gubatron","link":"https:\/\/www.gubatron.com\/blog\/category\/gubatron\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts\/4178","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/comments?post=4178"}],"version-history":[{"count":3,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts\/4178\/revisions"}],"predecessor-version":[{"id":4182,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/posts\/4178\/revisions\/4182"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/media\/4179"}],"wp:attachment":[{"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/media?parent=4178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/categories?post=4178"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gubatron.com\/blog\/wp-json\/wp\/v2\/tags?post=4178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}