
Tracking Development Productivity

Some thoughts on tracking product development teams’ productivity.

Measuring engineering productivity is 60% art and 40% science. This is true even though the engineering and software development communities – both academic and commercial – have spent countless person-years trying to make calculating the ROI of development as simple as reading a balance sheet.

Here is an outline of one way to approach the ‘science’ bits:

You have to begin with measurements and metrics. Typical approaches across a product portfolio involve answering key questions and measuring some key metrics:

Epic and Story Quality and Completeness
Are we putting in the time when developing stories to allow correct assessment of the story points required to complete the story with sufficient quality, testability, and functional completeness?

  • Assessment is not easy. You cannot simply wait for a failed/punted story, failed sprint, or failed User Acceptance Testing (UAT) to signal that the investment in epic and story development is insufficient.
  • But a history of blowing through a story’s or sprint’s point budget is a clear indication of poor story/sprint composition, poor estimation, poor execution, or poor quality. So yes, it is hard to get this part right. This issue is exacerbated in a growth company as the development team grows and potentially includes employees, contractors, and off-shore developers.


Apples vs. oranges
There needs to be a rough agreement on what a story point means across product managers, developers, and teams.

  • If teams, product managers, and development leads do not agree on a metric here, then establishing a good model of both investment and delivery is very hard.


Do we have a good understanding of each team’s capacity? Capacity is the points per developer/team per day or sprint.

  • The two hours per story point ratio is fine for planning purposes but more difficult when tracking actual productivity. This is due to variations in developer skills, area of the product, subject matter expertise (SMEness), etc. (insert endless development community rant here)
  • If you begin with a generally agreed mathematical capacity (i.e., number of developers × agreed per-developer capacity) and adjust for variance from there, it is better than nothing.
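The arithmetic above can be sketched in a few lines; the team size, per-developer capacity, and adjustment factor here are all hypothetical:

```python
# Rough team-capacity model: start from the mathematical capacity
# (number of developers x agreed per-developer capacity) and apply an
# adjustment factor for known variance (skills, SME coverage, etc.).

def team_capacity(num_developers, points_per_dev_per_sprint, adjustment=1.0):
    """Story-point capacity for one sprint, adjusted for variance."""
    return num_developers * points_per_dev_per_sprint * adjustment

# Hypothetical team: 6 developers at 10 points per sprint, discounted
# 15% for ramp-up and on-call rotation.
capacity = team_capacity(6, 10, adjustment=0.85)
print(capacity)  # 51.0
```

The adjustment factor is the part to revisit sprint over sprint as actuals come in.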

[Figure: example of tracking story points per developer]

Do we have a good understanding of per developer and per development teams’ actual velocity (i.e., the burn down rate of story points per sprint vs. the mathematical capacity)?

  • If the definition of a story point is common across product managers and development teams and story points are correctly accounted for on every story, then the story point capacity per sprint can be calculated and compared to the actual velocity via a burn down monitor.
  • If several sprints are monitored and the calculated capacity does not come close to the actual burn down, then the capacity should be adjusted down until the two are close. Note: this assumes that few to no stories are dropped from a sprint.
  • Once the actual velocity and burn down match (or come close), you can look at improving both capacity and velocity and figure out what isn’t working (story quality, people, process, tools, etc.) to improve velocity and thus product delivery speed.
  • All of this leads to improved prediction of progress versus simply using a roadmap and sprint plan.
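The calibration loop described in these bullets might be sketched as follows; the planned capacity and per-sprint velocities are hypothetical:

```python
# Compare planned capacity to actual velocity over several sprints and
# step the capacity estimate toward the observed average until the two
# are close. Assumes few to no stories are dropped mid-sprint.

def calibrate_capacity(capacity, actual_velocities, tolerance=0.1):
    """Adjust planned capacity toward the observed average velocity."""
    avg_velocity = sum(actual_velocities) / len(actual_velocities)
    while abs(capacity - avg_velocity) / capacity > tolerance:
        capacity = (capacity + avg_velocity) / 2  # move halfway toward actuals
    return capacity

# Hypothetical: planned 60 points/sprint, but three sprints burned less.
print(calibrate_capacity(60, [45, 48, 42]))  # 48.75
```

Once the adjusted capacity tracks the burn down, the interesting work begins: figuring out which lever (story quality, people, process, tools) moves the number up.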


Quality: But does the product actually work?
The quality of committed code also has an impact on velocity, capacity, and burn down. Planning quality requires additional qualitative and quantitative metrics.

Here are some things to think about:

  • Monitor failed unit, automated, performance, or functional tests that cause a high defect count per sprint/story/developer.
  • “Kickback ratio”: a sprint, feature, or story may be successfully delivered, but the number of defects may outweigh the value. Most DevOps platforms will allow an easy calculation of story points vs. defects.

Here is a sample:

[Figure: sample story point vs. defect tracking]
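A minimal sketch of the kickback-ratio calculation, using hypothetical per-story data rather than any particular DevOps platform’s API:

```python
# "Kickback ratio": defects logged against a story relative to the story
# points delivered. A high ratio suggests the defects outweigh the value.

def kickback_ratio(story_points, defect_count):
    """Defects per story point delivered."""
    if story_points == 0:
        return float("inf")
    return defect_count / story_points

# Hypothetical sprint: (story id, story points delivered, defects found).
sprint = [("STORY-1", 5, 1), ("STORY-2", 8, 0), ("STORY-3", 3, 4)]
for story_id, points, defects in sprint:
    print(story_id, round(kickback_ratio(points, defects), 2))
```

In practice the same ratio rolls up per sprint, feature, team, or developer.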
  • Percentage of unit and automated tests per story is also a key indicator of a product and product teams’ ability to scale:

[Figure: percentage of unit and automated tests per story]

  • In this example, the product is likely a weird combination: a relatively new code base (less than 10 years old) but very mature and feature/platform/DevOps stable.
  • Quality includes the story points of technical debt per feature/sprint/release. For example, does one implementation of a story or design have a performance or functional ceiling? If yes, this will introduce technical debt that will need to be addressed. This is not to be confused with laying the foundation for future feature-complete stories/epics in a future sprint or release. A story can be complete in sprint 1 but not feature or epic complete until sprint 5.
  • Like everything, quality should be given a budget (i.e., for every story or feature there are X points budgeted to QA).
    • Note that for large, complex, and new products, it is often good to also budget for test automation. This is especially true for SaaS products where user adoption goes from 0 to 100% overnight. (side note: I sometimes miss the days of on-premise enterprise software where you do a release on 1 March and only a few bleeding edge customers adopt immediately which gives you the time to fix a ton of defects).
  • For SaaS products there are additional metrics around the broader concept of quality, including infrastructure uptime and service availability. For example, you could have a functionally complete product, but if the customer-facing master-detail grids are non-performant, then the epic or story for that UI is a fail and possibly a defect.
  • As a company and product grows, it is important to accept that there will always be many more defects than everyone thinks there should be.
    • While working at a software company in the transportation management market, we had a very tough meeting between our CEO and a customer who was upset about some very particular use cases that exposed test coverage failures. The CEO said, “it is fiduciarily irresponsible to have no defects.”
  • Define good. Set a metric to measure what is ‘good’ quality.
    • This is really subjective. You can track performance of quality budget vs. actual defects using the budget mapped to a product, story (and thus the product manager), developer, team, whatever breakdown is interesting. The development team is likely doing this anyway as part of release go/no go.
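One way to track the quality budget against actuals; the story IDs and point values below are invented for illustration:

```python
# Track the QA budget (points budgeted to quality per story) against the
# points actually spent on defects, rolled up however is interesting
# (product, team, developer, ...).

def quality_variance(stories):
    """Return (budgeted, actual, overrun) QA points across stories."""
    budgeted = sum(s["qa_budget"] for s in stories)
    actual = sum(s["qa_actual"] for s in stories)
    return budgeted, actual, actual - budgeted

# Hypothetical stories from one release go/no-go review.
stories = [
    {"id": "STORY-1", "qa_budget": 2, "qa_actual": 3},
    {"id": "STORY-2", "qa_budget": 3, "qa_actual": 2},
    {"id": "STORY-3", "qa_budget": 1, "qa_actual": 4},
]
print(quality_variance(stories))  # (6, 9, 3)
```

A persistent overrun against the budget is the signal to dig into story quality, test automation, or estimation.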


So, how does this help establish return on investment (ROI) for product development teams and products?
Let’s start with the pseudo-science part of it:

  • Establish the true revenue per product. This sounds simple but many companies do not actually discretely slot their products or even account for revenue/GP cleanly by product. Sometimes even defining what a ‘product’ is can be difficult, especially for a software company that has a heritage as a professional services firm. I tend to be all-inclusive and define a product as anything that is developed and used by a customer or partner.
  • If the user stories are sized and delivered with capacity and burn down in rough balance, then the weighted average cost of the product management, development, QA, DevOps, etc., can be used to calculate the revenue and GP per product and then per story point and developer.
  • Any accurate model would need to also factor in other costs such as DevOps and operating costs.
  • In some cases, it is easier to be ‘close enough’ and try to identify product investments that are clearly out of line.
  • A good example of products that are out of whack from an ROI perspective are mature products in highly regulated/structured markets where the rules/rates/products change monthly or annually and the product is not designed or built to make these changes efficiently (e.g., rates for a parcel product or acquisition rules for a procurement product).
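The ‘close enough’ arithmetic above might look like this; all revenue, cost, and point figures are hypothetical:

```python
# Rough ROI screen per product: revenue and cost per story point
# delivered, using the weighted-average fully loaded cost of the team
# (product management, development, QA, DevOps, etc.).

def roi_per_point(revenue, team_cost, points_delivered):
    """Return (revenue per point, cost per point, revenue/cost ratio)."""
    rev_per_point = revenue / points_delivered
    cost_per_point = team_cost / points_delivered
    return rev_per_point, cost_per_point, rev_per_point / cost_per_point

# Hypothetical product: $2.4M revenue, $1.6M team cost, 800 points/year.
rev, cost, ratio = roi_per_point(2_400_000, 1_600_000, 800)
print(rev, cost, ratio)  # 3000.0 2000.0 1.5
```

A ratio well below peers in the portfolio is the kind of out-of-line investment this screen is meant to surface.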

Now some thoughts on the ‘art’ bits of ROI estimation.
Even if all the metrics line up and there are obvious winners and losers in the ROI calculation across a product portfolio, there are subjective measures of ROI:

  • Salability value. Features that demo great (either flashing lights or the ability to check off some barrier-to-entry item) but will never be used can have exponentially more value than a bread-and-butter feature.
  • Marketing value. I have often built products or features that were entirely focused on generating buzz. For example, at one software company we built a Windows Azure-based enterprise architecture modeling tool to make a splash in the market. The press releases, blog postings, demonstration videos, and Microsoft corporate love were fantastic. This opened a channel to promote our small company in a crowded market. We never sold a single license but had a massive impact on deals in the pipeline.


We look forward to your feedback and hearing your growth story.

You can find out more about Susquehanna’s experience across 50+ portfolio companies and our broad catalog of best practices to help you accelerate growth here.

Greg Carter

Cycling fanatic that dreams of being a chef someday