Kanban Training

March 20, 2020

I attended a Kanban training course in early March and took some notes. Here’s a very slightly edited version of my notes.

We performed a small thought experiment and game, with the overall takeaway being:

Smaller batches are better than bigger batches in general

Call these batch sizes Work in Progress (WiP) limits.

We discussed an existing workflow with states and substages and WiP limits:

WiP 5 4 3 4 2 2
Stage Ready Discovery Build Build Deploy Test Release
substages doing - complete ready doing - complete ready ready

WiP is really tough to figure out at the start. We will get this wrong.

A workflow can have multiple points of entry, but it should have one point of exit. It’s possible to have more but aim for one.

You measure certain interactions on a regular basis, and these metrics are:

  • Cycle Time

    • measure of elapsed time (including nights, weekends, etc.) from the “start” to the “end or done” state of an item
    • answers “When will it be done?” for a single item
  • Throughput

    • count of how many items get done per unit of time. This is a simple count of items, not points, hours, or some other measure.
    • answers “About how many items will be done in a period of time?”
  • Work in Progress

    • count of all items between the start and the done states.
    • answers “How much is in flight?”
  • Work Item Age

    • measure of how long an item has been on the board.
    • measure of how long an item has been started but not finished.
    • makes teams ask “Why are the items aging and not moving or being delivered?”
    • this is probably the most important question to be examining.
  • Service Level Expectation (SLE)

    • e.g., 85% of work items will be finished in eight days or less
  • Average Cycle Time

    • (Average Work in Progress) / (Average Throughput)
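The metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `WorkItem` record, its field names, and the sample dates are hypothetical, and cycle time is taken as the plain difference in calendar days (some teams add one day so that an item started and finished on the same day counts as one).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class WorkItem:
    # Hypothetical record; the field names are illustrative only.
    name: str
    started: date
    finished: Optional[date] = None


def cycle_time_days(item: WorkItem) -> int:
    """Elapsed calendar days from start to done, weekends included."""
    if item.finished is None:
        raise ValueError(f"{item.name} is not finished yet")
    return (item.finished - item.started).days


def work_item_age_days(item: WorkItem, today: date) -> int:
    """Elapsed calendar days since a not-yet-finished item was started."""
    return (today - item.started).days


def throughput(items: List[WorkItem], period_start: date, period_end: date) -> int:
    """Count of items finished within the period (both ends inclusive)."""
    return sum(
        1
        for i in items
        if i.finished is not None and period_start <= i.finished <= period_end
    )


def work_in_progress(items: List[WorkItem], today: date) -> int:
    """Count of items started on or before today and not finished by today."""
    return sum(
        1
        for i in items
        if i.started <= today and (i.finished is None or i.finished > today)
    )


def average_cycle_time(avg_wip: float, avg_throughput: float) -> float:
    """Average Cycle Time = Average WiP / Average Throughput."""
    return avg_wip / avg_throughput


# Made-up sample data, not from the training.
items = [
    WorkItem("A", date(2020, 3, 2), date(2020, 3, 6)),
    WorkItem("B", date(2020, 3, 3), date(2020, 3, 10)),
    WorkItem("C", date(2020, 3, 9)),  # still in flight
]
today = date(2020, 3, 12)

print(cycle_time_days(items[0]))                                 # 4
print(work_item_age_days(items[2], today))                       # 3
print(throughput(items, date(2020, 3, 2), date(2020, 3, 8)))     # 1
print(work_in_progress(items, today))                            # 1
print(average_cycle_time(avg_wip=6, avg_throughput=2))           # 3.0
```

The last line reads as: with an average of 6 items in flight and 2 items finishing per day, an item takes about 3 days on average.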

SLE and Average Cycle Time metrics are related and affect each other.

We used a game called TWiG to try out different strategies and WiP limits. Two peers and I made a group and tried a bunch of different ideas.

  1. First Game

    We focused on making sure work moved between columns, chose which work to do based on what could finish, and generally prioritized work in the green column to finish over any other work.

  2. Second Game

    We followed the same strategy as in the first game, with the only difference being lower WiP limits. We had never hit the limits in the first game except for the red work.

  3. Third Game

    We approached this with a much different strategy. We moved WiP limits to 1 1 1 and tackled any blockers first as they came up.

This was a game, so the level of abstraction and simplicity made us question exactly how much this could model real-life software development. Can you know exactly how much work someone can do or how much work a task will need? Can work ever move backwards? Does the idea of red workers being good at things in the red column really match or mean anything? What happens with bugs found in the Blue and Green stages? Interruptions seem way more common in reality, but it was nice to see them included here.

We did have some learnings out of all of this:

  1. This is hard to reason about. We’ll need to pick something and tweak as needed for our teams.
  2. WiP limits don’t always have the intended effect, but lower limits seemed to be much better at delivering work. You can go too low on the limits, though.
  3. Try to have work ready to pull from the left. We see the value of “complete” substages in this model.

We examined an example board to talk about a few things.

WiP    5        4      3    4     2
Stage  Backlog  Ready  Red  Blue  Green

Ready is the first stage with Pull, so it is the first point of commitment. There is no commitment to the work item before it reaches that stage. This gives an opportunity for Just in Time prioritization and commitment.
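As a rough illustration of pull with WiP limits, here is a minimal sketch of the example board. The `Board` class and its `add` and `pull` methods are hypothetical names made up for this sketch, and it assumes an item can only be pulled from the stage immediately to its left.

```python
class Board:
    """Toy Kanban board: ordered stages, each with a WiP limit."""

    def __init__(self, stages):
        # stages: list of (name, wip_limit) pairs in left-to-right order
        self.order = [name for name, _ in stages]
        self.limits = dict(stages)
        self.columns = {name: [] for name in self.order}

    def add(self, item):
        """Put a new item into the leftmost stage (no commitment yet)."""
        first = self.order[0]
        if len(self.columns[first]) >= self.limits[first]:
            raise ValueError(f"{first} is at its WiP limit")
        self.columns[first].append(item)

    def pull(self, item, to_stage):
        """Pull an item from the stage to the left into to_stage.

        Returns True if the item moved, False if to_stage is at its limit.
        """
        idx = self.order.index(to_stage)
        if idx == 0:
            raise ValueError("nothing to pull from left of the first stage")
        source = self.order[idx - 1]
        if item not in self.columns[source]:
            raise ValueError(f"{item!r} is not in {source}")
        if len(self.columns[to_stage]) >= self.limits[to_stage]:
            return False  # limit reached: finish something downstream first
        self.columns[source].remove(item)
        self.columns[to_stage].append(item)
        return True


board = Board([("Backlog", 5), ("Ready", 4), ("Red", 3), ("Blue", 4), ("Green", 2)])
for name in "abcde":
    board.add(name)

# Ready has a limit of 4, so the fifth pull is refused.
results = [board.pull(name, "Ready") for name in "abcde"]
print(results)                   # [True, True, True, True, False]
print(board.columns["Backlog"])  # ['e']
```

Returning `False` instead of raising when a limit is hit is a design choice for this sketch: a full column is a normal signal to go help finish work downstream, not an error.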

SLE should be based on existing data. I think it should be required for Kanban. It is how we can tell customers how long something will likely take once committed.
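One way to derive an SLE from existing data is to take a percentile of past cycle times. The sketch below uses the nearest-rank method; the function names and the `history` values are made up for illustration, and other percentile definitions would give slightly different answers.

```python
def sle_days(cycle_times, percentile=85):
    """Smallest cycle time such that at least `percentile`% of past items
    finished in that many days or fewer (nearest-rank method)."""
    if not cycle_times:
        raise ValueError("need at least one past cycle time")
    ordered = sorted(cycle_times)
    # Integer ceiling of percentile/100 * n, avoiding float rounding.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def fraction_within(cycle_times, days):
    """Share of past items that finished in `days` or fewer."""
    return sum(1 for c in cycle_times if c <= days) / len(cycle_times)


# Made-up cycle times in days for ten finished items.
history = [2, 3, 3, 4, 5, 5, 6, 7, 8, 12]

print(sle_days(history))            # 8
print(fraction_within(history, 8))  # 0.9
```

With this sample the SLE would read “85% of work items will be finished in eight days or less”, and checking `fraction_within` against new data is a simple way to see whether the SLE still holds.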

Items in the backlog should be right-sized but not necessarily same-sized. Can this item be completed within the SLE? This is the point with the least amount of information. If the conversation leads to yes, then the conversation is over and the team can start the work. If it leads to no, then is this the right thing to work on? Can it be split or reduced in normal ways?

Things moving between stages, into the Ready column, backwards, or not moving should generate conversations between the product owner and members of the team.

This list was the instructor’s recommendation for what’s most important to figure out first:

  • Work item types
  • Workflow
  • Visualization Policies
  • WiP Limits
  • Pull/prioritization policies
  • SLEs
  • Definition of Done