A Sample Shape Discussion that gives you a pretty good idea of what kinds of things Shapers are interested in.

For a broader sample, take a look at the two Shape Roundtable books:

Roundtable on Project Management

"This is a masterful book. Kudos to all the editors and contributors" - Amazon Review

Roundtable on Technical Leadership

"The advice in the book is some of the best that I have ever read. There is none of the egotistical posturing that pervades so many of the online forums, the contributors are genuinely humble and realistic. I found them refreshing, entertaining and likable." - Poster-G Ashbacher

 

Key Performance Indicators


From: Poster-A

What Key Performance Indicators (KPIs) will you use to evaluate the performance of software developers, QA engineers, and project managers in a software organization?

The evaluation could be project-based or part of an annual review.

Part of the organization I work for has a variable culture and part a routine culture.

Our technical director said he doesn't want to evaluate the performance of the managers and engineers under him subjectively, and he has initiated a plan to come up with KPIs for each role for performance evaluation.

I am assigned to the work group for the software developers' indicators. In our one-hour discussion today, my group discussed things like:

- If you deliver on time, then you get 3 points.

- If you have more than 20 P1/P2 bugs after beta, you blah blah.

- If you blah blah, then you get a yellow card. If you blah blah, then you get a red card. If you blah blah, then you get a gold medal.


December 19

Jerry

Though I applaud your manager's desire to be objective in evaluations, in my experience KPIs are a double-edged sword. They cut both ways.

So, I hope Shapers will not just give Poster-A ideas on what KPIs to suggest, but warnings with each suggestion on proper use and improper abuse, not just of the specific KPI, but of the whole method.


December 20

From: Poster-B

I don't trust "objective" measures of performance, never have and never will. Where I have seen them used, sometimes exclusive of any other determinants of performance, they have always been misused by management and manipulated by the people being measured.

If a manager cannot sit down with a subordinate and honestly discuss performance, then that manager has no business evaluating the performer. I know this doesn't help Poster-A in completing the assignment. It just explains why I won't be making any suggestions regarding KPIs.

Further, it has been my experience that managers usually have much more control of measures than do performers. One of the caveats of quality improvement is to measure the process, never the individual.

Jerry

I made these the first words in this thread because they may be the last words, too.

From: Poster-C

I'm going to slack by answering early, so I don't have to answer later.

Poster-A's example: - If you have more than 20 P1/P2 bugs after beta, you blah blah.

Be careful with anything of this form. The quantity of P1/P2 bugs can be easily manipulated. Measure it, attach a reward or punishment to the measure, and the feedback loop will guarantee you get the numbers the measured person thinks you want.

Poster-A's example: - If you deliver on time, then you get 3 points.

Even this one can be manipulated, especially if the delivery is from one subset of engineers to another. As a tester, I have occasionally been given junk just so the developers can claim they made their deadline. I've also had my deadline curtailed because the developers missed theirs, e.g., development went over by two weeks, so testing will have two weeks less. Neither makes sense from the viewpoint of the quality of the product, but both make perfect sense from the viewpoint of the humans involved.

Jerry

Okay, here's rule number ONE. If you're going to count any kind of deliverable, it has to be of known and constant quality.

Known because unverified quality claims are less than useless.

Constant because you cannot meaningfully compare the quantity of two things of different quality. (How many rotten apples equals one good apple?)

From: Poster-D

I googled and found only a very few references, but KPI is somewhat more than something a manager read in a magazine nobody ever heard of.

This article http://management.about.com/cs/generalmanagement/a/keyperfindic.htm says nothing about individual KPIs, only organizational ones. I think Poster-A's managers are misapplying it. The examples given contain a large element of blaming, all "you" messages.

Poster-A: Our technical director said he doesn't want to evaluate the performance of managers and engineers under him subjectively ...

A cop-out. A manager's life would be ever so much easier if he/she didn't have to think or exercise judgment. Just fill out a form, add up the numbers, and the job is done.

The first thing Poster-A's work group must do is identify the organizational goals as they apply to the software developers. This will probably require the technical director's full participation. And I have seen many metrics initiatives die on the vine due to management's non-participation.

Jerry

BTW, Poster-A, feel free to share anything on this thread with your manager--except names. You may want to edit first, but maybe not.

And, if your manager is looking for consulting on this topic, then you can offer to contact some of the names and see if they're interested.

From: Poster-E

I'm interested in your manager's opinion of what constitutes objectivity. Note the irony here.

Poster-A:> If you deliver on time, then you get 3 points.

Hmmm... if you deliver /what/ on time?

Poster-A:> If you have more than 20 P1/P2 bugs after beta, you blah blah.

Hmmm... who gets to classify these bugs? Is there any difference if you have 20 P1 bugs and no P2 bugs, vs. 20 P2 bugs and no P1 bugs? How good is it to have ANY bugs?

Is there any consideration given for the idea that some modules might be more complex, more risky, vulnerable to bugs in the development framework, new technology, old technology? That some developers produce much more functionality than others, and thus have a greater chance of running into bugs? That some developer might work on a very heavily-travelled part of the program, and that another has been working on something that rarely gets used or tested? That a developer is a novice? A master? That the code works, but is poorly structured and not maintainable? That the developer was very helpful to the testers, but less productive in terms of (say) lines of code written?

Jerry

That some code is much more clearly designed and written, so it's easier to find the bugs early.

Rule Number Two: Don't compare apples to oranges--and remember that no two code solutions are alike. (If they were alike, then why not just reuse the first solution?)

Poster-E

It seems to me that consensus from the team about the programmer's value--while completely subjective--is meaningful and that "objective" measurements are not terribly meaningful. "Objectivity" seems often about removing important contextual information, when in our business, context is (almost) everything.

Poster-A: - If you blah blah, then you get a yellow card. If you blah blah, then you get a red card. If you blah blah, then you get a gold medal.

Now /that/ sounds subjective to me--three different results from exactly the same behaviour. :)

Jerry

It's the tone of voice. Can't you hear it? :-)

From: Poster-F

I'm afraid I'm not going to be much help. These things beg too many questions.

Jerry

That's the biggest help right there.

Poster-F

What does "delivered on time" mean? Delivered in what condition? How reasonable was the deadline?

For bugs after beta, how was the beta structured? Were there reasonable opportunities to find and resolve bugs?

However "objective" the criteria may appear, there is always room for variability and subjectivity behind them. It's not wrong to try to build an objective evaluation process, but I think it's a mistake to make it too mechanical. It only encourages a belief that the method is proof against the frailties of the evaluators.

Jerry

Maybe it would be better to start by recognizing that no KPI can be truly objective, and try to create some subjective ones where it's clear up front whose subjectivity is being used.

From: Poster-G

A suggestion that has always fallen on deaf ears in my organization, something that is very subjective but quite quantifiable:

Smiley faces!

I find the effort to be "objective" laughable. I have yet to see a good definition of an objective measure that can work in all cases. A judgment call is always necessary. For example, a well-defined end result (for example, your unit works and passes QA and test) for the lowest-level engineer can be discussed at the beginning of the rating period, and the engineer can still fail due to things beyond the engineer's control. I have seen this many times.

The really great workers and the really poor workers are easily observable (I have been in both camps). The big middle group (I have been in this camp too) is much harder to rate.

I prefer the subjective approach with smiley faces from all people who work with the individual based on their subjective criteria, written down.

For example: S/W Engineers rated with smiley faces:

From the managers, testers, CM, QA, customer, user, and fellow S/W engineers who work with them. Smiles, no expression, and frowns are easily counted. Each group can be given instructions to write down what things are important to them. They use these subjective criteria to help them come to a conclusion on what type of smiley face they want to put down as a rating.

This can be done with all the types of work and people in a project.

You can then define the "rating" with the number of smiley faces and such, another subjective definition. You can do all the "NT" crap to give it the patina of "objectivity." Weights based on obscure criteria, plots of rank versus perceived worth, the whole shebang--Yuck!
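[Ed: For the curious, here is a minimal sketch of how such a smiley-face tally might look in code. The rater groups, the numeric mapping of faces, and the example data are all hypothetical illustrations, not Poster-G's actual scheme.]

from collections import Counter

# Hypothetical numeric mapping -- an editorial invention
FACE_VALUES = {"smile": 1, "neutral": 0, "frown": -1}

def tally_smileys(ratings_by_group):
    """For each rater group, count the faces and compute a simple net score.
    ratings_by_group maps a group name (e.g., 'testers') to a list of faces."""
    summary = {}
    for group, faces in ratings_by_group.items():
        summary[group] = (dict(Counter(faces)),
                          sum(FACE_VALUES[f] for f in faces))
    return summary

# One engineer, rated by two of the groups Poster-G lists
print(tally_smileys({
    "testers": ["smile", "smile", "frown"],
    "managers": ["neutral", "smile"],
}))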

Bottom line, ratings are subjective period. I have seen a project meet all goals, timetables, costs, and such, and be an utter disaster. Everyone was happy except the customer. The wrong problem was solved. There was a little problem in the requirements elicitation.

The problem with using smiley faces is that a person who is prickly and meets the majority of the quality goals could be given more frowns than other people. Another problem with smiley faces is that there may only be a few people who rate the person. But then, this problem occurs with other methods too.

Boy, this is interesting in how hot emotionally this topic is for me.

Jerry

But that's just the point of "objective" measures, isn't it, to protect the manager from "emotionality."

You see, if your manager doesn't give you a raise and you get emotional and try to object, the manager simply sends you an email with your "objective" rating, saying, "Gee, I'm sorry, but I had nothing to do with this. It's just an objective measure of your performance."

IOW, the great advantage of objective measures is that they're not objectionable.

But, hey, if we could really get such measures, we could write a program to send out the emails, so we wouldn't need managers at all.

From: Poster-H

For Poster-A's manager: KPI #1 - the total number of KPIs tracked.

Unintended consequence: The manager might think the larger the number of KPIs, the better. That thinking may lead to a new bureaucracy of measurebators who become a plague on the people who do the work.

Harkening back to the requirements thread: KPI #2 - the number of clearly defined problems with concrete reasons why a solution is desired.

Unintended consequence: If people are rewarded for the number of problems clearly defined with concrete reasons, they may pad their numbers by defining the simplest problems.

KPI #3: the number of one-on-one meetings managers have with their employees.

Unintended consequence: Managers may schedule the meetings but discuss the weather.

KPI #4: the number of design and code reviews.

Unintended consequence: Lots of poorly constructed reviews that waste the developers' time.

The bottom line is that people are creative: They can and will distort measurements so the people whose opinion counts the most are favorably impressed with their work.

Jerry

Let's see. Hmm, Poster-H is the leading contributor to this thread because he's suggested more KPIs than anybody else. <Pat on the back, Poster-H> (which is just slightly removed from a kick in the butt).

From: Poster-K

How about -3 points for each usage of a TLA? (oops)

Jerry

And how about -4 points for a FLAW (four-letter acronym word, like "OOPS")?

Poster-K

I'm afraid I won't be very helpful in this thread. Delivering on time, or number of bugs per engineer, for example, are IMHO based more on the system (*) in use than on the individual, and therefore not good KPIs. If you want employees to improve, then annual reviews are too infrequent and too late, and changing the system would be more effective (though harder).

"Objective measures" always seem to be applied subjectively. For example, engineer X had more bugs than Y, but that's because his project was "harder", so we won't hold that against him, but Y's project was "easy" (the manager thinks), so we'll downgrade him for his bugs.

Annual reviews actually seem to be used for CYA material [Ed: That's minus 3, Poster-K!] to justify ranking employees, and in spite of all the "objective" measures that may be espoused, the ranking actually rewards the employees who fit in with the unspoken corporate culture the best, and look good to their bosses -- even if they don't look good at all to their peers or the ones below them on the org chart. "Looking good" doesn't necessarily mean being productive or effective.

A few companies do "360-degree" annual reviews, where coworkers and direct reports anonymously contribute to the review process. I wasn't in such a company long enough to get a feel for how effective that was. In my current company, the manager can request input from peers, but can filter out that input if they want to.

(*)And I'd be happy to talk about systems that encourage or discourage bugs/on-time/etc. Perhaps you could review people on how well they came up with suggestions that improved the system. Toyota just pays cash to employees who suggest improvements, though, as soon as the improvement is adopted. (They might even pay cash for suggestions that are not adopted.)

Jerry

Oh, golly, that's another thread: abuses of a suggestion award system.

Poster-K

I doubt that Toyota uses annual reviews, since W. Edwards Deming, who got Japanese companies hooked on continuous improvement, lists such reviews as number three of his seven deadly diseases: "Evaluation by performance, merit rating, or annual review of performance - The effects of these are devastating - teamwork is destroyed, rivalry is nurtured. Performance ratings build fear, and leave people bitter, despondent and beaten. They also encourage mobility of management."

Jerry

There you go, Rule Number 3. And Deming and I are totally in synch, even to the numbering of this rule.

From: Poster-L

My KPI is this:

P(d)=f(n(KPI))

That is, the probability of my departing quickly from the organization is a function of the number of times the concept of "Key Performance Indicators" is mentioned by the managerial class.

One of the few things more demotivating than the annual performance review is the attempt to disguise this innately subjective process with window-dressing designed to make it appear rigorous and scientific. Any truly measurable action ("if you leave work early") will turn out to sometimes be counterproductive (you went home because you were lazy) or highly productive (you left early so you could meet with an old friend who happens to be the world's leading authority on the particular problem you're trying to solve at work), and there is no way for an observer within the organization to tell.

Worse, any measurable action has both immediate and deferred costs and benefits--delaying a design document's release because the design just isn't quite finished has an immediate cost in that the milestone isn't met and some developers may be spinning their wheels for a moment, but it could have a greater long-term benefit in that the project doesn't go down a wrong path. And you won't be able to tell until the project's done, and maybe not even then, since you'll be comparing to a hypothetical standard ("what if the document had/hadn't been released on schedule but incomplete?").

Deming's point was that "measurable performance" will vary from week to week, year to year, project to project, in response to outside influences that are far greater than the individual "merit" of the people involved. I believe he said that no apparent above-average or below-average "performance" should be considered an indicator of anything other than random variation until it has appeared consistently for seven years.

Since this organization is at least in part a variable culture, it stands to reason that any "KPI" observed is going to be more random variation than actual indicator of merit.

Run. As fast as you can. That's my advice.

Jerry

Running a 12-minute mile gives you 3 points.

Running an 8-minute mile gives you 5 points.

Running a 6-minute mile gives you 10 points.

Running a 4-minute mile gives you a new profession.


December 21

Poster-M

Here's a notional but perhaps useful KPI: dollars. (Euros, in my case.)

Consider a QA engineer. If she were to work as a freelancer, how much would her services be worth to the organization?

I say "notional" because the thread considers employees. This KPI is quite real for those of us who choose to work independently. The aggregate worth of my services to my clients is measured in one neat number - my revenue.

And for a long time I've entertained the thought that it could work with employees. Use internal accounting procedures whereby your QA engineers, etc. "charge" a certain amount for their services to their internal clients. Let them be allowed to charge as individuals, or as teams and split the income, however they like.

There will be internal "market pressures" driving the price: if there is a lot of demand for QA services, the KPIs of QA people will go up due to supply and demand. There will be side benefits: if the budgets of other departments for QA services are adequately constrained, these departments will put some effort into getting the most "bang for the buck" out of what they invest in QA. (Good-bye, vague or duplicated bug reports!)

When I bring this up, people give me strange looks and dismissive responses. I'm not sure why. Perhaps Shapers can enlighten me with a critique of this KPI.

Jerry

Perhaps it's because internal dollars are viewed as "funny money."

From: Poster-N

Well, Poster-A, I wish I believed in objective performance measurement.

However, where, how, and by whom the measuring is done often changes the measures. Measures can be defined so that the desired results are achieved. Our biases are so deeply embedded that psychologists have spent much time defining and differentiating between sources of bias.

Our biases are so deep that my whole job is built around exposing merely one source of bias ($).

Also, my fundamental guess is that the most important questions are not easily turned into metrics.

1 - Did the manager / analyst / whomever successfully manage the expectations for the project?

2 - Were the project goals accomplished?

So, I don't know how to create an objective measure that really helps.

But if you do find a measure, please consider this fundamental -- which you probably already know deep in your own experience:

Just measuring will cause performance to change, usually in a way that changes the measurement. For example, if the measure is number of lines of code written, people will write more code when they are measured. Not better code, not more efficient code. Just more code.

So, as they say, be careful what you measure.

Jerry

Notice the CPA in Poster-N's address. You know the story of the CPA test: You ask, "How much is X (for some X)?"

The good CPA answers, "How much do you want it to be?"

IOW, CPAs know how to make measurements measure anything you want, with any result you want. Not a good argument for "objective" measurements. Listen to Poster-N. She knows.

From: Poster-A

Thanks to everyone for your quick responses.

Jerry> Maybe it would be better to start by recognizing that no KPI can be truly objective, and try to create some subjective ones where it's clear up front whose subjectivity is being used.

Poster-A

My work group held a meeting to discuss the indices for the coding phase. During the discussion, we found that subjectivity is unavoidable, as many of you have already pointed out.

We came up with the following tentative indices and have put tentative weights on them. [Ed: a sketch of the weighting arithmetic follows the list.]

1) schedule

2) quality - we're thinking about using bug count or input from QA team.

3) code review result - style, standard, security, performance, scalability, correctness, completeness, ...

4) CM - for example, how well do you package the program, and do you follow the rules for checking in programs

5) support documentation

6) peer feedback from other guys in the project team

7) adjustment based on the difficulty/complexity of the program
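[Ed: To make the weighting arithmetic concrete, here is a minimal sketch. The weights, the 0-10 scale, and the sample scores are invented for illustration; Poster-A never states the actual weights.]

# Hypothetical weights summing to 1.0 -- not Poster-A's real ones
WEIGHTS = {
    "schedule": 0.20,
    "quality": 0.20,
    "code_review": 0.15,
    "cm": 0.10,
    "documentation": 0.10,
    "peer_feedback": 0.15,
    "difficulty_adjustment": 0.10,
}

def composite_score(scores):
    """Weighted sum of per-index scores, each assumed to be on a 0-10 scale."""
    return sum(WEIGHTS[index] * scores[index] for index in WEIGHTS)

print(composite_score({
    "schedule": 7, "quality": 8, "code_review": 6, "cm": 9,
    "documentation": 5, "peer_feedback": 8, "difficulty_adjustment": 7,
}))  # about 7.2 -- and, as the thread keeps pointing out, every input is a judgment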

Poster-E> Hmmm... if you deliver /what/ on time?

It depends on the topic discussed; for example, for the coding phase, it means the source code according to the specification.

Poster-E> Is there any consideration given for the idea that some modules might be more complex, more risky, vulnerable to bugs in the development framework, new technology, old technology?

We have added an index to adjust the score for the difficulty/complexity of the program.

Poster-K> I'm afraid I won't be very helpful in this thread. Delivering on time, or number of bugs per engineer, for example, are IMHO based more on the system (*) in use than on the individual, and therefore not good KPIs

We're still thinking about this. For example, if we're going to use bug counts, we need to evaluate the human factors in bug resolution - working from the black-box testing results to find out the source of the problem (coding - whose source file? design - is it a bug in the design or in the coding? ... people's mindset may change from fixing one).

Poster-K> (*)And I'd be happy to talk about systems that encourage or discourage bugs/on-time/etc. Perhaps you could review people on how well they came up with suggestions that improved the system

We also discussed this, but found it hard to measure, and set it aside for a separate reward system.

Poster-F> What does "delivered on time" mean? Delivered in what condition? How reasonable was the deadline?

This is a good question, and I can discuss it with my colleagues today. In the current situation, it depends on the project. For example, for projects with high time-to-market pressure, the overall deadline of the project comes from the top, and thus the deadlines for individual programs are not so reasonable. Regarding "delivered in what condition," we use reviews and tests to verify that the deliverables conform to the requirements.

Jerry

Well, sounds like you're off to a good start. You've got the whole Shape forum pulling for you and your team. Say hello to them for us.


December 22

From: Poster-L

Poster-A: We came up with the following tentative indices and have put tentative weights on them...

2) quality - we're thinking about using bug count or input from QA team.

My experience with bug counts reinforces Jerry's comments about CPAs. Whenever a tester finds a problem, he/she faces a choice: write a distinct bug report for the problem, lump it in with an existing bug report, or look for root causes and write it up as multiple bugs. I have seen this in action. Management looked at bug counts, concluded they were too high, and issued an edict that there would be overtime until they came down. They came down quickly, as testers consolidated multiple symptoms into a single report. Then management decided that the number of bugs being discovered was too low and therefore the testing effort was ineffective; hence more overtime until the expected number of bugs was found. You guessed it; suddenly two or three bug reports would come out of a single test failure.

And, of course, none of this had anything to do (aside from by coincidence) with the actual number of bugs in the software, and whether the software was ready for prime time.

Jerry

But it definitely had something to do with whether the overtime was paid or not. Paid overtime (especially since most testers are vastly underpaid) reverses this dynamic. Promised overtime for more bugs produces more money, and thus more bugs. Promised overtime for fewer bugs produces more money, and thus fewer bugs.

Poster-L

I should add that this was not the procedure at some fly-by-night shop; this was what happened in a major, established development shop with a generally well-deserved reputation for delivering quality software. It's just that the delivery of the software took place more in spite of the bug counting than because of it.

Poster-A: 3) code review result - style, standard, security, performance, scalability, correctness, completeness, ...

It is my experience that you can do a code review (or any other review of an artifact) for the purpose of improving the code or for the purpose of evaluating the project team, but not both--at least not at the same time. If you review the code for the purpose of evaluating individual performance--or, if there is even a suspicion that this is the case--valuable information will disappear, people will not be forthcoming, reviewers will look at issues and decide to resolve them privately (expecting the same courtesy when their code is reviewed, of course)... and the review will give highly suspect information about both the code and the people.

Jerry

Amen!

This is the difference between grading and marking. You mark the bugs so that people will learn, but if they're going to be used for grading, learning goes out the window as the game becomes hiding what you don't know from the teacher (the grader). You can't do both well at the same time.

This is why I never give grades in my classes. Never.

Poster-K

"... high time-to-market pressure..."

Sounds like the measures you've listed will have to be weighted differently for each project. On projects like the above, with short deadlines, should quality (bug count) be weighted lower than on other projects?

Regarding schedules in general: who is estimating the schedule, and who is working to meet it? Are they the same people? If this is weighted highly, then estimates will get longer so that it becomes easier to meet the schedule. If estimators and implementers are not the same people, then the danger is that the schedules will be unrealistic.

Are managers who add features to a project without removing other (unimplemented) features penalized for endangering the schedule?

I get the impression that your company is using a "document-driven" methodology, sometimes called a "plan-driven" or "disciplined" methodology -- which is ironic, since agile projects do MORE planning (and re-planning) and require more discipline from their participants than those other methods.

It's possible that an agile methodology might improve your products and production, but measures would be different for agile projects. Being too fixed on your current set of measures could "enshrine mediocrity".

For example, agile projects (almost) always have a fixed schedule, and will drop unimplemented features (of lesser value) in order to meet that schedule. Agile projects re-plan every 2 to 4 weeks, and are meant to take advantage of changing business needs, so measures based on "conformance to (original) plan" are inappropriate. Agile projects aim to keep quality high, and get speed for "high time-to-market" projects by simplifying the requirements and/or using tools that enable greater productivity (such as Ruby on Rails).

The AYE wiki has several relevant pages that you might want to read:

http://www.ayeconference.com/wiki/scribble.cgi?read=RankandYank

http://www.ayeconference.com/wiki/scribble.cgi?read=TheBonusEffect

http://bookshelved.org/cgi-bin/wiki.pl?MeasuringAndManagingPerformanceInOrganizations

http://www.ayeconference.com/wiki/scribble.cgi?read=WhatMakesDevelopersGood

http://www.ayeconference.com/wiki/scribble.cgi?read=WhatMakesTestersGood

http://www.ayeconference.com/wiki/scribble.cgi?read=WhatSlowsYourProjectsDown

Quoting from the last link above:

"Seems that an underlying theme of 'what slows things down', based on the comments in this thread and on my experience, is:

a) not enough information

b) incorrect information (or as Jerry sez 'It's not a crisis it's just the end of an illusion')

Examples:

* participants with insufficient technical information (in terms of skills or experience)

* incorrect information on the expectations of end-users and other stakeholders

* not enough info to predict the effects of interactions with other software/systems

* insufficient information on requirements

* incorrect information on the reliability of reusable software components

* incorrect information about project status"

Jerry

All this points to the thing that Poster-A and his team are doing well: not establishing measures, but discussing them and all the ways they can go wrong.

If you do implement any measures, don't stop discussing. Don't ever stop, though of course you may pause from time to time, but not for too long.

Poster-E

Poster-E> Hmmm... if you deliver /what/ on time?

Poster-A> Depends on the topic discussed, for example, for coding phase, it means the source code according to the specification

Ah, but there are so many different ways to do that. I could give you a pile of source code with a certain number of lines in it, while you could provide code that performed the same functions in half the number of lines. Your code could be highly maintainable, while mine is a mess. Your code might include hooks so that we could test the program more easily, and logging so that we could keep track of what was going on; mine might contain none of that. Your code might perform more quickly than mine, might have fewer bugs found in production than mine.

On the other hand, mine might provide more robust security, might replace the operating system's inefficient routine with a faster one. Mine might not be so testable, but might have vastly better error messages and a reporting module in it, so that it saves the help desk $50,000 a year. I might have spotted some things in the specification that were risky, and coded around them. I also anticipated several things that should have been in the specification, but weren't.

Now, I'm clearly a harder-to-manage employee than you are. :) And this is clearly an artificial example, but the point is: if you're a manager, do you want all this information as a number, or as a conversation? Do you want to know the background as to why things are the way they are, and address them?

Poster-E> Is there any consideration given for the idea that some modules might be more complex, more risky, vulnerable to bugs in the development framework, new technology, old technology?

Poster-A> We have added an index to adjust the score for the difficulty/complexity of the program.

Now at this point, using my limited skills as an analyst: it appears that the first number is an incomplete measure, so we add an index (another incomplete measure) to adjust the score. We would do that, I think, because we have a feeling that there's something suspicious about the first number. But if we apply that thought, we should be suspicious about the second number too. Shouldn't we also worry about the function that we're applying--do we multiply the first number by the difficulty/complexity value, or do we add them? How do we know which function is appropriate?

I'd have to say that a questionable number in some questionable functional relationship with some other questionable number... well, the questionables won't cancel out. I predict, based on observing lots of managers who used lots of variations on this process: in most cases, people will make guesses as to the first number (which won't feel right), and then make guesses about the second (which won't feel right), and then they'll multiply the two numbers together. The product will be lower or higher than they expected, so they'll adjust one number or both until they have a final number that they're happy with. /And they had this number in mind from the start./
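[Ed: Poster-E's question about the combining function is easy to demonstrate with invented numbers. In this hypothetical example, two engineers tie or swap rank depending on whether the complexity index is multiplied or added -- so the choice of function is itself a judgment call.]

# Invented raw scores and complexity indices for two engineers
raw = {"X": 6.0, "Y": 9.0}
complexity = {"X": 1.5, "Y": 1.0}  # X's module was judged 'harder'

for name in ("X", "Y"):
    print(name,
          "multiplied:", raw[name] * complexity[name],
          "added:", raw[name] + complexity[name])
# X multiplied: 9.0  added: 7.5
# Y multiplied: 9.0  added: 10.0
# Multiplying makes X and Y tie; adding puts Y clearly ahead.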

Jerry

This reminds me of the Oklahoma method of weighing pigs, as described in Augustine's Laws. You look around for a stone that seems to be around the same weight as the pig, then you guess the weight of the stone.

Poster-E

Poster-K> I'm afraid I won't be very helpful in this thread. Delivering on time, or number of bugs per engineer, for example, are IMHO based more on the system (*) in use than on the individual, and therefore not good KPIs

Poster-A> We're still thinking about this. For example, if we're going to use bug counts, we need to evaluate the human factors in bug resolution - working from the black-box testing results to find out the source of the problem (coding - whose source file? design - is it a bug in the design or in the coding? ... people's mindset may change from fixing one).

There's a risk here of setting you up for some pretty massive distortion in what you're trying to measure. As soon as you try to reward testers for bug counts, they'll work hard to please you by providing you with a large /number/ of bugs, but the /significance/ of the bugs might be diminished. Who gets to decide whether they're bugs? Will the developers be subtly, directly, or indirectly punished for bugs? If so, will they argue about bugs? If not, how will you convince the developers that the data won't be used against them? What other effects might result from counting the bugs? Will this promote harmony and efficiency between testers and developers?

Poster-K> (*)And I'd be happy to talk about systems that encourage or discourage bugs/on-time/etc. Perhaps you could review people on how well they came up with suggestions that improved the system

Poster-A> We also discussed this, but found it hard to measure, and set it aside for a separate reward system.

Well, I'm glad that at least it is considered in some other reward framework. But I notice something interesting here: ultimately, the purpose of using metrics is to provide data that drive ideas about how to improve the system. Suggestions that improve the system ARE ideas about how to improve the system (and really valuable, or so I believe). But since suggestions are hard to measure, they don't count in the KPI evaluation. That means that /actual/ ideas are being discounted in favour of /metrics that provide data that would drive/ ideas. This seems to be a much less direct means of improving the process.

From: Poster-O

Jerry: "Perhaps it's because internal dollars are viewed as 'funny money.'"

I have noticed the same thing when organizations try to set up their IT or IS departments as "profit centers" charging their "customers" which are the other departments in the organization. The "bills" just never get paid.

With respect to this thread topic, I have to agree with the others: I do not believe in objective measures for personal performance in the work place.

"As we read the school reports on our children, we realize a sense of relief that can rise to delight that -- thank Heaven -- nobody is reporting in this fashion on us." -- J. B. Priestley

I learned much about so-called "objective measures" in grade school. Teachers (or, in my case, nuns) dutifully sent home report cards every 6 weeks and our parents had to sign them under the assumption that they reviewed them with us. There were several dozen categories and each had a grade that was both a number and letter. And of course there had been "tests" and many other "graded items" that were not subject to questioning.

But it was sometimes confusing to me. In my three years in kindergarten, we were graded on "nap time" and how fast we fell asleep (or, at least faked it -- I was a great faker). But in the first grade, with no warning, we received horrible marks if we fell asleep to the endless droning of the teacher.

I can also cite being graded on "eats lunch." If we had the school lunch, we had to show the nun a clean tray even when we were not given a choice about what was put on that tray. If we brought our lunch from home, we had to show the nun an empty bag. And if my mother was upset with me, she would put something in my lunch she knew I hated (e.g., lettuce on my baloney sandwich). And then I'd be in a real pickle. There were not many places we could successfully hide food, but we were inventive.

But woe unto him who got caught. I eventually learned to hold it in and vomit outside -- it's a wonder I didn't become bulimic.

Jerry

My solution in high school (I don't think we were graded in grade school--interesting oxymoron) was to forge my mother's signature on the card. I used the same signature on excuses for absences, so it was the only one they had on file.

This worked up until the time my mother actually sent them a note when I was too sick to write my own version. When I got back to school, I was called on the carpet for forging this note, as it didn't match my mother's (forged) signature.

Poster-O

The "objective measures" in the work place have nor been much different. The categories are silly and the "rules" incomprehensible.

One place had about 50 categories, each to be rated on a scale of 1 (low) to 5 (high). The employee rated himself/herself, as did his/her manager. Then the manager would explain why his/her number was the correct one. It turned out that the rule was that no one ever got a 1 or a 5 in any category, because then the manager had to write a report for each 1 and 5 given. Nor could a manager give someone all 4's or all 3's or all 2's, because then it might look like the manager wasn't taking things seriously.

Jerry

One of my clients had a similar system, and the people gamed it by writing a program in which you entered the mean rating you wanted to achieve, and it then randomly (but following the above rules) generated the individual ratings. I sat in on some meetings where the managers, in all seriousness, discussed with the employee the "significance" of their different ratings. Good thing I've played poker.
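[Ed: a minimal sketch of what such a rating-gaming program might have looked like, assuming the 1-5 scale and the rules above (no report-triggering 1s or 5s, never all the same value). The naive rejection-sampling approach is an editorial guess, not a description of the client's actual program.]

import random

def gamed_ratings(n_categories, target_mean, tol=0.05):
    """Generate ratings in {2, 3, 4} -- no 1s or 5s, never all identical --
    whose mean lands within tol of the target. Naive rejection sampling:
    fine for a sketch, slow for targets near the 2.0 or 4.0 extremes."""
    while True:
        ratings = [random.choice((2, 3, 4)) for _ in range(n_categories)]
        mean = sum(ratings) / n_categories
        if len(set(ratings)) > 1 and abs(mean - target_mean) <= tol:
            return ratings

print(gamed_ratings(50, 3.2))  # 50 category ratings averaging about 3.2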

Poster-O

My current organization has only one "objective" rating for overall performance, on a four-level scale: unsatisfactory, satisfactory, exceeds expectations, and clearly outstanding. But the managers were having trouble with that, so they introduced, with no guidance, + and - in each category. So silly me started asking, "What in blazes does Exceeds Expectations - or Unsatisfactory + mean?"

Jerry

That practice got me in trouble in high school (you may notice a pattern in my schooling) once the school started mailing my report cards directly to my mother (after the forging was discovered). I generally got all S's. (They didn't use ABC grading, but mapped the letters into other letters, like A = S for Superior. They were very proud of their system, claiming it to be superior to other schools' ABCs, though it was precisely isomorphic.)

Anyway, I came home to find my mother holding my report card which contained all S's except for Advanced Algebra, in which the teacher had given me S+ (because, although that grade didn't exist in the system, the teacher said "S simply does not express how much you know"--not, notice, "how much you learned").

All she had to say (and you'll have to take my word for the blaming tone) was, "How come you didn't get S+ in your other courses?"

Poster-O

I have, however, finally gotten to the point with my managers that they just keep recycling the same evaluation for me each year. It appears no one other than my manager and me has noticed that.

It's a crazy game we play.

From: Hamlet, Prince of Denmark (SCENE V. Dunsinane. Within the castle)

... it is a tale
Told by an idiot, full of sound and fury,
Signifying nothing.


December 23

From: Poster-O

Hamlet, Prince of Denmark: "... it is a tale"

Hey, Ham -- it's OK for me to call you Ham, isn't it? -- I believe you may have wandered into the wrong play.

Jerry

Hey, I was strutting and fretting in the right castle, wrong character.

Poster-O

You know, you are already the wordiest character in Shakespeare's plays. So, shame on you for trying to increase your KPI by stealing Macbeth's chance to strut and fret his hour upon the stage.

Or maybe you and your pals Rosencrantz and Guildenstern got lost on that ship to England and landed in Scotland. Still, I don't think being lost is any reason for you to horn in on one of Mac's great speeches.

Jerry

Did I say Hamlet? Well, I must have been having a Midsummer Night's Dream (it was midsummer's night in Australia), but, even so, Poster-O, I think you're making Much Ado about Nothing. Still, it's your Forum, so you can have it As You Like It, even if you want to make a Tempest in a teapot.


From: Poster-D

"Hamlet, Prince of Denmark (SCENE V. Dunsinane. Within the castle)"

I thought not, and found several sources to agree with me (doesn't make me right, just gives me company in possible error). Hamlet's castle is Elsinore, Macbeth's is Dunsinane.

A line from William Shakespeare's Macbeth, from Act 5, Scene 5:

(Macbeth's plans are falling apart around him.)

MACBETH
Wherefore was that cry?

SEYTON
The queen, my lord, is dead.

MACBETH
She should have died hereafter;
There would have been a time for such a word.
To-morrow, and to-morrow, and to-morrow,
Creeps in this petty pace from day to day
To the last syllable of recorded time,
And all our yesterdays have lighted fools
The way to dusty death. Out, out, brief candle!
Life's but a walking shadow, a poor player
That struts and frets his hour upon the stage
And then is heard no more: it is a tale
Told by an idiot, full of sound and fury,
Signifying nothing.

Jerry

Maybe if he'd had more objective metrics, his plans wouldn't have gone awry. "Petty pace" as a measure of velocity? "Brief candle" for project duration? Come on, Mac!

From: Poster-B

Jerry: "I made these the first words in this thread because they may be the last words, too."

Having already had the last words, I hesitate to add more, but here goes.

It seems obvious, as I already knew, that even the most "objective" measures are based on "subjective" judgments, e.g., difficulty of modules, complexity, quality, etc. I can think of two prime examples - COCOMO and Earned Value. These rely on imprecise "guesstimates" cranked into formulas to produce numbers that then become sacrosanct.

And, as Poster-N and others have pointed out, what you measure is what you get.

Measure the process, not the person. Measure for improvement, not evaluation.

At least the boss seems to have gotten one thing right. He asked those to be measured to come up with the KPIs. The impossibility of developing KPIs that the subjects of the measurements will be happy with should become obvious at some point.

Poster-A, before you submit any concrete suggestions, you might want to establish what actions the boss might take based on these KPIs. Fire the lowest performers? Promote the highest performers? Distribute salary increases and bonuses based on KPIs? Would that be OK with the people being measured?

One thing I can predict with virtual certainty - a software development organization that utilizes individual measures to evaluate performance will be a poorly performing organization.

Jerry

And a manager who can't identify inadequate performance in an individual without fancy "objective" measures is a poorly performing manager.

Poster-K

One metric that I like (but note that it is a TEAM/project metric, not an individual one): Running Tested Features

http://www.xprogramming.com/xpmag/jatRtsMetric.htm

"Nearly every metric can be perverted, since up- and down-ticks in the metric can come from good or bad causes. Teams driven by metrics often game the metrics rather than deliver useful software. Ask the team to deliver and measure Running Tested Features, week in and week out, over the course of the entire project. Keeping this single metric looking good demands that a team become both agile and productive. [...]

"What is the Point of the Project?
"I'm just guessing, but I think the point of most software development projects is software that works, and that has the most features possible per dollar of investment. I call that notion Running Tested [Features], and in fact it can be measured, to a degree.

"Imagine the following definition of RTF:

"1.The desired software is broken down into named features (requirements, stories) which are part of what it means to deliver the desired system.

"2.For each named feature, there are one or more automated acceptance tests which, when they work, will show that the feature in question is implemented.

"3.The RTF metric shows, at every moment in the project, how many features are passing all their acceptance tests.

"How many customer-defined features are known, through independently- defined testing, to be working? Now there's a metric I could live with."

Jerry

I could live not with a count, but with a customer-defined value of each such feature. What's our work worth to our customer?

From: Poster-P

I was at a conference a couple of years ago, watching a panel discussion on how to assess software testers. After suffering in silence until almost the end, I asked the panelists why no one had mentioned the single most popular and straightforward approach to evaluating technical people: observing them and talking to them.

A senior manager from IBM (a VP or something) replied "Oh, that's not practical." I couldn't believe my ears.

I wonder whether his manager ever talks to him, or observes him in action, and whether it might just be possible that his success at IBM has something to do with interactions that he has with his bosses and with people his bosses might hear from.

The fundamental problem that motivates the number fetish may be that some managers are utterly terrified of managing an organization that has people in it. The search for plausible-looking numbers is the search for an alternative to Software as a Human Activity Practiced Effectively. Meanwhile, I bet those same managers will accept being evaluated numerically only if and when the numbers tell a positive story. Otherwise, they will expect that a liberal human mind will transcend the numbers and judge them less harshly.

Jerry

Thomas J. Watson, Sr., who founded IBM, used to have a favorite expression, something about how "manager" starts with "man." (Of course, those were the days when there were no women in IBM. I wonder which is worse: not having women employees, or having women employees and ignoring their existence by replacing them with numbers?)

From: Poster-Q

I have no desire to stifle this extremely interesting thread; in fact, I oh so much wish it had been there when I was asked to provide the equivalent of KPIs, and it's definitely better to have it now than never. But I'd like to point Poster-A to another SHAPE thread that might help with his question:

Congruent Measurement:

[Here, Poster-Q gives a link to one of the archived Shape threads, accessible to all Shape subscribers: Ed.]

And btw, if the goal of the free sample thread is to attract people who might get good use from SHAPE, then I nominate this current thread (however it turns out) as the sample thread, because it's probably one of the more controversial subjects in this (or probably any other) business. Whoever's been asked to evaluate people will definitely remember how hard it was to even start thinking about it.

Jerry

Excellent idea.

Remind me if I forget to do it when the thread closes. It might be a while.

Poster-Q

What strikes me as (sort of) funny about this is that we (well, most of us definitely ... er, actually, at least I do ;) have almost no problem judging our colleagues professionally Every Single Day, as long as this judgement stays in our heads and has no tangible consequences for their lives, but when asked to do it, it gets as tough as if we were asked to break several laws of physics. I suppose that there's something to be learned from that discrepancy - about ourselves, mostly.

Jerry

And what do you think that learning is? That thoughts are free as long as they have no consequences?

From: Poster-L

Poster-O: But in the first grade, with no warning, we received horrible marks if we fell asleep to the endless droning of the teacher.

You are lucky you went to parochial school in the old days, when that's all you'd get for falling asleep. These days you'd be diagnosed with "attention deficit disorder" and drugged into alertness.

I slept through a class in Software Methodologies in graduate school. Sat in the front row of the lecture hall, in a desperate attempt to stay awake. Didn't work; the second the prof, who was also department chair and a big wheel in the IEEE, started speaking I was out like the proverbial light. Friends report that at times the prof came down from the stage and stood six inches from my face yelling at the top of his lungs. I wouldn't know; I slept through that as well. I had a good sense of time, though, because I always woke up exactly three seconds before the ending bell.

Funny thing is, I got an A from that prof. Maybe he didn't associate my name with the guy who slept through class. Or maybe he had some other metric, which brings me back to counting things. I'd been working for five years before I went back to grad school and therefore had some knowledge of how real people executed these methodologies. My term paper, therefore, was a survey of the methods described in the class, with explanations of why each would not work in the real world. I carefully footnoted each description of method with a citation of the prof's published papers, because I'd heard that one of his grading criteria was how many of his papers you cited. Never mind that each citation was followed by an explanation of why he was wrong. As I said, I got an A for the course. Ain't metrics wonderful?

Re Jerry's comment about not giving grades--I wonder if that's got something to do with the great positive response I get when I walk into a high school classroom as a sub. The kids really seem to like having me there. Well, I admit it may just be that I'm cool, but there's also this: when there's a sub, there may or may not be any learning, but for sure there won't be any grading today. Maybe that's what they're responding to.

Jerry

Maybe they're responding in anticipation of the fun they're going to have throwing erasers at you when your back is turned to the board.

From: Poster-F

Jerry: "All she had to say (and you'll have to take my word for the blaming tone) was, "How come you didn't get S+ in your other courses?")."

In grade 9 I got the year prize in my school for English with a mark of 94. My father said, "What happened to the other 6 marks?" It's the only school result I remember. (Well, ok -- except the 27% in grade 12 math. That was special.)

From: Hamlet, Prince of Denmark (SCENE V. Dunsinane. Within the castle)

... it is a tale
Told by an idiot, full of sound and fury,
Signifying nothing.

Um...that's actually from Macbeth. The wonderful soliloquy that begins,

"Tomorrow and tomorrow and tomorrow, Creeps in this petty pace from day to day, To the last syllable of recorded time. And all our yesterdays have lighted fools The way to dusty death..."

I'm reading this in bed, and not about to get up and go downstairs to my office to find the act and scene reference, but I'm sure. It's right after Macbeth learns Lady Macbeth has killed herself. He begins, "She should have died hereafter", [something, something] and then launches into the soliloquy.

Jerry

Take the advice of someone who's been burned and don't do this stuff without looking it up.

From: Poster-A

As the deadline approaches, my work group has settled on a tentative standard, and we'll review and revise it periodically.

BTW, let me add some cultural background info:

In my country, we try to be objective in things such as school entrance exams and government jobs.

High schools here are separated into junior high schools (grades 7-9) and senior high schools (grades 10-12). Junior high school graduates need to take a standard written test to enter senior high school. There is a dispute over whether we should test essay writing or not.

Many people object to this, because the score for essay writing is subjective. When I was a junior high student, the standard test was called the "Joint High Schools Entrance Exam," and your score on the test was the only criterion deciding what school you could enter. Essay writing was part of the test, and its weight in the total score was something like 5% or 10%. My father objected to the essay writing test because I always got low scores on this subjective exam at school, and since competition is high, the 5% or 10% might affect what senior school I entered. What senior school one enters is a big factor in one's score on the joint college entrance exam. Our president's son entered the no. 3 high school in his region because he got a bad score in essay writing. Certainly many parents don't want their sons/daughters to repeat the same error.

Now the standard test is called the "Basic Competency Test," and essay writing is no longer tested. Besides, it's not a good idea to use the score as the sole criterion for deciding what school one can enter. We added other criteria; for example, if you place in the top 3 in a city-level competition in things like piano playing or painting, you can get into schools more easily.

Jerry

But of course piano playing and painting are totally subjective--almost.

From: Poster-A

Merry Christmas and Happy New Year!

Jerry

Indeed, and to quote Clement Clarke Moore, someone misquoted even more than the Bard of Avon:

But I heard him exclaim, ere he drove out of sight,

"Happy Christmas to all, and to all a good-night."


December 23

From: Poster-D

I searched without success for the book and author of this quote: "you get what you measure for." I take that to mean your actual performance will be limited to, and by, your metrics. So choose metrics wisely, which in this case means congruent with organizational goals.

Poster-A, my question for you is:

For each of the KPI candidates you've discussed, which organizational goals does it support?

Jerry

Great question, Poster-D. Of course, it will be impossible to answer if you don't know precisely what your company's goals are. And how many do?

Poster-D

While typing this, my subconscious came up with Philip Crosby and his book: http://www.philipcrosby.com/pca/C.Press.html



Well, there's a sample. The discussion continues, with more Shapers contributing. The rest is in the archives, along with hundreds more. Join us and share in the fun and learning!



If you'd like to have intelligent discussion and sharing of experience with people like these as a regular part of your personal development, Subscribe Now!