

In the past, in the very limited sample we have, it has translated into 1.5 fewer wins than expected by the model. Wow, what a withering indictment of the model. Brb, off to flip a coin 10 times, get 7 heads, and submit my results to science.
Shows the imprecision of a model that bends in the direction of a resume like UConn's.

I too build statistical models, as you know. For me, we assess our models not only by how they predict, but also by how consistent they are with other things we know or believe. There is a sniff test. And UConn's rating as a top-10 performer doesn't pass the sniff test.

The model needs fixing. Not only has it been wrong, but it has not corrected when given new information. That resiliency is a flaw. And if we look at the last month or six weeks, I bet the predictions have been far worse. Out-of-conference play matters, and so does recency.
 
Shows the imprecision of a model that bends in the direction of a resume like UConn's.

I too build statistical models, as you know. For me, we assess our models not only by how they predict, but also by how consistent they are with other things we know or believe. There is a sniff test. And UConn's rating as a top-10 performer doesn't pass the sniff test.

The model needs fixing. Not only has it been wrong, but it has not corrected when given new information. That resiliency is a flaw. And if we look at the last month or six weeks, I bet the predictions have been far worse. Out-of-conference play matters, and so does recency.
Being 1.5 wins off out of 24 games for one out of 363 teams you are rating doesn’t show anything at all. It’s not even remotely close to being surprising in any way. This is something I really, really hope you already know.
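(For anyone who wants the arithmetic behind that: a rough sketch, assuming a hypothetical average win probability of about 70% per game - an invented number, not any model's actual figure - shows how small a 1.5-win miss over 24 games is.)

```python
import math

# Rough sketch only: treat 24 games as independent trials with an ASSUMED
# average win probability of 0.70 (made up for illustration; not a number
# taken from KenPom, Torvik, or any other rating).
n, p = 24, 0.70
expected_wins = n * p                     # 16.8 expected wins
std_dev = math.sqrt(n * p * (1 - p))      # ~2.2 wins of ordinary binomial noise
print(f"expected {expected_wins:.1f} wins, sd {std_dev:.2f}")
print(f"a 1.5-win miss is only {1.5 / std_dev:.2f} standard deviations")
```

Under those assumptions a 1.5-win shortfall is well under one standard deviation of normal game-to-game randomness.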
 
I mean at least stick to the actual outliers if you want to make these lol arguments.
 
I'm curious if there would be more accuracy if performance against bad teams was somehow weighted less.

I know there's adjusted efficiency that tries to take this into account, but I'm curious if there's a steeper competitiveness dropoff into the 250+ range than is being accounted for.

I mean, how much is Ohio State's 37 point win over St. Francis (PA) factoring into the model, or UConn's 53 point win over Long Island, or WVU's 34 pt win over Penn, or Rutgers' 40 pt win over Columbia? Do the results of those games really add the correct inputs when evaluating how a team will perform against an opponent in the top 50?
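(A toy sketch of what "weighted less" could look like - not how KenPom or Torvik actually handle it; the cutoff rank and weights below are invented for illustration.)

```python
# Toy illustration of down-weighting results against very weak opponents when
# averaging per-game efficiency margins. The 250 cutoff and 0.5 weight are
# arbitrary choices for the example, not anyone's real parameters.
def weighted_avg_margin(games, cutoff_rank=250, low_weight=0.5):
    """games: iterable of (efficiency_margin, opponent_rank) pairs."""
    num = den = 0.0
    for margin, opp_rank in games:
        w = low_weight if opp_rank > cutoff_rank else 1.0
        num += w * margin
        den += w
    return num / den if den else 0.0

# A 53-point blowout of a ~300-ranked team counts half as much as a
# 5-point win over a top-50 opponent under this scheme.
print(weighted_avg_margin([(53, 300), (5, 40)]))  # 21.0
```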
 
I'm curious if there would be more accuracy if performance against bad teams was somehow weighted less.

I know there's adjusted efficiency that tries to take this into account, but I'm curious if there's a steeper competitiveness dropoff into the 250+ range than is being accounted for.

I mean, how much is Ohio State's 37 point win over St. Francis (PA) factoring into the model, or UConn's 53 point win over Long Island, or WVU's 34 pt win over Penn, or Rutgers' 40 pt win over Columbia? Do the results of those games really add the correct inputs when evaluating how a team will perform against an opponent in the top 50?
Torvik already does this. He also has a recency bias. He has UConn at 8th.
 
Being 1.5 wins off out of 24 games for one out of 363 teams you are rating doesn’t show anything at all. It’s not even remotely close to being surprising in any way. This is something I really, really hope you already know.
My point is the error is not randomly distributed and you are relying on the flaw of averages. How many of the last 10 games against lower-rated teams has UConn won? The model is performing poorly now. More than random error, it is a flawed model.

Retrospective predictions are useless except for teaching the model to improve. The only prediction that matters is the next one.
 
My point is the error is not randomly distributed and you are relying on the flaw of averages. How many of the last 10 games against lower-rated teams has UConn won? The model is performing poorly now. More than random error, it is a flawed model.
You have no evidence of this! It's literally just "look at UConn, this is obviously wrong," even though they aren't even an outlier and there is virtual agreement among every major model. That's laughable; why should anyone take that seriously?
 
Like you actually think that UConn transformed from a great team to a mediocre team magically? That’s ****ing stupid man, that doesn’t happen barring major injuries, and your evidence is literally a sample of 10 games!
 
Just as an aside, and not directed at one particular poster..

When literally every respected model has Ohio State higher than you think they should be... do you ever stop to think that maybe there is a reason for that, and you (the human just eyeballing things) are the one that is wrong?
Does winning the game not count as part of the calculation?

Who gives a **** how efficient they are
 
Like you actually think that UConn transformed from a great team to a mediocre team magically? That’s ****ing stupid man, that doesn’t happen barring major injuries, and your evidence is literally a sample of 10 games!
I don't think they were ever a great team. I think that at a key point, when the model's inputs were being formed, they played at their very best. They have regressed to their mean and the model doesn't recognize that because of how schedules are constructed.

Also, while efficiency typically correlates with overall performance, the strength of that correlation will vary from team to team, across styles of play, etc. It may be that UConn is one of those teams for which that correlation is not so great.

Models function on average. That doesn't make them work for every individual case. And these models don't build in very much appreciation for heterogeneity.

The fact that UConn's rating is as absurd as it is, is not an indictment of models or statistics. It just shows that in any individual instance models can err.

Inviting @SkilletHead2 into this conversation.
 
I don't think they were ever a great team. I think that at a key point, when the model's inputs were being formed, they played at their very best. They have regressed to their mean and the model doesn't recognize that because of how schedules are constructed.
The construction of the schedule has very little to do with it. Especially theirs; it’s not like their OOC was all cupcakes.
Also, while efficiency typically correlates with overall performance, the strength of that correlation will vary from team to team, across styles of play, etc. It may be that UConn is one of those teams for which that correlation is not so great.

Models function on average. That doesn't make them work for every individual case. And these models don't build in very much appreciation for heterogeneity.

The fact that UConn's rating is as absurd as it is, is not an indictment of models or statistics. It just shows that in any individual instance models can err.
But it doesn’t. There is nowhere near enough data on UConn to demonstrate that their rating is wrong or that any of this is happening. All of the flaws you describe here are possible flaws. UConn’s (again, not even particularly anomalous) rating isn’t evidence of them.
Inviting @SkilletHead2 into this conversation.
 
Does winning the game not count as part of the calculation?
Depends on the specifics of the model, but even if it’s just 100% purely on efficiency then winning the game will always be better than losing the game.
Who gives a **** how efficient they are
People who are trying to predict how they’ll play in the future, i.e. the people making all of these models.
 
Torvik already does this. He also has a recency bias. He has UConn at 8th.

I'm curious to what extent he decreases the weight of those lower teams. Even with UConn at 8, I can't imagine the Huskies have performed to his model's expectations over the last 40% of their season.

He also has Ohio State at 43 now, which is dropping in the direction of reality... despite the fact that their only wins against teams ranked better than 48 in his model are #38 Cincy and us, whom they beat on a blown call.

Even if these models are very close to reality 95% of the time, that still leaves 15-20 or so teams that are out of whack.
 
I'm curious to what extent he decreases the weight of those lower teams. Even with UConn at 8, I can't imagine the Huskies have performed to his model's expectations over the last 40% of their season.

He also has Ohio State at 43 now, which is dropping in the direction of reality... despite the fact that their only wins against teams ranked better than 48 in his model are #38 Cincy and us, whom they beat on a blown call.

Even if these models are very close to reality 95% of the time, that still leaves 15-20 or so teams that are out of whack.
I just fully reject the idea that people eyeballing the results and picking teams they think are out of whack means they are actually out of whack.

I bet you that both UConn and Ohio State will perform better over the remainder of the season than you expect.

People would have said Rutgers looked out of whack after the first 10 games
 
Are people going to post this every day for the rest of the season? They have good efficiency, they are high in ratings that rate efficiency, it is what it is
Yes. The models are all programmed by humans, all apparently with a bias that somehow values efficiency over winning. It's not that these algorithms are putting them at 65, which at 11-11 entering today would be reasonable; rather, they rank them up there with good teams. They are not a good team. Good teams find ways to win. Bad teams make excuses. I've worked with a ton of quantitative minded people. They are as ignorant as those who only base their judgements on the eye test. There's a reason why economists have predicted 10 out of the last 5 recessions.

And once again, yes, you will have to suffer through OSU being held up as the poster child for the ignorance of these models because it’s the grossest example of the ignorance.
 
I just fully reject the idea that people eyeballing the results and picking teams they think are out of whack means they are actually out of whack.

I bet you that both UConn and Ohio State will perform better over the remainder of the season than you expect.

People would have said Rutgers looked out of whack after the first 10 games

It's looking at actuals vs. predictions.

On Jan 2nd, Bart had Ohio State as the #10 team in the country, expected to win the majority of their next 10 games. Even now, after losing 9 of 10, the model still says they should have won 4 more than they actually did, if they were to play those 10 games again. They very clearly weren't the team the model was predicting on January 2nd... the question now is whether they're even the team the model is predicting now on February 5th.

The model was pretty badly off on them 13 games in... and after 10 more games, it has adjusted them downward by a lot, weighing their recent futility against their early success. But has it adjusted enough? Time will tell.

This isn't about the "eye test" - it's about teams the model predicts to win a lot, who have ended up instead losing in bunches.
 
Good teams find ways to win. Bad teams make excuses. I’ve worked with a ton of quantitative minded people. They are as ignorant as those who only base their judgements on the eye test. There’s a reason why economists have predicted 10 out of the last 5 recessions.
Adults are talking here.
It's looking at actuals vs. predictions.

On Jan 2nd, Bart had Ohio State as the #10 team in the country, expected to win the majority of their next 10 games. Even now, after losing 9 of 10, the model still says they should have won 4 more than they actually did, if they were to play those 10 games again. They very clearly weren't the team the model was predicting on January 2nd... the question now is whether they're even the team the model is predicting now on February 5th.

The model was pretty badly off on them 13 games in... and after 10 more games, it has adjusted them downward by a lot, weighing their recent futility against their early success. But has it adjusted enough? Time will tell.

This isn't about the "eye test" - it's about teams the model predicts to win a lot, who have ended up instead losing in bunches.
You expect that to happen some amount of the time though. If you want to show the model is wrong you have to show it happening more than expected. If I flip a coin 10 times and get 8 heads it’s not proof or much in the way of evidence that my coin is weighted.
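(The arithmetic on that coin example, for the record: the tail probability is a plain binomial sum, nothing model-specific.)

```python
from math import comb

# Probability of 8 or more heads in 10 flips of a fair coin.
p_8_plus = sum(comb(10, k) for k in range(8, 11)) / 2**10
print(f"{p_8_plus:.3f}")  # ~0.055, i.e. roughly 1-in-18, hardly proof of a loaded coin
```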
 
Adults are talking here.

You expect that to happen some amount of the time though. If you want to show the model is wrong you have to show it happening more than expected. If I flip a coin 10 times and get 8 heads it’s not proof or much in the way of evidence that my coin is weighted.

Which also means it is pointless to defend the model at the team level. Some teams will be pretty far out of whack, because in a small sample size of 20ish games, anything can happen.... in this case, the anything was that the model has largely whiffed on Ohio State.

It happens - the model won't be perfect. As I said, even if it's pretty close 95% of the time, it'll be far off on 15-20 teams. Ohio State just happened to be one of them this year. It's possible to defend the model in aggregate while also saying it can get things wrong at the individual team level.
 
Which also means it is pointless to defend the model at the team level. Some teams will be pretty far out of whack, because in a small sample size of 20ish games, anything can happen.... in this case, the anything was that the model has largely whiffed on Ohio State.

It happens - the model won't be perfect. As I said, even if it's pretty close 95% of the time, it'll be far off on 15-20 teams. Ohio State just happened to be one of them this year. It's possible to defend the model in aggregate while also saying it can get things wrong at the individual team level.
That’s not “wrong” though. If I predict 5 heads out of 10 and get 8 that’s not wrong in a meaningful sense. It’s the right prediction.
 
The specific purpose of his first post was to invite people to say things he could reply to with anger.

He attempts to boost his self-esteem by ranking people below himself. Proactive defense, lightly triggered belligerence, name calling, all in just this thread. He’s an unhappy person, likely coping with frequent social rejection.
Lol
 
The construction of the schedule has very little to do with it. Especially theirs; it’s not like their OOC was all cupcakes.

But it doesn’t. There is nowhere near enough data on UConn to demonstrate that their rating is wrong or that any of this is happening. All of the flaws you describe here are possible flaws. UConn’s (again, not even particularly anomalous) rating isn’t evidence of them.
Is there then enough data to demonstrate that their rating is correct?
 
That’s not “wrong” though. If I predict 5 heads out of 10 and get 8 that’s not wrong in a meaningful sense. It’s the right prediction.
But it's not an accurate prediction for those 10 flips.

Now flip your thinking. Instead of thinking about the variance or error regarding the individual prediction, think about the models as having error that applies to some teams, not because of stochastic variation, but because the specifics of the team and the model do not fit. It can be systematically wrong for some teams even if it is an excellent overall model, since some teams don't perform according to the assumptions underlying the model.
 
Looking at their remaining schedule and recent performances, I will say OSU finishes 6-14 or worse in conference. What does the Magic ESPN 8-ball predict?
 
That’s not “wrong” though. If I predict 5 heads out of 10 and get 8 that’s not wrong in a meaningful sense. It’s the right prediction.

First off, the model doesn't spit out coin-flip predictions most of the time, so the heads/tails analogy is inapt.

Second, I don't have what the numbers would have been on Jan 2nd.... but given that they were #10 (and facing 8, 17, 29, 36, 50, 58, 60, 62, 81, and 214) it's likely it was predicting more like OSU winning 7 out of 10, and it instead managed 1 out of 10. Even after all of those losses, the newly adjusted model still thought it should have managed 4 more wins than it did.

Given that the model's intended purpose is to be predictive at the game level, the larger "all teams" model clearly wasn't a good fit for Ohio State. One size fits most, sure - but that still leaves a lot of teams falling through the gaps.
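(To make "predicting more like 7 out of 10" concrete: a predictive model's expected win total for a stretch is just the sum of its per-game win probabilities. The probabilities below are invented for illustration, not Bart's actual January numbers.)

```python
# Hypothetical per-game win probabilities for a 10-game stretch (made up for
# illustration; not the real Jan 2nd figures from Torvik or anyone else).
win_probs = [0.45, 0.55, 0.65, 0.70, 0.75, 0.78, 0.80, 0.82, 0.85, 0.97]

expected_wins = sum(win_probs)   # ~7.3 expected wins from these probabilities
actual_wins = 1
print(f"expected {expected_wins:.1f}, actual {actual_wins}, "
      f"shortfall {expected_wins - actual_wins:.1f}")
```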
 
But it's not an accurate prediction for those 10 flips.

Now flip your thinking. Instead of thinking about the variance or error regarding the individual prediction, think about the models as having error that applies to some teams, not because of stochastic variation, but because the specifics of the team and the model do not fit. It can be systematically wrong for some teams even if it is an excellent overall model, since some teams don't perform according to the assumptions underlying the model.
Yes, agreed. The issue is that both kinds of error are there and we have no way to separate them without doing a lot of analysis and/or building our own models where we try to account for the factors we think might be missing.
Looking at their remaining schedule and recent performances, I will say OSU finishes 6-14 or worse in conference. What does the Magic ESPN 8-ball predict?
8.4-11.6
First off, the model doesn't spit out coin-flip predictions most of the time, so the heads/tails analogy is inapt.
Nah. The 50% nature of the coin flip isn't relevant. You could use anything with a known probability to make the same point.
Second, I don't have what the numbers would have been on Jan 2nd.... but given that they were #10 (and facing 8, 17, 29, 36, 50, 58, 60, 62, 81, and 214) it's likely it was predicting more like OSU winning 7 out of 10, and it instead managed 1 out of 10. Even after all of those losses, the newly adjusted model still thought it should have managed 4 more wins than it did.

Given that the model's intended purpose is to be predictive at the game level, the larger "all teams" model clearly wasn't a good fit for Ohio State. One size fits most, sure - but that still leaves a lot of teams falling through the gaps.
It still isn't clear. Just providing an example of a team that did way worse than its prediction doesn't show anything. Those teams will exist every year. They would exist every year even if Kenpom's model rated teams perfectly. That's the point I'm making with the coin flip example.

If there is a thing that happens 99% of the time (and we know the 99% is correct), and we do 500 trials, there will be ~5 times the thing didn't happen. If you come along and point at those five cases and say "well obviously your model didn't account for something here" you would be wrong. The model was perfect. I'm not saying Kenpom is perfect (obviously it isn't) but just pointing at isolated examples of the teams that are off from their prediction is not evidence of its flaws.
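(If someone did want to test "happening more than expected" properly, the check is a binomial tail: how likely is the observed number of misses if the model's stated probabilities really were right? A minimal sketch, with made-up counts:)

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of at least k misses
    if the stated miss probability p really were correct."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# The 99%-of-500-trials example: ~5 misses are expected even from a perfect model.
print(binom_tail(5, 500, 0.01))   # seeing 5+ misses is completely routine
print(binom_tail(15, 500, 0.01))  # seeing 15+ misses would be genuinely surprising
```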
 
If there is a thing that happens 99% of the time (and we know the 99% is correct), and we do 500 trials, there will be ~5 times the thing didn't happen. If you come along and point at those five cases and say "well obviously your model didn't account for something here" you would be wrong. The model was perfect. I'm not saying Kenpom is perfect (obviously it isn't) but just pointing at isolated examples of the teams that are off from their prediction is not evidence of its flaws.

I'm not even saying that the model itself is flawed - I'm just saying Ohio State was ranked too highly for too long and is finally drifting down after just a boatload of losses. As you say, there will be the cases where the model just doesn't fit the team for whatever reason - it's okay to say that Ohio State is an example of a team that didn't behave as the model predicted without it being a critique of the overall methodology.

As far as KenPom goes, Ohio State is one of just three teams in the Top 90 with a losing record (35 OSU, 62 WSU, 70 Nova). In Bart, we see the same (43 OSU, 57 WSU, 82 Nova). In NET, also (41 OSU, 79 WSU, 90 Nova)... even though that's supposed to be more reflective of accomplishment than predictive. The other two teams are also a bit anomalous, but neither is showing up in the Top 50 anywhere. These are all teams that have good efficiency metrics, but just can't seem to win more than they lose.
 
Adults are talking here.

You expect that to happen some amount of the time though. If you want to show the model is wrong you have to show it happening more than expected. If I flip a coin 10 times and get 8 heads it’s not proof or much in the way of evidence that my coin is weighted.

That's fair, but then by the same token, your declaration of RPI as a "completely worthless" tool seems unfair. (Notably, RPI was used for a different intended purpose than power rankings.) But its flaws were based on similar types of anomalies. No system is perfect, but adding one more loop to the RPI formula could arguably have worked wonders to correct the calculation's biggest flaw - penalizing teams for doing what they are supposed to do in beating bad cupcakes.

Simply put - I think RPI's flaws are more easily correctable than those of the efficiency systems. Instead they invented a whole new system that seems to have even more blending problems as a result of prioritizing efficiency over results to assess resume quality. That's my problem with the changes. I think there are still a lot of head-scratchers with NET, and the quad system seems to double down on them because the teams with anomalies tend to be major-conference teams who play a lot of the bubblers.

All this said - while I’ve made a lot of noise about WAB, Bart’s SOR follows a similar concept and I think that one is a great tool. I wish they’d have replaced RPI with something like that.
 
That's fair, but then by the same token, your declaration of RPI as a "completely worthless" tool seems unfair. (Notably, RPI was used for a different intended purpose than power rankings.) But its flaws were based on similar types of anomalies. No system is perfect, but adding one more loop to the RPI formula could arguably have worked wonders to correct the calculation's biggest flaw - penalizing teams for doing what they are supposed to do in beating bad cupcakes.

Simply put - I think RPI's flaws are more easily correctable than those of the efficiency systems. Instead they invented a whole new system that seems to have even more blending problems as a result of prioritizing efficiency over results to assess resume quality. That's my problem with the changes. I think there are still a lot of head-scratchers with NET, and the quad system seems to double down on them because the teams with anomalies tend to be major-conference teams who play a lot of the bubblers.

All this said - while I’ve made a lot of noise about WAB, Bart’s SOR follows a similar concept and I think that one is a great tool. I wish they’d have replaced RPI with something like that.
The reason I describe RPI as completely worthless is because it's not a true fitted model, it's just an ad-hoc formula that was designed by eyeballing the results. The design criteria were like "is this simple to calculate/understand" and "does this sort of look reasonable". It's not completely worthless for the purpose it was used for but it's just not a proper model. As you note, SOR and WAB are much better versions of a similar idea (that you only get credit for your wins and losses).
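(For anyone who hasn't seen it, the commonly cited form of the formula is just three hand-picked weights; the sketch below omits the usual details like excluding a team's own games from OWP and the home/road adjustment to WP.)

```python
def rpi(wp, owp, oowp):
    """Commonly cited men's basketball RPI weighting:
    25% own winning percentage, 50% opponents' winning percentage,
    25% opponents' opponents' winning percentage."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# The weights were chosen by hand rather than fitted to predict anything,
# which is the "eyeballing the results" point above.
print(rpi(0.75, 0.55, 0.52))  # 0.5925
```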
 
The reason I describe RPI as completely worthless is because it's not a true fitted model, it's just an ad-hoc formula that was designed by eyeballing the results. The design criteria were like "is this simple to calculate/understand" and "does this sort of look reasonable". It's not completely worthless for the purpose it was used for but it's just not a proper model. As you note, SOR and WAB are much better versions of a similar idea (that you only get credit for your wins and losses).

I don't like WAB as an assessment tool either, because the blending factor with the bubble as baseline creates its own set of flaws that seem pretty material when you look at the rankings.

Nothing is perfect, but SOR seems to do as good a job as you could hope for to assess which teams have the best comprehensive resumes for field selection.
 
I don't like WAB as an assessment tool either, because the blending factor with the bubble as baseline creates its own set of flaws that seem pretty material when you look at the rankings.

Nothing is perfect, but SOR seems to do as good a job as you could hope for to assess which teams have the best comprehensive resumes for field selection.
SOR is basically the same thing I believe but it's using #25 as the baseline (I am not 100% on this just my memory)
 
SOR is basically the same thing I believe but it's using #25 as the baseline (I am not 100% on this just my memory)

It is. I just think top 25 is a way better baseline for a sorting tool metric than the bubble because wins over solid field teams are more meaningful to the committee for both selection and seeding. I’m not arguing the philosophy of what “should be” one way or the other. Simply stating my opinion that the best metric ought to be one that best correlates to what will actually be prioritized by the committee.

SOR captures this much better than WAB which, as an example, still considers 4 MWC teams to be top 25. SOR only has one. No bracketologist has more than one MWC team in the top half of their bracket.
 
It is. I just think top 25 is a way better baseline for a sorting tool metric than the bubble because wins over solid field teams are more meaningful to the committee for both selection and seeding. I’m not arguing the philosophy of what “should be” one way or the other. Simply stating my opinion that the best metric ought to be one that best correlates to what will actually be prioritized by the committee.

SOR captures this much better than WAB which, as an example, still considers 4 MWC teams to be top 25. SOR only has one. No bracketologist has more than one MWC team in the top half of their bracket.
My belief is that 1) who you are now requires both memory of who you were and some type of recency weighting, 2) since outcomes are binary, you should never go down after an ugly win, and 3) while you may get credit for internal functioning during a loss, you still have to be penalized for losing. This is for a ratings model. Recency also matters for predictive models.
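(One simple way to implement "some type of recency weighting" from point 1 is an exponential decay on game age; purely illustrative, with made-up numbers.)

```python
# Purely illustrative recency weighting: newer games count more via exponential
# decay. game_scores is whatever per-game quality measure a rating uses,
# listed oldest to newest; the decay rate and scores are invented.
def recency_weighted(game_scores, decay=0.95):
    n = len(game_scores)
    weights = [decay ** (n - 1 - i) for i in range(n)]  # newest game gets weight 1.0
    return sum(w * s for w, s in zip(weights, game_scores)) / sum(weights)

hot_start_cold_finish = [12, 10, 11, 9, 2, 1, -3, 0, -2, 1]
print(recency_weighted(hot_start_cold_finish))                  # ~3.4, pulled toward recent form
print(sum(hot_start_cold_finish) / len(hot_start_cold_finish))  # 4.1 plain average
```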
 
My belief is that 1) who you are now requires both memory of who you were and some type of recency weighting, 2) since outcomes are binary, you should never go down after an ugly win, and 3) while you may get credit for internal functioning during a loss, you still have to be penalized for losing. This is for a ratings model. Recency also matters for predictive models.
For ranking who is the best, sure, this seems reasonable. So yes, for a power-ranking objective, those would be fair considerations.

But even if the "hot" team is harder to beat than the team that won a bunch of games in December, NCAA selection has to treat all games the same; otherwise you marginalize the beginning of the season, which is to the detriment of the entire college basketball following. I'm most interested in rankings based on what will happen with tournament selection, since that's the goal of the whole season.
 
For ranking who is the best, sure, this seems reasonable. So yes, for a power-ranking objective, those would be fair considerations.

But even if the "hot" team is harder to beat than the team that won a bunch of games in December, NCAA selection has to treat all games the same; otherwise you marginalize the beginning of the season, which is to the detriment of the entire college basketball following. I'm most interested in rankings based on what will happen with tournament selection, since that's the goal of the whole season.
I think the point you highlight about the hot team vs. the early-season team is a real problem, made worse by how schedules are structured with OOC games early in the season. Since the tournament is a championship, and champions win under pressure and when it matters late in the season, it seems to me that somehow that should be captured.

That's why in the old college football playoff (4 teams) I would NEVER take the loser of a conference championship game. They were playing for a championship and lost. Next team up.

There is no such arbitrary characteristic as best.

There are winners and losers.
 
OSU… WAY overrated here for sure. Went down to Assembly Hall last weekend to see the OSU game. They don't seem to have a lot of fight or be very connected, especially on the offensive end, despite a great deal of talent. I've long thought Holtman has been too hands-on when his teams have the ball. I will be surprised if he survives this train wreck of a season. If that happens, both their incoming class and current players will be on the open market.
 
I think the point you highlight about the hot team vs. the early-season team is a real problem, made worse by how schedules are structured with OOC games early in the season. Since the tournament is a championship, and champions win under pressure and when it matters late in the season, it seems to me that somehow that should be captured.

That's why in the old college football playoff (4 teams) I would NEVER take the loser of a conference championship game. They were playing for a championship and lost. Next team up.

There is no such arbitrary characteristic as best.

There are winners and losers.
I'm not sure I follow. It's a known fact that all games count the same for making the tournament. From the players' perspective, winning those early games is just as important. Teams peak at different times and can certainly surge more than once in a season.
 
I don't think they were ever a great team. I think that at a key point, when the model's inputs were being formed, they played at their very best. They have regressed to their mean and the model doesn't recognize that because of how schedules are constructed.

Also, while efficiency typically correlates with overall performance, the strength of that correlation will vary from team to team, across styles of play, etc. It may be that UConn is one of those teams for which that correlation is not so great.

Models function on average. That doesn't make them work for every individual case. And these models don't build in very much appreciation for heterogeneity.

The fact that UConn's rating is as absurd as it is, is not an indictment of models or statistics. It just shows that in any individual instance models can err.

Inviting @SkilletHead2 into this conversation.
Hey Loyal,
On a cruise to the Antarctic right now and only have very short email. Will try to throw my two cents in at a later point.
Best,

Jeff
 