Youth Soccer Rankings?

Not arbitrarily defined. The threshold SR has defined is the point at which a team shows a rating. Before that threshold, the team has a (hidden) rating, but it isn't statistically relevant, and isn't displayed. Once the team has enough recorded games that the results of comparing ratings would be statistically relevant - the rating is shown. You can see this in a team's game history - where it may show an overperform or an underperform against a team that currently has no rating. It can do this because the comparison is made against that team's hidden rating - which isn't yet solid enough to display.

If you're comparing results between a team that has a rating, and an unrated/unranked team, you can consider it "more of an educated guess". If you're talking about a game (any game) between two rated teams, it's not an educated guess. It's a prediction of how the teams are expected to perform, based on all of their performance up until that game.
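To make the hidden-vs-displayed distinction concrete, here's a minimal Python sketch of the idea. The threshold value, field names, and the expected-margin formula are my own illustration of the concept, not SR's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

MIN_GAMES_TO_DISPLAY = 6  # hypothetical cutoff; SR's real threshold isn't stated here

@dataclass
class Team:
    name: str
    rating: float        # maintained internally even while the team is "unrated"
    games_recorded: int

def displayed_rating(team: Team) -> Optional[float]:
    """Return the public rating, or None while the sample is still too small."""
    if team.games_recorded >= MIN_GAMES_TO_DISPLAY:
        return team.rating
    return None

def performance_vs_expectation(team: Team, opponent: Team, goal_margin: int) -> float:
    """Actual goal margin minus the margin the (possibly hidden) ratings expected."""
    expected_margin = team.rating - opponent.rating
    return goal_margin - expected_margin  # >0 overperform, <0 underperform
```

The point of the sketch: `performance_vs_expectation` can still be computed against an opponent whose rating isn't displayed, which is why a game history can show an over/underperform against a currently "unrated" team.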
You realize this is the same thing that I described. It's just broken out by threshold. Below the threshold it's an educated guess. Above the threshold it's a reasonable prediction.

I don't think you understand data.

I do think that you're learning about data through the ranking app. Because of this you view everything through the lens the ranking app provides.
 
I don't think you're reliable at fairly representing what you've already stated. Nor do I think you understand data as well as you think you do.

You've defined the threshold, for yourself, at some level between teams that have played each other often, and teams that have played each other less often. You've said that below that threshold it's an educated guess, while above that threshold it's more accurate. You used the example of SoCal ECNL teams to state the case (which is another misrepresentation, but we'll save that for another time). This is wrong-headed thinking. There's logic in it - it just turns out to be wrong.

The threshold for a prediction to be reliable is whether a team has a rating. That's it. The prediction will fit into the predictivity expectations as shown repeatedly above.

The ranking app provides plenty of examples to show people's gap in understanding of probability & statistics. It's why Mark has stopped even talking about it in detail (the calculations, predictivity, etc.) on social media, as it always devolves into people not being capable of understanding even the most basic tenets.
 
I'm seeing a team that was ranked in the 30s-40s a week ago jump to 7th within a week, after a tournament against an older age group in the 3rd flight.
This can happen, without it being a function of the team playing up. When I see a huge swing like that, I check back a week later and oftentimes the team has settled back down quite a bit. Each time the algorithm is run it adjusts things, and you can't get too excited (or too depressed ;) ) about that initial jump (or fall).
 
I understand what you are saying but blowing out flight 3 teams in the older age group should not give you that kind of boost in your own age group ranking.
Flight 3 means nothing to the algorithm. All that matters is the opponent's rating (the four-digit number on the far right) and how many goals they won by.
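As a toy illustration of that point (an Elo-style nudge, not SR's actual formula; the update rate and rating numbers are made up), the only inputs are the two ratings and the goal margin:

```python
K = 0.10  # made-up update rate; SR's real parameters aren't public

def update_rating(team_rating: float, opponent_rating: float, goal_margin: int) -> float:
    """Nudge a rating toward whatever the actual goal margin implied.

    Only two inputs matter: the opponent's rating and the margin.
    'Flight 3', age group, league name, etc. never appear here.
    """
    expected_margin = team_rating - opponent_rating
    surprise = goal_margin - expected_margin   # how far reality was from the prediction
    return team_rating + K * surprise

# Beating a weaker team by exactly the expected margin barely moves anything:
# update_rating(52.0, 48.0, 4)  -> 52.0   (no surprise, no change)
# update_rating(52.0, 48.0, 8)  -> 52.4   (4-goal overperform, small bump)
```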
 
By flight 3 I meant to say teams ranked around 200th. My point is a club's A team in the younger age group beating a flight 3 level team one year older should not boost their ranking so much. Why? Because nearly every A team in that younger group would have beaten that same flight 3 level team by the same margin.
 
Alright, you two. Put your dicks away. I'm pretty sure you two are saying the same thing: the more data received over time, the more likely the prediction will have a higher success rate.
 
No, we're not saying the same thing. He thinks we are, too.

Without any context, more data is certainly better than less data. But there is a threshold - once past that point, more data does not provide a higher success rate.

Flip a coin 100 times in a row and get heads every time. For the 101st flip, what are the chances you will get heads again?

Same with a roulette wheel that keeps coming up red. Sure seems like betting on black is the right thing to do - but the odds of that spin are the same as every other.

Once the model has enough data to make the prediction reliable, within the confidence value they have set, adding more game result data doesn't really do much for the predictivity afterwards. Sure seems like it should - but at the point that the team entity rating has a confidence value they are comfortable with, and it is displayed to the user - it is already within the bounds of the model's predictivity. If the team played 100 more games - it's not like the prediction gets any better - it's still highly dependent on the most recent info (nowhere near 100 - it's a lot closer to 10).
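A quick way to see the diminishing-returns point is how the uncertainty of a simple average shrinks with sample size. This is a rough sketch that treats each game as one noisy observation of a team's "true" strength - a big simplification of whatever SR actually does, and the noise value is invented:

```python
import math

# Standard error of a mean shrinks like 1/sqrt(n): the first handful of games
# does almost all of the work, and each additional game buys less and less.
sigma = 1.5  # assumed per-game noise in goal margin, in goals (made-up value)

for n in (2, 5, 10, 20, 50, 100):
    print(f"{n:3d} games -> uncertainty ~ {sigma / math.sqrt(n):.2f} goals")

# Output:
#   2 games -> uncertainty ~ 1.06 goals
#   5 games -> uncertainty ~ 0.67 goals
#  10 games -> uncertainty ~ 0.47 goals
#  20 games -> uncertainty ~ 0.34 goals
#  50 games -> uncertainty ~ 0.21 goals
# 100 games -> uncertainty ~ 0.15 goals
# Going from 2 to 10 games tightens things more than going from 10 to 100 does.
```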

People have glossed over it - but check out the provided statistics on predictivity above. The model is slightly more likely to pick the correct winner between two teams in different states, than teams that are in the same state. Seems like two teams playing each other often would be more predictive than two teams that rarely play each other - but this is where our intuition can fail us.
 
Okay... obviously your roulette or coin flip example is correct. It's not taking into account how many goals are scored, how many shutouts, how one team fared against a certain opponent over the other, what tournament, when the tournament was, who the coach is, etc. It's not like flipping a coin. There are different variables in data that can be used in an algorithm to make success rates higher over time.
 
It's the same concept - follow it through. Once it has enough information in order to set the rating - adding additional information doesn't make the rating any better. That doesn't mean it's "perfect". It just means that additional similar information doesn't make it any more reliable. What keeps it as reliable as it can be is continuous incoming information, and as game data gets progressively older, eventually it is old enough that it is no longer predictive enough to make a difference for the current rating - the rating is dropped; the team becomes unrated/unranked.

All of the millions of factors that can cause a soccer game to end a certain way can certainly affect the score of a game. But once all you're looking at is just the scores alone - and using just those scores (and the relationship between them) to predict future scores - none of the other factors matter a whit. It just confuses the issue to try and factor in how they might affect the rating itself.

Now if you are looking at an upcoming game, you know that the SR rating for one team is 3 goals better than the rating for the other team - but you also have additional information that their 3 top forwards are not expected to be there - you certainly can color that 3-goal difference prediction and expect it to be a smaller gap, or even a loss. And you might be right - it might have been a better prediction, and SR had no knowledge of the additional data you were able to use to assist. It's not like any of that info "doesn't matter", of course it does. It's just that SR on its own doesn't need to take any of it into account - and even if it wanted to - it cannot. All it's ever looking at is the scores & relative performance difference between the two opponents.
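In code terms, the model's own prediction is nothing more than the rating gap; anything you know beyond the scores is an adjustment only you can layer on top. The rating numbers and the size of the adjustment below are purely illustrative:

```python
def predicted_margin(rating_a: float, rating_b: float) -> float:
    """All the model ever uses: the relative strength implied by prior scores."""
    return rating_a - rating_b

# Model-only view: team A is expected to win by about 3 goals.
model_view = predicted_margin(54.0, 51.0)   # 3.0

# You happen to know A's three top forwards are out. That knowledge lives
# outside the model, so the adjustment is yours alone (guess: -2.5 goals).
my_view = model_view - 2.5                  # 0.5 -> essentially a toss-up
```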
 
What keeps it as reliable as it can be is continuous incoming information, and as game data gets progressively older, eventually it is old enough that it is no longer predictive enough to make a difference for the current rating - the rating is dropped; the team becomes unrated/unranked.

So what happens if you weight games against similarly ranked teams more heavily among the 20 games used for defining a ranking?

And just for fun, let's say a small group of highly ranked teams only played each other for the most part.

Would this result in an artificially higher ranking than a ranking that wasn't weighted by ranking similarity?

The answer is yes, it would/does. Just to make it simple, say almost every result against similarly ranked teams was at the expected number of goals, except one game against a not-similarly-ranked team which was below the expected number of goals. Because the similarly ranked results are weighted higher, the poor result would get "flushed" out of the data used to define a ranking faster. Now that you understand why this is happening, imagine it happening over and over across multiple similarly ranked teams that only play each other.
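Here's a small sketch of the mechanism being described: if a rating is built from a window of recent results and games against similarly rated opponents carry more weight, one off result against a dissimilar opponent contributes little and gets "flushed" faster. The weighting function and numbers are invented purely to illustrate the claim - this is not SR's formula:

```python
# Toy model: rating adjustment as a weighted average of recent per-game
# over/underperformance, weighted by how close the opponent's rating was.

def similarity_weight(own_rating: float, opp_rating: float) -> float:
    gap = abs(own_rating - opp_rating)
    return 1.0 / (1.0 + gap)  # closer rating -> bigger weight (made-up form)

def weighted_overperform(own_rating: float, games: list) -> float:
    """games = [(opponent_rating, goals_vs_expectation), ...]"""
    num = den = 0.0
    for opp_rating, performance in games:
        w = similarity_weight(own_rating, opp_rating)
        num += w * performance
        den += w
    return num / den

# Nine on-expectation results against near-equal opponents, plus one 2-goal
# underperformance against a much weaker opponent:
games = [(50.0, 0.0)] * 9 + [(42.0, -2.0)]
print(weighted_overperform(50.0, games))  # ~ -0.02, versus -0.20 unweighted
```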

Add in that when two highly ranked teams play each other, the expected number of goals will be low for both teams. This makes hitting the expectation more likely - a one-goal prediction is more likely to come true than a five-goal one.

Just to be 1000% clear, I'm not saying that a highly ranked team wouldn't be highly ranked without similar games being weighted. I am saying that during the time a highly ranked group of teams only play each other (SoCal ECNL league), the rankings do get pushed up a bit. If you're thinking you can write off other highly ranked teams that don't play in a feedback loop, it would be a mistake. The effect only lasts until those teams first start playing outside of league, in something like playoffs. Once results of the first round of playoffs are added into the ranking app, it would quickly update and the effect I'm talking about would go away.
 
I understand what you are saying. You're articulating very clearly what you believe to be true. It just turns out to be not nearly as significant as the weight you're giving it, at least not for the reasons you're describing.

First - there is no such thing as an "artificially higher ranking" due to any weightings or calculations on any of the game results. It's not like adding butter to a recipe to make it taste better. Each and every portion of the ratings calculation is there to optimize future predictivity - that's it. If they are weighting games against teams that are closer in rating more heavily than games against teams that are further apart in rating, it's not because of some perceived "fairness" that needs to be baked in. It's not determining that a 4-goal overperform is only 1.5 times as impactful as a 2-goal overperform because they feel it should be. It's not determining that a 2-week-old result is 3 times as impactful as a 12-week-old result based on feel (all parameters here are just my guesses). It's because creating the ratings with the parameters as they are right now is what they have empirically found to produce the highest number of successful predictions. Full stop. The rating a team has is how it's expected to perform at its very next game, always.
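The "parameters are whatever predicts best" idea is easy to picture as a back-testing loop: try candidate parameter values, score each one on how often it would have picked past winners, keep the best. Everything below - the synthetic data, the Elo-style update, the candidate values - is invented to show the shape of that process, not SR's actual fitting procedure:

```python
import random

random.seed(0)

# Synthetic history: 200 games between 8 teams with fixed "true" strengths.
true_strength = {t: random.uniform(40, 50) for t in "ABCDEFGH"}
teams = list(true_strength)
games = []
for _ in range(200):
    a, b = random.sample(teams, 2)
    margin = round(true_strength[a] - true_strength[b] + random.gauss(0, 1.5))
    games.append((a, b, margin))

def accuracy(k: float) -> float:
    """How often the pre-game favorite actually won, for update rate k."""
    rating = {t: 45.0 for t in teams}
    correct = total = 0
    for a, b, margin in games:
        predicted = rating[a] - rating[b]
        if abs(predicted) > 0.5:                  # only count games with a clear favorite
            total += 1
            correct += (predicted > 0) == (margin > 0) and margin != 0
        surprise = margin - predicted
        rating[a] += k * surprise                 # nudge both ratings toward the result
        rating[b] -= k * surprise
    return correct / total

# Keep whichever update rate would have predicted the most past winners.
best_k = max((0.02, 0.05, 0.10, 0.20, 0.40), key=accuracy)
print(best_k, round(accuracy(best_k), 3))
```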

The ratings of SoCal ECNL aren't "pushed up a bit" because the teams are close in rating. First off, the premise is wrong. Over their full schedule, the teams in that bracket are playing opponents that are often as far from them in SR as the opponents in pretty much any other bracket. Yes - at the pointy end they have ratings that are among the top in the country. But from mid-table to bottom, they are still a ways off. Just looking at 2011G, the top team is over 6 goals stronger than the bottom team. In 2009G they are a little closer, with top team to bottom team spanning 4 goals, same for 08/07G. Their SR ratings aren't "artificially" higher or lower than they should be, compared to any other hypothetical context we'd want to compare them to. As stated above, their ratings are an expectation for how they will perform in their very next game. Like every team. All of them in the database.

When one of these described teams has a result that doesn't follow what SR would predict - that new result is baked in as soon as it's loaded. Whether it's an identical team from down the street with an equivalent rating, or a team that is 8 goals different in rated strength, it can affect the rating. Yes - the farther apart the two ratings are going in, the less weight is applied to that result. Again, not because of "fairness" or help to highly rated teams (or to further punish teams that have very low ratings), but because those results have been determined to be less predictive of future results.

If the results of playoff games turn out to be incorrect predictions - the teams' ratings are adjusted with every game, to bring the model's predicted results closer to actual results - just like for any game. The weighting doesn't change for a playoff game, as the model has no notion of a playoff game being any different from any other game. The kicker is that it doesn't matter that the teams have never seen each other, or played on different sides of the country. It's not like the ratings are off and have to be normalized by the games being played - they are already as "correct" as they always are. But at playoff time - yes, the overall predictivity goes down a few percentage points. Only the better teams are playing, the total number of goals goes down, and randomness has a higher effect on each game outcome.
 
You like to make absolute statements without anything to back them up. I've gone through the math and what I described is occurring. You can try to minimize it by saying things like it wouldn't make that much of a difference, etc. But the reality is it's occurring.
 
I understand what you are saying. You're articulating very clearly what you believe to be true. It just turns out to be not nearly as significant as the weight you're giving it, at least not for the reasons you're describing.

First - there is no such thing as an "artificial higher ranking", due to any weightings or calculations on any of the game results. It's not like adding butter to a recipe to make it taste better. Each and every portion of the ratings calculation is to optimize the future predictivity - that's it.
Is it possible that the weighting Carlsbad7 is focused on optimizes the future predictivity overall but in doing so sacrifices some predictivity for subgroups that tend to play more games against similarly-rated opponents? I ask because in the stats you cited above the model is significantly less robust for the top 100. I assume that if they could tweak the model to make the top 100 predictions more accurate without sacrificing accuracy overall (or elsewhere) they would. Maybe the weighting is part of that trade-off -- helps the model overall but hinders it in other respects?
 
No, it's not like that.

You could have a group of teams ranked around 200 that only play each other, and they would experience the same effect. The issue is similarly ranked teams only playing each other, and how the ranking app weights games between similarly ranked teams.

The reason why it's an issue with the highest ranked teams is that there isn't any higher to go, so the effect becomes more visible. Once teams start playing teams outside of their group, the effect goes away.
 
You like to make absolute statements without anything to back them up. I've gone through the math and what I described is occuring. You can try to minimize it by saying things like it wouldn't make that much of a difference etc. But the reality is it's occuring.
You like to ascribe patterns and reasoning where there may be none, and misunderstand what the math actually represents.

Is it possible that the weighting Carlsbad7 is focused on optimizes the future predictivity overall but in doing so sacrifices some predictivity for subgroups that tend to play more games against similarly-rated opponents? I ask because in the stats you cited above the model is significantly less robust for the top 100. I assume that if they could tweak the model to make the top 100 predictions more accurate without sacrificing accuracy overall (or elsewhere) they would. Maybe the weighting is part of that trade-off -- helps the model overall but hinders it in other respects?

Assume that the weightings right now are optimized for the full data set, to keep the predictivity as high as it can possibly be (a good assumption, that's the primary intent). If instead they looked only at the top 100 nationally and discarded the rest, it's certainly possible the weightings would be somewhat different to make those predictions more accurate than they are now - while making overall predictions less accurate. So could they bifurcate the weightings and apply them differently depending on which group a team entity currently sits in? I don't know, but I certainly think it's possible. But my impression is that it wouldn't make much of a difference. The top 100 doesn't have slightly less predictivity (3 of 4 correct predictions compared to 5 of 6) because of inherent differences in the teams. It seems to be because the game outcomes are always closer and fewer goals are scored. SR has stated that this closeness allows simple randomness to play a bigger part in game outcomes, which ends up with lower predictivity. I can ask them for their opinion, and whether they've considered it.
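Whether such a trade-off exists is ultimately an empirical question: score the same set of predictions twice, once over everything and once restricted to matchups where both teams are in the top 100, and see how a candidate weighting moves each number. A sketch of just that bookkeeping, with made-up data:

```python
def bucket_accuracy(predictions, top100):
    """predictions: (team_a, team_b, predicted_winner, actual_winner) tuples."""
    def score(rows):
        return sum(pred == actual for _, _, pred, actual in rows) / len(rows)
    elite = [p for p in predictions if p[0] in top100 and p[1] in top100]
    return score(predictions), score(elite)

# Hypothetical results, just to show the two numbers can differ:
preds = [
    ("A", "X", "A", "A"), ("B", "Y", "B", "B"), ("C", "Z", "C", "Z"),
    ("A", "B", "A", "B"), ("B", "C", "B", "B"),
]
print(bucket_accuracy(preds, top100={"A", "B", "C"}))  # (0.6, 0.5)
```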
 
SR has stated that this closeness allows simple randomness to play a bigger part in game outcomes, which ends up with lower predictivity. I can ask them for their opinion, and whether they've considered it.
Someone over there was close to their keyboard today - I got a response almost immediately. :) They don't believe there is any extra signal to be gained. It's as stated above: games between the top 100 are closer, with fewer goals - an optimized model is pretty much already there.
 
It's the same concept - follow it through. Once it has enough information in order to set the rating - adding additional information doesn't make the rating any better. That doesn't mean it's "perfect". It just means that additional similar information doesn't make it any more reliable. What keeps it as reliable as it can be is continuous incoming information, and as game data gets progressively older, eventually it is old enough that it is no longer predictive enough to make a difference for the current rating - the rating is dropped; the team becomes unrated/unranked.

It's not the same concept. No one said it was perfect. The data will improve its predictability over time to a certain extent. How much it will improve is the actual question. The last 3 games my son played, the AI was correct 100% down to the goals. If you think it's the same as flipping a coin, good for you. Good luck.
 
You're either intentionally or unintentionally misunderstanding what I'm saying. The flipping a coin example is to show how adding even unlimited amounts of data - may not make a better prediction for what's going to happen on the very next event. It still is going to be one of the two possibilities - and there's a 50/50 chance. The corollary is that once you've flipped enough coins, you have a very good idea about the possible options and the relation between them. It's not very many. Any more isn't going to buy you much in determining potential outcomes.

The SR model is doing the same thing. It's looking at tens of thousands of matches, and using the data to see what happens in the latest match. It's more complex than flipping a coin - but it's the same thing. It's just a huge matrix of inputs to compare with a matrix of results. Very early in, you're going to land on potential & likely possibilities, and soon after - adding additional inputs doesn't change much in determining likely potential outcomes. It's not 50-100 games. Heck, Mark has shared that they're not even looking back further than the last 20 games, and 5-8 games is usually enough to meet their statistical goals and post the rating publicly.

It's not AI. But it is a pretty fantastic model for predicting youth soccer game results based on previous results. I smirk when a team goes into a game with a certain score expectation, and it's way off until almost the end of the game, yet somehow it seems to get close. I enjoy it more when something unexpected happens, like last weekend when one of our teams was expected to have a weekend with a pair of pretty significant losses against highly ranked teams, yet walked away with a draw and a 1-goal loss.
 