Youth Soccer Rankings?

First - let me say congratulations on what looks to have been a fantastic start to this season. The team is clearly playing very well, and has become the standout team in their league. All of that should be a much more important realization than any of this silly rating discussion.

I guess in the end the stakes are higher for these two teams when they play lower-rated teams, should they fail to beat their predictions. I think those goal differentials are more achievable in 9v9, but not so much in 11v11 once these teams get their bearings playing on a big field.

Maybe it would help to think about it slightly differently. The current rating of the team (any team) *is already* a reflection of how they've played in the past. Their exact performances in the past, the goal differentials against teams of certain ratings, all of the previous team results - this is reflected in their current rating. To keep their current rating - they essentially have to do nothing more, and nothing less, than they've already done. Looking at the current predictions in the future and thinking "that will never happen, they are too onerous" is too negative - those predictions are based on what has already happened looking backwards.

Now if the goal is to improve the current rating, and take a team for example from 41 to 43 within a certain time period - the math is pretty simple. For those same expectations, they need to overperform by about 2 goals every time (on average). If they can do that for a month or two - that's exactly what the rating would show. And if they have injuries or other challenges and they underperform by 1 goal (on average) for a month or two, their rating should go from a 41 to a 40. Whether it's done by ensuring the blowouts come in exactly on target, or a close game becomes a little less than a close game, the benefits (or penalties) would be the same. The interesting part is by the end of that journey, the predictions will start to take into account the new performance, and then the team will need to perform at those new expectations going forward, to keep that improved rating.
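To make the arithmetic above concrete, here is a toy sketch. The function name and the simplifying assumption that a sustained average over/underperformance is eventually absorbed one-for-one into the rating are mine, not the app's documented math:

```python
# Toy illustration: where a rating settles once sustained over- or
# under-performance (vs. predicted margins) has been fully absorbed.
# "settled_rating" is a hypothetical name, not an app function.

def settled_rating(current_rating, avg_goals_vs_prediction):
    """Rating after an average surplus/deficit (goals per game versus
    the predicted margin) has been sustained long enough to be absorbed."""
    return current_rating + avg_goals_vs_prediction

# Overperform predictions by ~2 goals per game for a month or two:
print(settled_rating(41, +2))  # 43
# Underperform by 1 goal per game over the same stretch:
print(settled_rating(41, -1))  # 40
```

The point being: the target rating is reached not by any particular kind of result, but by the average gap between actual and predicted margins.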

I think it’s also kind of always been my gripe with the website and now the program that teams can play in lower leagues and hide in bottom bracket tournaments and it inflates their rankings because of goal differential.

This is a bit inflammatory, and no matter how many times these insinuations are stated as fact - I haven't been able to find any actual examples that show this occurring. I looked in Southwest, Texas, Florida, NY, IL, and I couldn't come up with a representative example of a highly ranked team (Top 25 nationally) with a limited game history against only weak opponents in their own league and an inordinate number of blowouts. There may be ones out there, and I'm hoping that someone can point me in the right direction. What teams or leagues are you referring to? But in this thread - Solar (ECRL and ECNL) are both counterexamples that show this not to be true. There is a ton of game history outside their own leagues that ensures the assigned ratings take into account all sorts of competition, which both helps and hurts from time to time depending on the opponent.

A cynic might say that Slammers RL are guilty of what you're accusing others of doing. Undefeated in league, 3 goals against them all year, yet the team isn't chasing enough worthwhile opponents to actually test their mettle in tournaments where they have a reasonable chance to lose. How many of these talented girls should be on an ECNL team instead once reaching 2009G?

I’m very familiar with how the ratings work, one higher rated team has a predictive value and another has their predictive value and that difference is goal differential. But a 2010 RL team rated 35 is predicted to lose to the top 2014 team. I can’t imagine a group of 8 year olds beating on 12 year olds.

There is a well-respected club near me that runs their youngers 1 year up all the way through U12 for their top team, and they are quite successful. In some tournaments they are entered 2 years up. Turns out that a very good 2013 team can in fact embarrass an average (or weaker) 2011 team. Watching a bunch of tiny sprites fly around the field against larger but often slower and less talented players can be fun. Once kids get much older than this, though, it starts to become a safety concern and the size/strength differences can't be ignored forever. I'm with you, 8 year olds playing 12 year olds would be a bit of a stretch. And validating whether a "35" at 2014 means exactly the same as a "35" at 2010 isn't something that can really be confirmed in the real world. The cross-play from age group to age group is just too long a chain of results to ensure there isn't drift between the two rating scales. So it's a bit of a moot point. We don't know whether the 35's mean the same in that comparison, and while we can debate it - it can never be settled authoritatively.

I think there should be some type of cap for rating (wins, losses, goal scored, goals against, age, and difficulty of bracket)

You, and I, and probably most people other than Mark and any of his team members/helpers, have almost no idea what goes into the algorithm(s) - other than that it's related to goals scored in prior games. The types of tweaking, caps, weightings, timings, and all of the other manipulations that can be done to past data to see if it better predicts future data is exactly what Mark has been doing for years. And continues to do in this app. What would you do if you had a system with millions of games, and the chance to run countless models on the data to see how well different factors and operations predict game results (that have already happened)? All of the hypotheses that one could come up with ("weight last month's result 1/3 less than this month's, or weight it 1/2 less"; "drop results older than 6 months to .3 weighting vs. .2 weighting"; "refactor goal differential so 5-goal differences are only slightly higher than 4-goal differences to minimize blowout effect") are the types of things you get to optimize for when you have the data. And all of that was massaged over time at YSR, but now is even more possible with the app. Just look at the predictions - there is now an estimate for what the score is likely to be. There are percentages given for chance of win / tie / loss. These don't come from a random number generator - they are produced on the back of a ton of past data on team performance.
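For flavor, the kind of tuning loop described here can be sketched in a few lines: try several candidate recency weightings and keep whichever best predicts a game that already happened. Everything below - the data, the weighting scheme, the candidate decay values - is an illustrative assumption, not the app's internals:

```python
# One team's past games, oldest first: (opponent rating, goal margin).
# Each game "implies" a rating of (opponent rating + margin achieved).
history = [(40.0, +1), (42.0, -1), (41.0, +2), (39.0, +3)]

def rate(history, decay):
    """Weighted implied rating, with recent games weighted more heavily
    (weight = decay ** games_ago). decay=1.0 means no recency weighting."""
    n = len(history)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    implied = [opp + margin for opp, margin in history]
    return sum(w * r for w, r in zip(weights, implied)) / sum(weights)

# Held-out next result: a 0-0 draw with a 42.0-rated opponent implies 42.0.
actual_next = 42.0
for decay in (1.0, 0.7, 0.4):  # candidate weightings to compare
    err = abs(rate(history, decay) - actual_next)
    print(f"decay={decay}: rating={rate(history, decay):.2f} error={err:.2f}")
```

With millions of games instead of four, this is the shape of the experiment: the weighting that minimizes prediction error on games that already happened wins.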

Is any of it ever going to be smart enough to predict the future with no variation - of course not. The real world doesn't work like that, for reasons that are obvious to all. But the griping about a simplistic algorithm or bad results feels off-base - the griping should instead be about whether the predictions turn out to be off or unusable on a regular basis. If someone has a better way of looking at team data on their own and predicting future performance more effectively than this - prove it to yourself! Use the data you have, look at the upcoming games, and document your predictions. If it turns out that the app isn't any better than what's possible without it - save the $10 and don't give it a second thought.
 

You have to remember Slammers had 2 ECNL teams and 2 ECRL teams per age group. These are the girls who didn't get picked for either ECNL team. So there's no sandbagging on this one, just a lot of talent at Slammers and not enough appeal for players to go elsewhere. The team did win Surf Cup and, as the 3rd Slammers team, was placed in their appropriate bracket. The subsequent games were competitive, including the Beach match that was a 1-4 loss. These teams have had closer games when they played each other previously, but I digress.

As for Solar RL, I think there's plenty of data to show they should drop off, with 4 losses to teams rated significantly lower than them and a series of underperforming match outcomes, yet they remain very highly ranked as an RL team. But either way, time will let everything play out and stabilize.

We're looking at a transition year moving into 11v11, and some ratings are not reflective of the current team because of that transition, as well as the roster changes that normally come with moving to 11v11 and new leagues.

I don't know how much historical data weighs versus current results, but there are teams that are stacked high because of their matches from nearly 2 years ago (Slammers included). What I'd like to know is: when we are rating teams, are we scoring them based off what their rating was at the time they played, or what their rating is now?

An example is Slammers McCarty Black, which was highly rated (top 13 in SoCal, top 25 in CA) when it was 9v9, but that team no longer has that roster (95% of the team went to NL/RL teams). So naturally their ranking drops, but for the games when they played higher-rated teams, is the algorithm accounting for their rating at that time rather than the current rating? Because I'm seeing the matches between that team and the current team showing as a deduction in points, even though at the time, the predictive value probably would not have changed either team's points value.

Don't get it twisted, I'm a supporter of Mark and what he's doing, but these are questions that people are asking.
 

All of the game history is assigned to a team entity, and the rating of that team entity is affected by every game as it happens. To see what ratings are affecting a team, it's just the listing of all games on the team page itself, with the links to the source data included at the bottom. From the app's perspective, a team is just that - a collection of game data. As game data is added (either one game at a time, or in bulk as a new data source from a tournament or even a league is assigned to a team entity), that new data is used to adjust the team rating. There is no concept of roster changes, 9v9 vs 11v11, or anything else. When a new game result is added, all that is necessary is to tie this team entity to the opponent team entity, and to see if the score of the match is greater or less than the predicted score of the match, which is the difference in ratings between the two entities. If the opponent isn't rated at all, the result seems to sit there dormant, not affecting anything, until or unless the opponent ever does become rated. That's really it.
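The update mechanism described here can be sketched as a generic Elo-style rule: predicted margin equals the rating difference, and each team's rating moves by a small step times the surprise. The step size K and the linear form are assumptions for illustration - the app's actual constants and weightings are not public:

```python
K = 0.05  # assumed learning rate, not a known app constant

def update(rating_a, rating_b, goals_a, goals_b):
    """Adjust both ratings based on actual vs. predicted goal margin."""
    predicted_margin = rating_a - rating_b
    actual_margin = goals_a - goals_b
    surprise = actual_margin - predicted_margin
    # Overperformer gets bumped up a smidge, underperformer bumped down.
    return rating_a + K * surprise, rating_b - K * surprise

# A 41-rated team beats a 40-rated team 3-0 (a ~1-goal margin was predicted):
a, b = update(41.0, 40.0, 3, 0)
print(round(a, 2), round(b, 2))  # 41.1 39.9
```

Note that a win that merely matches the predicted margin produces zero surprise and leaves both ratings where they were - which is the "keep doing exactly what you've done" point made earlier in the thread.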

Where some of the opaqueness comes in is trying to figure out what happens when old data is assigned to a new team. If I realize that last year's team X in this tournament is actually the same as team XX that I'm looking at, and I pull in 4 games from February 2022, how does that affect the current rating? Basically: does it compare what the ratings were for both opponents back in February 2022, account for that new data, and then come back to present day to calculate the current rating? Or is this new data applied to the current rating, but weighted down very significantly because it's 8-month-old data anyway? My hunch is that it's the latter. Otherwise there would have to be a daily history of ratings kept for each team forever - and that seems unlikely from a data management standpoint. Especially since we know the app is calculating ratings in real time as soon as new data is added; there isn't a daily batch process. We know that some history is kept for the rankings graphs to be plotted - but it's unclear whether the rankings are calculated once and then static, or whether those rankings are variable because past ratings are variable.
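One simple way the "old data is heavily discounted" hypothesis could work is an exponential decay on game age. The half-life here is a made-up illustration, not a known app parameter:

```python
HALF_LIFE_DAYS = 90  # assumed: a game's influence halves every ~3 months

def age_weight(age_days):
    """Weight applied to a game's influence, decaying with its age."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

# A game from this weekend counts at full strength; the 8-month-old
# result from the example above barely registers:
print(round(age_weight(0), 3))    # 1.0
print(round(age_weight(240), 3))  # 0.157
```

Under a scheme like this, no daily rating history is needed: newly attached old games can be folded into the current rating with a weight so small that the effect is minimal, which matches the "hunch" above.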

What I do feel is that older game data may matter less than one would think. We had significant team history issues in the migration from GotSoccer to GotSport. Team history between several teams in the club got munged together and then actually assigned to the wrong team when entered into GotSport. It went on long enough before being noticed that, by the time it was, it actually became easier to just leave them as is (initially wrong) than try to sort everything back out. This not only hosed GotSport rankings for a while, but affected the teams' ratings in YSR (and now the app). But after a few months of new and accurate game data, the rating for our teams shot up significantly (5 points in less than a year). Moving a rating 1 point can really be done in just a few weeks of games. A team that performs at level X for 6 months just isn't going to sit at a rating of X+2 because of what happened a year ago or earlier; the ratings are much more fluid.

I'm trying to make sure I'm looking at the same team you are referring to. Is it CDA Slammers North McCarty Black Whittier - the one playing in the SOCAL Fall League, most recently losing to Chelsea SC Langsford 1-2 on 10/30? It looks like it's now 115 in state. The game results look pretty spot on going back all the way to May; I count 32 games, in 28 of which they performed as expected. In 4 of them they underperformed. You have to go all the way back to January before finding any games where they overperformed enough to affect their rating much positively. But back to present day - they currently show a 38.07, and that's the rating that would be used to predict a game tomorrow, regardless of any history.

I see that there is a "CDA Slammers Whittier McCarty Drop" team entity in the Unranked teams area. It has game data assigned to it as recent as February 22, at the SoCal State Cup G2010 Super. Earlier data includes the 2021 Silverlakes Fall Showcase, the 2021 Players Challenge Cup, and some more. None of this game data is being used to rank any team at all right now, as it's not assigned to any team that has played in the last 7 months. Does this represent the older team you're referring to?
 

Yes, that team dissolved right after their last match with Beach FC Ayala. The coach pulled the team out of State Cup for a variety of reasons. That group of girls became the Arsenal ECRL team for a few months before there was a coaching change, and the team dissolved thereafter, with the girls now spread amongst various teams (Blues NL, Slammers RL, Pats NL, Arsenal NL). However, that team name lives on: his flight 2 team, McCarty White, is now using the McCarty Black team name (signifying that this is the top team from the branch).

The Arsenal 2011 Pre-ECNL team is now the 2010 Arsenal ECRL team (and despite trying to merge these two, they continue to be separated).

So back to the prior results: you can see the Slammers North team performed very well until February 2022, at which point the roster was no longer the same (only 1 player from that roster is still on the team). But when it was performing, the rating was higher. You can see the same type of thing happen with other teams (Legends ECNL was ranked lower than McCarty's team at that time, so those victories improved Legends' ranking when they were played; if you look today, however, they are actually bringing Legends' ranking down).
 

Here's a screenshot of what the rankings were in January 2022, back when McCarty Black had its original performing roster and was ranked higher than the Legends ECNL team at that time. Today, Legends is being penalized for their 3-1 victory over that Slammers Whittier Black team.
 

I'm looking at the Legends FC ECNL history that you attached, and I see what you mean about some of the games back in the April 2021 time period. They show beating the Slammers North McCarty Black Whittier team (3-1), and a draw (1-1). At the time, if Legends were higher, it would have helped their rating. But now the ratings are reversed, and it looks like that loss is instead hurting their rating. It's a fair question - and it's not impolite to ask Mark directly for his thoughts on how this is accounted for or dealt with. My hunch is that it simply doesn't matter that much, as games older than 1 month, 2 months, 3 months, 6 months, 12 months, get discounted in weighting so much that the effect on the present rating is minimal if it's even noticeable.

This is testable, if you are curious and want to do it with a team you have permission to tweak ratings for. Just delete all source data from over a year back, and see how much (if any) the rating changes. Any change will show immediately; ratings are calculated on the fly when source data is added/removed. Then just add that source data back to the team so there is a fuller game history once again.
 

Mark’s a busy guy. I try not to bother him unless it’s necessary. I’m sure he gets plenty of emails. He’s usually very prompt when I’ve reached out in the past, and I’m sure he’ll iron this all out.
 

This should be fixable with a few clicks (by either of us, or anybody else who can confirm the team info). I can find the Arsenal 2011 Pre-ECNL team.

Arsenal 2.jpg Arsenal 1.jpg

But I can't find anything that looks like a 2010 Arsenal ECRL team. There is nothing named "Arsenal" in 2010 Girls in either Ranked or Unranked teams that looks to be what you are describing. Can you share any link to where they are currently playing, or any other info that might help troubleshoot why they aren't showing up?
 
Sporting California USA = Arsenal
 
Wait, I understand what you mean now. Apologies. The team isn't called Arsenal at all now. Is this the team? And it should have the Arsenal 2011 team data tied to it?

sporting california.jpg
 

OK - here are the two sets of game sources:

2011 team: 2011 Sources.jpg

2010 team: 2010 sources.jpg

The 2011 team played as recently as 6/22. The 2010 team played as far back as 3/22 on their own. It looks like these teams both entered the 2022 Classic as two separate teams (5/21 - 5/22). From this data alone, it shows that one team didn't cleanly become the other. Some of their game history is clearly unique. If nothing changes going forward, the 2011 version of the team will just eventually time out of the rankings, and the 2010 version that is continuing to play, will continue to have games added to their own game history. If there are individual events (like for example, the Swallows Cup 2022 from 6/18) that were really the 2010 team and should be pulled over - it's just a click to do so. For what it's worth, it looks like they were playing against exclusively 2010 teams in that tournament (Slammers 2010 twice, Pateadores 2010, CFA).
 
The team from April / May left. But according to Mark, the team name keeps the games. The history doesn't follow the roster, even though it's 100% a different team with different coaches (AG versus DO'b). The 2010 RL team was taken over by the 2011 pre-ECNL team; I think they started playing around August, maybe late July. This is why you don't see any SoCal games from this season for that 2011 team.
 

A team is just a collection of game results, that represent the performance of that team. When teams change, the results can stay with the changed name, or they can stay with the original name, especially if that original team gets rebooted with the same name. But in general - as long as people are happy that the results are tied to who people can agree the team actually *is*, any option that represents that best is fine. Here's info from the FAQ for the app.

Team History Results

The coach wants to keep the black team history together and we fixed the team history to reflect that. This comes down to the philosophical question "what is a team?". Is it the group of players or the name of the team? Our policy is that if all concerned are happy then we will move history from an old team to a new team. However, if anyone objects then the team name defines the team and results with the same name will be grouped together.


Say that the Springfield Purple Puppies all pick up and move to a new club, the Smallville Raging Butterflies. There is no issue with the Butterflies adding (and keeping) all of the game results from the Puppies. Or they can decide not to, if they feel the team has changed enough (whether just roster, by name, or whatever). If the Springfield club reboots the team name, and has a new group of Purple Puppies, it's OK for them to add (or keep) the game results that were assigned to the previous incarnation. That would be OK as well. All Mark is saying is that if there is contention, and both the new Raging Butterflies, and the new/old Purple Puppies have any issue with how things are being assigned, the "winner" of that contention is the one with the same name, and the Purple Puppies would keep game history that was named Purple Puppies.

Since in Arsenal/Sporting's case, according to the info here, the whole team moved over, the game data shows the Coach is there, and the old team did appear to stop playing and become the new team - there is no "old" team to complain about results being assigned properly. So based on the info you provided, I moved the results over to Sporting California USA RL (2010 Girls).

sporting data sources.jpg

Turns out that this helped that team's rating/ranking quite a bit; they immediately jumped from 112th in state to 88th in state (for 2010G).

sporting new rank.jpg

Their most recent performance in RL appears to be underperforming as compared to that new rating (judged by the 4 "red" games, no green, and rest black), so over time the rating will continue to adjust to whatever the current performance is as new games are added each weekend.

sporting recent RL results.jpg
 
I wanted to come back and bump this thread with some new information about the Soccer Rankings (SR) app. I weighed the options of just starting a new thread, but figured it might make more sense to have the information consolidated here where there has already been so much discussion about the ratings/rankings/algorithm/etc.

So today Mark made a pretty incredible discovery, and I'm giddy because it was at least partially based on a suggestion I gave him. But before I get there, a little background might be helpful to ground the discussion. First off, the way this system works is pretty well known and well described at this point, at least to folks who frequent this board. Game data is pulled in from various electronic sources and assigned to a team entity. If a correct team entity for the data can't be identified, a new team entity is created. Rinse and repeat, continuing to add game results to each entity. If a game result has a rated team on the other side of it, the rating for each team is adjusted based on the new result. The ratings of the two teams are compared, and if the actual goal difference is more than expected given the existing ratings, the one who overperformed has their rating bumped up a smidge. If the goal difference is less than expected, the one who underperformed has their rating bumped down a smidge. If the goal difference is pretty much spot on with what was expected, neither team's rating will move much at all. (More details on this are up on the FAQ for the app.)

There are a couple outcomes of these ratings, but essentially they are useful for predicting what is going to happen when two rated teams compete. Those predictions can be used to flight tournaments, choose proper league brackets, or as a fun prediction for how an upcoming weekend may be expected to play out. Now these predictions are never going to be 100% accurate (right every time) or 0% accurate (wrong every time); but the better the data, and the better the algorithm, the better quality the predictions can be. For definitions, Mark uses "predictive power" to express these same concepts. 0% predictive power means a coin flip (getting no better than 50% correct); 100% predictive power = god. You can convert predictive power to the % of results correctly predicted by dividing by 2 and adding 50%, so 70% predictive power would translate to getting 85% of predictions correct. In all of these trials, correct is defined as picking the correct winner, for games that result in a winner. If the wrong winner is chosen, it's a failure. Tie games are excluded from these predictivity results.
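That conversion between predictive power and percent-correct is just a linear rescale; here it is written out (helper names are my own, not the app's):

```python
def power_to_accuracy(power_pct):
    """Convert predictive power (0 = coin flip, 100 = perfect)
    to the percentage of winners picked correctly."""
    return power_pct / 2 + 50

def accuracy_to_power(accuracy_pct):
    """Inverse: percentage of correct picks back to predictive power."""
    return (accuracy_pct - 50) * 2

# 70% predictive power -> 85% of winners picked correctly;
# a coin flip (0% power) -> 50% correct.
```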

With this setup, the predictivity of the app isn't an estimate or a guess - it's a specific number that can be calculated as often as desired. Run through all the stored games in the database right now, compare the predicted results (using the comparative ratings) against the actual game results, divide the correct predictions by all of the games being predicted, and one number gets spit out. Turns out this number, as of today, is 66.7% predictive over all games, which translates into picking the correct winner of the soccer game 83.35% of the time. So as expected, it's way better than a coin flip, and will pick the right winner about 5 out of 6 times. This predictive number is a validation that the ratings derived from the algorithm themselves have a certain level of accuracy. If the ratings were wildly inaccurate, the predictive number would trend to 0%; if the ratings were supernatural, the predictive number would trend to 100%. But by any measure, the real, provable, actual predictivity number is pretty darned good (and better than a well-known other ranking system by more than 50 points - it's insane). For any skeptics that doubt that youth soccer can be ranked/rated, or even skeptics of this particular algorithm/ranking system, the predictivity number is what mathematically shows the expected probability - and it's an admirable number.
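The whole measurement fits in a short function. This is a sketch of the idea, not the app's code - the tuple layout and the higher-rating-wins prediction rule are my assumptions:

```python
def predictive_power(games):
    """Score predictions over stored games.

    games: iterable of (rating_a, rating_b, goals_a, goals_b).
    Prediction rule (assumed): the higher-rated team is picked to win.
    Ties are excluded, matching how the app scores itself.
    Returns predictive power (0 = coin flip, 100 = perfect).
    """
    correct = total = 0
    for ra, rb, ga, gb in games:
        if ga == gb:
            continue  # tie: excluded from the predictivity measure
        total += 1
        correct += ((ra > rb) == (ga > gb))  # did we pick the actual winner?
    accuracy = 100 * correct / total
    return (accuracy - 50) * 2  # convert % correct back to predictive power

# 4 decided games, 3 picked correctly (75% correct -> 50% predictive power);
# the tie is skipped.
games = [
    (41, 39, 3, 1),  # favorite wins: correct pick
    (40, 42, 0, 2),  # favorite wins: correct pick
    (45, 30, 5, 0),  # favorite wins: correct pick
    (40, 38, 0, 2),  # upset: wrong pick
    (40, 38, 1, 1),  # tie: excluded
]
```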

But that still isn't the interesting discovery. Here comes the interesting discovery. There is an intuition, even among proponents of this type of comparative ranking that uses goal differences, that the quality of the data (and the predictions) depends on how close the compared teams are to each other and how many shared opponents they're expected to have. The more interplay, the better - the less interplay, the more drift. I believed that to be the case, as it seemed reasonable. For example, if teams are in the same league, or same conference, or even same state, they play each other enough that their comparative ratings will be honed and sharpened by each other, and would have a higher predictive value. And conversely, if you're comparing teams that are not in the same league or same location, may have never seen each other before, and have few if any common opponents - it makes intuitive sense that their comparative ratings would drift a bit more, and would be somewhat less accurate. Remember, this actual predictivity, this quality of each prediction, can be calculated by looking at the existing data for games that fit into this category.

So what I suggested to Mark - and to be fair, he had also thought of it himself within the past few days - was that he should exclude all in-state games and measure the predictivity of interstate games exclusively. CA teams playing AZ, TX playing OK, or any other permutation in the country where the opposing teams are in different states. What this would do is measure how good the predictions are when there is very little shared information going into the upcoming game. Interplay is low. This represents what happens when you go to a big tournament elsewhere, as opposed to predicting what will happen with a local league game. He coded the query, ran the data, and a few hours later the number was spat out. It turns out that for these interstate games, the algorithm is 67.0% predictive, which translates into picking the correct winner of the soccer game 83.5% of the time. So all of the intuitive worry about drift, or local data being more refined than remote data, turned out to be a false intuition. The comparative ratings, even when used across different states, provide just as good (and in fact a teensy bit better) predictions as when they are applied to local/in-league contests. If a team has sufficient data to be rated, that rating can be trusted with or without extensive interplay. It's an incredible finding, and it validates all of the work and effort Mark and his team have done over the years to polish and refine the algorithm, tying game data to a useful rating.
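The same measurement with a population filter might look like this. Again, this is my own sketch with a made-up tuple layout - the real query obviously runs against the app's database:

```python
def subset_power(games, keep):
    """Predictive power over a filtered population of games.

    games: iterable of (state_a, state_b, rating_a, rating_b, goals_a, goals_b).
    keep:  predicate on (state_a, state_b) selecting the population to score.
    Higher-rated team is assumed to be the predicted winner; ties excluded.
    """
    correct = total = 0
    for sa, sb, ra, rb, ga, gb in games:
        if not keep(sa, sb) or ga == gb:
            continue  # outside the population, or a tie
        total += 1
        correct += ((ra > rb) == (ga > gb))
    return (100 * correct / total - 50) * 2

games = [
    ("CA", "CA", 41, 39, 3, 1),  # in-state: dropped by the interstate filter
    ("CA", "AZ", 40, 42, 0, 2),  # interstate, correct pick
    ("TX", "OK", 45, 30, 4, 1),  # interstate, correct pick
    ("TX", "OK", 38, 40, 2, 0),  # interstate, upset: wrong pick
    ("FL", "GA", 44, 41, 3, 0),  # interstate, correct pick
]

# Interstate games only: 3 of 4 picks correct -> 50% predictive power.
inter = subset_power(games, lambda sa, sb: sa != sb)
```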

And now to a real-world use, it looks like we're predicted to lose both games this Saturday with my youngest's team, so what's the leading recommendation to fill my thermos?
 
I wanted to come back and bump this thread...
Great post.

Anecdotally, I do think there is a socal under-score. It could just be because this is where we live, but I have noticed that while the predictions are largely true when playing intra-socal, when socal teams play out-of-state tournaments, you can bump up most of their scores by a point or two. It could also just be that socal MLS Next is overloaded at the top. Or maybe we just play better on the road... who knows.
 
Anecdotally, I do think there is a socal under-score.

I think there are a couple of factors here. The first is separating the idea of winning a game by more or less than expected from the idea of winning or losing the game. They are overlapping and related, but they are fundamentally different. The measured predictiveness is whether the correct winner was chosen - not whether the goal difference came out close to correct for that particular game. If you're supposed to win by 2 and you win by 4, that's a correct prediction. If you're supposed to win by 3 and you win by 1, that's a correct prediction. That overperformance/underperformance is exactly what's factored into the team history to maintain the rating - but closeness to the predicted margin isn't the success measure being discussed ("Did we win?").
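Put concretely, a pick counts as correct whenever the predicted margin and the actual margin point at the same winner, no matter how far off the magnitude is. A toy illustration (my own, not the app's code):

```python
def pick_correct(expected_margin, actual_margin):
    """Correct pick = same winner, i.e. both margins favor the same side.
    Ties (actual_margin == 0) are assumed to be excluded upstream."""
    return (expected_margin > 0) == (actual_margin > 0)

# Supposed to win by 2, won by 4: correct pick (the margin surplus only
# feeds the rating update, not the prediction score).
# Supposed to win by 3, won by 1: still a correct pick.
# Supposed to win by 2, lost by 1: wrong pick.
```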

Another relevant factor is that this predictiveness can be measured with different populations of games, to understand how the quality of the predictions changes for different types of opponents/games. All of these different populations report different predictiveness numbers, but in the big picture, in most cases they are pretty close. (e.g. girls games show about 2% higher than boys games, cross-year games show about 2% higher than all games, cross-gender games show about 20% lower than all games - in that case still picking 71% of the winners).

But the population that may be relevant to the SoCal teams you're discussing is teams in the top 100 by age (top 100 nationally, not top 100 in state): there, the predictive power is 50.2%, choosing the correct winner 75.1% of the time. That's still more than 3 out of 4 correct picks, but it is noticeably less predictive than the 5 out of 6 correct picks seen in the overall population. I didn't check all age groups, but just flipping through 2010G, ~20 SoCal teams are in the top 100 nationally. If you sort by CA 2010G teams, you have to go all the way down to the 22nd-place team (Slammers FC RL) to find one outside the top 100 (103rd in that case), and it looks like only 3 of those CA teams are not in SoCal. (There are 483 ranked 2010G teams in CA, out of 2878 ranked 2010G teams nationally, or 16.7%.)

That tells me two things. The first is that people following these ratings specifically for their favorite SoCal teams are going to see measurably more incorrect calls than anyone following all of the other teams that aren't standing so close to the top step. The second is that the top SoCal teams really are that good. Their high collective ratings are not a measurement error - having 20 of the top 100 just in SoCal is a testament to the quality of the game in that area, and it's something to be proud of.
 
I think there are a couple of factors here...
CA is like GA, FL, TX: you can play soccer all year long + there's high population density, which translates to several local high-level teams to play.

Some locations like CO, IL, NY have population density, but they can't play outdoor soccer year-round. Usually they augment with things like futsal, or at the highest levels have indoor fields, but this is a 2nd choice to outdoor. (Assuming outdoor performance is the overall goal.)

I grew up in the Midwest + at the time didn't understand how CA teams (all teams, not just soccer) were able to dominate. After living in CA for the last 25 years + with kids in the sports funnel, it all makes sense now. This doesn't mean players outside places like CA won't excel. But from experience I can say that you just can't understand the level of competition happening in CA until you're in it.

To highlight the above, I remember a game a couple of years ago where I recognized 3-4 pro sports parents pacing the sidelines watching their kids play. Since then, seeing pro sports parents has become more and more common. Not that this wouldn't happen in other places - it's just more visible in CA + SoCal specifically.
 
USA Sports Statistics (the company that makes SR) just posted a preview on FB of the 2015 rankings that will be officially released to the app on Aug 1. To perhaps nobody's surprise, it looks like 3 of the top 10 nationally on both the Girls and Boys sides are from SoCal, including #1 for 2015G.

boys.jpg girls.jpg
 