Club Team Tiers

Against my better judgment, I figured I'd share another example of this bug that I saw this afternoon. Here is one new team entity, with 3 data sources brought together manually (Davis Legacy 07/06B White). When brought together, the ratings/calcs look weird: #1 defense in nation, #13 overall, but #212 offense, and only #624 in schedule strength. I'm pretty confident that when I look back at this same team a day or two from now, everything will be much more believable and more representative of the data sources. For the umpteenth time in this thread: I don't believe these possible errors amount to much at all, as they are quickly normalized, but it would be even better if we couldn't cause them to happen occasionally in the first place.

[screenshots: davis1.jpg, davis2.jpg]
 
Another example of something that doesn’t make sense is 2014 Pateadores NBCM. They are currently 17th in California. Look at their record and compare it to Strikers Barahona at 15th. I don’t think they should even be in the same ballpark. Sorry to pick on the Pats, but the discrepancies I notice are mainly with Pats.
 

I'm not sure I agree; I don't think it looks that out of whack. They show a 41.5, and just look at the results from Surf Cup at the beginning of the month. They beat a 39.7 by 2, a 40.7 by 3, a 39.3 by 3, a 40.6 by 2, and a 35.8 by 8. A 41.5 rating passes the eye test, as it is about where one would estimate just by looking at the results. Now, that team entity only has info for 2 tournaments, so while the system has determined that's enough to show a rating, it's quite possible additional games coming in will provide some clarity on where they should be rated.


[screenshot: pats 2014b3.jpg]
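
For what it's worth, here's a rough back-of-envelope version of that eye test in Python. The actual rating formula isn't public, so the one-rating-point-per-goal assumption and the cap on blowout margins below are purely mine for illustration:

```python
# Back-of-envelope "eye test" for a rating, using the Surf Cup results quoted above.
# Assumption (NOT the app's actual formula): a team that is N rating points better
# is expected to win by roughly N goals, and lopsided margins are capped so one
# blowout doesn't dominate the estimate.

MARGIN_CAP = 4  # hypothetical cap on how much a single goal margin can tell us

# (opponent rating, goal margin) from the results in the screenshot
results = [(39.7, 2), (40.7, 3), (39.3, 3), (40.6, 2), (35.8, 8)]

implied = [opp + min(margin, MARGIN_CAP) for opp, margin in results]
estimate = sum(implied) / len(implied)

for (opp, margin), imp in zip(results, implied):
    print(f"beat {opp} by {margin} -> implies roughly {imp:.1f}")
print(f"rough estimate: {estimate:.1f}")
```

With those made-up assumptions the average of the implied ratings lands in the low 42s, which is at least in the same ballpark as the 41.5 shown.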
 
Agreed, based on the limited recent games they might be. But this team has been around for a while. It’s the same team as Pateadores SC NB…in the unranked section. I suspect a bad actor getting rid of old data to game the system.
Perhaps there should be stricter criteria for top 50 teams.
 

You can't help yourself, can you? There is almost certainly no bad actor; there is just apathy from anyone at the club about merging their own team data. Is it healthy to always think there is a conspiracy or nefarious threat behind everything that you don't like or understand?

Pateadores is always going to be clunky with any aggregator software like this. There are a bunch of similarly named, but different, clubs, and they all register their team names into leagues/tournaments with various permutations, making it non-trivial to bring together teams that are probably the same (but named slightly differently). Other clubs that have this issue pretty regularly, because of the number of different teams, number of affiliates, and frequently changed team names, include Total, Strikers, Legends, Albion, City SC, Slammers, and probably a few more.

For just the Pateadores in the main listed club "Pateadores SC", they show data for 75 different girls teams and 115 different boys teams from 2016 through 2008. And there are 7 club listings in CA alone, including that main one (Pateadores HB, Pateadores IER, Pateadores IRV, Pateadores Long Beach, Pateadores Newport Costa Mesa, Pateadores Santa Clarita Valley, and the main Pateadores SC). If someone affiliated with the club cared about getting this data shown correctly, they'd need to stay on top of how these potentially hundreds of teams are registered in their various tournaments/leagues/events, name them consistently, and bring them together as needed. But most certainly don't see this as a priority, so it comes down to any interested parents to do it. In most cases it works pretty well, as it only takes a single interested party to keep things clean, but for clubs like this it can get large enough and complicated enough that team data isn't as collated as one may want.
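
To give a sense of why this isn't trivial to automate, here's a minimal name-matching sketch in Python. The registered names, the normalization, and the similarity measure are all just illustrative (the exact entries below are made up); this is not how the app actually does it:

```python
from difflib import SequenceMatcher
import re

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so naming
    permutations compare a little more fairly."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Illustrative (hypothetical) registrations that may or may not be the same team.
candidates = [
    "2014 Pateadores NBCM",
    "Pateadores SC NB 2014B",
    "Pateadores Newport Costa Mesa B2014",
    "Pateadores IRV 2014B",   # a different affiliate entirely
]

target = "Pateadores SC NB 2014 Boys"
for name in candidates:
    print(f"{similarity(target, name):.2f}  {name}")
```

Notice that a genuinely different affiliate can score as high as, or higher than, a real naming permutation of the same team, which is why a simple threshold can't do the merging on its own and local knowledge ends up mattering.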

In this case, if you are convinced that these two teams are identical, and that the new data should be added to the other team, all it takes is to add it and hit save. If you're right, it's likely that you just helped make the data more accurate. If you're wrong, someone might change it back or send a note to support ("WTF - who messed with my team!"). And if it turns out to be inaccurate, it is just as easy to remove the data afterwards; nothing is permanent.
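
If it helps, here's a tiny sketch of why "nothing is permanent": if each data source keeps its own games and a team entity is basically just a set of attached source IDs, then attaching or detaching a source never touches the underlying game data. All of the names and IDs below are hypothetical, not the app's internals:

```python
class TeamEntity:
    def __init__(self, name):
        self.name = name
        self.source_ids = set()   # e.g. one ID per league/tournament listing

    def attach(self, source_id):
        self.source_ids.add(source_id)       # "add it and hit save"

    def detach(self, source_id):
        self.source_ids.discard(source_id)   # just as easy to undo later

pats = TeamEntity("2014 Pateadores NBCM")
pats.attach("surf-cup-b2014-pateadores-nb")
pats.attach("socal-league-pateadores-sc-nb")
print(sorted(pats.source_ids))

# Turned out to be the wrong team? Detach it and nothing else has changed.
pats.detach("socal-league-pateadores-sc-nb")
print(sorted(pats.source_ids))
```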

Here are the two teams:

[screenshots: pat10.jpg, pat11.jpg, pat12.jpg, pat13.jpg]

Here's how to add/merge that data:

[screenshot: pat14.jpg]

Here's the resulting team, with all data sources in one:

[screenshots: pat15.jpg, pat16.jpg]

Now it looks like the team has done amazingly well, almost suspiciously well, in the Surf Cup - showing 3 green overperforms. It might be that the same team just did well, or the team entities might really not be the same, and a different roster is now cleaning up in tournaments.
 
When I see this and I know for sure they're the same teams, I add the games/sources manually...
That’s the thing: I am not 100% sure. When you add missing sources, it asks you if you are 100% sure. I am only 90% sure.
It does look suspicious when the former Pats NB became “unranked”. How can a team with 2 years’ worth of games become “unranked”?
 
I doubt there is a different/better roster at NB. If anything, they should have lost players to the Pats pre-ECNL team nearby and gotten worse.
But this just shows you can improve your ranking by getting rid of old bad results.
 

Any/every team that has no recent results attached to it is unranked. Every single team entity becomes unranked at about 6 months without current data. It's not suspicious. When new data comes in and it's not attached to a team, and it alone isn't sufficient to support a ranking, it is also unranked. At some point, something like 18 months to 2 years without results (I don't know the specifics), a team entity doesn't even show up in the unranked list, even if it is still in the underlying database. Other teams are still in the unranked list not because their data isn't recent enough, but because they don't have *enough* recent data for a rating to be relevant. They have to have enough games in that time frame vs. other rated teams. It's not a lot; 5 - 8 often seem to be more than enough.
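
Roughly, the logic as I understand it looks like the sketch below. The exact thresholds aren't public, so these numbers just follow the rough figures above, and the dates are made up:

```python
from datetime import date, timedelta

RANKED_WINDOW   = timedelta(days=183)   # ~6 months of current data required
LISTED_WINDOW   = timedelta(days=548)   # ~18 months before dropping off the list
MIN_RATED_GAMES = 5                     # "5 - 8 often seem to be more than enough"

def status(game_dates, rated_opponent_games, today):
    """game_dates: dates of all results attached to the team entity.
    rated_opponent_games: recent games played vs. other rated teams."""
    if not game_dates or today - max(game_dates) > LISTED_WINDOW:
        return "not listed"
    if today - max(game_dates) > RANKED_WINDOW:
        return "unranked (no current data)"
    if rated_opponent_games < MIN_RATED_GAMES:
        return "unranked (not enough recent rated games)"
    return "ranked"

today = date(2023, 8, 20)
print(status([date(2023, 8, 6), date(2023, 7, 30)], 7, today))   # -> ranked
print(status([date(2023, 8, 6)], 2, today))                      # -> unranked (not enough recent rated games)
print(status([date(2021, 11, 14)], 12, today))                   # -> not listed
```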

To improve the reliability of the ratings, the more data that can be moved from unranked over to an already ranked team, the better. The vast majority of this is going to happen automatically. That still leaves plenty of room for anyone with local knowledge of where unranked data belongs to put it exactly where it should be. And when you are searching for unranked data to add to a specific team, it pulls not only from all of the shown unranked data, but even from prior seasons/data that no longer show in the unranked lists. So if you search for data on Raging Purple Butterflies 2007G, it may pull up potential matches for that name going back quite a few seasons.
 
So a team that plays better, help me here, is rated better, and that is shocking? Current results are always going to be the most relevant, and the age of a result erodes its value towards the rating pretty quickly. Anything older than 6 months really doesn't seem to move the needle much at all. And if a team is progressively improving at a significant rate, i.e. much more than the average one would expect as they age, then yes: if they become a *new* team every 6 months, the rating of that new team (once it has sufficient games) would be expected to be somewhat better than if that team kept the prior 6 months of history where they were performing noticeably worse.

But keep in mind - all of this is to predict whether one specific team, at this very point in time, would be expected to beat another specific team, at this very point in time. Only one prediction is the "right" one: the expected strength of the team right now. Yes, if they have overperformed in the last few games, and they can wipe the old history and still have enough to show as ranked, it might be a higher rating than if they included all of their game history. But overall, the more game history (up to a point) assigned to a team, the more likely the prediction of the team's strength is to be accurate.
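
To illustrate the recency effect, here's a minimal decay-weighting sketch. The half-life and the example numbers are my assumptions, not the app's actual formula; the point is just that a result from ~9 months ago carries almost no weight next to this month's games:

```python
from datetime import date

HALF_LIFE_DAYS = 60   # assumed for illustration only

def weight(game_date, today):
    age = (today - game_date).days
    return 0.5 ** (age / HALF_LIFE_DAYS)

def weighted_rating(games, today):
    """games: list of (game_date, implied_rating_from_that_result)."""
    num = sum(weight(d, today) * r for d, r in games)
    den = sum(weight(d, today) for d, r in games)
    return num / den

today = date(2023, 8, 20)
games = [
    (date(2023, 8, 5), 42.0),    # recent strong results dominate
    (date(2023, 8, 6), 43.0),
    (date(2022, 11, 12), 35.0),  # ~9 months old: weight is only about 0.04
]
print(round(weighted_rating(games, today), 1))   # lands close to the recent games
```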
 
Regarding the rankings app sometimes having trouble identifying teams when the team name is changed: doesn’t every team have a GotSport number? If so, why doesn’t the rankings app use the GotSport number to identify teams?
 
Because the data sources don't use that number. There's nothing to link that ID to.
 

If the team entity is on GotSport, yes, they have a GotSport number. However, the linkage of that number to the right team is more screwed up than you might imagine. The app used to rely on those numbers very significantly, but it turned out that doing so caused more harm than good in many cases. GotSport charges $25 per change for anyone to actually fix a piece of game or team data that they notice is wrong, so nobody does, and the data gets pretty bad over time. That GotSport number is still pulled in as a data point any time it exists, which is pretty much every time the data comes from a GotSport page itself. It's one of the ways that SR can figure out its predictivity vs GS: just run a query joining on GotSport numbers, compare the ratings to actual results, and see how predictive they are (it's not good). It would make everyone's life easier (including the app developers) if there were some trustable ID, but with the fragmentation of US soccer at the moment, that's not likely in the foreseeable future. SR is probably the best place right now that actually lists/catalogs all competitive soccer teams with recent results.
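
One simple version of that predictivity check, sketched out: for each played game, ask whether the side with the higher rating (under whichever rating system you're testing) actually won, then report the hit rate. The data below is made up; the query against the real database would obviously be larger:

```python
def hit_rate(games):
    """games: list of (rating_home, rating_away, goals_home, goals_away)."""
    decided = [g for g in games if g[2] != g[3]]          # skip draws
    hits = sum(1 for rh, ra, gh, ga in decided
               if (rh > ra) == (gh > ga))
    return hits / len(decided) if decided else float("nan")

games = [
    (41.5, 39.7, 3, 1),
    (38.2, 40.1, 2, 0),   # lower-rated side won: a miss for this rating system
    (36.0, 44.0, 0, 5),
]
print(f"higher-rated team won {hit_rate(games):.0%} of decided games")
```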
 

As we'd hope, this same team no longer looks strange; here's the current page for it (#63 in state, and correspondingly believable numbers across all of the rest).

Current one:
[screenshot: davis3.jpg]

Initial strange one:
[screenshot: davis1.jpg]
 