Let's take a different tack. Assume there are two teams, team A and team B. They play each other every day for 100 days in a row, and we need to predict the result of the 101st game. The prediction is probably going to be pretty good, but of course it isn't going to match reality without error. We can track results: there will be a percentage it gets right and a percentage it gets wrong. Odds are it's going to be right a reasonably high percentage of the time - but whatever the percentage is, it can be measured.
Now take those same two teams, team A and team B, but have them play every day for 200 days in a row, and predict the result of the 201st game. As before, the prediction is going to be pretty good, and as before, it isn't going to match reality without error. We can track results and figure out the percentage it gets right and the percentage it gets wrong.
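Measuring that right/wrong percentage is straightforward bookkeeping. Here's a toy sketch, where the "model" is just a fixed pick and the 70% win probability for team A is an invented number, not anything from real data:

```python
# Toy sketch: track how often a fixed prediction rule gets games right.
# The "model" here is hypothetical: it always picks team A, and the
# underlying 70% chance that A wins is an invented number.
import random

random.seed(42)  # make the simulated season repeatable

P_A_WINS = 0.7  # assumed true chance team A beats team B (illustrative)

def play_game():
    """Simulate one game; returns the winner, 'A' or 'B'."""
    return 'A' if random.random() < P_A_WINS else 'B'

def accuracy(n_games):
    """Predict 'A' every time and count how often that's right."""
    results = [play_game() for _ in range(n_games)]
    hits = sum(1 for r in results if r == 'A')
    return hits / n_games

print(f"100 games: {accuracy(100):.0%} right")
print(f"200 games: {accuracy(200):.0%} right")
```

Both runs land near the same hit rate - the percentage exists and can be measured, even though any single game can go either way.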
Is the prediction for the 201st game better than the prediction for the 101st game? Maybe very slightly, but the difference is well within the margin of error and randomness, so it probably doesn't matter a whit. Same for the 501st game, the 1001st game, and so on. So if 100 games is as good as 200 games, how low can the number of games go while the predictivity stays very similar? Against most people's intuition (including mine), the number is quite low - after 6-8 games, the prediction for youth soccer isn't going to get any better. A reasonably simple model already has as much information as it needs to predict how the next game will go.
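The diminishing-returns point can be sketched with a toy simulation: watch the first k games between two teams, pick whoever won more, and predict game k+1. The strength gap below is invented for illustration and the decision rule is far cruder than any real rating model, but the shape of the curve is the point:

```python
# Sketch of the "how few games do you need" question: estimate which
# team is stronger from the first k games, predict game k+1, and see
# how the hit rate changes with k. All numbers are illustrative.
import random

random.seed(1)
P_A_WINS = 0.65  # assumed underlying strength gap (invented)

def trial(k):
    """Watch k games, pick the majority winner, predict game k+1."""
    a_wins = sum(random.random() < P_A_WINS for _ in range(k))
    pick = 'A' if a_wins * 2 >= k else 'B'
    next_game = 'A' if random.random() < P_A_WINS else 'B'
    return pick == next_game

for k in (2, 4, 8, 16, 100):
    hit_rate = sum(trial(k) for _ in range(20000)) / 20000
    print(f"after {k:3d} games: {hit_rate:.1%} of next games predicted")
```

The hit rate climbs quickly and then flattens: once k is large enough to reliably identify the stronger team, watching more games buys essentially nothing, because the remaining misses come from game-to-game randomness, not from ignorance about the teams.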
And it turns out it doesn't have to be team A and team B playing each other. If you rate team A against all the teams it plays, and you rate team B against all the teams it plays, figuring out how A and B will do against each other is the same mathematical model. You're right that there would intuitively be some drift - teams so far apart that it takes more than six Kevin Bacon steps through intermediate opponents could have ratings that aren't relevant to each other and shouldn't be compared. But it sure looks like youth soccer in the US is interconnected enough that, despite the differences in geography and opponents, teams can be compared regardless. The model still works, and all indications show that to be true.
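One way to sketch "rating A against its opponents and B against its opponents is the same model": a bare-bones Elo-style update, which I'm using as a stand-in here, not as a claim about the actual model in use. A and B never meet, but both play C, so they land on the same scale and can be compared. Team names, results, and the K-factor are all invented:

```python
# Minimal Elo-style sketch of rating teams through a shared opponent.
# A and B never play each other, but both play C, so their ratings
# end up on one comparable scale.

def expected(ra, rb):
    """Standard Elo expected score for a team rated ra vs one rated rb."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def update(ratings, winner, loser, k=32):
    """Shift rating points from loser to winner after one game."""
    e = expected(ratings[winner], ratings[loser])
    ratings[winner] += k * (1 - e)
    ratings[loser] -= k * (1 - e)

ratings = {'A': 1500, 'B': 1500, 'C': 1500}
games = [('A', 'C'), ('A', 'C'), ('C', 'B'), ('C', 'B')]  # winner first
for w, l in games:
    update(ratings, w, l)

# A beat C twice and C beat B twice, so A rates above B without ever
# having played B directly.
print(ratings)
print(f"implied P(A beats B): {expected(ratings['A'], ratings['B']):.0%}")
```

The "more than six Kevin Bacon steps" caveat shows up here too: if there were no chain of shared opponents connecting A and B at all, their ratings would float on disconnected scales and the comparison would be meaningless.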
One way to figure out whether this intuition is true, false, or undetermined is to take teams that play each other rarely and compare the predictivity to teams that play each other more often. Splitting the games into in-state versus out-of-state is exactly that: two data sets, one where the teams play each other more often, and one where they rarely do. The intuition and expectation would be that the teams that play each other more often would show measurably better predictivity. But the results remain clear - they don't. One way to poke holes in that hypothesis is to do what you've done and say that inter-state games are going to be more predictable (i.e. the comparative ratings are more accurate) because games are easier to predict when one state is stronger than another. That assumes the difference between states somehow outweighs the assumed drift between teams that rarely play each other. It is quite the assumption, but suppose it turns out to be true - then the drift from having few or no shared opponents must be far less significant than the difference between states.
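The split itself is mechanically simple: bucket each game by whether the two teams share a state, then compute the hit rate per bucket. A toy sketch, with game records that are entirely made up for illustration:

```python
# Sketch of the in-state vs out-of-state comparison: bucket games by
# whether the two teams share a state, then compare prediction hit
# rates per bucket. These game records are invented for illustration.

def hit_rate(games):
    """Fraction of games where the predicted winner actually won."""
    return sum(g['predicted'] == g['actual'] for g in games) / len(games)

games = [
    {'home_state': 'CA', 'away_state': 'CA', 'predicted': 'home', 'actual': 'home'},
    {'home_state': 'CA', 'away_state': 'CA', 'predicted': 'home', 'actual': 'away'},
    {'home_state': 'CA', 'away_state': 'NV', 'predicted': 'away', 'actual': 'away'},
    {'home_state': 'CA', 'away_state': 'OR', 'predicted': 'home', 'actual': 'home'},
]

in_state = [g for g in games if g['home_state'] == g['away_state']]
out_of_state = [g for g in games if g['home_state'] != g['away_state']]

print(f"in-state:     {hit_rate(in_state):.0%} of {len(in_state)} games")
print(f"out-of-state: {hit_rate(out_of_state):.0%} of {len(out_of_state)} games")
```

With real data the interesting question is whether the two hit rates differ by more than noise; the claim above is that, at scale, they don't.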
The kicker is that it doesn't matter which assumption is accurate - all that matters is the predictive results for all games played, and the various stratifications that reveal the different classes of results. Either states as a block are more predictable than individual teams that rarely play each other, or they are less predictable; only one of those can be true. At scale, I don't think it matters much either way - the differences in predictivity are minor, if measurable at all - but with enough data, more questions can be asked and answered.
The better a rating/ranking system is, the better its predictivity should be, and the inverse is also true. As you almost assuredly understand, it is very possible to rate/rank youth soccer teams, despite all the hand-wringing and teeth-gnashing of those who feel it's impossible, wrong, or unpredictable. And getting a game prediction wrong is very, very possible - it does not show that the model (any model) is wrong; it is just a piece of data that can be measured and collated, giving a precise answer for how many games the model is expected to get right and how many it will get wrong. Those who discount all of this entirely either haven't thought it through or are incapable of understanding the math. Those who accept that it's accurate - but think rating/ranking teams is a bad idea in general because knowing the actual strength of teams is in itself harmful - have a defensible position, and it's certainly their prerogative. Personally, I think they should also be arguing for not keeping score in games, but I'd assume they'd consider that a crazy extension of their thinking.