To adjust a pandemic outcome for the industry composition of a state's economy, we use the following multivariate linear regression equation: $y_s = \alpha + x_s \beta + e_s$, where $s$ indexes states, $x_s$ is the vector of share variables for state $s$, and $\beta$ is a vector of coefficients, one for each of the share variables in $x_s$. Because the share variables and the regression residual have mean zero among the fifty states and DC, $\alpha$ is the national average outcome $y$. We interpret $x_s \beta$ as the part of the outcome explained by industry composition and $y_s - x_s \beta = \alpha + e_s$ as the outcome adjusted for industry (or health) composition. We estimate $\alpha$ and $\beta$ using ordinary least squares in the pre-pandemic data for the fifty states and DC.
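The adjustment described above can be illustrated with a minimal sketch. It assumes the share variables are demeaned across states before fitting, uses synthetic data rather than the report's data, and fits and applies the regression on the same sample. The function name `adjust_for_composition` and all numbers are hypothetical, and the intercept here is the unweighted mean across states.

```python
import numpy as np

def adjust_for_composition(y, shares):
    """Fit y_s = alpha + x_s beta + e_s by OLS and return the adjusted outcome.

    Illustrative sketch only: shares are demeaned across states so that the
    intercept equals the (unweighted) mean outcome.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(shares, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-zero share variables
    design = np.column_stack([np.ones(len(y)), Xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    alpha, beta = coef[0], coef[1:]
    adjusted = y - Xc @ beta                     # equals alpha + e_s
    return alpha, beta, adjusted

# Synthetic example: 51 "states" and 4 hypothetical share variables.
rng = np.random.default_rng(0)
shares = rng.uniform(0.0, 0.2, size=(51, 4))
y = 1.0 + shares @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(0.0, 0.05, 51)

alpha, beta, adjusted = adjust_for_composition(y, shares)
print("intercept:", alpha, "mean outcome:", y.mean())
```

Because the regressors are demeaned and an intercept is included, the adjusted outcomes keep the same mean as the raw outcomes; only the part attributed to composition is removed.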
For the GDP-by-state component, we used the same regression method, with the share vector made up of the following elements: Mining, Oil and Gas, Accommodations and Food, and Arts and Entertainment.
Because COVID infection mortality risk is extremely age-related (8,700 times higher at ages 85+ than at ages 5 to 17, according to the CDC), we applied an age adjustment to the number of observed deaths in each age group to bring the numbers in line with a standard U.S. population. Because the CDC suppresses totals of fewer than 10, we combined the age groups under 35 into one; because there are few deaths in that age range, this should not affect the accuracy of the adjustment.
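As an illustration of why the age adjustment matters, the sketch below uses direct standardization: each age group's death rate is weighted by that group's share of a standard population. This is an assumption about the general technique, not a statement of the exact procedure used here. The age bands, standard weights, death rates, and populations are all hypothetical.

```python
def crude_rate(deaths, population):
    """Deaths per 100,000 with no age adjustment."""
    return 100_000 * sum(deaths.values()) / sum(population.values())

def age_standardized_rate(deaths, population, standard_weights):
    """Directly standardized deaths per 100,000.

    Each age group's own death rate is weighted by that group's share
    of the standard population (weights are normalized to sum to 1).
    """
    total_weight = sum(standard_weights.values())
    rate = 0.0
    for group, weight in standard_weights.items():
        group_rate = deaths[group] / population[group]
        rate += group_rate * weight / total_weight
    return 100_000 * rate

# Hypothetical standard population shares and age-specific death rates.
standard = {"0-34": 0.45, "35-64": 0.38, "65-84": 0.15, "85+": 0.02}
rates = {"0-34": 0.00001, "35-64": 0.0005, "65-84": 0.005, "85+": 0.05}

# Two hypothetical states with identical age-specific rates but different age mixes.
pop_young = {"0-34": 500_000, "35-64": 350_000, "65-84": 130_000, "85+": 20_000}
pop_old = {"0-34": 400_000, "35-64": 350_000, "65-84": 200_000, "85+": 50_000}
deaths_young = {g: rates[g] * pop_young[g] for g in rates}
deaths_old = {g: rates[g] * pop_old[g] for g in rates}

crude_young = crude_rate(deaths_young, pop_young)
crude_old = crude_rate(deaths_old, pop_old)
std_young = age_standardized_rate(deaths_young, pop_young, standard)
std_old = age_standardized_rate(deaths_old, pop_old, standard)
print(crude_young, crude_old, std_young, std_old)
```

The two hypothetical states differ sharply in crude rates only because one has an older population; after standardization their rates coincide, since their age-specific rates are identical.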
To further adjust these numbers for substantial differences in metabolic health across states, we applied the regression methodology from the economic section to the age-standardized rates above, using as explanatory variables the CDC-reported prevalence of obesity and diabetes, the conditions most strongly correlated with COVID-associated deaths.