R-squared and Residuals

The previous post went into all the details and resources that made up our religious resource ranking of Jewish life in colleges. And while plenty of questions and concerns will surely be raised, I’m confident enough in our methodology and approach to move ahead with some fun things we can do with this data.

One interesting thing we can now look at is the distribution of rankings per college and how that’s correlated to Jewish students per college, which we could call the resources per capita. To see how well those two factors are correlated, we can draw a best-fit line through the data points and see how well it fits, as measured by the R^2 value. R^2 values range from 0 to 1, and values closer to 1 mean the line fits the data better. Here, if the R^2 value were 1, every point would fall exactly on the line, meaning the resource ranking would be fully determined by the number of Jewish students per college. Another metric we can use is the residual, which is the vertical distance of each point from the line, i.e. the difference between the observed value and the estimated value for that data point. Residuals closer to 0 indicate a good fit, and larger residuals indicate a bad fit – either much higher than the estimated value, or much lower.
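To make those two metrics concrete, here is a minimal sketch of how a best-fit line, its residuals, and its R^2 value could be computed by hand. The student counts and rankings are made-up numbers for illustration only (not the data behind the graphs), and the function names are my own.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residuals(xs, ys, slope, intercept):
    """Observed value minus the value the line estimates, per point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def r_squared(ys, res):
    """1 minus (unexplained variation / total variation)."""
    y_bar = mean(ys)
    ss_res = sum(r ** 2 for r in res)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up example: Jewish students per college vs. resource ranking.
students = [50, 200, 800, 1200, 3000, 5000]
ranking = [10, 35, 20, 95, 40, 15]

slope, intercept = fit_line(students, ranking)
res = residuals(students, ranking, slope, intercept)
print("R^2:", round(r_squared(ranking, res), 4))
print("residuals:", [round(r, 1) for r in res])
```

A positive residual means the college sits above the line (more resources than the line estimates), and a negative one means it sits below.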

First we’ll look at Jewish students in general – as depicted in this graph of resource rankings vs. number of Jewish students.

The R^2 value of this data is .1873, which means the two aren’t very correlated: the number of Jewish students doesn’t have much to do with the religious resource ranking of the school. This makes sense, because there are schools like Penn State and UCF which each have 5,000+ Jewish students and few to no religious resources (no kosher food, no Orthodox community) (so why do they get so many Jews?? beats me..), and there are schools like YU which only have around 1,200 Jewish students and rankings of almost 100. So either Jews as a whole don’t really choose colleges based on religious resources, or religious resources aren’t really distributed on a per capita basis. This isn’t so surprising, but it’s a baseline for us to use when looking at other subdivisions of Jewish students.

Next we’ll look at NCSY alumni in college – as depicted in this graph of resource rankings vs. number of NCSY Alumni (from 2009-2012).

Here there’s a better fit, with an R^2 value of .3222 (vs. .1873 for all Jewish students), which suggests that NCSY alumni choose colleges based on religious life more often than Jewish college students as a whole. This makes sense because many (40%?) NCSY alumni are or become Orthodox – so you’ll see YU and Stern high up on the x-axis, as well as Queens and Maryland. One other factor which might be at play here is that NCSY’s parent organization, the OU, also runs JLIC (which factored heavily in the rankings), which could lead JLIC and its associated religious resources to be invested in places with many NCSY alumni. They are also invested in directing their alumni to Jewish colleges (YU, Touro, etc.) and colleges with strong religious communities. But as this graph shows, that strategy clearly doesn’t work for the majority of NCSY alumni, 66% of whom do not go to Jewish colleges or colleges with JLIC. And so you’ll also see plenty of colleges with minimal religious resources which attract handfuls or dozens of NCSY alumni (the huge cluster on the bottom left).

Finally, we’ll look at Orthodox students in college. Here there’s a slightly better correlation, with an R^2 value of .3246.

But by now you can see that a linear relationship just isn’t the right fit (assuming there should be a correlation at all). Besides visually not fitting the data, that also makes sense logically: if you think about it, the difference between 10 and 30 students is much greater than the difference between 150 and 170. Furthermore, when communities have a decent amount of resources already, they’re less in need of more resources. For example, a place like Penn, which has tons of religious resources and students, probably needs a JLIC couple less than UMass does, since it has undergrads and grad students and local rabbis who can provide much of the same support. On our graph, we’ll now try a logarithmic fit, which accounts for the steep increase at the beginning and the petering out at the end. To do that, we’ll set the x-axis to logarithmic (which also makes sense given the clustering of data points around 1-5 and a few over 1,000) and graph a logarithmic best-fit line (which looks linear on the logarithmically scaled axis).
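A logarithmic best-fit line is just an ordinary best-fit line drawn against log(x) instead of x. The sketch below compares the two on made-up numbers where resources rise quickly at first and then level off; the data and function names are assumptions for illustration, not the actual rankings, and I’ve picked base-10 logs (any base gives the same R^2).

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys that the fitted line explains."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def fit_log(xs, ys):
    """Fit y = a * log10(x) + b by running the linear fit on log10(x)."""
    log_xs = [math.log10(x) for x in xs]  # requires every x > 0
    a, b = fit_line(log_xs, ys)
    return a, b, r_squared(log_xs, ys, a, b)

# Made-up example: Orthodox students per college vs. resource ranking.
orthodox = [1, 3, 10, 30, 100, 300, 1000]
ranking = [5, 18, 33, 47, 62, 75, 90]

m, c = fit_line(orthodox, ranking)
lin_r2 = r_squared(orthodox, ranking, m, c)
a, b, log_r2 = fit_log(orthodox, ranking)

print("linear R^2:     ", round(lin_r2, 3))
print("logarithmic R^2:", round(log_r2, 3))
```

On data shaped like this, the logarithmic fit comes out well ahead of the linear one, which is the same pattern the graphs show.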

And now you can see a really good correlation, supported by an R^2 value of .686. Even without seeing the number, you can see how well the data fits the line, with a clear linear relationship on the logarithmic axis. (Although a logarithmic fit on the NCSY alumni data still only got us an R^2 of .2734.) This correlation shows that the more Orthodox students a college has, the more resources there are for them, or (and this goes back to the chicken and egg question) that more Orthodox students go there because there are more resources for them. Whichever one it is (and we assume it’s a combination of the two), many Orthodox students are at colleges with many religious resources, and few are at colleges with few religious resources. Another contributing factor is that our religious resource rankings are geared towards Orthodox students, including Orthodox rabbis and communities in the rankings (but not other denominations), which might skew these results accordingly. This is also due to the geographic distribution of Orthodox communities, which is much more concentrated and Northeast-centered than that of Jewish students or NCSY alumni in general. You can also see that the graph is a lot less crowded, depicting how Orthodox students only go to a limited number of colleges. In fact (and we’ll go into this in a further blog post..), there’s an even more exaggerated version of the 80/20 rule, where over 90% of Orthodox students go to the top 20% of schools.

As good as the fit is, it’s not perfect; you can see that many schools fall below the line, and many schools fall above it. Why is that? Why would some schools have way more resources than they really deserve/need? Well, one reason is that resources aren’t allocated based on need, on number of students, or even on number of religious students – they’re not really centrally allocated at all! Oftentimes it’s based on wealth, on prestige, on connections, and on politics. That’s why you could have a school like Yale with only a dozen or two Orthodox students and only a thousand or two Jewish students, but with a tremendous amount of resources, rabbis, infrastructure, and programming. Another factor is that many resources have huge fixed costs – like buying a building, or paying for a rabbi – and you can’t buy half a building or hire half a rabbi (you actually could hire part-time employees, but JLIC does not), and so they can’t be allocated efficiently. There’s also the issue of lag: it takes some time from when Orthodox students start going to a school until resources are allocated there. One could also argue that it’s the fault of Orthodox students, or of their parents and guidance counselors, for not directing them to schools with appropriate levels of religious resources. And finally, this imbalance exists because no one ever maps out this data and thinks about where actual need exists, and because all the different players and investors and organizations never work together to think about maximizing their effectiveness.

An interesting next step is to see which colleges are below the line, which would mean there’s a need (there aren’t enough resources for the number of Orthodox students), and which colleges are above the line, which would indicate a surplus of resources for the number of Orthodox students. You can see this by calculating the residual for each college, or its distance from the expected value on the line. You can see a lot of colleges with a few Orthodox students fall below the line in the bottom left corner, and you can see a bunch of colleges in the middle of the graph fall above the line. You can also multiply the residuals by the number of students, so you can see the total need or surplus per campus. (As an aside, the commuter colleges fall out below the line, but that’s because they aren’t ranked correctly, since their students make use of resources locally or at home – kosher food, minyanim, etc. – although that’s not quite the same as a community and resources on campus.) Using this, we can now actually see which colleges need more help, and which colleges we should be sending students to if they’re looking for religious life. And we can start to think about the reasons for these distributions and how we can leverage them to better serve all Jewish students on campus.
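Here is a rough sketch of that calculation: fit the logarithmic line, take each college’s residual, multiply it by the number of Orthodox students, and list the colleges below the line from the largest total need down. The college names and numbers are placeholders I made up, not entries from the actual dataset.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical rows: (name, Orthodox students, resource ranking).
colleges = [
    ("College A", 2, 3),
    ("College B", 5, 30),
    ("College C", 20, 25),
    ("College D", 60, 70),
    ("College E", 150, 55),
    ("College F", 400, 92),
]

# log10(0) is undefined, so keep only colleges with at least 1 student.
colleges = [c for c in colleges if c[1] >= 1]
log_xs = [math.log10(n) for _, n, _ in colleges]
ranks = [rank for _, _, rank in colleges]
slope, intercept = fit_line(log_xs, ranks)

rows = []
for (name, n, rank), lx in zip(colleges, log_xs):
    residual = rank - (slope * lx + intercept)  # negative = below the line
    rows.append((name, n, residual, residual * n))

# Colleges below the line, most negative total (largest need) first.
below = sorted((r for r in rows if r[2] < 0), key=lambda r: r[3])
for name, n, residual, total in below:
    print(f"{name}: residual {residual:.1f} x {n} students = {total:.1f}")
```

Weighting by student count means a modest shortfall at a big campus can outrank a large shortfall at a campus with only a couple of students.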

I’ll leave you with one final graph, which is the residuals multiplied by the number of Orthodox students (for all the colleges which had at least 1 Orthodox student) – this graph only shows the colleges which were below the line, and it’s on a logarithmic graph so it’s spaced better. Right away you can see the largest residuals were the aforementioned commuter schools, which have plenty of non-campus religious resources locally, where most of the students live. Leaving them aside (as well as Cooper Union, which really shares resources with NYU), you can see a whole bunch of colleges which all have a significant need and a significant number of Orthodox students: GW, UC Berkeley, Stanford, MIT, OSU, Emory, Muhlenberg, Drexel, etc. In an upcoming post we’ll hopefully get back to these colleges, and offer some potential solutions!
