So, here is a scenario: Student Benchmarks come in for the Fall, and you pore over the data, scrambling manically through spreadsheets and printouts of last year’s Spring State Assessment, finding only some of the many results. You ask yourself: Does this make sense? Do these Benchmarks line up with the State Assessment? Where did I put my coffee mug? Frantically, you try to identify any growth in your Lowest Performers (aka “Bottom 25”) to show the board that yes, your curriculum is having an effect! But is it? Let’s just look at this test over here… Wait! Is it 7:30am already? The presentation starts at 9am! Gosh, it would have been much easier to identify who these students are before the results came in and already have them in an appropriate program…
What is Data Science?
Data science is a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze actual phenomena" with data (Definition Source). As an example, a few months ago I read an article in Forbes about political campaigns and how they use data to decide which potential voters to target. They take a list of factors such as demographics, occupation, club memberships, giving records, social media mentions, etc., compile them into a single quantified value, and then make campaign decisions based on that number. There is a lot more to it, and I would recommend reading the actual Forbes article and this article by the Associated Press about the process (it's fascinating and scary at the same time). If the process is known (which it is) and the data is available (which, if you are using a company like SchoolStatus, it also is), then districts can use their students’ data to track macrotrends using predictive analytics. This allows a district to be proactive rather than reactive with its students.
Examples In Districts
In the last year, a growing trend among some of our customers has been the idea of an “At-Risk” report. The intent of this report is to quantify which students are at risk of a given outcome, based on one factor or a combination of factors that the student is currently exhibiting, so that the district can intervene before a problem takes place. I’ve seen districts monitor many factors, but some common ones seem to be: students below an ADA threshold, students above an infraction threshold, students failing certain classes, and some form of demographic indicator (e.g. SPED, LEP, etc.). This is great! The data is available and we can certainly help in presenting this information to you. Why is this great? Because it identifies students quantitatively and objectively for the purpose of early intervention.
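To make the idea concrete, here is a minimal sketch of how such a report could be assembled. It is not how any particular district or SchoolStatus builds theirs. The field names, the two thresholds, and the rule of "two or more flags" are all hypothetical, and ADA is read here as average daily attendance expressed as a fraction.

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    ada: float             # assumed: average daily attendance as a fraction (0.0 to 1.0)
    infractions: int       # discipline infractions so far this year
    failing_classes: int   # number of classes currently being failed
    sped: bool = False     # demographic indicators
    lep: bool = False


# Hypothetical thresholds; a district would set its own.
ADA_MIN = 0.90
INFRACTION_MAX = 3


def risk_flags(student):
    """Return the list of risk factors this student currently exhibits."""
    flags = []
    if student.ada < ADA_MIN:
        flags.append("low_attendance")
    if student.infractions > INFRACTION_MAX:
        flags.append("discipline")
    if student.failing_classes > 0:
        flags.append("failing_class")
    if student.sped or student.lep:
        flags.append("demographic_indicator")
    return flags


def at_risk_report(students, min_flags=2):
    """List students with at least min_flags factors, most flags first."""
    rows = [(s.name, risk_flags(s)) for s in students]
    rows = [row for row in rows if len(row[1]) >= min_flags]
    return sorted(rows, key=lambda row: len(row[1]), reverse=True)


# Made-up students, for illustration only.
students = [
    Student("Student A", ada=0.85, infractions=5, failing_classes=1),
    Student("Student B", ada=0.97, infractions=0, failing_classes=0),
    Student("Student C", ada=0.95, infractions=0, failing_classes=1, lep=True),
]
report = at_risk_report(students)
for name, flags in report:
    print(name, flags)
```

Counting flags is the simplest possible way to combine factors. Whether each factor actually predicts the outcome you care about is a separate question that the data has to answer.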
Dr. Data hard at work, 2017
Dr. Joy Smithson, our Lead Data Scientist, has provided another example that some of our districts have been implementing in recent years: using benchmark information as it was intended, to identify those students in most need of intervention and to project students’ proficiency in specific content areas. Dr. Smithson has been working with these districts to compare cut scores on the State Assessment with Fall Benchmark scores to determine potential growth rates. One district even created a scorecard to help them see just that! Per Dr. Smithson (Italics added for emphasis):
A couple of districts are using this benchmark information as it was intended: to identify those students in most need of intervention, and to project students’ proficiency in specific content areas. Specifically, students’ spring scores from the previous school year (Spring Term, SY 2016) are compared to students’ fall scores for the current year (Fall Term, SY 2017) to determine who regressed and who demonstrated growth. Furthermore, students with scores approaching an adjacent proficiency level are identified using cutoff scores, as those students might need a nudge to achieve the next proficiency or to avoid regressing to a lower level. The same report is built after the second 9 week assessment, such that students’ performance from the spring of the previous year gets compared to students’ performance in the winter.
One district even created a proficiency scorecard to see how many points they would earn for students’ performance if they were awarded a grade from the state right now based on current students’ benchmark proficiency. Seeing their scorecard in real time helps them see where they are in relation to their school or district’s goals. The district can then turn to the student-level data to drive the results-focused discussion, because that’s ultimately where the rubber hits the road. The scorecard tells them how many points they’d earn now, but the student-level growth projection indicates the actual students who are most likely to advance or regress.
The first paragraph is about identifying students; the second is about using predictive analytics (“Data Science”) to see which students are most likely to advance or regress.
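The spring-to-fall comparison Dr. Smithson describes can be sketched in a few lines. This is only an illustration under stated assumptions: the cut scores, the 10-point margin, and the function names below are hypothetical and do not come from any real assessment or from the districts' actual reports.

```python
# Hypothetical cut scores: (proficiency level, lowest scale score for that level).
CUT_SCORES = [(1, 0), (2, 400), (3, 450), (4, 500), (5, 550)]
NEAR_CUT_MARGIN = 10  # hypothetical: how close counts as "approaching" a cut


def level_for(score):
    """Return the highest proficiency level whose floor the score meets."""
    level = CUT_SCORES[0][0]
    for lvl, floor in CUT_SCORES:
        if score >= floor:
            level = lvl
    return level


def near_cut(score, margin=NEAR_CUT_MARGIN):
    """Report whether a score sits just below or just above a cut score."""
    for _, floor in CUT_SCORES[1:]:  # the floor of level 1 is not a real boundary
        if 0 < floor - score <= margin:
            return "nudge_up"   # a small gain reaches the next level
        if 0 <= score - floor <= margin:
            return "may_slip"   # a small loss drops to the lower level
    return None


def classify(spring, fall):
    """Compare last spring's score with this fall's benchmark score."""
    change = fall - spring
    if change > 0:
        trend = "growth"
    elif change < 0:
        trend = "regression"
    else:
        trend = "flat"
    return {
        "change": change,
        "trend": trend,
        "fall_level": level_for(fall),
        "near_cut": near_cut(fall),
    }


# Made-up scores, for illustration only.
print(classify(spring=445, fall=455))
print(classify(spring=500, fall=492))
```

The same function works for the winter comparison: pass the previous spring's score and the winter benchmark score instead of the fall one.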
Both of these examples (using an At-Risk report and using predictive analytics to determine likelihood of advancement or regression) use a district’s own student data to identify correlated trends in student populations in order to act proactively rather than reactively. But remember that just because you have a lot of data doesn’t mean that it is the right data to consider. The first blog post I ever wrote for SchoolStatus was titled “Top 4 Statistics Pitfalls and One Super Easy Solution” and the third pitfall was: assuming that a correlation exists without confirmation. It is easy to assume a correlation exists because it seems obvious to us, but the data might tell a different story once we analyze it. I don’t know whether the above examples have been thoroughly tested yet; however, one of the options our districts have is to work with Dr. Smithson to confirm those very same correlations, so that predictive analytics can be used with some confidence on the student data they currently have.
Regarding the At-Risk report and other drop-out factors, I recommend that you read the report titled “Do We Know Who Will Drop Out?: A Review of the Predictors of Dropping out of High School: Precision, Sensitivity, and Specificity” by Bowers, Sprott, and Taff. It is a fascinating paper comparing over 110 dropout flags across 36 separate studies, with populations ranging from 99 to about 50,000 students. It’s a good read and cautions against becoming fixated on a single measure of risk.
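The three measures named in that paper's title are easy to compute once you know, after the fact, which flagged students actually dropped out. The sketch below uses the standard definitions of precision, sensitivity, and specificity. The function name and the eight made-up students are hypothetical and are not taken from the paper.

```python
def flag_quality(flagged, dropped_out):
    """Score a risk flag against what actually happened.

    flagged and dropped_out are equal-length lists of booleans, one entry
    per student.
    """
    pairs = list(zip(flagged, dropped_out))
    tp = sum(1 for f, d in pairs if f and d)          # flagged, did drop out
    fp = sum(1 for f, d in pairs if f and not d)      # flagged, did not drop out
    fn = sum(1 for f, d in pairs if not f and d)      # missed by the flag
    tn = sum(1 for f, d in pairs if not f and not d)  # correctly not flagged

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        # Of the students we flagged, what share dropped out?
        "precision": ratio(tp, tp + fp),
        # Of the students who dropped out, what share did we flag?
        "sensitivity": ratio(tp, tp + fn),
        # Of the students who stayed, what share did we leave unflagged?
        "specificity": ratio(tn, tn + fp),
    }


# Made-up outcomes for eight students, for illustration only.
flagged = [True, True, True, False, False, False, False, False]
dropped_out = [True, False, False, True, False, False, False, False]
quality = flag_quality(flagged, dropped_out)
print(quality)
```

In this made-up case the flag catches half of the students who dropped out, and only one in three flagged students actually did, which is the kind of gap that looking at a single measure can hide.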
I call this using data backwards, because in many districts we tend to react to student data once it is presented to us. Using data like a Data Scientist means looking at trends proactively to identify potential issues before they become real ones. And at the end of the day, what really matters are the students and seeing them succeed. Whether you are trying to predict drop-out rates or growth vs. regression, using your student data like a Data Scientist only really matters if we can use it to help individuals. All the data in the world does nothing if students don’t succeed as a result.