Slate Admissions Model

Overview

The Slate admissions model is largely based on the production version of the Colleague admissions model, although it is structured slightly differently and sources directly from the Slate database. Although Colleague, as our system of record, can answer most of our reporting questions, several factors make it problematic for admissions funnel reporting:

  • The ELF process that moves data from Slate to Colleague only runs a handful of times per day. As such, we don’t have official admissions numbers as of midnight each day.

  • The ELF process also has historically not run on weekends, so weekend activity is not caught up in Redshift until Tuesday morning’s load. This matters most when comparing historical numbers for periods when critical application deadlines fell on a weekend, such as May 1st for the upcoming fall term.

What’s so wrong with the Colleague Extract?

Historically, deletes in the base fact table for the Colleague admissions funnel (edw_admissions.fact_student_application) have been incredibly problematic. The reason is what happened during the RevSU era in the mid-to-late 2010s, before AWS was part of our stack: some, but not all, applications were deleted out of Colleague (I don’t really have context on why this happened - it was well before our time, and most of the people involved in that project, especially the contractors, are long gone). Since our models are typically based on a full compare of what Colleague says versus what Redshift has on a given day, this produces erroneous deletes in the EDW for years up through roughly 2018, which causes counts to be off.

To work around this, the idea is to extract all application records directly from Slate at 12AM every night. This gives SU both an accurate record of historical application updates that occurred while the ELF process was not running over the weekend and a 12AM snapshot of how Slate looked the day before.

Loading Approach (Fall 2022)

  1. Work with the admissions data steward, Danielle Richmond, to create queries in Slate that accomplish two things separately (separate because there is no simple way to join these two grains together in Slate - we will get to that later):

    1. A Slate base query that collects all Slate application IDs, student IDs, application statuses, and the date/times those statuses occurred.

    2. A Slate decision query that collects decisions for a student based on their application. This is most important for determining whether or not a student has accepted their offer to come to SU, as well as whether or not they have withdrawn their application (that is, declined, updated their term, updated their major, etc.).

      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      Keep in mind that these queries are currently configured to pull only the active term(s) moving forward as of the current date, as designated by SU and Slate. For example, as of 11/08/2022, the active terms moving forward are 23WQ, 23SQ, 23RQ, and 23FQ. If you need to backfill all of history, you will need to change the query in the extract conductor for the base dataset at the bottom of the script to remove the active-term restriction. Active terms are described in more detail later in this document (step 4). Please contact Danielle Richmond if you have any questions about this.
      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

      Here is the general concept of the load, which is discussed in the following steps:

      [Diagram: Slate extract queries → staging view (stage_slate.v_slate_ug_all_formatted) → term-scoped merge into the fact table → reporting layer]

  2. Using etl_meta.extract_conductor, place these queries in a ‘special’ queue. In this case, look for the authentication variable ilike ‘%slate%’ (see the first sketch after this list). The idea here is that the Slate extract runs in its own iterator before any of the other typical Colleague tables get extracted, so it lands as close to 12AM as possible. Once those extracts complete, we run the rest of the regular nightly Colleague extract.

  3. Within that same nightly job, there is a second step that rebuilds a view combining those two source Slate query outputs. That view is stage_slate.v_slate_ug_all_formatted, and its code can be viewed in the DDL column in DbVisualizer (a simplified sketch is also included after this list). The purpose of this view is to combine all regular application statuses for a student with their decisions (accept, cancel, etc.) in a columnar format very similar to how data is presented to us via Colleague. Note: just because this process was initially developed this way does not mean it needs to be a permanent solution! As of Fall 2022, the purpose of this exercise was to make the data extracted from Slate as close to Colleague format as possible in order to rebuild a new version of the funnel from the Slate feed and make sure the numbers match (or differ in favor of Slate’s accuracy)!

  4. In Matillion, the orchestration load_merge_fact_student_application_slate loads everything staged from the view above into the target fact table, scoped solely to the terms that are ‘active’ in the extract query. There is logic built into this job to do so. We don’t want to do a full compare like the rest of our models - this job only does a full compare against the terms that are present in the extract (a conceptual sketch follows this list).

  5. From there, the reporting layer (edw_admissions_reporting.admissions_funnel_slate) is populated by the orchestration of the same name, edw_admissions_reporting.admissions_funnel_slate.
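
For step 2, a quick way to inspect that Slate-specific queue is sketched below. The authentication ilike ‘%slate%’ filter comes straight from that step; the other column names (extract_name, is_active) are assumptions about the etl_meta.extract_conductor schema, so adjust them to whatever actually exists there.

      -- Sketch only: find the Slate extract entries in the conductor queue.
      -- extract_name and is_active are assumed column names.
      select extract_name,
             authentication,
             is_active
      from etl_meta.extract_conductor
      where authentication ilike '%slate%';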
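
For step 3, here is a heavily simplified sketch of what stage_slate.v_slate_ug_all_formatted conceptually does. The staged source table and column names below are placeholders, not the actual DDL (which lives in DevOps and in the DDL column in DbVisualizer).

      -- Sketch only: combine the base (status) grain with the decision grain
      -- into one columnar, Colleague-like shape. Source names are placeholders.
      create or replace view stage_slate.v_slate_ug_all_formatted as
      select
          b.application_id,
          b.student_id,
          b.application_status_code,
          b.application_status_datetime,
          d.decision_code,          -- accept, cancel, withdraw, etc.
          d.decision_datetime
      from stage_slate.slate_ug_base b
      left join stage_slate.slate_ug_decision d
             on d.application_id = b.application_id;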
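
For step 4, the term-scoped compare in load_merge_fact_student_application_slate conceptually amounts to something like the sketch below. The target fact table and term column names are placeholders; the real logic lives in the Matillion job.

      -- Sketch only: delete-and-reload restricted to the terms present in the extract,
      -- rather than a full compare of all history. Names are placeholders.
      begin;

      delete from edw_admissions.fact_student_application_slate
      where term_code in (select distinct term_code
                          from stage_slate.v_slate_ug_all_formatted);

      insert into edw_admissions.fact_student_application_slate
      select *
      from stage_slate.v_slate_ug_all_formatted;

      commit;

Scoping the delete to the extracted terms means historical terms are never touched, which sidesteps the erroneous-delete problem described earlier.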

Some Items of Note…

  1. Readmitted students: these are a known issue with Slate. When a student is readmitted to Seattle U, Slate does not reconfirm them (application_status_code = CF or PP%). That application status code is a huge deciding factor in whether or not a student is net deposited. This needs to be fixed upstream before the Slate model can be relied upon.

  2. As mentioned above, the current extract does not include any queries for grad students. Presumably, those will need to be two more queries that are integrated into stage_slate.v_slate_ug_all_formatted and, in turn, into the corresponding fact table. A full backfill will be required to get both integrated!

  3. The code for the DDL and current backfill scripts can be found in DevOps: https://seattleu-its.visualstudio.com/informsu_edw

  4. The point-in-time model (sometimes referenced as PIT or PiT) for this specific extract should be easy to replicate from what already exists for edw_admissions_reporting.admissions_funnel_point_in_time. You’ll just need to create a new view for the dates to process and kick it off accordingly as part of your backfill (preferably in a new version of Matillion).

 

What’s Next?

Irina had a good idea to put this here after the meeting we had on 11/9 before I left. Vetting is a really tricky balance at SU - it’s easy to overdo and even easier to not do enough of. Since readmits are still an issue, here’s what I suggest (and take it with a grain of salt):

  1. Work with Dylan to create a Power BI model that sources from edw_admissions_reporting.admissions_funnel_slate.

  2. Set up a daily audit that compares the following counts between the Slate-sourced funnel and the existing Colleague-sourced funnel (see the sketch at the end of this section):

    1. Distinct students who applied

    2. Distinct students who gross deposited

    3. Distinct students who net deposited

    4. Distinct students who were waitlisted

    5. Distinct students who were on a waitlisted invite list

    6. Distinct students who melted

It is expected that these numbers will not match - in fact, I presume that in most (if not all) cases, the Slate data will actually be better. This audit, however, will instill confidence over the subsequent months that the Slate feed into its own Admissions model will be more reliable.
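
As a starting point, the audit could be a single comparison query like the sketch below. The Colleague-side table (edw_admissions_reporting.admissions_funnel) and the flag/ID column names are assumptions; swap in whatever the existing Colleague funnel actually exposes, and extend the same pattern to the waitlist, waitlist-invite, and melt counts.

      -- Sketch only: compare distinct-student counts between the Slate-sourced funnel
      -- and the Colleague-sourced funnel. Table and column names are placeholders.
      with slate as (
          select
              count(distinct case when applied_flag       = 'Y' then student_id end) as applied,
              count(distinct case when gross_deposit_flag = 'Y' then student_id end) as gross_deposited,
              count(distinct case when net_deposit_flag   = 'Y' then student_id end) as net_deposited
          from edw_admissions_reporting.admissions_funnel_slate
      ),
      colleague as (
          select
              count(distinct case when applied_flag       = 'Y' then student_id end) as applied,
              count(distinct case when gross_deposit_flag = 'Y' then student_id end) as gross_deposited,
              count(distinct case when net_deposit_flag   = 'Y' then student_id end) as net_deposited
          from edw_admissions_reporting.admissions_funnel
      )
      select current_date       as audit_date,
             s.applied          as slate_applied,
             c.applied          as colleague_applied,
             s.gross_deposited  as slate_gross_deposited,
             c.gross_deposited  as colleague_gross_deposited,
             s.net_deposited    as slate_net_deposited,
             c.net_deposited    as colleague_net_deposited
      from slate s
      cross join colleague c;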