Often overlooked, digital file receipt is an integral and increasingly technologically sophisticated aspect of Mortgage Re-Underwriting Due Diligence. It serves as the gateway through which all documents must enter the platform, and therefore has a tremendous impact on workflow. Formats vary, however, and ad hoc development for non-standard productions is common. One such production received by Oakleaf Group consisted of several thousand pages, a web-based interface, embedded documents, and hyperlink dependencies. The complexity and size of this production, compounded by metadata anomalies, resulted in multiple issues:
1) The metadata included both the guideline, as well as the individual program guides shuffled together, receiving updates concurrently with updated metadata in shared columns; 2) The document updates were piecemeal and asynchronous. An update to credit policies on Monday, income and asset requirements on Wednesday, and another credit update on Friday was conceivable. To further complicate things, there was no singular update-document type index, leaving updates to ostensibly the same update-document non-standardized; 3) Would-be embedded documents had entirely different metadata notation to the update-documents, rendering their organization ambiguous; 4) These updates included both Active and Draft versions – a distinction not denoted in the metadata.
The document needed to be reimagined as not simply a disaggregation of updates, but as a living document, singular and dynamic, subbing in and out sections as they were updated and became obsolete. This paradigm shift reoriented our development strategy towards compactness, linearity, and singularity.
The first step in realizing this new strategy required teasing out the two distinct documents (the guideline and program guides). By identifying substantively different structures in the naming conventions of the apparent bookmarks and checking the non-overlapping values on the rest of the fields, it became clear that the so-called ‘Replica Id’ separated the metadata into distinct major documents. With the two singular documents identified in the metadata, we then needed to construct an internal document index within each of the major documents. This involved a mix of splitting and combining numeric data indexes from apparent bookmark fields. This update-document index, verified by our subject matter experts, enabled our planned substitution scheme.
Next, the problem of floating document attachments was realized and resolved by identifying a grouping index, an apparent attachment identifier field. Using this, we pushed the constructed document index, along with the bookmark and date data from the source update-documents, down to the apparent attachments.
Finally, we needed to identify draft versions and active versions of the documents, or else incorrectly replace an active version with an archived or draft version. With no apparent metadata to support this distinction, we turned to the files themselves. After manually identifying examples of both kinds of documents, we used their extracted text as keys to mine the text and classify each update-document. We then merged this constructed active/draft index with the original metadata to subset and purge the draft update-documents, along with their respective embedded dependencies. With the metadata sufficiently enriched, we needed to turn this one static metadata document into thousands, an instruction set for every day in the range of updates. After applying a text overlay to the compiled document-updates, we compiled the whole-document guidelines, incorporating or substituting update-documents as they became active. We further included the apparent document index and document classifiers as bookmarks for ease in navigation for the user.