Leaders in Lending

Episode 59 · 6 months ago

AI Lending 201: The Evolution of AI and Machine Learning at Upstart

ABOUT THIS EPISODE

When you’re applying AI from scratch, there are a few lessons to keep in mind. One that stands out is to ensure the machine learning solution is well suited to the problem.

Here’s the story of how we evolved our ML strategy at Upstart.

Hear about how to apply AI in the lending environment from Leaders in Lending host Jeff Keltner, Senior Vice President of Business Development at Upstart:

  • Challenges to overcome at the beginning stage of the ML journey
  • Why feature engineering and first-party data build on each other
  • The evolution of Upstart’s underwriting model
  • Shifting from manual to automatic identity verification
  • How to attain more interesting predictions and apply them in the credit industry
  • Mitigating risk with creativity

More information about Jeff and today’s topics:

To hear more from Leaders in Lending, check us out on Apple Podcasts, Spotify, or on our website.

Listening on a desktop & can’t see the links? Just search for Leaders in Lending on your favorite podcast player.

You are listening to Leaders in Lending from Upstart, a podcast dedicated to helping consumer lenders grow their programs and improve their product offerings. Each week, hear decision makers in the finance industry offer insights into the future of the lending industry, best practices around digital transformation, and more. Let's get into the show. Welcome to Leaders in Lending. I'm your host, Jeff Keltner. This week I'm going to do a little follow-up on a discussion I had a couple of weeks ago that we called AI 101, where I went through the basics of AI. This week I really want to walk you through how Upstart has applied AI in the lending environment: specifically, the parts of the lending experience where we've leveraged AI to help us improve, the lessons we've learned, and the progress we've had. I think the journey of how we've been applying these capabilities in the context of lending is really informative for anybody in the banking or lending space, or any space really, about how the journey through the application of AI can look and some of the things to watch out for. So this will be a bit of a case study, a real-world application of the concept of AI specifically to the consumer lending space. I'm going to flow you through the journey of how we did it, kind of chronologically, if you will, through where we've been focused on applying AI. I think that's an interesting way to follow the story. When Upstart first started, probably the core thing we believed was that, using AI and other techniques, we could fundamentally improve credit outcomes and the predictive capacity of credit modeling. That came with the belief that many more Americans are creditworthy than have traditionally prime credit scores. If you could find those people with slightly lower scores who would repay a loan, you could offer them credit, and you could reduce the price of credit.
And so we set about moving beyond a more traditional, simple approach to assessing creditworthiness to really predicting creditworthiness using machine learning techniques. The first thing we encountered was one of the standard problems for machine learning, the kickstart problem: if you want to use your own data to train a model to figure out who's going to pay back a loan, but you've never made a loan, you don't have any data to use. So our first models used a small number of variables and mostly third-party data, meaning data from outside lenders, either through credit bureau retros or through publicly available data from different lending platforms. They were used to make relatively simple predictions, maybe just good or bad, maybe a single probability of default over the course of a loan. So we had a beginning stage where we had not enough data to use a lot of variables, not enough data to use the most sophisticated techniques, not enough data to do really complicated predictions, and not our own data. And yet we could build a model, and that model showed really tremendous uplift. One of the things to think about when you have a model like this is how you limit the risk you're taking on with a model that's newer and untested. You've got this new model, and it shows good promise. And I should say, despite the lack of first-party data, lending is one of these problems, particularly credit underwriting, where the training data is actually really robust even without your own first-party data, because there is a history through the credit bureaus of who's been offered loans and how those loans have performed that can be accessed in an anonymized way to build some of these models. And so we had a pretty good model.
But when you're first getting started you want to find ways to limit your risk. I'll get to my lessons at the end, but this is related to one of them, which is to find creative ways to limit the risk of the model. There are all sorts of things you do in testing, such as cross-validation and accuracy metrics, to limit the risk of the model being inaccurate. Another approach we worked with early on was to put some more hard requirements into the credit policy in addition to the risk model. You can have a world where the only thing that matters is the risk prediction by the ML model. We started off with relatively high traditional credit score requirements and relatively low debt-to-income thresholds, so we were really operating in a constrained universe. But that gave our partners on the lending side...

...and on the investing side some confidence that we had controls around the risk of the model being inaccurate. And of course, as that model gets more effective, as you get more confidence in the model, as it proves its accuracy over time, you can loosen those. So now we have many lenders and investors who don't have any credit score requirement, and some who don't have any debt-to-income requirement. Of course, that's not to say that those variables aren't important in the prediction of risk. They're just not being used in a simplistic way, with a minimum credit score applied regardless of what the risk model might say, and so we're relying more on the machine learning to make an assessment of risk. And that's the whole ball game here in many ways. If you had a minimum credit score of 680, as many lenders do to start, you know that the 660 to 680 range has a lot of good borrowers in it. If you rely on the model to find them, then you're removing that 680 requirement, maybe making it a 660 hard requirement, and relying on your machine learning model to identify the good-risk borrowers in that 660 to 680 pool. Over time, of course, you can keep reducing that, from 660 to 640 and all the way down, until you fully remove the requirement. So that was really how we got started applying this. And then, of course, over time, as you get more data, you're able to do a number of things. As you have more data, you're able to use more sophisticated techniques. Most traditional lenders use logistic regressions. You can move up into gradient boosting, you can use more levels of trees, and you can make more sophisticated models that can find higher-order interaction effects.
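The idea of hard credit-policy floors layered on top of a model's risk prediction, and loosened as confidence grows, can be sketched roughly as follows. This is a minimal illustration, not Upstart's actual policy engine: the `CreditPolicy` class, the `is_approved` function, the debt-to-income cap of 0.35, and the 10% risk cutoff are all hypothetical. Only the 680 and 660 score floors come from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreditPolicy:
    """Hard requirements applied in addition to a model's risk prediction (hypothetical)."""
    min_credit_score: Optional[int]   # None means no hard score floor
    max_dti: Optional[float]          # None means no hard debt-to-income cap
    max_default_probability: float    # highest model-predicted risk the lender accepts


def is_approved(policy: CreditPolicy, credit_score: int, dti: float,
                default_probability: float) -> bool:
    """Apply the hard floors first, then defer to the model's risk prediction."""
    if policy.min_credit_score is not None and credit_score < policy.min_credit_score:
        return False
    if policy.max_dti is not None and dti > policy.max_dti:
        return False
    return default_probability <= policy.max_default_probability


# Three stages of the same policy, loosened as the model proves itself.
early = CreditPolicy(min_credit_score=680, max_dti=0.35, max_default_probability=0.10)
looser = CreditPolicy(min_credit_score=660, max_dti=0.35, max_default_probability=0.10)
model_only = CreditPolicy(min_credit_score=None, max_dti=None, max_default_probability=0.10)

# A borrower with a 670 score whom the model rates as low risk (made-up numbers).
for name, policy in [("early", early), ("looser", looser), ("model_only", model_only)]:
    print(name, is_approved(policy, credit_score=670, dti=0.30, default_probability=0.04))
```

Under these assumptions the same applicant is declined by the early policy and approved once the floor drops to 660, which is the 660 to 680 pool described above.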
Of course, to capture interaction effects between five or ten variables you need lots of data, where you have these data points on lots of different loans, so you have this natural progression: as you get more first-party data, you have the ability to make more sophisticated models. You also continue doing work in what we call feature engineering, so we continued to look for ways to combine the variables. Now, theoretically, a machine learning model would find all of these connections between variables. A very simple example is a debt-to-income ratio. It's a constructed variable: you get a debt load, you get an income, and you make a ratio of the two, so you've taken two variables and combined them into one. In a theoretical way, given enough training time and enough data, a machine learning model would determine all of the possible interaction effects, like a debt-to-income ratio, or a payment-to-income ratio, which is another you could use. It would deduce all of those on its own. But in many ways you can think of feature engineering as a shortcut for the model: we know that this combination of variables is really predictive, so we're going to add it as a new variable to the model. There's a lot of work on the machine learning side that goes into finding and engineering those kinds of constructed variables, including a lot of work around cash flows. Can we look at expenses, debt loads, and income and figure out the cash flow available to fund loan payments, things of that nature? So a lot of work goes on, and it all builds on itself. As you do these things, you can add them to the model. The model gets more accurate, which allows you to approve more people, which generates more first-party data.
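The constructed variables mentioned here (debt-to-income, payment-to-income, and available cash flow) could be computed along these lines. This is an illustrative sketch only: the function name `add_ratio_features` and the input field names are assumptions, and the real features and definitions used in production are not described in the episode.

```python
def add_ratio_features(applicant: dict) -> dict:
    """Return a copy of the applicant record with engineered features added.

    Expected (hypothetical) input keys: annual_income, monthly_debt_payments,
    monthly_expenses, proposed_loan_payment.
    """
    monthly_income = applicant["annual_income"] / 12
    if monthly_income <= 0:
        raise ValueError("income must be positive to compute ratio features")

    features = dict(applicant)
    # Two raw variables combined into one: existing debt load relative to income.
    features["debt_to_income"] = applicant["monthly_debt_payments"] / monthly_income
    # The new loan's payment relative to income.
    features["payment_to_income"] = applicant["proposed_loan_payment"] / monthly_income
    # Cash left each month to fund loan payments after expenses and existing debt.
    features["free_cash_flow"] = (
        monthly_income
        - applicant["monthly_expenses"]
        - applicant["monthly_debt_payments"]
    )
    return features


example = add_ratio_features({
    "annual_income": 60_000,
    "monthly_debt_payments": 1_000,
    "monthly_expenses": 2_500,
    "proposed_loan_payment": 250,
})
print(example["debt_to_income"], example["payment_to_income"], example["free_cash_flow"])
```

Handing the model these ratios directly is the shortcut described above: it does not have to rediscover the relationship between debt and income from the raw columns.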
So there's a real flywheel. That was the first place we applied machine learning, and I want to talk about two really critical moments for us in that journey from simple models, simple predictions, and third-party data to where we are today. These two big moments represented very large increases in the accuracy of our model. There are a lot of small incremental changes over time, and then there are moments where you see real improvements in model accuracy because you've shifted the way you do something, and two of those I really want to talk about. The first was the movement from third-party data, meaning loans made by other platforms that we accessed through publicly available data or anonymized credit bureau retros, to the use of first-party data, where the training data for the machine learning models were loans that had been originated by Upstart's partners through the Upstart lending platform,...

...and that gave us a huge increase in accuracy, because of course we can get more data on those loans. If you think about the training data as a spreadsheet where every loan is a row and every piece of data about a loan is a column, we have more columns on our own data than we do on third-party data. So we have this ability to get much more accurate. Now, you need enough rows in your first-party data to train the model. At first we had zero rows, so we had to use third-party data. After two months you have some rows, but there are not that many loans and they're not seasoned enough to give a good sense of performance, so you're not using first-party data yet. But we had a moment where we switched to first-party data. That took a while to get to, and of course we were testing that model and backtesting it against historical loans to see how accurate it would have been. When we switched, it was a large increase in accuracy and, I guess, a moment where you feel like you've started controlling your own destiny a little more, because you have your own data training the model. That was an important evolution, because we could not have started there; we naturally had to use third-party data before we had generated enough data to shift to first-party data to train the model. The second really interesting shift came as we were able to increase the complexity of the model and the amount of data: we were able to shift to much more interesting predictions. I think this is really relevant because many people who are applying machine learning to credit probably start with relatively simple predictions. You can imagine a prediction that's just: will this person pay back or not? Good or bad, yes or no, the simplest kind of prediction you can make.
You might move, as we did, to a probability of default. What's the probability, from zero to one? Zero being no chance this person defaults, which you'd probably never get, and one being this person is absolutely going to default on this particular loan obligation. Moving on from that binary prediction to a probability of default, what became important to us was understanding the real components of the economics of the loan to a lender: not only whether this person will default, but at what time they might default. A 36-month loan that pays back for 35 months and charges off the last month's payment is a very different economic outcome to a lender than a loan that never has a payment made. So you need to understand what the timing curve is, and it turns out, when you start to look at this, that each borrower, each applicant, has a different probability timing curve, and understanding that in a more and more fine-grained way allows you to make better risk predictions and better decisions. We also went beyond a monthly probability of default and added a monthly probability of prepayment, because prepayment also impacts the economics to a lender, and we can combine those things. So you can see this evolution over time: as the models get more sophisticated, they can make better predictions. We're still a long way from perfect prediction, from a perfect underwriting model that would say either yes or no and would never get a loan wrong. Every yes would pay back a hundred percent, every no would have been somebody who would have defaulted had you given them a loan, and you'd have very low interest rates for everybody.
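To see why the timing of default and prepayment matters to a lender, here is a simplified sketch that turns per-month default and prepayment probabilities into expected cash collected. It is a toy calculation under stated assumptions, not the model described in the episode: the function name `expected_collections` is hypothetical, a default is assumed to recover nothing from that month onward, and a prepayment is assumed to pay off the remaining balance plus that month's interest.

```python
def expected_collections(principal, monthly_rate, term, default_hazard, prepay_hazard):
    """Expected total cash collected on a level-payment loan.

    default_hazard[m] and prepay_hazard[m] are the probabilities that a loan still
    active at the start of month m defaults or prepays in that month.
    """
    if len(default_hazard) != term or len(prepay_hazard) != term:
        raise ValueError("hazard curves must have one entry per month of the term")

    if monthly_rate == 0:
        payment = principal / term
    else:
        payment = principal * monthly_rate / (1 - (1 + monthly_rate) ** -term)

    balance = principal
    still_active = 1.0   # probability the loan has neither defaulted nor prepaid yet
    expected = 0.0
    for m in range(term):
        p_default, p_prepay = default_hazard[m], prepay_hazard[m]
        if p_default + p_prepay > 1:
            raise ValueError("default and prepayment probabilities exceed 1 in a month")
        p_continue = 1 - p_default - p_prepay
        interest = balance * monthly_rate
        # Prepayment: remaining balance plus this month's interest arrives at once.
        expected += still_active * p_prepay * (balance + interest)
        # Normal month: the scheduled payment arrives. Default contributes nothing.
        expected += still_active * p_continue * payment
        balance = balance + interest - payment
        still_active *= p_continue
    return expected


term = 36
no_prepay = [0.0] * term
# The two loans from the discussion, using a zero-interest $3,600 loan for simplicity.
late_default = [0.0] * 35 + [1.0]    # pays 35 months, charges off the last payment
never_pays = [1.0] + [0.0] * 35      # defaults before any payment is made
print(expected_collections(3600, 0.0, term, late_default, no_prepay))
print(expected_collections(3600, 0.0, term, never_pays, no_prepay))
```

Both loans count as a default in a binary good-or-bad prediction, yet under these assumptions one returns almost all of the principal and the other returns nothing.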
We're a long way from that kind of idealized state, but we've made a ton of improvements all along the path there and have seen really tremendous impacts on the business for our partners. I think that evolution is a core part of how machine learning works in the real world: you don't start by waving a magic wand and getting a great answer. You have to start simple, with the capabilities and the data that you have, and build and mature over time. That's really what we've seen in our underwriting model. It's obviously in a much better state now, and we continue to spend a lot of time on it because we see so much opportunity left between feature engineering, new data sources, maybe new credit bureau data sources with alternative points of data, and different things we can do to improve the predictive power of that model and therefore improve our ability to help our partners serve consumers and make sure they're earning the right economic rewards for the risk that they're taking. So that was really the first place we decided to...

...apply machine learning. The core of what brought us into the space was underwriting. The second place we tackled is interesting because it was something we learned along the way: what we call the verification stage. With our platform for our lending partners, an applicant comes in and submits their information, and the underwriting model runs and presents them with offers of credit contingent upon verification of the information provided. We were given an identity and an income, and we then need to go back and verify that the things the borrower told us were true, so that our partner can make that loan in good faith. When we started, again talking about ways to limit risk, we did all manual verifications. We had documents uploaded for every applicant. We were reviewing bank statements or W-2s, looking at IDs, and having phone calls with almost all of the borrowers. It was a process we felt was important to make sure we were really validating identity. Somewhere around late 2016, I think, we came to this question: what would happen if we allowed a fully automated approval? In some cases we get a lot of fraud signals that are extremely low, and we're very confident that the income is accurate and that the identity is accurate, and we're still doing this phone call and requiring an upload. What if we allowed some people to go through what we called instantly, with no manual processing, nothing to upload, and nothing to review? Again, talking about ways to limit risk, we started with a small percentage of borrowers that we allowed to go through this, even among all the borrowers we thought we were probably comfortable doing this with. We started with a small percentage, and we started with only small-dollar loans.
So, you know, maybe loans under five thousand dollars, and ten percent of the people who have really low scores, where you're really confident that they're not fraudulent and haven't misrepresented anything. What would happen? So we launched that pilot and saw really interesting results on two fronts. The first was that conversion jumped dramatically: between a two and three times increase in the rate from offer to loan funded and originated, versus when we had documentation requirements. Well, that's big. If you could double your business on the same application base, that's a pretty big win. Then there was a real question of whether there would be hits to fraud or credit performance when you do this, and we actually saw the opposite, which brought us to the realization that good borrowers are not only rate sensitive, in that they want to see a good rate, but also effort sensitive. They don't want to put in a lot of effort, so there can be adverse selection when you put an onerous process in front of a good borrower. So we started to say: this is working so well, maybe we don't want it to be just ten percent of borrowers. Maybe we don't want it to be just loans under five thousand dollars. Maybe we want to find ways to put more people into this low-risk world where we can have them instantly approved and flow through. So we started to think of this as a real machine learning problem.
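The pilot described above (only small-dollar loans, only applicants with very low fraud signals, and only a fraction of those) could be expressed as a routing rule like the one below. This is a hedged sketch: the function `route_verification`, the `fraud_risk` score and its 0.01 cutoff, and the hash-based sampling are illustrative assumptions. Only the five-thousand-dollar limit and the ten percent share come from the episode.

```python
import hashlib


def route_verification(applicant_id: str, loan_amount: float, fraud_risk: float,
                       *, max_amount: float = 5_000, max_fraud_risk: float = 0.01,
                       pilot_share: float = 0.10) -> str:
    """Return "instant" or "manual" for an applicant's verification path."""
    # Hard eligibility rules: small-dollar loans with very low fraud signals only.
    if loan_amount >= max_amount or fraud_risk > max_fraud_risk:
        return "manual"
    # Deterministic sampling: hash the ID into one of 100 buckets so the same
    # applicant always lands on the same path.
    bucket = int(hashlib.sha256(applicant_id.encode("utf-8")).hexdigest(), 16) % 100
    return "instant" if bucket < pilot_share * 100 else "manual"


# Larger or riskier applications always get manual review.
print(route_verification("applicant-001", loan_amount=12_000, fraud_risk=0.001))
print(route_verification("applicant-002", loan_amount=3_000, fraud_risk=0.20))
# Widening the pilot means raising pilot_share or max_amount over time.
print(route_verification("applicant-003", loan_amount=3_000, fraud_risk=0.001, pilot_share=1.0))
```

Keeping the limits as parameters reflects the approach described: start with tight controls, measure conversion and fraud, then expand the share and loan size as the results hold up.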
How do we look at the data sources that are feeding us data, the fraud data sources, the identity data sources, the income data sources, and make the most accurate prediction of whether we think there's been a misrepresentation of something on the application, and whether we are comfortable, given the level of risk of misrepresentation, not requiring further documentation or verification? That went from no loans in 2016, until the very end of the year when we were working on it, to now more than seventy percent of the loans our partners originate through our platform having no documentation requirement, and that's while maintaining fraud below thirty basis points on the platform. So that's been a really interesting evolution. It's really been the second area that we've pointed the engine of machine learning at, if you will: how do we identify those applicants where the automated data sources, which don't introduce user friction, provide us sufficient data to do all the verification work? Here we're looking at a bunch of third-party APIs as a core part of that, and also looking at...

...transactional data from a depository account, through our work with a third party that can access it with user-credentialed access. Of course, applicants need to tell us where the money goes anyway, so it's very natural for them to give us access to bank account information, and we can use that to look at things like income. That was a really interesting ML problem, and I want to talk about why it's interesting, for a couple of reasons. Number one, again, we started with very tight risk controls and iterated over time. But it's a hard problem because the data is not as clean. In underwriting you get a very clear answer at the end of the day of whether a borrower paid back or not. At the end of the term of the loan you have a very specific answer to: did we make the right decision in underwriting? You've got an outcome variable that we can look at and say, that's a very clean piece of data. In income verification, for instance, when you stop asking for documentation because you're very confident, based on automated sources, that the income is accurate, you don't have a validation of that after the fact. And when you ask somebody you think is fraudulent to provide a level of documentation, say, this looks very concerning for fraud, I want to really make sure this is Jeff, so I made him upload a driver's license, and he never did, you don't really know that that's fraud. So you have this really interesting problem where you don't have great instances. Of course, in the case of fraud in particular, you don't want to have a lot of positives in your data set, because that means you approved fraudulent loans, and the goal is not to do that. So this is an interesting space where the data is less clean and the outcomes are less clear in the training data, and it really takes a lot of work to figure out: how do we build the right models? How do we tune them?
How do we put the right controls and rules in place? We still have a lot of hard rules in there, kind of like your minimum credit score. There are a number of those in this system that leverages machine learning to do this automation, because it's a less clean problem. But this is a problem that we looked at because of the opportunity. If you think about what I said, doubling to tripling the throughput from credit-approved to originated loan is really dramatic for your top-line growth and your revenue, and also for your cost, because those are loans that are going through with very low operational cost compared to a loan where there's a phone call or a document review that takes human time, which is one of the most expensive things we have. So we've seen great results from this, but it is an example of a machine learning problem that's a little trickier, a little thornier. The answers aren't as clear. The opportunity is probably just as big in many ways as in underwriting, but it is at its core a harder problem to solve, and I think this is one of those areas where you have to ask yourself: what's the right problem to apply machine learning to, and how do I do that? What are the weaknesses in the application of machine learning to this, and how do I control those risks? So those were the first two areas where Upstart applied machine learning: underwriting, and then what we call the verification process. We continue to find new and different areas to apply it. There are two more that we're working on currently, though we're at a much earlier stage in our evolution of applying the concepts and capabilities of machine learning to them. The first is our marketing targeting. This is another great problem that's very well suited to machine learning.
Who can I advertise to, through what channel, and with what message? And this is where you get the question of what variable you're targeting: did they come to the site? Did they apply for a loan? Did they get approved for a loan? Did they actually originate a loan? You have really interesting questions of targeting, and there are very interesting challenges in marketing: if I reach out to somebody on three or four different channels, which of those is the one that's making the biggest difference and causing an action? And how do I optimize my spending to drive behavior? These are really interesting questions that machine learning can address, and we're tackling those now, although I don't have as clean an answer there. And then there's servicing, where you have this question of who's most likely to go delinquent. Do I see particular portions of my portfolio at higher risk? Are...

...there things I can do, means of outreach, messages, programs, that I can leverage to improve the outcomes of those borrowers and help make sure they can meet their obligations? How do I target my servicing resources at the right customers at the right time? It's another problem that's really well suited to machine learning, and we are slowly applying that same engine and set of capabilities to these sorts of problems, as well as to new asset classes. Moving underwriting from a personal loan to a refi loan is a lot of work, because there are a lot of different variables, different outcomes, and different training data. So I think we feel we're at the very early stages, but hopefully that story of our evolution through applying machine learning to those two areas is interesting. I wanted to give you three takeaways from the experiences we've had. I'm not a machine learning expert, but I have become relatively conversant, and I think these are applicable to anybody in banking or lending who's thinking about how to apply machine learning. Number one, machine learning is not a magic wand. I sometimes get the sense when I talk to people that they feel like machine learning is a singular thing: you point it at a problem or a data set, you press the button, and, presto, something great happens. There's a degree to which you can do that and often improve on the status quo. But doing machine learning and doing machine learning well are very different things.
As my CEO sometimes likes to say, both Serena Williams and I play tennis, but I don't like my chances against her in a match. I think that's very true, and the same is true of machine learning: it's easy to say you do it, and it's easy to do it at a certain level, but doing it at the highest levels takes time, effort, and dedication, and it's a very different thing than just getting into the game. So understand, if you want to get into machine learning, that it's not set-it-and-forget-it. It's not press the button and solve the problem. It's going to take work and iteration, whether you do that internally or find a third party to work with. Just understand that it's a process, a journey and not just a destination, and it will take time to optimize and get right. You have to learn, just as the machine has to learn, how to take advantage of and apply machine learning capabilities in different contexts. That brings me to the second lesson. Number one, it's not a magic wand. Number two, be really thoughtful about the problems you want to solve. What's well suited for machine learning? Where is it not the greatest solution? I talked about underwriting: in the universe of problems that machine learning is well suited to, and that have data well suited to machine learning, underwriting has got to be one of the top examples of a really well-structured problem. Verification is much harder. So think about where to start, where you can get the best results and returns on investment in the early days. Another example of a place where people applied AI early on, maybe because they thought that's what AI and machine learning were, was the chatbot world, which I always thought was a bit misguided. Chatbots are a very hard problem.
You've got to do natural language processing, and there are lots of possible things you could do. It's not a clear problem. You've got lots of layers: what did they say? What does it mean? What do I want to do in response? Whereas things like underwriting or some process automations are problems much more cleanly suited to machine learning. So if you're going to invest in the application of AI or machine learning to your business, figuring out which processes and capabilities are well suited to it, and where you can really make a difference in the near term, is a really critical question to think hard about, because the answers are not always obvious, and making the wrong choice can make you think the technique doesn't work for you when it just may not work where you're trying to apply it. So: it's not a magic wand, and pick your places carefully. And number three, be creative in how you control the risk that you're taking on. Moving into something like machine learning or AI can feel like a big risk, and there are always risks involved, to be sure, but I think there are lots of really creative ways, as you're going through this process, to figure out...

...how to control that risk. I'll give you a few examples; there were some in my story earlier. If I'm doing underwriting, am I putting in hard credit policy that I slowly expand over time, so I have real guardrails based on traditional variables, like credit scores and DTIs, that we're comfortable with for controlling risk, and then slowly expanding those? Do I apply things like the verification model to a portion of my applicant pool, so I'm testing to see the accuracy before I expand it more broadly? Am I limiting the products I'm doing it on? A lot of our partners started with us in personal loans, and that was a relatively small portion of their portfolios. So they could say: we want to get some data and understand the performance of this model, and we can use it for a hundred percent of the personal loan portfolio because it's not a huge portion of the institution's balance sheet. So there are ways, through portfolio size or product size, that you can control risk. And again, there's having hard policies, as in verification, where we still have a number of hard policies, and having good fallbacks: if the model is not certain, do I have a manual process or a traditional approach I can leverage so that I don't have to rely a hundred percent on this approach? That's still what we do in verification: when there's a high degree of concern, we don't use an automated process. We have a person reviewing documentation and talking to that applicant so we can get comfortable with the data being provided. So figuring out how to fit machine learning into your overall process, and finding ways to not let the fear of risk stop you, but instead to limit and manage that risk so you can learn and iterate into more expansive use, is, I think, really important.
So that's a little bit of Upstart's story of the application of artificial intelligence or machine learning in the credit world, and my key lessons learned. Number one, it's not a magic wand. It takes time and effort, but the value is real. Number two, think really long and hard about how and where to apply machine learning and how well suited the problems and data you have are to the application of these technologies. And three, be creative about how you control and manage risk, but don't let risk stop you from getting started. I think the benefits of machine learning for every industry, but banking and lending in particular, are really tremendous for those who can figure out how to do it right. Because, as I said, this is iterative, getting started early so you can learn how to leverage these capabilities is going to be tremendously valuable for the institutions that learn how to do that effectively. Upstart partners with banks and credit unions to help grow their consumer loan portfolios and deliver a modern, all-digital lending experience. As the average consumer becomes more digitally savvy, it only makes sense that their bank does too. Upstart's AI lending platform uses sophisticated machine learning models to more accurately identify risk and approve more applicants than traditional credit models, with fraud rates near zero. Upstart's all-digital experience reduces manual processing for banks and offers a simple and convenient experience for consumers. Whether you're looking to grow and enhance your existing personal and auto lending programs or you're just getting started, Upstart can help. Upstart offers an end-to-end solution that can help you find more creditworthy borrowers within your risk profile, with all-digital underwriting, onboarding, loan closing, and servicing. It's all possible with Upstart in your corner.
Learn more about finding new borrowers, enhancing your credit decisioning process, and growing your business by visiting upstart.com forward slash banks. That's upstart.com forward slash banks. You've been listening to Leaders in Lending from Upstart. Make sure you never miss an episode: subscribe to Leaders in Lending in your favorite podcast player. Using Apple Podcasts? Leave us a quick rating by tapping the number of stars you think the show deserves. Thanks for listening. Until next time. The views and opinions expressed by the host and guests on the Leaders in Lending podcast are their own, and their participation in this podcast does not imply an endorsement of such views by their organization or themselves. The content provided is for...

...informational purposes only and the discussion between the host and guests should not be taken as financial advice by companies or individuals.
