Leaders in Lending

Episode 1 · 8 months ago

Back to Basics: AI Lending 101

ABOUT THIS EPISODE

We've talked about AI a lot on this show, but we’ve never explained what it is exactly.

In this first part of a series of episodes explaining AI, we’re going back to the basics.

Host Jeff Keltner, Senior Vice President of Business Development at Upstart, explains the key components of AI, the use cases in lending environments, and some common challenges with machine learning and AI.

Topics covered:

- Four fundamental components of AI

- Use cases in lending

- Common challenges

- The hard work and dedication needed to get it right

To hear more from Leaders in Lending, check us out on Apple Podcasts, Spotify, or on our website.

Listening on a desktop & can’t see the links? Just search for Leaders in Lending on your favorite podcast player.

And you can improve underwriting accuracy. That has tremendous ability to increase approvals and lower rates so you're more competitive in the market, making better offers to more borrowers and also, at the same time, making returns to the institution more predictable. You're listening to Leaders in Lending from Upstart, a podcast dedicated to helping consumer lenders grow their programs and improve their product offerings. Each week, hear decision makers in the finance industry offer insights into the future of the lending industry, best practices around digital transformation and more. Let's get into the show. Welcome to Leaders in Lending. I'm your host, Jeff Keltner. This week's episode will be me talking to you about the basics of AI. We spend so much time on this podcast, and at Upstart, talking about AI that I really wanted to give an overview of AI for lenders specifically: talk about what AI is and some of its key components, talk about the use cases in the lending environment specifically, where I see this being applied, and then touch a little bit on some of the challenges you should be aware of when you're thinking about the application of machine learning or artificial intelligence to lending. So that will be today's episode. We'll dive through these topics, and hopefully you can learn a little something that's useful for you as you think about how AI can be applied to your business. Okay, so let's start with an introduction to AI. So AI, or machine learning (I kind of prefer the term machine learning), at its core is about taking data on the past and using it to build a model that can help us predict the future. And so we'll talk about the core components of that. The first one is the data you're learning on, what in machine learning we typically call the training data, and I like to think of your training data as a big spreadsheet with a lot of rows and a lot of columns.
And in this case the rows are the items you're learning from. In the case of lending, maybe historical loans or historical applicants; each one is a unique entry in that database. And then each column is a piece of data you have about the individual. In the case of an underwriting model, it might be each individual variable you can get from the credit bureau. It could just be a credit score for some simpler models, it could be several hundred pieces of data, it could be alternative types of data, but the columns are each thing you know about that individual, in the case of lending, that you're going to use to try and make a prediction in the future. So there's obviously a big question we'll talk about: where do you get the training data to start from? That's a large problem in machine learning. There's also the question of how do I get my columns right? And one of the most interesting things, which I think will make a lot of sense to people, is this concept of what we call feature engineering. It's a fancy word, but it really means taking some of the columns we know, and other pieces of information, and using them to generate additional columns of data. The most simple version of this, which almost every lender uses, is just a debt-to-income ratio. Right? The two things you know about a borrower might be their debt and their income, and we use those two together to make a variable, the ratio, that's more predictive than either their debt or their income alone. And of course there are lots of ways to do a debt-to-income ratio. You can engineer a payment-to-income ratio, or a debt-to-income ratio that excludes certain things, that maybe looks at housing or doesn't look at housing, or looks at certain kinds of payments. Another example of this is credit utilization. We might have total available credit and total outstanding credit, and we can make credit utilization measures of different kinds.
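As a concrete illustration, here is a minimal sketch of the feature engineering just described, computing debt-to-income and credit utilization columns on a toy applicant table. The data and column names are made up for illustration, not any real lender's schema.

```python
# Toy training data: each dict is one "row" (an applicant),
# each key is one "column" (a piece of data about them).
applicants = [
    {"monthly_debt": 1500.0, "monthly_income": 6000.0,
     "credit_used": 4000.0, "credit_limit": 10000.0},
    {"monthly_debt": 800.0, "monthly_income": 3200.0,
     "credit_used": 9000.0, "credit_limit": 12000.0},
]

for row in applicants:
    # Engineered column 1: debt-to-income ratio, often more
    # predictive than debt or income alone.
    row["dti"] = row["monthly_debt"] / row["monthly_income"]
    # Engineered column 2: credit utilization, derived from
    # outstanding vs. available credit.
    row["utilization"] = row["credit_used"] / row["credit_limit"]

print(applicants[0]["dti"])          # 0.25
print(applicants[1]["utilization"])  # 0.75
```

The engineered columns then sit alongside the raw ones in the "big spreadsheet" and are available to whatever model is trained on it.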
And so there's lots you can do to take the training data you have and enrich it by engineering new columns from the pieces of data you have. The second key thing is, now that I've got my training data set, my big spreadsheet that I'm going to use to train a model to make predictions about the future, I need to ask: what am I going to ask it to predict? That may seem simple, but there's actually a lot of thought that can go into what we're asking the model to actually do. What's the work we want it to do? What's the thing we want it to predict? In underwriting, the most simple example is: is this a good loan or a bad loan? Yes, no. Binary: will default, won't default. That's a very simplistic thing to ask it to do. It can be a binary good/bad, but you could also get to something more granular, like probability of default.

What's the likelihood, from zero to one, that we think this application, this loan, is likely to go bad? That's a little more granular than good/bad. It could give you ten percent, twenty percent, thirty percent, forty percent, which gives you the ability to make a more granular decision, to understand the risk level you're taking on more specifically, to price more specifically. In the context of fraud, you can have something as simple as fraud, yes or no, but you probably want different kinds of fraud signals. So there are different things you could ask: is this person lying about this or lying about that, versus is this a fraudulent application? Marketing is a great example where you may be asking a model to predict: will this person respond to a piece of marketing? Will they come to the site and actually apply? Are they likely to be approved? Are they likely to convert? Those are all different questions we could ask the model to predict for us, each of them with pros and cons. And so this is one of the things you'll find in machine learning: it's not simple to apply, and you've got to understand the limitations of your training data, and that may inform the outcome you're trying to predict. Am I going to ask this model whether this person is likely to get a loan when I market to them, or just whether they're likely to come to my site and respond? They may both be useful questions, but you need to understand what you're asking for. The third thing to think about is: what's the form of my model? If I now know what I've got to train the model on and what I'm asking it to predict, my next real question is: what's the structure of the algorithm? Think about the most common form of machine learning (and I do call this machine learning), a linear regression. The form of the model is a line.
It's got two variables, right, the X and Y axes, and we're going to draw a line through our points and try to give our best approximation. That's a linear regression. There are obviously much more complicated forms, logistic regression being kind of the next step up, and then you can get into binary decision trees, boosted trees, and neural networks. The details of these model forms are probably outside the scope of this podcast, at least today's session. If you're interested, we can dive into that more in a different session, but it is important to note that you need to think about the form. Each form has some limitations. How many variables can it effectively utilize? How much training data might it need to be able to make a good decision? Neural networks don't work with small amounts of data; they need very large amounts of data. So you need to make the right decision on the form based on the kinds of things you're trying to predict, the kinds of training data you have, and other factors. So that's another of the key things to figure out in the machine learning context: what's the right form of my model? And this may shift over time. The last thing is: how do I measure whether it's accurate? And this is a place where, for many lenders, I think you start to get metrics that are not standard. We might be used to seeing default rates in a portfolio by credit score, and you can generate those, but typically machine learning scientists and data scientists look at more specific metrics: the KS statistic, log loss, RMSE (the root mean squared error), more statistical metrics. If you think about that example of a bunch of dots and a line through them, how far away on average are the dots from the line? That's roughly what you can think of as RMSE: what's the average error that I've got across all my training data?
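To make the "dots and a line" picture concrete, here's a minimal sketch, with made-up numbers, that fits a best-fit line by ordinary least squares and then computes RMSE, the average error of the dots from the line:

```python
import math

# Toy training data: the "dots" on an X-Y graph.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Closed-form least squares: the slope and intercept that minimize
# the squared vertical distance from every dot to the line.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# RMSE: on average, how far are the dots from the fitted line?
predictions = [slope * x + intercept for x in xs]
rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / n)

print(slope, intercept, rmse)
```

For classification targets like default versus no-default, the analogous error measures are metrics such as log loss or the KS statistic rather than RMSE, but the idea is the same: a single number summarizing how far the model's predictions are from the observed outcomes.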
That's how I'm getting a metric of how accurate the model is at making a prediction. Of course, there are also times when you might want different kinds of metrics. Upstart has built several internal metrics that we think are more useful measures of the accuracy of a model in the context of lending. So that's another key question: once I've figured out what my training data will be, what I want to predict, and what form is best for me to use, I've got to figure out how I want to measure the accuracy of this model. Is it improving or not improving, and how is it changing over time? And of course, the challenge in all this for a machine learning use case is that this thing is constantly learning, it's changing, and so the answers to what's my training data, do I need to do more feature engineering, what's the right form, what's the right accuracy metric, may be things that shift over time. It's not a static answer. It's really an evolving thing, and there's still a lot of work that goes on in our world in all these areas to figure out what's the best way to...

...build a machine learning model to achieve the outcomes that we want. So that's the basics of machine learning with a bunch of training data. The details of the training process are also beyond the scope of this, but essentially, if I go back to that example of a bunch of dots on an X-Y graph where we're trying to draw the line through them, it's just trying a whole bunch of different lines and trying to find the line that has the least error by the accuracy metric we've chosen, where the dots are closest to the line. And of course it's a much more complex and multidimensional space with thousands of variables, but the principle is the same: we're trying to use all the data points, all the columns, to make predictions about some outcome that we know for our training data but want to predict for future people. So we know if historical loans were good or bad; we want to predict whether future loans will be good or bad. We know all the things we knew about the old loans; we want to build the most effective formula to make that determination, using pretty sophisticated techniques. So it's not simple algebra, it's not going to be just a line, but if you think of that linear regression example of a line through a bunch of dots and trying to find the best-fit line, it's a pretty good mental model for what's conceptually happening. So those are, I think, the four core fundamental concepts of machine learning, or artificial intelligence, that are useful for lenders. Let's talk about where you can use this in the context of lending, and I really see use cases across every aspect of the lending life cycle, if you will, where this is applicable. But let me talk about four that really come to my mind as the key areas, and I'll go through them sequentially, as a consumer or a customer might go through the lending process.
The first is marketing, and you can imagine the number of problems here that you can apply machine learning to. Which marketing copy is doing the best? Which things are most effective? That might be different for different kinds of people, so we might want to make a prediction of which piece of copy we should show, or which channel we should use, direct mail or email, or which of these messages could be most effective. And so you've got this question of who am I targeting, and here, obviously, what we're trying to do is drive down the cost of marketing per originated loan. If I can better target who I'm marketing to, what channels I'm marketing to them through, and what messages I'm marketing to them with, then I can really drive down my cost to acquire. I can improve the economics of my program. And what you typically find as this gets better: targeting can actually improve things like your approval rate, if I know who to market to better. It might be who's going to respond, or who's going to actually qualify for the loan. If I can get that better targeted, then I can actually lower my costs; I can market to more people cost-effectively. So that's really valuable. The second area, which is the one that brought Upstart into the business in general, is underwriting, the assessment of risk. How risky is it to make this loan to this person or this business? Here there's so much room. Generally, what you find in the world today is that there's a lot of unexplained risk. You see loss pools that are ten percent or twenty percent. That's really an estimation saying, hey, we don't really know: it could be yes, it could be no. A perfect model, in our world, would always say yes or no, have no losses, and give everybody the lowest rate.
That would be a perfectly predictive model, and we're a long way away from that. So there's a lot of unexplained loss in the world of underwriting, and if you can explain more of that loss, if you can be more accurate in your assessment of how risky a given loan is, you can typically increase approvals without increasing losses. If you can take that ten-percent loss pool and figure out more accurately who the ninety percent are and who the ten percent are, you can lower the rates for the ninety. If you can take that eighty-percent-good, twenty-percent-bad pool and say, hey, I know who the bad twenty are; I might have had to decline that whole pool previously to keep my loss rates reasonable, but if I can find some portion of the eighty, maybe forty out of the eighty, I can actually add those to my approvable world. I can lower my expense to the borrower by lowering interest rates where I better understand the risk, and I can get a more predictable yield. If I really understand at a more granular level what the risk in my portfolio is, then I can actually be more fine-tuned in assigning my...

...interest rates and therefore understanding my interest rates and my losses. I'm more fine-tuned in understanding what my yield as a lending program is going to be. And so when you can improve underwriting accuracy, it has tremendous ability to increase approvals and to lower rates, so you're more competitive in the market, making better offers to more borrowers and also, at the same time, making returns to the institution more predictable. You can obviously dial the returns knob either way in terms of the credit offers you want to make, but you have more control over that when you have a better understanding of risk. The third area, which many lending institutions consider part of underwriting (we view it a little differently), is what I'll call the onboarding process. That's really everything after I got an application, assessed all the information provided to me, and made a risk assessment; I consider that underwriting, or risk. Now I've got the risk of: was something in that file misrepresented? Was the income what they said it was? Is the person who they said they were? Are they using the money for what they said they were going to use it for? Is there something I didn't know about them that's likely to impact my decision? That's another area where I think we see a lot of opportunity to apply machine learning, to increase the accuracy of these estimations of risk while reducing the friction to the borrower. And so this is a really interesting set of problems for machine learning, or for any lender in general, because your goal isn't just to minimize fraud. We could do that by having super strict requirements: everybody comes in with a birth certificate and a Social Security number and a passport and a driver's license and a utility bill. You could make a really cumbersome process that would eliminate almost any chance of fraud or lying or misrepresentation about income or anything else.
But that process would have almost nobody completing it, and so you have to balance the friction, the effort that you require of the applicant to get through the process, against the accuracy of the verification. So that's a really interesting trade-off, where we're saying, hey, at a certain level of risk of misrepresentation, it might be worth it to let the person go through. If I'm 99.999 percent sure, am I okay with that? Probably yes, though that's probably further than you need to go. Eighty-five percent sure is probably not far enough for most things. And so you've got to find this balance, for each piece of information you're trying to verify, between the cost of requesting more effort from the borrower to validate the information provided and the benefit to you in accuracy or reduction of loss. It's a really interesting, fine balance. It's not as clear-cut as, say, an underwriting decision, where we have a clear yes or no: we know in the end whether a loan repaid or not, so we can determine if we made a good call. Whether somebody lied is much harder to label. In fraud and marketing, obviously, you have much clearer outcomes and much clearer goals: we want to lower the cost, and we want the people who get loans to be the ones who receive marketing. That's, I think, a little easier to handle. And then the fourth area, after marketing, risk assessment, and onboarding, is servicing. That's really a question of who's maybe at risk, based on doing regular looks at their credit files; a customer may be becoming delinquent, and we want to proactively reach out. If somebody's delinquent, what kind of outreach?
What time of day, what kind of messaging is likely to actually resonate with them, so we can improve our roll rates, improve our ability to bring borrowers who go delinquent back into the fold, and also reduce the cost of doing that, so we're not wasting a bunch of time calling people who would rather be texted or emailed. We're utilizing our resources effectively to get the outcomes we want: borrowers being able to come back onto repayment. So all of those areas, I think, are really ripe for the use of machine learning to make better predictions about who to market to and how; who to give offers of credit to and what kinds of offers; what kinds of requests to make through the verification and onboarding process to validate information the customer provided, and what we can verify through automated mechanisms or simply accept based on the information we already have; and who and how we should be reaching out to people who we think are either delinquent or at risk of becoming delinquent. They're...

...all prediction problems. They're all things where we have historical experience, training data, that we can apply to a prediction we want to make. Is this person likely to go delinquent? Is this person likely to respond and get a loan? Is this person likely to repay? That helps us improve the outcomes of a lending program. So I think you'll see machine learning and AI applied across all of these. We're at different states of sophistication within the industry on each of them, but I think those are all very ripe areas for the application of this kind of technology. The last thing I want to talk about a little bit is some of the core challenges when you say, hey, I'm going to go apply machine learning to one of these areas, and I'll talk about different examples from each of those four. What should I be aware of? What should I be thinking about as potential problems or challenges or decisions I need to make? So the first one is that machine learning takes a lot of data, and one of the core questions you have to ask is: do I have enough data? And by enough, I really mean both the rows, the examples (for underwriting, it might be historical loans), and enough columns, enough pieces of data. And this is an interesting challenge. There's also the question of whether I'm going to use what I'll call first-party data, my own data. In the case of an underwriting model, maybe it's my own historical portfolios to train on. I might get to use third-party data, where I might go do a retro with a credit bureau and pull a bunch of information that way. In the case of a retro, there are limitations. I may have fewer columns available; I might have less granular repayment data. I might know good/bad, but I might not know month-by-month payment history.
So things like predicting the timing of possible defaults, or predicting prepayments, might be things you want your model to do, and it might not have the data from a third-party source. But your first-party source might not have enough rows to actually train a more sophisticated model. So there are really interesting challenges in how we get to enough volume of data for the purposes we want. One of the other challenges when it comes to quantity of data is: do I have enough good and bad examples? My favorite example here is fraud, where, if you're doing a really good job fighting fraud, you don't have a lot of examples of what successful fraud looks like, because you caught it and it didn't get through, and those people may not come to you as much. So you may have less experience of "yes, this was fraud," and your models might not be able to train as easily. This is a place where there's often a lot of utilization of third-party data sources and models, maybe even third-party-trained models, because if you're doing it well, you don't have enough examples yourself. The same thing can be true of loans: if you're identifying all the goods, you don't know the bads. And this can be challenging, particularly if you want to expand the kinds of loans you're doing; I'll talk about that in a minute when we get to generalizing your model.
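The "not enough bad examples" point shows up as class imbalance, and even a toy label count makes it visible (the numbers here are purely illustrative):

```python
# One label per historical application: 1 = confirmed fraud, 0 = clean.
# If fraud prevention is working, confirmed-fraud rows are rare.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

positives = sum(labels)
negatives = len(labels) - positives
positive_rate = positives / len(labels)

print(positives, negatives, positive_rate)
```

With only one positive example in ten (and far fewer in realistic fraud data), a model has very little signal about what fraud looks like, which is one reason lenders often supplement their own experience with third-party data or models.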
Another challenge in terms of getting enough data comes up if I want to use alternative data, information that maybe isn't widely available. Then I'm probably tied to first-party data: if it's something I can't get from a credit bureau about applicants historically, then the only place I can have that data is where I've collected it, but of course then I only have it for the people from whom I've collected it. This represents what we call a kickstart challenge: how do I get enough data about what these variables, these pieces of information, might tell me about creditworthiness, likelihood to respond, or likelihood of fraud, without having had them on a historical portfolio? So there's a real challenge in how you get to a place where you have enough data points of the different kinds to build models. And that's why, as you use models more, you get more data, and your ability to do more interesting feature engineering might increase, because you have more data points and you're starting to see trends. Your ability to use a more sophisticated model type, or to shift to a more sophisticated outcome prediction, moving from a binary good/bad to maybe a probability of default to maybe a timing of default or timing of prepayment, might improve as you have enough data within your system to actually make a more sophisticated prediction using a more sophisticated form. One of the challenges in lending is that you have to lend money to get the training data to train these models. So it's a real kickstart problem, like I said: getting going in the case of lending takes money and takes putting risk in the game. The second thing you've got to ask is: not only do I have enough data, but is the model I'm building going to generalize from the data that I have...

...to the examples that I will see in the future? Because I've got this historical pool, and I want to know: is this pool an accurate representation of the pool of people I might be making decisions about in the future? And therefore, will this model be as accurate on future applicants as it is on the training data set? This is one of the core problems for any machine learning challenge: is the future going to look like the past in some real way? And here there are a couple of interesting things to think about. A great example: let's say we want to use machine learning to expand our credit box. Well, that's great, but if I've never lent to people below a certain credit score, then I don't know how well my model generalizes from where it's been trained, the training data, to people who have lower credit scores. It seems like maybe it should work. Here's a place where you might ease the cutoff ten or twenty points and allow yourself to build training data on a slightly lower credit segment. You might work with a third party who's been lending in different segments. You might use a retro of bureau data. But you've got to really ask the question: how do I make sure the data I have is generalizable if I'm moving to new geographic markets, or getting into new kinds of credit products? Another great example of this is bias you might have in your training data in terms of the macroeconomic environment. If your training data is your own lending portfolio and it's predominantly from a positive economic cycle, it may not produce a model that generalizes well to what happens in a more stressed economic scenario. One of the core problems for machine learning models that have been built recently is how to account for this.
Is my model going to maintain accuracy, or is it going to be overly optimistic, as we enter a more stressed environment? And how do I account for that in terms of my overall credit decisioning process, in terms of my model adjusting for that kind of challenge? That's a great example of generalizability. One of the core things you can do is look for the edges of your model where it might not generalize: making sure, as you build your training data set, that it is representative of the overall problem you're trying to solve, the question you're going to be asking, and making sure that the outcomes you're using historically are representative of what you want the outcomes to really look like, not biased outcomes, because the model will learn from whatever you train it on. So beyond "do we have enough data to train a model," ask "have we built a model that's going to generalize?" Have we seen enough examples of fraud? What's going to happen when competitive dynamics in the market change, and how do I think about responding to those? It's a really tough challenge. The last thing I want to raise is not a specific challenge, but an area, as you get into machine learning, that is often underappreciated, which is the execution quality required to do this well. I think in some circles now there's a sense that machine learning is a tool: we point it at a problem and it solves the problem, and there's some sort of commodification of the execution within AI. Hey, this person is using machine learning, that person's using machine learning, they're all using machine learning, so they'll all get the same results. And I can say that is not at all the case.
Right, like any challenging area, like a sport, like anything else, it takes hard work, dedication, and time to get good at this. Even at a company that's been doing this for nearly a decade, we still have a long way to go, even on the core problems we started with, because you are constantly finding ways to improve the feature engineering, the model forms, the training data you're able to use, the techniques. It's really hard to do this well. It's not simple, and it's not commodified in the sense that anybody can take an off-the-shelf model (I'm going to grab the XGBoost library and throw it at some historical loan portfolio) and be done. You can get real improvements, I think, in many cases from the status quo that way, but some of these challenges still apply: you can get things that look really good on training data, that look really good on the accuracy metrics, and don't apply well in the real world, because you didn't have a generalizable training data set, or you missed something. And then I think there is just a...

...tremendous difference between a first iteration of a model and what you can develop over time through hard work and execution in the space. So I would just say: don't underestimate the need to put in time and effort. Your first iteration is never the best. There are always places and ways to improve. We still see dramatic ways for us to improve at Upstart in all of the models we use, because it's just not an easy thing to apply. It's an evolving state of the art, and it's a really interesting space, but it's also a challenging space. I think it's easy to underestimate how hard it is to do these things: how hard it is to get enough of the right kinds of data, broad enough to make a model that generalizes into the real world, and how executing on all of these things, picking the right outcome to predict, picking the right model form, finding the right accuracy metrics, building the right data sets, both through feature engineering and other efforts, is nontrivial work. It's hard to do right, and it really accrues tremendous benefits to those who do it well. So those would be my three things to look out for. One: do you have enough data, of the right kinds, with the breadth and depth you need? Two: is the model you construct going to generalize from the training data sets you have to the general problem you're going to solve, or are there places where maybe there are challenges to think about? Be very cognizant of that. And number three: don't underestimate the importance of execution and time. The machine learning scientists are learning, just like the machine learning models are learning. They're learning over time, and their training data is the models they've built and trained and the results of those models, and so it takes...
It takes iterations for your machine learning team to build the ideal models, and sometimes it just takes data. Like, we need to accumulate data to actually be able to improve in these different areas: to have more training data, to allow us to use different model forms, to make more precise predictions, to have more refined accuracy metrics. So all those things are really critical to get right, and they're easy to get wrong. But I think there's such tremendous opportunity to apply AI and ML to, you know, marketing opportunities, to underwriting risk, to onboarding and friction reduction, to servicing, that it's clear to me that all the winners in lending will be leveraging these technologies across all those areas over the next five or ten years. And I think it really behooves anybody in the lending industry to understand these technologies better, to understand how they work, not in depth but the core of how they work, like we've talked about today, to understand where they can be used, and to understand the questions you should be asking to make sure you're executing them well: finding the right models or partners building models, or building the right teams and processes to build the models that are going to be effective for your use case, because I think it's easy to go the other way. So that's my quick intro to machine learning and AI for lenders. Would love to get some feedback on whether you enjoyed this discussion. We can certainly dive deeper on specific topics of fairness or model forms or anything else; just let me know what you want to hear about and we're happy to delve a little bit more. In the area of AI, I think it is kind of a leapfrog technology in terms of capability for producing accurate, predictive models, and it's going to really revolutionize many industries, but lending in particular. So it's a great time to be learning about machine learning and AI and how it's going to apply to your industry and your business.
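[Editor's note] The generalization check described above, comparing how a model scores on the data it was trained on versus data it has never seen, can be sketched in a few lines. This is a minimal, standard-library-only illustration: the two "features" (an income score and a debt ratio), the synthetic default rule, and all numbers are invented for the example, and the hand-rolled logistic model stands in for the richer model forms (like XGBoost) discussed in the episode.

```python
import math
import random

random.seed(0)

def make_rows(n):
    """Generate a synthetic 'loan portfolio': (features, default_label) pairs.
    The defaulting rule below is the hidden pattern a real model must learn."""
    rows = []
    for _ in range(n):
        income = random.gauss(0, 1)   # hypothetical standardized income score
        debt = random.gauss(0, 1)     # hypothetical standardized debt ratio
        # Defaults are more likely with lower income and higher debt.
        p_default = 1 / (1 + math.exp(-(-1.0 * income + 1.2 * debt)))
        rows.append(((income, debt), 1 if random.random() < p_default else 0))
    return rows

train, holdout = make_rows(2000), make_rows(500)

# Fit a simple logistic model with plain gradient descent
# (a stand-in for whatever model form you actually choose).
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(200):
    gw, gb = [0.0, 0.0], 0.0
    for (x1, x2), y in train:
        p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        gw[0] += (p - y) * x1
        gw[1] += (p - y) * x2
        gb += (p - y)
    w[0] -= lr * gw[0] / len(train)
    w[1] -= lr * gw[1] / len(train)
    b -= lr * gb / len(train)

def accuracy(rows):
    """Fraction of rows where the model's default prediction matches the label."""
    correct = sum(
        1 for (x1, x2), y in rows
        if ((1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))) >= 0.5) == (y == 1)
    )
    return correct / len(rows)

train_acc, holdout_acc = accuracy(train), accuracy(holdout)
print(f"train accuracy:   {train_acc:.3f}")
print(f"holdout accuracy: {holdout_acc:.3f}")
```

If the holdout accuracy is far below the training accuracy, that is exactly the warning sign described above: a model that "looks really good on training data" but has not learned a generalizable pattern. In real lending systems the same check is run against out-of-time and out-of-population samples, not just a random holdout.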
Upstart partners with banks and credit unions to help grow their consumer loan portfolios and deliver a modern, all-digital lending experience. As the average consumer becomes more digitally savvy, it only makes sense that their bank does too. Upstart's AI lending platform uses sophisticated machine learning models to more accurately identify risk and approve more applicants than traditional credit models, with fraud rates near zero. Upstart's all-digital experience reduces manual processing for banks and offers a simple and convenient experience for consumers. Whether you're looking to grow and enhance your existing personal and auto lending programs or you're just getting started, Upstart can help. Upstart offers an end-to-end solution that can help you find more creditworthy borrowers within your risk profile, with all-digital underwriting, onboarding, loan closing and servicing. It's all possible with Upstart in your...

...corner. Learn more about finding new borrowers, enhancing your credit decisioning process and growing your business by visiting upstart.com/banks. That's upstart.com/banks. You've been listening to Leaders in Lending from Upstart. Make sure you never miss an episode: subscribe to Leaders in Lending in your favorite podcast player. Using Apple Podcasts? Leave us a quick rating by tapping the number of stars you think the show deserves. Thanks for listening. Until next time. The views and opinions expressed by the host and guests on the Leaders in Lending podcast are their own, and their participation in this podcast does not imply an endorsement of such views by their organization or themselves. The content provided is for informational purposes only, and the discussion between the host and guests should not be taken as financial advice by companies or individuals.
