17.5 Profit Analysis
There are often financial rewards for correctly identifying true positives and costs for predicting false positives. This section describes how to calculate net profit from the results of your models. The calculation relies on three financial inputs:
Name | Description |
---|---|
Fixed (setup) costs | Cost to set up the project or campaign. These costs are incurred whether you treat observations or not. Examples include printing brochures, buying machines to conduct the campaign, and buying resources such as prospective customer names. |
Treatment cost | The cost incurred to pursue each predicted positive. For example, if we are predicting fraud, this is the cost to investigate each suspected fraud. If we are attempting to sell a product, it is the cost in time and money to approach each person. |
Net revenue per response | Net profit after expenses for each correct positive. For example, it would be the financial payoff of correctly finding a fraud. In the case of selling a product, it is the profit from making the sale. |
Performance Car Sales Example
Assume you conduct data mining for a seller of sports cars. Your company purchased information on 2000 potential customers from a marketing research firm that captures information on potential buyers for specific markets. Your firm then did test marketing by approaching these potential buyers with the promotion. Records were kept of the results.
Assume that, based on the test marketing results, you built a model to predict which potential customers would purchase a car. You used 1,200 of the results for training and 800 for validation. Here is the financial information related to the marketing campaign.
The marketing firm charges $30 per name. We pay for the names whether we sell a car or not, so this is a sunk setup cost. The treatment cost in this example is the cost, in time and money, to approach a customer and offer them an extended test drive. This cost includes the wear and tear on the car loaned for the extended test drive. The net revenue per positive response is how much the firm profits after expenses for each car it sells.
Field | Amount | Description |
---|---|---|
Fixed (setup) costs | $60,000 | 2000 names x $30 per name |
Cost per treatment | $220 | Approaching customer and extended test drive |
Net revenue per positive response | $2,000 | Profit after expenses for each car sold |
Here is the resulting model:
Out of the 800 records in the validation set, 55 bought the car. The first model predicted 54 car buyers, which included 44 TPs and 10 FPs. We refer to these people as being “treated,” which means that, based on the model’s recommendation, the seller would approach them and offer the extended test drive. Thus, the $220 cost is incurred for each person treated: 44 who will buy the car and 10 who will not.
The 11 FNs (55 actual buyers minus 44 TPs) are people who purchased the car even though our model predicted they would not. When we apply this model to new data, these FNs represent an opportunity cost for missing these people. But because they were not predicted to be buyers, we would not spend the money to treat them.
The figure below shows the results when financial information is added to the modeling results. The total cost of $35,880 includes the $24,000 cost of names for the 800 validation records (800 * $30) plus the treatment cost (54 * $220 = $11,880). Net revenue is 44 buyers * $2,000 = $88,000. Net profit is net revenue minus total cost, or $88,000 - $35,880 = $52,120.
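The arithmetic above can be reproduced in a few lines of Python. This is only a sketch using the figures from the example; the variable names are illustrative and not taken from any particular tool.

```python
# Financial inputs from the car sales example (names are illustrative).
COST_PER_NAME = 30
COST_PER_TREATMENT = 220
NET_REVENUE_PER_RESPONSE = 2_000

# Validation-set results for the first model (0.50 cutoff).
validation_records = 800
true_positives = 44
false_positives = 10

fixed_cost = validation_records * COST_PER_NAME           # cost of names
treated = true_positives + false_positives                # everyone we approach
treatment_cost = treated * COST_PER_TREATMENT
total_cost = fixed_cost + treatment_cost
net_revenue = true_positives * NET_REVENUE_PER_RESPONSE   # only TPs buy
net_profit = net_revenue - total_cost

print(f"Total cost:  ${total_cost:,}")
print(f"Net revenue: ${net_revenue:,}")
print(f"Net profit:  ${net_profit:,}")
```

Note that the FNs do not appear anywhere in the calculation: they are never treated, so they add neither cost nor revenue.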
The first model used the default probability cutoff threshold of 0.50. The second and third entries show the same model after the probability threshold was lowered.
As the cutoff threshold was reduced from 0.50 to 0.34, TPs increased from 44 to 49 and FNs fell from 11 to 6. This came at the cost of an increase in FPs from 10 to 29. Because the five additional TPs bring in more than the 19 additional FPs cost, net profit increased from $52,120 to $56,840.
As the cutoff was further reduced from 0.34 to 0.23, TPs increased from 49 to 50 and FNs fell from 6 to 5, but FPs increased from 29 to 50. Net profit fell to $54,000 because treating 21 additional FPs cost more than the net revenue gained from only one additional TP.
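The threshold comparison can be sketched as a small helper that is applied to each set of TP and FP counts reported above. The function name `net_profit` and its default arguments are assumptions made for illustration; the defaults simply restate the example's figures.

```python
def net_profit(tp, fp, n_names=800, cost_per_name=30,
               cost_per_treatment=220, revenue_per_response=2_000):
    """Revenue from TPs minus the cost of names and of treating TPs and FPs."""
    fixed_cost = n_names * cost_per_name
    treatment_cost = (tp + fp) * cost_per_treatment
    return tp * revenue_per_response - (fixed_cost + treatment_cost)

# (cutoff, TP, FP) on the validation set, as reported in the text.
results = [(0.50, 44, 10), (0.34, 49, 29), (0.23, 50, 50)]
profits = {cutoff: net_profit(tp, fp) for cutoff, tp, fp in results}

for cutoff, profit in profits.items():
    print(f"cutoff {cutoff:.2f}: net profit ${profit:,}")

# The cutoff with the highest net profit among those tried.
best_cutoff = max(profits, key=profits.get)
print(f"most profitable cutoff tried: {best_cutoff:.2f}")
```

One way to read the result: each additional TP nets $2,000 - $220 = $1,780, while each additional FP costs $220, so lowering the cutoff pays off only while it adds fewer than about eight FPs per extra TP. The move from 0.34 to 0.23 adds 21 FPs for a single TP, which is why profit drops.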
How much would profit increase if the results were adjusted by a multiplier to reflect 10,000 potential customers?
The cost of names would increase from $24,000 to $300,000 (10,000 * $30) because of the purchase of additional names. The 10,000 potential customers are 12.5 times the 800 records in the validation set (10,000/800 = 12.5). The spreadsheet below shows the results when the numbers in the confusion matrix are adjusted by the multiplier and the financial results are adjusted accordingly.
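As a rough sketch, the scaling can be written out for the first model (0.50 cutoff). It assumes the TP and FP rates seen in the validation set carry over unchanged to the larger population, which is an assumption of the projection rather than a guarantee; the variable names are illustrative.

```python
COST_PER_NAME = 30
COST_PER_TREATMENT = 220
NET_REVENUE_PER_RESPONSE = 2_000

validation_size = 800
population_size = 10_000
multiplier = population_size / validation_size   # 12.5

# Validation-set counts for the first model (0.50 cutoff).
tp, fp = 44, 10

# Assumption: validation rates hold for the full population.
scaled_tp = tp * multiplier
scaled_fp = fp * multiplier

fixed_cost = population_size * COST_PER_NAME
treatment_cost = (scaled_tp + scaled_fp) * COST_PER_TREATMENT
net_revenue = scaled_tp * NET_REVENUE_PER_RESPONSE
net_profit = net_revenue - (fixed_cost + treatment_cost)

print(f"Multiplier:           {multiplier}")
print(f"Projected buyers:     {scaled_tp:.0f}")
print(f"Projected net profit: ${net_profit:,.0f}")
```

Because the cost of names, the treatment cost, and the net revenue all scale by the same factor, the projected net profit is simply 12.5 times the validation-set figure.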