“Catching fraud is both an art and a science,” stated Lemonade, Inc.’s co-founder in his yearly review. Lemonade has disrupted the insurance industry by capitalizing on progress in the AI and machine learning fields. At the same time, a careful reader will also spot why the same benefits do not apply to all businesses, or to all aspects of fraud-spotting (yet).
Auditors, too, have started to rely increasingly on data science, algorithms, and machine learning. Algorithms help in ordering data and streamlining repetitive processes, and progress in the machine learning field is providing accountants and auditors with software that can extract behavioral rules from the data and thus improve the accuracy of its output. This is indeed perfect for processing insurance claims and spotting insurance fraudsters immediately, by looking across huge “training sets” of comparable data. Could this (unsupervised) approach work just as well for business fraud, in the money flows to and from a business?
When I was in college, a statistics professor pointed out to us how bad human brains are at picking numbers at random. “When you try to think about it,” he said, “there is no such thing as a human-generated string of random numbers. A human will never be able to come up with a perfectly random sequence of numbers: there will always be a rule, whether one is aware of it or not, because human brains are terrible at generating something that has no logic, something random.” By contrast, human brains excel at spotting patterns, comparing them, and categorizing and recognizing familiar and unfamiliar ones. This is exactly what Lemonade’s bot does: it is a machine that recognizes relatively simple patterns autonomously, distinguishing fraudulent claims from genuine ones by recognizing from a video, for example, that the same person has submitted the same type of claim more than once. This, however, applies to insurance fraud, where there is a limited number of parameters and a huge amount of data for each policy.

When it comes to corporate fraud, we could argue that the possibilities are almost infinite, because each organization is unique, and each fraud has its element of uniqueness too. In such a setup, setting rigid rules might not be a good idea: you might end up with loads of false positives while, at the same time, risking letting genuinely fraudulent cases slip through. A bot may be trained to find issues, “anomalies”, whether due to fraud or error, but what use would it be to present thousands of irregularities without an indication of what they could mean and an action plan for following up on the most relevant cases? What can an organization possibly do if not provided with a correct interpretation of the issues found? Too much information is no better help than no information at all.
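To make the false-positive problem concrete, here is a minimal sketch (in Python, with invented figures) of the kind of rigid rule a naive anomaly detector might apply to a ledger: flag any entry whose amount deviates too far from the mean. The function name, the threshold, and the data are all hypothetical, not taken from any real audit tool.

```python
import statistics

def flag_anomalies(amounts, z_threshold=2.0):
    """Flag entries whose amount deviates from the mean by more than
    z_threshold standard deviations. A rigid rule like this sees
    deviation, not intent: it cannot tell fraud from a legitimate
    outlier, nor catch a fraudulent entry sized to look ordinary."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    return [i for i, amount in enumerate(amounts)
            if stdev > 0 and abs(amount - mean) / stdev > z_threshold]

# A ledger of routine payments plus one large (hypothetical) annual
# premium at index 4: the rule flags it even though it is legitimate.
ledger = [120, 95, 110, 105, 5000, 98, 102]
print(flag_anomalies(ledger))
```

The point is not that the rule is wrong, but that every flag it raises still needs a human interpretation before it becomes actionable.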
Our goal should be to get the machines to take care of the dullest and most repetitive tasks. We already have an abundant set of legal rules on how to present accounting data, and machines have been very helpful in interpreting it to streamline processes; they will be even more helpful in finding mistakes in the data. What machines still cannot do, however, is interpret and correlate all those “mistakes” and compile them in a format that is of value to decision makers. Only then will they be able to produce output that helps organizations spot frauds, identify systemic weaknesses, improve controls, and use resources more efficiently.
A training set, in the context of machine learning, is a set of data points from which an algorithm learns a rule, the same rule that is then applied to new inputs. In Lemonade’s case, the bot analyzed the video supporting the first claim and paid it out. As the two other videos in support of the two new claims came in, their content was compared against the first video, now part of the training set; the individual was recognized and flagged, and the claims were refused.
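The mechanism described above can be sketched roughly as follows, assuming, purely for illustration, that each claim video is reduced to a numeric signature and that similarity between signatures is measured by cosine similarity. The class name, the threshold, and the signatures are invented; this is not Lemonade’s actual implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two signature vectors (1.0 = identical direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

class ClaimScreener:
    """Keeps a training set of signatures from claims already paid out.
    A new claim whose signature closely matches one in the set is
    flagged as a likely repeat of an earlier claim."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.seen = []  # signatures of claims already paid

    def screen(self, signature):
        if any(cosine(signature, s) >= self.threshold for s in self.seen):
            return "flagged"   # too similar to a past claim
        self.seen.append(signature)  # grow the training set
        return "paid"

# Hypothetical signatures: the second claim is a near copy of the first.
screener = ClaimScreener()
print(screener.screen([0.9, 0.1, 0.4]))    # first claim: paid
print(screener.screen([0.91, 0.1, 0.39]))  # near-duplicate: flagged
print(screener.screen([0.1, 0.9, 0.2]))    # distinct claim: paid
```

Note how the set grows with every claim paid: each accepted claim becomes a reference against which future claims are compared, which is exactly what makes the second and third videos detectable.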