Fraud Detection in the World of Bitcoin

Op-ed - Fraud Detection in the World of Bitcoin

The Bitcoin protocol is cryptographically strong (to the best of our knowledge so far), and we wish the world encompassing its network and users were as secure. In this article, we will review several classes of attacks and how a feature of the Bitcoin protocol, in conjunction with online wallets, can help prevent them, and we’ll end with an overview of why fraud detection is a difficult endeavor.

To avoid any confusion, in the context of this article “fraud” refers to a transfer of funds to a destination not authorized by their legitimate owner. Double spending and transaction malleability are out of scope; both have been covered extensively elsewhere.

Pretty much everyone reading this article will know that Bitcoin transactions are technically irreversible, which makes them very attractive to merchants because the received funds are immediately spendable. In some jurisdictions a court order might force a merchant to refund a transfer, but that is a legal matter, completely unrelated to the method of payment (nonetheless, it is a good recourse for consumers to have in the rare case of a rogue merchant).

Bitcoin’s use of digital signatures ensures transactional integrity and non-repudiation. The network, which verifies the transaction, cannot however check whether the private key was used by its legitimate owner or by another party. Authenticating the user and unlocking access to the private key lies outside the protocol’s boundaries; it is the wallet’s responsibility. Wallets and their users are therefore the targets of malicious attacks.
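To make the point concrete, here is a toy sketch of ECDSA over secp256k1, the curve Bitcoin uses, in pure Python. It is an illustration under simplifying assumptions, not Bitcoin's actual signing code: real transactions sign a double-SHA-256 digest of the serialized transaction and follow additional encoding rules, the helper names are our own, and the scalar multiplication is not constant-time. What it shows is that `verify` only checks a signature against a public key; a key copied by malware produces signatures exactly as valid as the owner's.

```python
import hashlib
import secrets

# secp256k1: y^2 = x^3 + 7 over the prime field F_P
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a)
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # tangent (curve coefficient a = 0)
    else:
        slope = (y2 - y1) * pow(x2 - x1, -1, P) % P   # chord
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)


def scalar_mult(k, point):
    """Double-and-add; NOT constant-time, so only suitable for illustration."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def digest(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(private_key: int, message: bytes):
    z = digest(message)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh random nonce for every signature
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * private_key) % N
        if r and s:
            return r, s


def verify(public_key, message: bytes, signature) -> bool:
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    z = digest(message)
    point = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, public_key))
    return point is not None and point[0] % N == r


owner_key = secrets.randbelow(N - 1) + 1
owner_pub = scalar_mult(owner_key, G)

payment = b"pay 0.1 BTC to the merchant"  # stand-in for a real transaction digest
assert verify(owner_pub, payment, sign(owner_key, payment))

stolen_key = owner_key  # a copy exfiltrated by wallet-stealing malware
theft = b"pay everything to the attacker"
assert verify(owner_pub, theft, sign(stolen_key, theft))  # indistinguishable to the verifier
```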

Attacks Against Wallets

The level of security provided by wallets varies with the implementation, in terms of both features and quality. Naive implementations simply store the wallet unprotected or weakly protected (once the file is stolen, a short PIN can be trivially brute-forced), which is why wallet-stealing malware is one of the easiest attack paths. In fact it is so obvious (reminiscent of Willie Sutton’s famous yet apocryphal answer: “because that’s where the money is”) that two researchers, Pat Litke and Joe Stewart, have recently catalogued 146 distinct types of Bitcoin-stealing malware, only half of which were detected by anti-virus scanners (read a good summary or, better, the original paper). Even if the wallet is coded diligently, it runs on a software stack whose underlying layers can have vulnerabilities which, however difficult, will be found given time (the money to be stolen is a great motivator).
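A minimal sketch of why a short PIN offers little protection once the wallet file is stolen. The storage scheme is hypothetical (a PBKDF2-derived key plus a stored check value; `derive_key`, `check_value` and the iteration count are our own assumptions, not any particular wallet's format). Whatever lets the app reject a wrong PIN also lets a thief test guesses offline, and a 4-digit PIN has only 10,000 of them.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # assumed work factor; real wallets choose their own


def derive_key(pin: str, salt: bytes) -> bytes:
    """Hypothetical wallet scheme: encryption key = PBKDF2-HMAC-SHA256(PIN, salt)."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, ITERATIONS)


def check_value(key: bytes) -> bytes:
    """Hypothetical verifier stored in the wallet file so the app can reject wrong PINs."""
    return hmac.new(key, b"wallet-check", hashlib.sha256).digest()


def brute_force_pin(salt: bytes, stored_check: bytes, digits: int = 4):
    """Try every PIN of the given length against the stolen salt and check value."""
    for candidate in range(10 ** digits):
        pin = f"{candidate:0{digits}d}"
        if hmac.compare_digest(check_value(derive_key(pin, salt)), stored_check):
            return pin
    return None


# The "stolen file" only needs to contain the salt and the check value.
salt = os.urandom(16)
stored = check_value(derive_key("0042", salt))
print(brute_force_pin(salt, stored))  # prints 0042
```

Raising the iteration count multiplies the cost of each guess, but the search space stays at 10,000 candidates, so it buys only a constant factor.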

Several approaches exist to mitigate this risk. A common one is two-factor authentication: a registered phone either receives a one-time code via SMS or a voice call, or generates one with an authenticator app (Google Authenticator is a popular choice). It is a good complement for computer-based wallets, but less so for mobile wallets residing on the same phone, which can be stolen: by the time its owner revokes the phone’s authorization, the funds might be gone. How can a thief learn the wallet password? With the same techniques used to steal the PINs of payment cards: “shoulder surfing”, or cameras filming the user during a legitimate transaction (a club’s bar is a high-risk location, for instance). Keylogger malware is another popular approach.
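Authenticator apps such as Google Authenticator implement TOTP (RFC 6238), which builds on HOTP (RFC 4226). A minimal standard-library sketch follows; the function names are our own. The phone and the server share the same secret, so whoever holds an unlocked phone can generate valid codes, which is why the second factor adds little when the wallet lives on that same phone.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): counter = elapsed time steps since the epoch."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


# RFC 6238 SHA-1 test secret; at t = 59 s the 8-digit code is 94287082
print(totp(b"12345678901234567890", at=59, digits=8))
```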

Other approaches include “brain wallets”, which do not store the keys but generate them from a (hopefully long) passphrase memorized by the user. This is harder in practice, not only because occasional users may rightly be wary of forgetting the passphrase (carrying it on a piece of paper in the physical wallet would negate the security, but it will happen nonetheless) but also because a good passphrase would be about 50 words long, which would make frequent use cumbersome.
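The simplest, historically common brain-wallet scheme used the SHA-256 hash of the passphrase directly as the private key, with no salt or key stretching. A sketch under that assumption (function names are illustrative): because the derivation is deterministic and public, anyone can hash candidate phrases and check them, which is why short or familiar passphrases are unsafe.

```python
import hashlib

# Order of the secp256k1 group; valid private keys lie in [1, N - 1]
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def brain_wallet_key(passphrase: str) -> int:
    """Naive derivation: private key = SHA-256(passphrase), no salt, no stretching."""
    key = int.from_bytes(hashlib.sha256(passphrase.encode("utf-8")).digest(), "big")
    if not 1 <= key < N:
        raise ValueError("digest is not a valid secp256k1 private key")
    return key


def dictionary_attack(target_key: int, candidate_phrases):
    """Hash each guess; a real attacker would compare derived addresses with the block chain."""
    for phrase in candidate_phrases:
        if brain_wallet_key(phrase) == target_key:
            return phrase
    return None


victim = brain_wallet_key("correct horse battery staple")
guesses = ["password", "letmein", "correct horse battery staple"]
print(dictionary_attack(victim, guesses))  # prints: correct horse battery staple
```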

Lastly, paper wallets and hardware wallets are more secure, but they are more suitable for the cold storage of funds, rather than daily use. Diligent users will employ them, but do not expect this to be the norm.

In fact, the wallet does not have to be stolen to be used as a source of money. Ransomware, malware that encrypts user files and demands a payment to unlock them, can provide a steady source of income through ransoms, regardless of whether the private keys are obtained; CryptoLocker is the most (in)famous example. It is unlikely that such malware will be written specifically for Bitcoin wallets (why limit the attack?), but wallets will be taken hostage along with all other personal files. This is where cold storage of most funds (Bitcoin’s equivalent of “don’t keep all your money as cash on you”) and backups are absolutely essential.

So far we’ve talked about user wallets. On the other end of the wire, any online site storing bitcoins, be they wallets, merchants or exchanges, should be prepared for advanced persistent threats. Followers of the excellent Krebs on Security blog will be familiar with how indirect such attacks can be, going through third-party suppliers of the target (overused pun not intended). And attacked these sites have been. Most high-value attacks today are profit-motivated and supported by organized crime (by the way, someone tried to impersonate Gavin Andresen on the PGP key servers). Although PCI-DSS is not perfect, most of its recommendations are applicable to Bitcoin businesses.

Attacks Against Users

Why hack the user’s computer when you can persuade him to pay you? Scamming is as old as the world. A Bitcoin address carries no information identifying its owner, so simply persuading someone to send money to the scammer’s address can work well. (Side note: what is at times a feature is, at others, a usability shortcoming. However, the idea of a resolution mechanism associating a computer-friendly wallet address with human-friendly social network identities, or with a directory, has faced considerable critique for fear it would create a two-tier system that would eventually destroy Bitcoin. A good decentralized solution is still in the future.)
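A legacy Bitcoin address is a Base58Check string: a version byte and a 20-byte hash of a public key, followed by a 4-byte checksum taken from a double SHA-256 of the preceding bytes. The sketch below (our own helper names; it handles only legacy Base58 addresses, not other address formats) validates that checksum. The checksum catches typos, but nothing in the address identifies its owner, so a scammer’s address validates exactly like anyone else’s.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode Base58; each leading '1' stands for one leading zero byte."""
    num = 0
    for char in text:
        num = num * 58 + ALPHABET.index(char)  # raises ValueError on characters like '0' or 'l'
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def is_valid_address(address: str) -> bool:
    """Check length and Base58Check checksum; says nothing about who owns the address."""
    try:
        raw = b58decode(address)
    except ValueError:
        return False
    if len(raw) != 25:  # 1 version byte + 20-byte hash + 4-byte checksum
        return False
    payload, checksum = raw[:-4], raw[-4:]
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] == checksum


print(is_valid_address("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))  # the genesis block address: True
```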

Every event involving a loss of bitcoins, or even a service outage, can be exploited by scammers for phishing (or by fake Twitter accounts soliciting donations), and as Bitcoin becomes more popular we can expect waves of donation requests to fraudulent addresses following earthquakes or other disasters.

To be clear: this is not a vulnerability in the Bitcoin protocol. After all, the legitimate owner of the funds decided to make the transfer. It’s a plain scam, not a hack. Nonetheless, if the ecosystem develops features that reduce the incidence of such scams, trust in the system and adoption will be higher.

A different type of attack is one in which a Bitcoin-accepting website is hacked and the destination address is replaced with the attacker’s. For merchants the attack cannot last long, since their checkout process will notice that the funds have not arrived at their address, but sites accepting donations may be exploited for much longer.
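A merchant can notice a swapped address by reconciling its own records against what actually arrives on chain. The sketch below is hypothetical throughout: the `Invoice` record, the `received_sat` feed and both thresholds are assumptions, and real checkout systems differ. Because honest customers also abandon checkouts, `max_unpaid_ratio` would have to be tuned against the shop’s normal abandonment rate.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: str
    address: str       # the merchant's own address, as recorded in its database
    amount_sat: int    # expected payment in satoshis
    created_at: float  # Unix time


def unpaid_invoices(invoices, received_sat, now, timeout_s=3600):
    """IDs of invoices older than timeout_s whose address has not received the expected amount.

    received_sat maps address -> satoshis seen on chain (a hypothetical feed from the merchant's node).
    """
    return [inv.invoice_id for inv in invoices
            if now - inv.created_at > timeout_s
            and received_sat.get(inv.address, 0) < inv.amount_sat]


def tampering_suspected(invoices, received_sat, now, timeout_s=3600, max_unpaid_ratio=0.5):
    """Alert when an unusual share of due invoices is unpaid, e.g. because the checkout page was altered."""
    due = [inv for inv in invoices if now - inv.created_at > timeout_s]
    if not due:
        return False
    return len(unpaid_invoices(due, received_sat, now, timeout_s)) / len(due) > max_unpaid_ratio
```

A donation page has no invoices to reconcile against, which is why the same tampering can go unnoticed there for much longer.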

Transaction Screening

The conceptual answer is to have the transaction screened for fraud outside the device originating the payment, since that device can be under the control of an attacker or malware. This is similar in principle to how a credit card transaction initiated by a consumer is vetted, and Bitcoin has an elegant mechanism to implement it: multi-signature transactions, which require more than one party to sign a transfer. While they serve multiple purposes (escrow being a typical example), here the second party would be a service that screens transactions for fraud, and the transaction is broadcast to the network only if that service approves it. Vitalik wrote an excellent (as usual) primer earlier this month, and I invite you to read it rather than have me cover the same topic here. James D’Angelo’s tutorials on YouTube are also highly recommended.
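For a sense of what this looks like at the script level, a standard m-of-n multi-signature redeem script has the form `OP_m <pubkey 1> ... <pubkey n> OP_n OP_CHECKMULTISIG`. The sketch below assembles one for a 2-of-2 arrangement, one key held by the user’s wallet and one by the screening service; the function name and placeholder keys are our own, and the placeholder bytes are not real public keys. Relay and consensus rules impose limits beyond what this checks (for example, on script size).

```python
from typing import Sequence

OP_1 = 0x51              # OP_1..OP_16 are the consecutive opcodes 0x51..0x60
OP_CHECKMULTISIG = 0xAE


def multisig_redeem_script(m: int, pubkeys: Sequence[bytes]) -> bytes:
    """Build OP_m <pubkey>... OP_n OP_CHECKMULTISIG (spending needs m valid signatures)."""
    n = len(pubkeys)
    if not 1 <= m <= n <= 16:
        raise ValueError("need 1 <= m <= n <= 16")
    script = bytearray([OP_1 + m - 1])
    for key in pubkeys:
        if len(key) not in (33, 65):
            raise ValueError("expected a 33-byte compressed or 65-byte uncompressed public key")
        script += bytes([len(key)]) + key  # direct push: length byte, then the key bytes
    script += bytes([OP_1 + n - 1, OP_CHECKMULTISIG])
    return bytes(script)


# Placeholder keys (hypothetical; not valid curve points)
user_key = b"\x02" + b"\x11" * 32
screening_service_key = b"\x03" + b"\x22" * 32
print(multisig_redeem_script(2, [user_key, screening_service_key]).hex())
```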

Who can be that other party screening the transaction? The web wallet is the first choice and, in fact, this feature is already implemented by CryptoCorp and promoted right on its front page, proof that fraud prevention can have marketing value and not just make the finance department happy. Notably, a web wallet has more data points available, since the IP address and device information can be used as inputs to detect, for instance, that the network location of the user’s wallet has jumped across an ocean within minutes. Security-conscious users will prefer not to use online wallets, yet I think it is a safe bet that convenience will keep them popular, as with cloud-based email or file storage. The wonderful thing is that multi-signature transactions, by requiring both the user and the online service to cooperate, make this option arguably more secure than single-signature local storage, given the permanent threat of having the hot storage hacked.
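One such data point is the classic “impossible travel” rule. A minimal sketch, assuming each request carries a timestamp and an approximate location from IP geolocation (the `Login` record and the 1,000 km/h threshold are our own assumptions): compute the great-circle distance with the haversine formula and flag the pair if the implied speed exceeds that of a commercial flight.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Login:
    timestamp: float  # Unix seconds
    lat: float        # degrees, approximate (e.g. from IP geolocation)
    lon: float


def haversine_km(a: Login, b: Login) -> float:
    """Great-circle distance between two locations on a spherical Earth."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def impossible_travel(prev: Login, curr: Login, max_speed_kmh: float = 1000.0) -> bool:
    """Flag when the implied travel speed between two requests is implausibly high."""
    hours = max(curr.timestamp - prev.timestamp, 1.0) / 3600.0  # clamp to avoid division by zero
    return haversine_km(prev, curr) / hours > max_speed_kmh


london = Login(timestamp=0, lat=51.5074, lon=-0.1278)
new_york = Login(timestamp=600, lat=40.7128, lon=-74.0060)
print(impossible_travel(london, new_york))  # True: ~5,570 km in ten minutes
```

VPNs and mobile carriers can make legitimate users appear to jump, so a hit is better treated as a reason for extra scrutiny than as a verdict.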

After an initial period in which online wallets implement their own solutions, if at all, I expect some consolidation to follow, with fewer third-party fraud detection services being used by more wallets. It’s partly economics (pay-for-service is often cheaper than build-your-own, particularly when what you would build is complicated) and partly efficiency. Collaborative fraud detection is already used by the payment industry, with merchants and financial institutions sharing fraud data with a number of vendors, or among themselves, with the goal of reducing fraud for everybody (disclosure: I’ve been involved in building such systems, but none of them is public or commercial, and I hold no financial interest in any such vendors; nonetheless, I have not named any such vendors here).

Why Fraud Detection Is Not Easy

Fundamentally, detecting fraud is hard precisely because fraud is rare, dynamic and not necessarily obviously fraudulent. I think we can safely say there will never be a perfect fraud detection system. What would such a perfect system be? One that detects 100% of all fraud without ever mistaking a legitimate transaction for fraud: 100% sensitivity and 100% specificity, the Nirvana of classifiers, to throw in a bit of statistical jargon (it goes well with crypto).
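In the usual confusion-matrix terms, with TP and FN the fraudulent transactions that are flagged and missed, and TN and FP the legitimate ones that are passed and wrongly flagged, the two measures are:

```latex
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
```

A perfect system has FN = FP = 0, so both ratios equal 1.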

Practically every technique has been tried in the quest for a “better fraud trap”: rule engines, neural networks, genetic algorithms, random forests, hidden Markov models, clustering, support vector machines, outlier detection, and I will stop here. Why so many? One reason is that new techniques keep being invented. For instance, random forests are a development of the last two decades, enabled by the increase in computing power.

Another reason is the rarity of fraud. Suppose 1 transaction in 10,000 is fraudulent. To train a binary classifier (the system that answers the question “is it fraud or not?”), past data in which known fraudulent transactions are labeled as such are fed in to tune its parameters. When an algorithm is trained on data that is 99.99% legitimate, the remaining 0.01% is just noise, barely making a dent in its parameters. In other words, the needle is just noise in the haystack.
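A quick illustration of the rarity problem on synthetic labels at the 1-in-10,000 rate used above: a “classifier” that never flags anything scores 99.99% accuracy yet catches no fraud at all. The data and helper names are illustrative only.

```python
def confusion_counts(labels, predictions):
    """Count outcomes, where True means 'fraud'."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    return tp, fn, fp, tn


# 100,000 synthetic transactions, 1 fraud per 10,000
labels = [i % 10_000 == 0 for i in range(100_000)]
always_legit = [False] * len(labels)  # never flags anything

tp, fn, fp, tn = confusion_counts(labels, always_legit)
accuracy = (tp + tn) / len(labels)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
balanced_accuracy = (sensitivity + specificity) / 2
print(accuracy, sensitivity, balanced_accuracy)  # 0.9999 0.0 0.5
```

Balanced accuracy (the mean of sensitivity and specificity) exposes this classifier as no better than a coin flip. Common remedies include weighting fraud examples more heavily during training or resampling the data; which works best depends on the model and the data.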

There are ways around this (after all, fraud detection is done almost exclusively by software today); the point is that it is not easy, particularly beyond catching trivial fraud. The fundamental tension of all binary classification systems, equally present in other domains such as detecting signs of cancer in mammograms or classifying galaxies, is between the ability to detect the target and the avoidance of false alerts: the more sensitive a test is, the higher the chance of false positives. Plotting this relationship (true positive rate against false positive rate as the decision threshold varies) gives the ROC curve, which is used to rank different strategies. In business terms, attempting to catch too much fraud will eventually inconvenience customers whose legitimate transactions are blocked or delayed, and finding where to draw the line takes time (a good read straight from the real world: Coinbase’s own blog post on this topic).
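A minimal sketch of how ROC points are computed from a model’s fraud scores: sort transactions by score, lower the threshold one distinct score at a time, and record the false positive rate and true positive rate at each step. The scores and labels are made up, and `roc_points` and `auc` are our own helpers; libraries such as scikit-learn provide equivalent functions.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs from the strictest threshold to the most lenient.

    Assumes both classes are present; labels are truthy for fraud.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all transactions sharing this score are counted,
        # since no threshold can separate tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.2]  # hypothetical fraud scores from some model
labels = [1, 0, 1, 0, 0]            # 1 = confirmed fraud
points = roc_points(scores, labels)
print(points)
print(auc(points))  # 5/6, up to floating-point rounding
```

Each point is one possible operating threshold; moving toward (1, 1) catches more fraud at the cost of flagging more legitimate transactions.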

Why this exposition on fraud detection here? Because adding fraud detection capabilities, while desirable, will increase the complexity of a provider’s operations (a provider being a wallet or an exchange). On one hand, it brings trust and financial value. On the other, the system will need to be maintained and kept up to date with changes in fraud patterns (more complex and expensive systems adapt automatically), and business processes will need to be defined to deal with the algorithms’ imperfections.

What about the more distant future? For one, human nature will not change, and fraud attempts will continue as long as the block chain exists (will “before the block chain” be the next generation’s “before the web” placeholder for prehistoric times?). Securing the entire ecosystem, even more so as more complex features such as smart contracts are added, will remain part of the ongoing work, along with mining and development. The price of security is eternal vigilance, decentralized worlds included.