
Can Large Language Models Improve Security and Confidence in Decentralized Finance?

By: Roger A. Hallman

1 Introduction

Smart contracts are one of the foundational technologies in the decentralized finance (DeFi) ecosystem. These self-executing programs, in which the terms of the agreement are written directly into code, enable millions of automated, trustless transactions every day, collectively worth billions of dollars. However, the amount of money moving through DeFi exchanges has attracted malicious actors alongside honest users conducting legitimate business, and smart contracts laced with malicious code are one of the primary means such actors use to defraud honest traders.

These malicious smart contracts are attractive attack vectors for crime on DeFi platforms because they are difficult to detect a priori. Detecting malicious code through static or dynamic analysis takes time, and cryptocurrency traders often operate on timescales that are incompatible with vetting a smart contract’s code. Moreover, many traders lack the technical knowledge to properly evaluate whether malicious code is present in a DeFi liquidity pool’s smart contract. While financial institutions are known to prioritize hiring computer scientists and engineers, such initiatives are unlikely to close this technical knowledge gap and bring the necessary expertise to the trading floor.

Recent developments in artificial intelligence (AI) offer DeFi traders a cost-effective defense against malicious smart contracts. Large Language Models (LLMs) from OpenAI, Google, and other companies can read through smart contract code far more efficiently than any human analyst, enabling the detection of malicious code before traders commit funds to a liquidity pool governed by a malicious contract. Tools such as GitHub Copilot are already demonstrating the boon that LLMs can provide in augmenting the technical workforce.

This article explores the potential use of LLMs for detecting malicious smart contracts in the DeFi ecosystem. Detailed background information on decentralized finance, malicious smart contracts, and LLMs is given in Section 2. Section 3 surveys the use of LLMs for cybersecurity, with a focus on code analysis. Finally, concluding remarks are given in Section 4.

2 Background Information

2.1 Decentralized Finance and DeFi Security

Traditional centralized finance is populated by institutions (e.g., banks, insurance companies) that can control access to markets and other financial systems [10]. The controlled access that centralized finance institutions provide is meant to minimize the risk of malfeasance, though a substantial amount of illicit money still flows through these institutions. These risks are mitigated by the application of Anti-Money Laundering (AML) and Know Your Customer (KYC) regulations mandated by national or regional governments, as well as by internal structures (e.g., fraud detection) meant to prevent or ferret out malicious behavior by people using these institutions and systems.

In contrast, decentralized finance (DeFi) utilizes distributed ledger technology and blockchain networks, along with smart contracts to facilitate transactions [11]. DeFi takes place on decentralized exchanges, often utilizing cryptocurrency wallets as an interface. While certain institutions within the blockchain and cryptocurrency space will incorporate some level of safeguard to prevent illicit activities on their platform, many cryptocurrency wallets that support DeFi transactions are not subject to KYC or AML regulations.

The key element of the DeFi ecosystem is the decentralized exchange (DEX), which is built on automated market maker (AMM) mechanisms. The AMM enables liquidity pools, in which providers deposit a pair of tokens consisting of an established blockchain token (e.g., an ERC20 token for DEXs on the Ethereum blockchain) and another token offered for swapping. The established token provides liquidity and builds trust in the pool; that trust is essential for attracting traders. Swaps are executed via smart contract. However, there are many ways that scammers and other malicious actors can insert malicious code into liquidity pool smart contracts, and those malicious smart contracts are a major pain point in DeFi security.
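To make the swap mechanism concrete, here is a minimal Python sketch of the constant-product pricing rule (x · y = k) used by many AMM pools; the 0.3% fee and the reserve values are illustrative assumptions rather than any particular pool’s parameters.

```python
# Minimal sketch of a constant-product AMM swap (x * y = k), the pricing rule
# used by many DEX liquidity pools (e.g., Uniswap v2-style). The 0.3% fee and
# the reserve values below are illustrative assumptions.

def swap_out(reserve_in: float, reserve_out: float, amount_in: float, fee: float = 0.003) -> float:
    """Return how much of the output token a trader receives for amount_in."""
    amount_in_after_fee = amount_in * (1 - fee)
    k = reserve_in * reserve_out                    # invariant before the swap
    new_reserve_in = reserve_in + amount_in_after_fee
    new_reserve_out = k / new_reserve_in            # invariant preserved after the swap
    return reserve_out - new_reserve_out

# Example: a pool holding 100 units of the established token and 250,000 swap tokens
print(swap_out(100.0, 250_000.0, 1.0))  # swap tokens received for 1 unit deposited
```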

2.2 Malicious Smart Contracts

Smart contracts are programmed transaction protocols which automatically execute and control events based on predetermined conditions in the agreement or contract [5]. The transactions governed by smart contracts can range from trivially simple to intricate multiparty transactions.

Malicious smart contracts are smart contracts that contain malicious code; that is, the contract was programmed to deceive potential traders and enable the contract’s author(s) to steal the more valuable cryptocurrency tokens that traders deposit into the liquidity pool [16]. There are many types of malicious smart contracts, which may affect different blockchain use cases. This article, however, is concerned with the DeFi space, so we focus on DeFi honeypots [3]. Honeypot traps in DeFi are deceptively designed smart contracts that lure honest traders into a pool but then make it all but impossible for those traders to withdraw their funds. This may be accomplished by obfuscating a high tax rate or hiding fees on transactions within the pool. Other tactics include blacklisting or whitelisting certain traders within the pool; the malicious code may be active from the opening of the pool, or it may be triggered later once certain conditions are met.
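To illustrate the hidden-tax tactic, the following Python sketch models a pool whose sell path quietly applies a confiscatory fee that only an owner-controlled whitelist escapes. This is a behavioral illustration only; real honeypots are written as contract code (typically Solidity), and every name and rate here is hypothetical.

```python
# Illustrative model of a honeypot-style sell tax: buys look normal, but sells
# are taxed at a confiscatory rate unless the seller is on an owner-controlled
# whitelist. All names and rates are hypothetical.

class HoneypotPool:
    def __init__(self, owner: str, sell_tax: float = 0.99):
        self.owner = owner
        self.sell_tax = sell_tax          # hidden 99% tax on sells
        self.whitelist = {owner}          # only the owner sells tax-free

    def buy(self, trader: str, amount: float) -> float:
        return amount                     # buying in appears to work normally

    def sell(self, trader: str, amount: float) -> float:
        if trader in self.whitelist:
            return amount                 # owner exits with full value
        return amount * (1 - self.sell_tax)  # honest traders recover almost nothing

pool = HoneypotPool(owner="0xScammer")
print(pool.sell("0xHonestTrader", 1_000.0))  # ~10 tokens back from 1,000 sold
print(pool.sell("0xScammer", 1_000.0))       # the owner gets all 1,000 back
```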

To build trust in a DEX, the exchange will require pools to meet minimal compliance with blockchain industry standards (e.g., ERC20). This compliance requires a standardized interface; however, malicious developers can still embed malicious code within the smart contract. That code is often not detected until it is too late for the trader, because most cryptocurrency traders do not possess the deep computer science and engineering knowledge needed to discern a trap encoded within the contract.

2.3 Large Language Models

Large language models (LLMs) are modern deep learning architectures responsible for most of the recent, high-profile advances in the field of artificial intelligence [14]. Earlier language models were based on sequential learning architectures such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs). These earlier models could have tens of millions of parameters and achieved impressive results. Transformer-based LLMs with billions of parameters now surpass them, excelling at tasks including text generation, language translation, and more.

These transformer-based LLMs, particularly generative pretrained transformer (GPT) models, are increasingly finding utility in code generation for software engineering. GitHub’s Copilot, which is built on OpenAI’s GPT family of models, is probably the most visible example. These LLMs interact with a programmer and can generate code snippets, or even entire programs, based on the programmer’s stated intent and requirements. Alternatively, these models can complete a programmer’s preliminary code, going so far as to ensure that the finished program adheres to industry standards and best practices. Other uses for LLMs in software engineering include debugging, refactoring, and documentation. (Debugging is a particularly useful capability, as novice programmers routinely struggle to understand and correct the error messages they encounter during their work.) This code generation capability can also pose a cybersecurity threat, and there are LLMs that have been trained by hackers to generate malware or perform other penetration testing operations. Commercially available LLMs, such as OpenAI’s ChatGPT or Google’s Gemini, have ethical guardrails built in to mitigate misuse; however, many users have demonstrated the ability to “jailbreak” the models and manipulate them into performing illegal operations [7].
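As a rough illustration of LLM-assisted debugging, the sketch below asks a hosted model to explain an error and propose a minimal fix. It assumes the OpenAI Python client and an API key are available; the model name is an assumption and may differ in practice.

```python
# Sketch of using a hosted LLM as a debugging assistant. Assumes the OpenAI
# Python client is installed and OPENAI_API_KEY is set; the model name is an
# assumption and may need to be changed.
from openai import OpenAI

client = OpenAI()

error_message = "TypeError: unsupported operand type(s) for +: 'int' and 'str'"
snippet = "total = 0\nfor line in open('fees.txt'):\n    total = total + line\n"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You explain Python errors and propose minimal fixes."},
        {"role": "user", "content": f"Error:\n{error_message}\n\nCode:\n{snippet}"},
    ],
)
print(response.choices[0].message.content)  # explanation plus a suggested patch
```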

3 The Role of Large Language Models in Smart Contract and Software Security

Static analysis methods such as formal verification, pattern or symbolic analysis, and fuzzing have proven to be the workhorses for detecting vulnerabilities in smart contracts. For instance, Tsankov et al. [13] developed a smart contract security tool that defines sets of both “safe” and “violation” patterns. Nikolić et al. [8] developed a static analysis capability to detect certain smart contract behaviors that are common in honeypot traps and other malicious smart contracts. The impressive results from these and similar smart contract security verification systems demonstrate that detecting malicious smart contracts is possible and scalable. However, these methods rely on previously known and documented malicious code and may not detect newly designed attacks. Moreover, they produce false positives and false negatives, a well-known shortcoming of static analysis.
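As a toy illustration of the pattern-based idea, and of its limitation, the sketch below scans Solidity source text for a few strings associated with hidden fees, owner-controlled lists, and escape hatches. The patterns are invented for illustration; real tools such as Securify analyze richer program representations than raw string matching, and they still face the false positive/negative problem noted above.

```python
# Toy pattern-based static check over Solidity source text. The patterns are
# illustrative; real tools work over richer program representations.
import re

SUSPICIOUS_PATTERNS = {
    "owner-controlled blacklist/whitelist": r"\bonlyOwner\b.*(whitelist|blacklist)",
    "hidden transfer tax": r"(tax|fee)\s*=\s*\d{2,}",        # e.g., fee = 99
    "self-destruct escape hatch": r"\bselfdestruct\s*\(",
}

def scan_contract(source: str) -> list[str]:
    """Return the names of suspicious patterns found in the contract source."""
    findings = []
    for name, pattern in SUSPICIOUS_PATTERNS.items():
        if re.search(pattern, source, flags=re.IGNORECASE | re.DOTALL):
            findings.append(name)
    return findings

sample = "uint256 fee = 99; function setFee(uint256 f) public onlyOwner { fee = f; }"
print(scan_contract(sample))  # ['hidden transfer tax'] -- only known tricks are caught
```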

Deep learning and machine learning-based methods have made significant progress in malicious smart contract detection, as well as in other areas of security engineering and software verification. Most relevant to the detection of honeypot smart contracts, Chen and Li [1] developed a system that integrates formal verification with a series of hybrid deep learning architectures (combining a convolutional neural network with other sequential architectures). Their system was trained exclusively on discovering Ponzi scheme smart contracts, and is therefore not a reliable tool for discovering other types of malicious contracts. Zheng et al. [15] developed a cascade ensemble model that uses XGBoost, a flexible and efficient gradient boosting library, to extract and identify features of Ponzi scam smart contracts.
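In the same feature-based spirit, here is a minimal sketch of training a gradient-boosted classifier on hand-crafted contract features. The features, toy data, and hyperparameters are placeholders, not the feature set or pipeline used in [15].

```python
# Sketch of a feature-based scam-contract classifier using XGBoost.
# The features and toy labels are placeholders, not the feature set from [15].
import numpy as np
from xgboost import XGBClassifier

# Each row: [num_external_calls, owner_only_functions, max_fee_setting, has_blacklist]
X = np.array([
    [2, 0, 3, 0],    # benign-looking contracts
    [3, 1, 5, 0],
    [1, 4, 99, 1],   # honeypot-looking contracts
    [2, 5, 95, 1],
])
y = np.array([0, 0, 1, 1])  # 1 = scam

model = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
model.fit(X, y)

candidate = np.array([[2, 3, 90, 1]])
print(model.predict_proba(candidate))  # [[P(benign), P(scam)]] for the candidate contract
```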

LLMs built on pre-trained transformers excel at transferring knowledge to downstream tasks. Unlike earlier, less sophisticated language models, LLMs use context-dependent word embeddings, which allow the same word to take on different representations as the context changes. This dynamic contextual understanding enables these models to capture word meanings and complex usages more completely, which explains their remarkable results on natural language processing (NLP) tasks. Given that these LLMs have been trained on vast corpora, including massive amounts of source code, it seems intuitive that they can be trained not only to detect software vulnerabilities and re-used malicious methodologies, but also to discover zero-day vulnerabilities and novel scams.
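The sketch below illustrates what context-dependent embeddings mean in practice: the same token receives a different vector in each sentence. It assumes the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint.

```python
# Illustration of context-dependent embeddings: the token "pool" receives a
# different vector in each sentence. Assumes the Hugging Face `transformers`
# library and the publicly available bert-base-uncased checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_token(sentence: str, word: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, hidden_dim)
    idx = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

a = embed_token("The liquidity pool locked my funds.", "pool")
b = embed_token("We swam in the hotel pool.", "pool")
print(torch.cosine_similarity(a, b, dim=0).item())  # < 1.0: same word, different representations
```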

Thapa et al. [12] conducted an exhaustive analysis comparing transformer-based LLMs to older RNN-based language models on a series of vulnerability detection tasks. They examined several transformer-based models, including GPT-2 and variants of the Bidirectional Encoder Representations from Transformers (BERT) models, and compared their performance with bidirectional LSTM (BiLSTM) and bidirectional GRU (BiGRU) models. Specifically, they tested both binary and multi-class vulnerability detection on datasets with up to 126 distinct vulnerability types. Experimental results showed that the transformer-based models significantly outperformed both the BiLSTM and BiGRU models at vulnerability detection.
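A condensed sketch of the general recipe such studies follow, treating vulnerability detection as sequence classification over source code and fine-tuning a pretrained transformer, is given below. The checkpoint, toy samples, and hyperparameters are assumptions, not the configuration used in [12].

```python
# Sketch of framing binary vulnerability detection as sequence classification
# with a pretrained transformer. Checkpoint, data, and hyperparameters are
# placeholders, not the configuration used in [12].
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

code_samples = [
    "function withdraw() public { msg.sender.call{value: balance}(\"\"); balance = 0; }",
    "function withdraw() public { balance = 0; payable(msg.sender).transfer(amount); }",
]
labels = torch.tensor([1, 0])  # 1 = vulnerable (toy labels for illustration)

inputs = tokenizer(code_samples, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few toy gradient steps; real fine-tuning uses full datasets
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print(outputs.loss.item())
```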

A recent example of LLM-based smart contract vulnerability detection comes from He et al. [4], who built a model that leverages the contextual prowess of the BERT architecture and integrates it with a BiLSTM, an additional attention mechanism, and a fully connected feed-forward network. The transformer-based BERT layer learns the semantic context of smart contract source code, while the BiLSTM layer extracts high-level features from the BERT output. The additional attention layer assigns weights to the features extracted by the BiLSTM to complete a holistic contextual representation of the code under examination, and the feed-forward layer determines whether a vulnerability exists in that portion of the smart contract. They compared their system to [15] to test its ability to detect scam contracts, and they exceeded the state-of-the-art performance in accuracy, recall, and F-1 score.
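The PyTorch sketch below mirrors the general shape of that pipeline: a BERT encoder feeding a BiLSTM, an attention-based pooling layer, and a feed-forward classifier. The layer sizes and pooling choice are assumptions, not the exact configuration reported in [4].

```python
# Sketch of a BERT -> BiLSTM -> attention -> feed-forward pipeline in the
# spirit of [4]. Layer sizes and pooling details are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class HybridVulnDetector(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)          # scores each token's features
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, input_ids, attention_mask):
        ctx = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        feats, _ = self.bilstm(ctx)                   # high-level features per token
        weights = torch.softmax(self.attn(feats), dim=1)
        pooled = (weights * feats).sum(dim=1)         # attention-weighted summary of the code
        return self.classifier(pooled)                # vulnerable vs. benign logits
```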

While the aforementioned results show great promise for software code analysis and vulnerability detection, it is important to remember that there are limitations which must be accounted for. Training dataset quality is a major factor in model performance, and there are significant pain points when it comes to training LLMs to detect and correct vulnerabilities. For instance, Ding et al. [2] created a software vulnerability dataset in order to test both code language models and larger, well-known LLMs (e.g., GPT-3.5/4). Their dataset was developed to correct for numerous shortcomings in existing benchmarks, including noisy or inaccurate labels and data duplication. Their experimental results suggest that vulnerability detection LLMs are not yet ready for real-world deployment: one state-of-the-art code language model achieved an F-1 score of 68.26% on a widely used vulnerability dataset, but only 3.09% on their newly developed dataset.
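For readers unfamiliar with the metric, F-1 is the harmonic mean of precision and recall; the short example below shows how it is computed on made-up labels.

```python
# F-1 score is the harmonic mean of precision and recall. The labels below are
# made up purely to illustrate the computation.
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # 1 = vulnerable
y_pred = [1, 0, 0, 0, 0, 0, 1, 1]   # detector output

print(f1_score(y_true, y_pred))      # ~0.33: many missed vulnerabilities and false alarms
```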

Furthermore, there is a question of how well LLMs can remediate the vulnerabilities they detect. Pearce et al. [9] examined LLM performance at repairing insecure code, with mixed results. The LLMs they tested were able to generate patches for vulnerabilities; however, those patches were limited to a single function within individual files, with no consideration of how those files interact with others in larger, more intricate software products. Another concern that must be foremost in the security engineer’s mind when using LLMs for vulnerability detection and repair is hallucination [6], where the model essentially fabricates information. In such a case, an LLM might erroneously flag a secure segment of source code as vulnerable, or it might “remedy” a vulnerability by calling a library or function that does not actually exist. Thus, while LLMs show great potential in the area of software security, human expertise is still necessary for oversight.
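One inexpensive guard against the non-existent-dependency failure mode is to verify, before trusting an LLM-suggested patch, that every module the patch imports actually resolves in the target environment. A minimal, Python-specific sketch follows; it is illustrative only and checks imports, not functions or behavior.

```python
# Minimal guard against hallucinated dependencies: verify that every module an
# LLM-suggested Python patch imports can actually be found. Illustrative only;
# it does not verify functions, versions, or behavior.
import ast
from importlib.util import find_spec

def unresolved_imports(patch_source: str) -> list[str]:
    """Return module names imported by the patch that cannot be resolved."""
    missing = []
    for node in ast.walk(ast.parse(patch_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            if find_spec(name.split(".")[0]) is None:
                missing.append(name)
    return missing

llm_patch = "import secure_eth_sanitizer\nfrom os import path\n"
print(unresolved_imports(llm_patch))  # ['secure_eth_sanitizer'] -- a hallucinated dependency
```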

4 Conclusion

There has been an explosion of interest in the financial opportunities in the cryptocurrency sector and the DeFi ecosystem, where software-based smart contracts are a fundamental tool for transactions. Smart contracts are self-enforcing agreements where contract terms are written into the contract source code. While smart contracts in the DeFi ecosystem must conform to certain engineering standards, there is still ample room for vulnerabilities–both inadvertent and intentionally designed–to persist.

This blog post introduced the topic of using LLMs for software vulnerability detection, with a focus on smart contracts in DeFi. LLMs are becoming popular in society at large and are showing particular promise in the software engineering realm, where they are a tremendous productivity booster. It is only natural that researchers are exploring LLM capabilities for vulnerability detection and repair. We gave a brief survey to acquaint the reader with the state-of-the-art work, as well as a few of the current limitations and pain points of this approach. LLM-based detection of malicious smart contracts is a goal that is within reach, and one that will undoubtedly be leveraged by professional DeFi traders in the future.

References

[1] Chen, S., and Li, F. Ponzi scheme detection in smart contracts using the integration of deep learning and formal verification. IET Blockchain (2023).

[2] Ding, Y., Fu, Y., Ibrahim, O., Sitawarin, C., Chen, X., Alomair, B., Wagner, D., Ray, B., and Chen, Y. Vulnerability detection with code language models: How far are we? arXiv preprint arXiv:2403.18624 (2024).

[3] Gan, R., Wang, L., and Lin, X. Why trick me: The honeypot traps on decentralized exchanges. In Proceedings of the 2023 Workshop on Decentralized Finance and Security (2023), pp. 17–23.

[4] He, F., Li, F., and Liang, P. Enhancing smart contract security: Leveraging pre-trained language models for advanced vulnerability detection. IET Blockchain (2024).

[5] Hu, B., Zhang, Z., Liu, J., Liu, Y., Yin, J., Lu, R., and Lin, X. A comprehensive survey on smart contract construction and execution: paradigms, tools, and systems. Patterns 2, 2 (2021).

[6] Liu, F., Liu, Y., Shi, L., Huang, H., Wang, R., Yang, Z., and Zhang, L. Exploring and evaluating hallucinations in llm-powered code generation. arXiv preprint arXiv:2404.00971 (2024).

[7] Monje, A., Monje, A., Hallman, R. A., and Cybenko, G. Being a bad influence on the kids: Malware generation in less than five minutes using chatgpt.

[8] Nikolić, I., Kolluri, A., Sergey, I., Saxena, P., and Hobor, A. Finding the greedy, prodigal, and suicidal contracts at scale. In Proceedings of the 34th Annual Computer Security Applications Conference (2018), pp. 653–663.

[9] Pearce, H., Tan, B., Ahmad, B., Karri, R., and Dolan-Gavitt, B. Examining zero-shot vulnerability repair with large language models. In 2023 IEEE Symposium on Security and Privacy (SP) (2023), IEEE, pp. 2339–2356.

[10] Qin, K., Zhou, L., Afonin, Y., Lazzaretti, L., and Gervais, A. Cefi vs. defi–comparing centralized to decentralized finance. arXiv preprint arXiv:2106.08157 (2021).

[11] Scharfman, J., and Scharfman, J. Decentralized finance (defi) compliance and operations. Cryptocurrency Compliance and Operations: Digital Assets, Blockchain and DeFi (2022), 171–186.

[12] Thapa, C., Jang, S. I., Ahmed, M. E., Camtepe, S., Pieprzyk, J., and Nepal, S. Transformer-based language models for software vulnerability detection. In Proceedings of the 38th Annual Computer Security Applications Conference (2022), pp. 481–496.

[13] Tsankov, P., Dan, A., Drachsler-Cohen, D., Gervais, A., Buenzli, F., and Vechev, M. Securify: Practical security analysis of smart contracts. In Proceedings of the 2018 ACM SIGSAC conference on computer and communications security (2018), pp. 67–82.

[14] Yao, Y., Duan, J., Xu, K., Cai, Y., Sun, Z., and Zhang, Y. A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. High-Confidence Computing (2024), 100211.

[15] Zheng, Z., Chen, W., Zhong, Z., Chen, Z., and Lu, Y. Securing the ethereum from smart ponzi schemes: Identification using static features. ACM Transactions on Software Engineering and Methodology 32, 5 (2023), 1–28.

[16] Zhou, L., Xiong, X., Ernstberger, J., Chaliasos, S., Wang, Z., Wang, Y., Qin, K., Wattenhofer, R., Song, D., and Gervais, A. Sok: Decentralized finance (defi) attacks. In 2023 IEEE Symposium on Security and Privacy (SP) (2023), IEEE, pp. 2444–2461.