Difference between revisions of "SNF Blockchain"
Revision as of 14:36, 30 October 2023
Blockchain networks are increasingly being integrated into healthcare, supply chain, and retail systems through smart contracts, smart devices, and smart identity management. Although this technology brings benefits, it can also cause problems. One particular problem stems from the immutability property, which means that fraudulent transactions or transfers of information cannot be reversed.
Rationale: Blockchains can be attacked via a deluge of requests or transactions within a short time span, resulting in the loss of connectivity to the blockchain for users, businesses, or even financial institutions. Therefore, the rapid detection of anomalies arising from such activities is critical in order to prevent damage from occurring, or to correct any damage as soon as possible and reduce the severity of its impact.
Overall objectives: This project will study the problem of anomaly and fraud detection from the perspective of blockchain-based networks. Anomaly and fraud detection in blockchain-based networks is more complex due to their unique properties such as decentralisation, global reach, anonymity, etc., which make them different from traditional networks.
Specific aims: To further the understanding of the sources and behaviours of anomalies and fraud in blockchain-based networks, and to develop new, improved methods for both static and dynamic anomaly detection that can be used alongside blockchain-based systems for real-time fraud detection.
Methods: Developing and implementing static anomaly detection methods via a hybrid approach, and developing dynamic anomaly detection methods using extreme value theory.
Expected results: This research work will contribute to improving the security of blockchain-based networks by providing more accurate and efficient methods for detecting anomalies and fraud, and by reducing the impact of losses resulting from these anomalies.
Impact for the field: The project will be particularly beneficial alongside real-world blockchain-based networks to allow for the fast detection of anomalous or fraudulent data, preventing damage or allowing for damage to be corrected as soon as possible. For cryptocurrency networks, this will reduce the impact of market manipulation and fraud and, more widely, their effect on global financial markets, currencies, and trade. In addition, the project will be of interest to a broad range of cryptocurrency and blockchain stakeholders including (but not limited to) academics, financial institutions, policymakers, regulators, and cybercrime agencies.
Aims and Relevance
This project aims to study the problem of anomaly and fraud detection from the perspective of blockchain-based networks. The major developments of blockchain technology and cryptocurrencies have brought benefits such as increased efficiency and transparency, but the immutability property means that fraudulent transactions or transfers of information cannot be reversed. Rapid detection of anomalies arising from such activities is critical in order to prevent damage from occurring, or to correct any damage as soon as possible and reduce the severity of its impact. Anomaly and fraud detection in blockchain-based networks is more complex due to their unique properties such as decentralization, global reach, anonymity, etc., which make them different from traditional networks.
The proposed research work comprises three main parts:
- Studying the evolution of blockchain-based networks over time.
- Investigating static anomaly detection methods for blockchain-based networks.
- Developing dynamic anomaly detection methods for blockchain-based networks.
This research aims to contribute to a better understanding of the sources and behaviors of anomalies and fraud in blockchain-based networks, as well as the development of new improved methods for anomaly detection, especially in reducing the false positive rate. Additionally, it will help to develop new methods that can be used alongside blockchain-based systems to detect anomalies and fraud in real time as new data is generated.
Methods
The proposed research work focuses on the problem of anomaly and fraud detection in blockchain-based and cryptocurrency networks. Due to the rising popularity of these systems in the financial sector and their potential benefits, it has become increasingly important to detect anomalies and outliers, which may be derived from true errors or, more likely, monetary or information fraud. Therefore, our first goal is to extend and improve upon the accuracy of existing methods of static anomaly detection in the literature relating to blockchain-based network graphs by combining methods from statistics and data mining. Our second goal is to develop a new method for dynamic anomaly detection based on data streams and statistical extreme value theory. This methodology will be particularly beneficial alongside real-world blockchain-based networks to allow for the fast detection of anomalous or fraudulent data, preventing damage or allowing for damage to be corrected as soon as possible. For cryptocurrency networks, this will reduce the impact of market manipulation and fraud and, more widely, their effect on global financial markets, currencies, and trade. For blockchain-based networks in general, this will assist in reducing the impact of information loss. The proposed research design can be split into three main targets as outlined below and illustrated in Figure 1.
Analysis of the Evolution of Blockchain-Based Network Graphs and Their Properties
The initial goal involves studying and analyzing the key properties of blockchain-based network graphs and how they have evolved over time. The key difference between blockchain-based networks and other systems that can be represented as a network graph is that blockchain technology is relatively young, existing for just over 10 years, and still developing. Therefore, it is likely that the structures of blockchain-based networks have changed since they were first implemented and have continued to evolve. This analysis needs to be completed before we start extending existing methods and developing new methods for anomaly detection, mainly because many assumptions regarding anomalies in other types of networks may not be directly applicable. For example, in credit card transaction networks, anomalies may be classed as transactions where the value of the transaction is significantly higher than usual, the number of transactions is significantly higher than usual, or the transactions occur in locations that are far away from the majority. However, in blockchain-based networks, the concepts of normal and anomalous data are not as clear-cut or well established.
To address this problem, we propose to perform a comprehensive analysis of the network graphs of large blockchain-based networks. These will include the network graphs of large cryptocurrencies such as Bitcoin and Ethereum, in addition to other blockchains for which network data can be obtained. A starting point is to investigate the fundamental result that the network graphs of many real-world systems follow the power-law model. This states that in a network graph, the probability that a node has a degree (number of edges) of $k$ is given by the relationship $P(k) \propto k^{-\gamma}$, or equivalently $\log P(k) \propto -\gamma \log k$, which forms a straight line on a log-log scale (Boginski et al., 2005). This indicates that a large number of nodes have a very small degree, while a small number of nodes have a very large degree. For example, in a blockchain transaction graph, this would suggest that a large number of accounts make very few transactions, while a small number of accounts make a large number of transactions. This would provide a general idea of whether the structure and behavior of the blockchain show any similarities to traditional networks. In addition, other common network graph statistics such as the clustering coefficient, cliques, and independent sets will also be computed. Due to the lack of labeled data relating to anomalies and fraud, there do not appear to be any benchmarks for distinguishing between normal and anomalous data in blockchain-based networks. Therefore, we propose to split our network data into monthly and yearly subsamples and construct a large number of different network graphs from our datasets, covering transaction graphs, user graphs, and graphs based on other network variables. By analyzing the distribution of these network graph statistics across the graphs, we will be able to see how the distributions and their parameters have changed over time. This can then provide a benchmark time series of parameters and statistics that can be used as possible baselines and inputs in the anomaly detection methods.
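As a rough illustration of the power-law check described above, the sketch below computes node degrees from an undirected edge list, builds the empirical degree distribution, and estimates the power-law exponent as the negative slope of a least-squares line through the points (log k, log P(k)). The edge list, the function names, and the use of an ordinary least-squares fit are assumptions made for illustration and are not part of the proposal; for real data a maximum-likelihood estimator would generally be a more reliable choice.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)} for an undirected edge list of (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())  # number of nodes having each degree k
    n_nodes = len(degree)
    return {k: c / n_nodes for k, c in counts.items()}


def fit_power_law_exponent(dist):
    """Estimate the exponent in P(k) ~ k^(-exponent) by least squares on log-log data."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # negative slope of the fitted line


if __name__ == "__main__":
    # Hypothetical toy graph: one hub account (node 0) linked to eight others.
    toy_edges = [(0, i) for i in range(1, 9)]
    print(fit_power_law_exponent(degree_distribution(toy_edges)))
```

Repeating this fit on each monthly or yearly subsample would give one exponent per period, which is one way the benchmark time series mentioned above could be assembled.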
Analysis of static anomaly detection methods
After obtaining a comprehensive overview of the structures of blockchain-based networks and cryptocurrency networks and how they have evolved over time, the second phase will focus on improving existing methods for anomaly detection in blockchain-based networks. We classify anomalies and outliers into three different groups as follows (Chandola et al., 2009): a) point anomalies: these are the simplest types of anomalies. Single data points are classed as anomalies if they are located far enough away from the centroid of the data set; b) collective anomalies: these are sets of point anomalies that are linked to each other; c) contextual anomalies: these anomalies are conditional and usually occur in time series data.
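The point-anomaly definition in (a) can be sketched as follows: compute the centroid of the data, measure each point's Euclidean distance to it, and flag points whose distance exceeds a cut-off. The cut-off rule used here (mean distance plus a multiple of the standard deviation of the distances), the function names, and the feature vectors are assumptions for illustration only; the proposal does not specify how "far enough" is defined.

```python
import math
import statistics


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def point_anomalies(points, n_std=2.0):
    """Return indices of points farther from the centroid than the cut-off.

    The cut-off is mean(distances) + n_std * population stdev(distances),
    which is an assumed rule, not one taken from the proposal.
    """
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    cutoff = statistics.mean(dists) + n_std * statistics.pstdev(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]


if __name__ == "__main__":
    # Hypothetical 2-D features: nine ordinary points and one distant point.
    data = [(x, y) for x in range(3) for y in range(3)] + [(50, 50)]
    print(point_anomalies(data))
```

Because a single extreme point also shifts the centroid and inflates the standard deviation, a rule of this kind becomes less sensitive as the number of anomalies grows, which is one reason to combine it with other methods.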
Point anomalies and collective anomalies can be considered as part of the static anomaly and fraud detection problem. In theory, these types of anomalies will generally be more pronounced and easier to detect. Existing data on anomalies that have previously occurred in blockchain-based networks is limited. We propose to build our real data sample from a combination of publicly available data from two main sources: a) cryptocurrency networks: using data from previously reported anomalous events such as hacks, including date and time, type of anomaly, total loss, etc.; b) other blockchain-based networks: using data from previously reported anomalous events such as attacks on user wallets, smart contracts, double spending, distributed denial-of-service (DDoS) attacks, etc. Although this data will likely be very general, it can still provide an indication of approximate time periods that can be focused on for detecting anomalies.
The main part of our method will be based on a hybrid approach, where individual anomaly detection methods are combined and used in parallel, or consecutively. The motivation for this approach is provided in the current literature, which has found that anomalies detected in blockchain-based networks using existing methods do not show a significant overlap (Mansourifar et al., 2020). To overcome this, network graphs of the cryptocurrency and blockchain-based networks will be constructed from our real data using the undirected graph model defined in Section 1. These will correspond to the network graphs of the time periods when anomalous events actually occurred.
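To make the hybrid idea concrete, the sketch below combines the outputs of two detectors in the two ways mentioned above: in parallel (every detector runs on the full data and the flagged sets are merged or intersected) and consecutively (each detector only examines what the previous one flagged). It also computes the Jaccard overlap between two flagged sets, one simple way to quantify how much detectors agree. The detector functions, thresholds, and transaction records are hypothetical placeholders and are not methods taken from the proposal.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def run_parallel(items, detectors, require_all=False):
    """Run every detector on all items; combine flags by union or intersection."""
    flagged = [set(det(items)) for det in detectors]
    if not flagged:
        return set()
    return set.intersection(*flagged) if require_all else set.union(*flagged)


def run_consecutive(items, detectors):
    """Chain detectors: each one only sees the items flagged by the previous one."""
    remaining = dict(items)
    for det in detectors:
        keep = set(det(remaining))
        remaining = {k: v for k, v in remaining.items() if k in keep}
    return set(remaining)


# Hypothetical detectors over {tx_id: (value, n_counterparties)} records.
def high_value(items, cutoff=1000):
    return [k for k, (value, _) in items.items() if value > cutoff]


def high_fanout(items, cutoff=50):
    return [k for k, (_, fanout) in items.items() if fanout > cutoff]


if __name__ == "__main__":
    txs = {"a": (10, 2), "b": (5000, 3), "c": (20, 200), "d": (9000, 120)}
    print(run_parallel(txs, [high_value, high_fanout]))
    print(run_consecutive(txs, [high_value, high_fanout]))
    print(jaccard(set(high_value(txs)), set(high_fanout(txs))))
```

Taking the union favours recall, while the intersection or a consecutive chain favours a lower false positive rate; which trade-off suits a given network is an empirical question.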
An existing method such as k-means clustering will be used to search for clusters in the network graphs, splitting the data into groups based on their similarity in order to determine which data points may be anomalies. In the case of k-means clustering, the goal is to partition the sample data corresponding to a particular network graph into k distinct and non-overlapping clusters. In the simplest case, the number of clusters k can be set manually to be equal to the number of types of anomalies we expect to see, or set to a value of two based on the premise of data being either normal or anomalous. More formally, let $C_1, C_2, \dots, C_k$ denote the sets containing the data points in each cluster; we aim to minimize the within-cluster variation for each of the $k$ clusters as follows:
[[Image: BlockChain formula.jpg]]
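The following is a minimal sketch of k-means (Lloyd's algorithm) together with a within-cluster variation measure. It assumes the common squared-Euclidean form, in which the variation of a cluster is the sum of squared distances from its points to the cluster centroid; the formula in the image above is not reproduced here and may be written differently. The feature vectors, the random initialisation, and the function names are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k sampled data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


def within_cluster_variation(points, centroids, labels):
    """Total squared Euclidean distance from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


if __name__ == "__main__":
    # Hypothetical 2-D features: five similar points and two distant ones.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11)]
    cents, labs = kmeans(data, k=2)
    print(labs, within_cluster_variation(data, cents, labs))
```

With k set to two, one possible reading is to treat the smaller cluster as the candidate anomalous group, although that interpretation is an assumption and would need to be checked against the reported events.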