SNSF Narrative Digital Finance

From EU COST Fin-AI

Abstract

Large fluctuations, instabilities, trends and uncertainty in financial markets constitute a substantial challenge for asset management companies, pension funds and regulators. Nowadays, most asset management companies and financial institutions follow a so-called systematic trading approach in their investment decisions. Systematic trading refers to applying predefined, rule-based trading strategies to generate buy and sell orders. However, automated or rule-based trading activities bring certain risks for individual market participants and for the financial market as a whole. In times of increased volatility, market turmoil or so-called market sell-offs, investors applying similar trading rules might take the same actions, thereby amplifying systemic market risk. Such situations have frequently been observed in financial markets, for instance in March 2020 (the sell-off related to the COVID-19 pandemic), during the European sovereign debt crisis and during the global financial crisis of 2007-08.

Research in economics and management has begun to embrace the role that narratives play in guiding individual and collective decision-making. McCloskey (2011) describes an unforeseen extent of economic growth and goes on to explain that no economic theory is able to capture it. She argues that a change in rhetoric freed a social class (the bourgeoisie) and gave it a sense of dignity and liberty. Economic change, she argues, therefore depends to a great extent on social narratives that shape people's ideas and beliefs. Yet, despite the notion that narratives, individual and collective actions, and market outcomes are inextricably linked, our knowledge of the mechanisms through which they interact, and of how narratives can inform opinions or sway current thinking, is still evolving.

Entrepreneurs, for example, may use verbal communication to achieve plausibility (i.e., to generate the sense that a given interpretation of events is acceptable) or resonance (i.e., to obtain alignment with the beliefs of the target audience; see van Werven et al., 2019). They may do so through rhetoric such as storytelling (Navis & Glynn, 2011) or crafting compelling arguments (van Werven et al., 2015), as well as by employing combinations of figurative language and gesturing (Clarke et al., 2021), as they manage, and conform to, the expectations of their audience.

Invoking narratives has consequential outcomes. The literature has documented various forms of verbal communication, including written texts such as social media posts, blogs and business plans as well as spoken text (Garud et al., 2014; Clarke et al., 2019; Clarke et al., 2021), as a crucial means of securing support and investment. The narratives or rhetoric employed in these stories serve as vehicles for assembling and communicating details about ideas and future possibilities (Garud et al., 2014). In summary, narratives help audiences make sense of situations and situate a description within the audience's social and cultural framework (Lounsbury and Glynn, 2001).

In this project, we therefore explore computational techniques to predict financial market outcomes using text, speech, and video/image data. Advances in data processing and machine learning allow new ways of analysing data and may have profound implications for the empirical testing of complex yet little-studied financial relationships. The project integrates various forms of narratives into financial market analysis, leverages machine learning techniques, and aims to show how narratives are inextricably interwoven with continuously unfolding financial market developments. We will extend quantitative research through novel measurement techniques, the creation of new data sets, new solutions to prediction problems, and the induction of new theories (Obschonka & Audretsch, 2020). We will also contribute to recent work demonstrating the potential of theoretical and methodological advances achieved by applying machine learning in research practice (Mullainathan & Spiess, 2017; von Krogh, 2018). In pursuit of both the practical 'relevance' of our research (Wiklund et al., 2019) and the contribution of "AI-integrated" research (Levesque et al., 2020), our approach will provide actionable insights.

Grant Link


Approach

In a first step, we will design a tool that collects all relevant data from various sources. Purely financial data, such as stock prices or macroeconomic indicators, can easily be collected from data platforms such as Bloomberg, Reuters or Investing.com. Textual data, however, pose a substantial challenge in terms of (i) collection from the web, (ii) formatting, and (iii) pre-processing, including dating and categorisation. For this purpose, we will develop an automated tool that collects textual data, categorises and dates it, and stores it in an easy-to-analyse format. The database will be managed with an SQL solution. The second step will focus on our research questions and the four building blocks listed below.
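The collection tool is not specified further in the text. As a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for the SQL solution, the snippet below shows how collected texts could be dated, categorised and stored. The table layout, the `CATEGORY_KEYWORDS` map and the helper names are hypothetical, and keyword matching is only a placeholder for a proper classifier.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical keyword map used to assign a coarse category to each text.
CATEGORY_KEYWORDS = {
    "monetary_policy": ("central bank", "interest rate", "inflation"),
    "equities": ("stock", "earnings", "shares"),
    "crypto": ("bitcoin", "crypto", "blockchain"),
}


def categorise(text: str) -> str:
    """Return the first category whose keywords appear in the text, else 'other'."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) documents table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            source TEXT NOT NULL,
            published_at TEXT NOT NULL,  -- ISO 8601 timestamp in UTC
            category TEXT NOT NULL,
            body TEXT NOT NULL,
            UNIQUE (source, published_at, body)
        )
        """
    )


def store_document(conn: sqlite3.Connection, source: str,
                   published_at: datetime, body: str) -> None:
    """Date (normalised to UTC), categorise and store one text.

    published_at must be timezone-aware; exact re-collections are ignored.
    """
    timestamp = published_at.astimezone(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR IGNORE INTO documents (source, published_at, category, body) "
        "VALUES (?, ?, ?, ?)",
        (source, timestamp, categorise(body), body),
    )
```

A production pipeline would add web scraping, deduplication across sources and a trained categoriser; the `UNIQUE` constraint here only drops exact duplicates of the same text from the same source and time.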

Within this hypothesis-driven project, we will formulate both general and block-specific research questions. The main research questions are:

  1. In what sense are financial markets (ex-ante) predictable?
  2. Is ex-ante forecastability persistent, and to what extent can it be applied to real use cases?
  3. How can the detection of structural breaks and changes in financial time series improve and complement modern portfolio theory?
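Ex-ante predictability is commonly assessed out of sample: a forecast for period t may only use information available up to t-1 and is compared with a naive benchmark. As one possible sketch (the benchmark choice and the function name are our assumptions), the out-of-sample R² below compares the squared forecast errors of a model with those of the expanding-window historical mean; positive values indicate that the model beat the benchmark ex ante, negative values that it did worse.

```python
import numpy as np


def oos_r2(returns, forecasts, start: int) -> float:
    """Out-of-sample R^2 relative to the expanding-window historical mean.

    forecasts[t] must be formed with information up to t-1 only (ex-ante).
    Evaluation starts at index `start` (>= 1) so the benchmark has a history.
    """
    returns = np.asarray(returns, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    # Benchmark forecast for t: mean of all returns observed before t.
    bench = np.array([returns[:t].mean() for t in range(start, len(returns))])
    actual = returns[start:]
    model = forecasts[start:]
    sse_model = np.sum((actual - model) ** 2)
    sse_bench = np.sum((actual - bench) ** 2)
    return float(1.0 - sse_model / sse_bench)
```

Persistence (the second question) could then be examined by computing this statistic over successive, non-overlapping evaluation windows.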

Block 1: Text data & text analytics

Text mining techniques are frequently used in scientific research to forecast the development of various financial assets such as FX, equities, bonds and commodities (see, for instance, Fung et al. 2003; Hajizadeh et al. 2010; Nassirtoussi et al. 2014; Kumar and Ravi 2016; Loughran and McDonald 2016; Chan and Franklin 2011; Cambria and White 2014; Xing et al. 2017; Chen et al. 2020). Our solution applies NLP and text mining techniques to prediction and to structural break and change point detection, combined with asset allocation methodology. Such techniques have been used, for instance, to predict cryptocurrency price bubbles from social media data (Biessey 2021). However, their use for detecting breaks and bubbles in classic financial assets tends to be under-researched. We will therefore review the relevant literature and existing solutions and answer the following research questions:

  1. How can textual analysis and the application of natural language processing techniques be efficiently used for portfolio management, including risk management and asset allocation?
  2. What are the most promising NLP / text analysis techniques?
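One of the simplest text analysis techniques in finance is dictionary-based tone scoring. The sketch below is illustrative only: the `POSITIVE` and `NEGATIVE` word sets are tiny hypothetical placeholders (a real study would use a curated finance lexicon such as the Loughran-McDonald word lists), and the function names are our own. It turns dated documents into a daily tone series that could then serve as a predictor in return, risk or asset allocation models.

```python
import re
from collections import defaultdict
from datetime import date

# Toy word lists for illustration only; replace with a curated finance lexicon.
POSITIVE = {"gain", "gains", "growth", "strong", "rally", "beat"}
NEGATIVE = {"loss", "losses", "decline", "weak", "crash", "default"}

TOKEN = re.compile(r"[a-z]+")


def tone(text: str) -> float:
    """Net tone = (positive - negative word count) / number of tokens, in [-1, 1]."""
    tokens = TOKEN.findall(text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def daily_tone(docs: list[tuple[date, str]]) -> dict[date, float]:
    """Average document tone per calendar day, ordered by date."""
    by_day = defaultdict(list)
    for day, text in docs:
        by_day[day].append(tone(text))
    return {day: sum(vals) / len(vals) for day, vals in sorted(by_day.items())}
```

Dictionary methods are transparent but ignore context and negation, which is one reason the block also asks which NLP techniques (for example, fine-tuned language models) are most promising.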

Block 2: Structural breaks detection & asset price bubbles

A survey of econometric tests for asset price bubbles shows that, despite recent advances, bubbles cannot be detected econometrically with a satisfactory degree of certainty (Gürkaynak 2008). Furthermore, there are currently relatively few scientific papers on the systematic live detection of structural breaks, and most existing solutions have been validated on simulated rather than real-world data. An obvious downside of such experiments is that the dynamics of the simulated data are often specific to the paper, and any model that matches these dynamics has an unfair advantage (van den Burg and Williams 2020). Hence, we will tackle the issue of structural breaks and asset price bubbles in three steps. In the first step, we will focus on ex-post structural break detection methods for asset price bubbles to identify past breaks in real macroeconomic and financial time series. The breaks identified by the different methods will be compared and, if needed, consolidated based on their consensus. In the second step, we will apply well-established methods for the live detection of breaks and check their ex-ante performance. Based on the current state of the literature, we expect the forecastability of breakpoints to be relatively poor. Therefore, in the third step, we will use NLP and text analysis techniques as a supporting or main method for detecting breakpoints. Within this block, we will answer the following research questions:

  1. How can structural breaks be detected, identified and dated in online and offline settings?
  2. How can structural breaks, change points and asset price bubbles be detected live using the most recent (alternative) data (Twitter, news, etc.)?
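To make the offline/online distinction concrete, here is a hedged sketch of two textbook techniques, not the methods the project will ultimately select: a least-squares estimator that dates a single mean shift ex post using the whole sample, and a two-sided CUSUM detector that processes observations sequentially and raises an alarm once cumulative deviations exceed a threshold. The function names and the defaults `k=0.5` and `h=5.0` (in units of `sigma`) are assumptions for illustration; in a live setting, `target_mean` and `sigma` would have to be estimated from a pre-break calibration window.

```python
from typing import Optional

import numpy as np


def offline_mean_break(x: np.ndarray, min_size: int = 5) -> int:
    """Ex-post dating of a single mean shift.

    Picks the split that minimises the total sum of squared residuals
    around the two segment means (least-squares break estimator).
    Returns the index of the first observation after the break.
    """
    n = len(x)
    best_k, best_ssr = -1, np.inf
    for k in range(min_size, n - min_size + 1):
        left, right = x[:k], x[k:]
        ssr = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if ssr < best_ssr:
            best_k, best_ssr = k, ssr
    return best_k


def online_cusum(x, target_mean: float, sigma: float,
                 k: float = 0.5, h: float = 5.0) -> Optional[int]:
    """Live (sequential) two-sided CUSUM.

    Only past and current observations are used at each step. Returns the
    first index at which an alarm is raised, or None if there is no alarm.
    """
    s_pos = s_neg = 0.0
    for t, value in enumerate(x):
        z = (value - target_mean) / sigma
        s_pos = max(0.0, s_pos + z - k)   # accumulates upward deviations
        s_neg = max(0.0, s_neg - z - k)   # accumulates downward deviations
        if s_pos > h or s_neg > h:
            return t
    return None
```

On a noise-free series whose mean jumps from 0 to 2 at index 50, the offline estimator returns 50, while the online detector (with `target_mean=0`, `sigma=1`) only alarms at index 53. This detection delay is exactly what the ex-ante evaluation in the second step has to account for.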

Block 3: Narratives for structural breaks

The so-called "narratives block" will depend heavily on the results of Blocks 1 and 2. Using the knowledge and experience gained with text analysis and NLP, and the insights into detecting structural changes, we will develop a framework that uses market narratives to detect asset price bubbles. Narratives "go viral" and spread worldwide with economic impact (Shiller 2017). There is considerable evidence in the scientific literature that people respond strongly to narratives in fields such as marketing (Escalas 2007), journalism (Machill et al. 2007), education (McQuiggan et al. 2008), health interventions (Slater et al. 2003) and philanthropy (Weber et al. 2006). We will answer the following research questions:

  1. Can market narratives help predict financial market bubbles and their collapse?
  2. Can market narratives help detect financial market bubbles?
  3. Can narratives sway investment opinions?
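Shiller (2017) suggests that the spread of economic narratives can be described with epidemic models such as the Kermack-McKendrick SIR model. As an illustrative sketch only (the Euler discretisation, the parameter values and the function name are our assumptions), the code below simulates the share of a population actively spreading a narrative. A contagion rate `beta` above the recovery rate `gamma` produces the rise-and-fade pattern of a narrative that "goes viral"; with `beta` below `gamma`, the narrative dies out.

```python
def sir_narrative(beta: float, gamma: float, i0: float, days: int, dt: float = 0.1):
    """Euler integration of the SIR model, reinterpreted for a narrative.

    S = share not yet exposed, I = share actively spreading the narrative,
    R = share that has lost interest. Returns the share I at the end of each day,
    starting with the initial value i0.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    steps_per_day = int(round(1 / dt))
    path = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_spreaders = beta * s * i * dt   # contagion through contact
            lost_interest = gamma * i * dt      # "recovery": interest fades
            s -= new_spreaders
            i += new_spreaders - lost_interest
            r += lost_interest
        path.append(i)
    return path
```

Fitting `beta` and `gamma` to, for example, the daily share of news articles or posts that mention a given narrative would be one way to quantify its virality and compare it with price dynamics; this is an assumption about how the framework could work, not a stated design.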

To achieve this, one needs to devise experiments for measuring narratives and then incorporate those measurements into predictive models aimed at explaining different aspects of financial market behaviour. For narratives to be effective, they need to resonate with potential investors (van Werven et al., 2019). Regarding the third research question of this block, we can therefore investigate the effectiveness of specific components of narrative strategies through an experiment with potential investors, who will be presented with different investment options that a) incorporate a narrative structure and b) emphasise different positive or negative emotions in the text. Put differently, participants will be shown text and information in which the level of the manipulated component (e.g. emotional content) is low or high and in which a narrative does or does not appear in the description of the investment. Once the data are collected, we can employ pre-trained language models, such as GloVe embeddings combined with a Long Short-Term Memory network (GVEL) or the Transformer-based BERT (Bidirectional Encoder Representations from Transformers), and fine-tune them to model the presence of narratives. This will also enable us to test how text affects perceptions of investment options, either through text matching approaches that employ lower-dimensional summaries of texts (Roberts, Stewart, and Nielsen, 2020) or by using low-dimensional representations as causally sufficient embeddings (Veitch, Sridhar, and Blei, 2020).

Due to practical considerations (sample size), we will employ a 2x2 experimental design for each of the components tested (e.g. narrative element present vs. purely informational text; emotionality of communication low vs. high). Such experiments will provide evidence of the causal influence that different components of narratives and emotions exert on the appeal of these investments by influencing the judgements of potential investors. The experiment might also inform us about narratives and biases, for example where investors forgo certain investment options (overlooking correlations that could improve the risk-return profile of their portfolio) because of the narratives presented to them.
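For a 2x2 between-subjects design with 0/1 coded factors, main and interaction effects can be estimated by ordinary least squares with an interaction term. The sketch below assumes a hypothetical data layout (one rating of investment appeal per participant, plus indicators for narrative presence and high emotionality) and uses NumPy only; inference (standard errors, tests) is omitted and would normally be done with a statistics package.

```python
import numpy as np


def estimate_2x2_effects(narrative, emotion, rating) -> dict:
    """OLS fit of rating ~ 1 + narrative + emotion + narrative*emotion.

    narrative, emotion: 0/1 indicators per participant; rating: outcome.
    Returns the four coefficients by name.
    """
    narrative = np.asarray(narrative, dtype=float)
    emotion = np.asarray(emotion, dtype=float)
    rating = np.asarray(rating, dtype=float)
    X = np.column_stack(
        [np.ones_like(narrative), narrative, emotion, narrative * emotion]
    )
    coef, *_ = np.linalg.lstsq(X, rating, rcond=None)
    return dict(zip(["intercept", "narrative", "emotion", "interaction"], coef))
```

With this coding, the `narrative` coefficient is the effect of the narrative when emotionality is low, and `interaction` is the difference-in-differences across the four cells, i.e. how much the narrative effect changes when emotionality is high.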


Block 4: Multidimensional AI and ML solutions in a fully integrated framework

AI and ML techniques have substantial potential to revolutionise financial markets (Milana and Ashta 2021). New technologies transform business models and markets for trading, credit and blockchain-based finance, generate efficiencies, reduce friction, enhance product offerings, and refine the existing financial services industry (Buchanan 2019; Hilpisch 2020; Moloi and Marwala 2020). Since the previous blocks approach the detection of structural breaks and asset price bubbles from various perspectives with different techniques, it is natural to check whether these methods can be combined into a fully integrated framework. Research questions:

  1. Can a combined ML approach outperform each single method?
  2. Do complex AI and ML approaches outperform simple forecast combinations?
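The second question requires simple combination benchmarks. A minimal sketch, assuming a matrix of point forecasts from several models, contrasts the equal-weight average with weights proportional to each model's inverse mean squared error on a past evaluation window, a classic form of forecast combination. The function names are hypothetical; a combined ML approach (for example, a stacked meta-learner) would be evaluated against exactly these simple combinations.

```python
import numpy as np


def equal_weight(forecasts: np.ndarray) -> np.ndarray:
    """Simple average across models; forecasts has shape (n_models, n_periods)."""
    return forecasts.mean(axis=0)


def inverse_mse_weights(y_hist: np.ndarray, f_hist: np.ndarray) -> np.ndarray:
    """Weights proportional to 1 / MSE of each model on a past evaluation window.

    y_hist: realised values, shape (n_periods,);
    f_hist: past forecasts, shape (n_models, n_periods).
    """
    mse = ((f_hist - y_hist) ** 2).mean(axis=1)
    w = 1.0 / mse
    return w / w.sum()


def weighted(forecasts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted combination of model forecasts for each period."""
    return weights @ forecasts
```

Weights must be estimated on data preceding the forecast period; otherwise, the comparison with more complex approaches would not be ex ante.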
