Marina Danilevsky is a Research Staff Member in the Scalable NLP group at IBM Almaden Research Center in San Jose, California. She received her Ph.D. and M.S. in Computer Science from the University of Illinois at Urbana-Champaign (UIUC) in 2014 and a B.S. in Mathematics from the University of Chicago in 2007. Her research interests include data mining, scalable text mining, natural language processing, information networks, and related fields.
A growing area of interest in FinTech is answering complex semantic queries about financial entities (e.g., which companies have outperformed their competitors by more than 10% in the last year). Such queries have applications in financial advising, company analysis, and investment strategy. In today's AI-driven world, they should ideally be addressed by automatically analyzing the relevant information and returning a comprehensive answer. However, the relevant data sources, which can include market data, financial statements, news articles, analyst reports, and customer information, are highly heterogeneous in content, format, and latency. We examine one subset of these sources, publicly available financial documents, to illustrate some of the challenges of interpreting, integrating, and transforming raw data into a knowledge base that can support complex semantic queries in the FinTech space.