Unleashing the Power of Public Data for Financial Risk Measurement, Regulation, and Governance

We present Midas, a system that uses complex data processing to extract and aggregate facts from a large collection of structured and unstructured documents into a set of unified, clean entities and relationships. Midas focuses on data for financial companies and is based on periodic filings with the U.S. Securities and Exchange Commission (SEC) and Federal Deposit Insurance Corporation (FDIC). We show that, by using data aggregated by Midas, we can provide valuable insights about financial institutions either at the whole system level or at the individual company level. To illustrate, we show how co-lending relationships that are extracted and aggregated from SEC text filings can be used to construct a network of the major financial institutions. Centrality computations on this network enable us to identify critical hub banks for monitoring systemic risk. Financial analysts or regulators can further drill down into individual companies and visualize aggregated financial data as well as relationships with other companies or people (e.g., officers or directors). The key technology components that we implemented in Midas and that enable the above applications are: information extraction, entity resolution, mapping and fusion, all on top of a scalable infrastructure based on Hadoop.
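As a rough illustration of the co-lending analysis described above, the following minimal sketch builds a weighted network from loans that share lenders and ranks the lenders by centrality. The abstract does not say which centrality measure Midas uses; eigenvector centrality computed by power iteration is an assumption made here, and the bank names, the loan list, and both function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_colending_network(loans):
    """Return a symmetric weighted adjacency map from a list of loans.

    Each loan is a list of lender names. Every pair of lenders that
    appears on the same loan has its edge weight increased by one.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for lenders in loans:
        for a, b in combinations(sorted(set(lenders)), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights


def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Approximate eigenvector centrality by power iteration.

    The iteration uses A + I instead of A. The shift leaves the
    eigenvectors unchanged and avoids oscillation on bipartite graphs.
    """
    nodes = sorted(weights)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {
            n: score[n] + sum(w * score[m] for m, w in weights[n].items())
            for n in nodes
        }
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


# Hypothetical loans, each listing the banks that lent together.
loans = [
    ["Bank A", "Bank B", "Bank C"],
    ["Bank A", "Bank D"],
    ["Bank A", "Bank B"],
    ["Bank C", "Bank D"],
]

network = build_colending_network(loans)
scores = eigenvector_centrality(network)

# Print lenders from most to least central.
for bank in sorted(scores, key=scores.get, reverse=True):
    print(f"{bank}: {scores[bank]:.3f}")
```

In this toy data, Bank A co-lends with every other bank and so receives the highest score. In a real setting the loan list would come from the relationships extracted from SEC filings.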

By: Mauricio A. Hernández; Howard Ho; Georgia Koutrika; Rajasekar Krishnamurthy; Lucian Popa; Ioana R. Stanoi; Shivakumar Vaithyanathan; Sanjiv Das

Published as: IBM Research Report RJ10475, 2010

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
