
Datameer Blog

How Big Data Helps to Find Rogue Traders

By Stefan Groschupf on July 10, 2012

After the problems in the financial services industry and the billions invested in “rescuing” some of the banks, you would think folks would have figured it out by now. Well, it’s obviously not an easy feat, so let’s discuss what a rogue trader is and what the technical challenges are in identifying one.

Under Basel II, a regulatory framework first published in 2004, banks must hold adequate financial capital for the risk they are exposed to through lending and investing (e.g., assets such as stocks and loans). Obviously, asset prices are constantly changing, so each asset carries a rating that indicates its particular risk. Traders can only buy assets within a certain risk category. Higher-risk assets theoretically have greater potential return, so traders who are paid on commission are somewhat incentivized to buy and sell higher-risk assets, because the return on investment (the bonus check) is clearly associated with risk taking – and it’s not even their money.

If an asset or a portfolio turns upside down and cannot be sold for a profit, the trader has a problem. Optimistically, a rogue trader might hang on to an asset that now represents a heavy loss and hope it recovers to a higher value. He might even trade other risky assets, hoping the new trades’ potential big profits will cover the loss – and so the downward spiral continues.

You would think it would be impossible to hide a loss of hundreds of millions or even billions of dollars, since all trades are done digitally and assets are technically just records in a database.

If only it were that easy…

After the financial shakeout, many banks merged. Today, banks have hundreds of databases storing asset data. Almost every mutual fund company, trading group, or asset portfolio group can have its own database. Just one of our customers alone has over 250 databases. That’s a jungle in and of itself, without even taking into consideration external data sources, like the data that comes from rating providers, semi-structured trading log files, and more. Data analytics is extremely slow to implement in this environment – we’re talking months, if not years – because of the complex, 3-tier data architecture. First there is an ETL process that extracts data from one database and transforms it into the static schema of another. We call this process “schema on write.” Then, on top of that, a BI system is installed where the only people who can do the analytics are the ones who understand the static schema and the ETL process.
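To make the schema-on-write pattern concrete, here is a minimal sketch in Python using in-memory SQLite databases. The table and column names are purely illustrative, not from any real bank system; the point is that data must be transformed to fit the target’s fixed schema before it is written:

```python
import sqlite3

# Hypothetical source and target databases (illustrative only).
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")

# The source system stores trades in its own ad-hoc layout.
src.execute("CREATE TABLE raw_trades (trader TEXT, sym TEXT, qty INT, px REAL)")
src.executemany("INSERT INTO raw_trades VALUES (?, ?, ?, ?)",
                [("alice", "ACME", 100, 12.5), ("bob", "XYZ", -50, 7.0)])

# The target warehouse has a fixed, predefined schema: data must be
# reshaped to fit it BEFORE it is written ("schema on write").
dst.execute("""CREATE TABLE trades (
    trader_id TEXT NOT NULL,
    symbol    TEXT NOT NULL,
    quantity  INTEGER NOT NULL,
    notional  REAL NOT NULL)""")

# Extract, transform (derive the notional value), load.
for trader, sym, qty, px in src.execute("SELECT * FROM raw_trades"):
    dst.execute("INSERT INTO trades VALUES (?, ?, ?, ?)",
                (trader, sym, qty, qty * px))

rows = list(dst.execute("SELECT trader_id, notional FROM trades ORDER BY trader_id"))
print(rows)  # [('alice', 1250.0), ('bob', -350.0)]
```

Multiply this small transform step by 250+ source schemas, each needing its own mapping into the warehouse, and the months-to-years timeline becomes easy to believe.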

Now think about coming up with a perfect schema for merging 250+ different databases into a data warehouse. This would be an academic exercise that quite frankly isn’t solvable, especially with the environment changing faster than the schema could ever be implemented.

A rogue trader takes advantage of this data jungle and either moves assets around or hides them where others aren’t looking, like in account 88888 in the famous movie “Rogue Trader” with Ewan McGregor.

Using this technique, certain portfolios might have an unreasonably high risk exposure today, but because assets can be moved around in the data jungle, tomorrow things can appear to be just fine.

However, there is light at the end of the tunnel, and it’s called Hadoop. Hadoop is what will make it possible to cut through the jungle, because it allows banks to implement a concept called “schema on read” instead of the slow “schema on write” process we previously discussed. Instead of the slow ETL process, in which data must be modeled before it’s put into a data warehouse, banks can now dump all of their raw, untouched data into a gigantic (and still cheap) Hadoop cluster and weave the data together only when it needs to be read, on demand. So instead of spending months or years making the data ready before it lands in a single store, it is now possible to weave all the data together in hours or days in a Hadoop cluster.
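A minimal sketch of the schema-on-read idea in plain Python (the record shapes and field names are hypothetical): raw records from different systems are kept exactly as their sources produced them, and a common schema is imposed only at the moment a query reads them:

```python
import json

# Raw, untouched records dumped from two different systems -- no upfront
# modeling; each line keeps whatever shape its source produced.
raw_dump = [
    '{"trader": "alice", "asset": "ACME", "risk": "AAA", "value": 1250.0}',
    '{"who": "bob", "instrument": "XYZ", "rating": "CCC", "notional": -350.0}',
]

def read_with_schema(line):
    """Apply a schema at READ time, mapping each source's field
    names onto one common view ("schema on read")."""
    rec = json.loads(line)
    return {
        "trader": rec.get("trader") or rec.get("who"),
        "asset":  rec.get("asset") or rec.get("instrument"),
        "rating": rec.get("risk") or rec.get("rating"),
        "value":  rec.get("value", rec.get("notional")),
    }

# Weave the heterogeneous data together on demand: find risky positions.
high_risk = [r for r in map(read_with_schema, raw_dump)
             if r["rating"] == "CCC"]
print(high_risk)
```

Adding a 251st source here means extending one read-time mapping, not redesigning a warehouse schema and its ETL pipelines.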

It’s also important to note that Hadoop is by no means just a data store. Hadoop is a storage and compute engine, and it is highly optimized for analytical workloads, unlike traditional data warehouses.
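To illustrate the kind of analytical workload the compute side handles, here is a toy map/reduce-style aggregation in plain Python (portfolio names, ratings, and values are made up). On a real Hadoop cluster the map and reduce steps would run in parallel across many machines over the raw dump:

```python
from collections import defaultdict

# Hypothetical trade records gathered from many source systems.
trades = [
    ("portfolio_a", "AAA", 1_000_000.0),
    ("portfolio_a", "CCC", 5_000_000.0),
    ("portfolio_b", "CCC", 2_000_000.0),
]

# Map phase: emit (portfolio, exposure) pairs for risky assets only.
mapped = [(pf, value) for pf, rating, value in trades if rating == "CCC"]

# Shuffle + reduce phase: sum the exposure per portfolio.
exposure = defaultdict(float)
for pf, value in mapped:
    exposure[pf] += value

print(dict(exposure))  # {'portfolio_a': 5000000.0, 'portfolio_b': 2000000.0}
```

A portfolio whose summed exposure jumps far above its risk category between two such runs is exactly the kind of signal a rogue-trader early warning system would flag.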

As is to be expected with any technological disruption, traditional vendors at first didn’t take Hadoop seriously, and then they said it wasn’t enterprise-ready. Today we see them “connecting” to Hadoop in an effort to maintain the status quo of their traditional 3-tier approach, claiming that Hadoop “is just another data source.”

We’re proud to report that at Datameer, we’ve helped multiple financial institutions cut through the data jungle without interruption. Datameer makes it possible for business users themselves to integrate a large number of data sources and get the fast insights about trading patterns and current risk exposure that they’ve long needed.

If you’re trying to set up a rogue trader early warning system, give us a call.


Stefan Groschupf

Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project out of which Hadoop was spun and which, 10 years later, is considered a 20-billion-dollar business. Open source technologies designed and coded by Stefan can be found running in all of the Fortune 20 companies, and innovative open source technologies like Kafka, Storm, Katta, and Spark all rely on technology Stefan designed more than half a decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytics systems for companies like Apple, EMI Music, Hoffmann-La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker, contributes to industry publications and books, holds patents, and advises a set of startups on product, scale, and operations. When not working, Stefan is backpacking, sea kayaking, kiteboarding, or mountain biking. He lives in San Francisco, California.