Recommended Reading for Big Data & GRC

Yes, technology moves fast, but sometimes you come across an old chestnut of guidance about tech that still applies today. That happened to me as I stumbled across a collection of essays about Big Data that offer a bundle of insight about how to use Big Data and cloud computing for better GRC management.

The book is Planning for Big Data by Edd Dumbill, published in 2012 (because that’s what passes for “old chestnut” in this field, but anyway…) and it’s available as a free e-book from O’Reilly. Most of the essays are more technical than the typical compliance officer needs to know, addressing specific Big Data technologies or vendors best left to your CIO. The most useful essays for compliance and audit professionals are the first two, which talk about how Big Data should work (Chapter 1, “The Feedback Economy”) and why Big Data has so much potential (Chapter 2, “What Is Big Data?”).

I found the essays useful even now, after four more years of advances in Big Data and cloud technology, specifically because they were written before all those advances happened. They frame Big Data in a broad outline that non-technical people (like compliance professionals) can understand, explaining how its underlying structure lets Big Data solve problems that traditional IT cannot. What’s more, we here in the world of regulatory compliance have had four years of advances in our own field, too—so now we can articulate our needs more clearly.

Take that sharper awareness of our own headaches, read these essays about Big Data, and you come away with a much stronger “oh, now I get it” sense of how to make Big Data work well for you—which, in theory, is what all these GRC tech projects are supposed to do.

Better to start with Chapter 2, actually, since that essay explores why Big Data is what it is, and how that fact dictates much of the technology structure that makes Big Data run. At its core, Dumbill says, “Big Data is data that exceeds the processing capacity of conventional databases. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.”

Those two sentences alone should get a compliance officer’s brain cells firing. Your problems exceed the capacity of conventional databases, which for too many companies still means the Excel spreadsheet. Your data can be too big (if you are managing thousands of third parties or financial transactions), move too fast (if you are monitoring geo-location data or social media for privacy concerns), or fail to fit the strictures of database architectures (if multiple streams of data come at you in a variety of formats). Corporate compliance challenges today are tailor-made for Big Data, because they are all about taking many fragments of data and piecing them into one answer for a larger question such as, “Are our third-party risks improving or not?”

The other part of the essay explores the implication that this definition of Big Data carries: if Big Data tackles problems too big for conventional databases, you will need more than Excel spreadsheets hosted on one company server to get going. Lots of Big Data problems are solved by breaking up your big pile of data into smaller piles, spread across many computers, each of which processes its own small pile before the partial results get recombined into one answer.
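For the technically curious, here is a minimal sketch of that split, process, and recombine pattern in Python. It is only an illustration, not anything taken from the book: the third-party records, the `risk_score` field, and the threshold of 70 are all made up, and a pool of threads on one machine stands in for the many computers a real Big Data system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_piles(records, pile_size):
    """Break one big list of records into smaller piles of at most pile_size."""
    return [records[i:i + pile_size] for i in range(0, len(records), pile_size)]


def count_high_risk(pile, threshold=70):
    """Process a single pile: count records at or above the (hypothetical) threshold."""
    return sum(1 for record in pile if record["risk_score"] >= threshold)


def count_high_risk_distributed(records, pile_size=1000, workers=4):
    """Hand each pile to a worker, then recombine the partial counts into one answer."""
    piles = split_into_piles(records, pile_size)
    # Threads here are a stand-in for separate computers in a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_high_risk, piles))
    return sum(partial_counts)


# Hypothetical data: 20,000 third parties with invented risk scores from 0 to 99.
third_parties = [{"id": i, "risk_score": i % 100} for i in range(20000)]
total_high_risk = count_high_risk_distributed(third_parties)
print(total_high_risk)
```

The point is the shape of the work rather than the details: no single worker ever sees the whole pile, yet the final number is the same one you would get by counting everything in one place.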

What does that mean for a compliance officer? It means you can propose huge new demands on your IT department sure to make the CIO drop you on Facebook, or it means you use cloud-based software for your GRC needs—since cloud-based computing works just the way that distributed approach describes.

And now we non-technical compliance professionals are much closer to understanding how Big Data can be applied to compliance problems. At the least, you can have more useful conversations with vendors and the CIO about what might fit your needs.

The first essay talks more about how Big Data computing should perform over time, since the problems Big Data tries to solve are complex systems. (Picture the difference between totaling up assets and liabilities in a quarterly financial statement, versus risk-ranking and monitoring 20,000 third parties every day.) When done correctly, Dumbill says, Big Data helps people follow a loop: observe, orient, decide, act. You can process piles of data, understand the meaning within them, decide what to do, and put that decision into action.

Yes, one challenge is that much of the data you want to study will be unstructured, so you’ll need to normalize it all into something that can be processed to answer the questions you have; that’s what GRC vendors and IT developers are for. Will they do a perfect job? No, and a fair number might not even do a good job—but that can be due to unfocused thinking as much as to sloppy coding. These two Big Data essays are one good start to help focus the thought process.
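To make "normalize it all" a little more concrete, here is a small Python sketch under heavy assumptions. The two feeds, their column names, and the target fields (`name`, `country`) are invented for illustration; real compliance data is far messier, and a real GRC product would do much more than this.

```python
import csv
import io
import json


def normalize_vendor(raw):
    """Map one raw record, whatever its source, onto a single common shape."""
    # Each feed uses a different (hypothetical) label for the same information.
    name = raw.get("vendor") or raw.get("vendor_name") or ""
    country = raw.get("country") or raw.get("Country") or ""
    return {"name": name.strip().title(), "country": country.strip().upper()}


# Two hypothetical feeds describing third parties in different formats.
csv_feed = "vendor_name,Country\n acme corp ,us\n"
json_feed = '[{"vendor": "GLOBEX LLC", "country": "de"}]'

rows = list(csv.DictReader(io.StringIO(csv_feed))) + json.loads(json_feed)
normalized = [normalize_vendor(row) for row in rows]

for vendor in normalized:
    print(vendor)
```

Once every record has the same fields, spelled and formatted the same way, the questions you actually care about can be asked of the whole collection at once.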
