Editor: Guest poster Nathan Coombs brings us observations and speculations from the cutting edge of “big data” analysis in finance.

Observations and speculations about topological data analysis
By Nathan Coombs

Those involved in the social studies of finance should be interested in innovations taking place within the field of topological data analysis (TDA). This sophisticated approach to exploiting big data looks set to change how complex information is utilised, with unknown repercussions for the operation of financial and ‘real’ markets. The technology may not yet have entered financial activities, but it almost certainly soon will.

The start-up firm, Ayasdi, is at the forefront of pioneering commercial TDA applications. Founded in 2008 by Stanford mathematics professor Gunnar Carlsson, along with Gurjeet Singh and Harlan Sexton, Ayasdi now has operational models for automated analysis of high-dimensional data sets. They have attracted $30.6 million in funding as of July 2013, and signed up major pharmaceutical groups, government agencies, and oil and gas companies. According to their website they also have their sights set on bringing the technology to finance. Even US President Obama reportedly asked for a demonstration of their system. The company’s marketing stresses that their systems conduct what they call ‘automated discovery’. That is, Ayasdi’s algorithms will uncover patterns without a human agent first formulating a hypothesis in order to conduct statistical tests. They thus claim their system is able to discover things that you didn’t even know you were looking for.

TDA of the type being developed by Ayasdi is able to take this unprecedented approach because it can discern patterns in large data sets which would otherwise be difficult to extract. Conventional data mining approaches, when applied to high-dimensional data sets, are bounded by computational limits; TDA, on the other hand, can circumvent the information processing horizon of conventional techniques. Carlsson et al. (2013) enumerate the features unique to topology which explain its efficacy in dealing with data:

First is the coordinate-free nature of topological analysis. Topology is the study of shapes and deformations; it is therefore primarily qualitative. This makes it highly suited to the study of data when the relations (distance between pairs of points: a metric space) are typically more important than the specific coordinates used.

Second is topology’s capacity to describe the changes within shapes and patterns under relatively small deformations. The ability of topology to hold on to the invariance of shapes as they are stretched and contorted means that when applied to data sets this form of analysis will not be overly sensitive to noise.

Third, topology permits the compressed representation of shapes. Rather than taking an object in all its complexity, topology builds a finite representation in the form of a network, also called a simplicial complex.

These features of topology provide a powerful way to analyse large data sets. The compression capacities of topology, in combination with its coordinate-free manipulation of data and the resistance of its simplicial complexes to noise, mean that once a data set is converted into topological form, Ayasdi’s algorithms can find patterns within it far more efficiently.
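To make the first two features concrete, here is a minimal Python sketch. It is not Ayasdi’s algorithm, only a toy illustration under stated assumptions: the point cloud, the scale `eps` and all function names are hypothetical. It links points that lie within `eps` of each other and counts the resulting connected components (the simplest topological summary, the zeroth Betti number of the neighbourhood graph). Because it only ever looks at pairwise distances, the count is unchanged when the cloud is rotated (coordinate-free) or slightly perturbed (insensitive to small noise).

```python
import math
import random


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def count_components(dist, eps):
    """Count connected components when points at distance <= eps are linked.

    Only the distance matrix is used, never the coordinates themselves.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= eps:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
    return len({find(i) for i in range(n)})


def rotate(points, angle):
    """Rotate 2D points about the origin; all distances are preserved."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical toy data: two 4x4 grids of points, far apart from each other.
grid = [(0.5 * i, 0.5 * j) for i in range(4) for j in range(4)]
cloud = grid + [(x + 10.0, y) for x, y in grid]

rng = random.Random(0)
rotated = rotate(cloud, 0.7)
noisy = [(x + rng.uniform(-0.05, 0.05), y + rng.uniform(-0.05, 0.05))
         for x, y in cloud]

for name, pts in [("original", cloud), ("rotated", rotated), ("noisy", noisy)]:
    print(name, count_components(pairwise_distances(pts), eps=1.0))
```

All three versions of the cloud report two components at this scale. Production TDA tools go much further (higher-dimensional simplices, many scales at once), but the dependence on distances rather than coordinates is the same idea.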

How these topological insights are turned into functional algorithms is, however, complex. In order to follow in fine detail Carlsson’s expositions of the methods involved (e.g. 2009), training in the field of algebraic topology is probably necessary. The terms persistent homology, Delaunay triangulation, Betti numbers, and functoriality should be familiar to anyone attempting an in-depth understanding of the scholarly papers. Compared, for example, to the Black-Scholes-Merton formula for option pricing, which can be interpreted fairly easily by anyone with a grasp of the syntax of differential calculus, TDA works at a level of mathematical sophistication practically inaccessible to those without advanced mathematical training. In this it is not unique; most high-level information processing methods are complex. But it does result in a curiously blurred line between academic and commercial research.

Whereas, for example, the proprietary models created by quants in investment banks are typically only published academically after a lag of a number of years, Carlsson and his colleagues at Ayasdi published their approach ahead of beginning commercial operations. Although these publications do not detail the specific algorithms the company developed to turn TDA into operational software, they do lay out most of the conceptual work lying behind it.

Why this openness about their approach? Partly, at least, the answer seems to rest with the complexity of the mathematics involved. As co-founder Gurjeet Singh puts it: ‘Ayasdi’s topology-oriented approach to data analysis is unique because few people understand the underlying technology … “As far as we know, we are the only company capitalizing on topological data analysis,” he added. “It’s such a small community of people who even understand how this works. We know them all. A lot of them work at our company already.”’

It is a situation that poses both challenges and opportunities for sociologists of finance. The challenge lies in getting up to speed with this branch of mathematics so that it is possible to follow the technical work pursued by companies like Ayasdi. The opportunity is that since those involved in TDA seem relatively open in publishing their methods, researchers are not restricted to following developments years after they have already been deployed in the marketplace. Researchers should be able to follow theoretical developments in TDA synchronously with their application over the following years.

What might we expect of TDA when it is inevitably applied to finance? In the first instance, it should give a marked advantage to those firms who are early adopters. The capacity of its algorithms to detect unknown patterns – indeed, patterns in places no one even thought to look – should lend these firms the ability to exploit pricing anomalies. As the technology becomes more widespread, however, the exhaustive nature of TDA – it can literally discover every pattern there is to discover within a data set – could lead to the elimination of anomalies. As soon as they appear, they will be instantaneously exploited and hence erased. Every possible anomaly could be detected by every trading firm simultaneously, and with that, according to the efficient market hypothesis, any potential for arbitrage profits would disappear.

Of course, TDA is not a static technology; the addition of new and different algorithms to the fundamental data mining model could lead to various forms of it emerging. Similarly, the stratification of risk tolerance amongst market participants could lead to borderline cases where the statistical significance of the patterns detected by its algorithms separates high-risk from low-risk traders. But at its most fundamental, it does not seem obvious how TDA could do more than expose universally all pricing anomalies. TDA might therefore sound the death knell for conventional arbitrage-oriented forms of trading.

Beyond this, it is still too early to speculate further about the consequences of the technology for finance. Across the broader sweep of applications in the ‘real’ economy, however, TDA will likely deepen the automation of logistics, marketing and even strategic decision-making. The technology’s capacity to automate discovery and feed such insights into predictive analytics may herald an era of economic transition, whereby it is no longer just routine tasks such as card payments and credit checks which are automated, but, moreover, middle class professional work such as research and management, previously believed to require the ineluctable human touch. In turn, such changes raise profound questions for the epistemology underlying many free market theories like F.A. Hayek’s, which place emphasis on the engaged, culturally-sustained practical knowledge of market participants. With the increasing mathematization of economic processes attendant to the automation of coordination activity, the pertinence of such epistemologies may well be on the wane.


Carlsson, G. et al. (2013) ‘Extracting insights from the shape of complex data using topology’, Scientific Reports, No. 3. Available at: http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html

Carlsson, G. (2009) ‘Topology and Data’, Bulletin of the American Mathematical Society, Vol. 46, No. 2, April, pp. 255-308. Available at: http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/

Dr. Nathan Coombs is beginning as a Research Fellow at the University of Edinburgh in October. His postdoctoral project concerns the challenge of ‘big data’ and other automating technologies for fundamental theories of political economy. He is co-editor of the Journal of Critical Globalisation Studies.


Here is a fascinating NPR interview with Thomas Peterffy, the Hungarian who invented not one but two things crucial to financial markets today: one of the first computer programs to price options, and high-speed trading.


Today one of the richest people in America, Thomas Peterffy recounts his youth in Communist Hungary where as a schoolboy he sold his classmates a sought-after Western good: chewing gum. Let’s disregard for a moment Peterffy’s recent political activities and rewind almost half a century.


Peterffy was a trader on Wall Street who came up with an option pricing program in the 1970s. The Hungarian-born computer programmer tells the story of how he figured out the non-random movement of options prices and programmed it, but could not possibly bring his computer onto the trading floor at the time, so he printed tables of different option prices from his computer and brought the papers into the trading pit in a big binder. But the manager of the exchange did not allow the binder either, so Peterffy ended up folding the papers, which stuck out of his pockets in all directions. Similar practices were taking place at around this time in Chicago, as MacKenzie and Millo (2003) have documented. Trading by math was not popular, and his peers duly made fun of him: an immigrant guy with a “weird accent”, as Peterffy says. Sure enough, we know from the research of Peter Levin, Melissa Fisher and many other sociologists and anthropologists that face-to-face trading was full of white machismo. But Peterffy’s persistence meant the start of automated trading and, according to many, the development of NASDAQ as we know it.


The second unusual thing Peterffy did in the 1980s (!) was connect his computer directly to the stock exchange cables, directly receiving prices and executing algorithms at high speed. Peterffy describes in the NPR interview how he cut the wires coming from the exchange and plugged them straight into his computer, which then could execute the algorithms without input from a human. And so high-speed trading was born.


My intention here is not to glorify my fellow countryman, by any means, but to add two sociological notes:


1. On options pricing automation: although the story is similar, if not identical, to what is described by Donald MacKenzie and Yuval Millo (2003) in their paper on the creation of the Chicago Board Options Exchange, there seems to be a difference. The economists are missing from the picture. The Chicago economists who were involved in distributing the Black-Scholes formula to traders were a crucial part of the process by which trading on the CBOE became closer to the predictions of the theoretical option-pricing model. But in the case of Peterffy and the New York Stock Exchange, the engineering innovation did not seem to be built around the theoretical model. I am not sure he used Black-Scholes, even if he came up with his predictive models at the same time.


What does this seemingly pragmatic, inductive development of algorithms mean for the rise of automated trading? Moreover, how does this story relate to what happened in Chicago at the CBOE around this time, where economics turned out to be performative, where the Black-Scholes formula was what changed the market’s performance (MacKenzie and Millo)?


2. On high-frequency trading: picking up on conversations we had at the Open University (CRESC) – Leicester workshop last week, Peterffy was among the first to recognize something important about the stock exchanges. Physical information flow, i.e. the actual cable, is a useful way to think about presence “in” the market. While everyone else was trading face-to-face, and learning about prices via the centralized and distributed stock ticker (another invention in and of itself), Peterffy’s re-cabling, if controversial, gave his algorithms an advantage in learning about prices and issuing trades. This also became a fight about the small print in the contractual relationship between the Exchange and the trading party, but Peterffy’s inventions prevailed.


So much for a trailer to this automation thriller. We can read the full story of Peterffy in Automate This: How Algorithms Came to Rule Our World, a book by Christopher Steiner (2012), who argues that Peterffy’s 1960s programming introduced “The Algorithm That Changed Wall Street”. Now obviously, innovations like this are not one man’s single-handed achievement. But a part of the innovation story has been overlooked, and it has to do with familiarity and “fitting in”. Hence my favorite part of the interview, where Peterffy talks about the big binder he was shuffling into the trading pit (recounted with an unmistakable Hungarian accent):


“They asked ‘What is this?’ I said, these are my numbers which will help me trade, hopefully. They looked at me strange, they didn’t understand my accent. I did not feel very welcome.”


The fact that what became a crucial innovation on Wall Street came partly from an immigrant with a heavy accent is a case in point for those chronicling the gender, racial and ethnic exclusions and inclusions that have taken place on Wall Street (for example, Melissa Fisher, Karen Ho, Michael Lewis).

The Wall Street Journal (WSJ) recently published a headline article titled “Hedge Funds’ Pack Behavior Magnifies Market Swings”. While it is not unusual to see the WSJ write on hedge funds and market swings, this article is unusual because it emphasizes the social ties linking investors. It reflects a sea change in the way that the public and the media view financial markets – and an opportunity for the social studies of finance (SSF) to reach a broader audience.

For the past decade, the quant metaphor has dominated public perceptions of financial markets. Institutional investors – particularly hedge funds – were seen as “quants” that used sophisticated computer models to analyze market trends. This idea went hand-in-hand with the view that markets were efficient – fueled by reliable, public data, processed through sophisticated, rational algorithms, and powered by intelligent computer systems instead of mistake-prone humans.

Of course, the recent financial crisis has dislodged such beliefs. Instead of mathematical geniuses finding hidden patterns in public data, quants were revealed as Wizards of Oz – mere human beings capable of making mistakes. Their tools – computerized systems – went from being the enforcers of an efficient market to a worrying source of market instability. As stories about flash trading and inexplicable volatility popped up, the public even began to ask whether the quants were trying to defraud the public.

If institutional investors are mere humans instead of quantitative demigods, shouldn’t they also act like humans? And – shouldn’t their social natures affect the way they make investment decisions? The mainstream media is finally confronting such questions – which SSF has long raised. This particular WSJ article parallels a widely-circulated working paper by Jan Simon, Yuval Millo and their collaborators, as well as my own work under review at ASR.

The world is finally catching up with SSF. Will we finally be heard? It is our responsibility to reach out to the public and the media.

Many readers of this blog may have already come across a fascinating story in August from the Atlantic about mysterious high-frequency trading behavior. I missed it the first time around, on account of ASA perhaps, but recently found it: Market Data Firm Spots the Tracks of Bizarre Robot Traders. If the title alone didn’t make you want to read this story, I don’t know what could. Bizarre Robot Traders? I’m sold!

The story describes a tremendous number of nonsense bids – bids that are far below or above the current market price, and thus will never be filled – made at incredible speed in regular, and quite pretty, patterns:

Are these noise trades an attempt to gain a tiny speed advantage?

Donovan thinks that the odd algorithms are just a way of introducing noise into the works. Other firms have to deal with that noise, but the originating entity can easily filter it out because they know what they did. Perhaps that gives them an advantage of some milliseconds. In the highly competitive and fast HFT world, where even one’s physical proximity to a stock exchange matters, market players could be looking for any advantage.

Or are they trial runs for a denial of service attack?

But already since the May event, Nanex’s monitoring turned up another potentially disastrous situation. On July 16 in a quiet hour before the market opened, suddenly they saw a huge spike in bandwidth. When they looked at the data, they found that 84,000 quotes for each of 300 stocks had been made in under 20 seconds.

“This all happened pre-market when volume is low, but if this kind of burst had come in at a time when we were getting hit hardest, I guarantee it would have caused delays in the [central quotation system],” Donovan said. That, in turn, could have become one of those dominoes that always seem to present themselves whenever there is a catastrophic failure of a complex system.

I certainly don’t know – do any of you? Either way, this story (“Bizarre Robot Traders!”) makes me feel like finance has finally entered into the science fiction future I was promised in my childhood.

Every week starting today, Socializing Finance will post a couple of SSF-readable / related links. This week’s choice is a classical SSF theme, “humans and machines”.

“Settlement Day”: reading the future through the development of GSNET. A parody of the ‘rise of the machines’ starring algorithms (among others).

“Trading Desk”: If you ever wanted to know how traders use their keyboards in order to release daily tensions at work, this link is for you.

“Explaining Market Events”: The preliminary report jointly produced by the CFTC and the SEC on recent events mentioned here.

“Me and my Machine”: Automated Trader’s freaky section. This is Geek’s stuff.

“Nerds on Wall Street”: A recent (2009) reference with interesting information on algo trading and the development of automated markets.

An interesting commentary appeared on BBC news about yesterday’s plunge in
US stock markets due to Greece’s continuing debt crisis:

“Computer trading is thought to have cranked up the losses, as
programmes designed to sell stocks at a specified level came into
action when the market started falling. ‘I think the machines just
took over,’ said Charlie Smith, chief investment officer at Fort Pitt
Capital Group. ‘There’s not a lot of human interaction. We’ve known
that automated trading can run away from you, and I think that’s what
we saw happen today.’”

Here the trader differentiates between two kinds of “panic” process
that both appear to the observers of the market as falling stock
prices: selling spells generated by machine interaction versus human
interaction. He assures us that this time the plunge happened because the
machines were trading. This is a different kind of panic than what we
conventionally think of, one that is based on expectations about
European government debt, which escalates as traders are watching each
other’s moves, or more precisely, “the market’s” movement. Which kind
of panic prevails seems to be specific to the trading system of each
type of market. Another trader reassures us that today’s dive was “an
equity market structure issue, there’s no major problem going on.”

It is interesting that the traders almost dismiss the plunge as a periodic
and temporary side-effect, automated trading gone wild. Real problems
seem to emerge only when humans are involved. But if machine sociality
can crash a market and have ripple effects to other markets, then
perhaps the agency of trading software should be recognized.
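The mechanism the BBC piece describes, programmes that sell once the price reaches a specified level, can be sketched as a toy simulation. This is a deliberately crude illustration, not a model of any real market: the stop levels, the size of the initial shock and the fixed price impact of each automated sale are all hypothetical numbers chosen for the example.

```python
def simulate_cascade(start_price, shock, stop_levels, impact_per_sale):
    """Apply an initial price shock, then let stop-loss orders fire.

    Each order sells as soon as the price is at or below its stop level,
    and each sale pushes the price down by a fixed amount (an assumption),
    which may in turn trigger the next order.
    Returns the final price and the list of stop levels that fired.
    """
    price = start_price - shock
    pending = sorted(stop_levels, reverse=True)  # highest stop fires first
    triggered = []
    while pending and price <= pending[0]:
        triggered.append(pending.pop(0))
        price -= impact_per_sale
    return price, triggered


# Hypothetical stop-loss levels held by five automated programmes.
stops = [99.0, 98.5, 98.0, 97.5, 90.0]

# A shock of 0.5 stays above every stop level: nothing fires.
print(simulate_cascade(100.0, 0.5, stops, impact_per_sale=0.5))

# A shock of 1.0 reaches the first stop, and each sale triggers the next.
print(simulate_cascade(100.0, 1.0, stops, impact_per_sale=0.5))
```

In this toy setting a shock of 0.5 leaves the price at 99.5 with no sales, while a shock of 1.0 sets off four sales in a row and leaves the price at 97.0. The machines triple the fall without any human changing their expectations, which is the distinction the trader is drawing.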

Still with the on-going Goldman Sachs story: yesterday, during one of the hearings of the American Senate Governmental Affairs subcommittee, we had one of those rare chances where worldviews collide ‘on air’. In yesterday’s hearing, Senator Carl Levin was questioning former Goldman Sachs Mortgages Department head Daniel Sparks about matters related to the selling, during 2007, of structured mortgage-based financial products known as Timberwolf. The full transcript is not available (you can see the video here), but a few lines can give us the gist of the dialogue that took place. When Levin asks Sparks why Goldman Sachs hid from the customers their opinion of the value of Timberwolf (a product that an internal GS memo described as a ‘shitty deal’), Sparks answers that ‘there are prices in the market that people want to invest in things’. In another exchange, when asked what volume of the Timberwolf contract was sold, Sparks answered: ‘I don’t know, but the price would have reflected levels that they [buyers] would have wanted to invest at that time’.

This reveals the incompatibility in its naked form. While Levin focused on the discrepancy between the opinions among Goldman Sachs’ employees about the value of the product and the prices paid for these financial contracts, Sparks placed ‘the market’ as the final arbiter in matters of value. That is, according to this order of worth, it does not matter what one thinks or knows about the value of assets; it only matters what price is agreed on in the market. Both Levin and Sparks agree that not all information was available to all market actors. However, while this is a matter for moral concern according to Levin’s order of worth, it is merely a temporary inefficiency according to Sparks’ view.

Moreover, the fact that this dialogue took place in a highly visible political arena, a televised Congressional hearing, entrenches the ‘ideal type’ roles that Levin and Sparks play. Sparks, no doubt on the advice of his lawyers, played the role of the reflexive Homo economicus, claiming, in effect, that markets are the only device of distributional justice to which he should refer. Levin, in contrast, played the role of the tribune of the people, calling for inter-personal norms and practices of decency. These two ideal type worldviews, as Boltanski and Thévenot show, cannot be reconciled. What we call ‘the economy’, then, is oftentimes the chronology of the struggle between these orders of worth.