Guest Post: Topological Data Analysis and Finance
August 2, 2013
Editor: Guest poster Nathan Coombs brings us observations and speculations from the cutting edge of “big data” analysis in finance.
Observations and speculations about topological data analysis
By Nathan Coombs
Those involved in the social studies of finance should be interested in innovations taking place within the field of topological data analysis (TDA). This sophisticated approach to exploiting big data looks set to change how complex information is utilised, with unknown repercussions for the operation of financial and ‘real’ markets. The technology may not yet have entered into financial activities, but it almost certainly soon will.
The start-up firm Ayasdi is at the forefront of commercial TDA applications. Founded in 2008 by Stanford mathematics professor Gunnar Carlsson, along with Gurjeet Singh and Harlan Sexton, Ayasdi now has operational models for the automated analysis of high-dimensional data sets. As of July 2013 it had attracted $30.6 million in funding and had signed up major pharmaceutical groups, government agencies, and oil and gas companies. According to its website, it also has its sights set on bringing the technology to finance. US President Obama even reportedly asked for a demonstration of the system. The company’s marketing stresses that its systems conduct what it calls ‘automated discovery’. That is, Ayasdi’s algorithms uncover patterns without a human agent first formulating a hypothesis to be tested statistically. The company thus claims its system is able to discover things that you didn’t even know you were looking for.
TDA of the type being developed by Ayasdi is able to take this unprecedented approach because it can discern patterns in large data sets that would otherwise be difficult to extract. Conventional data-mining approaches run up against computational limits when applied to high-dimensional data sets; TDA, by contrast, can circumvent the information-processing horizon of conventional techniques. Carlsson et al. (2013) enumerate the features unique to topology that explain its efficacy in dealing with data:
First is the coordinate-free nature of topological analysis. Topology is the study of shapes and deformations; it is therefore primarily qualitative. This makes it highly suited to the study of data in which the relations between points (the distances between pairs of points, which define a metric space) are typically more important than the specific coordinates used.
Second is topology’s capacity to describe shapes and patterns in terms that are invariant under relatively small deformations. Because topology holds on to the properties of shapes that survive stretching and contortion, this form of analysis, when applied to data sets, will not be overly sensitive to noise.
Third, topology permits the compressed representation of shapes. Rather than taking an object in all its complexity, topology builds a finite representation in the form of a network or, more generally, a simplicial complex.
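The first of these features can be made concrete with a small illustration. The Python sketch below (written for this post, not taken from Carlsson et al. or from Ayasdi) moves a random 2-D point cloud to entirely new coordinates with a rotation and a translation, then checks that the matrix of pairwise distances, the metric-space information that topological methods work from, is unchanged. The point cloud, the angle and the shift are arbitrary choices made for the example.

```python
import math
import random

def pairwise_distances(points):
    """Return the matrix of Euclidean distances between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def rotate_and_shift(points, angle, dx, dy):
    """Apply a rigid motion: rotate each 2-D point about the origin, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]

random.seed(0)
cloud = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
moved = rotate_and_shift(cloud, angle=1.2, dx=5.0, dy=-3.0)

before = pairwise_distances(cloud)
after = pairwise_distances(moved)

# Every coordinate has changed, but each pairwise distance survives up to rounding error.
max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(before, after)
    for a, b in zip(row_a, row_b)
)
print(f"largest change in any distance: {max_gap:.2e}")
```

An analysis that only consumes the distance matrix would therefore give identical answers for `cloud` and `moved`, which is the practical sense in which it is coordinate-free.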
These features of topology provide a powerful way to analyse large data sets. The compression capacities of topology, in combination with its coordinate-free manipulation of data and the resistance of its simplicial complexes to noise, mean that once a data set is converted into topological form, TDA algorithms can find patterns within it far more efficiently and powerfully.
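To give a feel for what ‘converting a data set into topological form’ can mean, here is a toy version of the Mapper construction from the TDA literature, written as a minimal Python sketch. It is an illustration under stated assumptions, not Ayasdi’s proprietary algorithm: the lens function (each point’s x-coordinate), the number of intervals, the overlap fraction and the clustering threshold `eps` are all arbitrary choices. The range of the lens is covered by overlapping intervals, the points falling in each interval are clustered, each cluster becomes a node, and two nodes are linked when their clusters share points.

```python
import math
from itertools import combinations

def connected_components(indices, points, eps):
    """Single-linkage clustering: points closer than eps end up in the same cluster."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(indices, 2):
        if math.dist(points[i], points[j]) < eps:
            parent[find(i)] = find(j)

    clusters = {}
    for i in indices:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())

def mapper_graph(points, lens, n_intervals=4, overlap=0.3, eps=0.2):
    """Toy Mapper: cover the lens range with overlapping intervals, cluster the
    points in each interval, and link clusters that share at least one point."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_intervals
    pad = overlap * step
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * step - pad, lo + (k + 1) * step + pad
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(connected_components(members, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]
    return nodes, edges

# Sixty evenly spaced points on a circle, viewed through their x-coordinate.
circle = [(math.cos(2 * math.pi * k / 60), math.sin(2 * math.pi * k / 60)) for k in range(60)]
nodes, edges = mapper_graph(circle, lens=lambda p: p[0])
print(len(circle), "points ->", len(nodes), "nodes,", len(edges), "edges")
# → 60 points -> 6 nodes, 6 edges
```

The sixty points compress to a six-node network that closes into a single loop, so the graph keeps the qualitative fact that the data form a circle while discarding almost everything else about them.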
How these topological insights are turned into functional algorithms is, however, complex. To follow Carlsson’s expositions of the methods involved in fine detail (e.g. 2009), training in algebraic topology is probably necessary. The terms persistent homology, Delaunay triangulation, Betti numbers, and functoriality should be familiar to anyone attempting an in-depth understanding of the scholarly papers. Compared, for example, with the Black-Scholes-Merton formula for option pricing, which can be interpreted fairly easily by anyone with a grasp of the syntax of differential equations, TDA works at a level of mathematical sophistication practically inaccessible to those without advanced mathematical training. In this it is not unique; most high-level information-processing methods are complex. But it does result in a curiously blurred line between academic and commercial research.
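Two of those terms, persistent homology and Betti numbers, can at least be glimpsed in their simplest case. The Python sketch below is a minimal, hedged illustration of 0-dimensional persistent homology only (connected components, not loops or voids), computed with a union-find pass over edges sorted by length; it is not Carlsson’s implementation, and the two-cluster data set, the cut-off of 1.0 for a ‘long-lived’ bar and the sample scales are assumptions chosen for the example. Scale is measured as the raw pairwise distance. Each point is born as its own component; every merge kills one component and records a (birth, death) bar, and the number of bars alive at a given scale is the 0th Betti number there.

```python
import math
import random
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistence bars for a growing distance threshold.

    Every point starts as its own component at scale 0. Edges are added in
    order of length; an edge joining two different components kills one of
    them at that length. One component never dies (death = infinity).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, d))
    bars.append((0.0, math.inf))
    return bars

def betti_0(bars, scale):
    """Number of connected components alive at the given scale."""
    return sum(1 for birth, death in bars if birth <= scale < death)

# Two noisy clusters of 30 points each, centred 5 units apart.
random.seed(1)
blobs = [(cx + random.uniform(-0.3, 0.3), random.uniform(-0.3, 0.3))
         for cx in (0.0, 5.0) for _ in range(30)]
bars = h0_persistence(blobs)
long_bars = [b for b in bars if b[1] - b[0] > 1.0]
print(len(bars), "bars,", len(long_bars), "long-lived")  # → 60 bars, 2 long-lived
print("Betti-0 at scale 2.0:", betti_0(bars, 2.0))       # → Betti-0 at scale 2.0: 2
```

The jitter inside each cluster produces many short bars that die almost immediately, while the two genuine clusters show up as the only long-lived bars. Separating persistent features from short-lived noise in this way is one concrete sense in which topological summaries are said to be robust to noise.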
Whereas, for example, the proprietary models created by quants in investment banks are typically published academically only after a lag of several years, Carlsson and his colleagues at Ayasdi published their approach before beginning commercial operations. Although these publications do not detail the specific algorithms the company developed to turn TDA into operational software, they do lay out most of the conceptual work behind it.
Why this openness about their approach? Partly, at least, the answer seems to rest with the complexity of the mathematics involved. As co-founder Gurjeet Singh puts it: ‘Ayasdi’s topology-oriented approach to data analysis is unique because few people understand the underlying technology … “As far as we know, we are the only company capitalizing on topological data analysis,” he added. “It’s such a small community of people who even understand how this works. We know them all. A lot of them work at our company already.”’
This situation poses both challenges and opportunities for sociologists of finance. The challenge lies in getting up to speed with this branch of mathematics so that it is possible to follow the technical work pursued by companies like Ayasdi. The opportunity is that, since those involved in TDA seem relatively open in publishing their methods, researchers are not restricted to following developments years after they have already been deployed in the marketplace. Researchers should be able to follow theoretical developments in TDA in step with their application over the coming years.
What might we expect of TDA when it is inevitably applied to finance? In the first instance, it should give a marked advantage to firms that are early adopters. The capacity of its algorithms to detect unknown patterns – indeed, patterns in places no one even thought to look – should give these firms the ability to exploit pricing anomalies. As the technology becomes more widespread, however, the exhaustive nature of TDA – it can literally discover every pattern there is to discover within a data set – could lead to the elimination of anomalies. As soon as they appear, they will be instantaneously exploited and hence erased. Every possible anomaly could be detected by every trading firm simultaneously, and with it, according to the efficient market hypothesis, would go any potential for arbitrage profits.
Of course, TDA is not a static technology; the addition of new and different algorithms to the fundamental data-mining model could lead to various forms of it emerging. Similarly, the stratification of risk tolerance amongst market participants could lead to borderline cases where the statistical significance of the patterns detected by its algorithms separates high-risk from low-risk traders. But at its most fundamental, it is not obvious how TDA could do more than universally expose all pricing anomalies. TDA might therefore sound the death knell for conventional arbitrage-oriented forms of trading.
Beyond this, it is still too early to speculate further about the consequences of the technology for finance. Across the broader sweep of applications in the ‘real’ economy, however, TDA will likely deepen the automation of logistics, marketing and even strategic decision-making. The technology’s capacity to automate discovery and feed such insights into predictive analytics may herald an era of economic transition, in which it is no longer just routine tasks such as card payments and credit checks that are automated, but also middle-class professional work such as research and management, previously believed to require the ineluctable human touch. In turn, such changes raise profound questions for the epistemology underlying many free-market theories, such as F.A. Hayek’s, which emphasise the engaged, culturally sustained practical knowledge of market participants. With the increasing mathematization of economic processes attendant on the automation of coordination activity, the pertinence of such epistemologies may well be on the wane.
Carlsson, G. et al. (2013) ‘Extracting insights from the shape of complex data using topology’, Scientific Reports, Vol. 3, Article 1236. Available at: http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html
Carlsson, G. (2009) ‘Topology and Data’, Bulletin of the American Mathematical Society, Vol. 46, No. 2, April, pp. 255-308. Available at: http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/
Dr. Nathan Coombs is beginning as a Research Fellow at the University of Edinburgh in October. His postdoctoral project concerns the challenge of ‘big data’ and other automating technologies for fundamental theories of political economy. He is co-editor of the Journal of Critical Globalisation Studies.