Statistics is about deciding what conclusions you can draw from data, and how confident you can be about them. More specifically, I work in multivariate analysis. Multivariate statistics is where, instead of a single number [for each data point], you have a collection of numbers.
The particular area of multivariate analysis that I work in is where there’s some special structure [to the data] - for example it lies on a sphere or a circle. So the work that I do is to develop new methodology to analyse that kind of data.
I did my PhD in applied mathematics and mathematical biology – that field is all about taking some interesting biological phenomenon or process and trying to write down a mathematical description of it, and then analysing the mathematical description of it. Which is a really valuable approach in its own right, but over the course of my PhD I became more and more interested in what you do when you’ve got data, so I moved more and more into statistics.
I think [statistics is] a bit more closely tied to the traditional scientific approach of having a falsifiable hypothesis you can test with data. I found that more satisfying; it's something that appealed more to me than manipulating mathematical models separate from data.
I’m not going to go up to someone on the street and say “look at this new statistical method I’ve constructed” and have them go away overjoyed! [Laughs] Statistics is one stage removed from that: there are lots of fields in which statistics plays the role of supporting the data analysis that then leads on to people’s lives being affected and influenced.
As statisticians we don’t really collect data; we are providing a methodology that helps those researchers who will create something of scientific, medical or commercial impact.
It’s tempting to point to a particular paper in a particular journal, but actually I think it was getting that first fellowship [at The University of Nottingham]. It was having the opportunity then, just as I finished my PhD, to have this period where I could really focus my attention on something that by that point I knew I wanted to do.
To be really open-minded and to read widely and not to focus on a particular small field, and not try to adapt problems to the methodologies and the maths that you know, but rather to read about different approaches from different fields. It’s the exchange of ideas between fields that I think is really key and really powerful. So I would certainly encourage wide reading, open mindedness and trying to absorb with all the free moments one gets at the start of one’s career, to really use that opportunity to lap up as many things as possible.
I think the main challenge in our field is the fact that it’s so fast moving and that the nature of the challenge is changing as the volume of data grows, and the data structures become more diverse. I think working out how to make sense of data of vast quantity and vastly different types, and how to make the most of those data streams - those are the real challenges.
There’s a very famous statistician called Sir Ronald Fisher, and he did a phenomenal amount of work. I think everyone would agree that the work he did was absolutely foundational. It’s maybe an obvious choice, but I think that Fisher is probably the most influential statistician who has ever lived, so it has to be him.
What were the big ideas in the last 50 years, and can I publish them? [Laughs] It would be intriguing to know the direction the field’s taken. The top-performing models currently available for particular tasks are often based on neural network models. But these are somewhat different to traditional statistical models. They often perform really well, but their properties, how they work, why they work, you don’t really ever know. I’d like to know if we’ve moved further towards these black box machines, which work but are hard to understand, or if we’ve gone in the other direction because understanding and interpretability of the model are important.
Global Research Theme: Digital Futures
Research Priority Area: Data Driven Discovery
Simon Preston is an Assistant Professor in the School of Mathematical Sciences. He previously completed two fellowships at The University of Nottingham, and works in the field of multivariate analysis.