“[…]C’est le narcissisme qui vient nourrir la bête, parce que, paradoxalement, beaucoup de scientifiques vont accepter des indicateurs mal construits pour dire: «Ah, mon index h est plus élevé que celui de mon collègue qui, comme vous l’avez dit, est plutôt médiocre», parce que une des caractéristiques des chercheurs, hein, et des professeurs d’université, ce sont tous de grands individualistes avec de gros égos[…]”
“[…]This is the narcissism that feeds the beast, because, paradoxically, a lot of scientists are willing to accept poorly crafted indices to say «Hey, my h-index is higher than my colleague’s who, as you’ve said, is not that good» because one of the characteristics of researchers and university professors, say, they are all individualists with limitless egos[…]”
Yves Gingras, professor of History of Sciences at the University of Québec at Montréal, transcript at 14’00” (and my own English translation) from the radio programme L’évaluation, maladie chronique de la recherche scientifique broadcast on France Culture on 04/05/2015.
I have always been massively interested in Formula 1 (as a Ferrari fan, of course), approaching the sport from the technical viewpoint rather than through its glitzy, Monte Carlo-like glossy image. To put it another way, as a young boy I used to dream of becoming a car mechanic (becoming a scientist came at a much later stage), only to service the sleek scarlet single-seaters from Maranello. Now, I do find it strange that I never wanted to become a driver, but I surely was, and still am, fascinated by the drivers’ skills, their courage, and the epic narratives woven around motor racing.
One can then easily imagine how happy I was when I could finally grab the opportunity to attend the Saturday qualifying session of the British Grand Prix at Silverstone earlier in July. A glorious day it was, a day at the races: my ears were buzzing from the high-pitched whine of today’s F1 hybrid1 cars going through corners, while the acrid mixture of exhaust fumes, fuel, oil and burnt tyres tickled my nostrils. Gone were the images on a TV screen: these cars were real, flashing past just across a fence, elegantly dancing through chicanes as if glued to the track by downforce.
It was while I was watching Fernando Alonso, two-time World Champion and one of the finest, raciest drivers in the paddock, limping around in a struggling McLaren that I started wondering: “If Alonso were a researcher, would he be fired for not delivering the results expected by the funding agency?”. The roar of the crowd rose as local hero Lewis Hamilton sped past to grab pole position. Hip hip hooray Hamilton! “Here’s one with a quadruple h-index, at least for now”, I thought.
Evaluation is NOT classification
Evaluation, classification, rankings, points… this is what motor racing is all about. Is it the same for scientific research? The topic of bibliometric indices and their use – and misuse – to evaluate and rank papers and researchers is a minefield, or rather it is like racing on a wet track on slick tyres: it is all too easy to spin and crash. We scientists complain about the indices, but at the same time we accept them and use them to our advantage when they can reinforce a grant application or a curriculum vitae. Ambiguity and confusion reign supreme. So I thought it would be a good idea to straighten out this issue by resorting to someone else’s words. I happened to listen to a very interesting radio programme on this subject, broadcast on France Culture, a channel of the French public radio: L’évaluation, maladie chronique de la recherche scientifique. If you speak French, do take some time to listen to it. Otherwise, or if you are simply too busy, just keep on reading, and I will give a quick survey of the main points.
The guests of the radio programme, a researcher in theoretical physics and a professor of history of sciences, brought up several points worth remembering:
- Scientometrics, the development of parameters measuring how science evolves and progresses, emerged in the 1960s with an eye to creating an analogue of the economic indices and ratings devised in the aftermath of the economic crash of 1929 (and this is scary enough, my personal comment). Bibliometrics is a subset of scientometrics, dealing exclusively with publications.
- The 1970s saw the beginning of a massive increase in the number of scientists and in the scientific output of universities and research centres. Citations had first been introduced as a practical way of cross-linking the papers forming a bibliography, but the pen-and-paper approach to bibliographic searches severely impaired their use as evaluation indices. Nevertheless, the same decade marked the beginning of a new era in the evaluation of scientific research, as funding institutions in the US started wondering whether those who were funded were also those with the largest number of citations (this is most likely a tautology: they have the largest number of citations because they are funded).
- Citations eventually became evaluation parameters after the 1980s, when the advent of information technology let the genie out of the bottle, solving the “data crunch” that had restrained bibliometrics thus far. This led to the “evaluation frenzy” that began in the late 1990s and continues to this day, while online databases allow the so-called “wild bibliometrics”.
- The number of scientists has skyrocketed; big science and big data characterise our crowded scientific arena. Hyper-evaluation seems a suitable additional catchword for the current scientific era. Yet all too often we resort to indices that measure the wrong thing, like the h-index, or that are incommensurable with one another, like the impact factors of journals.
- The h-index is unreliable because it scales with the number of papers to the power of 0.9: a scientist with a few highly cited papers is likely to have written reports of broader interest, yet will have a smaller h-index than a colleague with more papers each receiving fewer citations.
- Citations themselves have to be handled carefully: reviews and articles describing experimental procedures are likely to be very highly cited, but they are not necessarily ground-breaking.
- Journal impact factors are not comparable with one another, primarily because they include self-citations to the same journal. What is worse, the misuse of the impact factor has a vicious effect on journals themselves and on the overall functioning of scientific publishing, in ways that are easy to understand.
- There will always be a need to evaluate scientific articles, research proposals, and the careers of individual scientists. The least bad approach is also the oldest: peer review, which, in the words of one of the speakers, is like democracy in being [sic] “the worst form of government, except for all the rest” 2. Peer review was first introduced at the time of the institutionalisation of science in the 17th century, to preserve the reputation of a given scientific academy or its journals (the first such journal, the Philosophical Transactions of the Royal Society, was established in 1665). Peer review was then formalised in the 1930s in the US, as exemplified by the journal Physical Review.
- In conclusion, the take-home message is: évaluer, ce n’est pas classer, “evaluation is NOT classification”.
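The point about the h-index above is easiest to see with a direct computation. Here is a minimal sketch of the standard definition – the largest h such that at least h papers have at least h citations each – with two invented citation lists of my own (they are an illustration, not data from the programme):

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Scientist A: four papers, each widely cited
scientist_a = [120, 95, 60, 40]
# Scientist B: ten papers, each modestly cited
scientist_b = [6] * 10

print(h_index(scientist_a))  # 4
print(h_index(scientist_b))  # 6
```

Scientist B “wins” on the h-index despite a far smaller per-paper impact – precisely the distortion the speakers complained about.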
Let me comment on this. In an ideal world, peer review is the best option, and it also has the advantage of preserving the social angle, the human scale of the scientific endeavour: someone will take the time to go through the scientific articles of Dr X. Y., applicant for the post of associate professor at the university of Yew Nork. This person will discuss his/her opinion of the candidate with a panel of other people, and of course they will quarrel and disagree, but hopefully they will come to a shared conclusion and evaluate the candidate in the most objective way. Yet, as one can easily see, a human referee can fall prey to all sorts of subconscious (let alone conscious) biases 3, although double-blind evaluation (in which neither the evaluator nor the evaluated knows who the other is) can mitigate them. More simply, referees might not have the time to read the applicant’s articles with due care and attention: after all, refereeing and reviewing are not the most glorious activities for a scientist, many would argue.
Here comes the number to the rescue, this (purported) epitome of rationality, objectivity and unbiased evaluation. As I mentioned in a previous post, our fascination with numbers as accurate, impartial gauges of phenomena is probably a relic of Pythagorean thought 4, or, rather, it boils down to deeply ingrained aspects of the human mind that the Pythagorean school was the first to identify and discuss. Give us our daily numbers, churned out by an algorithm (the more arcane, the better), and we can finally feel at ease; we can finally reduce the chaotic world surrounding us to a harmonic collection of numbers. The same holds true on a larger scale: just see the massive significance acquired by economic indicators such as the debt-to-GDP ratio 5. Please do not misunderstand what I am saying: as Galileo Galilei once remarked, it is definitely true that mathematics is a powerful language that we need to learn if we are to decipher the “book of the Universe”. Mathematics, indeed, not Kabbalah-like numerology: unless one is familiar with the formula (or the algorithm), or with the measuring device providing a certain number, the latter becomes an empty shell, a meaningless collection of digits. No number without its unit of measurement, as my secondary school teachers used to say. So true.
In this respect, numbers are definitely double-edged swords. There is a very contemporary eagerness to classify and rank: oneself against peers, and against oneself. Take amateur athletics, for example, one of the most popular sporting activities: it is all too easy to shift one’s focus from running for fun (amateur actually means “he who loves something”), or to become ‘fitter’ (whatever that means, fair enough), to running as yet another opportunity to show off, bolster one’s online avatar, and measure oneself against others. An example: I own a GPS watch with a heart rate monitor. A cool gadget. I use it regularly when I go running, and I have realised that it has let me train better. Yet, when I bought it, I struggled to disable all the ‘social’ options of the watch: it was not happy to work offline, as I forced it to. At the end of the day, I thought, there was nothing to boast about (I am not such a good runner after all), and I did not see the point in flooding social networks with my own small-scale version of a data crunch. Here is where I see a similarity with the misuse of bibliometric indices: when humans are part of a network of peers, the temptation to pin a badge on one’s shirt, or to go around with a performance tag to be proud of, is often irresistible, and I have the feeling that this compulsive drive to rank oneself is on the rise in our contemporary “social network society”.
Yet, evaluation is not classification. As simple as that.
A racing life?
Let us wrap it up. The overemphasis on classification in today’s academic world seems to derive from the toxic combination of a penchant for numbers and rankings deeply ingrained in our psyche with widespread access to all sorts of bibliometric indices, academic databases and search engines. As a consequence, this anxious need to rank oneself compounds the stress arising from the policies introduced in several countries to evaluate (and often, of course, to rank) the scientific “output” or “impact” of individual researchers and research groups. These are sometimes time-consuming, complex and arcane procedures which gobble up precious energies that could be devoted to science.
I will leave it at that to avoid getting bogged down in a critical review of evaluation policies: after all, theleakyburette is a blog on chemistry, and I want this post to be just a short foray into the minefield of the evaluation of scientific research. Generally speaking, I acknowledge that scientists, like anyone else pursuing any job, artistic endeavour or sporting activity, cannot escape some form of evaluation, which, in my opinion, works best when it is conducted by competent (human) referees on small, homogeneous samples. And yes, please, do introduce double-blind review as soon as possible.
Sadly, the ever-increasing competition among researchers is all too often depicted, or perceived, as a cut-throat race in which scientists go so far as to cut corners in order to be the first to take the chequered flag. Comparisons and metaphors aside, motor racing, at least, is racing by definition and uses timings to draw up a ranking. Whichever car-and-driver combination completes a given number of laps in the shortest time wins. End of story. Does the racing metaphor really apply to the competitive world of academic research? I strongly believe it does not. In spite of what we researchers feel, and in spite of the famous “publish or perish”, science is not and must not become a race. Younger academics, in particular, have the responsibility of making sure that the spirit of scientific enquiry does not drown amidst the rough seas of ambition and the rising tide of competition, and we should be extremely wary of the misuse of bibliometric indices. Let there be a bit of competition, like the dash of salt that seasons a dish, not like a charge of saltpetre whose blast we addictively need to move forward: as in old firearms, it can dramatically backfire.
However, motor racing can indeed be a metaphor for scientific research, from another point of view. Think about the countless components that make up a racing car, provided by several manufacturers6. Think about the contribution of all the mechanics who work overnight to troubleshoot and set up the car, and who change tyres in the blink of an eye during the race; let us not forget the role of the race engineers who advise the driver and devise strategies. Of course, it is the driver who, at the end of the day, risks his/her life driving flat out, pushing the car to the limit to secure victory; yet the driver’s success would simply be impossible without those who lay the foundations for it. So it is with the researcher, the prominent tip of the complex machinery of a research laboratory or university. In this context, how on Earth could a simple number account for all the work done behind the scenes by countless people, every one of them adding their own contribution, be it large or small? Any bibliometric index referring to a single researcher will incorporate all these contributions and end up being a complex convolution of them. From this point of view, the significance of the h-index should be greatly reassessed: it is a number, nothing but a number. And as a number we should all regard it.
In the end, let me just stress it once more: scientific enquiry is a collective undertaking. It is right to give individual scientists the credit they deserve for their outstanding contributions to the advancement of science; however, the increasing complexity and interdisciplinary nature of today’s scientific research warrants an increased emphasis on teams rather than individuals. Someone has remarked that the Nobel Prize should be updated and awarded to teams as well7. A long overdue update indeed. On the other hand, I once heard someone say (I honestly do not remember when or where) that the international mobility of researchers and the competition among universities to hire the brightest minds are the contemporary counterpart of the situation in Renaissance Italy, when all sorts of princely courts, city-states and statelets would vie for the most talented artists, who, in turn, ended up moving wherever they were offered the best ‘facilities’ to create their masterpieces. Art and science: one of my favourite subjects, so much so that I wrote at the back of my PhD thesis that the research group is the contemporary analogue of the Renaissance art workshop. That said, there is a fundamental difference between artists and scientists, which Martin Rees, then President of the Royal Society and Astronomer Royal, clearly expressed in the last of his 2010 Reith Lectures, Runaway World 8. In a nutshell: the individual scientist is disposable, the individual artist is not, but his/her contribution might not last as long. In Rees’ own words 8: “Any artist’s work is individual and distinctive – but it generally doesn’t work, doesn’t last. Contrarywise, even the journeyman scientist adds a few durable bricks to the corpus of ‘public knowledge’. But our contributions as scientists lose their identity. If A didn’t discover something, in general B soon would – indeed there are many cases of near-simultaneous discovery. Not so, of course, in the arts.
As another Reith Lecturer, Peter Medawar, remarked, when Wagner diverted his energies for ten years, in the middle of the Ring cycle, to compose Meistersinger and Tristan, he wasn’t worried that someone would scoop him on Götterdämmerung. Even Einstein exemplifies this contrast. He made a greater imprint on 20th century science than any other individual; but had he never existed all his insights would by now have been revealed – though gradually, by several people, rather than by one great mind”.
Let us learn and savour the pleasure of assembling our cars, bit by bit; let us feel the skin-like texture of the warm rubber surface of slick tyres when they come out of the blankets. Let us take the wheel in a firm grip and secure our safety belts as we sit in the cockpit. And when we head off onto the track to race for pole position, let us not forget: “What science teaches us is not the fulfillment in the act of finding, but beauty awakened in the moments of searching” 9.
1. As of last season, F1 racing cars have been equipped with a dazzling array of energy-recovery and energy-storage systems. Here the word hybrid is really to be understood in its deeper meaning: the internal combustion engine, the braking system, the turbocharger and an electric motor all dance in unison to a tempo that can change from lap to lap, delivering either more acceleration or better fuel economy. The basic principle is: when it comes to energy, every little helps. Braking, for example, can become a source of energy, harvested and stored, or delivered directly to the electric motor. A battery is the main energy-storage device, but others can be envisaged, such as flywheels or supercapacitors. (By the way, batteries and supercapacitors are the battlefield of electrochemistry, my own discipline.) The complex sequence of events taking place when the driver brakes or accelerates requires an adaptation of driving style with respect to previous cars (up to the 2013 season). Hence the ongoing problems experienced by Räikkönen: he has been prone to spinning, which can boil down both to his driving style not matching the new car and to shortcomings in the management of energy harvesting and delivery in his Ferrari. Racing geeks like me can read a full account of the technicalities on this webpage.
2. Winston Churchill’s quote seems to have been: “Many forms of Government have been tried and will be tried in this world of sin and woe. No one pretends that democracy is perfect or all-wise. Indeed, it has been said that democracy is the worst form of government except all those other forms that have been tried from time to time.”
3. An article in Le Monde diplomatique (French edition), June 2015, Personne n’est à l’abri (“Nobody is safe”), by Marina Maestrutti, discusses four subconscious biases contributing to the development of conspiracy theories. Scientists (and referees) should be wary of (some of) them as well: the conjunction fallacy (we tend to overestimate the correlation between any two distinct events), the causation fallacy (we tend to prefer explanations involving clear-cut causation rather than admitting that purely random events took place), the exposure fallacy (we are heavily influenced by the explanatory models or theories to which we are exposed, and we tend to construct a validation of them from the available observations) and the verification fallacy (we tend to seek corroboration of theories that we hold true a priori rather than looking for evidence falsifying them).
4. Hilariously enough, the h-index is an integer, the type of number Pythagorean disciples worshipped.
5. A more scholarly discussion of the veneration of numbers as tools of economic and political governance can again be found in Le Monde diplomatique (French edition): Le rêve de l’harmonie par le calcul (“The dream of harmony through computing”), Alain Supiot, February 2015. From the abstract available online, here is my own translation: “The fascination with numbers and their ordering power is ancient; nor is it unique to Western cultures. The interest in their symbolic value is one of the key features of Chinese thought, and the contribution of Indian, Persian and Arabic mathematics to this discipline is well known. Yet it is the Western world that has continuously deepened its analysis of numbers: at first venerated as idols, they later became instruments of knowledge and then of prediction, only to be endowed with a truly legal value by means of the contemporary practice of governance through numbers.”
6. An Italian firm based in the Bergamo area (yeah!) is the leading supplier of brakes, for example (look up the name yourselves!).
7. In Scientific American.
8. Transcripts available online.
9. V. Uskoković, in Foundations of Science, 2010, 15, 303-344