Artificial Intelligence And Big Data: The Amazing Digital Transformation Of Elsevier From Publisher To Tech Company

Elsevier is one of the world’s leading publishers of scientific and medical information. While most famous as the publisher of leading journals such as The Lancet and Cell, the business has undergone a thorough transformation in recent years to position itself as a provider of analytics tools and platforms for the medical, academic and scientific community.

This has been achieved by building advanced analytics systems utilising big data and machine learning on top of the huge amount of data collated and published by the company in its 140-year history.  

Traditionally, scientific or academic research involved poring over paper-based publications and books, using indexes in the hope of finding relevant information. The first stage in transforming this process into a digital experience came with the widespread digitization of literature that occurred as the internet became part of everyday life.

Search engines and metadata simplified the process of searching through more than 400,000 articles which appeared in Elsevier publications each year. Within the business this is considered to be stage one of the company’s ongoing digital transformation.

Now, with the amount of information in the world growing at an exponential rate, stage two is well underway. CTO Dan Olley spoke to me about the ongoing task of adapting to new ways that researchers, clinicians and educators are seeking out information in the digital age.

“The problem is that we’re approaching information overload,” he tells me.

“We’ve all got way too much information coming at us – and the challenge is to distil that down to what’s really important, getting the right bit of knowledge at the right point, and distinguishing fact from not-so-accurate fact.” 

Information overload certainly isn’t a problem that’s unique to Elsevier’s customers, or the wider academic and scientific communities at large. It’s estimated that the total amount of data in the world doubles every two years. By 2020, the amount of information in existence will be 45 zettabytes – that’s 45 trillion gigabytes, to put it in terms that the human brain can just about comprehend. If all of that information was stored on iPads with 128 gigabytes of memory, you’d be able to build a stack of them reaching from the Earth to the Moon – six times. 
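For the curious, that iPad-stack comparison can be sanity-checked with a little arithmetic. The tablet thickness here is my own assumption (roughly 7.5 mm), since the figure doesn’t specify a model:

```python
# Rough check of the iPad-stack comparison. The thickness value is an
# assumption; the other numbers come straight from the text above.
total_bytes = 45e21            # 45 zettabytes = 45 trillion gigabytes
ipad_bytes = 128e9             # 128 GB per iPad
ipad_thickness_m = 0.0075      # assumed ~7.5 mm per tablet
earth_moon_m = 384_400_000     # average Earth-Moon distance in metres

n_ipads = total_bytes / ipad_bytes
stack_height_m = n_ipads * ipad_thickness_m
trips = stack_height_m / earth_moon_m   # comes out at roughly 7
```

Under that assumption the stack does indeed reach the Moon six to seven times over.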

So the problem these days is rarely a lack of information. It’s more likely to be a matter of where to find the right information at the right time, and accessing it in ways that fit in with the way researchers and scientists work on a day-to-day basis.

Olley says, “I believe we need to give our customers the knowledge that they need at the right time, rather than just a whole lot of content.

“This is how analytics comes into the picture for us – how do we give people the information they need to help them make the best decisions? To really help clinicians enhance lives, and help scientists make breakthroughs, rather than just give them stuff to read.”  

Valuable as it is, much of this information is locked away in forms that only humans can read – journals, publications, documents, diagrams and photographs. This sort of information is known as unstructured data, because it doesn’t fit neatly into the rows and columns that traditional computer analytics software needs in order to process it. As a result, it can only ever be processed as quickly as people can read it.

This is where machine learning comes in. Machine learning is the term used for the current cutting-edge in artificial intelligence algorithms – computer software designed to become increasingly efficient at processing data as it “learns” in much the same way as humans do.

“What machine learning does for us is unlock the ability to start processing this unstructured data, and start to derive insights from it,” Olley says.

“We can use machine learning to extract information and insights from documents in ways that ‘traditional’ natural language processing has struggled to do. This is even more true when you come on to images and other visual data.”

One of the first applications Elsevier found for this revolutionary technology was to solve a simple problem it became aware of by studying how humans were using its existing systems.

Analysis of the search terms entered by researchers found that they were most often looking for information in the form of flow charts. Luckily, image recognition – sometimes known as computer vision, because it seeks to enable computers to “see” in the same way humans do – was immediately useful here.

By training machine learning algorithms to comb through hundreds of thousands of research papers and articles and become increasingly good at deciding when an image constituted a flow chart, as opposed to a bar chart, pie chart or photograph – it could begin to return results more accurately matching what the human researchers were looking for. 

While it was doing this, it also began to classify every other image it came across. Rather than just rejecting a photograph as “not a flow chart”, and therefore not required, it became increasingly good at categorising and labelling images of all sorts. So, when the next human researcher looked for photographs, the data was already labelled in the system.
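To give a flavour of how such a classifier works, here is a minimal nearest-centroid sketch. The feature names and numbers are entirely invented for illustration – this is not Elsevier’s actual system, which would use far richer features learned from hundreds of thousands of images:

```python
import numpy as np

# Hypothetical features per image: [line density, text-box count, colour variance].
# Each figure type gets a "centroid" - an average feature vector (invented here).
CENTROIDS = {
    "flow_chart": np.array([0.9, 8.0, 0.1]),
    "bar_chart":  np.array([0.7, 2.0, 0.3]),
    "photograph": np.array([0.1, 0.0, 0.9]),
}

def classify(features):
    """Return the figure type whose centroid is nearest to the feature vector."""
    return min(CENTROIDS, key=lambda label: np.linalg.norm(features - CENTROIDS[label]))
```

The key point is the by-product the article describes: because every image gets a label (not just a yes/no on “flow chart”), the whole collection ends up categorised.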

Compared to carrying out this work manually, employing machine learning saved tens of person-years of work. “We got the thing live in a few months, and it’s been a really useful feature,” says Olley.

Technology similar to that used in the “recommendation engines” familiar to users of Amazon or Netflix – “If you like this, you will probably also like that…” – proved equally useful to academic and scientific researchers. So much so that Elsevier built it into services such as its ScienceDirect platform.

This means that ScienceDirect is able to work out what a person is interested in based on all of their interactions with its tools, and then recommend other research which might be relevant, even if it comes from a completely different discipline. Olley says, “We’re basically working out what someone is researching and recommending other things that are going to be helpful to them – we can say ‘here are three articles, or even three paragraphs or images, which have just been published and which we think are very relevant to your research.’”
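The underlying idea can be sketched in a few lines of collaborative filtering. This is a toy example with invented data – not ScienceDirect’s actual algorithm – but it shows the mechanic: find researchers whose reading overlaps with yours, then surface what they read that you haven’t:

```python
import numpy as np

# Rows = researchers, columns = articles; 1 means the researcher read it.
# All values are invented for illustration.
interactions = np.array([
    [1, 1, 0, 0, 1],   # researcher A
    [1, 1, 1, 0, 0],   # researcher B
    [0, 0, 1, 1, 0],   # researcher C
])

def recommend(user_idx, interactions, top_n=2):
    """Recommend unread articles, scored by similarity to other readers."""
    user = interactions[user_idx]
    # Cosine similarity between this researcher and every other one.
    norms = np.linalg.norm(interactions, axis=1) * np.linalg.norm(user)
    sims = interactions @ user / np.where(norms == 0, 1, norms)
    sims[user_idx] = 0                 # ignore self-similarity
    scores = sims @ interactions       # similarity-weighted reads per article
    scores[user > 0] = -1              # mask articles already read
    return np.argsort(scores)[::-1][:top_n]
```

Here researcher A, who overlaps heavily with B, gets recommended the article only B has read – the cross-pollination Olley describes, in miniature.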

From there, the next challenge was to put the same methodology to work within Elsevier’s own operations. 

“The interesting thing is, once you start getting good at this stuff, you realise how many opportunities there are,” says Olley.

“So, we built this technology into our products, aimed at nurses, doctors, academic researchers, and researchers based in large corporations such as pharmaceutical companies. But we also realised we could use it in our own operational processes.”

In order to do this, Elsevier had to become a technology company and start developing technological solutions to its problems, in the manner that has made leaders of the tech world such as Google, Amazon and Facebook into masters of their domain. 

“My view is that when we look at organisations in 10 years, every department in the company will have an analytics team. I think organisations are starting with things like ‘analytics centres of excellence’ but I don’t think that’s where it ends, this needs to be a ubiquitous skill for people who really understand the problem they are trying to solve.

“Look at where we are now – you can get machine learning as a service from Amazon, Microsoft or Google. The technology is not the problem, it’s the data.

“Start by looking at your data, work out what problems you’ve got to solve, and work out what data you need to solve those problems. Today, data is becoming the real currency of the commercial world.” 

Elsevier has followed a model that is likely to become increasingly common, as we see businesses grounded in old-world technology and data systems make the switch to digitally-driven organisations.  

Increasingly, every aspect of business is becoming data driven, and putting the right tools and systems in place to convert the vast amount of knowledge into power – actionable insights – is key to successful digital transformation.

One last piece of wisdom from Mr Olley which I certainly feel is very valuable – “Don’t feel like you have to solve 100% of the problem right away. If you’re trying to solve a problem, it’s fine if your machine learning algorithms can solve half of it, and you still have to send half of it to humans.”

Written by

Bernard Marr

Bernard Marr is a bestselling author, keynote speaker, and advisor to companies and governments. He has worked with and advised many of the world's best-known organisations. LinkedIn has recently ranked Bernard as one of the top 10 Business Influencers in the world (in fact, No 5 - just behind Bill Gates and Richard Branson). He writes on the topics of intelligent business performance for various publications including Forbes, HuffPost, and LinkedIn Pulse. His blogs and SlideShare presentations have millions of readers.
