We invested in Import.io through our Growth EIS

Import.io: Investing in the future of data

In January 2016 Oxford Capital announced its investment in Import.io. This company has created a software platform (SaaS, or Software as a Service) that can collate vast quantities of unstructured data from across the internet, and turn it into datasets that can be more easily analysed and interpreted at scale. We invested alongside co-lead Imperial Innovations, Wellington and Open Ocean in a $13m Series A round that represents the largest European web data investment to date.

So what is all this fuss about data, in particular big data, and why have we invested in the sector?

So what are we talking about…what is “data”?
Put very simply, data is a collection of facts (numbers, words, measurements, observations, etc) that has been translated into a form that computers can process – information that is capable of being read by a machine as opposed to a human.

This distinction is important. Human-readable data (also known as unstructured data) is information that only humans can interpret, such as an image or the meaning of a block of text. If it requires a person to interpret it, that information is human-readable.

Machine-readable data (or structured data) is information that computer programs can process. A program is a set of instructions for manipulating data, and a set of programs applied to data is what we call software. In order for a program to perform instructions on data, that data must have some kind of uniform structure.
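To make the distinction concrete, here is a minimal Python sketch. The sentence, the field names (`month`, `units`) and the helper functions are all invented for illustration. It shows the same facts twice: once as a sentence a person can read, and once as uniform records a program can add up directly. The `parse_sales` helper only works because this particular sentence follows a predictable pattern, which is exactly what most human-readable content does not do.

```python
import re

# Human-readable: a person understands this at a glance, a program does not.
unstructured = "Acme sold 120 widgets in March and 95 widgets in April."

# Machine-readable: every record has the same fields (hypothetical names).
structured = [
    {"month": "March", "units": 120},
    {"month": "April", "units": 95},
]


def total_units(records):
    """Sum the 'units' field; possible only because each record is uniform."""
    return sum(record["units"] for record in records)


def parse_sales(text):
    """Turn the sentence into records, assuming the fixed phrasing above."""
    pattern = re.compile(r"(\d+) widgets in (\w+)")
    return [{"month": month, "units": int(units)}
            for units, month in pattern.findall(text)]


if __name__ == "__main__":
    print(total_units(parse_sales(unstructured)))
```

Once the text has been turned into records, any further analysis (totals, averages, comparisons) becomes a few lines of code rather than a reading exercise.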

Data is created whenever information is recorded in a machine-readable, or structured, form. There are many ways this can happen: personal data given to companies when an online form is filled in or preferences are expressed through social media; transactional data created by an interaction such as clicking on an ad, making a purchase, visiting a certain web page or submitting instructions; and sensor data produced by connected objects (often referred to as the Internet of Things).

“Big data”…grown up data?
Technically, all types of data, and not just web data, contribute to Big Data. There’s no official size that makes data “big”. The term simply reflects the increasing amount and variety of data now being collected. It is perhaps most frequently used to describe the kind of transactional data collected by the likes of Facebook and Google to understand the intentions of their customers and target digital advertising more effectively.

As more and more of the world’s information moves online and becomes digitised, analysts can start to use it as data. Social media, online books, music, videos and the growing number of sensors have all added to the astounding increase in the amount of data available for analysis (see Fig 1 below).

The thing that differentiates Big Data from the “regular data” we were analysing before is that the tools we use to collect, store and analyse it have had to change to accommodate the increase in size and complexity. With the latest tools on the market, we no longer have to rely on sampling. Instead, we can process datasets in their entirety and gain a far more complete picture of the world around us.

So why web data in particular?
Web data is a collective term for any type of data that can be pulled from the internet. That might be data on what competitors are selling, published government data, sports data, news, etc. It’s a catch-all for anything that can be found on the web that is public facing (i.e. not stored in some internal database).

Web data is important because it’s one of the major ways businesses can access information that they don’t generate themselves. When creating business models and making important Business Intelligence (BI) decisions, businesses need information on what is happening both inside their organisation and in the wider market.

Web data can be used to monitor competitors, track potential customers, keep track of channel partners, generate leads, build apps, and much more. Its uses are still being discovered as the technology for turning unstructured data into structured data improves.

Web data can be collected by writing your own web scrapers, by using a scraping tool, or by paying a third party to do the scraping for you. A web scraper is a computer program that takes a URL as an input and pulls the data out in a structured format, usually a JSON feed or CSV.
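As an illustration of the first option, the sketch below shows roughly what a very small hand-written scraper could look like in Python, using only the standard library. It is not how Import.io works; the class and function names are invented for the example, and it assumes the page holds its data in a simple HTML table whose first row is a header. Real pages are rarely this tidy.

```python
import csv
import io
import json
from html.parser import HTMLParser
from urllib.request import urlopen


class TableParser(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_records(html):
    """Turn table rows into dictionaries keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def scrape(url):
    """URL in, structured records out (assumes a UTF-8 page with a table)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_records(html)


def to_json(records):
    return json.dumps(records, indent=2)


def to_csv(records):
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample page, so the example runs without a network connection.
SAMPLE_HTML = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

if __name__ == "__main__":
    records = extract_records(SAMPLE_HTML)
    print(to_json(records))
    print(to_csv(records))
```

Even this toy version hints at why scraping is hard to maintain: the parser is tied to one page layout, so a change in the site's markup means the code has to change too.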

So where is the opportunity?
As the sophistication and adoption of big data analytics grow, there is an increasing demand for data that is process-ready. A recent Capgemini report suggested that 65% of businesses believe they will become uncompetitive if they don’t embrace analytics driven by big data. Data is the modern-day “oil” and is required for success. As more enterprises adopt big data, new sources of data will be required to continue to derive advantage, regardless of sector or vertical.

Fig 1. An exponential growth in unstructured data

The web as a source of big data offers an unrivalled opportunity to access intelligence that is unavailable elsewhere. 250m global websites contain vast amounts of data about people, things and places, and real time transactional intelligence. Significant competitive advantage can be gained through collecting and analysing this data.

However, extracting high-quality, enterprise-ready data from the web at scale in a complete and timely way is a difficult problem, precisely because the web is designed to be read by humans and not by machines. The challenges are significant: for example, constructing a learning algorithm that copes with changing semantic relationships between the elements of a given page, or scaling processing capability economically while handling latency, storage and pattern recognition. All this requires a significant investment in data science and scalable infrastructure.

Import.io is a SaaS platform that makes it easy for enterprises to access and utilise this data. We believe that Import.io is at the leading edge of the adoption of the web as a source of competitive data. The opportunities for its application are wide-ranging, and we are excited for the future.