In Search for a Unicorn.

Seungkwon (Alex) Son
May 4, 2021
Creator: Paul Campbell | Credit: Getty Images/iStockphoto

Introduction

In the 21st century, technology touches nearly every second of our lives. Whether we’re studying for exams, spending a night out, or doing anything in between, technology is involved in some way. Globally, an estimated 5 billion people have mobile devices. Against a world population of 7.67 billion in 2021, this is an immense number.

It has reached the point where humans now live in at least two realities: the physical and the digital. While technology raises its fair share of ethical concerns, its impact on every one of our lives is undeniable.

Technology will only continue to advance, and at an exponential rate. Each year, we witness the rise of one unicorn after another, many of which go on to become mainstays in every household. For example, Uber was founded a little more than ten years ago, but only in the last few years did it become a primary mode of transportation for a large portion of the population. In this way, new technologies have an extremely high potential for impact and adoption.

Credit: Lean B2B

Perhaps the constituents with the most financial incentive to accelerate the adoption of technology are its investors, primarily venture capital funds. These funds take an early interest in startups and invest in driving their future growth in exchange for a stake in the company. The recent boom in the technology market is closely paralleled by a boom in VC funding. To illustrate, in Q1 2021, global VC investments reached $125 billion. This is a 50 percent increase quarter over quarter and a 94 percent increase year over year.

Thus, we as consumers, investors, and citizens of a technological world have a vested interest in knowing: which big technologies will next disrupt and enhance our lives?

Data

We will use data from CrunchBase, Kaggle, and the Google Trends API to explore this question. Specifically, we will use data on over 196,000 startups that were listed on CrunchBase as of 2013 to explore the broad trends of that period and build a predictive model for success. The Google Trends API will then be used to show search trend peaks.

Ultimately, we will explore technology trends from the perspective of the investors and the general public.

Broad Trends in the Startup Landscape

We will first explore a macro overview of the startup landscape and how it has changed over the last few decades. This is important because it illustrates how fast technology has been shifting and how the funding landscape has changed with it.

Created using ggplot2

Here we see a histogram of the number of startups by founding year, from 1960 to 2013. The histogram is heavily left-skewed, meaning that significantly more startups are founded as we get closer to 2013. This is to be expected, since technology is synonymous with startups and the growth of startups mirrors the advancement of technology.

Each bar in the histogram is also divided up by the proportion of startups founded in that year according to their current status: acquired, closed, IPO’d, or operating. This presents an interesting outlook on the startup landscape in that while the total number of startups founded is growing significantly each year, the large majority of them have not exited nor closed. This, in turn, raises further questions regarding the “operating” status of many of these startups.
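The counting behind a stacked histogram like this one can be sketched in a few lines. The snippet below is an illustrative Python sketch, not the R code used for the figure; the sample rows and the helper names `status_counts_by_year` and `status_shares` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (founding year, current status).
# These values are made up for illustration; they are not from the dataset.
startups = [
    (2010, "operating"),
    (2010, "acquired"),
    (2011, "operating"),
    (2011, "operating"),
    (2011, "closed"),
    (2012, "ipo"),
]

def status_counts_by_year(rows):
    """Return {year: Counter({status: count})}, one Counter per stacked bar."""
    counts = defaultdict(Counter)
    for year, status in rows:
        counts[year][status] += 1
    return dict(counts)

def status_shares(counter):
    """Convert one year's status counts into proportions that sum to 1."""
    total = sum(counter.values())
    return {status: n / total for status, n in counter.items()}

by_year = status_counts_by_year(startups)
for year in sorted(by_year):
    print(year, dict(by_year[year]), status_shares(by_year[year]))
```

Each year's counter gives the height of the bar segments, and the shares give the proportion of each status within that year.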

Given the significant increase in the sheer number of startups founded each year, we then wanted to explore the funding landscape as well.

Created using ggplot2

Above we see a histogram of the total amount raised, in millions of dollars, by the startups in our list. To preserve the scale of the rest of the graph, it omits the more than 160,000 startups that had zero dollars in total funding. Thus, the large majority of startups either do not raise or are not able to raise any venture capital dollars.

However, of the coveted startups that are able to raise funding, we see a fairly bell-shaped curve that is centered around $25 million. There are outliers along both extremes with some that have raised over a few billion dollars. Thus, we see that venture capital investment is also at its peak.

Top Sectors

It is clear that more startups are being founded than ever before, resulting in record-breaking venture capital dollars being spent. In order to examine what the next big technologies are going to look like, we need to dig deeper into the types of startups that are being founded.

Some of the most hyped companies in recent years have been around the sectors of e-commerce, software, and autonomous vehicles. We will compare the media’s sentiment to the actual data regarding startups.

Created using ggplot2

Shown is a rotated bar chart of the Top 25 Sectors by the Number of Startups Founded in those sectors in our dataset. The top five sectors according to this metric are software, web, e-commerce, video games, and mobile. This matches the earlier hypothesis on the “most hyped” technology sectors from the media.

However, it is unclear where the causation lies. It may be the case that there are more startups in these sectors because these sectors are the most promising, but it may also be that there are more startups in these sectors because of the perceived “hype.” We can dig deeper into the roots by looking at where VC funding is going.

Created using ggplot2

This is an alternative view on the Top 25 sectors using the metric of the average amount of VC funding that is going to them. Working off the assumption that investors are likely the most educated on future trends, we see that the top five sectors are quite different. Specifically, the top five sectors according to this metric are nanotech, cleantech, semiconductor, biotech, and automotive.
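The difference between the two rankings comes down to which aggregate is sorted. Here is a minimal Python sketch, assuming each startup is reduced to a (sector, total funding) pair; the rows, the function name `sector_rankings`, and the choice to include zero-funded startups in the average are all assumptions made for illustration, since the article does not say how the average was computed.

```python
from collections import defaultdict

# Hypothetical rows: (sector, total funding in USD). Values are invented.
rows = [
    ("software", 1_000_000),
    ("software", 0),
    ("software", 2_000_000),
    ("biotech", 30_000_000),
    ("biotech", 10_000_000),
    ("web", 500_000),
    ("web", 1_500_000),
]

def sector_rankings(rows, top_n=25):
    """Rank sectors two ways: by number of startups and by average funding."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sector, funding in rows:
        totals[sector] += funding
        counts[sector] += 1
    # Assumption: zero-funded startups count toward the average.
    averages = {s: totals[s] / counts[s] for s in counts}
    by_count = sorted(counts, key=lambda s: counts[s], reverse=True)[:top_n]
    by_average = sorted(averages, key=lambda s: averages[s], reverse=True)[:top_n]
    return by_count, by_average

by_count, by_average = sector_rankings(rows)
print("Top by number of startups:", by_count)
print("Top by average funding:", by_average)
```

In this toy data the sector with the most startups is not the sector with the highest average funding, which is the same kind of divergence the two charts show.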

There are clear arguments for why these sectors attract so much funding. For example, it has been widely reported that there is a global semiconductor shortage that will last for the foreseeable future and impact every modern industry. This is because semiconductor chips make up the backbone of computers, mobile devices, automobiles, and more.

Ultimately, we argue that looking at average funding is a more predictive metric for which sectors and companies will be important going into the future. This will be explored more below.

IPOs and Predicting Startup Exits

Credit: CNBC

There are many ways for startups to be considered “successful.” They can receive a high valuation as a result of a new funding round. They can grow x% year over year. They can even be acquired by another company for millions of dollars. However, perhaps the most coveted exit of a startup is the Initial Public Offering.

An Initial Public Offering, or IPO, is when a private company lists itself on a public exchange so that its stock can be traded by anyone eligible. Technology startups usually list on the Nasdaq exchange. This is arguably the most coveted exit because of the financial rewards of an IPO as well as the public perception of the company following it. An IPO represents longevity, acceptance into the “big leagues”, and retained ownership of the company.

Thus, we will try to use simple methods of machine learning, specifically logistic regression, in order to predict which startups are most likely to IPO. Given the data available, the most important measure we tried to explore was whether VC funding was an important predictor of IPO likelihood. Finding that this is, indeed, an important factor would corroborate our earlier hypothesis on the top sectors as well.

Analyzed using Logistic Regression

Shown above is the logistic regression model that we ran, along with the results. The dependent variable that we were trying to predict was ipo_id, which takes the value 1 if a company had an IPO and 0 if it did not. The independent variables used to predict ipo_id were as follows:

year(founded_at) = the year that the startup was founded
funding_rounds = the number of funding rounds
funding_total_usd = the total $ of funding
milestones = the number of performance metrics reached
relationship = the number of key employees

Ultimately, every variable except funding_rounds was significant at the p < 0.001 level, with funding_rounds being significant at the p < 0.01 level.

Furthermore, a higher value for every variable except year(founded_at) led to a higher likelihood of IPO, whereas a higher value for year(founded_at) led to a lower likelihood of IPO. This means that the younger the startup, the less likely it is to IPO.
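To make the sign interpretation concrete, here is a small Python sketch of how a logistic model turns coefficients into a probability. The coefficients, the intercept, and the feature name `years_since_2000` are hypothetical values chosen only to show the direction of each effect; they are not the fitted values from the model above.

```python
import math

def ipo_probability(coefs, intercept, features):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(coef * x))))."""
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, NOT the article's fitted values.
# A negative sign on the founding-year term mirrors the finding above.
coefs = {"years_since_2000": -0.10, "funding_rounds": 0.20}
intercept = -5.0

older = ipo_probability(coefs, intercept, {"years_since_2000": 2, "funding_rounds": 3})
newer = ipo_probability(coefs, intercept, {"years_since_2000": 12, "funding_rounds": 3})
more_rounds = ipo_probability(coefs, intercept, {"years_since_2000": 2, "funding_rounds": 5})

# Negative coefficient: a later founding year lowers the predicted probability.
print("older startup more likely to IPO:", older > newer)
# Positive coefficient: more funding rounds raise the predicted probability.
print("more rounds more likely to IPO:", more_rounds > older)
# exp(coefficient) is the odds ratio per one-unit increase; below 1 means lower odds.
print("odds ratio per additional year:", math.exp(coefs["years_since_2000"]))
```

The point is only the direction: with everything else held fixed, a negative coefficient pushes the predicted probability down as the variable grows, and a positive one pushes it up.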

Finally, the model correctly predicted IPO status 99.46% of the time on the training data and 99.45% on the test data. However, it is important to note that simply assigning a value of 0 to every startup’s ipo_id yields an accuracy of 99.08%. This is because very, very few startups are actually able to IPO.
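The comparison against an all-zeros prediction can be reproduced with a short Python sketch. The labels below are synthetic, sized so that 99.08% of them are 0 to match the share quoted above; the function names `accuracy` and `recall` are illustrative.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def recall(predictions, labels):
    """Share of true IPOs (label 1) that were predicted as IPOs."""
    positives = [p for p, y in zip(predictions, labels) if y == 1]
    return sum(positives) / len(positives)

# Synthetic labels: 92 IPOs out of 10,000 startups, so 99.08% are non-IPO.
labels = [1] * 92 + [0] * 9908
always_zero = [0] * len(labels)

baseline = accuracy(always_zero, labels)
print(f"baseline accuracy: {baseline:.2%}")
print(f"baseline recall on IPOs: {recall(always_zero, labels):.2%}")

# Gain of the reported test accuracy (99.45%) over the baseline, in percentage points.
gain_points = (0.9945 - baseline) * 100
print(f"gain over baseline: {gain_points:.2f} percentage points")
```

With classes this imbalanced, accuracy alone says little: the all-zeros baseline is highly accurate yet identifies no IPOs at all, so the model's gain is better judged relative to that baseline.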

Using Total VC Funding Amount as Proxy for Success

As we saw in the section about the Top 25 Sectors by Average Funding and the logistic regression model, it is clear that VC funding is an important proxy for future startup success. In other words, the most successful startups tend to be venture-backed.

We now wanted to take a step back and examine what the top venture-backed startups were in our dataset and compare that to how they are trending in 2021.

Created using knitr::kable()

Shown above is a table of the Top 25 Startups according to the Total Venture Funding that they have received. Most notable in terms of household names in 2021 are Verizon, Facebook, Twitter, Groupon, and SurveyMonkey. However, there are many names that may be less familiar, like Clearwire, Carestream, Solyndra, and others.

Given the significant bets that were made on these companies, we wanted to look at how they have trended since by using perhaps an unconventional metric: Google Trends Data.

Created using gtrendsR API with Base R plot and lowess smoothing

While Google Trends data is by no means an objective measure of a startup’s success, it does offer a unique view of how a company is perceived by the general public. Because the focus of this project is on discovering those companies that will impact everyone’s daily lives, we felt that Google Trends data would offer a lens into this.

Each graph shows the relative hits over time for a specific term with a lowess smoothing curve that shows the general shape of the points. Relative hits means that the movement of the graph refers to how popular this search term was at a given time relative to its historical popularity.
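The idea of relative hits can be illustrated with a short Python sketch. It assumes a simplified picture in which a term's interest is scaled so that its peak in the window equals 100; the raw counts are invented, and the centered moving average is a plain stand-in for lowess smoothing, not the smoother used in the graphs.

```python
def relative_hits(raw_counts):
    """Scale a series so its peak equals 100 (simplified Google Trends-style index)."""
    peak = max(raw_counts)
    if peak == 0:
        return [0 for _ in raw_counts]
    return [round(100 * c / peak) for c in raw_counts]

def moving_average(values, window=3):
    """Centered moving average; a simple stand-in for lowess, not lowess itself."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# Invented raw interest values for one search term over six periods.
raw = [20, 50, 200, 120, 60, 40]
hits = relative_hits(raw)
print(hits)
print(moving_average(hits))
```

Because each term is scaled against its own peak, the curves for two different terms show the shape of each term's popularity over time rather than which term was searched more in absolute terms.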

The top two graphs are for the terms “Clearwire” and “Verizon”, the two most funded companies in our dataset. Overall, the Google Trends data shows that these companies peaked in Google search hits in 2010, with their search hits decreasing over time since then. Ultimately, it shows that these companies have trailed off in search popularity.

However, the bottom two graphs are for the terms “Twitter” and “sigmacare”, two other companies that were among the top 10 most funded. These companies remain very popular in terms of relative hits in the present, with Twitter reaching another peak recently and SigmaCare showing a constant upward slope. SigmaCare is an electronic health records vendor that was acquired in 2017.

Ultimately, these graphs give an interesting representation of the longevity of companies in terms of the public limelight that they receive. However, Google Trends data does not show the actual relevance of a business because the general populace is driven by many different influences.

Conclusion

Ultimately, in 2021, the conversation about the next big technology is an ever-evolving one. It is thought-provoking to consider how our society will change in the future, with every single person in the world being a stakeholder in this question.

While VC investors may hold the throne in discovering the next startup to fund, it is ultimately up to us to create these startups.

Data science provides one lens into discovering the trends of the future, and it proves to be an interesting outlook.

Seungkwon (Alex) Son is a junior at the University of Pennsylvania studying Behavioral Economics and Psychology. This project was completed in part for Professor Tambe’s OIDD 245: Analytics and the Digital Economy course.

Credit: TechCrunch
