What’s driving open source software in 2019
Cloud native, AI/ML, and data tools and topics are areas of emphasis for the O’Reilly Open Source Software Conference.
Virtually every impactful socio-technical transformation of the last 20 years, from Web 2.0, DevOps, and cloud to big data, artificial intelligence (AI), and machine learning (ML), is encoded in the record of speaker proposals from the O’Reilly Open Source Software Conference (OSCON). This record doesn’t merely reflect the salience of these and other trends; it anticipates that salience, sometimes by several years.
A variety of qualitative and quantitative signals have already shown us that cloud native, data, and AI/ML are important drivers of open source software. That’s why we made these topics the pillars of OSCON 2019, and it’s why the call for speakers highlighted these areas. We were curious to see what the proposal data could tell us about the evolution of these topic spaces. How are speakers approaching and interpreting them? What tools are ascendant? Which trends should we watch?
Our recent analysis of speaker proposals from the 2017-2019 editions of OSCON[1] yielded several intriguing findings:
- We see cloud native gaining traction for open source developers to help promote resilience, scaling, availability, and improved responsiveness. The shift to a cloud native paradigm brings new challenges, new tools, and new practices for developers to master.
- Results from our ranking of proposal phrases show the centrality of data to the open source community. “Data” (the No. 5 term) outpaced “code” (the No. 14 term), AI/ML topics rose, and monitoring and analytics assumed critical importance in the nascent cloud native paradigm, all of which highlights the demand for skills in analytics, data acquisition, and related areas.
- AI and ML posted big year-over-year jumps in the OSCON proposals, with the focus shifting from exploration to operationalizing the technology—driving the need for AI/ML skills as well as expertise in a constellation of adjacent technologies such as automation, monitoring, data preparation, and integration.
- The rise of cloud native, data centrality, and the surge in AI/ML topics are coalescing around a single entity: the customer—or more precisely, the customer experience.
The following table shows topics of interest from our analysis of speaker proposals from the 2017-2019 editions of OSCON. We used a form of the Term Frequency-Inverse Document Frequency (TF/IDF) technique to identify and rank the top terms.
We focused this list on important industry terms and terms showing notable year-over-year changes. We omitted individual items like “open” and “source” (these were the No. 1- and No. 2-ranked words, but the combined term “open source” had more relevance to the analysis).
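To make the ranking method concrete, here is a minimal sketch of one way a TF-IDF-style term ranking could work. It is not the analysis code behind this report: the toy proposals, the function names, the smoothed IDF formula, and the choice to sum scores across proposals are all assumptions made for illustration. It scores single words and two-word phrases, omits “open” and “source” as standalone terms, and reports year-over-year position changes.

```python
import math
from collections import Counter

# Hypothetical toy proposals; the real OSCON proposal data is not reproduced here.
PROPOSALS = {
    2018: [
        "open source kubernetes containers for data pipelines",
        "open source microservices with kubernetes",
        "open source data tools for machine learning",
    ],
    2019: [
        "open source cloud native data platforms",
        "cloud native monitoring for open source data services",
        "open source machine learning models in production",
    ],
}


def terms(text):
    """Return single words plus two-word phrases, so 'open source' is one term."""
    words = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


def rank_terms(docs, omit=()):
    """Rank terms by TF-IDF summed over all docs; position 1 is the top term."""
    tokenized = [terms(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many proposals does each term appear?
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    scores = Counter()
    for doc in tokenized:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF (an assumed variant) keeps ubiquitous terms above zero.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            scores[term] += tf * idf
    # Sort by descending score, breaking ties alphabetically for stable output.
    ordered = sorted((t for t in scores if t not in omit),
                     key=lambda t: (-scores[t], t))
    return {term: position for position, term in enumerate(ordered, start=1)}


def rank_change(previous, current, term):
    """Positions gained since the previous year, or None if the term is new."""
    if term not in previous or term not in current:
        return None
    return previous[term] - current[term]


OMIT = {"open", "source"}  # drop the parts, keep the combined "open source"
ranks_2018 = rank_terms(PROPOSALS[2018], omit=OMIT)
ranks_2019 = rank_terms(PROPOSALS[2019], omit=OMIT)

for term in ("open source", "cloud native", "data"):
    print(term, ranks_2019.get(term), rank_change(ranks_2018, ranks_2019, term))
```

Because this variant sums scores over every proposal, terms that appear almost everywhere still rank near the top, which is consistent with “open” and “source” leading the raw list.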
Clarity on cloud native
It’s puzzling: Kubernetes, microservices, Docker, and containers all declined in 2019 proposals. There’s no reason to believe any of these technologies actually declined in importance, however. Google searches using the terms “Docker” and/or “containers” are up slightly, year-over-year; “microservices” is down just a tick, and “Kubernetes” is up just a tick.
More likely, the decline indicates a change in the way developers, architects, and other practitioners approach and understand these subjects: in a more holistic, encompassing way. The problem space has shifted away from an individual focus on Kubernetes (for orchestration), containers, microservices, and related technologies, addressed principally through education and demonstration, to an emphasis on using them together to support availability, scalability, and developer productivity. The term “cloud native” is now understood well enough to encapsulate these individual technologies without needing to specify the cloud native components by name.
We see, and have noted, the big increases in attention to cloud native topics across all of the O’Reilly conferences and on our online learning platform. Based on a deeper dive into both quantitative and qualitative analysis of the space, we have aggregated concepts from cloud native and other topics into a theme we call Next Architecture. At a high level, Next Architecture describes the intersection of key architectural innovations (microservice architecture, serverless architecture, service mesh architecture) with container virtualization, service orchestration, and other technologies core to the cloud native paradigm. The overarching goals of those embracing Next Architecture are two-fold: first, to promote an improved overall service experience for customers and, second, to permit greater architectural flexibility and resilience for organizations.
The term “cloud native” climbed by 64 positions to No. 49 in the 2019 OSCON proposals. This follows a significant gain in 2018 (up 457 places) and marks an increase of 521 places from 2017. The related term “cloud native application” shot up 676 places in 2019 to No. 427; it’s up 1,174 places since 2017.
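The position changes quoted above can be cross-checked with simple arithmetic. The sketch below assumes a reported gain is just the difference between two ranks, where a lower number means a higher rank; the helper name is hypothetical, and the implied 2017 and 2018 ranks are derived from the quoted figures rather than taken from the underlying data.

```python
def implied_previous_rank(current_rank, places_gained):
    """Rank a term held before it climbed `places_gained` positions."""
    return current_rank + places_gained


# "cloud native": No. 49 in 2019, up 64 places in 2019 and 457 places in 2018.
cloud_native_2019 = 49
cloud_native_2018 = implied_previous_rank(cloud_native_2019, 64)
cloud_native_2017 = implied_previous_rank(cloud_native_2018, 457)
cloud_native_total_gain = cloud_native_2017 - cloud_native_2019

# "cloud native application": No. 427 in 2019, up 676 in 2019, 1,174 since 2017.
cna_2019 = 427
cna_2018 = implied_previous_rank(cna_2019, 676)
cna_2017 = implied_previous_rank(cna_2019, 1174)
cna_gain_in_2018 = cna_2017 - cna_2018

print(cloud_native_2018, cloud_native_2017, cloud_native_total_gain)
print(cna_2018, cna_2017, cna_gain_in_2018)
```

Under that assumption, the two yearly gains for “cloud native” add up to the 521-place increase since 2017.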
The ascent of cloud native gives context to the slight year-over-year drop recorded for the term “Kubernetes” as well as the long-term decline (2017 through 2019) of the term “microservices.” Individually, Kubernetes, containers, and microservices still constitute sites of ongoing innovation, but the shift to a cloud native paradigm brings with it a new set of challenges as well as a different landscape of technologies and practices. In order to accommodate this shift, the onus is on developers, architects, and other practitioners to acquire new knowledge and master new skills. This, too, is borne out in the proposals.
For example, Helm, a package manager for Kubernetes, is faring just fine: the term “Helm” jumped 247 places in proposals to No. 432. Similarly, the term “Knative”—a Kubernetes-based platform optimized for serverless workloads—appeared for the first time in proposals this year at No. 527. Another Kubernetes-related term, “Kubeflow,” also made its first appearance in 2019 at No. 610. The term “serverless”—although not exclusive to cloud native architecture—moved up 11 slots (to No. 21), following a gain of 27 places in 2018. It now seems poised to crack the top 20.
Finally, Apache Pulsar, a pub-sub messaging system, has the makings of a new star. The term “Pulsar” climbed 606 positions to No. 207 in 2019 proposals. The essentially identical term “Apache Pulsar” shot up 535 places to No. 394. This shouldn’t be surprising: Apache promoted Pulsar from incubator to top-level project status last September. Pulsar may end up the messaging solution of choice for the cloud native ecosystem. We plan on closely monitoring its adoption and popularity.
An emphasis on architecture
The term “architecture” is in the midst of a moment: it ticked up 10 places in this year’s proposals to No. 26. Similarly, “microservices architecture” rose 242 places to No. 651, while “software architecture” surged 2,385 places to No. 938 after being completely absent from 2017’s proposals. “Serverless architecture,” unchanged year-over-year at No. 981, is nonetheless up 743 places from 2017.
A cluster of related terms saw a significant uptick. The term “mesh” increased slightly over 2018 (up 11 places to No. 170), following a massive spike (5,549 places) between 2017 and 2018. In 2017, by contrast, not a single proposal used the term. Similarly, “service mesh,” at No. 183, is up 34 places over 2018; we found no mentions of “service mesh” in the 2017 OSCON proposals. Finally, “Istio” (a service mesh layer for Kubernetes, Mesos, and other technologies) climbed 67 places in 2019, to No. 156; it was completely absent from the 2017 speaker proposals.
Even though terms such as “mesh” or “service mesh” do not automatically imply architecture, they are core features of the Next Architecture that knit together the cloud native paradigm and its many moving parts. A service mesh, for example, provides an architectural substrate for orchestrating microservices. In theory, it also permits organizations to build and orchestrate very fine-grained microservices that (in hewing to Unix design theory) do one thing—and do it well. These microservices are fine-grained in the sense that they generalize a specific function (a credit check, for example) and do not incorporate additional domain-specific logic. This decouples them from domain-specific dependencies and makes it possible for organizations to build architectures that are at once flexible and, potentially, more resilient.
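A rough sketch may help show what “fine-grained” means here. The example is purely illustrative: plain Python functions stand in for separately deployed services, and every name and threshold is hypothetical. The credit check knows nothing about loans or rentals; each caller supplies its own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreditCheckResult:
    applicant_id: str
    score: int
    approved: bool


def credit_check(applicant_id: str, score: int, minimum_score: int) -> CreditCheckResult:
    """Generic check: compare a score to a caller-supplied threshold, nothing else."""
    return CreditCheckResult(applicant_id, score, score >= minimum_score)


# Domain-specific policy lives with the callers, not in the shared service.
# Both thresholds below are made-up values for illustration.
def approve_car_loan(applicant_id: str, score: int) -> bool:
    return credit_check(applicant_id, score, minimum_score=660).approved


def approve_apartment_rental(applicant_id: str, score: int) -> bool:
    return credit_check(applicant_id, score, minimum_score=600).approved


print(approve_car_loan("applicant-1", 640))
print(approve_apartment_rental("applicant-1", 640))
```

Because the check carries no domain logic, a new consumer can reuse it without changes, which is the decoupling the paragraph above describes.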
The centrality of data to open source
“Data” wasn’t the top term in speaker proposals—OSCON is the “open source” conference, after all—but it might as well have been. “Data” at No. 5 outpaced “code” at No. 14, even though the latter moved up six slots in the overall rankings. If this doesn’t surprise you, it should.
The knock against software engineering is that it gives short shrift to the priorities and vicissitudes of data management—i.e., how data is generated, managed, changed, and, inevitably, purged. Developers grok code, not data. That’s the conventional wisdom, at least.
But the conventional wisdom is wrong. That “data” held fast at No. 5 year-over-year is a testament to the role it plays in the applied, empirical work of developers, architects, and other practitioners. Data isn’t a passive, inert quantity that is acted upon or manipulated by code; rather, it constitutes an active, positive quantity in its own right. Data is, in a sense, code’s raison d’être.
The growth of AI and ML attests to this. In the first place, data collection and analysis is a prerequisite to efforts to understand or improve the customer experience, be it via the use of traditional analytics or via advanced techniques such as AI and ML. In the second place, the new cloud native paradigm, and, especially, the emphasis it places on automating the orchestration of finely grained microservices, ups the stakes significantly.
There is a growing need for diagnostic solutions that are capable of monitoring, detecting, and (via ML-driven automation) correcting application performance or availability issues. This helps account for the uptick in the term “monitoring” in proposals, which climbed 42 positions to No. 231. A more concrete example is that of Prometheus, an open source monitoring and alerting platform. The term “Prometheus” climbed from No. 1,643 in 2017 proposals to No. 386 in 2019 proposals. Prometheus isn’t technically a data management or analytics tool. Nevertheless, its trajectory in proposals is consistent with a growing emphasis on data collection and analysis.
This new data-driven trend is attested in other ways, too. A cluster of terms relating to data acquisition, integration, management, and analysis trended upward between 2018 and 2019. These include “Kafka” (No. 164; +41 positions), “streaming” (No. 239; +48), “Spark” (No. 431; +11), and “SQL” (No. 511; +151).
AI won’t be denied
The frequency of the terms “AI” (No. 22) and “ML” (No. 18) increased significantly in the 2019 OSCON proposals. If combined, the AI/ML topic would shoot up to the No. 5 slot, and it would be the first tool-based topic in the rankings, up 11 positions since 2018 and 114 positions since 2017. This follows two years of exponential growth for the term “AI,” especially, which increased by 2,024 positions between 2017 and 2018. “ML,” meanwhile, is up by 133 positions since 2017. Given the role that open source development, collaboration, and education have played in the mainstream diffusion of AI/ML tools and techniques, this is not surprising.
Strangely, however, a cluster of terms that relates to AI and ML fared less well in 2019. “Deep learning,” for example, declined by 27 positions this year (to No. 144), following a surge of 2,075 places in 2018. Similarly, “neural networks” dropped a staggering 1,538 slots (to No. 3,576), after climbing 3,692 places in 2018. Other related terms saw huge declines, too. “Natural language processing” declined by 434 positions (to No. 1,687), while “NLP” plummeted 1,347 places (to No. 3,031); both terms had skyrocketed up the rankings in the 2018 proposals. The generic term “natural language” improved this year (climbing 78 places to No. 802), following even stronger growth in 2018. The term “TensorFlow,” by contrast, fell 75 positions in 2019 to No. 288 after a surge in 2018, when it spiked 1,866 positions.
This invites a question: why are bellwether AI- and ML-specific terms trending downward even as their generic parents continue their climb up the ranking?
One obvious explanation is noise: we’re working with only three years’ worth of data, after all.
Another explanation is that, as with containers and microservices, the focus of problem solving is changing. The types of AI and ML use cases that are generally associated with bleeding-edge and early adoption (e.g., one-off and skunkworks projects, prototypes, proofs-of-concept) are giving way to solution-oriented use cases, that is, to use cases consistent with the aims and purposes of mature adoption.
The result is that the focus has shifted from the specifics of technology implementation to the problem of operationalizing AI and ML, for example, by deploying “weak” AI services that are designed to perform specific functions or tasks. This in turn drives the need for AI- and ML-related knowledge and programming skills as well as for expertise in a constellation of supporting or adjacent technologies, such as automation, monitoring, data ingestion and integration, and so on.
This is borne out by spikes in generic AI/ML-related terms in this year’s proposals. The term “ML models” grew by 5,146 places in 2019, climbing to No. 469; by way of contrast, not a single proposal used the term in 2017. The even more generic term “models”—which admittedly is not specifically related to ML—climbed 32 positions to No. 37. Or consider data science, which includes the applications of AI/ML techniques to specific use cases. The term “data science” continues to ascend the rankings, jumping 85 slots in 2019 (to No. 277), and up by 315 slots since 2017.
The larger point is that all of these technologies and techniques are as important as ever. Interest in TensorFlow, for example, is empirically not declining any more than interest in Docker or microservices is declining. The focus of the problem space has shifted, that’s all.
A new focus on the customer
Developers, architects, designers—and the organizations that employ them—are focusing their efforts on improving the customer experience. This shift manifests itself in a variety of ways.
The most important is an emphasis on delivering products and services that better align with the needs, expectations, and priorities of customers. The “customer” isn’t a monolith, however. There’s the external customer, i.e., the person or entity who purchases goods and services. But there’s also the internal customer, i.e., the line-of-business customer, the consumer (sometimes one of several consumers) of IT services. Neither of these customer entities is monolithic, either: they tend to segment into clusters based on the needs, characteristics, demographics, attitudes, etc., of the people who make them up. Another way of putting this is to say that it isn’t so much turtles as customers all the way down.
Organizations believe if they can better understand who their customers are, they can better respond to and anticipate their needs. This helps explain the stratospheric rise of AI and ML in OSCON proposals, which also owes much to the interpenetration of digital technology with human life. Our use of software, services, and devices generates granular data about how we use them—and, less benignly, about us, too. All of this data serves as grist for customer-focused analytics. It contributes to new research into human-computer interaction. It is already changing the way companies (re)design and market products and services.
This new emphasis on the customer experience is borne out in the OSCON proposals data set. For example, a cluster of terms that relates to user experience (UX) trended upward in 2019: the term “UX” itself surged 1,763 slots to sit at No. 937. This might not seem like much, but “UX” came in at No. 2,700 in 2018’s tally. The term “user experience” sits at No. 1,344—down 101 positions from last year. This is probably a correction, however: in 2018, “user experience” jumped by 245 places. For the record, the generic term “experience” also improved year-over-year, climbing 23 positions to No. 109 in the ranking.
Elsewhere, terms such as “design” (No. 62; +40 positions between 2018 and 2019) and “designed” (No. 491; +67 positions), which are weakly linked with customer experience, moved up the ranking. So did the term “interactive,” which jumped 309 slots in 2019 to No. 761 and is up more than 630 slots since 2017. “Empathy,” a term that occurs frequently in the context of UX design, increased by 29 positions (to No. 2,468) in 2019, following a decline between 2017 and 2018. That isn’t all. The term “human” surged by 390 positions to No. 417 in 2019; it posted a triple-digit gain in 2018, too. Similarly, “ethical” has been coming on strong. It saw a tiny uptick in 2019 (+31 positions, to No. 2,465), but has surged by more than 3,000 positions since 2017.
A focus on the customer experience is shown in other ways, too. Microservices, serverless, the emergence of a new cloud native paradigm: none of these things are happening in a vacuum. Yes, they align with the needs and priorities of developers, architects, and other IT practitioners, but the customer is neither an afterthought nor a marginal constituency in this innovation. Rather, the customer experience is a thematic part of each of these trends. It isn’t just that microservices are a key component of (for example) an architectural paradigm that promises superior flexibility and resilience; it is that these same benefits will be passed on to the customer in the form of improved responsiveness, availability, performance, etc.
A supporting role for programming languages
A decade and more ago, OSCON was filled with talks and conversations focused on programming languages as primary topics. Times have changed—the OSCON focus has moved up the stack, paying more attention to data, architecture, AI/ML, and customer experience.
The first language that shows up in the 2019 proposals, Python, clocks in as the 39th-ranked term (up 47 positions over 2018), with Java next at No. 59 (up 97 positions). Further down the list we find up-and-coming languages like Rust (No. 152, up 314 positions) and Kotlin (No. 236, up 380 positions). Languages still matter, and we see some recovery in language interest at OSCON, but they have taken on more of a supporting role as tools in the service of creating meaningful applications, services, and experiences.
Concluding thoughts
The pace of innovation at OSCON mirrors that of open source itself: it is forever moiling, roiling, and seething. Terms and topics that came out of nowhere to skyrocket up the ranking this year could easily fall away into oblivion (or, at any rate, to the lowest tier of results) in 2020. As we’ve seen, however, certain themes do have staying power.
A focus on new technologies, techniques, and, especially, problem spaces is always a dominant theme in OSCON proposals, as is an emphasis on problem solving in general. This was a defining characteristic of the very first OSCON, and it is no less definitive of OSCON 20 years later. Also unchanged is OSCON’s essential identity as a community of attendees: a place to discover and learn, connect and interact, and, in all cases, to forge impactful, transformative relationships. It is no accident that the term “community” occupies the same place in the ranking (No. 7) as it did in 2017. And it doesn’t take a disruption-predicting model to prophesy its appearance in the top tier of OSCON 2029 proposals, either.
Special thanks to Phil Harvey, Mark Madsen, Jaimie Murdock, and Catfish Murdock for comments, questions, elaborations, and (a surfeit of) criticisms.