Over the past year, the major players in technology have been trying to jump-start their Artificial Intelligence (AI) efforts. As AI has become the cornerstone technology for the next generation of technology companies, the largest companies decided to move faster by Open Sourcing their core technology. Google, IBM, Microsoft, Facebook and Amazon have joined the race to open up their code, and Google has emerged as the clear winner at this point. (Data at the end of this article.)
First of all, why have these companies Open Sourced an asset that will become a core piece of many of their future offerings? In Artificial Intelligence/Machine Learning today, time to market and the size of the developer community are the primary drivers of success. Open Sourcing your code creates a community of developers and companies committed to helping you move your code forward. These companies also gain hundreds of new developers who bring diverse thinking to the code and develop features faster than the companies could on their own. As you can see at the end of the article, Open Sourcing their implementations has delivered both rapid time to market and many developers and companies using their code.
So far, surprisingly, Google and Microsoft have built the largest Open Source communities. These efforts appear to be taking off with large communities and becoming the cornerstone of these companies’ AI efforts. And Google is winning, with nearly four times as many developers and companies using its product as the rest of the big companies combined, as measured by “forks” of its code (16,643 forks for Google vs 4,458 forks for Amazon, Microsoft, IBM and Facebook together). A “fork” means that a developer has taken a copy of the code, either as an individual coder or on behalf of an entire company or startup. I have judged two startup competitions where multiple startups were based on Google’s TensorFlow. Most of them had abandoned their own code and adopted Google’s as a way of getting an AI application to market sooner than they could on their own. Overall, Google is getting its AI embedded everywhere and becoming the standard by which AI is measured.
And perhaps more important than projects basing themselves on the Google code, Google is getting the AI developers to work on the code: nearly twice as many coders are contributing their work to the TensorFlow project as to the rest of the industry’s projects combined (493 accepted contributors for Google’s TensorFlow vs 275 for all of the others combined). What this means is that Google has the hearts and minds of the best AI developers, and they are increasingly jumping into the Google code, not its competitors’. Google can hire developers from among these contributors who are already familiar with its code and have them productive immediately. In case you think this is too small a set of developers to make a difference, think of these roughly 750 developers as the tip of the iceberg. For every developer contributing to an Open Source effort, there are dozens supporting their work or working on proprietary versions at these companies. And with 16,643 forks, the “other company” effect is huge, employing thousands of AI developers. Bloomberg picked up on how Google is racing ahead of the competition with this story:
The result is that Google is building another lead so large that many of its competitors will not be able to catch up. With the hearts and minds of the industry developing code that Google can reuse, Google can be expected to get to breakthroughs faster and with better quality than its peers. Now, I know the others will introduce successes of their own, but overall Google at this point has the breadth of codebase and the developers to match any individual breakthrough quickly.
Google kicked off the AI Open Source rush by opening up TensorFlow, which was developed by its Google Brain team. This has become the bedrock of many AI startups’ codebases. But why TensorFlow, when the other companies quickly put codebases of their own out to the market? Well, for one, TensorFlow initially was not very good. In fact, I saw some initial presentations by DeepMind from Google, and internally at Google they called it “Tensor Slow”. Why then has it been winning the hearts and minds of the developers? Google iterated very rapidly and made several corpora of data available to test and tune with. The developers flocked to TensorFlow and, with Google’s help, have made it the best codebase very rapidly, with over 10,000 code contributions in a year. If anything, Google has the data to experiment with. Most of that seemed to be “talk to me after this presentation and I will get you data” types of deals, but I know several companies that took advantage of Google’s test data, and it worked well. Test data drove many early developers to work on TensorFlow over other choices. So the code went from being so-so to being good rapidly, as developers flocked to it and companies based their AI code on it. Twitter, Uber, Snapchat, Airbus and other large companies have joined in contributing. Momentum rules the day for TensorFlow.
What about IBM and Watson? What has happened with this product? IBM Watson dominates US AI advertising, and this has piqued US buyers’ interest in AI, but not developers’ interest. IBM sees Watson as the bedrock of the “new IBM” and has kept most of its technology “behind the curtain”. Hence the Machine Learning/Artificial Intelligence code IBM Open Sourced came from IBM Research, not IBM Watson. To develop on Watson you have had to sign an agreement with IBM. They offer a fairly extensive set of APIs without the agreement, but it is hard to see “how” the system reached its conclusion, and developers are wary of “magic” that happens without their knowing how the conclusion was reached. Plus, Watson itself is very expensive, whereas the core Open Source foundational technology is free to experiment with. As a result, companies that need to get a solution to market rapidly use Watson, while those that want a customized solution their developers can tune, and whose AI conclusions they can trace, use Open Source solutions like Google’s TensorFlow. This has meant that Watson is used in big-company implementations, by companies wanting a prebuilt solution without seeing the underlying reasoning, and by those who can afford it. The rest of the tech world uses the Open Source implementations as its AI foundation.
So far, the Open Sourced AI projects have accelerated the rate of change in the AI market. The question is, has this helped the AI industry move beyond its past promise and become a meaningful technology for today? The greater question is, will AI be another market dominated by Google? If Google’s technology dominates the market the way Android does the mobile market, will this make Google the default AI tools company? What does that say about our data and our decisions in the future? Facebook came under criticism last week for influencing the US election. If one company controls the artificial intelligence that many of us use, will that lead to manipulation of AI users? These questions will be debated for years to come.
Obviously I have barely touched on the AI subject. But the Open Sourcing of AI codebases is, I think, an area rarely discussed and compared. I welcome your comments.
If you want to see for yourself, compare the various projects:
IBM Research opened up their SystemML, though interestingly this is not part of the Machine Learning/AI Watson system. They made it part of the Apache Software Foundation as Apache SystemML. Here is their GitHub, IBM SystemML, with 3,340 commits and 23 contributors with 121 forks
Amazon opened up the machine learning behind their recommendation system. Amazon Labs GitHub has DSSTNE with 141 commits and 20 contributors with 536 forks
Facebook took a different route by using the already established Torch Machine Learning framework, which is currently maintained by Facebook, Twitter and the Google DeepMind group. Facebook has been making extensions to Torch as well as meaningful contributions to the core codebase.
Torch GitHub with 1,001 commits and 108 contributors with 1,688 forks
Facebook Torch GitHub for extensions with 97 commits and 11 contributors with 238 forks
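
If you would rather check the arithmetic than take my word for it, here is a minimal Python sketch that works only from the figures quoted in this article. The dictionary layout and the function names are my own for illustration, and the numbers are a snapshot, so the live repositories will show different counts.

```python
# Per-project figures copied from the list above (a snapshot; live numbers will differ).
listed = {
    "IBM SystemML": {"commits": 3340, "contributors": 23, "forks": 121},
    "Amazon DSSTNE": {"commits": 141, "contributors": 20, "forks": 536},
    "Torch": {"commits": 1001, "contributors": 108, "forks": 1688},
    "Facebook Torch extensions": {"commits": 97, "contributors": 11, "forks": 238},
}

# Headline totals quoted earlier in the article.
google = {"contributors": 493, "forks": 16643}
others = {"contributors": 275, "forks": 4458}


def total(metric):
    """Sum one metric across the projects itemized in the list."""
    return sum(stats[metric] for stats in listed.values())


def google_lead(metric):
    """How many times larger Google's figure is than the others' combined figure."""
    return google[metric] / others[metric]


for metric in ("forks", "contributors"):
    print(
        f"{metric}: Google leads {google_lead(metric):.2f}x; "
        f"itemized projects account for {total(metric)} of the others' {others[metric]}"
    )
```

The itemized projects add up to less than the combined "others" totals, presumably because not every project counted in those totals is itemized in this list.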