Machine Learning Platforms

Originally aired:

About the Session

Machine Learning is clearly here to stay. While it is a far cry from actual Artificial Intelligence, it provides many invaluable and remarkable ways to learn from the data we are collecting about our customers, products and daily activities. The past afforded us machine learning libraries which became machine learning frameworks. Now, we are designing and building machine learning platforms that facilitate entire initiatives in reusable and extensible ways. We will discuss many of the drivers of modern machine learning systems and the platforms that we are seeing emerge.


Alright, I'm going to go ahead and get started. Thank you very much for attending and thank you for the great turnout to the show. This is exciting, that during these crazy times, we can all still get together and talk about technology. I'd like to bring greetings from Northern California. I live just outside of Sacramento and I've only been to India a couple of times, but I've enjoyed both times and I hope someday to come back and meet some of you in person. So, welcome very much.

So, today I'm going to talk about machine learning platforms, not necessarily machine learning in and of itself, but more the process of taking those ideas and producing some kind of capability with them.

A lot of people ask me what language they should learn, but it's really more about the architecture. It's more about thinking about what this deployment is going to look like, so that's what I want to focus on, but we'll spend a little bit of time on machine learning background as well. Feel free to toss any questions into the chat room and I will try to address them as I can.

A little bit about my background: I've been developing software for about 26 years now. I was first exposed to the idea of machine learning 20 years ago, when I worked for a company called Parabon. We were building a platform to use idle time on computers all around the world, specifically to make it possible to do very computationally intensive things for a relatively low cost. We would basically rent time on people's unused computers and then repackage that to do machine learning kinds of things, so we were doing exhaustive regression systems and genetic algorithm-based feature selection, among other things. That was my first real exposure to the idea that machine learning isn't just about the algorithms or even the data, it's also about how you can get the work done as efficiently as possible.

One of the take home messages that I hope you accept and absorb today is that our industry is going to be very much influenced by hardware developments as well. We've got these growth curves in the data space, and the only real way that we can deal with those is to do things in parallel.

That involves using clusters in the cloud. That involves using GPUs and custom hardware like tensor processing units and things like that. But the other part of this is that we are dealing with more and more opportunities where we want to do some kind of machine learning on a device like this.

In this world of learning from your data and learning from the things that are occurring around you to exploit those, we're going to have to figure out: Where do we put our data? Where do we put our code? Because we have geographically distributed heterogeneous complex architectures.

We want it to be low latency, high interactivity, but we also want it to be safe and private and accessible as well.

Basically, this is a great summary from a book called “Doing Data Science”, by Cathy O'Neil and Rachel Schutt. They basically highlight the fact that machine learning isn't done in a vacuum. It's done as part of a larger data science activity.

That involves collecting information from the world. That involves cleaning it up and making it prepared to learn from. If you understand your data well, then you can maybe make a presentation about it. But if you don't understand it, then we might have to cycle back through it a couple of times and figure out what kind of signal is there in the data that we can learn from.

And once you gain some comfort with that, then you'll want to perhaps go through a more formal machine learning hyper parameter tuning operation. And until about 10 or 15 years ago, most machine learning systems would get you that far. You would have a training step that would produce a model that you could then do something with.

But at the company that I mentioned our chief analytical officer was an economist with a PhD, and he would produce these models in Excel or Visual Basic or something, and then we would have to hand code the models to run on our Java-based platform. And that's just not really acceptable anymore.

We want to be able to take the models that we're producing and roll them into production. But the thing we have to remember is we have two basic outcomes with the machine learning system. One is to tell a story and try to communicate to somebody why they should do something, and the other is to turn it into a data product, like a recommendation engine or something like that.

And we have to keep in mind: if the data model is so big that we can't push it down to a phone or a tablet or something, then there's going to be the need to deal with calls back into the cloud, into cloud based systems. And then I could have issues with respect to the data going back and forth in terms of latency.

So we're also seeing people move some of the computation not up into the cloud and not onto the phone, but to the edge in between. Companies like Fastly and Cloudflare are building new platforms that allow you to push code out, and the code itself is distributed, kind of like static files are with a content delivery network.

To successfully roll out a machine learning activity, we want to make it smooth from the training process into the deployment process and in the update process, because if you're very successful, then your model will be useful. But even then, your data is likely to change, right? If you're trying to analyze sales activities or customer activities over time, those numbers will change. We want the model to be able to be updated, to reflect those changes in numbers.

Sometimes things change very quickly, like what happened to a lot of supply chain with respect to the onset of this global pandemic, right? We had supply chains that were feeding information. We were learning from the demand and we were trying to keep inventory in response to that. And when there is a disruption, or things change radically, we need to be able to move quickly. The goal in building a machine learning platform is to be able to get out the value that's there, but also to be able to zoom in as well to what's happening around you.

Here's one of the big issues that we're dealing with. We have data that's growing at a rate that far surpasses our capacity to train new people to deal with it, and it's also growing faster than hardware is getting faster.

For a long time, we had this period of growth where, if your software was slow, you just had to wait 18 to 24 months, and the extra density of transistors and resistors and things like that in our silicon would translate to faster computers.

Around the mid-2000s that kind of stopped, and we'll talk a little bit about that in a moment, but the point is we're going to have to rely on software to help us deal with this explosive growth, and we're also gonna have to rely on hardware, and if you can't get hardware access to things to accelerate, then that makes it difficult.

So, if you're trying to push a machine learning capability down to the browser, say, or to a phone or a device, we have to have some kind of hardware acceleration at that end as well to be able to respond to it.

So, this is a slide or a diagram from a blog article by a gentleman named Herb Sutter, and I encourage you to read it because I think it's very important to the people in our industry to understand how hardware is changing. And this is from one blog article that he wrote called The Free Lunch is Over.

Then he wrote another one called Welcome to the Jungle, which is about how our modern hardware systems are complex and have a variety of heterogeneous computational engines where those engines could be a general purpose CPU or they could be a GPU, and go from there. I'm going to very quickly go back and copy this link and paste it into the slides. If you just go to https://tinyurl.com/gids-arch-ml. Sorry. I meant to do that beforehand.

All right. So, the reality is we have to start paying attention to this hardware because what we can't do is what we asked our customers to do 20 years ago, which is to port their code to run on our platform. Instead we'd like to just be able to recompile the code to take advantage of new platforms.

And that's where technologies like LLVM come in. It is a modular compiler architecture: you're able to reuse a lot of the optimizations and retarget different backend platforms. Part of the reason why Apple is going to be very successful in migrating from Intel to its own Arm-based chips is that, over the past decade, they moved to LLVM as the basis of their toolchain, as did their customers.

All they're going to have to do is to be able to recompile to target the new platform, and that's the kind of ease that we as developers want. We don't want to rewrite our software because you have multiple CPUs or we're targeting a different architecture. And so, that's going to become increasingly part of what we deal with.

So, for about 30 years, from 1975 to about 2005, we were able to translate the growth potential from Moore's law into faster computers. But then that stopped happening. It got into production processes and the laws of physics and things like that, so instead of using the extra transistors to make more complex chips that were faster, we would make more chips and suddenly we have multi-core systems, quad-core systems, and to take advantage of that extra power, we as developers had to write concurrent software, which is something that we're not particularly good at and it's easy to make mistakes.

That's in part why the industry is moving towards functional programming, immutable data structures, and things like that, because then we can kind of hide the complexity underneath the programming model and we as developers don't have to work as hard to take advantage of that.
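To make that idea a little more concrete, here is a minimal sketch, assuming nothing beyond the Python standard library. The `Reading` record and the `to_fahrenheit` function are hypothetical names invented for illustration. Because the record is immutable and the function has no side effects, the sequential and concurrent versions differ only in which `map` is called. In CPython a thread pool will not speed up CPU-bound work, so treat this as an illustration of the programming model rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class Reading(NamedTuple):
    """An immutable record (hypothetical): fields cannot be reassigned."""
    sensor_id: str
    value: float
    unit: str


def to_fahrenheit(reading: Reading) -> Reading:
    """Pure function: builds a new record and leaves its input untouched."""
    return Reading(reading.sensor_id, reading.value * 9 / 5 + 32, "F")


readings = (
    Reading("a", 0.0, "C"),
    Reading("b", 100.0, "C"),
    Reading("c", 37.0, "C"),
)

# Sequential version.
sequential = tuple(map(to_fahrenheit, readings))

# Concurrent version: same function, same data, only the executor changes.
# No locks are needed because nothing is mutated in place.
with ThreadPoolExecutor(max_workers=4) as pool:
    concurrent = tuple(pool.map(to_fahrenheit, readings))

print(sequential == concurrent)
```

The design point is that the decision about how the work is scheduled lives in one line, outside the business logic.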

But in this same time period is when we see this explosive growth in things like cloud computing, so we can now spin up extra instances in the cloud as we need them, but also heterogeneous computing in, like I said, these complex hardware architectures that mix GPUs, CPUs, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs).

Google is moving forward with their tensor processing units as machine learning coprocessors. As they roll out more computationally intensive things, like they did this past fall with the BERT algorithm, they have had to include extra hardware so that it is hardware accelerated and still meets the customer's goals with respect to response times.

The other aspect that we have to deal with has to do with how much performance we get for how much energy we use. In the olden days of big data centers and things like that, we didn't care as much, but these days both in terms of environmental costs, but also in terms of something as pedestrian as battery lives on mobile devices, we do have to care about power consumption, and we're starting to see some really interesting things where the same code could be optimized differently if you want to focus on speed performance, as opposed to maybe lowering the power consumption of the runtime execution of your code.

This is just part of the nature of what we're dealing with. So, in order to take advantage of the data growth, we have to deal with parallelization, but that generally involves more complex programming models that developers struggle with, so it's complex when we're trying to roll out a computationally intensive thing like machine learning into these environments. So, I want you to take this picture away because this picture, it does a nice job of highlighting the complexity of our deployment models.

We've got stuff that makes sense to run in the cloud and we've got stuff that makes sense to run on devices, but we also have maybe this application code that's being deployed somewhere in the middle, and that's that edge area that I was talking about. It's not on device. It's not necessarily on premise, but it could be geographically close to where you are. So, we have low latency interactions with maybe microservices or serverless architectures that might be serving some kind of machine learning purpose.

When we think about building a system that has machine learning capabilities in it, this is what we have to have in mind when we do that, because if I have a medical application that is analyzing somebody's speech patterns to find evidence of… maybe they had a stroke or something, I don't necessarily want the data to come off the device because then I have medical regulations in many countries I'd have to deal with. So, if I can push the inference, the machine learning step down to the device, then that's good for privacy reasons. It's good for user experience reasons as well.

So, just as a quick reminder, in case you're not familiar with machine learning, the main difference here is that it's not just us as programmers telling the computer what to do. It's us trying to learn from our data to build models. So, we generally go through what's called a training phase, where the data and the answers, in the case of a supervised learning activity, are presented to the training model, and the model absorbs some amount of information from that.

And there's this big tension. If the model is not sophisticated enough, then there's a lot of error in the model based on bias, because it's too much of an approximation. If we spend too much time looking at the individual data points, then that's the equivalent of a child who finds the answer sheet at school. He can memorize the answers, and it will look like he did a great job, but he didn't actually learn anything, and if you asked him other questions, he wouldn't do very well. So, that's why we have this tension in a machine learning system: we want to find a nice balance where the model is rich enough to absorb information from the data, but not so rich that we're just memorizing individual data points, because those aren't going to be there in our production system.
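That tension can be seen in a few lines of NumPy. This is a minimal sketch under assumptions of my own: the data is synthetic (a noisy sine curve), the models are polynomials of degree 1, 4 and 15, and the helper names `make_data` and `mse` are invented for illustration. The training error can only go down as the polynomial gets richer, while the error on held-out data typically gets worse once the model starts memorizing the training points.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Synthetic data (an assumption for illustration): noisy sine curve."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, n)
    return x, y


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


x_train, y_train = make_data(20)    # small training set
x_test, y_test = make_data(200)     # unseen data, like production traffic

results = {}
for degree in (1, 4, 15):
    # Least-squares polynomial fit; a higher degree means a richer model.
    # NumPy may warn that the degree-15 fit is poorly conditioned.
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (
        mse(coeffs, x_train, y_train),
        mse(coeffs, x_test, y_test),
    )
    print(degree, results[degree])
```

Degree 1 is the model that is too much of an approximation, and degree 15 is the child with the answer sheet. Comparing the two numbers printed for each degree is the simplest way to see which situation you are in.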

So, if we're successful, we produce a model, and that model is then useful in an inference phase, where the rules that were derived from the data can be exploited to classify new data or make predictions accordingly.

That's why there's this two-phase process to a machine learning system, if we're talking about supervised learning, and that training area can be done anywhere, but the inference area is really going to be the part that the user interacts with, and that's the part that we need to be worried about where it ends up running.

When Apple introduces Metal and Metal 2 APIs that run on iOS that have hardware acceleration, when the Khronos Group introduces the Vulkan APIs to be able to do those kinds of things on other devices and desktop machines, that's about being able to consume those models in response to user input. And so, we need to worry about what those architectures look like.

Over time, these architectures, particularly with respect to deep learning systems, have gotten much more complex and more computationally intensive. So, in order to build complex neural networks, this is an example neural network that might analyze handwriting sample images, in terms of predicting whether an image has a number in it, like a three or a six or a nine. All of that training process produces a result, but that result might be a very large model that's very computationally intensive to run during the inference stage. And again, as we have more sophistication, we can capture more features and those features give us greater insight into sort of the abstractions of what's in an image.
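As a rough illustration of that trade-off, here is a sketch that assumes scikit-learn is installed and uses its small built-in digits dataset (8x8 images), not the larger handwriting set a slide like this would usually show. The two layer configurations and the `parameter_count` helper are arbitrary choices of mine. The point is only that the number of weights the inference side has to carry grows quickly as the network gets richer.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()  # 8x8 grayscale images of the digits 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)


def parameter_count(model):
    """Total number of weights and biases the trained network stores."""
    weights = sum(w.size for w in model.coefs_)
    biases = sum(b.size for b in model.intercepts_)
    return weights + biases


sizes = {}
scores = {}
for hidden in ((16,), (128, 128)):
    model = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500, random_state=0)
    model.fit(X_train, y_train)           # training phase
    sizes[hidden] = parameter_count(model)
    scores[hidden] = model.score(X_test, y_test)  # accuracy on unseen images
    print(hidden, sizes[hidden], scores[hidden])
```

Every one of those parameters has to be shipped to, stored on, and multiplied through on whatever device runs the inference step.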

It also ends up being a lot more computationally intensive. That's part of the trade-off that we need to deal with. Think about that three-tier architecture diagram that I showed. It might make sense to train in the cloud, but then I might want to interact with the user so that if I hold my camera up, I can detect what's in the scene, either for helping people with eyesight issues or, for example, I've got one app on my phone that allows me to detect what kind of a plant something is.

The user experience is better if I can push those down. I need to worry about the model size. I need to worry about the updating process. I need to worry about the hardware that the device is maintaining.

We've got a lot of choices. When I teach machine learning classes, I usually use Anaconda. This is a great platform that you can download. It's basically a self-contained Python distribution, and it comes with a variety of tools, it includes Jupyter Notebook, that we'll talk about, and it can include RStudio if you're interested in the R platform, it includes VS Code, which has a lot of great support for plugins for doing machine learning and then maybe pushing things up to Azure Cloud or something.

That brings us to our first platform, which is the R programming language and maybe RStudio. This is a very tried and true established language. It was written by statisticians and those statisticians give us insights into various vertical domains on top of the learning algorithms. Anything from genomics and genetics to epigenetics, to financial models, chemical models, physical models are all available.

It's well-established, well-tested, but it's not really intended for production. It's going to take a long time to execute the R code. And therefore, just to let you know, as a learning platform, it's great. As an analysis platform, it's great, but I wouldn't invest a lot of time in trying to take an R based system into production.

Now, that doesn't mean it's not possible. There’s some really cool integration work that allows you to reach into Hadoop file systems and work with clustered environments to connect to other sorts of environments that support distribution of data or very large data sets, but it wouldn't be my first recommendation if you're interested in that. I would say some of the other platforms that I'm going to talk about are going to make more sense, but if you're just trying to learn about machine learning and play around and do so on your own devices, it's a great learning language. It's a great learning environment. You can create analysis for publications and documents and things like that.

Scikit-Learn is the main platform that I use for teaching. It's very easy to learn. It's got a very low overhead in terms of what you have to think about other than just basic Python and some consistent APIs, but it too is not necessarily something that you would directly run in production. Number one, you may not be running Python in production, but number two, the models themselves don't easily lend themselves to take advantage of extra CPUs and hardware.

But the reason I use it for training purposes is because the ideas absolutely translate to some of the other platforms, and it's just a lot simpler for a student to pay attention.

Notice the consistency in three different uses of the API here. We've got a linear regression algorithm where you create an instance of the learner. You fit the inputs to the outputs, the X training data to the Y training data, and that generates a model that can then be used to make predictions.

Building a decision tree classifier, same thing. You create a decision tree; you specify some learning parameters for it. You fit the X inputs to the Y outputs. Then you can make predictions against the model. The third one is not a supervised learning activity. It is an unsupervised learning activity, but notice it's got a very similar structure.
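The three uses described above follow the same create, fit, predict pattern. Here is a minimal reconstruction, assuming scikit-learn and NumPy are installed. The toy data is invented, and `KMeans` is my stand-in for the unsupervised example, since the transcript does not say which algorithm was on the slide.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for illustration): one feature, two obvious groups.
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]])
y_numeric = 2.0 * X.ravel() + 1.0               # target for regression
y_label = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # target for classification

# 1. Supervised regression: create the learner, fit X to y, predict.
regressor = LinearRegression()
regressor.fit(X, y_numeric)
predicted_value = regressor.predict([[4.0]])

# 2. Supervised classification: same shape, plus some learning parameters.
classifier = DecisionTreeClassifier(max_depth=2, random_state=0)
classifier.fit(X, y_label)
predicted_label = classifier.predict([[9.0]])

# 3. Unsupervised clustering: no y at all, but the structure is the same.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
clusterer.fit(X)
predicted_cluster = clusterer.predict([[9.0]])

print(predicted_value, predicted_label, predicted_cluster)
```

Once a student has seen one estimator, swapping in another is mostly a matter of changing the class name and its parameters.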

You can take Scikit-Learn models and serialize them out and then reuse them or translate them into other models. So, I don't want to discourage you from considering that as an option, but in terms of learning, I don't know that there's a whole lot that's better out there because there's a lot you can do. There's plenty of algorithms to play with. It's really easy to get up and going. And when mixed with an environment like Jupyter Notebook, it allows you to tell stories, whether they're data science stories, or machine learning stories in an HTML document. The notebooks themselves tend to be JSON files and you can check those into your repos and check them out.
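Serializing a model out and reusing it can be sketched with Python's standard `pickle` module, which works on scikit-learn estimators. This is one possible approach and not necessarily the one used in the talk. It assumes scikit-learn is installed and keeps the bytes in memory instead of writing a file. Two caveats worth stating: a pickle should be loaded with a scikit-learn version compatible with the one that produced it, and pickles should only be loaded from sources you trust.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Training side: fit the model, then turn it into bytes that can be stored
# in a file, a database, or an artifact repository.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(iris.data, iris.target)
blob = pickle.dumps(model)

# Inference side: some other process restores the model and uses it
# without repeating the training step.
restored = pickle.loads(blob)
sample = iris.data[:5]
print(restored.predict(sample))
print(len(blob), "bytes")
```

The size of `blob` is also a quick first estimate of what would have to be pushed down to a device.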

And as long as you have an environment like Anaconda available, then it's very easy for somebody else to interact with them. And so, you can say, “Here's why we're doing it”, and maybe link to your issue tracking system. “And here's where the data is coming from”, and maybe link to where the data is coming from. “And here's some transformations that we had to do to make the models work better.”

So, it's really great to tell stories and mix the code in and somebody can execute it. There's not a lot of installation that they have to worry about. As a learning platform, I think Scikit-Learn, Pandas and NumPy are fantastic for individual analysts or people working on teams or producing documentation or reports. They're fantastic. But if you're going to try to take it into production, then you're going to have to do some more work to make that more easily transferable. And it's not out of the question. It's just not the easiest thing to consider.

There's another option here and that's to use preexisting models that somebody else has gone to the trouble of maintaining. Microsoft, Google, Amazon, other companies are having hosted APIs or services that provide machine learning capabilities for things like speech recognition, natural language text translation, image classification, auto captioning, those sorts of things.

As an example, some of the things that you can play around with just as an API, the Microsoft Vision services gives you a very simple, easy API to do image classification behind the scenes.

And one of my favorite stories about this, I encourage you to go to http://wildbook.org and spend some time looking around at what they're doing, they've got some really cool videos, they're using these capabilities to build a platform that allows naturalists and researchers who are trying to preserve wildlife to be able to identify individual zebras and leopards and things based on their spots. And they're able to reuse these platforms. The models themselves are sort of attuned to these various deployment architectures, or you can reuse stuff in the cloud, but not only that. When you mix in things like people go to places and take pictures of animals on safaris and whatnot, and maybe they publish those pictures on their social media, there are bots that are running as part of the Wildbook project that will say “Hey, that's one of our zebras. That's one of our leopards”, and then try to pick up based on metadata from the image where it was located, and if they can't find it, then a chat bot will kick in and leave comments on the person's social media page to say “I'm trying to do some preservation. I recognize an animal that you took this picture of, would you mind telling me when and where you did?”, and that makes it a lot easier to build sort of production oriented systems if you don't have to worry about the training process. You're taking advantage of preexisting models that are sort of hidden behind an API.

As you consider rolling something out into production from a machine learning perspective, if there is a preexisting service that might be an easier thing to do, that already has addressed some of these latency issues, some of these cloud scaling issues, some of these other modeling issues, that's something to consider as well.

If you don't have to go through the whole training process yourself, you don't have the resources or the data or the capacity or the time or the budget to do so, oftentimes there are hosted services that might be available to use as well.

There's a series of, again, natural language processing capabilities that do entity extraction, keyword extraction, sentiment analysis, a lot of machine learning models that are based on large training initiatives with large corpuses of news stories, financial stories, medical literature, those are available through these APIs as well. So, it's not just about images and image classification, but also a variety of other capabilities and services.

On the other side of this, we're doing more stuff with Internet of Things devices and sensors, and if we're trying to make decisions based on temperature sensors or humidity sensors, we want to have fairly high confidence that those things are accurate and believable. Microsoft and Nvidia are coming out with very capable devices in the Azure Sphere and Jetson Nano spaces. They have a lot of computational power, so you can do machine learning on them, but there's also support for digital signatures of data and whatnot. So now you can imagine building supply chain systems that keep track of temperature and humidity as food is transported from one location to another. You can try to make sure that the food shows up in a safe and healthy way, and you can learn if there are particular parts of the trip that have some kind of complication, those kinds of things.

We want to be able to detect if there is an anomaly or some kind of issue on the device and having the capacity to push more complex models to high power, but small, low power consumption devices is something to consider as well.
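To give a feel for what on-device detection can look like at the very simple end, here is a sketch that uses only the Python standard library. The rolling z-score rule, the `RollingAnomalyDetector` class, the window size, the threshold and the temperature values are all assumptions chosen for illustration; they are not something the vendors above ship, and a real deployment would likely push down a trained model instead.

```python
from collections import deque
from statistics import mean, stdev


class RollingAnomalyDetector:
    """Flags readings that sit far outside the recent history (hypothetical)."""

    def __init__(self, window=20, threshold=3.0, warmup=5):
        self.history = deque(maxlen=window)  # only recent readings are kept
        self.threshold = threshold           # allowed distance, in std devs
        self.warmup = warmup                 # readings needed before judging

    def observe(self, value):
        """Return True if the value looks anomalous, else remember it."""
        is_anomaly = False
        if len(self.history) >= self.warmup:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                is_anomaly = True
        if not is_anomaly:
            # Anomalies are not added, so one spike cannot skew the baseline.
            self.history.append(value)
        return is_anomaly


# Refrigerated-truck temperatures in Celsius, with one spike (made-up data).
temps = [4.0, 4.1, 3.9, 4.2, 4.0, 4.1, 3.9, 4.0, 12.5, 4.1, 4.0]
detector = RollingAnomalyDetector()
flags = [detector.observe(t) for t in temps]
print(flags)
```

The whole detector is a few dozen bytes of state, which is the kind of footprint that fits on a small, low power device.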

So again, this isn't just about necessarily pushing everything up into the cloud, it's about putting the data and the code in the right place, where we want it.

PyTorch comes out of… Facebook, sorry. It's just hard to keep track of these things. It is a Python based system, but it's designed to be very scalable.

It's equivalent to a TensorFlow, if you're familiar with TensorFlow, it's perhaps slightly more friendly from a developer experience. TensorFlow's learning from the impact of things like PyTorch and Keras to affect the API design.

But the nice thing about PyTorch is it scales down to run on your notebook. For your coding, you could maybe go to a coffee shop or go to the park or something and do some work, then come back in, and the code itself can run in an environment that has access to GPUs or clusters. The code itself doesn't really have to change all that much. The environment will sort of absorb the new computational resources when they are in the configuration. TensorFlow does that as well.
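Here is a minimal sketch of that idea, assuming PyTorch is installed. The tiny network, the synthetic data and the learning rate are arbitrary choices for illustration. The only line that cares about hardware is the one that picks `device`; the same script runs on a laptop CPU or on a machine with a CUDA GPU.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Use a GPU if the environment has one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny network (illustrative only), moved to whichever device was found.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

# Synthetic data created directly on the same device as the model.
inputs = torch.randn(64, 4, device=device)
targets = inputs.sum(dim=1, keepdim=True)

losses = []
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(device, losses[0], losses[-1])
```

Scaling out to multiple GPUs or a cluster takes more configuration than this, but the model definition and the training loop stay recognizably the same.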

These suddenly become much more interesting with respect to building things that scale up to large amounts of data running in the cloud or in a data center. But then again, the models that they're creating may not lend themselves all that well to directly run on a phone, so we're gonna need another story with respect to that, but from a training and deployment perspective, from a developer experience perspective, PyTorch is one of the most popular frameworks out there. It is Python based, obviously, PyTorch, but the models that are produced can be consumed in other runtime environments.

If you're in Java, there aren't as many options. Java itself as a language is not really syntactically as clean and easy to read as, say, a Python, therefore non-software developers aren't drawn to it in the same way that they are to Python, but a couple of really exciting platforms have emerged in the last couple of years.

One is called Deeplearning4j (Deep Learning for Java), and it's now officially part of the Eclipse project. What's nice about it is it's got a core engine that's written in C++, which can talk directly to GPUs and things like that. So, it's not a requirement, but if you run your Java code on a platform that has GPUs and you have it configured to use the C++ engine through JNI and other native platform interfaces, you can build systems that deploy into your existing Java-based ecosystems. And it's easier, if you come from the Java space, to not have to worry about learning Python or changing your runtime environment.

Deep Java Library, from Amazon, is a new one that was released last year, and it provides a similar sort of capability. I would say they're both pretty similar. Maybe the Deep Java Library has a slight edge in terms of support for some of the newer algorithms, but the Deep Learning for Java one has quite a lot of nice activity. It's got image classification, training support, and stuff for word embeddings, which is a type of natural language processing: deep learning architectures for identifying related words.

One of my favorite examples from the Deep Java Library is support for the BERT algorithm that Google announced two years ago and productionalized last year, and it's very useful for question-and-answer kinds of scenarios. In the example section, they have a model. It's a very large model, trained against a large corpus, but it is relatively easy to pull into a Java-based environment. Their example code has a paragraph that talks about BBC Japan and the fact that it operated in Japan for an 8-10-year period, and then it shut down. It's just a little bit of text that's available. Then you can ask a question and say “When was BBC Japan in operation?” The text may never actually say that in those words, but because of the complexity of the model, it can interpret the language in a way that it understands the environment.

There's a question about the Microsoft Visual Studio platform. Either through VS Code or through some of the more formal Visual Studio environments, there are tools such as the Cognitive Toolkit, which comes from Microsoft and is available in cloud-based services as well. So it's relatively easy to train stuff in Visual Studio and push it up to a cloud-based system, but it's got support for pushing to Amazon and other places as well.

The Keras API, which I'm going to mention coming up, is nice because you can take the output of that model and very easily deploy it to Cognitive Toolkit with the Microsoft stack, or PyTorch or TensorFlow, and so kind of get the best of both worlds. It's a very developer-friendly training environment.

You're describing neural networks generally in terms of layers and hooking the layers up. It doesn't take a lot of code. It's not that hard to deal with, but when it runs in one of these other environments, it can take advantage of the computational backends that are available there as well.

If you're committed to Hadoop based systems or you have support for Hadoop file systems or you're doing a lot with MapReduce kinds of systems, then the Spark API obviously provides some nice ways of pushing analytics to those, but the MLlib library had to be written with those capabilities in mind.

For many years, MLlib sort of trailed the support of different types of algorithms or variations on algorithms or some of the newer algorithms that were available in some of the Python frameworks, but if you go and check out these days, they've really caught up.

If your architecture would benefit from this, or if you're dealing with Scala or Python and want to interact with Hadoop based systems, then the MLlib library from Spark is a nice platform to consider for that.

Obviously, TensorFlow is kind of the big name everybody knows. It's been around for a while. It's gone through a couple of iterations. They're learning from the process. They're learning from other APIs, like Keras, that are influencing it. These days, I'm actually more interested in a variation of TensorFlow that I'm going to talk about in just a minute. But this version, if you go and download it and install it, is easy to run on a notebook, and it also has abstractions in the form of how you express the code as data flow graphs, such that if you attach it to a session that's aware of multiple CPUs or multiple GPUs, your code doesn't have to change, and it will be able to take advantage of that.

TensorFlow scales nicely, it transitions things into runtime systems without having to rewrite code. The models that are produced can be serialized and loaded into a bunch of other language runtimes.

Historically, you had to train in Python, but you could deploy the models in Java, JavaScript, C++, Go, Rust, etcetera. But these days, you can train TensorFlow with JavaScript, Python or Swift, and then there are transitions for taking the models and deploying them into various runtimes. But I encourage you, if you're interested, spend some time investigating TensorFlow TFX. This is their big vision about where they want to go. And they're imagining it both in terms of making it easy architecturally to take advantage of things, to make it easy to migrate things from training into production. But also putting some rigor around the training process involving schema validation, so if you're training off of data and you see a type of data that you've never seen before, there are mechanisms to flag that as an error because your test data or your production data may not have evidence of that yet, and any models trained off of it may work differently than other types of data that you've seen in the past.
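The schema validation idea can be sketched in a few lines of plain Python. This is not the TFX API; `infer_schema` and `find_anomalies` are hypothetical helper names, and the "schema" here is only the set of type names seen per field during training, which is a much simpler notion than what a real validation tool tracks.

```python
from typing import Any, Dict, Iterable, List, Set

Record = Dict[str, Any]
Schema = Dict[str, Set[str]]


def infer_schema(training_rows: Iterable[Record]) -> Schema:
    """Record, for each field, the Python type names seen in the training data."""
    schema: Schema = {}
    for row in training_rows:
        for field, value in row.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


def find_anomalies(schema: Schema, row: Record) -> List[str]:
    """Return a description of everything in `row` the schema has never seen."""
    problems: List[str] = []
    for field, value in row.items():
        if field not in schema:
            problems.append(f"unknown field: {field}")
        elif type(value).__name__ not in schema[field]:
            problems.append(f"unexpected type for {field}: {type(value).__name__}")
    for field in schema:
        if field not in row:
            problems.append(f"missing field: {field}")
    return problems


# Made-up training data, for illustration only.
training = [
    {"age": 31, "country": "IN"},
    {"age": 45, "country": "US"},
]
schema = infer_schema(training)

print(find_anomalies(schema, {"age": 28, "country": "DE"}))
print(find_anomalies(schema, {"age": "28", "plan": "pro"}))
```

The second record is flagged because it carries a string where only integers were seen, introduces a field the training data never had, and omits one that it did. A model trained on the original data has no evidence about how to treat any of those.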

They want to build a learning platform that is able to learn arbitrary kinds of computation, arbitrary kinds of data relationships, and they're about… I don't know how to quantify it. They're on their way towards that. They're not there yet, but there are a lot of tools that are growing under the umbrella of TensorFlow TFX, and that's a great place to spend some time learning, because it gives you lots of options with respect to the runtimes of how you productionalize your systems.

As I said, Keras is one of the latecomers. It was created in order to make it easier to build complex neural network structures. And then the outputs of these can be loaded into TensorFlow or PyTorch or the Cognitive Toolkit or the one that interests me more, which is TensorFlow JS. And I'll talk about that in just a second.

The guiding principle behind Keras, relative to the other ones, is that it was designed to be user friendly. And by user friendly, I mean developer friendly. It's modular, it's extensible. You can put in custom algorithms; you can put in custom libraries and optimizers. Keras really sort of raises the bar with respect to the developer experience.

But the cool thing is we have this path towards taking those models and exporting them so that they are visible and usable in a variety of backend platforms that can scale to take advantage of the hardware that's available.

But it’s also possible to take things like Keras models and convert them to run in TensorFlow JS. And you may think, “Wow! That's crazy! JavaScript is not a great language for doing numerical processing or, you know, it's not a high performance numeric language. And if you're running in JavaScript in the browser, how are you going to get that hardware acceleration that you said you needed?” That's part of what's so elegant about this approach.

Why is TensorFlow JS interesting? One of the things that we like about the web as an architecture, and one of the nice things about the REST architectural style, is this optional code-on-demand portion of it. Dr. Fielding's thesis is that an architectural style is a set of constraints that yields certain properties, and the optional code-on-demand part of that allows us to push code down to a client and have zero installation costs. There's no need to install stuff on the device if you run in a browser, which is basically a universal client.

That's one of the nice things about this, is that we can push code down to clients and update it just by having them reload as long as there's good performance.

Now, the path to getting to the design of TensorFlow JS was by way of something called the TensorFlow Playground, and if you have not played around with this, I encourage you to go there. It's a great way of experimenting and learning about neural network structures and design changes, and how does that impact the learning process, but in order to make this work, they had to figure out how to make it fast.

Because again, JavaScript itself is not all that optimized a language. The JavaScript engines have gotten pretty fast, but the language itself is hard to optimize, which is why we have WebAssembly, and I'll talk about that with respect to TensorFlow JS in a second.

But what they did was something really clever that happened outside of the browser space many years ago. So, 25 years ago, I worked on what was the first whole Earth visualization environment. So, if you think about Google Earth, that was based on one of our competitors, so obviously they were able to sell better than we were, but we were there first.

And so, you know, 25 years ago, we were building systems that had terabytes of data: digital terrain information, hyperspectral imagery, satellite imagery, all kinds of stuff. And we all had to have Silicon Graphics workstations. But within a year or two of us producing this software, GPUs suddenly started to become available in the PC market, and we were able to port the app to run on a commodity PC with a nice GPU in relatively short order. We were able to do that because of advances in 3D graphics in gaming as a platform. And when scientists saw that, they said, “Hmm, we want to be able to use that cheap hardware as well.” But the programming language to do 3D graphics is called a shader language. It looks kind of like C, and they decided, “Hey, if I pretend like I'm doing 3D graphics, I can do this.” And so, that's basically what happened here. Eventually the GPU vendors said, “Guess what? We want you to be able to do this directly”, so CUDA and OpenCL as libraries were released to make it easier to do that.

But that's exactly what the TensorFlow team did here. They were able to push the code down to the browser with zero installation and pretend like they're doing 3D graphics because everybody supports WebGL, so WebGL was an indirect path towards getting hardware acceleration in the browser.

Now, in about a year, we will see WebGPU become a standard, and that will be a more formal, direct way of accessing GPU hardware in a browser based environment, but until then we needed some way of getting there and indirectly getting it through 3D graphics was a clever way of doing it.

So why would we be interested in doing machine learning with JavaScript?

Well, it's obviously a very popular language. And if you want developers to be able to push machine learning capabilities into their applications, it's not very nice to say “Okay, now go learn Python.” If we can have them work within their own environments and tools and ecosystems and everything, that's a lot nicer.

But it's also very easy to share and deploy systems that run in a browser-based environment. Now, the nice thing about TensorFlow JS is it also runs in Node. Now, in Node, there are different security considerations there than there are in the browser. It can talk directly to CUDA libraries, but the application itself is designed to shield you from that transition.

And that's the main thing that I want you to take away from this: depending on your approach, you can build software using these very complex training environments like Keras, like TensorFlow, like PyTorch on the backend, but produce models that can then be deployed over the web and run in an environment like TensorFlow JS, or run on the backend in Node. It's a very exciting time to be able to operationalize machine learning systems.

What are some of the challenges? Obviously performance, and variations between the platforms, so we need to support that somehow. One of the ways that they did that was to design an API where the bigger sort of yellowish triangle or diamond here is the low-level API, but they built a really nice layers API on top of it, influenced by the Keras API.

It's a great developer experience in terms of building this, but it can run in a browser that has WebGL support or not, or it can run in Node with hardware acceleration in the form of CPU optimizations of the SSE and AVX2 kind, or access GPUs, or access TPUs (tensor processing units). The main thing for you as a developer, as an architect, is that the code doesn't have to change as it migrates through these various platforms, and yet it's not running slowly. It's able to take advantage of the hardware that's there.

As I wrap up, I just want to highlight a couple of things. The plain JavaScript version with just a CPU and no hardware acceleration takes about 3.3 seconds for an image classification activity. The exact same code running in an environment with WebGL support on mediocre embedded hardware, like a MacBook or a low-end MacBook Pro that has an Intel integrated GPU on the device, goes from 3.3 seconds down to 49 milliseconds. If you take the same application and run it on a box with a beefier, more powerful GPU, like a GTX 1080, you can get it down to five milliseconds. Then, if you run it in a Node environment, you can get it down as far as three milliseconds, so it's a thousand times faster, but the application doesn't change.
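The speedup factors implied by those timings are easy to work out. The short sketch below uses only the numbers quoted above; the dictionary keys are labels made up for this example.

```python
# Baseline: plain JavaScript on the CPU, about 3.3 seconds per classification.
BASELINE_MS = 3300.0

# Timings quoted in the talk, in milliseconds.
timings_ms = {
    "WebGL, Intel integrated GPU": 49.0,
    "WebGL, GTX 1080": 5.0,
    "Node": 3.0,
}

# Speedup factor = baseline time / accelerated time.
speedups = {name: BASELINE_MS / ms for name, ms in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
```

That works out to roughly 67 times faster on the integrated GPU, 660 times on the GTX 1080, and 1,100 times in Node, which is where the "thousand times faster" figure comes from.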

We've got a question: “In production, which is better between TensorFlow and PyTorch?” I would say they're about the same. I don't think anyone has a particularly strong… They're both good. They're both beneficial. They both have good models for image classification and feature extraction, things like that. I would say they're roughly comparable in terms of the results, so I don't think you can really go wrong.

This was very exciting when TensorFlow JS was released targeting JavaScript platforms, but while people were playing around with it, they went and built a WebAssembly backend as well. And that WebAssembly backend is comparable to the WebGL backend. So now, just based on being able to write it in a higher-level language like C++, recompile it to WebAssembly, and run it in a browser or outside of the browser, we have the same ability.

The only thing that's changing is the backend. We've got these various backend implementations. We'll swap out the one that talks to the GPU for one that talks to WebAssembly, and notice the higher-level application doesn't change. So that's the main thing that I want you to take away from this talk: we have the ability to do that.
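The backend-swapping design can be illustrated with a small sketch. This is not TensorFlow JS code: `Backend`, `LoopBackend`, `ZipBackend`, and `apply_layer` are hypothetical names, and the two backends are just two pure-Python ways of multiplying matrices, standing in for something like a WebGL backend and a WebAssembly backend.

```python
from typing import List, Protocol

Matrix = List[List[float]]


class Backend(Protocol):
    """The only thing application code is allowed to know about a backend."""

    name: str

    def matmul(self, a: Matrix, b: Matrix) -> Matrix: ...


class LoopBackend:
    """Stand-in for one execution engine: explicit nested loops."""

    name = "loops"

    def matmul(self, a: Matrix, b: Matrix) -> Matrix:
        rows, inner, cols = len(a), len(b), len(b[0])
        out = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for k in range(inner):
                for j in range(cols):
                    out[i][j] += a[i][k] * b[k][j]
        return out


class ZipBackend:
    """Stand-in for a second engine: row-by-column dot products."""

    name = "zip"

    def matmul(self, a: Matrix, b: Matrix) -> Matrix:
        columns = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def apply_layer(backend: Backend, inputs: Matrix, weights: Matrix) -> Matrix:
    """Application-level code: identical no matter which backend is plugged in."""
    return backend.matmul(inputs, weights)


inputs = [[1.0, 2.0]]
weights = [[3.0, 4.0], [5.0, 6.0]]

for backend in (LoopBackend(), ZipBackend()):
    print(backend.name, apply_layer(backend, inputs, weights))
```

Both backends produce the same result from the same call to `apply_layer`. In a real system the difference between backends is where the arithmetic runs, not what the application code looks like.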

One final point for you to take away and think about is, if we're running in the browser, like we can with TensorFlow JS, then we can take advantage of other technologies in the browser, like WebRTC. WebRTC allows us to build direct peer-to-peer connections. It handles network address translation. WebTorrent builds on top of WebRTC so that you can share large data sets and models and things like that. But we can also take advantage of the other features of WebRTC in the browser, like permission to access the camera.

If you go to, I'm not sure if this is going to work, I'll try this in this browser.

Here, it's going to ask for permission to have access to my camera and I will say yes. Now, we see two copies of me. Sorry about that. But I can apply things like CSS style sheets, CSS filters against the real-time video. I can also take snapshots, and set that as the source of a canvas, because we have access to WebRTC in the browser, as well as machine learning capabilities tied all together. This is an example of using WebRTC with TensorFlow JS.

There's an existing model called MobileNet. You can deploy WebRTC with very minimal code. If you go to that GitHub link that is in the slides, you can play around with this example, but it's very simple to just pull in some JavaScript code that gives you access to the camera. It does not take much code to activate the camera. You have to have permission to access it; you're not able to just turn on the camera. But once you have the video element going, you can tie it to a machine learning model like MobileNet, and it can do real-time image classification in the browser with hardware acceleration.

It's pretty good. Here's a coffee mug. It got that one right. Here's a knife. It got that one right. Here's a tomato that it thought was a bell pepper, so it's not perfect, but you can see how being able to combine the various features of what's in the browser with machine learning models that are trained in the backend, but available to load in the front end like this, is a really great combination.

I have a link here that has a bunch of resources about learning machine learning. If you've got different backgrounds, if you're new to everything, you can start at the top, or you can jump in if you're wanting to focus on statistics or data science or machine learning or deep learning kinds of systems.

There's a lot of recommendations of books and courses and videos. A lot of it's free, a lot of it's available through Safari. I hope you are interested in pursuing that and successful in moving forward. I thank you for your time.
