Nicolas Pinto Predicted the Deep Learning Tsunami but Not the Velocity - Selling to Apple Was His Answer

Nicolas Pinto, Deep Learning Lead at Apple © Robert Wright/LDV Vision Summit


Nicolas Pinto is focused on mobile deep learning at Apple. At the 2016 LDV Vision Summit, he spoke about his journey from neuroscientist and artificial intelligence researcher to entrepreneur who sold his company to Apple.

Good morning everyone. My name is Nico and I have ten minutes to talk to you about the past ten years: how I went from being a neuroscientist at MIT and Harvard, to creating a stealth deep learning startup in Silicon Valley, to finally selling it to Apple. Let's go back to 2006, about ten years ago. That's when I started to work with neuroscientist Jim DiCarlo at MIT and David Cox at Harvard. They had a very interesting approach: they wanted to do both reverse and forward engineering of the brain in the same lab. Usually these things would be done in different labs, but they wanted to do both together. They wanted to study natural systems, real brains, the system that actually works, and also to build artificial systems at a scale approaching that of natural systems, so really big scale.


LDV Capital invests in deep technical people building visual technologies. Our LDV Vision Summit explores how visual technologies leveraging computer vision, machine learning, and artificial intelligence are revolutionizing business and society.

LDV Capital regularly hosts Vision events – check out when the next one is scheduled.


What do we see when we study natural systems? Let me go very quickly here. The first thing we see when we study the visual cortex, and we are focusing on vision obviously, is that it's basically a deep neural network: many layers that share similar, repeated properties. A lot of these properties have been described in the physiology literature, in many, many studies going back to the sixties, so you can look them up. The other thing we saw is that if you look at the modeling literature, there are many, many different ideas about how these things could be working, also starting in the sixties.

I think this will work really well.

Many, many different studies, many different models, many different ideas and parameters, ultimately culminating in convolutional neural networks. That's probably the one you've heard of. These convolutional networks were popularized by Yann LeCun in the machine learning community, but also by Tommy Poggio in the computational neuroscience community with the HMAX model. All these models kind of look the same and kind of look very different; you don't really know. What's very interesting about them is that they have very specific details: some have to do with learning, some with architecture. It's really hard to make sense of all that. On the learning side alone there are many, many different ideas about how you can do learning, starting in the sixties and moving from computational neuroscience into machine learning. I'm not going to go into it, but after so many years it's very hard to explore this particular space.

© Robert Wright/LDV Vision Summit


What we saw back in the day was that the hypothesis space of all these different ideas, and of combinations of them, was overwhelming to explore. As a graduate student you're looking at all of this; they all kind of make sense, kind of don't, and you're not really sure how to combine them. The space was largely unexplored. If you take just one particular idea, for example here, you will see that it has many, many different parameters, depicted in red here. You have a lot of parameters, and a lot of models, and it's very overwhelming. Again, here, for deep learning, so many parameters. How do you set those parameters?
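
To get a feel for why this space is overwhelming, here is a tiny back-of-the-envelope sketch. The parameter names and the number of choices per parameter are made up for illustration, not taken from the actual models in the talk, but even a handful of options per layer multiplies out to billions of candidate configurations.

```python
# Illustrative only: parameter names and value counts are made up to show
# how quickly a model family's hypothesis space explodes.
from math import prod

# Hypothetical per-layer choices for a 3-layer biologically inspired model
choices_per_parameter = {
    "filter_size": 4,       # e.g. 3, 5, 7, 9
    "num_filters": 4,       # e.g. 16, 32, 64, 128
    "nonlinearity": 3,      # e.g. threshold, saturation, none
    "pooling_size": 3,
    "pooling_type": 3,      # e.g. mean, L2, max
    "normalization": 4,     # neighborhood size / on-off variants
    "learning_rule": 3,
}

num_layers = 3
per_layer = prod(choices_per_parameter.values())
total = per_layer ** num_layers
print(f"{per_layer:,} configurations per layer")        # 5,184
print(f"{total:,} configurations for {num_layers} layers")  # ~1.4e11
```

Hand-tuning a space like that one parameter at a time is hopeless, which is exactly the problem with the "usual formula" below.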

The usual formula is that you take one grad student in a given lab. You take one particular model, usually a model derived from that particular lab, and its size will be limited by runtime. At the time everyone was running MATLAB. You tweak all the different parameters by hand, one by one, and hopefully you can crush a few benchmarks. You hope you can get that work published and you claim success. And don't forget, at the end of all of this, you get one Ph.D.

But if you tweak all of these different parameters by hand, one by one, without really knowing what you're doing, it's a little bit like what some people call graduate student descent: taking one grad student and exploring this space one slow step at a time. That's very aggravating and very, very boring.

We wanted to do something a little bit different. We would still take one grad student, that would be me in this case, but what we wanted to do was test many, many different ideas, and take big models, models that approach the scale of natural systems. Hopefully we could crush a few benchmarks as well. Maybe we could even get that published, and hopefully get one Ph.D. at the end of it.


If you want to have really good ideas, it's fairly simple. You just need to get many, many, many, many different ideas and just throw the bad ones away.

-Nico Pinto


The inspiration, I got it from this guy, Linus Pauling, double Nobel prize winner. He told me, well, he told everyone: if you want to have really good ideas, it's fairly simple. You just need to get many, many, many, many different ideas and just throw the bad ones away. Very simple. In biology we know how to do that. It's called high-throughput screening. It's a very fancy name, and it's a very beautiful technique that kind of imitates natural selection. Let me show you how it works in biology.

What you do is you plate a diversity of organisms, say you're looking for an organism with some property you care about. You allow them to grow and interact with the environment. You apply some sort of challenge for the property you're looking for. You collect the surviving colonies, and ultimately you study and repeat until you find an organism that fits the bill. In biologically inspired computer vision, you do the same thing. You generate a bunch of random models from these many, many different ideas, some from you, some from the literature. You apply some sort of learning to learn the synaptic weights and interact with the environment, test with a screening task, a particular vision task in this case, skim off the best models, study, repeat, and validate on other tasks, and hopefully you get what you're looking for: a really good visual system.
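
Here is a minimal sketch of that screening loop in Python. It is not Pinto's actual pipeline: the search space, the `train_unsupervised` and `evaluate_on_task` helpers, and the screening task are placeholders for whatever learning rule and vision benchmark you plug in.

```python
import random

# Hypothetical hyperparameter ranges, standing in for the many ideas and
# parameters drawn from the neuroscience and modeling literature.
SEARCH_SPACE = {
    "filter_size": [3, 5, 7, 9],
    "num_filters": [16, 32, 64, 128],
    "nonlinearity": ["threshold", "saturate", "none"],
    "pooling": ["mean", "l2", "max"],
}

def sample_model_config(rng):
    """'Plate' one random organism: pick one value for every parameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def train_unsupervised(config, data, rng):
    """Placeholder for learning the model's 'synaptic weights' from data."""
    return {"config": config}  # a real pipeline would return trained filters

def evaluate_on_task(model, screening_task, rng):
    """Placeholder screening score; a real pipeline would measure accuracy
    on a small vision benchmark. Here it is just a random number."""
    return rng.random()

def high_throughput_screen(data, screening_task, n_candidates=1000, keep=10, seed=0):
    """Generate many random models, screen them on a task, keep the best."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):                            # plate the candidates
        config = sample_model_config(rng)
        model = train_unsupervised(config, data, rng)        # let them "grow"
        score = evaluate_on_task(model, screening_task, rng)  # apply the challenge
        scored.append((score, config))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:keep]                                     # skim off the survivors

# Example: screen 1,000 random configurations and keep the top 10.
best = high_throughput_screen(data=None, screening_task=None)
for score, config in best[:3]:
    print(f"{score:.3f}  {config}")
```

The survivors would then be validated on held-out vision tasks and the whole loop repeated, which is where the raw compute described next comes in.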

What's nice about this particular technique is that even though we call it high-throughput screening, it's basically just brute force. Right? It's a very nice name, we like that as scientists, but it's pure brute force. In this particular case we needed a lot of compute power to run all of these models. This was back in 2006, basically the time when GPUs, these very cheap graphics processing units, started to become very, very powerful and actually programmable. We got lucky because we caught that trend very early on, basically ten years ago. The problem is that it took a while in the beginning, because these things were quite complicated to program. We had to build everything from scratch: the computers, the software stack, the programming. It was a lot of fun but quite hard at the beginning, since there were no libraries for it. We even went as far as building a cluster of PlayStation 3s back in 2007 to get the raw brute force power we needed for these particular experiments.

We also got access to hundreds of GPUs, because at the time the national supercomputing centers were building a lot of supercomputers with tons of GPUs, but not many people knew how to use them properly, so we got access to all of them because they needed to be used. With a brute force approach, we could do that. We also taught this back in 2008 and 2010 at Harvard and MIT: how to use these GPUs to do more computational science cheaply with these graphics units.

Let me skip forward. We applied this technique: we came up with a model that is all-encompassing, has tons of parameters, and can encompass all these different ideas I just mentioned, then applied our brute force technique, and at the end of the day we got very good results. We were very surprised. The results were so surprising that they even got featured in Science in 2010. Not only were we surprised, even Yann LeCun himself was surprised. He told us that some of this work was actually influential, in the sense that we uncovered some very important non-linearities using this kitchen-sink approach.

© Robert Wright/LDV Vision Summit


Since we had some very interesting results, we wanted to see if we could apply this to the real world. We compared our technology to a commercial system called face.com, which later got bought by Facebook, and we were able to crush their performance. We even got in touch with Google back in 2011, and they told us that we had been influencing a little bit of their work, the early Google Brain work.

We decided to start a company based on this. The company was called Perceptio. It was a very early startup; you probably won't find much information about it. The goal of Perceptio was to build brain-inspired A.I. that you can trust. Trust was very important to us: we wanted to make sure we preserved the privacy of users.

Why a startup? Well, we wanted real progress, but we saw that academia and industry were kind of optimizing for progress and kind of not. On one side you have academia, which is a credit economy. In a credit economy, what you do is plant flags and guard territory. It's all about me, me, me first. You don't really know what's going on; you just have to plant flags, because that's how you get a career. Industry is a profit economy. You have to make money, and a lot of the time what that means is selling user data. We wanted to create a new kind of organization that would not operate like this. We had grandiose ideas, like many others: the idea of starting something at the intersection of an incubator, an industry lab, and an academic lab, focusing on progress only. It didn't work out in the end, but that's what we wanted to do.

The application we were focusing on was a small social camera, and our moat, our competitive advantage, was going mobile first. Everyone was going to the cloud; we wanted to bet against the cloud and go mobile first. Everyone was surprised at the time. It was 2012, and everyone was running deep learning on hundreds of thousands of CPU cores or on big GPUs. They'd say, "Why would you even try to do this? It's not even possible." On paper it looked like it couldn't be done on the device, since there isn't much compute there. But ultimately we did it, and a lot of the things we uncovered back in 2012 are now being rediscovered by the community. Some people claimed that we could not do it because we would not see enough data. Well, it turns out that if you're the most popular camera in the world, you get to see a lot more data than the cloud does. With this camera, if you sit right next to the sensor, you can get dozens of frames per second, and only a fraction of that will ever go to the cloud.
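
A rough back-of-the-envelope, with assumed numbers that are not from the talk, shows why sitting next to the sensor matters: a live camera feed exposes orders of magnitude more frames on-device than the handful of photos a user ever uploads.

```python
# Back-of-the-envelope with assumed numbers (not from the talk):
frames_per_second = 30            # a typical live camera preview
minutes_of_camera_use_per_day = 10
photos_uploaded_per_day = 5       # roughly what a cloud service would see

frames_seen_on_device = frames_per_second * 60 * minutes_of_camera_use_per_day
print(f"frames seen on-device per day: {frames_seen_on_device:,}")       # 18,000
print(f"images seen in the cloud per day: {photos_uploaded_per_day}")
print(f"ratio: {frames_seen_on_device // photos_uploaded_per_day:,}x")   # 3,600x
```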

People get it now. We could preserve privacy. Ultimately, we were able to predict the timing of the deep learning tsunami, but not its velocity. We had to scale, and with a small company the only way for us to scale was an acquisition. The problem with acquisitions is that most companies run on this profit economy of selling user data, so it was really hard for us to find the right home for our technology and scale it. We thought very hard, and we found a little fruit company back in Cupertino that, you know, thinks very differently. They think different about these things: they really do care about user privacy and about not selling user data. That's where I am right now, that's where Perceptio is right now, and that's it. This is the end of the ten minutes, ten years of my life. Thank you very much.