An Academic Budget Inspired Raquel Urtasun to Design Affordable Solutions for Self-Driving


One week until our LDV Vision Summit 2018 - May 23 & 24 in NYC at the SVA Theatre. Limited tickets are still available to see 80 speakers in 40 sessions discuss the cutting edge in visual tech. Register now!

Raquel Urtasun is a recipient of the NVIDIA Pioneers of AI Award, three Google Faculty Research Awards, and several other honors. She lectures at the University of Toronto and the Vector Institute and is the head of Uber ATG Toronto. At our LDV Vision Summit 2017, she spoke about how autonomous vehicles with human-level perception will make our cities smarter and better places to live.

It's my pleasure to be here today, and I wanted to introduce who I am just in case you guys don't know.

So I have three jobs, which keeps me quite busy. I am still an academic: one day a week I am at the University of Toronto and the Vector Institute, which I co-founded with a whole bunch of people that you see in the picture, including Geoff Hinton. And the latest, greatest news, I guess, is that as of May 1st, 2017, I'm also heading a new lab, Uber ATG Toronto, so self-driving cars are in Canada now and that's really, really exciting.

Today, I'm going to talk about what led to the Uber acquisition [of the University of Toronto team]. Perhaps you have already seen another discussion about why we need self-driving cars, but what is very important for me is that we need to lower the risk of accidents, we need to provide mobility for the many people that right now cannot go where they want to go, and we need to think of the future of public transportation or ride sharing. In particular, we need to share resources. Ninety-five percent of the time the car is parked, so we are just taking up space on our planet for no real reason.

© Robert Wright/LDV Vision Summit 2017

If we look at what is typically going on in self-driving car companies, we find they're pretty good at localization, path planning, and obstacle avoidance, but there are two things that they do which actually make them not super scalable. The first thing is LIDAR: the prices are dropping, but it is still quite expensive to buy a decent LIDAR. And the other thing, which is the skeleton in the closet, is actually mapping.

What I have been working on for the past seven years is how to make solutions that are scalable, meaning cheap sensors, and trying to drive without maps or with as little prior knowledge as possible.

Now if you want to do something of this form, we need to think about many different things at once. The first thing that was difficult for us as academics was data, and so many years ago we created what is, I guess, still the only benchmark for self-driving, which is KITTI. And to my despair, this is still the only benchmark, which I don't understand.

If we want to get rid of the LiDAR, get rid of the maps, one of the things that we need to have is robust, good, and fast stereo 3D reconstruction.

The other thing that is important is learning. Right, one can't just handcraft everything, because we need to be robust to scenarios that we have never seen before. We need holistic models to reason about many things. At the end of the day, we have a fixed computation budget for many tasks, and we need to think about hardware at the same time.

If we want to get rid of the LiDAR, get rid of the maps, one of the things that we need to do is apply deep learning to get robust, good, and fast stereo 3D reconstruction. This can run in real time and after forty meters can basically almost replace the LIDAR.
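
Editor's note: the talk describes a learned stereo network; as a rough illustration of how depth falls out of a calibrated stereo pair, here is a minimal classical sketch using OpenCV's semi-global block matching. The file names, focal length, and baseline are placeholders, not values from the talk.

```python
# Minimal sketch: classical block-matching stereo with OpenCV, not the
# deep-learning stereo network described in the talk. File names, focal
# length, and baseline below are illustrative placeholders.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # rectified left image
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)  # rectified right image

# Semi-global block matching: disparity comes back in 1/16-pixel units.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,      # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,            # smoothness penalty for small disparity changes
    P2=32 * 5 * 5,           # smoothness penalty for large disparity changes
)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0

# Depth from disparity for a calibrated rig: Z = f * B / d
focal_px = 721.5    # focal length in pixels (placeholder, KITTI-like)
baseline_m = 0.54   # camera baseline in meters (placeholder, KITTI-like)
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_px * baseline_m / disparity[valid]
print("median depth of valid pixels (m):", np.median(depth[valid]))
```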

Another thing that you need to do is work on perception. I've spent the past year and a half obsessed with instance segmentation. This is where you're segmenting the image: the idea is that you have a single image and you are interested in labeling every pixel, not just with the category (car, road), but also estimating that this is one car, this is another car, etc. And this is a particularly difficult problem for deep learning because the loss function is agnostic to the permutation [of instance labels]. So we've built some interesting technology lately based on the watershed transform. It scales really well; it's independent of the number of objects, so you can run in real time for anything. And this shows generalization: it's trained on one set of cities and tested on another set of cities. You see the prediction in the middle and the ground truth on the right. Okay so, even with crowded scenes, [the model] can actually do pretty well.
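
Editor's note: the method in the talk learns a watershed energy with a deep network; the toy sketch below only illustrates the classical final step, using scikit-image's watershed on a distance transform to split a binary mask into instances. The mask is synthetic and the parameters are arbitrary.

```python
# Illustrative only: splitting a binary semantic mask into instances with a
# watershed on its distance transform. The talk's approach learns the
# watershed energy with a deep net; this shows just the grouping step.
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def mask_to_instances(binary_mask: np.ndarray) -> np.ndarray:
    """Return an int label image: 0 = background, 1..N = instances."""
    # Distance to background: high in the middle of each object.
    distance = ndi.distance_transform_edt(binary_mask)
    # One marker per local maximum of the distance map (object "centers").
    peaks = peak_local_max(distance, min_distance=10,
                           labels=binary_mask.astype(int))
    markers = np.zeros_like(binary_mask, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # Flood from the markers over the negated distance map.
    return watershed(-distance, markers, mask=binary_mask)

# Toy example: two touching disks become two separate instances.
yy, xx = np.mgrid[0:100, 0:100]
mask = ((xx - 35) ** 2 + (yy - 50) ** 2 < 400) | ((xx - 65) ** 2 + (yy - 50) ** 2 < 400)
labels = mask_to_instances(mask)
print("instances found:", labels.max())
```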

Now, if you want to do self-driving, labeling pixels is not going to get you there. You need to really estimate what's happening everywhere in the scene. These are our latest, greatest results on detection and tracking. This is actually very interesting technically: you can backpropagate through solvers. And here you see the results of what we have as well.

In general, what you want to do is estimate everything that is in the scene. So here we have some results from a couple of years ago, with a single camera mounted on top of the car. The car is driving through intersections it has never seen before and is able to estimate the local map of the intersection; it is creating the map on the fly. It is doing localization for our own car as well as estimating where every other car is in the scene, and the traffic situation that you see on the bottom left, even though it doesn't see the traffic lights or anything like that. The cars are color-coded by their intentions: basically, here we are estimating where everybody is going in the next couple of seconds. And this is, as I said, [with a] single camera [and] new scenarios that we haven't trained on.

Another thing that you need to do is localization. Localization is an interesting problem, because typically the way it is done is the same as with mapping: you go around and collect how the world looks, and that's really expensive, meaning that basically you need to know the appearance of the world that [the cars] are in at every point in time.

It takes thirty-five seconds of driving to actually localize with a precision of 2 meters

We look instead at cartographic maps and the motion of the vehicle to estimate really quickly where the vehicle is in the global coordinate system. Okay, so you see here, you have a probability distribution over the graph of the road. As the vehicle drives, you have a few modes of the distribution, and very quickly we know exactly where this vehicle is.

This is a Manhattan-like scenario, where there are two modes of the distribution, but again, very soon we converge to a single location. And this is for the whole city of Karlsruhe, which is two thousand kilometers of road. It takes thirty-five seconds of driving to actually localize with a precision of 2 meters, which is the precision of the maps that we use. These maps are available for free online for sixty percent of the world, so you can just download them; you don't need to capture anything, it's free.
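
Editor's note: the actual system is a probabilistic filter over real road maps; the toy sketch below only illustrates the underlying idea of a discrete Bayes filter whose belief over graph edges collapses as observed turns rule out locations. The graph, turn labels, and noise level are all invented for illustration.

```python
# Toy discrete Bayes filter over a made-up road graph, illustrating how a
# belief over "where could we be?" collapses as observed turns (e.g. from
# visual odometry) rule out alternatives.
import numpy as np

# Each directed edge has outgoing transitions labeled by turn type.
graph = {
    "A": {"straight": "B", "left": "C"},
    "B": {"right": "D"},
    "C": {"straight": "D"},
    "D": {"left": "A", "straight": "B"},
}
edges = sorted(graph)

def bayes_update(belief, observed_turn, p_correct=0.9):
    """One predict+update step given the turn type measured by odometry."""
    new_belief = {e: 0.0 for e in edges}
    for edge, prob in belief.items():
        for turn, nxt in graph[edge].items():
            # Measurement likelihood: high if the map turn matches the observation.
            likelihood = p_correct if turn == observed_turn else (1 - p_correct)
            new_belief[nxt] += prob * likelihood / len(graph[edge])
    total = sum(new_belief.values())
    return {e: p / total for e, p in new_belief.items()}

# Start fully uncertain, then observe a short sequence of turns.
belief = {e: 1.0 / len(edges) for e in edges}
for turn in ["straight", "right", "left"]:
    belief = bayes_update(belief, turn)
    print(turn, {e: round(p, 2) for e, p in belief.items()})
```

After a few observed turns the probability mass concentrates on a single edge, which is the qualitative behavior described in the talk.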

Now, in terms of mapping: why do car companies, or self-driving car players, use maps? You can think of a map as a sensor which basically tells you the static part of the scene. It gives you robustness and it allows you to only look at the dynamic objects.

The problem with the way the mapping is done is that you have, say, one of these cars with these expensive sensors, and basically you drive around the world, you have your data, and then there is some labeling process where you basically say where the roads are, where the lanes are, where the possible places you can park are, etc. That gives you very small coverage, because this is at the vehicle level and it is very expensive. As an academic, I asked, "Can we actually do this by spending zero dollars?"

In those terms, we figured you can use aerial or satellite images. Satellites pass around the earth twice a day, so you have this up-to-date view of the world. And we created methods that can automatically extract HD maps of the form that you see on the top, where you have lanes, parking spots, sidewalks, etc. It takes only 3 seconds on a single computer to estimate this, per kilometer of road. Basically, with a very small cluster of computers, you can run the whole world and have up-to-date estimates.

© Robert Wright/LDV Vision Summit 2017

Five and a half years ago, I created KITTI. And one thing that's bugged me about mapping is that it is only the players, the companies, that are actually working on this. So I created TorontoCity. This is about to go online soon. The Greater Toronto Area is twenty percent of the population of Canada; it's huge, and we have all these different views: panoramas, LiDAR, cameras from aerial views, drones, etc.

Now, as an academic, I cannot pay labelers to label [the images]. Just the aerial images would cost between twenty and thirty million dollars to label. What I did was go to the government, and I pulled together all this information from maps that the government has captured, including 3D maps of the city, every single building, etc. And then basically we developed algorithms that can align these sources of information, including all the different sources of imagery as well as the maps, and automatically created ground truth. And here you see the quality of the ground truth is really, really good. Now we have ground truth for the whole Greater Toronto Area, and we're going to put the benchmark online. These are the tasks that you can participate in, for instance, semantic segmentation.

Something else that we have built since then is ways to extract these maps automatically. You can do it from aerial images, and one of the things that was interesting is that from the panoramas you can actually get centimeter-accurate maps automatically. That was actually quite interesting. Alright, to conclude: for the last seven years, my group has been working on ways to make affordable self-driving cars that scale, with cheap sensing and perception, localization, and mapping. Thank you.

LDV Capital is focused on investing in people building visual technology businesses. Our LDV Vision Summit explores how visual technologies leveraging computer vision, machine learning and artificial intelligence are revolutionizing how humans communicate and do business.

Tickets are available for the LDV Vision Summit 2018, where you can hear from other amazing visual tech researchers, entrepreneurs, and investors.

Facebook is building a visual cortex to better understand content and people

Manohar Paluri, Manager of Computer Vision Group at Facebook ©Robert Wright/LDV Vision Summit

Manohar Paluri is the Manager of the Computer Vision Group at Facebook. At our LDV Vision Summit 2017 he spoke about how the Applied Machine Learning organization at Facebook is working to understand the billions of pieces of media content uploaded every day to Facebook in order to improve people’s experiences on the platform and connect them to the right content.

Good morning everyone. Hope the coffee's kicking in. I'm gonna talk about a specific effort, an umbrella of efforts, that we are calling ‘building Facebook's visual cortex.’

If you think about how Facebook started, it was people coming together, connecting with friends, with people around you. And slowly, through these connections, using the Facebook platform to talk about things that they cared about, to upload their moments, whether it's photos. Some may not be the right thing for the platform; some are obviously moments that you care about. And slowly moving towards video.

This is how Facebook has evolved. The goal for Applied Machine Learning, the group that I'm in, is to take the social graph and make it semantic. What do I mean by that? If you think about all the nodes there, the hard nodes are basically what people actually interact with, upload, and so on. But the soft nodes, the dotted lines, are what the algorithms create. This is our understanding of the people and the content that is on the platform.


LDV Capital invests in people building visual technology businesses. Our fifth annual LDV Vision Summit on May 23 & 24 will discuss the disruptive force of computer vision, machine learning and AI.

Tickets are now available for the LDV Vision Summit 2018 to hear from other amazing visual tech researchers, entrepreneurs, and investors.

Special prices available for startups, researchers, & groups.


This is useful, and thinking about it in this way is scalable, because whatever product or end technology or experience you're building now has access to not only the social graph but also the semantic information. So you can use it in various ways. This is something that has actually revolutionized the use of computer vision specifically.

Now, if you take a step forward, most likely the last thing that you liked on Facebook is either a photo or a video. When I started in 2012 we were doing a lot of face recognition, but we've started moving beyond that. Lumos is the platform that was kind of born out of my internship. So I'm super excited because Lumos today processes billions of images and videos. It has roughly 300-odd visual models that are being built by computer vision experts and general engineers; they don't necessarily need to have machine learning and computer vision expertise. It uses millions of examples. Even though we are making significant progress on supervised and unsupervised learning, the best models today are still fully supervised models.

Now, a very quick crash course: the state-of-the-art models today are deep residual networks. Typically what you do is you have a task, and you take this state-of-the-art deep network and train it for that task. It takes a few weeks, but if you have distributed training then you can bring it down to hours. If you have a new task, the obvious baseline is to take a new deep net and train it for the new task.

But think about Facebook, and think about billions of images and hundreds of models. You can't just multiply those together; it's not feasible. So what do you do? The nice thing about deep networks is their hierarchical representations: the lower layers are generalized representations, and the top layers are specific to the task. Now, if you are at Facebook and you have a task, you should be able to leverage the compute on billions of these images again and again for your task.


So with Lumos, what people can do is actually plug in at whatever part of the layers suits them, and they make the trade-off between compute and accuracy. This is crucial to scale all the efforts that we are doing to billions of images. Now, as the computer vision group, we might not understand the implications of a loss of accuracy, or of making something faster at the cost of accuracy. But the group that is building these models knows this very well. So with Lumos they are able to do this in a much simpler manner and in a scalable way.
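
Editor's note: Lumos is internal to Facebook, so here is a generic PyTorch sketch of the pattern being described, reusing frozen features from a publicly available pretrained backbone and training only a small task head; the task and class count are hypothetical.

```python
# Generic sketch of the idea (not Lumos itself): reuse frozen features from a
# pretrained backbone and train only a small per-task head. Choosing an earlier
# or later layer trades accuracy against reusable, shared compute.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()          # expose the 2048-d pooled features
for p in backbone.parameters():
    p.requires_grad = False          # shared, frozen compute
backbone.eval()

num_classes = 10                     # hypothetical new task
head = nn.Linear(2048, num_classes)  # tiny task-specific model
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    with torch.no_grad():            # features could be precomputed once and cached
        feats = backbone(images)
    logits = head(feats)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch just to show the call shape.
print(train_step(torch.randn(4, 3, 224, 224), torch.randint(0, num_classes, (4,))))
```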

What is a typical workflow for Lumos? You have a lot of tools that allow you to collect training data. One of the nicest things about being at Facebook is that you have a lot of data, but it also comes with a lot of metadata. It could be hashtags, it could be text that people write, and it could be any other metadata. So you can use lots of really cool techniques to collect training data. Then you train the model, and you have this control over the trade-off between accuracy and compute.

You deploy the model with the click of a button, and the moment you deploy the model every new photo and video that gets uploaded now gets run through your model without you doing any additional bit of engineering. And you can actually refine this model by using active learning. So you are literally doing research and engineering at scale together every day. And you don't need to be an expert in computer vision to do that.

Here is a series of photos that come in and get classified through Lumos, and the concepts that are built through Lumos. Obviously you can only look at a certain portion, because we get an image every four minutes.

Lumos today powers many applications. These are some of them. A specific application that I thought would be interesting to talk about here is population density map estimation. What happened was that the Connectivity Lab cared a lot about where people live, so that we can actually provide connectivity technology, different kinds of it, whether it's an urban area or a rural area. So what did they do? They went to Lumos and trained a simple model that takes a satellite tile and says whether it contains a house or not. And they applied it billions of times on various parts of the world.

Here is a high-resolution map on the right side that we were able to generate using this Lumos model. And they didn't have to build any new deep net; they just used a representation from one of the existing models. If you apply this billions of times you can detect the houses. And if you do it at scale ... This is Sri Lanka. This is Egypt. And this is South Africa. So based on the density of where people live, you can now use different kinds of connectivity technology, whether it's drones, satellites, or Facebook hardware installed in urban areas.
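
Editor's note: a schematic sketch of the kind of pipeline described, not Facebook's actual one: slide a binary "building or not" classifier over tiles of a large satellite image and accumulate the results into a coarse density grid. `classify_tile` is a stand-in for any trained model.

```python
# Schematic sketch (not Facebook's pipeline): tile a large satellite image,
# run a binary building classifier per tile, and accumulate a density map.
import numpy as np

def classify_tile(tile: np.ndarray) -> bool:
    """Placeholder for a trained 'contains a building?' classifier."""
    return tile.mean() > 0.5   # dummy rule so the sketch runs

def density_map(image: np.ndarray, tile: int = 64) -> np.ndarray:
    rows, cols = image.shape[0] // tile, image.shape[1] // tile
    density = np.zeros((rows, cols), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            patch = image[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            density[r, c] = float(classify_tile(patch))
    return density

# Fake single-band satellite image; in practice this loop would be sharded
# across a cluster so estimates stay fresh as new imagery arrives.
satellite = np.random.rand(1024, 1024)
print(density_map(satellite).mean())
```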


...What Lumos is trying to do, it's actually trying to learn a universal visual representation irrespective of the kind of problem that you are trying to solve.


If you think about what Lumos is trying to do, it's actually trying to learn a universal visual representation irrespective of the kind of problem that you are trying to solve. At F8, which is the Facebook developer conference, we talked about Mask R-CNN. This is the work that came out of Facebook research where you have a single network that is doing classification, detection, segmentation, and human pose estimation.

Think about it for a minute. Just five years ago if somebody had told you that you have one network, same compute, running on all photos and videos, that would give all of this, nobody would have believed it. And this is the world we are moving to. So there is a good chance we'll have a visual representation that is universal.
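
Editor's note: torchvision ships a public pretrained Mask R-CNN, so here is a minimal sketch of running one network and getting boxes, labels, scores, and per-instance masks in a single pass; `photo.jpg` is a placeholder path, and this is not Facebook's internal model.

```python
# Minimal sketch: running torchvision's pretrained Mask R-CNN on one image to
# get boxes, class labels, scores, and per-instance masks from a single network.
import torch
from torchvision import models
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

weights = models.detection.MaskRCNN_ResNet50_FPN_Weights.DEFAULT
model = models.detection.maskrcnn_resnet50_fpn(weights=weights).eval()

img = read_image("photo.jpg")                 # placeholder path, uint8 CHW image
batch = [convert_image_dtype(img, torch.float)]

with torch.no_grad():
    out = model(batch)[0]                     # dict: boxes, labels, scores, masks

keep = out["scores"] > 0.5
categories = weights.meta["categories"]
for label, box in zip(out["labels"][keep], out["boxes"][keep]):
    print(categories[int(label)], box.tolist())
```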


Here are some outputs, which are detection and segmentation outputs, and when you actually compare them to ground truth, even for smaller objects, the ground truth and the predictions of the algorithm actually match pretty well, and sometimes we cannot distinguish them. And taking it further you can do segmentation of people and pose estimation. So you can actually start reasoning about what activities people are engaging in in the photos and videos.

Now, as rightly pointed out before, the way we are moving is that this understanding and this technology is moving to your phone. So your device, what you have, is pretty powerful. Here are a couple of examples where the camera is understanding what it's seeing, whether it's classification of the scene and objects or understanding the activities of the people. This is Mask R-CNN2Go, which is running at a few frames per second on your device. To do this we took a look at the entire pipeline, whether it's the modeling, the runtime engine, or the model size.

Taking it a step further, the next frontier for us is video understanding. Here, I'm not going to play the video, but rather show you the dashboard that tells you what is happening in the video. Here is the dashboard. We use our latest face recognition technology to see when people are coming in and going out. We use the latest 3D convnet-based architectures to understand actions that are happening in the video. And we understand the speech and audio to see what people are talking about. This is extremely important. Now, with this kind of dashboard, we have a reasonable understanding of what's happening in the video. But we are just getting started; the anatomy of video is much more complex.

So how do you scale this to 100% of videos? That is non-trivial. We have to do a lot of really useful and interesting things to be able to scale to 100% of videos. We're doing face recognition and actually doing friend tagging. The next step is taking it further, doing segmentation in video and doing pose estimation, so we are able to understand that people are sitting, people are standing, talking to each other, and so on, together with the audio.

That is basically the first layer of peeling the onion in video, and there's a lot more we can do here. Now another step that we are taking is connecting the physical world and the visual world. As rightly pointed out, we need to start working with LIDAR and 3D data. Here, what you see is LIDAR data: we are using a deep net to do semantic segmentation of this three-dimensional LIDAR data and doing line-of-sight analysis on the fly.

We brought down the deployment of Facebook hardware to connect urban cities from days and months to hours, because we were able to use computer vision technology. I have only ten minutes to cover whatever we could do, so I'm going to end it with one statement. I really believe to be able to bring AI to billions of people you need to really understand content and people. Thank you.

You Are Brilliant and You Want More Exposure

Our LDV Vision Summit is coming up on May 23-24, 2018 in New York. We bring together top technologists, researchers, startups, media/brand executives, creators and investors with the purpose of exploring how visual technologies leveraging computer vision, machine learning and artificial intelligence are revolutionizing how humans communicate and do business.   

Through March 25 we are collecting applications to the Entrepreneurial Computer Vision Challenge and the Startup Competition.

Every second of every day, people around the world are publishing research papers and launching new startups that leverage computer vision, machine learning and artificial intelligence.

Researchers and professors want their work to be noticed in the midst of a flood of new work.

Entrepreneurs want to build valuable businesses, get covered in Techcrunch, Wired, Wall Street Journal, want to raise financing and want happy customers.

We want to help you!

We have been organizing the premier annual visual technology summit since 2014 called the LDV Vision Summit with the main focus of showcasing brilliant people like YOU!

Entering competitions increases your odds of being recruited, raising capital, or selling for over $100M because your work becomes visible to an audience of actors working to advance your field.  The key is to focus on attending and competing where it is most contextually relevant for you to further your goals. If you’re in visual tech, that means the LDV Vision Summit.

We bring together top technologists, researchers, startups, media/brand executives, creators and investors with the purpose of exploring how visual technologies leveraging computer vision, machine learning and artificial intelligence are revolutionizing how humans communicate and do business. 

Speakers and judges come from Apple, Cornell Tech, Qualcomm, NBCUniversal, Stanford, Facebook, MIT, Greylock Partners, CMU, Wired, Spark Capital, Nvidia, First Round Capital, Flickr, Refinery29, Lytro, Time Warner, Samsung, Magic Leap, Ooyala, Hearst, Google and many more.

Enter and present your brilliance at our 2018 LDV Vision Summit Startup Competition or the Entrepreneurial Computer Vision Challenge (ECVC). Application deadline is March 25, 2018.   

Sean Bell, CEO & Co-Founder, GrokStyle from Cornell Tech. ©Robert Wright/LDV Vision Summit

Past competitors in the ECVC, like 2016 winner, GrokStyle, have reaped the rewards of competing. “The most valuable part of the Vision Summit was connecting with three different companies potentially interested in building on our technology, and with four different potential investors/advisors,” said CEO & Co-founder Sean Bell after the Vision Summit.

Divyaa Ravichandran from CMU showcased her project “Love & Vision” ©Robert Wright/LDV Vision Summit

For other 2016 ECVC competitors like Divyaa Ravichandran, who was a recent graduate of Carnegie Mellon University at the time, “attendance at LDV Vision Summit last year gave me visibility and I came in contact with my current employer at Facebook!”

Rosanna Myers CEO & Co-Founder, Carbon Robotics Startup Competition Winner ©Robert Wright/LDV Vision Summit

2016 Startup Competition winner, Carbon Robotics, was looking to “connect with recruits and investors in NYC. The experience was great for that. Getting to pitch on the main stage was amazing, because it made it easy for people to learn about what we’re working on. After the pitch, we were approached by tons of high-quality engineers and potential partners, so it was a great success,” said Rosanna Myers CEO & Co-founder.

“Following the summit, [London based startup] The Smalls raised investment with an angel investor who was in the audience. The funding was used to make key hires and improve technology. The Smalls has continued to grow at 300% per year and now has offices in both London and Singapore,” reports Kate Tancred, CEO & Founder of The Smalls, who was a finalist in the 2015 Startup Competition.

The 2018 LDV Vision Summit on May 23 & 24 in NYC will include over 80 international speakers with the purpose of exploring, understanding, and shaping the future of imaging and video in human communication. The best startups and computer vision experts who compete in the Startup Competition and the ECVC will be showcased alongside these industry leaders.

The Startup Competition is for promising visual technology companies with less than $2M in funding.

The ECVC is for any Computer Vision, Machine Learning and/or Artificial Intelligence students, professors, experts or enthusiasts working on a unique solution that leverages visual data to empower businesses and humanity. It provides contestants the opportunity to showcase the technology piece of a potential startup company without requiring a full business plan. It is a unique opportunity for students, engineers, researchers, professors and/or hackers to test the waters of entrepreneurism.

Competitions are open to anyone working in our visual technology sector such as: photography, videography, medical imaging, analytics, robotics, biometrics, LIDAR, radar, satellite imaging, computer vision, machine learning, artificial intelligence, augmented reality, virtual reality, autonomous vehicles, media and entertainment, gesture recognition, search, advertising, cameras, e-commerce, visual sensors, sentiment analysis and much more.

Judges for the competitions include top industry venture capitalists, entrepreneurs, journalists, media executives and companies that are recruiting. Past judges included Josh Elman of Greylock, Tamara Berg of U. North Carolina, Chapel Hill, Larry Zitnick of Facebook, Andy Weisman of Union Square Ventures, Ramesh Raskar, of MIT Media Lab, Alex Iskold of Techstars, Gaile Gordon from Enlighted, Jessi Hempel of Wired and many more. The list of phenomenal 2017 judges continues to evolve on the 2018 Competition’s website.

All competition sub-finalists will receive remote and in-person coaching by Evan Nisselson and in person mentoring during the sub-finalist judging session by Jan Erik Solem, Rebecca Paoletti, Andy Parsons, Evan Nisselson, Serge Belongie and other experts.

It would be a horrible feeling to be sitting behind your computer or in the audience when someone else presents an idea that you had years ago. Take a risk, prove yourself, compete. 

We are waiting to see YOUR brilliance!

Enter and present your brilliance at our 2018 LDV Vision Summit Startup Competition or the Entrepreneurial Computer Vision Challenge (ECVC). Application deadline is March 25, 2018.   

Hired at Facebook After Showcasing Research in Visual Technology at the LDV Vision Summit: An interview with Divyaa Ravichandran

Divyaa Ravichandran from CMU showcased her project “Love & Vision” ©Robert Wright/LDV Vision Summit
 

The LDV Vision Summit is coming up on May 23 & 24, 2018 in New York. Through March 25 we are collecting applications to the Entrepreneurial Computer Vision Challenge and the Startup Competition.

Divyaa Ravichandran was a finalist in the 2016 Entrepreneurial Computer Vision Challenge (ECVC) at the LDV Vision Summit. Her project, “Love & Vision” used siamese neural networks to predict kinship between pairs of facial images. It was a major success with the judges and the audience. We asked Divyaa some questions on what she has been up to over the past year since her phenomenal performance: 

How have you advanced since the last LDV Vision Summit?
After the Vision Summit I began working as an intern at a startup in the Bay Area, PerceptiMed, where I worked on computer vision methods to identify pills. I specifically worked with implementing feature descriptors and testing their robustness in detection tasks. Since October 2016, I’ve been working at Facebook as a software engineer. 

What are the 2-3 key steps you have taken to achieve that advancement?
a. Stay on the lookout for interesting opportunities, like the LDV Vision Summit
b. ALWAYS stay up-to-date in the tech industry so you know what counts and who's who

What project(s)/work is your focus right now at or outside of Facebook?
Without any specifics, I'm working with neural networks surrounded by some of the brightest minds I have come across as yet, and along with the use of Facebook's resources, the opportunities to improve are boundless.
 

Divyaa Ravichandran  ©Robert Wright/LDV Vision Summit

What is your proudest accomplishment over the last year?
Snagging this gig with Facebook was kind of the highlight of my year; working on projects that have the potential to impact and improve so many lives has me pretty psyched!

What was a key challenge you had to overcome to accomplish that? How did you overcome it?
I think visibility was one big point: I wasn't highly visible as a candidate for the Facebook team since I had only just graduated from school and didn't have any compelling publications or such to my name. Fortunately, my attendance at the LDV Vision Summit last year gave me that visibility, and the Facebook team got in touch with me because of that.

Did our LDV Vision Summit help you? If yes, how?
Yeah, it was through LDV that I  came in contact with my current employer at Facebook! I also met some really interesting people from some far-off places, like Norway, for instance. It put into perspective how the field is growing the world-over.
 

Divyaa Ravichandran  ©Robert Wright/LDV Vision Summit

What was the most valuable aspect of competing in the ECVC for you?
The fact that the summit puts the guys with the money (the VCs) in touch with the guys with the tech (all the people making Computer Vision pitches) really bridges the gap between two shores that I think would do very well juxtaposed with each other. Personally, it opened my eyes to new ideas that people in the field were looking at and what problems they were trying to tackle, something that I wouldn't have been able to think up myself.

What recommendation(s) would you make to teams submitting their projects to the ECVC?
Stay current, but if you're bringing something entirely new to the table, that would be best! Everybody at ECVC is looking to be blown away (I think) so throwing something totally new and unexpected their way is the best way to get their attention.

What is your favorite Computer Vision blog/website to stay up-to-date on developments in the sector?
I generally read Tombone's CV blog, by Tomasz Malisiewicz*, and follow CV conferences like ECCV, ICML, CVPR to look up the bleeding edge in the industry and this usually gives a fair idea of the biggest problems people are looking to tackle in the current age.

*Editor’s Note: Tomasz Malisiewicz was a speaker at the 2016 Vision Summit

Applications to the 2018 ECVC and the Startup Competition at the LDV Vision Summit are due by March 25, apply now.

AutoX is Democratizing Autonomous Driving with a Camera-First Solution

©Robert Wright/LDV Vision Summit

Leaving his role as founding director of Princeton's Computer Vision and Robotics Lab, Jianxiong Xiao (Professor X) founded AutoX. He spoke at our LDV Vision Summit 2017 about how he is working to lower the price of entry into the autonomous driving field with an innovative camera-first solution.

Early Bird tickets are now available until March 25 for the LDV Vision Summit 2018 to hear from other amazing visual tech researchers, entrepreneurs and investors.

Today I'm going to talk about AutoX. We're a company working on self-driving cars. Why self-driving cars? If you look at the tech revolutions of the past few decades, we have personal computers, we have the internet, we have smartphones. These tech revolutions have already changed everyone's life. They're not just fancy tools for scientists; they actually changed everyone's life.

If you think about the future, many things are going to happen. But if you think about what the major difference will be 30 years from now, one of the biggest things is probably that all cars will be able to drive by themselves. That's what makes me very excited about self-driving cars. Transportation is a huge part of human society as well, so I see this as one of the biggest applications ever for my expertise in computer vision and robotics.

AutoX is a company focused on self-driving technology with the mission to democratize autonomy.

What does that mean? Here we draw an analogy with computer technology. A few decades ago, yes, we did have computers, but each computer was so big and, what's more, so expensive. With a million-dollar computer in a huge server room, only a very small number of people in the world, top scientists and researchers, had access to computation. At that time, I would say the technology was amazing, but its impact on society was very, very limited.

Now think about life today. Everyone nowadays has a $500 smartphone. At this stage, I would say this is what truly makes me excited about technology: it creates universal impact for everyone.

If you think about self-driving car technology today, it's pretty similar. Each self-driving car costs $1,000,000 or even more. It's much more expensive than hiring a few people just to drive for you. So self-driving car technology, at this stage, I would say, does not make much sense to the general public.


We believe self-driving cars should not be a luxury; they should be universally accessible to everyone.


At AutoX, our mission is to democratize autonomy: to make self-driving cars affordable and at the same time technically robust, for every citizen to use. We believe self-driving cars should not be a luxury; they should be universally accessible to everyone.

If you think about self-driving cars, why are they so expensive? Here is a picture of the Baidu self-driving car. Each car costs about $0.8 million USD. Most of the cost comes from the sensors that people use: high-end differential GPS, high-end IMUs, as well as this monster, the LIDAR. The LIDAR on top is the big Velodyne 64-beam unit, which costs $80,000 USD these days.

Putting aside the cost of the LIDAR, if you look at the LIDAR data, I would say the autonomous driving industry has a blind faith in LIDAR. For example, the LIDAR has very, very low resolution.

Here is a simple question for you: is this LIDAR point cloud representing a pedestrian or not? Look here. Everyone here has perfect intelligence. You may see that, okay, maybe this is a pedestrian. But how about this? Is this a pedestrian, or is it a Christmas tree? In fact, both of them are actually pedestrians, as you can see here.

A pedestrian viewed at low resolution is probably still recognizable, but if you want to drive your car safely, you need to recognize more subtle detail, like, for example, the curb of the road. If you cannot recognize the curb, the car is going to drive onto the sidewalk, which endangers pedestrians. So I would say that high resolution really matters. High resolution enables detailed analysis of complex scenes, which is required for Level 5 autonomous driving.

©Robert Wright/LDV Vision Summit

The other drawback of LIDAR is that it only captures the 3D shell of the object. But most complex situations in the world are actually conveyed by appearance rather than 3D shape, such as road markings, traffic signs, curbs, traffic lights, and so on. At AutoX, we focus on a Camera-First Solution. We're not against any sensors, but we are focused on using the video camera as our primary sensor, to capture most of the information necessary for very safe autonomous driving.

We're a company building Full-Stack Software for autonomous driving, which includes perception, understanding the state of all dynamic objects, as well as the ability to make decisions and plan how the car should drive. The last step of our Full-Stack Software is to control the vehicle to execute this plan in detail and carry it out.

We're a very young company; we were founded in September 2016. In the past eight months, we have made tremendous progress. Our company is based in San Jose, California, which is big enough for doing a lot of testing of autonomous driving.


We are accepting applications to our Vision Summit Entrepreneurial Computer Vision Challenge for computer vision research projects and our Startup Competition for visual technology companies with <$2M in funding. Apply now &/or spread the word.


Here's a demonstration where we're using a purely camera-based system, with no LIDAR, no radar, no ultrasonic, no differential GPS, to drive the vehicle. Here, we show some autonomous driving scenarios. On the top left, we're showing our car driving in a dense urban scenario, with a lot of traffic, making turns and so on. On the bottom left, we show our car driving on a curvy road, making a lot of sharp turns, to demonstrate that our perception system is able to recognize the road in detail and in real time, which is very important.

On the right, we're showing some video we've taken at nighttime. Using the camera, it is still possible to drive at nighttime, which demonstrates the power of this video-based approach. And may I mention, in this demo we're using only cameras, plus GPS, as the sensors. We're not using any other sensors, but in production cars we are open to integrating other sensors as well. The reason for this demo is to demonstrate the power of the camera, because personally I believe it is mostly ignored or underappreciated by the autonomous driving industry.

In the past eight months, we have built a team, a very, very small team, but a very good team, to carry out this mission. And we're very excited to continue following this path to make self-driving technology become a reality.

©Robert Wright/LDV Vision Summit

Here is another video demonstrating our camera-based system driving in a different scenario. As you know, in California it is actually very difficult to find bad weather. So in the past two months we finally got data when it actually rains, and we were so excited that we brought out the car to take a video like this. You can see that our camera-based system actually drives quite well under heavy rain, and you can see that here our car is driving in a residential neighborhood. There is no lane marking on the road, which also makes it particularly challenging: recognizing the road here is very challenging for the algorithm.

Here is another video from a rainy day, where we see our car going under a bridge. That makes the lighting very dark, then very bright again, but we still demonstrate that this camera-based system can work. Some of you can probably recognize where we're driving in this test demo; the logo here says it is the city of Cupertino.

As I mentioned, we're a very, very young company. At this very early stage we're still demonstrating the potential for this camera-based system.

Watch the video:

Josh Kopelman Looks to Find the "Investable Window" on New Technologies

At the LDV Vision Summit 2017, Evan Nisselson had the privilege to sit down with Josh Kopelman, the self-described "accidental VC" and Partner at First Round Capital, to discuss investment trends and what Josh looks for in a founder at the seed stage.

According to Josh, First Round Capital either invests early or way too early. Each technology has a window in which it is investable and you want to avoid funding a company too early. At First Round, they are investing in things that are trying to solve common problems. Watch Josh and Evan's fireside chat to learn more:

 

Our fifth annual LDV Vision Summit will be May 23 & 24, 2018 in NYC. Early bird tickets are currently on sale. Sign up to our LDV Vision Summit newsletter for updates and deals on tickets.

Hearst, AlphaPrime, ENIAC and Samsung Next Talk Opportunities in Visual Tech Investing

At the LDV Vision Summit 2017, Erin Griffith of Fortune spoke with Vic Singh of ENIAC Ventures, Claudia Iannazzo from AlphaPrime Ventures, Scott English of Hearst Ventures and Emily Becher from Samsung Next Start about trends and opportunities in visual technology investing.

Watch their panel discussion to learn more:

Our fifth annual LDV Vision Summit will be May 23 & 24, 2018 in NYC. Early bird tickets are currently on sale. Sign up to our LDV Vision Summit newsletter for updates and deals on tickets.

LDV Capital Raises $10M Second Seed Fund for Visual Technologies

Evan Nisselson, General Partner & Founder of LDV Capital © Ron Haviv

We are very excited to announce the close of our second fund for investing in people building visual technology businesses at the pre-seed or seed stage. You can read more about it in The Wall Street Journal. Our press release is below. Also check out our Jobs page to learn more about the exciting new roles available with us at LDV Capital.

Press Release -- LDV Capital, the venture fund investing in people building visual technology businesses, today announced a new $10M seed fund. It is the second fund for the thesis-driven firm that specifically invests in deep technical teams that leverage computer vision, machine learning and artificial intelligence to analyze visual data.

Investors in this second fund include top technical experts in the field including Mike Krieger, Instagram Co-founder/CTO and Steve Chen, YouTube Co-founder/CTO. Other investors came from family offices, fund-of-funds, an endowment, a sovereign wealth fund, and more.

“Because of their domain expertise and leadership in visual technology, LDV Capital is at the forefront of innovations in the space. They invest in and empower technical founders with the greatest potential for harnessing the power of computer vision to disrupt industries. The opportunities are tremendous.” Mike Krieger, Instagram, Co-Founder & Director of Engineering.

"Capturing and analyzing visual data with the aid of computers create a paradigm shift in the approach to content. I believe LDV Capital helps founders grow companies at the helm of this evolution." Steve Chen, Youtube, Co-Founder & CTO.

LDV Capital investments at the pre-seed stage include Clarifai - an artificial intelligence company that leverages visual recognition to solve real-world problems for businesses and developers, Mapillary - delivering street-level imagery for the future of maps and data solutions, and Upskill - delivering augmented reality solutions for the industrial workforce. They have assisted their portfolio companies in raising follow-on capital from Sequoia, Union Square Ventures, NEA, Atomico and others.

“Visual technologies are revolutionizing businesses and society,” says LDV Capital General Partner, Evan Nisselson, a renowned thought leader in the visual tech space. “By 2022, our research has found there will be 45 billion cameras in the world capturing visual data that will be analyzed by artificial intelligence. Our goal is to collaborate with technical entrepreneurs who are looking to solve problems, build businesses and improve our world with that visual data.”

LDV’s horizontal thesis spans all enterprise and consumer verticals such as: autonomous vehicles, medical imaging, robotics, security, manufacturing, logistics, smart homes, satellite imaging, augmented/virtual/mixed reality, mapping, video, imaging, biometrics, 3D, 4D and much more.  

Every May, LDV Capital hosts the two-day LDV Vision Summit in NYC known to top technologists, investors and entrepreneurs as the premier global gathering in visual tech. The fifth annual LDV Vision Summit will be May 23 and 24, 2018. Since 2011, LDV Capital also holds invite-only, gender-balanced monthly LDV Community dinners that bring together leading NYC entrepreneurs and investors to help each other succeed. Both are part of their LDV Platform initiatives.

LDV Capital is one of a growing number of single-GP funds, founded by Nisselson in 2012 after he built four visual technology startups over 18 years in Silicon Valley, NYC and Europe. The firm boasts an exceptionally strong expert network through its experts-in-residence, including computer vision leaders such as Serge Belongie, a professor of Computer Science at Cornell University who also co-founded several companies; Andrew Rabinovich, Director of Deep Learning at Magic Leap; Luc Vincent, VP of Engineering at Lyft; and Gaile Gordon, Vice President of Location Products at Enlighted.

Find out more about our open opportunities on our Jobs page.

Building an MRI Scanner 60 Times Cheaper, Small Enough to Fit in an Ambulance

©Robert Wright/LDV Vision Summit

Matthew Rosen is a Harvard professor. He and his colleagues at the MGH/A.A. Martinos Center for Biomedical Imaging in Boston are working on applications of advanced biomedical imaging technologies. At the LDV Vision Summit 2017 he spoke about how he is hacking a new kind of MRI scanner that’s fast, small, and cheap.

It's really a pleasure to talk about some of the work we've been doing in my laboratory to revolutionize MRI, not by building more expensive machines with higher and higher magnetic fields, but by going in the other direction. By turning the magnetic field down and reducing the cost, we hope to make medical devices that are inexpensive enough to become ubiquitous.

MRI is the undisputed champion of diagnostic radiology. These are very expensive, massive machines that are really confined to the hospital radiology suite. That's due, in large measure, to the fact that they operate at very high magnetic field strengths, in the tesla range. If you imagine taking an MRI scanner and putting it in an environment like a military field hospital, where there may be magnetic shrapnel around, you could really injure someone or worse.

Our approach is to go all the way down to the other end of the spectrum, at around 6.5 millitesla, roughly 500 times lower magnetic field than a clinical scanner, and I'll talk about work we've done in a homemade scanner that's based around a high-performance electromagnet, with high-performance linear gradients for spatial encoding. You can't really just turn down the magnetic field of an MRI scanner and expect to make high-quality images. This really comes down to the way we make measurements in MRI.


LDV Capital is focused on investing in people building visual technology businesses. Our LDV Vision Summit explores how visual technologies leveraging computer vision, machine learning and artificial intelligence are revolutionizing how humans communicate and do business.

Early Bird tickets are now available for the LDV Vision Summit 2018 to hear from other amazing visual tech researchers, entrepreneurs, and investors.

We are accepting applications to our Vision Summit Entrepreneurial Computer Vision Challenge for computer vision research projects and our Startup Competition for visual technology companies with <$2M in funding. Apply now &/or spread the word.


We use inductive detection. This is something you're all familiar with as a child, where you take a magnet and move it through a loop of wire, and you generate a voltage. In this case, the moving magnet actually comes from the nuclear polarization of the water protons typically in your body, and, in fact, what Richard Ernst calls "the powers of evil" has to do with the fact that nuclear magnetic moments are very, very small.

If you're interested in making images of this quality over the span of a few seconds or minutes, it means you need to make this multiplicative term B, the magnetic field, very, very high. That means that all clinical scanners operate in the tesla range, typically around 3 tesla. Knowing that, what sort of images do you think we'd be able to make at our field strength, roughly 500 times lower magnetic field, which is a calculated SNR of around 10,000 times lower? Well, you'd probably guess that we couldn't make very good images, and, in fact, you'd be right.
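
Editor's note: as a back-of-the-envelope sketch (not the speaker's exact calculation) of why lowering the field hurts so much, both the equilibrium nuclear polarization (Curie law) and the Faraday-induction voltage grow with the field B_0:

```latex
% Back-of-the-envelope scaling only; not the speaker's exact calculation.
M_0 \;\propto\; \frac{N \gamma^2 \hbar^2 I(I+1)}{3 k_B T}\, B_0 ,
\qquad
\mathcal{E} \;=\; -\frac{d\Phi}{dt} \;\propto\; \omega_0 M_0 \;\propto\; B_0^{2} .
```

So the raw induced signal falls roughly as the square of the field, while the realized SNR scales with a power of B_0 somewhere between about 1 and 7/4 depending on which noise source dominates; an effective exponent near 3/2 at 500 times lower field gives 500^(3/2) of roughly 1.1 x 10^4, in line with the "roughly 10,000 times lower" figure quoted here.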

Up until a few years ago, these were the kind of images we were making in our scanner. This is, in fact, a human head, if you can believe it. It's a single slice, took about an hour to acquire, and nobody was very interested in this at all.

If this is all we had, I wouldn't be here today, so let me tell you how we solved these problems.

Really, how do you solve a hard problem? What we've been working on is a suite of technology, half of it based in physics, half of it based in the availability of inexpensive compute. The physics applications are really about improving the signal strength, or the signal-to-noise, coming out of the body and into our detectors, and then the compute side is really about reducing the noise or getting more information from the data we have, or fixing it in post, as some people in this audience might call it.

Let's start really at the beginning, our acquisition strategy. The way you do NMR, or at least the way we do NMR at, remember, very, very low magnetic fields with very, very low signals, is we take our magnetic field. We turn it on. In red is that very small nuclear polarization I talked about. We apply a resonant radiofrequency pulse. We tip the magnetization into the transverse plane, and then we apply a series of coherent radiofrequency pulses to drive that magnetization back and forth very, very rapidly.

Then, again analogous with this inductive detection approach, we detect our signal, not using a giant hand and a moving magnet, but instead using a 3D-printed coil, in this case around the head of my former colleague, Chris LaPierre, to detect this very, very small but very high data rate signal. We call this balanced steady-state free precession. That's a bunch of words. What it really means is that we now have an approach to very rapidly sample this, although very, very small, signal coming from the head.

What this has allowed us to do is to make images like this. In six minutes, we can make a full 3D dataset, roughly 2.5 millimeter in-plane resolution, 15 slices. Just remember, this is the same machine, okay? The difference between these images has to do with the way we interrogate the nuclear spins (the water in the body, in this case) and the way we sample them. That's pretty nice. Having a high data rate actually allows us to build up even higher quality images by averaging, and those are some images shown here, but there are other approaches, and this is where we start really talking about compute.

Pattern matching is an interesting approach that people are very familiar with in the machine learning world, but we all know about it from basic physics. As an example, think of curve fitting, which you could think of as pattern matching. In curve fitting, you have some noisy data, shown as open circles here. You have some model for the way that data depends on some property, say time, so you take your functional form. You fit that function to the data, and you extract not only the magnitude of the effect, but also additional information, in this case a time constant of some NMR CPMG data.
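
Editor's note: the curve-fitting analogy in code, as a hedged sketch: fit an exponential decay to noisy synthetic data with scipy and recover its time constant, as one would for a CPMG T2 measurement. The data here are simulated, not real NMR measurements.

```python
# Fit an exponential decay to noisy synthetic data and recover its time
# constant, as for a CPMG T2 measurement. Simulated data only.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, amplitude, t2):
    return amplitude * np.exp(-t / t2)

rng = np.random.default_rng(0)
t = np.linspace(0, 1.0, 50)                       # seconds
true_amplitude, true_t2 = 1.0, 0.25               # ground truth for the simulation
signal = decay(t, true_amplitude, true_t2) + 0.05 * rng.standard_normal(t.size)

# "Pattern matching" against the functional form: returns best-fit parameters.
params, _cov = curve_fit(decay, t, signal, p0=(1.0, 0.1))
print(f"fitted amplitude = {params[0]:.3f}, fitted T2 = {params[1]:.3f} s")
```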

The MRI equivalent of pattern matching is known as magnetic resonance fingerprinting. In contrast to what we did above, where we add up all of these very noisy images to make a higher quality image, in this case we don't actually average. We just acquire the raw data. You see the data coming in on the lower left. These are very, very noisy, highly under-sampled images that normally you would sum together. The interesting thing we do here is we sort of dither the acquisition parameters a little bit. In the upper left, we show exactly how much we tip the magnetization, and in the upper right, we vary a little bit the time in between individual acquisitions.

©Robert Wright/LDV Vision Summit

What do we do with this data? Well, here is one of those images. I'll plot the time dependence of the signal. We call that the fingerprint. Why do we call it a fingerprint? Well, very much analogous with the partial fingerprint, smudged fingerprint, you might find at a crime scene, there are lots of ridges and valleys and things that distinguish that information or that fingerprint. If you were trying to identify who this fingerprint belonged to, you would search your database, and then you would find, hopefully, a match, which gives you not only the complete fingerprint, which is interesting, but actually it's tied to a record, in this case, my collaborator, Chris Farrar.

What we do in this case, for the MRI equivalent, is we take our MRI fingerprint. We search a database, in this case of precomputed NMR trajectories, which is the physics that defines how the magnetization evolves as a function of time. We find our best match in red. That tells us not only the intensity, M0, of the signal at that particular pixel, but also other parameters, which in this case tell you about the local magnetic environment, both of the machine and of the body.
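
Editor's note: a toy numpy sketch of that matching step: compare a measured fingerprint against a precomputed dictionary by normalized inner product and read the parameters off the best match. The trajectory "simulator" below is a stand-in, not a real Bloch/EPG simulation, and the dictionary is synthetic.

```python
# Toy sketch of MR fingerprinting's matching step: compare a measured time
# course against a dictionary of simulated trajectories and read off the
# tissue parameters of the best match. Placeholder simulator, synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n_timepoints = 200

def simulate_fingerprint(t1, t2):
    """Placeholder trajectory generator; real MRF uses Bloch/EPG simulations."""
    t = np.linspace(0, 2.0, n_timepoints)
    return np.exp(-t / t2) * (1 - np.exp(-t / t1)) + 0.1 * np.sin(5 * t / t1)

# Precompute the dictionary over a grid of (T1, T2) values.
t1_grid = np.linspace(0.5, 2.0, 30)
t2_grid = np.linspace(0.05, 0.3, 30)
entries = [(t1, t2) for t1 in t1_grid for t2 in t2_grid]
raw_dictionary = np.stack([simulate_fingerprint(t1, t2) for t1, t2 in entries])
norms = np.linalg.norm(raw_dictionary, axis=1)
dictionary = raw_dictionary / norms[:, None]

# A noisy "measured" fingerprint from one pixel, with unknown scaling M0.
true_t1, true_t2, m0 = 1.2, 0.12, 3.0
measured = m0 * simulate_fingerprint(true_t1, true_t2)
measured += 0.05 * rng.standard_normal(n_timepoints)

# Match: largest inner product with the normalized dictionary entries.
scores = dictionary @ measured
best = int(np.argmax(scores))
est_m0 = scores[best] / norms[best]   # scale of the measured signal vs. the simulated entry
print("estimated T1, T2:", entries[best], "estimated M0:", round(float(est_m0), 2))
```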

What does this compute-based pattern matching approach do for our data at low field? Well, in addition to giving us images, like on the first line, which are very similar to the last images I showed you, we get all of this additional information for free. In this case, it's quantitative information, again about the local magnetic environment of the tissue, so-called T1 and T2, as well as properties of the instrument and the local magnetic fields. Okay, so really, compute with our noisy data actually allows us to get more information than we would with a standard approach.

The last thing I want to talk about is something that my collaborator talked about early on, a new thing that we've only talked about publicly for about a month, which is the idea: is there something to be learned from natural vision?

It comes down to a very interesting point: the brain is really, really good at taking noisy data, especially in low light, and doing pattern matching on textures and edges. At low field, we generate noisy data all day long. So can we take that low-SNR data and process it in a framework based on the way the retina handles data, through the neuronal currents, into the reconstruction of a final image through perceptual learning, which is a data-driven, lifelong approach? Can we analogously build a way of handling the voltages coming out of our NMR coil, and the actual data, to reconstruct images using a similar data-driven training approach?


©Robert Wright/LDV Vision Summit

We call that AUTOMAP, which is automated transform by manifold approximation. It's broader than MRI, but I'll talk about it specifically in this case. It allows us to recast image reconstruction as a supervised learning task. In this case, we train up a joint manifold. One manifold consists of the data, the voltages coming in from the scanner itself, and then the other manifold is the image representation of that.

We've built it up as a deep neural network. The reason to do this is that we can take that matrix of sensor data, those voltages coming in again from that inductively ... We're talking macroscopic things here, right? A coil wrapped around the head of a person in a magnetic field. Put that data in on the left side of this, and out comes a reconstructed image. The reason it does a good job, as I'll show you, is that it not only subsumes the mathematical transform between the sensor data and the final image, but it also takes advantage of properties of natural images, such as image sparsity.
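To make the shape of the network more tangible, here is a rough PyTorch sketch: fully connected layers learn the transform from the flattened sensor data into the image domain, and convolutional layers exploit natural-image properties such as sparsity. The layer sizes, activations, and image size are my assumptions for illustration, not the published AUTOMAP configuration.

```python
import torch
import torch.nn as nn

class AutomapSketch(nn.Module):
    """Learned mapping from raw sensor samples to a reconstructed image."""
    def __init__(self, n=64):
        super().__init__()
        self.n = n
        in_dim = 2 * n * n                       # real + imaginary sensor samples
        self.fc = nn.Sequential(                 # learned domain transform
            nn.Linear(in_dim, n * n), nn.Tanh(),
            nn.Linear(n * n, n * n), nn.Tanh(),
        )
        self.conv = nn.Sequential(               # exploits image structure/sparsity
            nn.Conv2d(1, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 1, 7, padding=3),
        )

    def forward(self, sensor_data):
        x = self.fc(sensor_data)                 # (batch, n*n)
        x = x.view(-1, 1, self.n, self.n)        # reshape onto the image grid
        return self.conv(x)

# Training is ordinary supervised regression on (sensor data, image) pairs.
model = AutomapSketch(n=64)
loss_fn = nn.MSELoss()
```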

Here are, very quickly, some examples of this. This is radially sampled MRI data at an SNR of around 100, and the conventional reconstruction, which is a complicated iterative reconstruction, looks like this. The same data, fed into AUTOMAP, reconstructs like this. I'm not just cherry-picking; it doesn't matter what acquisition strategy you use here. In all cases, you get superior immunity to noise using this neural-network-based approach to reconstruct these raw voltages into images.

The interesting thing about this, like all supervised learning approaches, is that it can learn any encoding. That makes it relevant beyond MRI, but also within MRI, because there is a whole zoo of acquisition strategies that people use. This really reminds me of the Google DeepMind Atari Breakout program, if you've seen it before. A neural network was taught to play Breakout, which is interesting enough, but if you watch it for a while, you'll see that the network pretty quickly learned a really good strategy for playing the game, where it runs the ball up one side and uses the back wall to maximize its points.

Think about that for a minute. Are there optimal ways of sampling this data that we just haven't thought of? You can see, actually, all these encodings shown on the left side are geometric, radial, spiral, Cartesian. That's because we're logical people, and we think about things in terms of geometry, right? But if you let all the parameters run, you can imagine doing much, much better.

In conclusion, I've shown that MRI is possible outside the scanner suite, through a combination of physics and compute, both sensors and sequences, as well as these fingerprinting approaches and AUTOMAP. Now, what are the implications for health care?

Well, fortunately, as you can see from the scanner in the upper right, we are not limited to the existing footprint of our test system. The physics and the compute are basically length-invariant; they scale. You can build a smaller scanner that takes advantage of a lot of the innovations we've developed. It's really built around the idea of using inexpensive hardware with scalable, mostly GPU-based compute.

The question really is: what is the clinical implication of time and resolution? Because there's a trade-off between them. Our images will never be as good as a 3 tesla scanner's. That's just physics, okay? But every day, clinicians make a decision between speed, specificity, resolution, and cost in medical imaging and in health care. A really good example of a highly optimized version of that is the stethoscope, right? That's a $50 object. Its resolution is like this, if you even want to think of it as having a resolution, but in the hands of a clinician, it can tell if someone has pneumonia or a cardiac arrhythmia.


Imagine if you could use the MRI scanner as a ubiquitous tool, say that's in a CVS Minute Clinic, military field hospital, sports arena, neuro ICU, chronic care conditions, or at home, monitoring, say, long-term effects of chemotherapy.


Imagine if you could use the MRI scanner as a ubiquitous tool, say that's in a CVS Minute Clinic, military field hospital, sports arena, neuro ICU, chronic care conditions, or at home, monitoring, say, long-term effects of chemotherapy. As long as the cost becomes low enough, and this metric of time versus resolution is net positive, I think it's a really useful tool. This reminds me, of course, of everyone's favorite scene from Wall Street, right?

This is the first time pretty much anyone saw a cellphone, and that was sort of neat. But the cellphone, of course--and this audience knows this very clearly--has become useful because it's ubiquitous. Everyone has one, and that's led to new ways of connecting between people.

Imagine what you can do, just adding layers of data mining and health care and telemedicine on top of the idea of these ubiquitous sensors. With that, I want to acknowledge my group members, both past and present, and, of course, our funding agencies, and you guys for listening. Thanks so much.

We are accepting applications to our Vision Summit Entrepreneurial Computer Vision Challenge for computer vision research projects and our Startup Competition for visual technology companies with <$2M in funding. Apply now &/or spread the word.

JW Player, VH1 and Tout Discuss What's On TV Now

At the LDV Vision Summit 2017, Rebecca Paoletti of Cake Works spoke with Brian Rifkin of JW Player, Michael Downing of Tout and Orlando Lima from Viacom/VH1 about how traditional TV broadcasters are becoming digital while digital platforms are investing in traditional television models and programming.

Listen to their thoughts on creating valuable branded content, applying machine learning, and much more: 

 

Our fifth annual LDV Vision Summit will be May 23 & 24, 2018 in NYC. Early bird tickets are currently on sale. Sign up to our LDV Vision Summit newsletter for updates and deals on tickets.

How Professor Ira Kemelmacher-Shlizerman Built Dreambit & Sold it to Facebook


© Robert Wright/LDV Vision Summit
 

Ira Kemelmacher-Shlizerman is a professor at the University of Washington and a Research Scientist at Facebook. At our LDV Vision Summit 2017 she spoke about how and why she evolved her research into a company called Dreambit which she then sold to Facebook.

I'm supposed to give a talk about how to combine academia with industry, and so I chose to do it by telling three stories. I'm both a research scientist at Facebook and a professor at the University of Washington. And so, I will make the talk super oversimplified and humorous.

Story number one: I go to the Weizmann Institute of Science to be a grad student, and my advisor is Ronen Basri. The first problem that I decided to work on is, I want to be able to take a single photo of a person and reconstruct the three-dimensional shape of her face from it. It kind of makes sense that we should be able to do it, because as humans, when we look at a face - based on the shading, based on the prior knowledge of faces that we have - we can imagine how she looks from different sides, just from a single photo. So, I wanted to create an algorithm that can do it automatically. And it didn't exist when I started my grad studies, so I thought it was a worthwhile problem.

I worked on it, and we created the math for doing it from a single photo, and the cool part about the math is that I could apply it to anything. I could reconstruct Mona Lisa, I could reconstruct Clint Eastwood, and so on. And we published a paper and presented it at a conference, and I was so, so excited. Here's the business part.
 


LDV Capital invests in deep technical people building visual technologies. Our LDV Vision Summit explores how visual technologies leveraging computer vision, machine learning and artificial intelligence are revolutionizing business and society.

Early Bird tickets are now available for the LDV Vision Summit May 23 & 24, 2018 in NYC to hear from other amazing visual tech researchers, entrepreneurs and investors.


I wanted to show the results so I went to my family, to my husband Elliott and my brother Mike and I said, "Check this out! The results I got are so cool." And we started talking and we came up with some business ideas, it was 2006. We said "oh, it can be used in Second Life, and it can be used as avatars in games and all that."

And my brother said, "I know this guy who knows this VC that is going to come from the US next week, should we just talk with him?" And I said, "Yes, sure. Let's do it." And so, the VC came from the US and we pitched him the idea, and he said, "Well, that sounds cool. Here's half a million, let's make a business out of it." And we were so excited that we went to celebrate; we were like, "Oh my gosh, this is a paper, we can do a business, a startup." So we went to celebrate in Mexico.


"If you want to swim with the sharks, you have to swim faster - you do not do vacations in Mexico."


While in Mexico, we were doing Skype conversations with the VC firm and negotiating and so on, and at one point they asked us, "Where are you?" And we were like, "We're in Mexico." And they freaked out, and I still remember the quote. He said, "If you want to swim with the sharks, you have to swim faster - you do not do vacations in Mexico." At that point, it seemed like maybe I was not ready to swim with the sharks quite yet, and it also didn't help that they wanted more than 51 percent of the company. The non-existent business. So we decided not to create a company, learned a bit about VCs, wrote more papers, and I got my PhD. That was exciting.


© Robert Wright/LDV Vision Summit

Story number two: I finished my PhD and wanted to do my postdoc at the University of Washington. I came to Seattle to work with Steve Seitz. For the first problem that we wanted to solve, I said, "Okay, so I've been working on this single-photo reconstruction idea, but actually every one of us has so many photos out there. So it's not just one photo, we're going to have bigger and bigger collections, so wouldn't it be amazing just to visualize those big collections somehow?"

What existed at that time was just slideshows, right? And this is just a random showing of photos, not super exciting. So I started playing with big collections, and I found out that if I focus on the person and just align by the location of the eyes, I already get a really cool effect. I kind of see her grow in front of my eyes. And there is something interesting about visualizing the person through photos.
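As an illustration of what aligning by the eyes could look like in code, here is a minimal sketch: given the two eye centers detected in each photo (the detection itself is not shown), compute a similarity transform that maps them to fixed canonical positions, so every frame of the resulting sequence has the eyes in the same place. The canonical coordinates and output size are arbitrary choices for illustration, not the values used in the actual system.

```python
import cv2
import numpy as np

CANONICAL_LEFT, CANONICAL_RIGHT = (80.0, 100.0), (176.0, 100.0)
OUT_SIZE = (256, 256)

def similarity_from_eyes(left_eye, right_eye):
    """2x3 matrix (rotation + uniform scale + translation) that maps the
    detected eye centers onto the canonical eye positions."""
    src = np.array(right_eye, float) - np.array(left_eye, float)
    dst = np.array(CANONICAL_RIGHT) - np.array(CANONICAL_LEFT)
    scale = np.linalg.norm(dst) / np.linalg.norm(src)
    angle = np.arctan2(dst[1], dst[0]) - np.arctan2(src[1], src[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    m = np.array([[c, -s, 0.0], [s, c, 0.0]])
    m[:, 2] = np.array(CANONICAL_LEFT) - m[:, :2] @ np.array(left_eye, float)
    return m

def align_face(image, left_eye, right_eye):
    """Warp one photo so its eyes land on the canonical positions."""
    return cv2.warpAffine(image, similarity_from_eyes(left_eye, right_eye), OUT_SIZE)
```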

Eventually we thought it was really cool, and we developed the algorithm further, taking into account facial expressions, the head pose and so on. And I started showing those results to Steve, and he loved everything. Then, at that time, it was 2010 or 2011, Steve went to spend time at Google, and he said, "Hey, this looks so cool and practical, how about I show it to my boss at Google?"

The boss at Google said, "This is interesting, but let's see it on my daughter's photos." So I did it, I tried it out on his daughter's photos, and he liked it. Then I went to spend half a year at Google, and with an amazing team we did ship it. The final product was: with the click of a button, you could create face movies.


We are accepting applications to our Vision Summit Entrepreneurial Computer Vision Challenge for computer vision research projects and our Startup Competition for visual technology companies with <$2M in funding. Apply now &/or spread the word.


That was exciting, I learned how to make a product, and the product was used by millions. I wrote more papers, I finished my postdoc, and I got a faculty job. I went on the academic market, which is super competitive, but actually my experience in industry, plus all the papers, helped me get really cool jobs. And it helped me get my dream job, where I could stay as faculty at UW, the same place where I did my postdoc.

Story number three: I'm at the University of Washington, but now as a professor. I established my own group, I have students, and we work on all sorts of cool projects and publish papers and so on. But in my free time, kind of as a joke, I'm deeply concerned about a problem: I want to see, will black hair suit me? But I don't want to go and dye my hair before I know if it looks good.

It started kind of as a toy project to render myself with black hair to see how it might look, and then I continued to render myself with curly hair. And I started building this system, and I built it in a way that looks like an image search engine, where I could type anything, for example "India", and imagine how I would look. Or I could even go back in time and type "1930" and imagine how I would look in the 1930s. I kept going and could type any query, different hairstyles and colors, shaved and traditional, clothing and so on.


© Robert Wright/LDV Vision Summit

I published a single-author paper about it in SIGGRAPH, and it got accepted. It looked like a success story, but I kept working on it, and my husband was like, "Why are you working on that so hard?" And I said, "I don't know, it just seems cool. I feel like there is a business around it; I want to establish a company." And he said, "It does seem exciting, so how about we do it together?" And I said, "Yeah, let's do it." So we created this company, we immediately became the CEO and the CTO, and after our kids went to sleep, we would code.

We bought a ton of equipment to put in our basement, and we created a real-time system that lets you do what I just described, and we were ready to let people in to try it. SIGGRAPH came, and I gave a demo during a talk, and a bunch of companies, big companies, got interested. SIGGRAPH is known for its parties, and Michael Cohen from Facebook, Steve Seitz from Google, and I were talking. And then Steve just kind of randomly says, "Hey, did you know that Ira has a company now?" And Michael Cohen's like, "What?! This is interesting." Then one thing leads to another and the company is acquired.

So I'm at Facebook plus UW now, and it's really fun to do; they're ten minutes away from each other in Seattle. And some lessons were kind of interesting for myself and maybe will be useful for you. Research, academic research, means becoming a specialist in a very, very narrow field. That could be considered a bad thing, maybe, because you're in some particular niche and you're maybe stuck. But on the other hand, the way I see it, it's a unique opportunity to know when a technology is right for a product, and you're actually in a unique position to do it before everyone else can. Before everyone else realizes.

Making products that millions use is super fun, but I find it really exciting to create something that I will use first. Because if everyone else doesn't like it, then at least one person does. The connections you make during school, postdocs, and jobs are the best. Do not forget to go to parties.

Watch Professor Ira Kemelmacher-Shlizerman's keynote at our LDV Vision Summit 2017 below and check out other keynotes on our videos page.

Early Bird tickets are now available for the LDV Vision Summit May 23 & 24, 2018 in NYC to hear from other amazing visual tech researchers, entrepreneurs and investors.

We are accepting applications to our Vision Summit Entrepreneurial Computer Vision Challenge for computer vision research projects and our Startup Competition for visual technology companies with <$2M in funding. Apply now &/or spread the word.

Lyft & Arteris Discuss How Autonomous Vehicles, the Most Disruptive Innovation of a Generation, Will Impact Society

At the LDV Vision Summit 2017, Josh Brustein of Bloomberg Businessweek asked Taggart Matthiesen of Lyft and Charles Janac from Arteris: how will autonomous vehicles, the most disruptive innovation of a generation, impact society?

Ultimately, says Charles, autonomous driving will be one of the most meaningful changes in how we move people and goods in the history of the world. It won't be just about manufacturing the vehicle, but about creating an integrated experience, according to Taggart. Watch their panel discussion to learn more:

Our fifth annual LDV Vision Summit will be May 23 & 24, 2018 in NYC. Early bird tickets are currently on sale. Sign up to our LDV Vision Summit newsletter for updates and deals on tickets.

Glasswing Ventures & GM Ventures Agree, Combining Vision with Additional Functionalities Poses Immense Opportunity

Jessi Hempel of Backchannel sat down with Rudina Seseri of Glasswing Ventures and Rohit Makharia of GM Ventures to discuss trends and investment opportunities in visual technologies at the LDV Vision Summit 2017.

An amalgamation of technologies that work together is most interesting for Rohit at GM, while Rudina says that Glasswing is seeing both startups that are trying to retrofit themselves with vision as part of their value proposition and startups, mostly coming out of universities, that are solving real technical problems with vision. They both agree that multimodal functionality on devices - i.e. vision, voice, touch, etc. - will open a whole new universe of experiences and products. Watch their panel discussion to learn more:

Our fifth annual LDV Vision Summit will be May 23 & 24, 2018 in NYC. Early bird tickets are currently on sale. Sign up to our LDV Vision Summit newsletter for updates and deals on tickets.

Albert Wenger's Views On Investment Opportunities In Visual Technologies & Data Network Effects

Evan Nisselson had the opportunity to sit down with Albert Wenger, Managing Partner at Union Square Ventures to discuss future investment trends and early stage opportunities at the LDV Vision Summit 2017.

According to Albert, the key to generating above-average investment returns is going where others aren't. Watch their fireside chat to learn more:

Our fifth annual LDV Vision Summit will be May 23 & 24, 2018 in NYC. Early bird tickets are currently on sale. Sign up to our LDV Vision Summit newsletter for updates and deals on tickets.

45 Billion Cameras by 2022 Fuel Business Opportunities


Exclusive research by us at LDV Capital is the first publicly shared, in-depth analysis estimating how many cameras will be in the world in 2022. We believe it is a conservative forecast, as additional sectors will be included in future research.

The entire visual technology ecosystem is driving and driven by the integration of cameras and visual data. Visual technologies are any technologies that capture, analyze, filter, display or distribute visual data for businesses or consumers. They typically leverage computer vision, machine learning and artificial intelligence. 

Over the next five years there will be a proliferation of cameras integrated into products across industries and markets. A paradigm shift will take place in the meaning and use of a camera.

Taking into account the industries that will embed cameras into products, those that will add additional cameras to products, and new vision-enabled products that will arise, the number of cameras will grow at least 220% in the next five years. 

This growth in cameras delivers tremendous insight into business opportunities in the capture, analysis and interpretation of visual data. Cameras are no longer just for memories. They are becoming fundamental to improving business and society. Most of the pictures captured will never be seen by a human eye.

This 19 page report is the first of a multi-phased market analysis of the visual technology ecosystem by LDV Capital. Facts and trends include:

  • Global Camera Forecast
  • Paradigm Shift in Visual Data Capture
  • Depth Capture & New Verticals Driving Growth
  • LDV Market Segments To Watch
  • Visual Technology Ecosystem Growth
  • Processing Advances Enable Leaps in Visual Analysis
  • War Over Artificial Intelligence Will Be Won with Visual Data

Key Findings:

  • Most of the pictures captured will never be seen by a human eye.
  • A paradigm shift will take place in the meaning and use of a camera.
  • Over the next five years there will be a proliferation of cameras integrated into products across industries and markets.
  • Where there is growth in cameras there will be tremendous business opportunities in the capture, analysis and interpretation of visual data.
  • Depth capture will double the number of cameras in handheld devices.
  • By 2022, the number of cameras will be nearly 12X the 2012 figures.
  • Your smartphone will have between 4 and 10 cameras by 2022.
  • The Internet of Eyes will be larger than the Internet of Things. 
  • In the next five years, robotics will have 20X more integrated cameras.
  • By 2022, all new vehicles will be equipped with more than 25 cameras and this does not include Lidar or Radar.

Download the full report from our Insights page.

We look forward to hearing your insights, learning about your startups and reading your research papers on how businesses are addressing these challenges and opportunities.

Timnit Gebru Wins 2017 ECVC: Leveraging Computer Vision to Predict Race, Education and Income via Google Streetview Images


Timnit Gebru, Winner of the 2017 ECVC © Robert Wright/LDV Vision Summit

Our annual LDV Vision Summit has two competitions. Finalists receive a chance to present their wisdom in front of 600 top industry executives, venture capitalists, and companies recruiting. The winning competitor is also awarded $5,000 Amazon AWS credits. The competitions:

1. Startup competition for promising visual technology companies with less than $2M in funding

2. Entrepreneurial Computer Vision Challenge (ECVC) for computer vision and machine learning students, professors, experts or enthusiasts working on a unique solution to empower businesses and humanity.

Competitions are open to anyone working in our visual technology sector such as: empowering photography, videography, medical imaging, analytics, robotics, satellite imaging, computer vision, machine learning, artificial intelligence, augmented reality, virtual reality, autonomous cars, media and entertainment, gesture recognition, search, advertising, cameras, e-commerce, visual sensors, sentiment analysis, and much more.

The ECVC provides contestants the opportunity to showcase the technology piece of a potential startup company without requiring a full business plan. It provides a unique opportunity for students, engineers, researchers, professors and/or hackers to test the waters of entrepreneurism in front of a panel of judges including top industry venture capitalists, entrepreneurs, journalists, media executives and companies recruiting.

For the 2017 ECVC we had an outstanding lineup of finalists, including:

  • Timnit Gebru, PhD from Stanford University on “Predicting Demographics Using 50 Million Images”
  • Anurag Sahoo, CTO and Mick Das, CPO of Aitoe Labs
  • Akshay Bhat, PhD Candidate and Charles Herrmann, PhD Candidate from Cornell University on “Deep Video Analytics”
  • Elena Bernardis, PhD of the University of Pennsylvania Children’s Hospital with “Spot It - Quantifying Dermatological Conditions Pixel-by-Pixel”
  • Bo Zhu, PhD of Harvard Medical School’s Martinos Center for Biomedical Imaging presenting “Blink” about synthetic human vision
  • Gabriel Brostow from University College London with “MonoVolumes” a combination of MonoDepth and Volume Completion to understand 3D scene layout

Congratulations to our 2017 LDV Vision Summit Entrepreneurial Computer Vision Challenge Winner: Timnit Gebru  


© Robert Wright/LDV Vision Summit

What was the focus of your winning research project?
We used computer vision algorithms to detect and classify cars in 50 million Google Street View images. We then used the characteristics of these detected cars to predict race, education, income levels, voting patterns and income segregation levels. We were even able to see which city has the highest/lowest per capita CO2 footprint.
 
As a PhD candidate - what were your goals for attending our LDV Vision Summit? Did you attain them?
I mostly wanted to meet other people in the field who might have ideas for future work or collaborations. After the competition, I was contacted by venture capitalists and people whose startups are working on related things. In addition to that, I received some interesting ideas from  conference attendees (e.g. analyzing the frequency of trash collection in neighborhoods to get some signal regarding neighborhood wealth).
 
Why did you apply to our LDV Vision Summit ECVC? Did it meet or beat your expectations and why?
I applied because Serge Belongie (Professor at Cornell Tech and Expert in Residence at LDV Capital) thought it was a good idea. One of his many research interests is similar to my line of work. Since our work has real-world applications, I think he felt that presenting it to the LDV community would help us think of ways to make it more accessible. I didn’t know what to expect, but it definitely beat my expectations. I have never been at a conference that brings together entrepreneurs who are specifically interested in computer vision. I didn’t know that the vision community was so large, and that many VCs were thinking of companies with a computer vision focus (this is different from thinking of AI in general).
 
Why should other computer vision, machine learning and AI researchers attend next year?
This is unlike any other conference out there; it is the only conference I know of that is focused solely on computer vision but also brings together researchers, investors and entrepreneurs.
 


© Robert Wright/LDV Vision Summit

What was the most valuable part of your LDV Vision Summit experience aside from winning the ECVC?
Meeting others whose work is in a similar space: for example, people who founded companies that are based on analyzing publicly available visual data. One of the judges founded such a company. It helped me think of ways in which my research could be commercialized (if I decided to go that route).
 
Do you have any advice for researchers & PhD candidates that are thinking about evolving their research into a startup business and/or considering submitting their work to the ECVC?
I advise them to think of who exactly their product would benefit and what their API would be like. Even though I was an entrepreneur for about a year, I am still coming from a research background. So I wasn’t thinking about who exactly the customers of my work would be (except for other researchers) until my mentoring sessions with Evan [Nisselson, GP of LDV Capital].
 
What are you looking to do with your research & skills now that you have completed your PhD?
I will be a postdoctoral researcher continuing the same line of work but also studying the societal effects of machine learning and trying to understand how to create fair algorithms. We know that machine learning is being used to make many decisions. For example, who will get high interest rates in a loan, who is more likely to have high crime recidivism rates, etc...The way our current algorithms work, if they are fed with biased datasets, they will output biased conclusions. A recent ProPublica investigation started a debate on the use of machine learning to predict crime recidivism rates. I am very worried about the use of supervised machine learning algorithms in high stakes scenarios.
 


© Robert Wright/LDV Vision Summit

Thank You for Making Our 4th Annual LDV Vision Summit a Success!


Startup Competition Judges Day 2: (in no particular order) Judy Robinett, JRobinett Enterprises, Founder, Author "How to Be a Power Connector", Tracy Chadwell, 1843 Capital, Founding Partner, Vic Singh, General Partner, ENIAC Ventures, Zack Schildhorn, Lux Capital, Partner, Jenny Fielding, Techstars, Managing Director, Emily Becher, Samsung, Managing Director, Clayton Bryan, 500 Shades, 500 Startups Fund, Venture Partner. Dorm Room Fund, Partner, Jessica Peltz-Zatulove, KBS Ventures, Partner, Eric Jensen, Aura Frames, CTO, Claudia Iannazzo, AlphaPrime, Managing Partner, Scott English, Hearst Ventures, Managing Director©Robert Wright/LDV Vision Summit

Our 2017 Annual LDV Vision Summit was an absolutely amazing event, thanks to all of you brilliant people.

YOU are why our annual LDV Vision Summit gathering is special and a success every year. Thank You!

We are honored that you fly in from around the world each year to share insights, inspire, do deals, recruit, raise capital and help each other succeed!  

Congratulations to our competition winners:
- Startup Competition:  Fantasmo.io, Jameson Detweiler, Co-Founder & CEO
- Entrepreneurial Computer Vision Challenge: Timnit Gebru, Stanford Artificial Intelligence Laboratory, PhD Candidate

"LDV is a really interesting intersection of technologists, researchers, large tech companies, investors and entrepreneurs. There is nothing else like this out there. People are very open to sharing and helping the community advance together." Jameson Detweiler, Fantasmo.io Co-Founder & CEO

"I've never seen a conference like this - you have pure computer vision conferences like CVPR or ICCV or you have GTC-type conferences that are based on one company's resources.  This is an interesting mix of something computer vision and entrepreneurial - it is very unique in that sense, I have never seen anything like it before. It is a lot of fun." Timnit Gebru, PhD Candidate at Stanford Artificial Intelligence Laboratory


Day 2 Fireside Chat: Albert Wenger, Partner at Union Square Ventures And Evan Nisselson, General Partner at LDV Capital  ©Robert Wright/LDV Vision Summit

A special thank you to Rebecca Paoletti and Serge Belongie as the summit would not exist without collaborating with them!

“Loved hearing about all the practical applications for computer vision at LDV Vision Summit. Feels like the time has finally come for amazing transformation!" Jenny Fielding, Managing Partner at TechStars

The quotes below from our community are why we created our LDV Vision Summit. We could not have succeeded without the tremendous support from all of our partners and sponsors:


Panel Day 1: Trends and Investment Opportunities in Visual Technologies
Moderator: Jessi Hempel, Backchannel, Head of Editorial with Panelists: Rudina Seseri, Glasswing Ventures, Founder & Managing Partner and Rohit Makharia, GM Ventures, Sr. Investment Manager
©Robert Wright/LDV Vision Summit

"The LDV Vision Summit is vibrant, all around me there is so much curiosity and conversation because it is the people who are working on the very edge of these new technologies. These are the conversations that are going to make everything happen and you can just feel that when you're here." Jessi Hempel, Head of Editorial at Backchannel

"My main takeaway is that there are lots of people focused on so many aspects of bringing computer vision to market. This reaffirms my belief that vision is going to play a central role in so many aspects of our lives - from enterprise to retail to autonomous vehicles, etc. The LDV Vision Summit is geeky + fun. It is a collaborative, vibrant environment that brings together a community of likeminded people with very different backgrounds." Rohit Makharia, Senior Investment Manager at GM Ventures

"The LDV Vision Summit is very unique, usually academic conferences are very research focused and business conferences are business orientated. This is a unique combination of the two and, especially in a field like computer vision, with the way that it is growing, it seems very necessary. This is a fantastic place to meet both researchers and business people." Ira Kemelmacher-Shlizerman Research Scientist at Facebook and Assistant Professor at U. Washington (Sold Dreambit to Facebook)

“The energy is amazing, everyone is curious, interested outside of their wheelhouse. Everyone wants to see what is the next big thing and what are the big things that are happening right now.” Matt Rosen, Director, Low-field MRI Lab at MGH/Martinos Center for Biomedical Imaging

“There have been a lot of very exciting discussions around visual technology and autonomous driving. It is interesting to see many different perspectives on it from sensors, from AI, from computer vision, all these different perspectives coming together. It is still a futuristic technology that we want to address and the LDV Vision Summit is great because it gathers top scientists and researchers as well as VCs to discuss how to get to that future.” Jianxiong Xiao, "ProfessorX", Founder & CEO of AutoX

"LDV Vision Summit looks at the cutting edge of all visual technology...you have a lot of brainpower in the room and you can feel the wheels turning as you watch the speakers."  Mia Tramz, Managing Editor, LIFE VR at Time Inc

"Computer vision sits at the heart of the big emerging platforms including autonomous transport, robotics, AR and AI. The LDV Summit provided a great foray into the future of computer vision and more importantly the impact it has on market sectors today through an impressive lineup of speakers, presenters, domain experts and startups." Vic Singh, Founding General Partner, Eniac Ventures


Keynote Day 1: Godmother of VR Delivers Immersive Journalism to Tell Stories That Hopefully Make a Difference and Inspire People To Care, Nonny de la Peña, Godmother of VR, Embelmatic ©Robert Wright/LDV Vision Summit

“The business sector that is going to be most disrupted by computer vision and AI in the short term is transportation, so companies like Uber, taxi companies and the entire car and automotive industry will completely change in the coming years. The coolest thing I learned this morning was from the godmother of VR, how they are looking to change journalism and the way we capture events. The Vision Summit is pretty amazing, I am really impressed by the content, I am really glad I made it.” Clement Farabet, VP of AI Infrastructure at Nvidia (Sold MADBITS to Twitter)

“We are seeing visual technologies, especially combined with AI and machine learning, disrupt a broad array of existing markets and create new ones. From the role they are playing in autonomous vehicles, to transforming marketing technologies, to the roles they are playing in physical and cyber security - and of course the role they are playing around consumer electronics and robotics. It is comforting to know everyone is just as excited as I am about computer vision and AI, and to see how big the opportunity is and how early in the cycle we are as well.” Rudina Seseri, Founder & Managing Partner of Glasswing Ventures

"My second time attending the LDV Vision Summit was even better than the first.  A great mix of accomplished technical people and energetic young entrepreneurs." Dave Touretzky, Research Professor, Computer Science at Carnegie Mellon University

"It was fascinating to see a broad range of new visual technologies. I left the Summit full of ideas for new applications." Tom Bender, Co-Founder of Dreams Media, Inc.

“The LDV Vision Summit gave me the opportunity to discover new applications of computer vision and meet leaders at the forefront of really interesting innovations and startups.” Elodie Mailliet Storm JSK Fellow in Media Innovation at Stanford.

 "This cross-pollination of all different sectors is quite unique - especially coming from an academic setting. To interact with all of these different folks from industry, research and sciences, and from media really inspires me to think about all sorts of new ideas." Bo Zhu, Postdoctoral Research Fellow at MGH/Martinos Center for Biomedical Imaging

"It was enlightening and fascinating to see the potential of the tech that's driving a visual communications revolution." Scott Lewis Photography


Panel Day 2: What’s On Now? 
Moderator: Rebecca Paoletti, Cake Works, CEO with Panelists: Brian Rifkin, JW Player, Co-Founder, SVP Strategic Partnerships, Michael Downing, Tout, Founder & CEO, Orlando Lima, Viacom/VH1, VP Digital ©Robert Wright/LDV Vision Summit

"It is a great opportunity to meet diverse people from all different industries, a good opportunity to network with interesting talks." James Philbin, Senior Director of Computer Vision at Zoox

"If you work in visual tech, you simply can't afford to miss the LDV Summit – it's a two-day power punch of engaging talks and wicked smart attendees." Rosanna Myers, Co-Founder & CEO of Carbon Robotics

"The LDV Vision Summit is somewhere in between an academic workshop and a venture capital roundtable - it is the kind of event that didn't exist before. You have academics, researchers, grad students, professors but you also have investors, VC and angels like you've never had before. It is very high energy, the atmosphere here is fun to see the two worlds come together. From the academic side, there are grad students and other researchers who have been inside a safe bubble for a long time. They are starting to hear that visual tech are really promising and they are curious about what is going on in the entrepreneurial world and the big companies out there. This is an event where there is enough familiar content for them to feel at home but enough new content, new people, contacts and so on to go outside of their comfort zone." Serge Belongie, Professor of Computer Vision at Cornell Tech

"The LDV Summit is two curated days of outside the box ideas with the key players from diverse industries that are collectively creating the future." Brian Storm, Founder & Executive Producer at MediaStorm
 


ECVC judges Day 1 (L to R) - Aaron Hertzmann, Adobe, Principal Scientist,  Ira Kemelmacher-Shlizerman Facebook, Research Scientist, U. Washington, Assist. Professor, Andrew Zhai, Pinterest, Software Engineer, Tali Dekel, Google, Research Scientist, Yale Song, Yahoo, Senior Research Scientist, Jan Erik Solem, Mapillary, CEO & Co-founder
(not pictured: Vance Bjorn, CertifID, CEO & Co-Founder, Rudina Seseri, Glasswing Ventures, Founder & Managing Partner, James Philbin, Zoox, Senior Director, Computer Vision, Josh Kopelman, First Round Capital, Managing Partner, Clement Farabet, Nvidia, VP AI Infrastructure, Adrien Treuille, Carnegie Mellon University, Assistant Professor, Serge Belongie, Cornell Tech, Professor, Manohar Paluri, Facebook, Manager, Computer Vision Group, Rohit Makharia, GM Ventures, Sr. Investment Manager) @Robert Wright/LDV Vision Summit

"Evan sets the tone with a lot of energy, it is pretty amazing. I am typically around a lot of engineers and it is always great to get Evan up there with his big energy - he asks you honest questions. I also spend a lot of time in the hallway because you get to meet people from other years and keep up those relationships. This is an awesome opportunity to meet the whole mix, from employers, to startup people and investors." Oscar Beijbom, Machine Learning Lead at nuTonomy

"The LDV Summit is the perfect combination of a window into the future of some of the most interesting technologies and a welcoming place to make new connections. " Tracy Chadwell, Founding Partner of 1843 Capital

"The community that is assembled here, isn't anywhere else. There's not a place where all the operators in the computer vision space are in the same place at the same time. Everybody here is capturing the electricity of whats going on inside computer vision right now and being surrounded by everybody who cares about it like you do, is really invigorating. I was just having beers with the head of Uber ATG and he's making self-driving cars, I'm never going to, but he had an optimization method that is absolutely applicable to a thing I am working on, fighting human trafficking. The cross-disciplinary nature of this group creates a lot of opportunities to learn about techniques that are absolutely applicable to your problem domain that you would never see anywhere else. If you are into computer vision this is a place you need to be every year." Rob Spectre, Brooklyn Hacker. Former VP Developer Network at Twilio

"The summit far surpassed my expectations. The bringing together of entrepreneurs, researchers, executives, and investors provided for an exchange of ideas not usually possible in other forums. I definitely recommend the summit for anyone tangentially associated with computer vision and visual technologies!" Joshua David Cotton


©Dean Meyers/Vizworld


Fireside Chat Day 1: Josh Kopelman, Managing Partner of First Round Capital and Evan Nisselson, General Partner of LDV Capital @Robert Wright/LDV Vision Summit


Keynote Day 1: How and Why Did University of Washington Professor Ira Kemelmacher-Shlizerman Build Dreambit and Sell To Facebook, Ira Kemelmacher-Shlizerman, Facebook, Research Scientist University Washington, Assist. Professor ©Robert Wright/LDV Vision Summit

Learn more about our partners and sponsors:

Organizers:
Presented by Evan Nisselson, LDV Capital
Video Program: Rebecca Paoletti, CakeWorks, CEO
Computer Vision Program: Serge Belongie, Cornell Tech
Computer Vision Advisors: Jan Erik Solem, Mapillary; Samson Timoner, Cyclops; Luc Vincent, Lyft; Gaile Gordon, Enlighted; Alexandre Winter, Netgear; Avi Muchnick, Adobe
Universities: Cornell Tech, School of Visual Arts, International Center of Photography
Sponsors: Amazon AWS, Facebook, GumGum, JWPlayer
Media Partners: Kaptur, VizWorld, The Exponential View
Coordinators Entrepreneurial Computer Vision Challenge: Hani Altwaijry, Cornell University, Doctor of Philosophy in Computer Science, Shaojun Zhu, Rutgers University, Doctor of Philosophy Candidate in Computer Science, and Abhinav Shrivastava, Carnegie Mellon University, Doctor of Philosophy in Robotics (Vision & Perception)

AWS Activate: Amazon Web Services provides startups with low-cost, easy-to-use infrastructure needed to scale and grow any size business. Some of the world’s hottest startups including Pinterest, Instagram, and Dropbox have leveraged the power of AWS to easily get started and quickly scale.

CakeWorks is a boutique digital video agency that launches and accelerates high-growth media businesses. Stay in the know with our weekly video insider newsletter. #videoiscake

Cornell Tech is a revolutionary model for graduate education that fuses technology with business and creative thinking. Cornell Tech brings together like-minded faculty, business leaders, tech entrepreneurs and students in a catalytic environment to produce visionary ideas grounded in significant needs that will reinvent the way we live.


Panel Day 2: Trends and Investment Opportunities in Visual Technologies
Moderator: Erin Griffith, Fortune, Senior Writer with Panelists: Vic Singh, General Partner, ENIAC Ventures, Claudia Iannazzo, AlphaPrime Ventures Managing Partner & Co-Founder, Scott English, Hearst Ventures, Managing Director, Emily Becher Managing Director, Head of Samsung Next Start  
©Robert Wright/LDV Vision Summit

Facebook’s mission is to give people the power to share and make the world more open and connected. Achieving this requires constant innovation. Computer Vision researchers at Facebook invent new ways for computers to gain a higher level of understanding cued from the visual world around us. From creating visual sensors derived from digital images and videos that extract information about our environment, to further enabling Facebook services to automate visual tasks. We seek to create magical experiences for the people who use our products.

JW Player is the world’s largest network-independent video platform.  The company’s flagship product, JW Player, is live on more than 2 million sites with over 1.3 billion monthly unique viewers across all devices — OTT, mobile and desktop.  In addition to the player, the company’s services include advertising, analytics, data services, video hosting and streaming.

GumGum is a leading computer vision company with a mission to unlock the value of every online image for marketers. Its patented image-recognition technology delivers highly visible advertising campaigns to more than 400 million users as they view pictures and content across more than 2,000 premium publishers.

The International Center of Photography is the world’s leading institution dedicated to the practice and understanding of photography and the reproduced image in all its forms. Since its founding in 1974, ICP has presented more than 700 exhibitions and offered thousands of classes, providing instruction at every level.


Day 2 Keynote: 100 Million Pictures of Human Cells and Computer Vision Will Accelerate the Search for Disease Treatments
Blake Borgeson, Recursion Pharmaceuticals, CTO & Co-Founder
©Robert Wright/LDV Vision Summit

Kaptur is the first magazine about the photo tech space. News, research and stats along with commentaries, industry reports and deep analysis written by industry experts.

LDV Capital invests in people around the world who are creating visual technology businesses with deep domain expertise.

Mapillary is a community-based photomapping service that covers more than just streets, providing real-time data for cities and governments at scale. With hundreds of thousands of new photos every day, Mapillary can connect images to create an immersive ground-level view of the world for users to virtually explore and to document change over time.

The MFA Photography, Video and Related Media Department at the School of Visual Arts is the premier program for the study of Lens and Screen Arts. This program champions multimedia integration and interdisciplinary activity, and provides ever-expanding opportunities for lens-based students.

VizWorld.com covers news and the community engaged in applied visual thinking, from innovation and design theory to technology, media and education. VizWorld is also a contributing member of the Virtual Reality/Augmented Reality Association. From the whiteboard to the latest OLED screens and HMDs, graphic recording to movie making and VR/AR/MR, VizWorld readers want to know how to put visual thinking to work and play. SHOW US your story!

AliKat Productions is a New York-based event management and marketing company: a one-stop shop for all event, marketing and promotional needs. We plan and execute high-profile, stylized, local, national and international events, specializing in unique, targeted solutions that are highly successful and sustainable. #AliKatProd

Robert Wright Photography clients include Bloomberg Markets, Budget Travel, Elle, Details, Entrepreneur, ESPN The Magazine, Fast Company, Fortune, Glamour, Inc. Men's Journal, Newsweek (the old one), Outside, People, New York Magazine, New York Times, Self, Stern, T&L, Time, W, Wall Street Journal, Happy Cyclist and more…

Prime Image Media works with clients large and small to produce high quality, professional video production. From underwater video to aerial drone shoots, and from one-minute web videos to full blown television pilots... if you want it produced, they can do it.


We are a family affair! Serge, August, Kirstine and Emilia Belongie along with Evan Nisselson celebrating Timnit Gebru's win in the 2017 Entrepreneurial Computer Vision Challenge. See you next year! #carpediem
©Robert Wright/LDV Vision Summit

Computer Vision Delivers Contextual And Emotionally Relevant Brand Messages

The power of object recognition and the transformative effect of deep learning to analyze scenes and parse content can have a lot of impact in advertising. At the 2016 Annual LDV Vision Summit, Ken Weiner CTO at GumGum told us about the impact of image recognition and computer vision in online advertising.

The 2017 Annual Vision Summit is this week, May 24 & 25, in NYC. Come see new speakers discuss the intersection of business and visual tech.

I’m going to talk a little bit about advertising and computer vision and how they go together for us at GumGum. Digital images are basically showing up everywhere you look. You see them when you're reading editorial content. You see them when you're looking at your social feeds. They just can't be avoided these days. GumGum has basically built a platform with computer vision engineers that tries to identify a lot of information about the images that we come across online. We try to do object detection. We look for logos. We detect brand safety, sentiment analysis, all those types of things. We basically want to learn as much as we can about digital photos and images for the benefit of advertisers and marketers.

The question is: what value do marketers get from having this information? Well, for one thing, if you're a brand, you really want to know how users out there are engaging with your brand. We look at the fire hose of social feeds, and we look, for example, for brand logos. In this example, Monster Energy drink wants to find all the images out there where their drink appears in the photo. You have to remember that about 80% of the photos out there might have no textual information identifying the fact that Monster is involved in this photo, but they are. You really need computer vision in order to understand that.
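As a toy illustration of spotting a known logo in a photo, here is a sketch using OpenCV template matching. Production systems, presumably including GumGum's, rely on learned detectors that handle scale, rotation, and clutter; the file paths and the 0.8 threshold below are placeholders for illustration only.

```python
import cv2

# Hypothetical inputs: a social photo and a reference logo image.
scene = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
logo = cv2.imread("brand_logo.png", cv2.IMREAD_GRAYSCALE)

# Slide the logo template over the scene and score each position.
result = cv2.matchTemplate(scene, logo, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:
    h, w = logo.shape
    print(f"possible logo at {max_loc}, size {w}x{h}, score {max_val:.2f}")
else:
    print("no confident match")
```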

Why do they do that? They want to look at how people engage with them. They want to look at how people are engaging with their competitors. They may want to just understand what is changing over time. What are maybe some associations with their brand that they didn't know about that might come up. For example, what if they start finding out that Monster Energy drinks are appearing in all these mountain biking photos or something? That might give them a clue that they should go out and sponsor a cycling competition. The other thing they can find out with this is who are their main brand ambassadors and influencers out there. Tools like this give them a chance to connect with those people.


What makes [in-image] even more powerful is if you can connect the brand message with that image in a very contextual way and tap into the emotion that somebody’s experiencing when they’re looking at a photo.

-Ken Weiner


Another product that’s been very successful for us is something we call in-image advertising. We came up with this kind of unit about eight years ago. It was really invented to combat what people call banner blindness, which is the notion that, out on a web page, you start to learn to ignore the ads that are showing at the top and the side of the page. If you were to place brand messages right in line with content that people are actively engaged with, you have a much better chance of reaching the consumer. What makes it even more powerful is if you can connect the brand message with that image in a very contextual way and tap into the emotion that somebody’s experiencing when they’re looking at a photo. Just the placement alone for an ad like this receives 10x the performance of traditional advertising because it’s something that a user pays attention to.

Obviously, we can build a big database of information about images and be able to contextually place ads like this, but sometimes requests will come from advertisers that we can't serve from our existing knowledge. We’ll have to go out and develop custom technology for them. For example, L’Oréal wanted to advertise a product for hair coloring. They asked us if we could go through images on different websites and identify the hair color of the people in the images so that they could strategically target the products that go along with those hair colors. We ran this campaign for them, and they were really, really happy with it.
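One hedged sketch of how hair-color tagging could work: find a face, sample the region just above it, and take the dominant color with k-means clustering. GumGum's actual pipeline is not public; the Haar cascade detector and three-cluster k-means here are illustrative stand-ins.

```python
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def dominant_hair_color(image_bgr):
    """Return the dominant BGR color of a crude hair region, or None."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    top = max(0, y - h // 2)                      # strip above the face box
    hair = image_bgr[top:y + h // 4, x:x + w].reshape(-1, 3).astype(np.float32)
    if hair.shape[0] < 3:
        return None
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(hair, 3, None, criteria, 5,
                                    cv2.KMEANS_RANDOM_CENTERS)
    counts = np.bincount(labels.flatten())
    return centers[np.argmax(counts)]             # BGR of the largest cluster
```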

They liked it so much that they came back to us, and they said, “We had such a good experience with that. Now we want you to go out and find people that have bold lips,” which was a rather strange notion for us. Our computer vision engineers came up with a way to segment the lips and figure out what boldness means. L’Oréal was very happy, and they ran a lipstick campaign on these types of images.

A couple years ago, we had a very interesting in-image campaign that I think might be the first time that the actual content you're viewing became part of the advertising creative. For Lifetime TV, who wanted to advertise the TV series Witches of East End, we looked for photos where people were facing forward. When we encountered those photos, we dynamically overlaid green witch eyes onto these people, giving the sense that they become a little witchy for a few seconds. Then that collapses and becomes a traditional in-image ad, where somebody who was intrigued by the eyes can go ahead and click on it to watch a Video LightBox preview of the show.

I just thought this was one of the most interesting ad campaigns I’ve ever seen because it mixes the notion of content and creative into one. What’s coming after this? Naturally, this will extend into video. TV networks are already training you to look at information in the lower third of the screen. It’s only natural that this will get replaced by contextual advertising the same way we’ve done it for images online.

Another thing that I think is coming soon is the ability to really annotate specific products and items inside images at scale. People have tried to do this using crowdsourcing in the past, but it’s just too expensive. When you're looking at millions of images a day like we do, you really need information to come in a more automated way. There’s been a lot of talk about AR. Obviously, advertising’s going to have to fit into this in some way or another. It may be a local direct response advertiser. You're walking down the street. Someone gives you a coupon for McDonald’s. Maybe it’ll be a brand advertiser. You see a car accident, and they’re going to remind you that you need to get car insurance.

Lastly, I wanted to pose the idea of in-hologram ads, which I think could come in the future with things like Siri and Alexa. Right now they’re voice, but in the future, who knows? They might be 3D images living in your living room, and advertisers are going to want a way to put their name on those holograms. Thank you very much.

Get your tickets now to the next Annual LDV Vision Summit.

Get Ready to See More 3D Selfies in Your Facebook Feed

20160525_LDV_1549.jpg

Alban Denoyel, CEO and Co-Founder of Sketchfab, spoke at the 3rd Annual LDV Vision Summit in 2016 about 3D content, 3D ecosystems, and their impact on virtual reality.

At the 2017 Annual Vision Summit this week, we will be expanding upon the conversation with new speakers in the AR, VR and content creation spaces. Check out the agenda for more.

 

As Co-Founder and CEO of Sketchfab, I'm going to talk about user-generated content in a volumetric era. The VR headsets are all hitting the market today, tomorrow it's going to be the AR headsets, and we're starting to see holographic devices. And the big question is, of course, the content. What content are we going to consume with all this hardware?

If you look at VR content today, I put it in two buckets. One is studio-generated content, like the Henry movie by Oculus. It's really great, but there are two issues with it: it takes time to make, and it takes money. The result is that there is very little studio-made VR content, and if you go to the Oculus store today, you'll see that for yourself.

The other bucket is user-generated content, and that has to be the bulk of VR content. Today, user-generated content for VR is mostly 360 video.

We live in a 3D world, as you all know, and we have six degrees of freedom. I can walk through a space in real life, and VR is able to recreate the same thing; this is what we need to get a real sense of presence. The advanced VR headsets have positional tracking, which lets you walk inside a space freely. So which content is going to be able to serve this ultimate VR promise?

The good news is that we're entering an era of 3D creation for all, thanks to two trends. One is much easier tools to create 3D content. I think the most iconic example of that is Minecraft. Maybe you don't think of it as a 3D creation tool, but there are hundreds of 3D creations coming from Minecraft on Sketchfab. Just by assembling small cubes you are able to build entire worlds, and then you can navigate them in VR.

Another great example is Tilt Brush, which lets you make VR content in VR. I don't know if you have tried it, but it's really fascinating. You create in VR and then you're able to revisit that in VR.

© Robert Wright/LDV Vision Summit

The second mega trend for 3D creation is 3D capture, and it is really fascinating to see how it has evolved over the past five years. The most famous project is maybe Project Tango by Google; they are shipping their first phone with a 3D sensor this summer with Lenovo. On the Apple side, they bought PrimeSense three or four years ago. PrimeSense was the company behind the Kinect, and all of this points to a future iPhone with a 3D camera. The day we have an iPhone with a 3D camera, you'll be able to capture spaces and people in 3D. If you look at how we've captured the world, we started with drawing, then we started taking pictures, then we started taking videos. But as we live in a 3D world, 3D capture is going to be the next way we capture things.

And so, here is an example with my son, William. I make a 3D portrait of him every month, taken with just a phone. It's hard to show a 3D file on a 2D screen, and this one isn't dancing, but I also have dancing versions of him.

3D capture is super important, but being able to distribute this content is equally important. When it comes to user-generated content, you have to share it online and help it travel across the web. That's what we do at Sketchfab: we're a platform to host and distribute 3D files. With technologies like WebGL and WebVR, we are able to browse this content in VR straight from a browser. A pretty good example of that is that we are natively supported in Facebook, which means I can share this 3D portrait of my son, William, in a Facebook post and then launch a VR view straight from my Facebook feed, just from the browser, without having to go to an app store or install a complicated setup.

One area where user-generated 3D content is really booming is cultural heritage. A lot of museums are starting to digitize their collections in 3D. But also a lot of ordinary people, when they go to museums, are starting to take pictures of statues from various angles and then publish them on the web. There are very interesting initiatives that started about two years ago and are still happening around what happened in Syria: when ISIS started destroying art and museums, a lot of people on the internet started crowdsourcing 3D reconstructions of those places. Here's an example of a temple in Palmyra that is now preserved forever in a digital format.
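To give a sense of the idea behind this kind of reconstruction from ordinary photos, here is a minimal two-view sketch with OpenCV: match features between two photos of the same object, estimate the relative camera pose, and triangulate a sparse 3D point cloud. Real photogrammetry pipelines use many images and dense reconstruction; the file names and camera intrinsics below are assumptions made for the example.

```python
import cv2
import numpy as np

# Minimal two-view sketch of the idea behind photogrammetry: match features
# between two photos of the same statue and triangulate rough 3D points.
img1 = cv2.imread("statue_view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("statue_view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])  # assumed intrinsics

orb = cv2.ORB_create(5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate the relative camera pose, then triangulate matched points into 3D.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
points_4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points_3d = (points_4d[:3] / points_4d[3]).T  # sparse 3D point cloud
print(points_3d.shape)
```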

Another very interesting vertical to me is documenting world events. With this technology we're able to see 3D data from an event pretty much the day it happens, which gives a new perspective that is super interesting. On the left, you can see Kathmandu just after the terrible earthquake there. The day it happened, someone flew a drone over Kathmandu, generated a 3D map from it, and published it on Sketchfab. That same day you were able to walk through the devastated Kathmandu in VR, just from the web. That was pretty fascinating. On the right, something very different: the memorial that appeared the day of Prince's death. People started putting flowers and guitars in front of a concert venue, and someone made a 3D capture of it. It's a great way to document that place and that event.

3D capture applies to all areas of content, and we are starting to see the same trends we saw on Instagram: people shooting their things, their food, their faces. So I think you can get ready to see more and more 3D selfies in your Facebook news feed.

Don't miss our 4th Annual LDV Vision Summit May 24 & 25 at the SVA Theatre in NYC.

Image Recognition Will Empower Autonomous Decision Making


Rudina Seseri, Founder & Managing Partner of Glasswing Ventures

Rudina Seseri is the Founder & Managing Partner of Glasswing Ventures. With over 14 years of investing and transactional experience, she has led technology investments and acquisitions in startup companies in the fields of robotics, Internet of Things (IoT), SaaS marketing technologies and digital media.

Rudina will be sharing her knowledge on trends and investment opportunities in visual technologies as a panelist and startup competition judge at the 2017 Annual LDV Vision Summit. We asked her some questions this week about her experience investing in visual tech and what she is looking forward to at the Vision Summit...

You are investing in Artificial Intelligence (AI) businesses which analyze various types of visual data. In your perspective, what are the most important types of visual data for artificial intelligence to succeed and why?
Nowadays, a key constraint for AI to succeed in perception tasks is good (i.e., labeled) datasets. Deep learning has allowed us to achieve "super-human" performance in some tasks, and computer vision is a key pioneering area: from LeCun's OCR in the 90s to the new wave of AI excitement spurred by Andrew Ng and others with the unsupervised tagging of YouTube videos, and the performance of deep nets in the ILSVRC competition (an annual image recognition competition which uses a massive database of labeled images).

Image recognition has now moved from single-object labeling to segment labeling and full scene transcription. Video has also seen impressive results. An important next step will be to see how we can move from perception tasks like image recognition to autonomous decision making. The results already achieved in games and self-driving cars are promising. One can think of applications in just about anything: autonomous vehicles, visual search, (visual) business intelligence, social media, visual diagnostics, entertainment, and more. However, I think the most important thing for success is to be able to match the type of data and algorithm to whichever problem you're trying to solve. The ability to create valuable datasets in new use cases will be essential for startups.
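As a concrete illustration of how accessible single-object labeling has become thanks to large labeled datasets, here is a minimal sketch that labels a photo with a network pretrained on ImageNet. This is an editorial example, not something from the interview; it assumes PyTorch and torchvision (0.13 or later) are installed, and the image path is a placeholder.

```python
import torch
from PIL import Image
from torchvision import models

# Minimal sketch: label the main object in a photo with an ImageNet-pretrained classifier.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()  # resize, crop, normalize as the weights expect
img = preprocess(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    probs = model(img).softmax(dim=1)

top_prob, top_idx = probs.max(dim=1)
print(weights.meta["categories"][top_idx.item()], f"(p={top_prob.item():.2f})")
```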


I believe AI and vision will have a massive impact across sectors and industries, which is why we decided to launch [Glasswing Ventures].

-Rudina Seseri


What business sector do you believe will be most disrupted by computer vision and AI?
That’s a tough one, because I believe AI and vision will have a massive impact across sectors and industries, which is why we decided to launch the firm. From a vision point of view, we need to ask which business sectors rely (or could rely) the most on images; those are likely to be the ones "most disrupted" by AI. Within the enterprise, marketing and retail are likely to be among the earliest adopters. In terms of sectors, it's easy to see the impact that AI will have on e-commerce, transportation, healthcare diagnostics, security, etc.
 
You are speaking and judging at our LDV Vision Summit. What are you most excited about?
The LDV Vision Summit is a key event for anyone involved in computer vision. As a speaker and a judge, I get to share the stage with some of the pioneers in the domain and hear the pitches of some of the most promising entrepreneurs in the area. Being able to spend two days with all of you and discuss trends and the future of computer vision is invaluable.

You’ve said “the skillset of data scientists will be rendered useless in 12-18 months. They will need to either evolve with new AI tools or become a new category of Machine Language Scientists.” How does this rapid evolution in AI impact your investing strategy?
Data science is indeed evolving at a very fast pace. The exponential improvement in computing power, the ability of GPUs to parallelize data processing (crucial for CNNs), and the sheer abundance of data available have required data scientists to rethink how they can better leverage these capabilities and experiment with what was previously unthinkable. While most of the algorithms considered state-of-the-art today have been developed over decades, the way in which data scientists use them has changed considerably - i.e., moving from feature engineering to architecture engineering.

Additionally, the community has fully embraced open source, with most breakthroughs being published and algorithms shared. This means that savvy data scientists have to know the advantages and limitations of each approach for their use case, given the new computing and data constraints; be willing to experiment with new methods and embrace open source while still building a sustainable competitive advantage; and stay on top of new developments in their area.

Finally, the emergence of data science at the center of AI development has created a new, major stakeholder in product teams (along with engineering and product management). A good dynamic between these three teams is key: constant collaboration to push the limits of the technology, while always focusing on creating a product that delivers superior value over the status quo for the target customer.

This is the last week to get your discount tickets for the 4th Annual LDV Vision Summit which is featuring fantastic speakers like Rudina. Register now before ticket prices go up!