From Publisher of Spy Magazine to Reimagining Business Insights as Entertainment


As we gear up for our 6th annual LDV Vision Summit, we’re highlighting some speakers with in-depth interviews. Check out the full roster of our speakers here. Regular Price tickets now available for our LDV Vision Summit on May 22 & 23, 2019 in NYC at the SVA Theatre.  80 speakers in 40 sessions discuss how visual technologies are empowering and disrupting businesses. Register for tickets now!

The term “cult following” is typically reserved for movies or books, not defunct media enterprises from the pre-internet era. But that’s what comes to mind with Spy magazine. Spy’s founding publisher Tom Phillips is quick to point out that the satirical monthly magazine that captured urban discourse in the late ‘80s never grew beyond 160,000 readers, minuscule by the standards of contemporary heavyweights.

Still, the significance of Spy can’t be overstated. In Tom’s case, the success of the magazine set him on a far-reaching trajectory that included multiple businesses sold, a nearly four-year stint at Google, and, at present, the helm of Section4, a company that’s aiming to do for business what Netflix did for entertainment.

In this article, Tom talks about how Section4 will take business intelligence to a whole new level. Read on to learn about his entrepreneurial journey, his tips for startup founders, why he’s excited for our LDV Vision Summit this May, and more.

How Spy Reinvented the Magazine

Tom Phillips makes what some might consider a controversial statement. When asked which entertainment outlets with personalized feeds are doing algorithms right, he says no one is. “Netflix, I gotta say is kind of disappointing,” he adds.

When a serial entrepreneur calls a company with a $150 billion market cap disappointing, you know they have something big in the works.


“Spy was the first hyperlinked publication before there were hyperlinks.”


As the founding publisher of Spy, Tom was at the forefront of media in a time when “clicks” didn’t count. Still, not being an Internet entity — Spy shuttered in 1998, well after Tom departed, without ever having a web presence — didn’t stop it from becoming an outlet that broke ground. In many ways, Spy’s successes and distinctions popularized and foreshadowed elements of media that prevail in today’s digital age. “[Spy] was known for really reinventing the way magazine graphics were done,” says Tom. “Spy was the first hyperlinked publication before there were hyperlinks.”


Spy’s founding team: Tom Phillips, Kurt Andersen, and Graydon Carter for Barneys, 1988.
Photo by Annie Leibovitz

He explains, “This whole idea that there were endless layers of information was something that Spy brought to life and it brought it to life editorially and graphically. So in a given article you would have not played the article straight like a New Yorker article, or the article played straight with a sidebar like a TIME magazine or Fortune magazine article. Or an article with a certain arc that ran for X thousand words and then had sort of a subset piece here and a subset piece here or a linked article like an Esquire or Rolling Stone.”

“A given article would have 10 different sidebars. It would have data attached to it. It would have charts. Some of the charts were kind of serious, some of them were just funny ways to show information.”

Some of the concepts driven by Spy seem especially pertinent in today’s media-rich digital landscape.

“It was a visualization of information, of editorial point of view, and of data that really broke new ground,” says Tom. “The editor of Entertainment Weekly at the time will tell you this if you can track him down. They basically looked at Spy and said, ‘Let's do that for a mainstream audience.’”


Photo courtesy Tom Phillips

From Old Media to New Media

Tom moved on from Spy when he started getting the sense that it was time to shift his focus to the then-nascent Internet. “When I left Spy, it was the magazine I'd always dreamed of. And I didn't want to stay in the magazine [business]. I had a little bit of foresight at the time to think that the idea of print journalism doesn't really have legs as we move into a future that's looking like it's gonna be digital, computer-based.” Since Spy, Tom’s career has traced old media’s transformation into new media: from President of ABC News Internet Ventures to ESPN to Deja.com (sold to Google and eBay in 2001) to Google (as director of search & analytics) to Dstillery, where he served as CEO.

After Spy, Tom says, his focus shifted from the visual depiction of media to data. “We chose to do a web-based sports service because bandwidth was so constrained,” he says of his time at Starwave. “And with sports, scores and headlines tell a big part of the story.”

“We could do box scores, we could do real time scores, we could do headlines. We could do headlines from different perspectives. The stuff that we could do with very limited bandwidth. Where with most information intensive and entertainment oriented media, limited bandwidth is just death. There's just nothing you can do.”

Tom’s path reflects less a diminishing interest in the media-rich imagery that Spy was known for and more the limitations of the technology at the time. To hear him tell it, the industry has only just been catching up to the big things that are possible with high bandwidth and a visual focus.

“The big winners of the 90s were Yahoo, Amazon and Ebay. So which one of these is producing content? None of them at that stage. They're all just channeling user generated information and selling stuff, right? And even selling stuff was user generated at Ebay.

“So I just figured, you know, as much as I loved being a magazine publisher and being a publisher, and being a creator of inspiring and rich entertaining information, I went the other direction. I went more and more toward data centricity and abstraction. And really only with this venture, with Section4, am I back to, ‘Oh, okay. It's now 2019. We can now produce incredibly rich content and make a business out of it.’"


Tom playing in a house band called the Algorhythms in Memphis (courtesy Tom Phillips)

Coming Full Circle with Section4

The thesis at Section4 is simple, but ambitious, and it’s something Tom says no one else is doing: “If we can generate business insights for professionals and actually deliver them like TV, then we can create the Netflix of business insights. We can deliver a whole smorgasbord of great entertainment that is also edifying to professionals.”

“Business media today is stuck in the 20th century,” says Tom. “It's all linear TV and text-based news and analysis. To translate that into a rich digital medium is a lot of work. It's hard, but we can do that. We're convinced that our approach will appeal to 32-year-olds, not 68-year-olds.”

The key is that cutting-edge technologies like AI and machine learning haven’t even come into play yet in media. “We're fully capable of thinking in those terms and I've run companies that are big data-based AI companies. That's not the domain that is important here. What's important here is paying attention to what people need professionally and respond to emotionally.”

The goal, says Tom, is “to create great short form TV out of professional services, in a professional services domain. No one's done that before.”

Algorithms, although essential, aren’t the hard part. “Our content, because we're professional, is much easier to quantify and categorize and create meta tags around. The algorithms are gonna be easy once we get them.” It will be a step up from Netflix because “[Netflix is] so squishy in terms of what the content is. It's hard to capture it in any kind of meta sense.”


“When people see it, they're gonna say, ‘Whoa. I didn't know I could be entertained and edified at the same time. I didn't know I could be professionally enhanced while I watched something really fun.’”


What’s more critical than any technology or algorithm is what media companies are still struggling with: attention and monetization. Section4 is ready to take advantage of the critical mass that media has reached.

“Getting attention in a crowded landscape is the biggest [constraint]. And then convincing people that this is good enough to pay for. That has changed dramatically in the last couple years. It used to be nothing was good enough to pay for, and partly because nothing was good enough to pay for.

“And now you have all these over-the-top subscription services that are all consumer-based, entertainment-based. We’re [trying to] create a network that's professional, that's even more premium-priced, and we think we can.”

What Visual Data Means for Publishing

As Section4 gears up for launch, Tom reflects on the future, and highlights where he thinks publishing still has a ways to go. “We live in a world where our access to data, and thereby our interest, is increasing by a multiple every year. It's crazy in terms of what's available to us.”


“The visualization of the data, making it meaningful and digestible, will make great leaps in the next few years.”


“It has to happen because people are overwhelmed. And people need to see it to understand it. There's so much to be done. That's the thing that will be on everybody's mind and will make great strides, and frankly, we'll change the way we communicate.”

LDV Vision Summit and Tips for Founders

As he gets ready for LDV Vision Summit next month, Tom praises the diversity that he knows will be present.

“As an old white guy ... I know I'm gonna be in the minority, not in the majority. I spent my whole professional life being in the majority, and I like not being in the majority. It's kind of cool. It's refreshing. It's good for me and good for everyone.”

Last tips for the founders who’ll present onstage in the LDV Vision Summit’s two competitions: confidence is key.

“You may not have a unique vision or be the most competent and qualified person to start this business, but you better believe that both of these things are true.”

As for what founders can learn from his journey, Tom adds, “I spent most of my career chasing and striving for great business ideas and trying to build wildly successful businesses. In retrospect, I spent too little of it building products I love. The combination, of course, is magical.”

If you’re building a unique visual tech company, we would love for you to join us. At LDV Capital, we’re focused on investing in deep technical people building visual technology businesses. Through our annual LDV Vision Summit and monthly community dinners, we bring together top technologists, researchers, startups, media/brand executives, creators and investors with the purpose of exploring how visual technologies leveraging computer vision, machine learning and artificial intelligence are revolutionizing how humans communicate and do business.

2019 LDV Vision Summit Visual Technology Trends


Rebirth of Medical Imaging by Daniel Sodickson, Vice-Chair for Research, Dept of Radiology, Director, Bernard & Irene Schwartz Center for Biomedical Imaging. Principal Investigator, Center for Advanced Imaging Innovation & Research at New York University Langone Medical Center ©Robert Wright/LDV Vision Summit 2018

We launched the first annual LDV Vision Summit five years ago, in 2014, with the goal of bringing together our visual technology ecosystem to explore how visual technology is empowering business and society.

Our gathering is built from the ground up, by and for our diverse visual tech ecosystem - from entrepreneurs to researchers, professors and investors, media and tech execs, as well as content creators and anyone in between. We put our Summit together for us all to find inspiration, build community, recruit, find co-founders, raise capital, find customers and help each other succeed.

Every year we highlight experts working on cutting edge technologies and trends across every sector of business and society.  We do not repeat speakers and we are honored that many attendees join every year.

Below are many of the themes that will be showcased at our 6th LDV Vision Summit May 22 & 23 in NYC. Register here and hope to see you this year.


Visual Technologies Revolutionizing Medicine

Visual assessment is critical to healthcare — whether that is a doctor peering down your throat as you say “ahhh” or an MRI of your brain. Over the next ten years, healthcare workflows will become mostly digitized, more personal data will be captured and computer vision, along with artificial intelligence, will automate the analysis of that data for precision care. Our speakers will showcase how they are deploying visual technologies to revolutionize medicine:

  • CIONIC is superpowering the human body.  

  • Ezra is the new way to screen for prostate cancer.  

  • MGH/Harvard Medical School has developed AI that is better than most experts at diagnosing a childhood blindness disease.

  • Teladoc Health provides on-demand remote medical care.

  • and more...


Computational Imaging Will Power Business ROI

Computational imaging refers to digital image capture and processing techniques that use digital computation instead of optical processes. Entrepreneurs and research scientists from Facebook, Sea Machines, Cornell Tech, and University College London will share how their research is delivering valuable results.

  • GM and Cruise Automation are using state-of-the-art software and hardware to create the world's first scalable AV fleet.

  • The inference of 3D information from the video acquired from a single moving camera.

  • Deep convolutional neural network (ConvNet) for multi-view stereo reconstruction.

  • Image Quality and Trust in Peer-to-Peer Marketplaces.

  • and more...

Synthetic Data Is Disrupting Legacy Businesses

Synthetic data is computer-generated data that mimics real data; in other words, data that is created by a computer, not a human. Software algorithms can be designed to create realistic simulated, or “synthetic,” data. This computer generated data is disrupting legacy businesses including Media Production, E-Commerce, Virtual Reality & Augmented Reality, and Entertainment. Experts speaking on this topic include:

  • Synthesia delivers AI-driven video production.

  • Forma Technologies is building photorealistic avatars that are a dynamic form for people’s online identity.

  • and more...


Where Are The Next Visual Tech Unicorns?

A large number of visual technology businesses have already broken the $1B ceiling: Pinterest, DJI, Magic Leap, Snap, Zoom, Zoox, etc. With applications of computer vision and machine learning on an exponential rise, top investors in visual technologies will discuss the sectors and trends they see with the most potential for unicorn status in the near future:

  • Nabeel Hyatt, Spark Capital

  • Rachel Lam, Imagination Ventures

  • Matt Turck, FirstMark Capital

  • Laura Smoliar, Berkeley Catalyst Fund

  • Hadley Harris, Eniac

  • Zavain Dar, Lux Capital

  • and more...

Experiential Media is the Future

The Internet and digital media have built reputations for nameless, faceless actors and disconnection, but advances in tech and new approaches are changing that. Whether through interactive video or a live music video game, visual technologies are creating experiences that connect people to content & each other.

  • FTW Studios is creating experiences designed to bring people together — to be a part of live, shared moments.

  • Section4 is reinventing professional operational media, making it succinct, discoverable, provocative and actionable.

  • Eko is an interactive storytelling platform that continues to be a leader in Choice Driven Entertainment.

  • and more...

Nanophotonics are Pushing the Envelope

Nanophotonics can provide high bandwidth, high speed and ultra-small optoelectronic components. These technologies have the potential to revolutionize telecommunications, computation and sensing.

  • Voyant is creating the next generation of chip-scale LIDAR

  • MacArthur Fellow Michal Lipson is a physicist known for her work on silicon photonics. She is working on many projects, such as drastically lowering the cost and energy of high-power computing for artificial intelligence.

  • and more...


Farm to Factory to Front Door, Visual Tech is Improving Logistics & Agriculture

Breakthrough visual technologies will transform the speed, safety and efficiency of agriculture, manufacturing, supply chain and logistics. Legacy actors and startups alike are finding fascinating use cases to implement computer vision and machine learning to improve their processes:

  • Plus One’s software & computer vision tackles the challenges of material handling for logistics.

  • Non-invasive, real time food quality information delivered via hyperspectral imaging.

  • Level 4 Autonomous Vehicles for Urban Logistics

  • and more...

The Next Decade Will See Trillion Dollar Sectors Disrupted by Visual Technologies, According to Hadley Harris


Hadley Harris, Founding Partner at Eniac, Captured by DepthKit

Hadley Harris is the Founding General Partner at Eniac. He has done a little bit of everything on the path to co-founding Eniac, starting out as an engineer at Pegasystems, a product manager at Microsoft and a strategist at Samsung. He ran a few aspects of the business across product and marketing at Vlingo prior to its sale to Nuance. He also served as CMO at Thumb until it was acquired.

Hadley will be sharing his knowledge on trends and investment opportunities in visual technologies as a panelist and startup competition judge at the 6th Annual LDV Vision Summit, May 22 & 23. Early Bird tickets are available through April 16; get yours now to come see Hadley and 60+ other top speakers discuss the cutting edge in visual tech.

In the lead-up to our Summit, Evan Nisselson, General Partner at LDV Capital, asked Hadley some questions about his experience investing in visual tech and what he is looking forward to at our Vision Summit...

Evan: You had extensive entrepreneurial, technical and business operations experience before you co-founded Eniac Ventures. Which aspects of your expertise do you believe help you empower entrepreneurs to succeed, and why?

Hadley: I was very fortunate to be a senior leader at two startups that were acquired. I worked with a bunch of super talented entrepreneurs and executives that I learned a lot from. I was also lucky to have some great investors and board members who helped me see how VCs could help empower teams to thrive. Interestingly, what stuck with me the most is some of the terrible behavior I witnessed by VCs during fundraising -- spending the whole meeting on their phones, leaving the room several times during the pitch, eating 3-course meals without looking up at what we were presenting. I told myself that if I ever became a VC I’d focus on helping entrepreneurs with empathy and respect for the amazingly difficult task they had chosen.


I consider the most interesting technological theme right now to be the way data and machine intelligence are changing every industry and our daily lives. A strong argument could be made that visual technology is the most important input to data+machine intelligence systems

-Hadley Harris


Evan: Eniac has invested in many visual technology businesses that leverage computer vision. Please give a couple of examples and how they are uniquely analyzing visual data.

Hadley: By my count, 30% of the investments we’ve made over the last few years have a visual technology component. A handful are in autonomy and robotics. For example, iSee is an autonomous transportation company that has developed a humanistic AI that is able to flourish in dynamic and unpredictable situations where other solutions fail. Obviously, they can only do that by leveraging CV as an input to understand the vehicle’s surroundings. Another one that is really interesting is Esports One. They use computer vision to understand what’s going on in esports matches and surface real-time stats and analytics to viewers. It’s like the first down marker on steroids.

Evan: In the next 10 years, which business sectors will be the most disrupted by computer vision and why?

Hadley: Over the next 10 years there are a number of trillion-dollar sectors we’re exploring at Eniac that will be disrupted by visual technology – food & agriculture, construction, manufacturing, transportation, logistics, defense – but if I were to pick one it would be healthcare. We’re already seeing some really interesting use cases taking place in hospitals but that’s just the very tip of the iceberg. When these technologies move into the home so that individuals are being monitored on a daily basis, the way we think about health and wellness will dramatically change.

Evan: We agree that visual technologies will have a tremendous impact on the healthcare industry. Actually, our annual LDV Insights deep dive report last summer analyzed Nine Sectors Where Visual Technologies Will Improve Healthcare by 2028.


Eniac & Sea Machines - [L to R] Hadley Harris (Eniac), Jim Daly (COO of Sea Machines), Michael Johnson (CEO of Sea Machines), Vic Singh (Eniac)

Evan: Eniac and LDV Capital are co-investors in Sea Machines, which captures and analyzes many different types of visual data to deliver autonomous workboats and commercial vessels. What inspired you guys to invest in Sea Machines?

Hadley: We’ve had a broad thesis over the last 4 years that everything that moves will be autonomous. When investment in the best autonomous car and truck companies became prohibitively competitive and expensive, we started looking for underserved areas where autonomy could drive significant value. This drove us to look at the autonomous boat space. We found a few teams working on this problem, and by far the best of them was Sea Machines. They stood out because they married strong AI and CV abilities with a very deep understanding of the naval space based on decades in the boating ecosystem.

Evan: LDV Capital started in 2012 with the thesis of investing in people building visual technology businesses and some said it was “cute, niche and science fiction.” How would you characterize visual technologies today and tomorrow?

Hadley: I consider the most interesting technological theme right now to be the way data and machine intelligence are changing every industry and our daily lives. A strong argument could be made that visual technology is the most important input to data+machine intelligence systems. So no, I don’t think visual technologies are cute, niche or science fiction; they are one of the primary drivers of the biggest technological theme of our time.


Hadley Harris, Eniac

Evan: You frequently experience startup pitches. What is your one sentence advice to help entrepreneurs improve their odds for success?

Hadley: Know the ecosystem your startup is playing in absolutely cold.

Evan: What are you most looking forward to at our 6th LDV Vision Summit?

Hadley: I’m excited to be at such a focused event where I can hear from amazing entrepreneurs and scientists about the cutting edge projects they’re working on.

Get your Early Bird tickets by April 16th for our 6th Annual LDV Vision Summit which is featuring fantastic speakers like Hadley. Register now before ticket prices go up!

At Facebook Camera, Every Day They Think About Delivering Value to AR Users


Matt Uyttendaele, LDV Vision Summit 2018 ©Robert Wright

Early Bird tickets now available for our LDV Vision Summit 2019 - May 22 & 23 in NYC at the SVA Theatre.  80 speakers in 40 sessions discuss the cutting edge in visual tech. Register now!

Matt Uyttendaele is the Director of Core AI at Facebook Camera. At our 2018 LDV Vision Summit, Matt spoke about enabling persistent Augmented Reality experiences across the spectrum of mobile devices. He shared how, at Facebook Camera, they are solving this and giving creators the ability to author these experiences on their platform. He showcased specific examples and highlighted future challenges and opportunities for mobile augmented reality to succeed.

Good morning LDV. I am Matt Uyttendaele. I work on the Facebook camera and today I'm going to talk about our AR efforts on smartphones.

We at Facebook and Oculus believe that AR wearables are going to happen someday, but we're not waiting for that. We want to build AR experiences into the apps that our community uses every day. That being Messenger, Facebook and Instagram. And I'm going to do a deep dive into some of those efforts.

How do we bring AR to mobile? There's three major investments that we're making at Facebook. First is just getting computer vision to run on the smartphone. We take the latest state of the art computer vision technology and get that to run at scale on smartphones.

Second, we're building a creator platform. That means that we want to democratize the creation of AR into our apps. We want to make it super simple for designers to create AR experiences on our apps.

And then we're constantly adding new capabilities. The Facebook app updates every two weeks. And in those cycles, we're adding new capabilities and I'll dive into some of those in the talk.

One of our challenges in bringing AR to mobile devices at this scale is that there's a huge variety of hardware out there, right? Some of these are obvious, like camera hardware. We need to get computer vision to run on a huge variety of phones. So that means we have to characterize exactly the cameras and lenses on all these phones. Inertial sensors are super important for determining how the phone moves. That works pretty well on the iPhone, not so much on Android. It was telling that on the Pixel 2, one of the top marketing bullet items was an IMU synchronized with the camera, because that enables AR. But that's a challenge that we face in bringing these experiences at scale. All told, we support 10,000 different SKUs of cameras with our apps.

So let's dive a little bit into the computer vision, some of the computer vision that's running in our AR platform. On the left, we take a video frame in, we take in IMU data, and a user may select a point to track within that video. First we have a tracker selector that's analyzing the frame that's coming in, and it is also aware of the device capabilities that we're operating on.


©Robert Wright/LDV Vision Summit 2018

Then we've built several state of the art computer vision algorithms. I think our face tracker is probably one of the best monocular, or maybe the best monocular, face tracker out there running on a mobile phone. But we also have a simple inertial tracker that's just using the IMU. And we've implemented a really strong simultaneous localization and mapping algorithm, which is also known as SLAM. At any given time one of these algorithms is running while we're doing an AR experience. And we can seamlessly transition between these algorithms.

For example, if we're using SLAM and we turn the phone to a white wall where there are certainly no visual features to track, we can seamlessly transition back to the IMU tracker. And that lets us deliver a consistent camera pose across the AR experience, so that your AR object doesn't move within the frame. Okay, so that's the dive into our computer vision.
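The selector-plus-fallback behavior described above can be sketched in a few lines. The snippet below is a hypothetical illustration, not Facebook's actual tracker code: the tracker interfaces, the feature-counting heuristic and the threshold are all assumptions, and a real system would fuse estimates rather than hard-switch.

```python
import numpy as np

def count_features(gray_frame: np.ndarray, grad_thresh: float = 40.0) -> int:
    """Crude texture measure: count pixels with strong image gradients."""
    gy, gx = np.gradient(gray_frame.astype(np.float32))
    return int(np.count_nonzero(np.hypot(gx, gy) > grad_thresh))

class TrackerSelector:
    """Picks a pose tracker per frame: SLAM when the scene has enough visual
    texture, an IMU-only tracker otherwise (e.g. pointing at a white wall)."""

    def __init__(self, slam_tracker, imu_tracker, min_features=500):
        # slam_tracker and imu_tracker are stand-ins for real implementations.
        self.slam = slam_tracker
        self.imu = imu_tracker
        self.min_features = min_features
        self.last_pose = None  # last camera pose, reused as a prior

    def track(self, gray_frame, imu_samples):
        if count_features(gray_frame) >= self.min_features:
            # Enough visual features: run full SLAM for a drift-corrected pose.
            pose = self.slam.update(gray_frame, imu_samples, prior=self.last_pose)
        else:
            # Low-texture scene: integrate the IMU from the last known pose so
            # the camera pose (and any anchored AR object) stays consistent.
            pose = self.imu.integrate(imu_samples, start=self.last_pose)
        self.last_pose = pose
        return pose
```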

Here's a look at our creator platform. Here's somebody wiring up our face tracker to an experience that he has designed, right? So these arrows are designed by this person. Similarly, here's somebody else taking our face tracker and wiring it up to a custom mask that they have developed. So this is our designer experience in something called AR Studio that we deliver.

AR Studio is cross-platform, obviously, because our apps run cross-platform, so you can build an AR experience and deliver it to both iOS and Android. It's delivered through our Facebook camera stack, which means it runs across the variety of apps: Messenger, Facebook and Instagram. And we've enabled these AR experiences to be delivered to the 1.5 billion people that run our apps. So if you build an experience inside AR Studio, you can have a reach of 1.5 billion people.


“We've enabled these AR experiences to be delivered to 1.5 billion people that run our apps.”


Okay, let me look at now a new capability that we recently delivered. This is called Target AR. And here this user is taking his phone out, pointing it at a target that's been registered in our AR Studio. So this is a custom target. And they've built a custom experience, the overlay on that target. So when we recognize that target, their experience pops up and is displayed there.

And we didn't build this as a one-off experience that's shown here. We built this as a core capability of the platform. So here, our partners at Warner Brothers, at South by Southwest, deployed these posters across Austin around the time of the Ready Player One launch, and they used AR Studio to build a custom experience where, when we recognize that poster, their effect pops up in the app. And here's one of my partners on the camera team doing a demo. And that Warner Brothers experience popped up as it recognized that poster.

What I want to leave you with is that we at Facebook deliver value to users in AR, and that's something that we think about every day on the Facebook camera team. I think I've shown you some novel experiences, but what we really strive to do is deliver real user value through these things. Please look at what we're doing over the next year in our Facebook camera apps across Messenger, Facebook and Instagram, because that's something that we hope to achieve.

Thank you.

Watch Matt’s keynote at our LDV Vision Summit 2018 below and check out other keynotes on our videos page.

Early Bird tickets are now available for our LDV Vision Summit May 22 & 23, 2019 in NYC to hear from other amazing visual tech researchers, entrepreneurs and investors.

Delivering AR Beyond Dancing Hotdogs

Early Bird tickets are available through April 16 for our LDV Vision Summit 2019 - May 22 & 23 in NYC at the SVA Theatre.  80 speakers in 40 sessions discuss the cutting edge in visual tech. Register now!

At the LDV Vision Summit 2018, Joshua Brustein of Bloomberg Businessweek spoke with Serge Belongie (Cornell Tech), Ryan Measel (Fantasmo), Mandy Mandelstein (Luxloop) and Jeff McConathy (6D.ai) about how the digital and physical world will converge in order to deliver the future of augmented reality.

They spoke about how the technology stack for augmented and mixed reality will need several new layers of different technologies to work well together. From spatial computing with hyper-precision accuracy in multiple dimensions to filtering contextually relevant data to be displayed to the user based on location, activity, or time of day. What are the technical challenges and opportunities in delivering the AR Cloud?

Watch and find out:

Our LDV Vision Summit 2019 will have many more great discussions by top experts in visual technologies. Don’t miss out, check out our growing speaker list and register now!

Measuring Sleep with a Camera Makes Nanit a Powerful Tool for Research


Assif Glazer, LDV Vision Summit 2018 ©Robert Wright

Early Bird tickets now available for our LDV Vision Summit 2019 - May 22 & 23 in NYC at the SVA Theatre.  80 speakers in 40 sessions discuss the cutting edge in visual tech. Register now!

Assif Glazer is the founder & CEO of Nanit. At our 2018 LDV Vision Summit he shared how Nanit's unique computer vision technology and product helps children and their parents sleep better. Merging advanced computer vision with proven sleep science technologies, Nanit provides in-depth data to help babies, and parents, sleep well in those crucial early months and years of a baby’s development. He spoke about how this technology is expandable to the greater population as well, leading to early detection of other disease states like sleep apnea, seizures, autism and more. He shared the state of the art today and how he envisions sleep tech helping society in 10 and 20 years.

Hello, everyone. I'm Assaf, the CEO and founder of Nanit. Our business is human analytics. If you are not familiar with Nanit, we sell baby monitors that measure sleep and parental interaction. We use computer vision for this purpose. And there are thousands of parents today across the U.S. that are using Nanit to sleep better.

Nanit is a camera that is mounted above the crib. The experience with Nanit is very different from any other baby monitor that you can think of. I won't go much into reviews, but I would say that on BGR, they wrote, "This baby monitor is so impressive, we want to have a baby just to try it."

The camera costs $279 and there is a subscription, $10 a month or $100 a year. If you look at how we do it, actually, we do it in different levels. First, we give you the best view of your child and then we give you some real-time information of what happened in the crib. It helps you to manage nap time, check your child remotely and know, “my baby woke up one hour ago, he slept for 20 minutes.” We give you daily and weekly updates and every week, we'll also send you sleep tips and recommendations on how to improve sleep based on the personal data that we saw during the last week. And finally, we celebrate achievements and give you rewards for sleep milestones and accomplishments from the last week.


“When you measure sleep with a camera, you can also measure the environment, the behavior and build a picture around the sleep architecture.”


We measure sleep. We actually measure sleep better than the state-of-the-art medical device. There are different ways to measure sleep, but when you do it with a camera, you can also measure the environment, the behavior, and build a picture around the sleep architecture. In this context of babies, we also measure the parent: we know when the parent is approaching the crib, when he's touching the baby, when he's taking the baby out of the crib, and we differentiate that from other kinds of movement that we would like to ignore. Then, by combining all these behaviors together, along with other behaviors of the child, we can have a very precise diagnosis of sleep issues and beyond.

This is deeply anchored in research. We were part of the Runway program at Cornell Tech - they help people looking to commercialize science - and they really helped us build collaboration between different verticals: sleep experts and psychologists, cognitive development, model development, etc. Today, we have plenty of studies in the works in collaboration with different types of institutes, and we are publishing papers.

Just last month, we published a paper at the IBSA Conferences. We took three months of data - 175,000 nights’ sleep - and analyzed and tracked parental intervention patterns for babies between zero and 24 months of age.


Assif Glazer, LDV Vision Summit 2018 ©Robert Wright

So Nanit is also a research tool. It's a research tool that can tell you about behaviors and sleep. Here you see data from across the US. For instance, you can see that babies in Denver tend to wake up one more time than in the rest of the states. I don't know the reason, but it is just a fact. We have very precise data on babies’ sleep so we can tell you every day if the sleep pattern changes. It's interesting to see, for instance, that at Thanksgiving, parents are putting their baby to sleep earlier. Maybe so they will have quality time during their dinner?

Nanit is a very powerful tool. The ability to record the night and then analyze it will serve the need of people in the medical field as well as parents. Looking at Nanit as a research tool, Nanit gives you so much information. By having Nanit in the house and monitoring thousands of normal children, we can learn more about what is normal. And if we know what is normal, then we can know what's not normal and are these the early signs of a future disease?

There are constant movements to try to identify children who are at risk for autism earlier and earlier. With this technology, we could certainly develop some normative data and to be able to identify otherwise unrecognized signs. This technology could also be used in the adult population, a hospital setting, or a hospice setting, or perhaps a nursing care setting.

It can look at restless leg movement, it can look at the breathing, and of course, sleep apnea is much more common in adults than in children. Then it can really open our eyes to things we didn't know as researchers, that we couldn't study in our own labs and can change the way we treat children and adults as well.


Nanit is the future of consumer-facing health. When we are looking at the future, you can think about application in, of course, pediatrics, but also adult sleep, elder care, big data analysis. Thank you.

Watch Assif Glazer’s keynote at our LDV Vision Summit 2018 below and check out other keynotes on our videos page.

Early Bird tickets are now available for our LDV Vision Summit May 22 & 23, 2019 in NYC to hear from other amazing visual tech researchers, entrepreneurs and investors.

We are accepting applications to our Vision Summit Entrepreneurial Computer Vision Challenge for computer vision research projects and our Startup Competition for visual technology companies with <$2M in funding. Apply now & spread the word.

Partners at Glasswing, ENIAC, & OMERS Reveal Their Top Industries for Visual Tech

Early Bird tickets are available through April 16 for our LDV Vision Summit 2019 - May 22 & 23 in NYC at the SVA Theatre.  80 speakers in 40 sessions discuss the cutting edge in visual tech. Register now!

At the LDV Vision Summit 2018, Sarah Fay of Glasswing Ventures, Michael Yang of Comcast Ventures (now at OMERS) and Nihal Mehta of ENIAC spoke with Jessi Hempel of Wired (now LinkedIn) about the industries they think carry the most opportunity for visual technology.

Watch what they have to say about the future of transport, VR, cyber security, drones and much more…

Our LDV Vision Summit 2019 will have many more great discussions by top investors in visual technologies. Don’t miss out, check out our growing speaker list and register now!

Ezra is Revolutionizing Cancer Detection with Computer Vision and Artificial Intelligence

As we gear up for our 6th annual LDV Vision Summit, we’ll be highlighting some  speakers with in-depth interviews. Check out our full roster of our speakers here. Early Bird tickets now available for our LDV Vision Summit on May 22 & 23, 2019 in NYC at the SVA Theatre.  80 speakers in 40 sessions discuss how visual technologies are empowering and disrupting businesses. Register for early bird tickets before March 29!


Ezra CEO and Co-founder Emi Gal (courtesy Emi Gal)

When Emi Gal was searching for the cofounder for his New York-based cancer detection startup Ezra, he pored over 2,000+ online profiles of individuals with expertise in medical imaging and deep learning. From there, he reached out to 300 individuals, conducted 90 interviews, and walked four finalists through a four-month project. The entire process took nine months.

All while still at his “day job.” For Emi, that day job was his European startup that had just been acquired by one of its largest American clients. Not your typical founder’s story, but as his meticulous cofounder search shows, Emi is not one to take a blind leap.

It’s this willingness to go methodically down the rabbit hole and to leave no stone unturned that’s defined Emi’s approach, from one venture to the next.

To date, Ezra has raised $4 million in its first round, led by  Accomplice and with participation from Founders Future, Seedcamp, Credo Ventures, Esther Dyson, Taavet Hinrikus, Alex Ljung, Daniel Dines and many others. Emi has since brought on a head of talent to help expand the team from two to 12 in four months.

In this article, Emi talks about how Ezra is changing the game for cancer detection through visual technology. Read on to learn about his scientific research-driven approach, the long-term possibilities for Ezra, and what to expect when he takes the stage at our LDV Vision Summit this May.

Photo courtesy Ezra

Photo courtesy Ezra

MISSION-DRIVEN: MAKING CANCER SCREENING MORE AFFORDABLE, ACCURATE AND NON-INVASIVE

The name Ezra means “help” in Hebrew.

It’s a spot-on moniker for a company that, although powered by artificial intelligence and visual technology, Emi says is more mission-driven than technology-driven.

Multiple components in cancer detection rely on the painstaking study and analysis of visual inputs, making visual technology ripe for leveraging.

In its current stage, Ezra is focused on detecting prostate cancer through full-body MRIs that are then analyzed by artificial intelligence. Ezra’s subscription-based model offers an MRI and access to medical staff and support at $999 for one year.

The full-body MRI is a huge change compared to the most prevalent detection method for a cancer that kills 1 in 41 men: getting a biopsy, which is painful, uncomfortable, and can have unpleasant side effects.

Magnetic resonance imaging, on the other hand, eliminates the discomfort and is more accurate than biopsies or blood tests. MRIs, however, are not without their costs: about $1,500 if an individual books one himself, sans Ezra. And then there’s the time a radiologist needs to scan it.


Radiologist vs AI detection of cancer in MRI scans (Courtesy Ezra)

“We’re trying to make MRI-based cancer screening affordable,” says Emi. “The cost of getting screened with MRI has two parts: the scanning and the analysis. You need to get a scan, and a radiologist needs to analyze the scan and provide a report based on the scan. Those two things each have a cost associated with them. We’re using AI to help drive the costs down by assisting the radiologist, and by accelerating the MRI scanning time.”

The first thing a radiologist does is make a bunch of manual measurements of the organ in question — in Ezra’s case, the size of the prostate. If you have an enlarged prostate, you have a higher likelihood of having cancer. If there’s a lesion like a tumor in the prostate, radiologists need to measure the size and location of the tumor. They need to segment the tumor so they can make a 3D model so the urologist knows what to focus on. All of those measurements and annotations are currently done manually, which makes up about half of a radiologist’s workload.

“What we’re focusing on is using the Ezra AI to automate all of those trivial, manual tasks, so the radiologist can spend less time per scan doing manual things, and instead focus on analysis and reporting. That will potentially make them more accurate, as well.”
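One of the measurements mentioned above, the size of the prostate, is straightforward to automate once a segmentation model has produced a mask: count the voxels inside the mask and multiply by the voxel size from the scan header. The sketch below is a generic illustration under those assumptions, not Ezra's actual pipeline.

```python
import numpy as np

def organ_volume_ml(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Volume of a segmented organ (e.g. the prostate) from a binary 3D mask.

    mask: boolean/0-1 array of shape (slices, height, width), typically the
          output of a segmentation network.
    spacing_mm: voxel size (dz, dy, dx) in millimeters, from the MRI header.
    """
    voxel_volume_mm3 = float(np.prod(spacing_mm))
    n_voxels = int(np.count_nonzero(mask))
    return n_voxels * voxel_volume_mm3 / 1000.0  # mm^3 -> milliliters

# Example with synthetic data: a 40 mm cube of "organ" voxels.
mask = np.zeros((32, 128, 128), dtype=bool)
mask[4:24, 40:80, 40:80] = True                    # 20 x 40 x 40 voxels
volume = organ_volume_ml(mask, spacing_mm=(2.0, 1.0, 1.0))
print(f"estimated volume: {volume:.1f} mL")        # 40 x 40 x 40 mm = 64.0 mL
```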

For the future, says Emi, the team is already considering how to use AI to accelerate the scanning process as well.

“The reason this is possible now and it wasn’t possible before is that deep learning has given us the ability to be as good or better than humans at these things, which means it’s now feasible to create these types of technologies and implement them into the clinical workflow.”

THE SEED OF A NEW IDEA: PLANT IT BEFORE YOU’RE READY

While it looks like Emi has seamlessly gone from one successful venture to the next, the reality is a lot more nuanced.

It was while he was still running Brainient, before it was acquired, that he started plotting his next move. In 2015, Emi was introduced to Hospices of Hope in Romania, which builds and operates hospices that care for terminally ill cancer patients. During his visits with doctors and patients, the seed of Ezra was born.

Cancer struck a personal chord. As a child, Emi had developed hundreds of moles on his body, which put him at very high risk of melanoma. He started getting screened and going to dermatologists regularly from the age of 10 onwards to make sure they weren’t cancerous. While he hasn't yet had any malignant lesions, he’s experienced the discomfort of biopsies firsthand, and he’s always been very conscious about the importance of screening.

“I realized while speaking to doctors that one of the biggest problems is that people get detected late. I started looking into that and realized that [this is] because there’s no fast, affordable, accurate way to screen cancer anywhere in the body,” says Emi. From there, he began researching different ways in which you can screen for cancer. A computer scientist by education, he spent two years on what he calls, “learning and doing an accelerated undergrad in oncology, healthcare, genetics and medical imaging.”

That accelerated education is supplemented with an incredibly impressive team. It’s no surprise that Ezra cofounder Diego Cantor is equally curious and skilled, and brings an enormous technical repertoire to the table: an undergraduate education in Electronic Design and Automation Engineering, a master’s degree in Computer and Systems Engineering, a PhD in Biomedical Engineering (application of machine learning to study epilepsy with MRI), and post-doctoral work in the application of deep learning to solve medical imaging problems. The scientific team is rounded out with deep technical experts: Dr. Oguz Akin (professor of radiology at Weill Cornell Medicine and a radiologist at Memorial Sloan-Kettering Cancer Center), Dr. Terry Peters (director of Western University’s Biomedical Imaging Research Centre), Dr. Lawrence Tanenbaum (VP and Medical Director Eastern Division, Director of MRI, CT and Advanced Imaging at RadNet), and Dr. Paul Grewal (best-selling author of Genius Foods).


Ezra’s deeply technical team trained its AI with data sets from the National Institutes of Health marked up by radiology experts. On new data sets, the Ezra AI was 90% accurate at agreeing with the experts.

LEARNING (AND STARTUP SUCCESS) IS ALL ABOUT COURSE CORRECTING

As a lifelong learner who actively chronicles his year-long attempts to gain new skills and habits, Emi has picked up a thing or two about doing new things for the first time.

“Learning anything of material value is really, really hard,” says Emi, who’s done everything from training his memory with a world memory champion to hitting the gym every single day for one year.

This focus on the process and being comfortable with being uncomfortable came in handy when Emi, who studied computer science and applied mathematics in college, pivoted into cancer detection.

Emi cycled through twelve potential ideas before deciding on Ezra’s current technology. At every turn, he researched methodically and consulted with experts.

One of the promising ideas that Emi considered — shelved for now — is DNA-based liquid biopsies. “We’re at least a decade away from DNA-based liquid biopsies being feasible and affordable,” says Emi, who was searching to make an immediate impact.

Emi had just sold Brainient and was on his honeymoon when he came up with the winning idea in November 2016. “I had this idea: what if you could do a full-body MRI and then use AI to lower the cost of the full-body MRI, both in terms of scanning time as well as analysis in order to make a full-body MRI affordable for everybody? My wife loved the idea, and that’s always a good sign.”

In January 2017, Emi discovered a research paper that was comparing the current way to screen for prostate cancer — a Prostate-Specific Antigen (PSA) blood test followed by a biopsy — with MRIs as an alternative. “An MRI was by far more accurate and a much better patient experience, and so I was like, this is it. It can work. And that’s how Ezra was born.”

THE FUTURE

Emi has big plans for Ezra going forward, and this year’s LDV Vision Summit is one step in that direction. He hopes to meet people working in the vision tech space, particularly within healthcare.

Although Ezra has been live just since January 7th of this year — 20 people were scanned in the first three weeks — its early results are very promising.

“The first person we scanned had early-stage prostate lesions. That really makes you wake up in the morning and go at it,” says Emi.

Out of the first 20 scanned, three had early-stage prostate lesions they were unaware of. Two early users came in with elevated PSA levels, but the MRIs showed no lesions, obviating the need to do prostate biopsies.

The long-term potential for Ezra — going beyond prostate screenings — is also clear.

“We helped one man who thought he was dying from cancer find out that he had no cancer but that he was likely an undiagnosed diabetic,” says Emi. “We gave him the information for his urologist and physician to make that diagnosis. We checked with him a month later and he’s over the moon happy. We helped him get peace of mind that he doesn’t have cancer, as well as diagnose a disease he wasn’t aware he had.”

Even though these results have been powered by AI and MRIs, Emi is emphatic that Ezra is not an AI company.  “We want to help people detect cancer early...and we will build any technology necessary. We think of ourselves as a healthcare company leveraging AI, not the other way around,” he says.

While Ezra’s current focus is prostate cancer, expanding to other cancers that affect women is on the horizon. After all, as Emi points out, “Women are the most preventative-focused type of individual, for themselves and their families.” To underscore that point, Emi says that many of the early adopters of prostate screening have come at the encouragement of their female partners.

“The way we are approaching expansion is based on cancer incidence. We’re starting with the cancers that are most prevalent across society, with prostate being one of them, breast, lungs, stomach, and our ultimate goal is to be able to do one scan a year and find cancer in any of those organs.

“Our ultimate goal is to do a full body MRI analyzed by AI and find any and all cancer. In a decade, I hope we will have gotten there.”

If you’re building a unique visual tech company, we would love for you to join us. At LDV Capital, we’re focused on investing in deep technical people building visual technology businesses. Through our annual LDV Vision Summit and monthly community dinners, we bring together top technologists, researchers, startups, media/brand executives, creators and investors with the purpose of exploring how visual technologies leveraging computer vision, machine learning and artificial intelligence are revolutionizing how humans communicate and do business.

Current Artificial Neural Networks Are Suboptimal Says DeepMind Scientist


Viorica Patraucean, LDV Vision Summit 2018 ©Robert Wright

Early Bird tickets now available for our LDV Vision Summit 2019 - May 22 & 23 in NYC at the SVA Theatre.  80 speakers in 40 sessions discuss the cutting edge in visual tech. Register now!

Viorica Patraucean is a Research Scientist at Google DeepMind. At our Vision Summit 2018 she enlightened us with her recent work on massively parallel video nets and why it is especially relevant for real-world low-latency/low-power applications. Previously she worked on 3D shape processing in the Machine Intelligence group of the Engineering Department in Cambridge, after completing a PhD in image processing at ENSEEIHT–INP Toulouse. Irrespective of the modality - image, 3D shape, or video - her goal has always been the same: design a system that comes closer to human perception capabilities.

Like most of you here, I'm interested in making machines see the world the way we see it. When I say machines, I'm thinking of autonomous cars or robots or systems for augmented reality. These are all very different applications, of course, but, in many cases, they have one thing in common: they require low-latency processing of the visual input. In our work, we use deep artificial neural networks which consist of a series of layers. We feed in an image, this image is processed by each layer of the network, and then we obtain a prediction, assuming that this is an object detector, and there's a cat there. We care about cats and all.

Just to make it clear what I mean by latency – I mean the time that passes between the moment when we feed in the image and the moment when we get the prediction. Here, obviously, the latency is just the sum of the computational times of all the layers in the network.
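To make that arithmetic concrete, here is a toy calculation with made-up per-layer timings (illustrative numbers only, not measurements from the talk), showing how per-layer compute adds up into end-to-end latency and caps the frame rate when layers run one after another.

```python
# Toy per-layer compute times in milliseconds (illustrative, not measured).
layer_times_ms = [8, 12, 15, 25, 25, 40, 40, 35]

latency_ms = sum(layer_times_ms)   # time from frame in to prediction out
max_fps = 1000.0 / latency_ms      # achievable rate if frames are processed one by one

print(f"latency: {latency_ms} ms -> {max_fps:.1f} fps")  # 200 ms -> 5.0 fps

# Adding layers for accuracy makes the sequential latency strictly worse:
deeper = layer_times_ms + [30, 30]
print(f"deeper net: {sum(deeper)} ms -> {1000.0 / sum(deeper):.1f} fps")  # 260 ms -> 3.8 fps
```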

Now, it is common practice that, if we are not quite happy with the accuracy of the system, we can make the systems deeper by adding more layers. Because this increases the capacity of the network, the expressivity of the network, we get better accuracy. But this doesn't come for free, of course. This will lead to increasing the processing time and the overall latency of the system. Current object detectors run at around five frames per second, which is great, of course, but what does five frames per second mean in real world?

I hope you can see the difference between the two videos here. On the left, you see the normal video at 25 frames per second and, on the right, you see the five frames per second video obtained by keeping every fifth frame in the video. I can tell you, on the right, the tennis ball appears in two frames, so, if your detector is not perfect, it might fail to detect it. Then you're left to play tennis without seeing the ball, which is probably not ideal.

The question then becomes, how can we do autonomous driving at five frames per second, for example? One answer could be like this, turtle power. We all move at turtle speed, but probably that's not what we are after, so then we need to get some speed from somewhere.

One option, of course, is to rely on hardware. Hardware has been getting faster and faster in the past decade. However, the faster hardware normally comes with higher energy consumption and, without a doubt, on embedded devices, this is a critical constraint. So, what would be more sustainable alternatives to get our models faster?

Let's look at what the brain does to process a visual input. There are lots of numbers there - don't worry, I'll walk you through them. I'm just giving a comparison between a generic artificial neural network and the human brain. Let's start by looking at the number of basic units, which in the brain are called neurons, and their connections, which are called synapses.

Here, the brain is clearly superior to any model that we have so far by several orders of magnitude, and this could explain, for example, the fact that the brain is able to process so many things in parallel and to achieve such high accuracy. However, when we look at speed of basic operation, here we can see that actually our electronic devices are much faster than the brain, and the same goes for precision of computation. Here, again, the electronic devices are much more precise.


“Current systems consider the video as a collection of independent frames. And, actually, this is no wonder since the current video systems were initially designed as image models and then we just run them repeatedly on the frames of a video.”


However, as I said, speed and precision of computation normally come with high power consumption, to the point where a current GPU consumes about 10 times more power than the entire human brain. Yet, with all these advantages on the side of the electronic devices, we are still running at five frames per second, when the human brain can actually process more than 100 frames per second.

I'm going to argue here that the reason for this suboptimal behavior is that current systems consider the video as a collection of independent frames. And, actually, this is no wonder, since the current video systems were initially designed as image models and then we just run them repeatedly on the frames of a video. Running them in this way means the processing is completely sequential. Except for the processing that happens on the GPU, where we can parallelize things, overall it still remains sequential; and all the layers in the network work at the same pace, which is the opposite of what the brain does.

©Robert Wright/LDV Vision Summit

There is strong evidence that the brain exhibits a massively parallel processing mode and also that neurons fire at different rates. All this because the brain rightfully considers the visual stream as a continuous stream that exhibits high correlations and redundancy across time.

Just to go back to the initial sketch, this is how our current systems work. You get an image. This goes through every layer in the network. You get a prediction. The next image comes in. It goes again through every layer, and so on. What you should observe is that, at any point in time, only one of these layers is working and all the others are just waiting around for their turn to come.

This is clearly not efficient - it's just wasting resources - and the other thing is that every layer works at the same pace, which, again, is not needed if we take into account, for example, the slowness principle. I'm just trying to depict here what that means. This principle informally states that fast-varying observations are explained by slow-varying factors.

If you look at the top of the figure on the left - those are the frames of a video depicting a monkey. If you look at the pixel values in the pixel space, you will see high variations, because the light changes or the camera moves a bit or maybe the monkey moves a bit. However, if we look at more abstract features of the scene, for example the identity of the object or the position of the object, these will change much more slowly.

Now, how is this relevant for artificial neural networks? It is quite well understood now that deeper layers in an artificial neural network extract more and more abstract features, so, if we agree with the slowness principle, then the deeper layers can work at a slower pace than the layers at the input of the network.

Now, if we put all these observations together, we obtain something like this - like a Christmas tree, as shown - where all the layers work all the time, but at different rates. We are pipelining operations, and this generates more parallel processing. We can now update our layers at different rates.

Initially, I said that the latency of a network is given by the sum of the computation times of all the layers in the network. Now, very importantly, with our design, the latency is instead given by the slowest of the layers in the network. In practice, we obtain up to four times faster response. I know it's not 10 times, but four is actually enough because, in terms of perception, once you are past 16 frames per second you are quite fine, I think.
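
To make the latency argument concrete, here is a minimal sketch (not DeepMind's code, and the per-layer times below are hypothetical). Running frames through the stack one at a time gives a latency equal to the sum of the layer times; updating all layers in parallel, each consuming the output its predecessor produced at the previous step, as described in the talk, emits a new prediction every max-of-layer-times milliseconds instead.

```python
# Hypothetical per-layer compute times for a 4-layer network (milliseconds).
layer_times_ms = [4, 6, 10, 8]

# Sequential: a frame must traverse every layer before its prediction appears.
sequential_latency = sum(layer_times_ms)   # 28 ms per prediction

# Depth-parallel / pipelined: all layers update at once, each using the
# previous output of the layer below, so predictions arrive at the rate of
# the slowest stage.
pipelined_interval = max(layer_times_ms)   # 10 ms between predictions

print(f"sequential: {sequential_latency} ms")
print(f"pipelined:  {pipelined_interval} ms")
print(f"speed-up:   {sequential_latency / pipelined_interval:.1f}x")
```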

We obtain this faster response with 50% less computation, so I think this is not negligible, and, again, very importantly, we can now make our networks even deeper to improve their accuracy without affecting the latency of the network.

I hope I convinced you that this is a more sustainable way of creating low-latency video models, and I'm looking forward to the day when our models will be able to process everything that the camera can provide. I'm just showing here a beautiful video captured at 1,000 frames per second - I think this is the future.

Thank you.

Watch Viorica Patraucean’s keynote at our LDV Vision Summit 2018 below and check out other keynotes on our videos page.

Early Bird tickets are now available for the LDV Vision Summit May 22 & 23, 2019 in NYC to hear from other amazing visual tech researchers, entrepreneurs and investors.

We are accepting applications to our Vision Summit Entrepreneurial Computer Vision Challenge for computer vision research projects and our Startup Competition for visual technology companies with <$2M in funding. Apply now & spread the word.

LDV Capital is Looking for Summer Analysts - Come Collaborate with Us!

We at LDV Capital are currently recruiting our analyst interns for Summer 2019 in NYC.

LDV Capital is a thesis-driven early stage venture fund investing in people building visual technology businesses that leverage computer vision, machine learning and artificial intelligence to analyze visual data.

We are looking for two entrepreneurial-minded, visual-tech-savvy individuals to join our team from May to August 2019:

  • Analyst intern with experience in startups or venture capital, interested in learning more about venture capital, market mapping, investment research, due diligence and how to build a successful startup.

  • Technical Analyst intern with a deep-seated interest in computer vision, entrepreneurship and venture capital, looking to learn more about venture capital, business operations, due diligence and how to run a successful startup.

We are a small team and work closely with our summer interns on many aspects of building startups and investing. Our goals are to introduce our interns to the evaluation process of teams, technology and businesses, to help them connect and collaborate with entrepreneurs across the globe, and to lead them on a deep dive into sectors being disrupted by visual technologies. We work to make our internships unique in three ways:


1. We expose our interns to the leading-edge visual technologies that will disrupt all industries and society in the next 20 years.

We invest in companies that are working to improve the world we live in with visual technology. As such, our horizontal thesis drives us to look at and invest in any and all industries where deep technical teams are using computer vision, machine learning and artificial intelligence to solve a critical problem.

Since our interns have a seat at the table with our deal flow, you are pushed to educate yourself on numerous industries and applications of visual tech in order to understand and evaluate the value propositions of the early stage startups coming through the pipeline.

At LDV Capital you could be reviewing the pitch deck of a visual tech company in agriculture in the morning and sitting in on a call with a visual tech healthcare startup in the afternoon.

You will be asked to develop and present market trends, competitive landscapes, business opportunities and more for cutting edge, early stage technology companies. You will be consulted for your valuable opinions on those companies and technologies during weekly deal flow meetings.

While it is challenging work, the versatile knowledge of visual technologies and their applications to many industries that you develop over the course of the summer is applicable to almost any opportunity you pursue after your internship at LDV.

“My summer internship with LDV Capital provided a unique opportunity to interact with countless visual tech entrepreneurs while experiencing the excitement of early-stage investing. Most of all, the experience was a front row seat to the newest sensing, display, and automation trends which underpin my life goals and which will revolutionize the world as we know it.”
-
Max Henkart, Summer Analyst 2018

Max is currently exploring multiple robotics spin-outs, consulting with camera development/supplier management teams, considering full time roles in VC/CVC/Self-Driving firms, and graduating from CMU’s MBA program in May 2019. ©Robert Wright

2. We empower our interns to own their own projects.

Whether you are conducting a market landscape review, investigating a unique application of computer vision, or doing a trend analysis, we want you to own it. We are here to help guide you on planning, setting milestones, creating materials, presenting your work, but we believe in “teaching you to fish.”

“I really enjoyed working with the LDV team, and I learned a lot from the experience. LDV gave me a lot of responsibility, and I was able to learn what it is like to work as a venture capitalist. There is no better way to learn about entrepreneurship and venture capital.”
-
Ahmed Abdelsalam, Summer & Fall Analyst 2018

Ahmed is currently completing the final semester of his MBA at the University of Chicago, Booth School of Business ©Robert Wright

There is no better example than our annual LDV Insights research project, where we deep dive into a sector or trend with prime opportunities for visual technology businesses. Our interns contribute to the project plan, conduct the research, interview experts, analyze the data, write the slides, and are named authors in the publication.

In 2017, our research found that “45 Billion Cameras by 2022 Fuel Business Opportunities” and it was published by Interesting Engineering and others.

In 2018, we identified “Nine Trends Where Visual Technologies Will Improve Healthcare by 2028” and published it on TechCrunch.

As one facet of your internship, 2019 Summer Analysts will work on our third annual LDV Insights report on a very exciting, rapidly growing sector of the economy. In your application, let us know what you think the sector for our 2019 Insights report will be.

“Interning for LDV was truly one of the most rewarding experiences in my career thus far. Working in the smaller environment allowed me to work closely with the GP and gain insight into the VC process. Unlike some other busy-work dominated internships, LDV provided an opportunity to own my own project, develop a research report that was ultimately published by the firm.”
-
Sadhana Shah, Summer Analyst 2017

Sadhana is currently finishing her final semester at NYU Stern School of Business, with a double major in Management and Finance with a minor in Social Entrepreneurship. After graduation, she will be joining the KPMG Innovation Lab as an Associate. ©Robert Wright

3. We provide opportunities to network with startups, technologists & other investors.

At LDV Capital, you don’t get stuck behind a desk all day, every day. Our interns kick off their summer with our sixth annual LDV Vision Summit, which has about 600 attendees, 80 speakers, 40 sessions and 2 competitions over 2 days. Interns also help with and attend our LDV Annual General Partners Meeting. Your second week of the internship looks like this:

  • Monday - Help facilitate the subfinals for our Startup Competition and Entrepreneurial Computer Vision Challenge, watching the pitches of 40+ competitors and hearing the feedback and evaluation by expert judges.

  • Tuesday - Assemble aspects of our annual report and attend our Annual General Meeting for investors as well as a dinner for our investors, portfolio companies & expert network.

  • Wednesday - Attend our LDV Vision Summit, listening to keynotes about state-of-the-art visual technologies, and attend our VIP Reception for all our speakers & sponsors.

  • Thursday - Attend the second day of our LDV Vision Summit.

The rest of the summer is filled with opportunities to attend our gender-balanced LDV Community dinners, meet with startups, go to industry events, watch pitch competitions and more.

“Spending time at LDV Capital was an unforgettable experience. I’m thankful for access to A+ investors and entrepreneurs, collaborating with a world class team and a front row introduction to VC."
-
Danilo Vicioso, Summer Analyst 2018

Danilo is currently an EIR at Prehype, a corporate innovation and venture studio behind startups like Ro, ManagedByQ, BarkBox and AndCo. ©Robert Wright

Apply before Feb 28, 2019 for consideration.

If you believe you have the skills, experience and motivation to join our team and would like to gain more knowledge over the summer in:

  • Computer vision, machine learning and artificial intelligence

  • Market mapping

  • Investment research

  • Startup due diligence

  • Startup operations

  • Technical research

  • Trend analysis

  • Data analysis

  • Networking with entrepreneurs, other investors & technologists

If so, read carefully through everything you can find out about us online, and then submit a concise application showcasing why you are a great fit for the opportunity by February 28.

We carefully consider all applications and will get back to you ASAP. Thanks!

Apply now.


Rebecca Kaden of USV Discusses Trends in Visual Tech Investing

At the LDV Vision Summit 2018, Rebecca Kaden of Union Square Ventures shared her insights on investing at the intersection of standout consumer businesses and vertical networks.

Rebecca and Evan Nisselson of LDV Capital discussed the greatest challenges they have seen their portfolio companies endure and how strategic team building is critical to success at the earliest stages of company building. Watch their chat here:

Our Sixth Annual LDV Vision Summit will be May 22 & 23, 2019 in NYC. Early bird tickets are currently on sale. Sign up for our newsletter to receive news, updates and discounts.

Thank You for Making Our 5th Annual LDV Vision Summit a Success!

Day 1 Fireside Chat with Eric Fossum, CMOS Image Sensor Inventor and Dartmouth, Professor.  ©Robert Wright/LDV Vision Summit 2018

Our fifth annual LDV Vision Summit was fantastic, a giant thank you to everyone who came and participated in making it another spectacular gathering.

We couldn’t do it without you, YOU are why our annual LDV Vision Summit is special and a success every year. Thank you!

We are honored that you fly in from around the world each year to share insights, inspire, do deals, recruit, raise capital and help each other succeed!  

Congratulations to our competition winners:
- Startup Competition:  MoQuality, Shauvik Roy Choudhary, Co-Founder & CEO
- Entrepreneurial Computer Vision Challenge: “Flatcam”, Jesse Adams, Rice University, PhD Candidate

“If you are a startup that is trying to create a breakthrough technology you have to be here. If you are an investor that wants to look at interesting technologies and startups you have to be here. This is the place for computer vision.” Shauvik Roy Choudhary - CEO & Co-Founder of MoQuality.

“LDV hosts an event that stands out in the sea of conferences by its focus, quality, and ability to attract exceptional entrepreneurs and investors all deeply curious and immersed in visual technology.” Rebecca Kaden - Union Square Ventures, Partner.

“I go to a lot of academic conferences and this to me feels like a really fresh, different type of conference. It’s a great mix of industry, startups, investors; that mix of people brings a totally different energy and dialogue than the academic conferences I’m used to going to. In terms of the conferences I attend in a year, this is a nice change of pace for me." Matt Uyttendale - Facebook, Director of Core AI in the AR camera group.

A special thank you to Rebecca Paoletti of CakeWorks and Serge Belongie of CornellTech as the summit would not exist without collaborating with them!

Matt Uyttendale, Facebook, Director of Core AI, Facebook Camera  ©Robert Wright/LDV Vision Summit 2018

“What I like about this is it has both focus on the visual area, but a lot of different aspects and a lot of different stakeholders with very different perspectives, so you kind of get a multi-factorial view of the field.” Daniel Sodickson - NYU School of Medicine, Vice Chair Research, Dept. Radiology.

“I work quite deeply in my technical research silo in next-generation image sensor technology. The Vision Summit was a great chance to hear what the rest of the vision community is thinking about and working on, in an effective and condensed format, and to meet and discuss those interesting topics in person.” Eric Fossum - Dartmouth, Professor & CMOS Image Sensor Inventor.

"The LDV summit is the definitive gathering of investors and entrepreneurs in the machine vision world.  And, thanks to Evan and his team, it's a heck of a lot of fun." Nick Rockwell - New York Times, CTO.

“The LDV Vision Summit gave an intimate and powerful look at how computer vision will change almost every sector of the economy. As a native New Yorker, I'm happy that one of my favorite events of the year brought me out of Silicon Valley and back to NYC.” Zach Barasz - BMW i Ventures, Partner.

“The summit provided us with a unique opportunity to hear from and engage with industry influencers, academic thought-leaders, investors and fellow entrepreneurs.  Getting a holistic view of emerging trends from the entire vision ecosystem is incredibly valuable and will enable us to better anticipate and navigate our road ahead.” Yashar Behzadi - Neuromation, CEO.

“I think it’s a very vibrant community and the diversity of people from academia to those in the startup world and those who have already done startups and research is an interesting mix. As well as a set of investors who are dedicated to bringing these visual technologies out there to the real world.“ Spandana Govindgari - Co-Founder of HypeAR.

Panel Day 1: Trends in Visual Technology Investing [L to R] Polina Marinova, Fortune Magazine, Zach Barasz, BMWi Ventures, Jenny Lefcourt, Freestyle Capital  ©Robert Wright/LDV Vision Summit 2018 

Panel Day 1: AR Beyond Hot Dogs

“If you are interested in computer vision, whether you’re doing research, whether you’re building a company or investing in computer vision companies, I’d say this is the place to be! Everybody here is interested and passionate about computer vision.” Dan Ryan - VergeSense, CEO & Co-Founder.

“I came here because, as a research engineer, this is a unique opportunity to get a perspective on the whole ecosystem. You get to see startups, you get to see people who are non-technical, and you get to see students. So for me it’s very interesting to get to see the whole ecosystem and get a sense of what the needs of the community are.” Raghu Krishnamoorthi - Google, Software Engineer, Tensorflow for Mobile.

“LDV Vision Summit is a great event with thought provoking talks. Rarely do you find an event with each talk as riveting as the next - I look forward to attending the next one.” Elizabeth Mathew - Columbia University, MBA Candidate.

“If you are at all curious about the importance of computer vision, or vision in general, it’s important for you to understand how the technology works and where its applications lie, because we are moving in a direction where computer vision is going to be incredibly important in all applications of life. The LDV Vision Summit is a great conference to learn the different facets of it.” Steve Kuyan - NYU Future Labs, Managing Director.

“It was a great day, I was very impressed with the presentations and the competitions; it’s a fun combination of people from academics and industry…overall I really enjoyed it." Michael Rubinstein - Senior Research Scientist, Google.

Jason Eichenholz, Luminar Technologies, CTO & Co-Founder ©Robert Wright/LDV Vision Summit 2018

Ying Zheng, AiFi, Chief Science Officer & Co-Founder ©Robert Wright/LDV Vision Summit 2018

“Some of the technical depth that came into some of the presentations was unbelievable…it’s a diverse group of people from executives, to researchers, and everyone in between.” Anthony Sarkis, Computer Vision Researcher and Entrepreneur.

“Seeing all the latest developments in machine learning, and all of these applications, was really impressive… it’s a great mix of things you cannot find in any other classic academic conference, a combination of the business side and the science side of computer vision.” Anastasia Yendiki - Assistant Professor of Radiology, Harvard Medical School.

“Awesome energy… it brings together a lot of people from research, industry, and startups; it’s a great, fun experience with lots of networking opportunities.”  Ryan Benmalek - Ph.D. Student, Cornell University.

“This is one of the best places to learn about what is happening in the vision world, to connect AI with investors, and to think about what the solutions of the future are going to be…it's an interesting place with a broad range of actors in the space who can talk about what they are working on, and exchange ideas.” Renaud Visage - CTO & Co-Founder of Eventbrite.

Anthony Johnson, Giphy, CTO  ©Robert Wright/LDV Vision Summit 2018

Fireside Chat Day 1: Rebecca Kaden, Union Square Ventures, Partner  ©Robert Wright/LDV Vision Summit 2018

Assaf Glazer, Nanit, CEO  ©Robert Wright/LDV Vision Summit 2018

“If you want to learn about anything in the visual space, with a technical analysis not just high-level lay-up questions, this is where you go.” Trace Cohen, Managing Director, NY Venture Partners

“This is a very unique conference...and I go to conferences frequently. Something here in the way that it’s organized, the talent that they bring and the cadence of it…this is an inspiring community for experts and for people that are interested in computer vision technology and the content around it.” Assaf Glazer - CEO & Co-Founder of Nanit.

“Great event, super high quality, very focused and targeted. Most events even if they are high quality are much broader in nature, so at LDV you’re able to go very deep. I’d highly recommend this event to anyone interested in this technology.” Ophir Tanz - CEO & Co-Founder of GumGum.

“The event is inspiring. Ted Talks meets computer vision meets the future. But also it’s a family. People are here to help each other grow and that makes this unique from other events. ” Brian Brackeen - CEO & Co-Founder of Kairos.

“Vision summit is a great confluence of researchers, practitioners and visionaries talking about topics which are related but don’t often get put into the same room and probably should be more often.” Anthony Johnson, CTO of Giphy.

“If you want a set of like-minded people who are really really trying to push the boundaries in computer vision, this is the place to be.” Inderbir Sidhu, CTO TVision Insights.

“The energy at the show is fantastic. There’s such a cross-disciplinary field for [computer vision] so we’re able to talk to both the researchers and the investors who help accelerate moving the technology forward.” Jason Eichenholz, CTO & Co-Founder of Luminar Technologies.

Panel Day 2: How Will Technology Impact Our Trust in Visual Content? [L to R] Moderator: Jessi Hempel, Wired, Senior Writer; Panelists: Nick Rockwell, New York Times, CTO and Karen Wickre, KVOX Media, Founder.  ©Robert Wright/LDV Vision Summit 2018

Learn more about our partners and sponsors:

Organizers:
Presented by Evan Nisselson & Abby Hunter-Syed, LDV Capital
Video Program: Rebecca Paoletti, CakeWorks, CEO
Computer Vision Program: Serge Belongie, Cornell Tech
Competitions Subfinal Judges: Rob Spectre, Brooklyn Hacker; Alexandre Winter, Netgear; Andy Parsons, WorkFrame; Jan Erik Solem, Mapillary
Universities: Cornell Tech, School of Visual Arts
Sponsors: Coatue Management, Facebook, GumGum, Adobe, ImmerVision, Neuromation, Google Cloud
Media Partners: Kaptur, VizWorld, The Exponential View
Coordinator Entrepreneurial Computer Vision Challenge: Ryan Benmalek, Cornell University, Doctor of Philosophy Candidate in Computer Science

CakeWorks is a full-service video agency and content studio that helps businesses launch better video experiences, grow viewership and increase revenue. Stay in the know with our  weekly newsletter, or follow us @cakeworksvideo #videoiscake  

Cornell Tech is a revolutionary model for graduate education that fuses technology with business and creative thinking. Cornell Tech brings together like-minded faculty, business leaders, tech entrepreneurs and students in a catalytic environment to produce visionary ideas grounded in significant needs that will reinvent the way we live.

Fireside Chat Day 2: Amol Sarva, Knotel, CEO & Co-Founder  ©Robert Wright/LDV Vision Summit 2018

Keynote Day 1: Anastasia Yendiki, Harvard Medical School, Assistant Professor of Radiology ©Robert Wright/LDV Vision Summit 2018

Coatue Management, L.L.C, founded by Philippe Laffont, is a technology focused global investment manager with offices in New York, Menlo Park, San Francisco and Hong Kong.  Coatue launched in 1999 and currently has ~$16 billion in assets under management through public and private investments.

Facebook’s mission is to give people the power to share and make the world more open and connected. Achieving this requires constant innovation. Computer Vision researchers at Facebook invent new ways for computers to gain a higher level of understanding cued from the visual world around us. From creating visual sensors derived from digital images and videos that extract information about our environment, to further enabling Facebook services to automate visual tasks. We seek to create magical experiences for the people who use our products.

GumGum is an artificial intelligence company with a focus on computer vision. Our mission is to unlock the value of visual content produced daily across diverse data sets. We teach machines to see in order to solve hard problems. Since 2008, the company has applied its patented capabilities to serve a variety of industries from advertising to professional sports, with more to come.

Adobe is the global leader in digital marketing and digital media solutions. Our tools and services allow our customers to create groundbreaking digital content, deploy it across media and devices, measure and optimize it over time and achieve greater business success. We help our customers make, manage, measure and monetize their content across every channel and screen.

ImmerVision are experts in Intelligent Vision and wide-angle imagery for professional, consumer, and industrial applications.  ImmerVision technology combines advanced lens design, innovative Data-in-Picture marking, and proprietary image processing to provide an AI-ready machine vision system to OEMs, ODMs, and the global imaging ecosystem.

Neuromation is a distributed synthetic data platform for Deep Learning Applications.

The Google Cloud Startup Program is designed to help startups build and scale using Google Cloud Platform.  We are a small team with startups in our DNA. We appreciate what makes early-stage companies tick, and we think that Google Cloud’s continued success over the next decade will be fueled by great companies yet to be born.  But like you, we’re mostly here because startups are challenging and fun and we wouldn’t have it any other way.

Jesse Adams, Winner of ECVC 2018 for "Flatcam", Rice University, PhD Candidate  ©Robert Wright/LDV Vision Summit 2018

Kaptur is the first magazine about the visual tech space. News, research and stats along with commentaries, industry reports and deep analysis written by industry experts.

Exponential View is one of the best newsletters about the near future of technology and society. Azeem Azhar critically examines the fast pace of change, and its deep impact on the economy, culture, and business. It has been praised by the former CEO of Reuters, founder of Wired, and the deputy editor of The Economist, among others.

LDV Capital invests in deep technical teams building visual technology businesses.

Mapillary is a community-based photomapping service that covers more than just streets, providing real-time data for cities and governments at scale. With hundreds of thousands of new photos every day, Mapillary can connect images to create an immersive ground-level view of the world for users to virtually explore and to document change over time.

The MFA Photography, Video and Related Media Department at the School of Visual Arts is the premier program for the study of Lens and Screen Arts. This program champions multimedia integration, interdisciplinary activity, and provides ever-expanding opportunities for lens-based students.


Didn't get a chance to tell us what you are working on at the Summit? Get in touch & let us know about your recent research or startup that you are working on.


VizWorld.com covers news and the community engaged in applied visual thinking, from innovation and design thinking to technology, media and education. From the whiteboard to the latest OLED screens and HMDs, from visual UX to movie making and VR/AR/MR, VizWorld readers want to know how to put visual thinking to work.

AliKat Productions is a New York-based event management and marketing company: a one-stop shop for all event, marketing and promotional needs. We plan and execute high-profile, stylized, local, national and international events, specializing in unique, targeted solutions that are highly successful and sustainable. #AliKatProd

Robert Wright Photography clients include Bloomberg Markets, Budget Travel, Elle, Details, Entrepreneur, ESPN The Magazine, Fast Company, Fortune, Glamour, Inc. Men's Journal, Newsweek (the old one), Outside, People, New York Magazine, New York Times, Self, Stern, T&L, Time, W, Wall Street Journal, Happy Cyclist and more…

Prime Image Media works with clients large and small to produce high quality, professional video production. From underwater video to aerial drone shoots, and from one-minute web videos to full blown television pilots... if you want it produced, they can do it.

Pam Majumdar is a Virginia Beach-based social media content writer and community growth strategist for executives and brands looking to amplify their relevance and reach. Pam combines data and creativity to shape and execute original content strategies -- often from scratch.

Celebrating MoQuality's victory in the 2018 Startup Competition [L to R] Evan Nisselson of LDV Capital, Shauvik Roy Choudhary of MoQuality, Serge Belongie of Cornell Tech with August & Emilia Belongie, Abby Hunter-Syed of LDV Capital and Rebecca Paoletti of CakeWorks.  ©Robert Wright/LDV Vision Summit 2018

An Academic Budget Inspired Raquel Urtasun to Design Affordable Solutions for Self-Driving

One week until our LDV Vision Summit 2018 - May 23 & 24 in NYC at the SVA Theatre. Limited tickets are still available to see 80 speakers in 40 sessions discuss the cutting edge in visual tech. Register now!

Raquel Urtasun is a recipient of the NVIDIA Pioneers of AI Award, three Google Faculty Research Awards and several other honors. She is a faculty member at the University of Toronto, a co-founder of the Vector Institute and the head of Uber ATG Toronto. At our LDV Vision Summit 2017, she spoke about how autonomous vehicles with human-level perception will make our cities smarter and better to live in.

It's my pleasure to be here today, and I wanted to introduce who I am just in case you guys don't know.

So I have three jobs, which keeps me quite busy. I am still an academic: one day a week I am at the University of Toronto and the Vector Institute, which I co-founded with a whole bunch of people that you see in the picture, including Geoff Hinton. And the latest, greatest news, I guess: as of May 1st, 2017, I'm also heading a new lab of Uber ATG in Toronto, so self-driving cars are in Canada now and that's really, really exciting.

Today, I'm going to talk about what led to the Uber acquisition [of the University of Toronto team]. Perhaps you have already seen another discussion about why we need self-driving cars, but what is very important for me is that we need to lower the risk of accidents, we need to provide mobility for the many people that right now cannot go to the places they want to go, and we need to think of the future of public transportation and ride sharing. In particular, we need to share resources: ninety-five percent of the time the car is parked, so we are just taking up space on our planet for no real reason.

© Robert Wright/LDV Vision Summit 2017

If we look at what is typically going on in self-driving car companies, we find that they're pretty good at localization, path planning, and obstacle avoidance, but there are two things they do which actually make them not super scalable. The first thing is LiDAR - the prices are dropping, but it is still quite expensive to buy a decent LiDAR. And the other thing, which is the skeleton in the closet, is actually mapping.

What I have been working on for the past seven years is how to make solutions that are scalable, meaning cheap sensors and trying to drive without maps, or with as little prior knowledge as possible.

Now, if you want to do something of this form, we need to think about many different things at once. The first thing that was difficult for us as academics was data, and so many years ago we created what is, I guess, still the only benchmark for self-driving, which is KITTI. And to my despair, this is still the only benchmark, which I don't understand.

“If we want to get rid of the LiDAR and get rid of the maps, one of the things that we need to have is robust, good, and fast stereo 3D reconstruction.”

The other thing that is important is learning. One can't just handcraft everything, because we need to be robust to scenarios that we have never seen before. We need holistic models that reason about many things. At the end of the day, we have a fixed computation budget for many tasks, and we need to think of the hardware at the same time.

If we want to get rid of the LiDAR and get rid of the maps, one of the things that we need to do is apply deep learning to get robust, good, and fast stereo 3D reconstruction. This can run in real time and, up to around forty meters, can basically almost replace the LiDAR.
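
For readers less familiar with stereo, the geometry she is relying on is compact enough to show directly. This is a minimal numpy sketch (not Uber ATG's or the University of Toronto's code; the focal length and baseline below are hypothetical): with a calibrated stereo pair, per-pixel depth follows from the disparity between the left and right images, and the error grows as disparity shrinks, which is why stereo competes with LiDAR mainly at shorter ranges.

```python
import numpy as np

focal_px = 720.0     # hypothetical focal length of the cameras, in pixels
baseline_m = 0.54    # hypothetical distance between the two cameras, in meters

def disparity_to_depth(disparity: np.ndarray) -> np.ndarray:
    """Depth Z = f * B / d. Far objects have tiny disparities, so they are the
    noisy ones; near and mid-range objects are where stereo is most reliable."""
    d = np.clip(disparity, 1e-6, None)   # avoid division by zero
    return focal_px * baseline_m / d

disp = np.array([[40.0, 10.0], [5.0, 1.0]])   # toy disparity map, in pixels
print(disparity_to_depth(disp))               # ~[[9.7, 38.9], [77.8, 388.8]] meters
```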

Another thing that you need to do is work on perception. We spent the past year and a half obsessed with instance segmentation. This is where you're segmenting the image: you have a single image and you are interested in labeling every pixel, not just with the category - car, road - but also estimating that this is one car, this is another car, etc. And this is a particularly difficult problem for deep learning because the loss function has to be agnostic to the permutation of the instances. So we've built some interesting technology lately based on the watershed transform. It scales really well; it's independent of the number of objects, so you can run in real time for anything. And it generalizes: it's trained on one set of cities and tested on another set of cities. You see the prediction in the middle and the ground truth on the right. So, even with crowded scenes, [the model] can actually do pretty well.

Now, if you want to do self-driving, labeling pixels is not going to get you there. You need to really estimate what's happening everywhere in the scene. These are our latest, greatest results on detection and tracking. This is actually very interesting technically: you can backpropagate through solvers. And here you see the results of what we have as well.

In general, what you want to do is estimate everything that is in the scene. Here we have some results from a couple of years ago, with a single camera mounted on top of the car. The car is driving through intersections it has never seen before and is able to estimate the local map of the intersection - it is creating the map on the fly. It is estimating where your car is (localization) as well as where every other car is in the scene, and the traffic situation that you see on the bottom left, even though it doesn't directly observe things like that. The cars are color-coded by their estimated intentions: basically, we are estimating where everybody is going in the next couple of seconds. And this is, as I said, [with a] single camera [and] new scenarios that we haven't trained on.

Another thing that you need to do is localization. Localization is an interesting problem, because typically the way it's done is the same as with the maps: you go around and collect how the world looks, and that's really expensive, meaning that you basically need to know the appearance of the world [the cars] are in at every point in time.

“It takes thirty-five seconds of driving to actually localize with a precision of 2 meters.”

Instead, we look at a cartographic map of the environment and the motion of the vehicle to estimate really quickly where the vehicle is in the global coordinate system. You see here that you have a probability distribution over the graph of the road. As the vehicle drives, you have a few modes of the distribution, and very quickly we know exactly where this vehicle is.

This is a Manhattan-like scenario: there are two modes of the distribution, but again we soon end up with only a single location. And this is for the whole city of Karlsruhe, which is two thousand kilometers of road. It takes thirty-five seconds of driving to actually localize with a precision of 2 meters, which is the precision of the maps that we use. These maps are available for free online for sixty percent of the world, so you can just download them - you don't need to capture anything; it's free.
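
As an illustration of the idea (a toy sketch under my own assumptions, not the actual algorithm from the talk), the same intuition can be written as a small histogram filter over the nodes of a road graph: keep a probability for every node, spread it along the roads as the vehicle moves, and reweight it by how well the measured driving direction matches the map. With free map data supplying the graph, a few turns are usually enough to collapse the distribution onto one location.

```python
import numpy as np

# Hypothetical tiny road graph: node -> neighboring nodes, plus the road
# heading (radians) at each node, as it might come from a free map source.
road_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
headings = {0: 0.0, 1: 0.0, 2: 0.5, 3: 1.0, 4: 1.0}

belief = np.full(len(road_graph), 1.0 / len(road_graph))  # start fully uncertain

def update(belief: np.ndarray, measured_heading: float, sigma: float = 0.2) -> np.ndarray:
    # Predict: the vehicle moved to one of the neighboring nodes.
    predicted = np.zeros_like(belief)
    for node, neighbors in road_graph.items():
        for n in neighbors:
            predicted[n] += belief[node] / len(neighbors)
    # Correct: weight nodes whose road direction matches the measured heading.
    map_headings = np.array([headings[i] for i in road_graph])
    likelihood = np.exp(-0.5 * ((map_headings - measured_heading) / sigma) ** 2)
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# Feed in a few seconds of driving: the observed turn quickly rules out most nodes.
for measured in [0.0, 0.5, 1.0, 1.0]:
    belief = update(belief, measured)
print(belief.round(3))   # probability mass concentrates on the consistent nodes
```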

Now, in terms of mapping: why do car companies, or self-driving car players, use maps? You can think of a map as a sensor which basically tells you the static part of the scene. It gives you robustness and it allows you to only look at the dynamic objects.

The problem with the way the mapping is done is that you have, say, one of these cars with expensive sensors, and basically you drive around the world, you have your data, and then there is some labeling process where you basically say where the roads are, where the lanes are, where the possible places to park are, etc. That means you have very small coverage, because this is done at the vehicle level and is very expensive. As an academic, I ask, "Can we actually do this by spending zero dollars?"

In those terms, we figured you can use aerial images or satellite images. Satellites pass around the earth twice a day, so you have an up-to-date view of the world. And we created methods that can automatically extract HD maps of the form that you see on the top, where you have lanes, parking spots, sidewalks, etc. It takes only 3 seconds on a single computer to estimate this per kilometer of road. Basically, with a very small cluster of computers, you can cover the whole world with up-to-date estimates.

© Robert Wright/LDV Vision Summit 2017

Five and a half years ago, I created KITTI. And one thing that's bugged me about mapping is that it is only the players, the companies, that are actually working on this. So I created TorontoCity, which is about to go online soon. The Greater Toronto Area is twenty percent of the population of Canada; it's huge, and we have all these different views: panoramas, LiDAR, cameras from the aerial views, drones, etc.

Now, as an academic, I cannot pay labelers to label [the images]. Just the aerial images would cost between twenty and thirty million dollars to label. What I did was go to the government and pull together all the information from maps the government has captured: 3D maps of the city, every single building, etc. Then, with algorithms that can align the sources of information, including all the different sources of imagery as well as the maps, we automatically created ground truth. And here you see the quality of the ground truth is really, really good. Now we have ground truth for the whole Greater Toronto Area, and we're going to put the benchmark online. These are the tasks that you can participate in, for instance, semantic segmentation.

One more thing we have built since then is ways to extract these maps automatically - from aerial images and, what was quite interesting, from the panoramas, from which you can automatically get centimeter-accurate maps. Alright, to conclude: for the last seven years, my group has been working on ways to make affordable self-driving cars that scale, with sensing and perception, localization, and mapping. Thank you.

LDV Capital is focused on investing in people building visual technology businesses. Our LDV Vision Summit explores how visual technologies leveraging computer vision, machine learning and artificial intelligence are revolutionizing how humans communicate and do business.

Tickets are available for the LDV Vision Summit 2018, where you can hear from other amazing visual tech researchers, entrepreneurs, and investors.

Facebook is building a visual cortex to better understand content and people

Manohar Paluri, Manager of Computer Vision Group at Facebook ©Robert Wright/LDV Vision Summit

Manohar Paluri is the Manager of the Computer Vision Group at Facebook. At our LDV Vision Summit 2017 he spoke about how the Applied Machine Learning organization at Facebook is working to understand the billions of pieces of media content uploaded every day to Facebook in order to improve people’s experiences on the platform and connect them to the right content.

Good morning everyone. Hope the coffee's kicking in. I'm gonna talk about a specific effort, an umbrella of efforts, that we are calling ‘building Facebook's visual cortex.’

If you think about how Facebook started, it was people coming together, connecting with friends, with people around you. And slowly, through these connections, using the Facebook platform to talk about things that they cared about, to upload their moments, whether it's photos - some may not be the right thing for the platform, some are obviously moments that you care about - and slowly moving towards video.

This is how Facebook has evolved. The goal for Applied Machine Learning, the group that I'm in, is to take the social graph and make it semantic. What do I mean by that? If you think about all the nodes there, the hard nodes are basically what people actually interact with, upload, and so on. But the soft nodes, the dotted lines, are what the algorithms create. This is our understanding of the people and the content that is on the platform.


LDV Capital invests in people building visual technology businesses. Our fifth annual LDV Vision Summit on May 23 & 24 will discuss the disruptive force of computer vision, machine learning and AI.

Tickets are now available for the LDV Vision Summit 2018 to hear from other amazing visual tech researchers, entrepreneurs, and investors.

Special prices available for startups, researchers, & groups.


This is useful, and thinking about it this way is scalable, because whatever product or end technology or experience you're building now has access not only to the social graph but also to the semantic information, so you can use it in various ways. This is something that has actually revolutionized the use of computer vision specifically.

Now, if you take a step forward, most likely the last thing that you liked on Facebook was either a photo or a video. When I started in 2012 we were doing a lot of face recognition, but we've started moving beyond that. Lumos is the platform that was kind of born out of my internship, so I'm super excited, because Lumos today processes billions of images and videos. It has roughly 300-odd visual models that are being built by computer vision experts and general engineers - they don't necessarily need to have machine learning and computer vision expertise. It uses millions of examples. Even though we are making significant progress on supervised and unsupervised learning, the best models today are still fully supervised models.

Now, as a very quick primer: the state-of-the-art models today are deep residual networks. Typically, you have a task, and you take this state-of-the-art deep network and train it for that task. It takes a few weeks, but if you have distributed training then you can bring it down to hours. If you have a new task, the obvious baseline is to take a new deep net and train it for the new task.

But think about Facebook, and think about billions of images and hundreds of models. You can't just multiply those together; it's not feasible. So what do you do? The nice thing about deep networks is their hierarchical representations: the lower layers are generalized representations, and the top layers are specific to the task. Now, if you are at Facebook and you have a task, you should be able to leverage the compute already spent on billions of these images again and again for your task.

So with Lumos, what people can do is plug in at whichever layer suits them, and make the trade-off between compute and accuracy. This is crucial to scale to all the efforts that we are doing for billions of images. Now, as the computer vision group, we might not understand the implications of a loss of accuracy, or of making something faster at the cost of accuracy, but the teams building these models know this very well. With Lumos they are able to do this in a much simpler manner and in a scalable way.
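
Lumos itself is internal to Facebook, so here is a rough public analogue of the pattern he describes, sketched with torchvision's ResNet-50 (the layer indices, the small head, and the choice of ResNet-50 are my assumptions): every task shares one trunk, and each task picks how deep into the trunk to tap its features, trading compute for accuracy.

```python
import torch
import torch.nn as nn
from torchvision import models

# Shared trunk; in practice you would load pretrained weights once and reuse
# its intermediate activations across many tasks.
trunk = models.resnet50()
blocks = list(trunk.children())   # conv1, bn1, relu, maxpool, layer1..layer4, avgpool, fc

cheap_features = nn.Sequential(*blocks[:6])   # through layer2 -> 512 channels, less compute
rich_features = nn.Sequential(*blocks[:8])    # through layer4 -> 2048 channels, more compute

for p in cheap_features.parameters():
    p.requires_grad = False   # reuse the shared computation; train only the task head

# A small task-specific head on top of the cheaper, more general features.
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, 2))

x = torch.randn(1, 3, 224, 224)   # stand-in for an uploaded photo
with torch.no_grad():
    feats = cheap_features(x)     # shape: (1, 512, 28, 28)
logits = head(feats)
print(logits.shape)               # torch.Size([1, 2])
```

Tapping after an earlier block gives cheaper but more generic features; tapping deeper costs more compute for richer, more task-ready features, which is the same trade-off the talk attributes to Lumos users.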

What is a typical workflow for Lumos? You have a lot of tools that allow you to collect training data. One of the nicest things about being at Facebook is that you have a lot of data, and it also comes with a lot of metadata: it could be hashtags, it could be text that people write, it could be any other metadata. So you can use lots of really cool techniques to collect training data. Then you train the model, and you have this control over the trade-off between accuracy and compute.

You deploy the model with the click of a button, and the moment you deploy it, every new photo and video that gets uploaded gets run through your model without you doing any additional engineering. And you can refine the model using active learning. So you are literally doing research and engineering at scale together every day, and you don't need to be an expert in computer vision to do that.

Here is a series of photos that come in and get classified through Lumos, along with the concepts that are built through Lumos. Obviously, you can only look at a certain portion, because we get an image every four minutes.

Lumos today powers many applications; these are some of them. A specific application that I thought would be interesting to talk about here is population density map estimation. What happened was that Connectivity Labs cared a lot about where people live, so that we can provide the right kind of connectivity technology, whether it's an urban area or a rural area. So what did they do? They went to Lumos and trained a simple model that takes a satellite tile and says whether there is a house in it or not. And they applied it billions of times on various parts of the world.

Here is a high-resolution map, on the right side, that we were able to generate using this Lumos model. And they didn't have to build any new deep net - they just used the representation of one of the existing models. If you apply this billions of times you can detect the houses, and if you do it at scale ... This is Sri Lanka. This is Egypt. And this is South Africa. So based on the density of where people live, you can now use different kinds of connectivity technology, whether it's drones, satellites, or Facebook hardware installed in urban areas.
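
The recipe is simple enough to sketch. Below is a toy, untrained stand-in (my own assumptions throughout: the architecture, tile size, and random input are purely illustrative, and the real system was trained on labeled satellite imagery at a vastly larger scale): a small binary classifier scores each tile of an image for the presence of a building, and the grid of scores becomes a coarse density map.

```python
import torch
import torch.nn as nn

# Toy "house / no house" tile classifier (hypothetical architecture, untrained here).
tile_classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 1),   # logit: does this tile contain a building?
)

def density_map(satellite_image: torch.Tensor, tile: int = 64) -> torch.Tensor:
    """Slide the classifier over non-overlapping tiles and return a grid of
    building probabilities, i.e. a coarse population-proxy density map."""
    _, h, w = satellite_image.shape
    rows, cols = h // tile, w // tile
    grid = torch.zeros(rows, cols)
    with torch.no_grad():
        for r in range(rows):
            for c in range(cols):
                patch = satellite_image[:, r*tile:(r+1)*tile, c*tile:(c+1)*tile]
                grid[r, c] = torch.sigmoid(tile_classifier(patch.unsqueeze(0)))[0, 0]
    return grid

fake_img = torch.rand(3, 512, 512)     # random stand-in for a satellite image
print(density_map(fake_img).shape)     # torch.Size([8, 8])
```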


...What Lumos is trying to do, it's actually trying to learn a universal visual representation irrespective of the kind of problem that you are trying to solve.


If you think about what Lumos is trying to do, it's actually trying to learn a universal visual representation, irrespective of the kind of problem you are trying to solve. At F8, the Facebook developer conference, we talked about Mask R-CNN. This is work that came out of Facebook research where you have a single network doing classification, detection, segmentation, and human pose estimation.

Think about it for a minute. Just five years ago, if somebody had told you that you could have one network, with the same compute, running on all photos and videos, that would give you all of this, nobody would have believed it. And this is the world we are moving to. So there is a good chance we'll have a visual representation that is universal.
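
For anyone who wants to poke at this class of model, an open implementation is available in torchvision. The sketch below is my choice of library and thresholds, not Facebook's production system, and torchvision's variant covers detection and instance masks (pose is handled by a separate keypoint variant); it shows the single-pass, multi-output behavior he describes.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# COCO-pretrained Mask R-CNN (torchvision >= 0.13; downloads weights on first use).
model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = torch.rand(3, 480, 640)        # stand-in for a decoded photo, values in [0, 1]
with torch.no_grad():
    (output,) = model([image])         # the model takes a list of images

keep = output["scores"] > 0.5          # drop low-confidence detections
print(output["boxes"][keep].shape)     # (N, 4) bounding boxes
print(output["labels"][keep])          # COCO class indices
print(output["masks"][keep].shape)     # (N, 1, 480, 640) per-instance masks
```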

Here are some outputs - detection and segmentation outputs - and when you compare them to the ground truth, even for smaller objects, the ground truth and the predictions of the algorithm actually match pretty well; sometimes we cannot distinguish them. Taking it further, you can do segmentation of people and pose estimation, so you can actually start reasoning about what activities people are engaging in in the photos and videos.

Now, as rightly pointed out before, this understanding and this technology are moving to your phone. Your device is pretty powerful. Here are a couple of examples where the camera is understanding what it's seeing, whether it's classification of the scene and objects or understanding the activities of people. This is Mask R-CNN2Go, which runs at a few frames per second on your device. To do this we looked at the entire pipeline, whether it's the modeling, the runtime engine, or the model size.

Taking it a step further, the next frontier for us is video understanding. Here, I'm not going to play the video, but rather show you the dashboard that tells you what is happening in it. We use our latest face recognition technology to see when people come in and go out. We use the latest 3D convnet-based architectures to understand actions that are happening in the video. And we understand the speech and audio to see what people are talking about. This is extremely important. With this kind of dashboard we have a reasonable understanding of what's happening in the video. But we are just getting started - the anatomy of video is much more complex.

So how do you scale this to 100% of videos? That is non-trivial. We have to do a lot of really useful and interesting things to be able to scale to 100% of videos. We're doing face recognition and friend tagging. The next step takes it further: doing segmentation in video and doing pose estimation, so we are able to understand that people are sitting, people are standing, talking to each other, and so on, together with the audio.

That is basically the first layer of peeling the onion of the video, and there's a lot more we can do here. Another step we are taking is connecting the physical world and the visual world. As rightly pointed out, we need to start working with LiDAR and 3D data. Here, what you see is LiDAR data: we are using a deep net to do semantic segmentation of this three-dimensional LiDAR data and doing line-of-sight analysis on the fly.

We brought the deployment of Facebook hardware to connect urban cities down from days and months to hours, because we were able to use computer vision technology. I have only ten minutes to cover whatever we could, so I'm going to end with one statement: I really believe that to bring AI to billions of people, you need to really understand content and people. Thank you.

You Are Brilliant and You Want More Exposure

Our LDV Vision Summit is coming up on May 23-24, 2018 in New York. We bring together top technologists, researchers, startups, media/brand executives, creators and investors with the purpose of exploring how visual technologies leveraging computer vision, machine learning and artificial intelligence are revolutionizing how humans communicate and do business.   

Through March 25 we are collecting applications to the Entrepreneurial Computer Vision Challenge and the Startup Competition.

Every second of every day, people around the world are publishing research papers and launching new startups that leverage computer vision, machine learning and artificial intelligence.

Researchers and professors want their work to be noticed in the midst of a flood of new work.

Entrepreneurs want to build valuable businesses, get covered in TechCrunch, Wired and The Wall Street Journal, raise financing and have happy customers.

We want to help you!

Since 2014, we have been organizing the LDV Vision Summit, the premier annual visual technology summit, with a main focus on showcasing brilliant people like YOU!

Entering competitions increases your odds of being recruited, raising capital, or selling for over $100M because your work becomes visible to an audience of actors working to advance your field.  The key is to focus on attending and competing where it is most contextually relevant for you to further your goals. If you’re in visual tech, that means the LDV Vision Summit.


Speakers and judges come from Apple, Cornell Tech, Qualcomm, NBCUniversal, Stanford, Facebook, MIT, Greylock Partners, CMU, Wired, Spark Capital, Nvidia, First Round Capital, Flickr, Refinery29, Lytro, Time Warner, Samsung, Magic Leap, Ooyala, Hearst, Google and many more.

Enter and present your brilliance at our 2018 LDV Vision Summit Startup Competition or the Entrepreneurial Computer Vision Challenge (ECVC). Application deadline is March 25, 2018.   


Sean Bell, CEO & Co-Founder, GrokStyle from Cornell Tech. ©Robert Wright/LDV Vision Summit

Past competitors in the ECVC, like 2016 winner, GrokStyle, have reaped the rewards of competing. “The most valuable part of the Vision Summit was connecting with three different companies potentially interested in building on our technology, and with four different potential investors/advisors,” said CEO & Co-founder Sean Bell after the Vision Summit.


Divyaa Ravichandran, from CMU, showcased her project “Love & Vision” ©Robert Wright/LDV Vision Summit

For other 2016 ECVC competitors like Divyaa Ravichandran, who was a recent graduate of Carnegie Mellon University at the time, “attendance at LDV Vision Summit last year gave me visibility and I came in contact with my current employer at Facebook!”


Rosanna Myers, CEO & Co-Founder of Carbon Robotics, Startup Competition Winner ©Robert Wright/LDV Vision Summit

2016 Startup Competition winner, Carbon Robotics, was looking to “connect with recruits and investors in NYC. The experience was great for that. Getting to pitch on the main stage was amazing, because it made it easy for people to learn about what we’re working on. After the pitch, we were approached by tons of high-quality engineers and potential partners, so it was a great success,” said Rosanna Myers, CEO & Co-founder.

“Following the summit, [London based startup] The Smalls raised investment with an angel investor who was in the audience. The funding was used to make key hires and improve technology. The Smalls has continued to grow at 300% per year and now has offices in both London and Singapore,” reports Kate Tancred, CEO & Founder of The Smalls, who was a finalist in the 2015 Startup Competition.

The 2018 LDV Vision Summit on May 23 & 24 in NYC will include over 80 international speakers with the purpose of exploring, understanding, and shaping the future of imaging and video in human communication. The best startups and computer vision experts who compete in the Startup Competition and the ECVC will be showcased alongside these industry leaders.

The Startup Competition is for promising visual technology companies with less than $2M in funding.

The ECVC is for any Computer Vision, Machine Learning and/or Artificial Intelligence students, professors, experts or enthusiasts working on a unique solution that leverages visual data to empower businesses and humanity. It provides contestants the opportunity to showcase the technology piece of a potential startup company without requiring a full business plan. It is a unique opportunity for students, engineers, researchers, professors and/or hackers to test the waters of entrepreneurism.

Competitions are open to anyone working in our visual technology sector such as: photography, videography, medical imaging, analytics, robotics, biometrics, LIDAR, radar, satellite imaging, computer vision, machine learning, artificial intelligence, augmented reality, virtual reality, autonomous vehicles, media and entertainment, gesture recognition, search, advertising, cameras, e-commerce, visual sensors, sentiment analysis and much more.

Judges for the competitions include top industry venture capitalists, entrepreneurs, journalists, media executives and companies that are recruiting. Past judges included Josh Elman of Greylock, Tamara Berg of U. North Carolina, Chapel Hill, Larry Zitnick of Facebook, Andy Weisman of Union Square Ventures, Ramesh Raskar of MIT Media Lab, Alex Iskold of Techstars, Gaile Gordon from Enlighted, Jessi Hempel of Wired and many more. The list of phenomenal 2018 judges continues to evolve on the Competition’s website.

All competition sub-finalists will receive remote and in-person coaching by Evan Nisselson, and in-person mentoring during the sub-finalist judging session by Jan Erik Solem, Rebecca Paoletti, Andy Parsons, Evan Nisselson, Serge Belongie and other experts.

It would be a horrible feeling to be sitting behind your computer or in the audience when someone else presents an idea that you had years ago. Take a risk, prove yourself, compete. 

We are waiting to see YOUR brilliance!

Enter and present your brilliance at our 2018 LDV Vision Summit Startup Competition or the Entrepreneurial Computer Vision Challenge (ECVC). Application deadline is March 25, 2018.   

Hired at Facebook After Showcasing Research in Visual Technology at the LDV Vision Summit: An interview with Divyaa Ravichandran


Divyaa Ravichandran, from CMU, showcased her project “Love & Vision” ©Robert Wright/LDV Vision Summit
 

The LDV Vision Summit is coming up on May 23 & 24, 2018 in New York. Through March 25 we are collecting applications to the Entrepreneurial Computer Vision Challenge and the Startup Competition.

Divyaa Ravichandran was a finalist in the 2016 Entrepreneurial Computer Vision Challenge (ECVC) at the LDV Vision Summit. Her project, “Love & Vision,” used Siamese neural networks to predict kinship between pairs of facial images. It was a major success with the judges and the audience. We asked Divyaa some questions about what she has been up to in the year since her phenomenal performance:

How have you advanced since the last LDV Vision Summit?
After the Vision Summit I began working as an intern at a startup in the Bay Area, PerceptiMed, where I worked on computer vision methods to identify pills. I specifically worked with implementing feature descriptors and testing their robustness in detection tasks. Since October 2016, I’ve been working at Facebook as a software engineer. 

What are the 2-3 key steps you have taken to achieve that advancement?
a. Stay on the lookout for interesting opportunities, like the LDV Vision Summit
b. ALWAYS stay up-to-date in the tech industry so you know what counts and who's who

What project(s)/work is your focus right now at or outside of Facebook?
Without any specifics, I'm working with neural networks surrounded by some of the brightest minds I have come across as yet, and along with the use of Facebook's resources, the opportunities to improve are boundless.
 


Divyaa Ravichandran  ©Robert Wright/LDV Vision Summit

What is your proudest accomplishment over the last year?
Snagging this gig with Facebook was kind of the highlight of my year; working on projects that have the potential to impact and improve so many lives has me pretty psyched!

What was a key challenge you had to overcome to accomplish that? How did you overcome it?
I think visibility was one big point: I wasn't highly visible as a candidate for the Facebook team since I had only just graduated from school and didn't have any compelling publications or such to my name. Fortunately, my attendance at the LDV Vision Summit last year gave me that visibility, and the Facebook team got in touch with me because of that.

Did our LDV Vision Summit help you? If yes, how?
Yeah, it was through LDV that I came in contact with my current employer at Facebook! I also met some really interesting people from some far-off places, like Norway, for instance. It put into perspective how the field is growing the world over.
 


Divyaa Ravichandran  ©Robert Wright/LDV Vision Summit

What was the most valuable aspect of competing in the ECVC for you?
The fact that the summit puts the guys with the money (the VCs) in touch with the guys with the tech (all the people making Computer Vision pitches) really bridges the gap between two shores that I think would do very well juxtaposed with each other. Personally, it opened my eyes to new ideas that people in the field were looking at and what problems they were trying to tackle, something that I wouldn't have been able to think up myself.

What recommendation(s) would you make to teams submitting their projects to the ECVC?
Stay current, but if you're bringing something entirely new to the table, that would be best! Everybody at ECVC is looking to be blown away (I think) so throwing something totally new and unexpected their way is the best way to get their attention.

What is your favorite Computer Vision blog/website to stay up-to-date on developments in the sector?
I generally read Tombone's CV blog, by Tomasz Malisiewicz*, and follow CV conferences like ECCV, ICML, CVPR to look up the bleeding edge in the industry and this usually gives a fair idea of the biggest problems people are looking to tackle in the current age.

*Editor’s Note: Tomasz Malisiewicz was a speaker at the 2016 Vision Summit

Applications to the 2018 ECVC and the Startup Competition at the LDV Vision Summit are due by March 25, apply now.

AutoX is Democratizing Autonomous Driving with a Camera-First Solution


©Robert Wright/LDV Vision Summit

Leaving his role as founding director of Princeton's Computer Vision and Robotics Lab, Jianxiong Xiao (Professor X) founded AutoX. He spoke at our LDV Vision Summit 2017 about how he is working to lower the price of entry into the autonomous driving field with an innovative camera-first solution.

Early Bird tickets are now available until March 25 for the LDV Vision Summit 2018 to hear from other amazing visual tech researchers, entrepreneurs and investors.

Today I'm going to talk about AutoX. We're a company working on self-driving cars. Why self-driving cars? If you look at the tech revolution in the past few decades, we have personal computers, we have the internet, we have smartphones. This tech revolution has already changed everyone's life. It's not just a fancy tool for scientists; it has actually changed everyone's life.

If you think about the future, many things are going to happen. But if you think about what the major difference will be 30 years from now, one of the biggest things is probably that all the cars will be able to drive by themselves. That's what makes me very excited about self-driving cars. Transportation is also a huge part of human society. So I see this as one of the biggest applications ever for my expertise in computer vision and robotics.

AutoX is a company focused on self-driving technology with the mission to democratize autonomy.

What does that mean? Here we draw an analogy with computer technology. A few decades ago, yes, we did have computers, but each computer was so big, and what's more, they were so expensive. With a million-dollar computer in a huge server room, only a very small number of people in the world, including top scientists and top researchers, had access to computation. At that time the technology was amazing, but its impact on society was very, very limited.

Now think about life today. Everyone nowadays has a $500 smartphone. At this stage, I would say, this is what truly makes me excited about technology: it creates universal impact for everyone.

If you think about self-driving car technology today, it's pretty similar. Each self-driving car costs $1,000,000 or even more. It's much more expensive than hiring a few people just to drive for you. So self-driving car technology, at this stage, I would say, does not make much sense to the general public.


We believe self-driving cars should not be a luxury; they should be universally accessible to everyone.


At AutoX, our mission is to democratize autonomy: to make self-driving cars affordable and, at the same time, technically robust for every citizen to use. We believe self-driving cars should not be a luxury; they should be universally accessible to everyone.

If you think about self-driving cars, why are they so expensive? Here is a picture of the Baidu self-driving car. Each car costs about $0.8 million USD. Most of the cost comes from the sensors: a high-end differential GPS, a high-end IMU, as well as this monster, the LIDAR. The LIDAR on the top is the big Velodyne 64-beam LIDAR, which costs $80,000 USD these days.

Putting aside the cost of the LIDAR, if you look at the LIDAR data, I would say the autonomous driving industry has blind faith in LIDAR. For example, LIDAR has very, very low resolution.

Here is a simple question for you: is this LIDAR point cloud a pedestrian or not? Look here. Everyone here has perfect intelligence. You may see that, okay, maybe this is a pedestrian. But how about this? Is this a pedestrian, or is it a Christmas tree? In fact, both of them are actually pedestrians, as shown here.

A pedestrian viewed at low resolution can probably still be recognized, but if you want to drive your car safely, you need to recognize more subtle details, like, for example, the curb of the road. If you cannot recognize the curb, the car is going to drive onto the sidewalk, which endangers pedestrians. So I would say that high resolution really matters. High resolution enables detailed analysis of complex scenes, which is required for Level 5 autonomous driving.
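To make the resolution gap concrete, here is a rough back-of-the-envelope comparison; the sensor figures are order-of-magnitude assumptions for illustration rather than exact specifications:

```python
# Back-of-the-envelope: points per LIDAR sweep vs. pixels per camera frame.
# The numbers below are rough assumptions, not exact specs.
lidar_points_per_second = 1.3e6   # approx. single-return rate of a 64-beam unit
lidar_sweeps_per_second = 10
points_per_sweep = lidar_points_per_second / lidar_sweeps_per_second

camera_pixels_per_frame = 1920 * 1080   # a single 1080p camera

print(f"LIDAR points per sweep : {points_per_sweep:,.0f}")
print(f"Camera pixels per frame: {camera_pixels_per_frame:,}")
print(f"Camera/LIDAR ratio     : {camera_pixels_per_frame / points_per_sweep:.0f}x")
```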


©Robert Wright/LDV Vision Summit

The other drawback of LIDAR is that it only captures the 3D shell of an object. But most complex situations in the world are actually conveyed by appearance rather than by 3D shape, such as road markings, traffic signs, curbs, traffic lights, and so on. At AutoX, we focus on a Camera-First Solution. We're not against any sensor, but we are focused on using the video camera as our primary sensor, to capture most of the information necessary for very safe autonomous driving.

We're a company building Full-Stack Software for autonomous driving, which includes perception, understanding and tracking dynamic objects, as well as the ability to make decisions and plan how the car should drive. The last step of our Full-Stack Software is to control the vehicle to execute that plan.
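As a schematic of what a "full-stack" pipeline like this looks like, here is a simplified perception-to-planning-to-control loop; the class names, heuristics, and vehicle interface are our own illustrative assumptions, not AutoX's software.

```python
# Schematic of a camera-first driving stack: perception -> planning -> control.
# A simplified illustration of the stages described in the talk.
from dataclasses import dataclass
from typing import List

@dataclass
class SceneUnderstanding:
    lane_boundaries: List[List[float]]   # polylines in vehicle coordinates
    dynamic_objects: List[dict]          # e.g. {"type": "car", "position": ...}

@dataclass
class Plan:
    target_speed: float                  # m/s
    steering_angle: float                # radians

def perceive(camera_frame) -> SceneUnderstanding:
    """Run detection/segmentation on the image (models omitted here)."""
    return SceneUnderstanding(lane_boundaries=[], dynamic_objects=[])

def plan(scene: SceneUnderstanding) -> Plan:
    """Decide speed and steering from the understood scene."""
    speed = 2.0 if scene.dynamic_objects else 8.0   # slow down near objects
    return Plan(target_speed=speed, steering_angle=0.0)

def control(vehicle, p: Plan) -> None:
    """Send actuation commands that execute the plan (hypothetical interface)."""
    vehicle.set_throttle_for_speed(p.target_speed)
    vehicle.set_steering(p.steering_angle)

def drive_loop(camera, vehicle):
    while True:
        frame = camera.read()
        control(vehicle, plan(perceive(frame)))
```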

We're a very young company; we were founded in September 2016. In the past eight months, we have made tremendous progress. Our company is based in San Jose, California, which is big enough for doing a lot of autonomous driving testing.


We are accepting applications to our Vision Summit Entrepreneurial Computer Vision Challenge for computer vision research projects and our Startup Competition for visual technology companies with <$2M in funding. Apply now &/or spread the word.


Here's a demonstration where we're using a purely camera-based system, with no LIDAR, no radar, no ultrasonic, no differential GPS, to drive the vehicle. Here, we show a few autonomous driving scenarios. On the top left, we're showing our car driving in a dense urban scenario, with a lot of traffic, making turns and so on. On the bottom left, we show our car driving on a curvy road, making a lot of sharp turns, to demonstrate that our perception system can recognize the road in fine detail and in real time.

On the right, we're showing some video we've taken at nighttime. Using the camera, it is still possible to drive at nighttime, which demonstrates the power of this video-based approach. And may I mention, in this demo we're using only cameras, with GPS as the only other sensor. We're not using any other sensors, but in production cars we are open to integrating other sensors as well. The reason for this demo is to demonstrate the power of the camera, because personally I believe it is mostly ignored or underappreciated by the autonomous driving industry.

In the past eight months, we have built something very, very small, but very good, to carry out this mission. And we're very excited to keep following this path to make self-driving technology a reality.


©Robert Wright/LDV Vision Summit

Here is another video demonstrating our camera-based system driving in a different scenario. As you know, in California it is actually very difficult to find bad weather. So in the past two months, we finally got data from a day when it actually rained, and we were so excited that we brought out the car to take a video like this. You can see that our camera-based system actually drives quite well in heavy rain, and you can see that here our car is driving in a residential neighborhood. There are no lane markings on the road, which also makes it particularly challenging; recognizing the road automatically here is very hard for the algorithm.

Here is another video from a rainy day, where we see our car going under a bridge. That makes the lighting very dark and then very bright again, but we demonstrate that this camera-based system can still work. Some of you can probably recognize where we're driving in this test demo; the logo here says it is the City of Cupertino.

As I mentioned, we're a very, very young company. At this very early stage we're still demonstrating the potential for this camera-based system.

Watch the video:

Josh Kopelman Looks to Find the "Investable Window" on New Technologies

At the LDV Vision Summit 2017, Evan Nisselson had the privilege to sit down with Josh Kopelman, the self-described "accidental VC" and Partner at First Round Capital, to discuss investment trends and what Josh looks for in a founder at the seed-stage level.

According to Josh, First Round Capital either invests early or way too early. Each technology has a window in which it is investable and you want to avoid funding a company too early. At First Round, they are investing in things that are trying to solve common problems. Watch Josh and Evan's fireside chat to learn more:

 

Our fifth annual LDV Vision Summit will be May 23 & 24, 2018 in NYC. Early bird tickets are currently on sale. Sign up to our LDV Vision Summit newsletter for updates and deals on tickets.

Hearst, AlphaPrime, ENIAC and Samsung Next Talk Opportunities in Visual Tech Investing

At the LDV Vision Summit 2017, Erin Griffith of Fortune spoke with Vic Singh of ENIAC Ventures, Claudia Iannazzo from AlphaPrime Ventures, Scott English of Hearst Ventures and Emily Becher from Samsung Next Start about trends and opportunities in visual technology investing.

Watch their panel discussion to learn more:

Our fifth annual LDV Vision Summit will be May 23 & 24, 2018 in NYC. Early bird tickets are currently on sale. Sign up to our LDV Vision Summit newsletter for updates and deals on tickets.

LDV Capital Raises $10M Second Seed Fund for Visual Technologies


Evan Nisselson, General Partner & Founder of LDV Capital © Ron Haviv

We are very excited to announce the close of our second fund for investing in people building visual technology businesses at the pre-seed or seed stage. You can read more about it in The Wall Street Journal. Our press release is below. Also check out our Jobs page to learn more about the exciting new roles available with us at LDV Capital.

Press Release -- LDV Capital, the venture fund investing in people building visual technology businesses, today announced a new $10M seed fund. It is the second fund for the thesis-driven firm that specifically invests in deep technical teams that leverage computer vision, machine learning and artificial intelligence to analyze visual data.

Investors in this second fund include top technical experts in the field such as Mike Krieger, Instagram Co-founder/CTO, and Steve Chen, YouTube Co-founder/CTO. Other investors came from family offices, fund-of-funds, an endowment, a sovereign wealth fund, and more.

“Because of their domain expertise and leadership in visual technology, LDV Capital is at the forefront of innovations in the space. They invest in and empower technical founders with the greatest potential for harnessing the power of computer vision to disrupt industries. The opportunities are tremendous.” Mike Krieger, Instagram, Co-Founder & Director of Engineering.

"Capturing and analyzing visual data with the aid of computers create a paradigm shift in the approach to content. I believe LDV Capital helps founders grow companies at the helm of this evolution." Steve Chen, Youtube, Co-Founder & CTO.

LDV Capital investments at the pre-seed stage include Clarifai - an artificial intelligence company that leverages visual recognition to solve real-world problems for businesses and developers, Mapillary - delivering street-level imagery for the future of maps and data solutions, and Upskill - delivering augmented reality solutions for the industrial workforce. They have assisted their portfolio companies in raising follow-on capital from Sequoia, Union Square Ventures, NEA, Atomico and others.

“Visual technologies are revolutionizing businesses and society,” says LDV Capital General Partner, Evan Nisselson, a renowned thought leader in the visual tech space. “By 2022, our research has found there will be 45 billion cameras in the world capturing visual data that will be analyzed by artificial intelligence. Our goal is to collaborate with technical entrepreneurs who are looking to solve problems, build businesses and improve our world with that visual data.”

LDV’s horizontal thesis spans all enterprise and consumer verticals such as: autonomous vehicles, medical imaging, robotics, security, manufacturing, logistics, smart homes, satellite imaging, augmented/virtual/mixed reality, mapping, video, imaging, biometrics, 3D, 4D and much more.  

Every May, LDV Capital hosts the two-day LDV Vision Summit in NYC known to top technologists, investors and entrepreneurs as the premier global gathering in visual tech. The fifth annual LDV Vision Summit will be May 23 and 24, 2018. Since 2011, LDV Capital also holds invite-only, gender-balanced monthly LDV Community dinners that bring together leading NYC entrepreneurs and investors to help each other succeed. Both are part of their LDV Platform initiatives.

LDV Capital is one of a growing number of single-GP funds, founded by Nisselson in 2012 after he built four visual technology startups over 18 years in Silicon Valley, NYC and Europe. The firm boasts an exceptionally strong expert network through its experts-in-residence, including computer vision leaders such as Serge Belongie, a professor of Computer Science at Cornell University who has also co-founded several companies; Andrew Rabinovich, Director of Deep Learning at Magic Leap; Luc Vincent, VP of Engineering at Lyft; and Gaile Gordon, Vice President of Location Products at Enlighted.

Find out more about our open opportunities on our Jobs page.