Facebook launched a pair of new open datasets today to help developers and data scientists train artificial intelligence systems to better understand videos.
The Scenes, Objects, and Actions dataset (SOA) will provide devs with a massive set of videos with multiple labels for what’s going on inside them. Each video has been tagged by trained humans who are able to attach multiple labels for where a video is taking place, what is in it, and what is going on in the scene. Those labels can then be used to train systems that can understand the contents of videos.
A Generic Motions dataset includes a set of GIFs that are focused on certain motion properties like jumping and sliding. As the name implies, the subjects in the video include more than humans, so it should be possible to use the data to train a machine to understand different motions like a panda falling or a kitten sliding.
Both of these datasets should be useful for building more intelligent video understanding systems using machine learning. One of the key limitations that SOA is supposed to help address is that machine learning systems don’t actually understand the underlying videos, but rather pick up on some tangential marker that’s good enough.
One example Manohar Paluri, Facebook’s computer vision research lead, cited on stage at the GitHub Universe conference was a hypothetical neural network that only looks for the presence of a kayak inside a video when it labels footage as containing “kayaking.” While that would work for many pieces of footage, such a system might also label footage set in a garage full of kayaks as being about kayaking.
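To make that failure mode concrete, here is a minimal Python sketch. It is purely illustrative: the `Clip` record, the label names, and both labeling functions are hypothetical and are not taken from the SOA dataset or from any Facebook model. It only contrasts a rule that keys on an object with one that also consults an action label.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """Hypothetical multi-label annotation for one video (not the real SOA schema)."""
    scenes: set = field(default_factory=set)
    objects: set = field(default_factory=set)
    actions: set = field(default_factory=set)


def shortcut_label(clip: Clip) -> bool:
    # Shortcut: call the clip "kayaking" whenever a kayak is visible.
    return "kayak" in clip.objects


def multi_label_check(clip: Clip) -> bool:
    # Combine the object label with an annotated action (label names are assumed).
    return "kayak" in clip.objects and "paddling" in clip.actions


river = Clip(scenes={"river"}, objects={"kayak", "paddle"}, actions={"paddling"})
garage = Clip(scenes={"garage"}, objects={"kayak"}, actions={"standing"})

for name, clip in [("river", river), ("garage", garage)]:
    print(name, shortcut_label(clip), multi_label_check(clip))
```

In this toy setup the object-only rule flags both clips, while the rule that also looks at the action label flags only the river clip. A trained model is far more complex than either rule, but the sketch shows why separate scene, object, and action labels give a system more to go on than a single tag.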
Facebook will be challenging developers and data scientists around the world to come up with the best models for understanding the contents of videos using the SOA dataset.
Robust open datasets have been a key part of driving the field of machine learning forward in the past. ImageNet, a set of labeled images, has become a key benchmark for computer vision systems, for example. Facebook’s newly released footage could help propel the field of computer vision for video to new heights.