Two groups of scientists, working independently, have recently made advances in image recognition using artificial intelligence software. The software was described yesterday by researchers at Stanford University and at Google Inc.
Computer vision has so far been limited to recognizing individual objects, but that is about to change. The new software can describe the content of videos and photographs, telling what a scene shows in context rather than merely detecting single items in it.
This is undoubtedly impressive, and the achievement may prove more significant than it first appears. Advances like these will let us catalogue the world’s information in a much more comprehensive way. If machines could write captions and descriptions for the billions of videos and images on the web, that material would become searchable. At present, search engines rely heavily on written language.
For this work, the Google and Stanford groups have been refining software programs called neural networks. Loosely resembling a brain, these networks can train themselves to discover patterns in data, whether or not those patterns are visible to humans.
So what else might we expect from this news, apart from improved search functions? For instance, the technology could help blind people navigate more accurately and give robots the ability to navigate in unknown environments. A more dystopian scenario concerns its application in the surveillance industry. This would go far beyond face recognition; it could, for instance, also entail behavioural identification.
The groups worked with a relatively small amount of training data, which leaves the software's descriptions well short of what a human would write. But the scientists are very enthusiastic. Given how little training data was used, they claim that the field is just getting started.
Sceptical scientists, on the other hand, claim that the progress is marginal when measured against replicating human vision and understanding. Still, the Google and Stanford teams say they expect to see significant progress in the near future, once the software is trained on huge sets of annotated images. These statements are probably not to be taken lightly, considering who is behind them.
Read more in this article from The New York Times.