We conduct research in computer vision, applied machine (deep) learning, natural language processing, and multimedia. We aim to develop artificially intelligent systems that help computers perform visual perception and recognition tasks.
The PREVUE project investigates modern AI and computer vision approaches to video analysis and event prediction in urban scenarios. Specifically, we argue that video surveillance and autonomous driving are now technologically mature enough to be rethought jointly within a single framework. Using both mobile sensors (e.g., on vehicles) and fixed sensors simultaneously, we analyze the urban environment (context), the behavior of humans and moving agents (vehicles, bikes, social robots), and their mutual interactions, in order to improve the safety and efficiency of urban life.
To truly understand the content of a document containing both text and pictures, an artificial agent should be able to jointly recognize the entities shown in the pictures and mentioned in the text, and to link them to their corresponding background knowledge. We refer to this task as Visual-Textual-Knowledge Entity Linking (VTKEL). To support this task, we introduce a novel dataset derived from Flickr30k-Entities.