Robots operating in the real world require sophisticated Visual Intelligence (VI) to make sense of the variety of situations they may encounter. Computationally, the basic capability here is Object Recognition: at a minimum, a visually intelligent robot needs to be able to recognise the content of its observations. Object recognition is typically carried out by means of Deep Learning (DL) methods, which provide the de facto standard for several Artificial Intelligence (AI) tasks, including image and speech recognition. However, despite major successes in these and other benchmark tasks, from a cognitive point of view DL architectures still fare poorly when compared against human abilities, in terms of both efficiency and epistemology. From an efficiency perspective, DL methods are notoriously data-hungry, whereas humans are able to learn and generalise even from a single example. Even more importantly, from an epistemological point of view, a key aspect of human learning is that it goes well beyond pattern recognition. Humans learn concepts (not just patterns) and are therefore able to recognise instances of these concepts even when key features are missing (e.g., a car from which all the wheels have been removed, or a cartoon pink elephant dressed in a tutu), thus avoiding the brittleness that characterises not just DL methods but also other types of AI systems.

For these reasons, there has been much interest in recent years in hybrid computational architectures, which integrate DL methods with other AI reasoning components to address these issues. Our research is situated in this paradigm, and in this talk I will present the work that my team and I are currently carrying out: developing a hybrid architecture that augments a Deep Learning approach with a variety of reasoning components drawn from Cognitive Science, with the aim of producing a new class of visually intelligent robots.
In particular, I will illustrate the methodological process we have followed in designing the architecture, which combines a top-down analysis of the Cognitive Science literature with bottom-up experimental work. I will also report on our initial implementation of the architecture's reasoning components, which include a size reasoner and a common-sense spatial reasoner, and on the experiments we have carried out, which demonstrate the performance improvement afforded by the hybrid architecture.
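To give a flavour of how such a hybrid pipeline can work, the sketch below shows one simple way a size reasoner might re-rank the candidate labels produced by a DL classifier. This is a minimal illustrative toy, not the architecture presented in the talk: the size priors, labels, scores, and combination rule are all hypothetical assumptions introduced purely for this example.

```python
# Illustrative sketch of a hybrid recognition step (hypothetical, not the
# talk's actual architecture): a neural classifier's candidate labels are
# re-ranked by a size reasoner that checks each hypothesis against the
# estimated physical size of the observed object.

# Hypothetical size priors: plausible object heights in metres (min, max).
SIZE_PRIORS = {
    "mug": (0.05, 0.20),
    "chair": (0.40, 1.20),
    "car": (1.20, 2.00),
}

def size_consistency(label, observed_height_m):
    """Return 1.0 if the observed size fits the label's prior, else a penalty."""
    lo, hi = SIZE_PRIORS[label]
    return 1.0 if lo <= observed_height_m <= hi else 0.1

def hybrid_rerank(dl_scores, observed_height_m):
    """Weight each DL confidence by the size reasoner's consistency score,
    then renormalise so the combined scores sum to 1."""
    combined = {
        label: score * size_consistency(label, observed_height_m)
        for label, score in dl_scores.items()
    }
    total = sum(combined.values())
    return {label: s / total for label, s in combined.items()}

# Toy detection: the classifier slightly prefers "car", but the object is
# only ~0.15 m tall, so the size reasoner demotes that hypothesis.
reranked = hybrid_rerank({"mug": 0.40, "car": 0.45, "chair": 0.15}, 0.15)
print(max(reranked, key=reranked.get))  # the size-consistent label wins
```

In this toy, the purely data-driven top prediction ("car") is overridden because it is physically implausible at the observed scale, which is the kind of correction a symbolic reasoning component can contribute on top of pattern recognition.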