Papers on Topic: Vision

  1. Amir H Assadi, Perceptual Geometry of Space and Form: Visual Perception of Natural Scenes and Their Virtual Representation, SPIE Conference on Vision Geometry X, 2001, pp. 1-14.
    Abstract. Perceptual geometry is an emerging field of interdisciplinary research whose objectives focus on the study of geometry from the perspective of visual perception, and in turn, apply such geometric findings to the ecological study of vision. Perceptual geometry attempts to answer fundamental questions in the perception of form and representation of space through a synthesis of cognitive and biological theories of visual perception with geometric theories of the physical world. Perception of form and space are among the fundamental problems in vision science. In recent cognitive and computational models of human perception, natural scenes are used systematically as preferred visual stimuli. Among the key problems in perception of form and space, we have examined perception of the geometry of natural surfaces and curves, e.g. as found in the observer’s environment. Besides a systematic mathematical foundation for a remarkably general framework, the advantages of the Gestalt theory of natural surfaces include a concrete computational approach to simulate or “recreate images” whose geometric invariants and quantities might be perceived and estimated by an observer. The latter is at the very foundation of understanding the nature of perception of space and form, and of the (computer graphics) problem of rendering scenes to visually invoke virtual presence. (pdf)

  2. Caglar Aytekin, Quantum Mechanics in Computer Vision: Automatic Object Extraction, ICIP, 2020, pp. 1-5.
    An automatic object extraction method is proposed exploiting the rich mathematical structure of quantum mechanics. First, a novel segmentation method based on the solutions of Schrödinger’s equation is proposed. This powerful segmentation method allows us to model complex objects and the inherent structures of edge, shape, and texture information, along with grey-level intensity uniformity, all in a single equation. Because of the large number of segments extracted by the proposed method, the object segment is selected by maximizing a regularization energy function based on a recently proposed sub-segment analysis indicating the object boundaries. The results of the proposed automatic object extraction method exhibit an accuracy promising enough to push the frontier in this field toward purely input-driven processing, without the use of “object knowledge” aided by long-term human memory and intelligence. (pdf)
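
    As a concrete illustration of the core idea (a minimal sketch under assumptions, not the paper's implementation): one can discretize the time-independent Schrödinger equation H psi = E psi on the pixel grid, with H = -Laplacian + V and a potential V derived from image intensity, and take low-energy eigenstates as candidate segments. The function name and the choice of potential below are illustrative.

      import numpy as np
      from scipy.sparse import diags, identity, kron
      from scipy.sparse.linalg import eigsh

      def schrodinger_segment(img, n_states=4):
          """Low-energy eigenstates of H = -Laplacian + V(img) as segment maps."""
          h, w = img.shape

          def lap1d(n):  # 1-D second-difference operator, Dirichlet boundaries
              return diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n))

          # 2-D Laplacian as a Kronecker sum over rows and columns
          L = kron(identity(w), lap1d(h)) + kron(lap1d(w), identity(h))
          # One simple potential: dark pixels act as wells that attract states
          V = diags(img.flatten(order="F").astype(float))
          energies, states = eigsh(-L + V, k=n_states, which="SA")
          # |psi|^2 concentrates inside the wells; threshold it for segments
          return energies, states.reshape(h, w, n_states, order="F")

      rng = np.random.default_rng(0)
      energies, segments = schrodinger_segment(rng.random((32, 32)))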

  3. Elizabeth S Spelke, Principles of Object Perception, Cognitive Science, 14 (1990) 29-56.
    Research on human infants has begun to shed light on early-developing processes for segmenting perceptual arrays into objects. Infants appear to perceive objects by analyzing three-dimensional surface arrangements and motions. Their perception does not accord with a general tendency to maximize figural goodness or to attend to nonaccidental geometric relations in visual arrays. Object perception does accord with principles governing the motions of material bodies: Infants divide perceptual arrays into units that move as connected wholes, that move separately from one another, that tend to maintain their size and shape over motion, and that tend to act upon each other only on contact. These findings suggest that a general representation of object unity and boundaries is interposed between representations of surfaces and representations of objects of familiar kinds. The processes that construct this representation may be related to processes of physical reasoning. (pdf)

  4. I Bokkon and Ram Lakhan Pandey Vimal, Implications on visual apperception: Energy, duration, structure and synchronization, Biosystems, 2010, pp. 1-9.
    Although primary visual cortex (V1 or striate) activity per se is not sufficient for visual apperception (normal conscious visual experiences and conscious functions such as detection, discrimination, and recognition), the same is also true for extrastriate visual areas (such as V2, V3, V4/V8/VO, V5/M5/MST, IT, and GF). In the absence of the V1 area, visual signals can still reach several extrastriate parts but appear incapable of generating normal conscious visual experiences. It is scarcely emphasized in the scientific literature that conscious perceptions and representations must also have essential energetic conditions. These energetic conditions are achieved by spatiotemporal networks of dynamic mitochondrial distributions inside neurons. However, the highest density of neurons in neocortex (number of neurons per degree of visual angle) devoted to representing the visual field is found in retinotopic V1. This means that the highest mitochondrial (energetic) activity can be achieved in mitochondrial cytochrome oxidase-rich V1 areas. Thus, V1 bears the highest energy allocation for visual representation. In addition, conscious perceptions also demand structural conditions, the presence of adequate duration of information representation, and synchronized neural processes and/or ‘interactive hierarchical structuralism.’ For visual apperception, various visual areas are involved depending on context, such as stimulus characteristics (color, form/shape, motion, and other features). Here, we focus primarily on V1, where specific mitochondrial-rich retinotopic structures are found; we will concisely discuss V2, where these structures are less abundant. We also point out that residual brain states are not fully reflected in active neural patterns after visual perception. Namely, after visual perception, subliminal residual states are not reflected in passive neural recording techniques, but require active stimulation to be revealed. (web, pdf)

  5. S L Bressler et al., Top-Down Control of Human Visual Cortex by Frontal and Parietal Cortex in Anticipatory Visual Spatial Attention, Journal Of Neuroscience, 28 (2008) 10056-10061.
    Advance information about an impending stimulus facilitates its subsequent identification and ensuing behavioral responses. This facilitation is thought to be mediated by top-down control signals from frontal and parietal cortex that modulate sensory cortical activity. Here we show, using Granger causality measures on blood oxygen level-dependent time series, that frontal eye field (FEF) and intraparietal sulcus (IPS) activity predicts visual occipital activity before an expected visual stimulus. Top-down levels of Granger causality from FEF and IPS to visual occipital cortex were significantly greater than both bottom-up and mean cortex-wide levels in all individual subjects and the group. In the group and most individual subjects, Granger causality was significantly greater from FEF to IPS than from IPS to FEF, and significantly greater from both FEF and IPS to intermediate-tier than lower-tier ventral visual areas. Moreover, top-down Granger causality from right IPS to intermediate-tier areas was predictive of correct behavioral performance. These results suggest that FEF and IPS modulate visual occipital cortex, and FEF modulates IPS, in relation to visual attention. The current approach may prove advantageous for the investigation of interregional directed influences in other human brain functions. (web, pdf)
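
    A hedged sketch of the kind of directed-influence test the paper uses (pairwise Granger causality; the simulated series and names below are assumptions, not the paper's data or pipeline): past values of one region's time series should improve prediction of another's beyond that region's own past.

      import numpy as np
      from statsmodels.tsa.stattools import grangercausalitytests

      rng = np.random.default_rng(0)
      n = 300
      fef = rng.standard_normal(n)   # stand-in for an FEF BOLD series
      occ = np.zeros(n)              # stand-in for an occipital series
      for t in range(1, n):
          # occipital activity partly driven by FEF at lag 1 ("top-down")
          occ[t] = 0.5 * occ[t-1] + 0.4 * fef[t-1] + 0.3 * rng.standard_normal()

      # Tests whether the second column (fef) Granger-causes the first (occ)
      results = grangercausalitytests(np.column_stack([occ, fef]), maxlag=2)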

  6. C Cadieu and B A Olshausen, Learning Transformational Invariants from Natural Movies, Advances in Neural Information Processing Systems.
    We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. (web, pdf)
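
    The first-layer amplitude/phase decomposition can be pictured with a quadrature-pair (complex) Gabor filter; under that reading (an illustrative assumption, not the paper's learned basis), motion appears as a frame-to-frame drift in local phase while amplitude stays roughly constant:

      import numpy as np
      from scipy.signal import convolve2d

      def complex_gabor(size=15, wavelength=6.0, sigma=3.0):
          """Horizontal complex Gabor: even (cosine) + i * odd (sine) part."""
          r = np.arange(size) - size // 2
          x, y = np.meshgrid(r, r)
          envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
          return envelope * np.exp(1j * 2 * np.pi * x / wavelength)

      def amplitude_phase(frame, filt):
          response = convolve2d(frame, filt, mode="valid")
          return np.abs(response), np.angle(response)

      # A rightward-drifting grating: the phase at a fixed location advances
      # at a rate set by the speed, independent of the pattern's contrast.
      frames = [np.cos(2 * np.pi * (np.arange(64)[None, :] - 2 * t) / 6.0)
                * np.ones((64, 1)) for t in range(4)]
      filt = complex_gabor()
      phases = [amplitude_phase(f, filt)[1][20, 20] for f in frames]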

  7. D Kirsh, Projection, problem space and anchoring, eScholarship.org.
    When people make sense of situations, illustrations, instructions and problems they do more than just think with their heads. They gesture, talk, point, annotate, make notes and so on. What extra do they get from interacting with their environment in this way? To study this fundamental problem, I looked at how people project structure onto geometric drawings, visual proofs, and games like tic tac toe. Two experiments were run to learn more about projection. Projection is a special capacity, similar to perception, but less tied to what is in … (web, pdf)

  8. Yujia Huang et al., Brain-inspired Robust Vision using Convolutional Neural Networks with Feedback, 2019.
    Primates have a remarkable ability to correctly classify images even in the presence of significant noise and degradation. In contrast, even state-of-the-art CNNs are extremely vulnerable to imperceptible levels of noise. Many neuroscience studies have suggested that robustness in human vision arises from the interaction between the feedforward signals from bottom-up pathways of the visual cortex and the feedback signals from the top-down pathways. Motivated by this, we propose a new neuro-inspired model, namely Convolutional Neural Networks with Feedback (CNN-F). CNN-F augments a CNN with a feedback generative network that shares the same set of weights, along with an additional set of latent variables. CNN-F combines bottom-up and top-down inference through approximate loopy belief propagation to obtain MAP estimates of the latent variables. We show that CNN-F’s iterative inference allows for disentanglement of latent variables across layers. We validate the advantages of CNN-F over the baseline CNN in multiple ways. Our experimental results suggest that CNN-F is more robust to image degradation such as pixel noise, occlusion, and blur than the corresponding CNN. Furthermore, we show that CNN-F is capable of restoring original images from degraded ones with high reconstruction accuracy while introducing negligible artifacts. (web, pdf)
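
    A minimal sketch of the weight-sharing idea (a toy stand-in, not a reimplementation of CNN-F or its loopy belief propagation): a single conv layer runs bottom-up with F.conv2d and feeds back top-down with F.conv_transpose2d using the same weight tensor, iterating the two passes.

      import torch
      import torch.nn.functional as F

      class TiedConvBlock(torch.nn.Module):
          def __init__(self, in_ch=3, out_ch=16, k=3):
              super().__init__()
              # one weight tensor shared by feedforward and feedback passes
              self.weight = torch.nn.Parameter(0.1 * torch.randn(out_ch, in_ch, k, k))

          def forward(self, x, n_iters=3):
              for _ in range(n_iters):
                  h = F.relu(F.conv2d(x, self.weight, padding=1))        # bottom-up
                  x_hat = F.conv_transpose2d(h, self.weight, padding=1)  # top-down
                  x = 0.5 * (x + x_hat)  # blend the input with its reconstruction
              return h, x

      features, denoised = TiedConvBlock()(torch.randn(1, 3, 32, 32))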

  9. Erin Koch et al., Picture perception reveals mental geometry of 3D scene inferences, Proceedings Of The National Academy Of Sciences Of The United States Of America, 115 (2018) 7807-7812.
    Pose estimation of objects in real scenes is critically important for biological and machine visual systems, but little is known of how humans infer 3D poses from 2D retinal images. We show unexpectedly remarkable agreement in the 3D poses different observers estimate from pictures. We further show that all observers apply the same inferential rule from all viewpoints, utilizing the geometrically derived back-transform from retinal images to actual 3D scenes. Pose estimations are altered by a fronto-parallel bias, and by image distortions that appear to tilt the ground plane. We used pictures of single sticks or pairs of joined sticks taken from different camera angles. Observers viewed these from five directions, and matched the perceived pose of each stick by rotating an arrow on a horizontal touchscreen. The projection of each 3D stick to the 2D picture, and then onto the retina, is described by an invertible trigonometric expression. The inverted expression yields the back-projection for each object pose, camera elevation, and observer viewpoint. We show that a model that uses the back-projection, modulated by just two free parameters, explains 560 pose estimates per observer. By considering changes in retinal image orientations due to position and elevation of limbs, the model also explains perceived limb poses in a complex scene of two bodies lying on the ground. The inferential rules simply explain both perceptual invariance and dramatic distortions in poses of real and pictured objects, and show the benefits of incorporating projective geometry of light into mental inferences about 3D scenes. (web, pdf)
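
    As a worked example of the kind of invertible trigonometric back-transform the abstract describes (a hedged reconstruction under simplifying assumptions, not the paper's exact expression): for a stick lying on the ground plane with 3D pose \Omega, measured from the ground projection of the line of sight, viewed from camera elevation \varphi, the depth component of the stick is foreshortened by \sin\varphi, so

      \tan\omega = \frac{\tan\Omega}{\sin\varphi}
      \quad\Longleftrightarrow\quad
      \Omega = \arctan\!\left(\tan\omega \, \sin\varphi\right),

    where \omega is the orientation of the stick's projection in the picture. The right-hand expression is the back-transform that recovers the 3D pose from the retinal orientation.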

  10. Clay Mash et al., Mechanisms of Visual Object Recognition in Infancy: Five-Month-Olds Generalize Beyond the Interpolation of Familiar Views, Infancy, 12 (2007) 31-43.
    This work examined predictions of the interpolation of familiar views (IFV) account of object recognition performance in 5-month-olds. Infants were familiarized to an object either from a single viewpoint or from multiple viewpoints varying in rotation around a single axis. Object recognition was then tested in both conditions with the same object rotated around a novel axis. Infants in the multiple-views condition recognized the object, whereas infants in the single-view condition provided no evidence for recognition. Under the same 2 familiarization conditions, infants in a 2nd experiment treated as novel an object that differed in only 1 component from the familiar object. Infants’ object recognition is enhanced by experience with multiple views, even when that experience is around an orthogonal axis of rotation, and infants are sensitive to even subtle shape differences between components of similar objects. In general, infants’ performance does not accord with the predictions of the IFV model of object recognition. These findings motivate the extension of future research and theory beyond the limits of strictly interpolative mechanisms. (web, pdf)

  11. Satoshi Nishida et al., Object-based selection modulates top-down attentional shifts, Frontiers In Human Neuroscience, 8 (2014) 90.
    A large body of evidence supports the view that visual attention - the cognitive process of selectively concentrating on a salient or task-relevant subset of visual information - often works on object-based representations. Recent studies have postulated two possible accounts for the object-specific attentional advantage: attentional spreading and attentional prioritization, which modulate, respectively, a bottom-up signal for sensory processing and a top-down signal for attentional allocation. It is still unclear which account can explain the object-specific attentional advantage. To address this issue, we examined the influence of the object-specific advantage on two types of visual search: parallel search, invoked when a bottom-up signal is fully available at a target location, and serial search, invoked when a bottom-up signal is not enough to guide target selection and top-down control for shifting of focused attention is required. Our results revealed that the object-specific advantage is given to serial search but not to parallel search, suggesting that object-based attention facilitates stimulus processing by affecting the priority of attentional shifts rather than by enhancing sensory signals. Thus, our findings support the notion that the object-specific attentional advantage can be explained by attentional prioritization but not attentional spreading. (web, pdf)

  12. Mayu Nishimura et al., Development of object recognition in humans, F1000 Biology Reports, 1 (2009) 56.
    Although the ability to perceive simple shapes emerges in infancy, the ability to recognize individual objects as well as adults do continues to develop through childhood into adolescence. Despite this slow development, recent neuroimaging studies have revealed that an area of the ventral visual cortex that responds selectively to the category of common objects is adult-like by 5-8 years of age. The challenge for future research will be to identify the specific visual skills involved in object recognition that continue to develop through childhood and adolescence, and the neural mechanisms underlying this protracted development. (web, pdf)

  13. Sarah M Parker and Thomas Serre, Unsupervised invariance learning of transformation sequences in a model of object recognition yields selectivity for non-accidental properties, Frontiers In Computational Neuroscience, 9 (2015) 213-8.
    Non-accidental properties (NAPs) correspond to image properties that are invariant to changes in viewpoint (e.g., straight vs. curved contours) and are distinguished from metric properties (MPs) that can change continuously with in-depth object rotation (e.g., aspect ratio, degree of curvature, etc.). (web, pdf)

  14. Jessie J Peissig and Michael J Tarr, Visual Object Recognition: Do We Know More Now Than We Did 20 Years Ago?, Annual Review Of Psychology, 58 (2007) 75-96.
    We review the progress made in the field of object recognition over the past two decades. Structural-description models, making their appearance in the early 1980s, inspired a wealth of empirical research. Moving to the 1990s, psychophysical evidence for view-based accounts of recognition challenged some of the fundamental assumptions of structural-description theories. The 1990s also saw increased interest in the neurophysiological study of high-level visual cortex, the results of which provide some constraints on how objects may be represented. By 2000, neuroimaging arose as a viable means for connecting neurons to behavior. One of the most striking fMRI results has been category selectivity, which provided further constraints for models of object recognition. Despite this progress, the field is still faced with the challenge of developing a comprehensive theory that integrates this ever-increasing body of results and explains how we perceive and recognize objects. (web, pdf)

  15. M Riesenhuber and T Poggio, Hierarchical models of object recognition in cortex., Nature Neuroscience, 2 (1999) 1019-1025.
    Visual processing in cortex is classically modeled as a hierarchy of increasingly sophisticated representations, naturally extending the model of simple to complex cells of Hubel and Wiesel. Surprisingly, little quantitative modeling has been done to explore the biological feasibility of this class of models to explain aspects of higher-level visual processing such as object recognition. We describe a new hierarchical model consistent with physiological data from inferotemporal cortex that accounts for this complex visual task and makes testable predictions. The model is based on a MAX-like operation applied to inputs to certain cortical neurons that may have a general role in cortical function. (web, pdf)
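
    The MAX-like operation can be sketched directly (an illustrative HMAX-style fragment; the filter bank and pooling sizes are assumptions, not the paper's parameters): a 'complex' cell takes the maximum of its afferent 'simple' cell responses over local position and scale, buying tolerance to translation and size changes.

      import numpy as np

      def max_pool(responses, pool=4):
          """C-cell response: max over pool x pool spatial neighborhoods."""
          h, w = responses.shape
          h2, w2 = h // pool, w // pool
          trimmed = responses[:h2 * pool, :w2 * pool]
          return trimmed.reshape(h2, pool, w2, pool).max(axis=(1, 3))

      # S-cell maps at two scales (stand-ins for oriented filter outputs);
      # the C-cell response is the MAX over position and then over scale.
      rng = np.random.default_rng(0)
      s_maps = [rng.random((32, 32)) for _ in range(2)]
      c_map = np.maximum(*[max_pool(m) for m in s_maps])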

  16. Edmund T Rolls, Finding and recognizing objects in natural scenes: complementary computations in the dorsal and ventral visual systems, Frontiers In Computational Neuroscience, 2014, pp. 1-19.
    Searching for and recognizing objects in complex natural scenes is implemented by multiple saccades until the eyes reach within the reduced receptive field sizes of inferior temporal cortex (IT) neurons. We analyze and model how the dorsal and ventral visual streams both contribute to this. Saliency detection in the dorsal visual system, including area LIP, is modeled by graph-based visual saliency, and allows the eyes to fixate potential objects within several degrees. Visual information at the fixated location, subtending approximately 9° corresponding to the receptive fields of IT neurons, is then passed through a four-layer hierarchical model of the ventral cortical visual system, VisNet. We show that VisNet can be trained using a synaptic modification rule with a short-term memory trace of recent neuronal activity to capture both the required view and translation invariances, allowing approximately 90% correct object recognition in the model for 4 objects shown in any view across a range of 135° anywhere in a scene. The model was able to generalize correctly within the four trained views and the 25 trained translations. This approach analyses the principles by which complementary computations in the dorsal and ventral visual cortical streams enable objects to be located and recognized in complex natural scenes. (pdf)
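
    The synaptic modification rule with a short-term memory trace can be sketched as follows (the general Földiák/Rolls trace-rule form used in VisNet-style models; the exact variant and parameters here are assumptions): because the trace carries recent postsynaptic activity forward in time, views that occur close together (typically the same object under transformation) are bound onto the same output neurons.

      import numpy as np

      def trace_rule_update(weights, inputs, eta=0.8, alpha=0.01):
          """inputs: (time, n_inputs) sequence of views; one output neuron."""
          trace = 0.0
          for x in inputs:
              y = float(weights @ x)               # postsynaptic firing
              trace = (1 - eta) * y + eta * trace  # short-term memory trace
              weights += alpha * trace * x         # Hebbian update with trace
          return weights

      rng = np.random.default_rng(0)
      w = 0.1 * rng.random(100)
      views = rng.random((20, 100))  # temporal sequence of one object's views
      w = trace_rule_update(w, views)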

  17. David Balderas Silva et al., Are the long-short term memory and convolution neural networks really based on biological systems?, ICT Express, 4 (2018) 100-106.
    In general, it is not a simple task to predict sequences or classify images, and it is even more problematic when both are combined. Nevertheless, biological systems can easily predict sequences and are good at image recognition. For these reasons, Long–Short Term Memory and Convolutional Neural Networks were created, based on the memory and visual systems. These algorithms have shown great properties and bear a certain resemblance to their biological counterparts, yet they are still not the same. This article reviews the biological bases and compares them with the algorithms. (web, pdf)

  18. Y Song et al., The Role of Top-Down Task Context in Learning to Perceive Objects, Journal Of Neuroscience, 30 (2010) 9869-9876.
    In high-level perceptual regions of the ventral visual pathway in humans, experience shapes the functional properties of the cortex: the fusiform face area responds most strongly to faces of familiar rather than unfamiliar races, and the visual word form area (VWFA) is tuned only to familiar orthographies. But are these regions affected only by the bottom-up stimulus information they receive during learning, or does the effect of perceptual experience depend on the way that stimulus information is used during learning? Here, we test the hypothesis that top-down influences (i.e., task context) modulate the effect of perceptual experience on functional selectivities of the high-level visual cortex. Specifically, we test whether experience with novel visual stimuli produces a greater effect on the VWFA when those stimuli are associated with meanings (via association learning) but produces a greater effect on shape-processing regions when trained in a discrimination task without associated meanings. Our result supports this hypothesis and further shows that learning is transferred to novel objects that share parts with the trained objects. Thus, the effects of experience on selectivities of the high-level visual cortex depend on the task context in which that experience occurs and the perceptual processing strategy by which objects are encoded during learning. (web, pdf)

  19. Simon Thorpe et al., Speed of processing in the human visual system, Nature, 381 (1996) 520-522.
    (web, pdf)

  20. Justin N Wood and Samantha M W Wood, The development of newborn object recognition in fast and slow visual worlds, Proceedings Of The Royal Society B: Biological Sciences, 283 (2016) 20160166-8.
    Object recognition is central to perception and cognition. Yet relatively little is known about the environmental factors that cause invariant object recognition to emerge in the newborn brain. Is this ability a hardwired property of vision? Or does the development of invariant object recognition require experience with a particular kind of visual environment? Here, we used a high-throughput controlled-rearing method to examine whether newborn chicks (Gallus gallus) require visual experience with slowly changing objects to develop invariant object recognition abilities. When newborn chicks were raised with a slowly rotating virtual object, the chicks built invariant object representations that generalized across novel viewpoints and rotation speeds. In contrast, when newborn chicks were raised with a virtual object that rotated more quickly, the chicks built viewpoint-specific object representations that failed to generalize to novel viewpoints and rotation speeds. Moreover, there was a direct relationship between the speed of the object and the amount of invariance in the chick’s object representation. Thus, visual experience with slowly changing objects plays a critical role in the development of invariant object recognition. These results indicate that invariant object recognition is not a hardwired property of vision, but is learned rapidly when newborns encounter a slowly changing visual world. (web, pdf)
