This study has revealed how the brain processes complex visual scenes, such as those in movies. It showed that neurons in the tadpole optic tectum recognize specific temporal patterns and can adjust their responses as the visual environment changes. Based on these findings, the researchers have created AI technology that is more efficient at recognizing movies, outperforming traditional methods in terms of speed and accuracy.
The scenes we see around us, such as a movie or a moving landscape, change rapidly in time and space. These changes are called spatiotemporal dynamics, and they are crucial to our understanding of the visual world.
While we know a lot about how the brain interprets static images, such as a photo, there are still many questions about how it processes dynamic sequences of images, such as those in a movie.
To explore this process, researchers studied the behavior of certain neurons in the tadpole brain. These neurons are located in the optic tectum, an area of the brain that helps process visual information.
They used sparse noise stimuli and reverse correlation analysis to map the “receptive fields” of these neurons, the regions of visual space to which they respond. The researchers then presented the tadpoles with complex light patterns that simulated natural scenes and analyzed how the neurons responded to rapid changes in the images.
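Reverse correlation is often implemented as a spike-triggered average: the stimulus frames preceding each spike are averaged to reveal the spatiotemporal pattern that drives the neuron. The sketch below illustrates the general technique on toy data; the array shapes, window length, and sparse-noise parameters are illustrative assumptions, not the study's actual analysis code.

```python
import numpy as np

def spike_triggered_average(stimulus, spike_times, window=15):
    """Estimate a neuron's spatiotemporal receptive field by averaging
    the stimulus frames that preceded each spike (reverse correlation).

    stimulus:    array of shape (n_frames, height, width), sparse noise
    spike_times: frame indices at which the neuron fired
    window:      number of preceding frames to average over
    """
    sta = np.zeros((window,) + stimulus.shape[1:])
    count = 0
    for t in spike_times:
        if t >= window:  # skip spikes without a full stimulus history
            sta += stimulus[t - window:t]
            count += 1
    return sta / max(count, 1)

# Toy example: binary sparse-noise stimulus and random spike times
rng = np.random.default_rng(0)
stim = (rng.random((1000, 8, 8)) < 0.05).astype(float)
spikes = rng.choice(np.arange(20, 1000), size=50, replace=False)
rf = spike_triggered_average(stim, spikes)
print(rf.shape)  # (15, 8, 8): one averaged frame per time lag
```

For a real neuron, stimulus pixels that reliably preceded spikes stand out in the average, tracing the receptive field's shape in space and time.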
The researchers found that neurons in the tadpoles’ optic tectum were able to recognize visual sequences lasting 200 to 600 milliseconds (about the duration of a blink).
These sequences of images had specific start and stop points that the neurons clearly identified. What’s more, the neurons not only recognized these sequences, but also adjusted their response depending on the animal’s previous visual experience. This means that the brain can “learn” from what it sees and adapt to changes in the environment.
Another interesting finding was that the neurons appeared to follow mathematical patterns, similar to trigonometric functions, to process these images over time.
This suggests that the brain uses a kind of “repetitive rules” to identify visual scenes efficiently. These repetitive patterns form the basis of how the brain detects and processes dynamic scenes, such as those in a movie.
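One loose way to picture a trigonometric temporal rule is as a sinusoidal weighting over a ~600 ms segment: a sequence whose dynamics match the weighting drives a stronger response than a static scene. The kernel shape, durations, and similarity measure below are illustrative assumptions, not the authors' model.

```python
import numpy as np

def trigonometric_kernel(duration_ms=600, dt_ms=10, cycles=1.0):
    """Illustrative temporal weighting: a half sine wave rising and
    falling over a ~600 ms movie segment."""
    t = np.arange(0, duration_ms, dt_ms)
    return np.sin(np.pi * cycles * t / duration_ms)

def temporal_response(sequence, kernel):
    """Cosine similarity between a sequence of frame intensities and the
    kernel: highest when the sequence's dynamics match the kernel."""
    n = min(len(sequence), len(kernel))
    s, k = np.asarray(sequence[:n], float), kernel[:n]
    return float(np.dot(s, k) / (np.linalg.norm(s) * np.linalg.norm(k)))

kernel = trigonometric_kernel()              # 60 samples across 600 ms
matched = temporal_response(kernel, kernel)  # sequence matching the kernel
flat = temporal_response(np.ones(60), kernel)  # static, unchanging scene
print(matched > flat)  # True: the matching sequence drives a stronger response
```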
Schematic of the retinotectal system of a fish and the connections of the optic tectum to the nucleus isthmi (NI), nucleus pretectalis (NP), and the efferent premotor pathways of the tectum. The gray arrows show the visuotopic mapping in the opposite tectal lobe. Cells in the tectum project excitatorily to the ipsilateral NI. The NP, which receives tectal input, projects inhibitorily to the NI. The dotted arrows show the NI’s projection to the ipsilateral tectal lobes. Source: David P. M. Northmore
The research also has a practical application: understanding how the brain processes this information can inspire the creation of more efficient technologies.
Today, artificial intelligence (AI) already uses some brain-based ideas to recognize static images, such as photographs. For example, convolutional neural networks (CNNs), such as those used in facial recognition systems, rely on principles from the human brain to identify visual shapes and patterns.
However, when it comes to dynamic scenes, such as videos, current technology is still limited. Processing a movie requires many layers of processing, large data sets, and a lot of time to train the AI.
Inspired by the workings of neurons in the optic tectum of tadpoles, researchers created a new machine learning network called MovieNet. This network was designed to mimic how the brain encodes sequences of images.
The results were impressive. MovieNet was able to classify natural movie scenes more efficiently than traditional machine learning networks. What’s more, it used less data and fewer steps to perform the task.
This shows that applying brain-based principles can lead to significant advances in movie recognition technology. This study not only helped us understand how the brain processes complex visual scenes, but also showed how this knowledge can be used to improve technology.
By learning from how the brain works, especially in areas such as the tadpole’s optic tectum, researchers have created a smarter, more efficient AI for recognizing movies.
This work opens up new possibilities for developing brain-inspired technologies that are faster and more accurate.
Movie Recognition AI Using Neural Rules for Spatiotemporal RFs as a Movie Encoder. (A and B) Encoding of movie scenes based on neuronal spatiotemporal RF properties. Movies captured from a head-mounted camera of a hawk flying through a forest were used as input (A) and processed by an array of 10 × 10 encoders based on spatiotemporal RF properties of tectal optic neurons (B, Left) to generate a matrix output of a 600 ms movie segment (B, Right). (C) Encoders assembled from four-neuron RF arrays generated four different output matrices of a 600 ms movie. Intensity scaling in C applies to B. (D) Training and testing datasets for machine learning consist of 600 ms movie segments of tadpoles swimming in 0, 5, 15, and 30 µM pentylenetetrazol (PTZ). (E) Brain-based AI network accurately classifies movie data. Networks consisted of arrays of 1–25 multiplexed motion encoders (E, Left) and a CNN (E, Middle). The encoders transform ~600 ms movie segments into matrix data. Encoder output arrays were organized in an architecture resembling a topographic hypercolumn or a stacked architecture (E, Middle). Networks trained with both architectures distinguished the swimming behaviors of animals exposed to 0, 5, 15, and 30 µM PTZ. Plots (E) show classification accuracy. Dotted line: classification of training data; gray lines: classification of test data with increasing numbers of encoders in the matrix. The network model trained with the hypercolumn architecture of the multiplexed encoder outperformed the network with the stacked architecture (hypercolumn: 82.3% accuracy; stacked: 70.1% accuracy; E, Right). Humans (blue line, n = 6) classified swimming behaviors with 64.5% accuracy after training with tadpole movies at 0, 5, 15, and 30 µM PTZ (30-trial rolling average). The score plateaued before 200 trials.
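The encoder stage described in the caption, in which spatiotemporal RF kernels collapse a ~600 ms movie segment into small output matrices, can be sketched in simplified form. The kernel shapes, segment size, and frame rate below are illustrative assumptions, and the downstream CNN stage is omitted; this is not the published MovieNet code.

```python
import numpy as np

def rf_encoder(segment, kernel):
    """Collapse a (frames, H, W) movie segment into an (H, W) matrix by
    weighting each pixel's time course with a temporal RF kernel."""
    n = min(segment.shape[0], len(kernel))
    return np.tensordot(kernel[:n], segment[:n], axes=1)

def encode_segment(segment, kernels):
    """Multiplexed encoders: one output matrix per temporal kernel,
    stacked into a feature volume for a downstream classifier."""
    return np.stack([rf_encoder(segment, k) for k in kernels])

# Toy 600 ms segment at ~100 fps, encoded by four illustrative kernels
rng = np.random.default_rng(1)
segment = rng.random((60, 10, 10))
t = np.linspace(0, np.pi, 60)
kernels = [np.sin(t), np.cos(t), np.sin(2 * t), np.cos(2 * t)]
features = encode_segment(segment, kernels)
print(features.shape)  # (4, 10, 10): far smaller than the raw segment
```

The payoff mirrors the paper's reported efficiency gain: the classifier sees a compact matrix summary of each segment rather than every raw frame, reducing both data size and processing steps.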
READ MORE:
Identification of movie encoding neurons enables movie recognition AI
Masaki Hiramoto and Hollis T. Cline
PNAS. November 19, 2024. 121 (48) e2412260121
Abstract:
Natural visual scenes are dominated by spatiotemporal image dynamics, but how the visual system integrates “movie” information over time is unclear. We characterized optic tectal neuronal receptive fields using sparse noise stimuli and reverse correlation analysis. Neurons recognized movies of ~200-600 ms durations with defined start and stop stimuli. Movie durations from start to stop responses were tuned by sensory experience though a hierarchical algorithm. Neurons encoded families of image sequences following trigonometric functions. Spike sequence and information flow suggest that repetitive circuit motifs underlie movie detection. Principles of frog topographic retinotectal plasticity and cortical simple cells are employed in machine learning networks for static image recognition, suggesting that discoveries of principles of movie encoding in the brain, such as how image sequences and duration are encoded, may benefit movie recognition technology. We built and trained a machine learning network that mimicked neural principles of visual system movie encoders. The network, named MovieNet, outperformed current machine learning image recognition networks in classifying natural movie scenes, while reducing data size and steps to complete the classification task. This study reveals how movie sequences and time are encoded in the brain and demonstrates that brain-based movie processing principles enable efficient machine learning.