Geoffrey Hinton has been researching something he calls "capsule theory" in neural networks. What is it, and how does it work?
neural-networks
rcpinto
Answers:
It appears not to be published yet; the best material available online consists of these slides for this talk. (Several people reference an earlier talk with this link, but sadly it is broken at the time of writing this answer.)
My impression is that it's an attempt to formalize and abstract the creation of subnetworks inside a neural network. That is, if you look at a standard neural network, layers are fully connected (that is, every neuron in layer 1 has access to every neuron in layer 0, and is itself accessed by every neuron in layer 2). But this isn't obviously useful; one might instead have, say, n parallel stacks of layers (the 'capsules') that each specializes on some separate task (which may itself require more than one layer to complete successfully).
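For concreteness, here is a minimal numpy sketch of that idea (the shapes, the number of parallel stacks, and the helper name capsule_forward are purely illustrative assumptions on my part, not anything taken from Hinton's slides): instead of one fully connected layer that mixes everything, several small independent stacks each process the input, and their outputs are concatenated.

    import numpy as np

    def dense(x, W, b):
        # Standard fully connected layer: every output unit sees every input unit.
        return np.tanh(x @ W + b)

    def capsule_forward(x, capsule_params):
        # Run several independent 'capsule' stacks on the same input and
        # concatenate their outputs; each stack can specialize on its own sub-task.
        outputs = []
        for W1, b1, W2, b2 in capsule_params:
            h = np.tanh(x @ W1 + b1)              # hidden layer private to this capsule
            outputs.append(np.tanh(h @ W2 + b2))  # this capsule's output
        return np.concatenate(outputs, axis=-1)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(1, 16))                  # a single 16-dimensional input
    caps = [(rng.normal(size=(16, 8)), np.zeros(8),
             rng.normal(size=(8, 4)), np.zeros(4)) for _ in range(3)]
    print(capsule_forward(x, caps).shape)                       # (1, 12): three 4-d capsule outputs
    print(dense(x, rng.normal(size=(16, 12)), np.zeros(12)).shape)  # (1, 12), but fully mixed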
If I'm imagining its results correctly, this more sophisticated graph topology seems like something that could easily increase both the effectiveness and the interpretability of the resulting network.
To supplement the previous answer: there is a paper on this that is mostly about learning low-level capsules from raw data, but explains Hinton's conception of a capsule in its introductory section: http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf
It's also worth noting that the link to the MIT talk in the answer above seems to be working again.
According to Hinton, a "capsule" is a subset of neurons within a layer that outputs both an "instantiation parameter" indicating whether an entity is present within a limited domain and a vector of "pose parameters" specifying the pose of the entity relative to a canonical version.
The parameters output by low-level capsules are converted into predictions for the pose of the entities represented by higher-level capsules, which are activated if the predictions agree and output their own parameters (the higher-level pose parameters being averages of the predictions received).
Hinton speculates that this high-dimensional coincidence detection is what mini-column organization in the brain is for. His main goal seems to be replacing the max pooling used in convolutional networks, in which deeper layers lose information about pose.
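Here is a rough numpy illustration of that agreement step (entirely my own sketch; the hard tolerance threshold and the plain averaging rule are simplifying assumptions, and actual capsule routing is softer than this): each lower-level capsule votes for the higher-level pose, the higher-level capsule activates only if the votes cluster, and its output pose is the average of the votes.

    import numpy as np

    def higher_capsule(pose_votes, tol=0.1):
        # pose_votes: (n_lower_capsules, pose_dim) predictions for the parent's pose.
        # The parent activates only if the votes agree (a high-dimensional
        # coincidence) and outputs the average of the votes as its own pose.
        mean_pose = pose_votes.mean(axis=0)
        spread = np.linalg.norm(pose_votes - mean_pose, axis=1).mean()
        return spread < tol, mean_pose

    agreeing = np.array([[0.50, 0.31], [0.52, 0.30], [0.49, 0.29]])
    conflicting = np.array([[0.50, 0.31], [-0.40, 0.90], [0.10, -0.70]])
    print(higher_capsule(agreeing))     # active: the votes form a tight cluster
    print(higher_capsule(conflicting))  # inactive: no coincidence, the parent stays silent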
Capsule networks try to reproduce, in a machine, Hinton's observations about the human brain. The motivation stems from the fact that neural networks need better modeling of the spatial relationships between parts. Instead of modeling co-existence while disregarding relative positioning, capsule nets try to model the global relative transformations of different sub-parts along a hierarchy. This is the equivariance vs. invariance trade-off, as explained above by others.
These networks therefore include a degree of viewpoint/orientation awareness and respond differently to different orientations. This property makes them more discriminative, and it potentially adds the capability to perform pose estimation, since the latent-space features contain interpretable, pose-specific details.
All of this is achieved by including a nested layer called capsules inside the layer, instead of concatenating yet another layer onto the network. These capsules can provide a vector output instead of a scalar per node.
The crucial contribution of the paper is the dynamic routing that replaces standard max pooling with a smarter strategy. This algorithm applies a mean-shift-like clustering on the capsule outputs to ensure that the output gets sent only to the appropriate parent in the layer above.
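As a rough illustration, here is a compressed numpy sketch of that routing loop (my own simplification: no batch dimension, no learned transformation matrices, and the prediction vectors u_hat are simply given; see the paper linked below for the real procedure). Lower-level predictions are repeatedly re-weighted toward the parent capsule whose output they agree with.

    import numpy as np

    def squash(s, eps=1e-8):
        # Non-linearity from the paper: keeps the vector's direction and
        # maps its length into [0, 1) so it can act as a probability.
        sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
        return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

    def dynamic_routing(u_hat, num_iters=3):
        # u_hat: (n_lower, n_upper, dim) prediction vectors from the lower capsules.
        # Each iteration re-weights the couplings so that a lower capsule's output
        # is routed mostly to the parent whose current output it agrees with.
        n_lower, n_upper, _ = u_hat.shape
        b = np.zeros((n_lower, n_upper))                          # routing logits
        for _ in range(num_iters):
            c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
            s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum per parent
            v = squash(s)                                         # parent capsule outputs
            b = b + (u_hat * v[None, :, :]).sum(axis=-1)          # reward agreement
        return v

    u_hat = np.random.default_rng(1).normal(size=(6, 2, 4))       # 6 lower capsules, 2 parents
    print(dynamic_routing(u_hat).shape)                           # (2, 4)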
The authors also combine these contributions with a margin loss and a reconstruction loss, which together help the network learn the task better, and they show state-of-the-art results on MNIST.
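If I recall the paper's formulation correctly, the margin loss looks roughly like the sketch below (the constants m+ = 0.9, m- = 0.1 and lambda = 0.5 are the settings I remember the authors using; treat the exact values as assumptions).

    import numpy as np

    def margin_loss(v_lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
        # v_lengths: (n_classes,) lengths of the output capsule vectors.
        # targets:   (n_classes,) one-hot vector of the true class.
        # Present classes are pushed above m_plus, absent ones below m_minus.
        present = targets * np.maximum(0.0, m_plus - v_lengths) ** 2
        absent = lam * (1.0 - targets) * np.maximum(0.0, v_lengths - m_minus) ** 2
        return float(np.sum(present + absent))

    lengths = np.array([0.95, 0.05, 0.40])   # made-up capsule lengths for 3 classes
    one_hot = np.array([1.0, 0.0, 0.0])      # the true class is the first one
    print(margin_loss(lengths, one_hot))     # small loss: right class long, others mostly short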
The recent paper is called Dynamic Routing Between Capsules and is available on arXiv: https://arxiv.org/pdf/1710.09829.pdf .
Based on their paper Dynamic Routing between Capsules
One of the major advantages of convolutional neural networks is their invariance to translation. However, this invariance comes with a price: it does not consider how different features are related to each other. For example, if we have a picture of a face, a CNN will have difficulty distinguishing the relationship between the mouth feature and the nose feature. Max pooling layers are the main reason for this effect, because when we use max pooling we lose the precise locations of the mouth and the nose, and we can no longer say how they are related to each other.
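A tiny numpy example of that information loss (the feature maps and the naive pooling helper are made up just for illustration): two maps whose active units sit in different places become indistinguishable after 2x2 max pooling.

    import numpy as np

    def max_pool_2x2(x):
        # Naive 2x2 max pooling with stride 2 on a square feature map.
        h, w = x.shape
        return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    # Two 4x4 'feature maps' with the same activations in different positions:
    a = np.array([[1, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 0, 0]])
    b = np.array([[0, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 1, 0]])
    print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True: the positions are lost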
Capsules try to keep the advantages of CNNs and fix this drawback in two ways: first, each capsule outputs a vector of pose parameters together with a probability that the feature is present, rather than a single scalar; second, routing by agreement replaces max pooling, so the pose information is passed on to the appropriate parent capsule instead of being discarded.
In other words, a capsule takes into account the existence of the specific feature that we are looking for, like a mouth or a nose. This property makes sure that capsules are translation invariant in the same way that CNNs are.