Pre-order SUPER DUCK SET015 1/6 Sexy Robot Head Sculpt + Costume Set at KGHobby (link HERE) Nier: Automata is an action role-playing game developed by PlatinumGames and published by Square Enix. Set in the midst of a proxy war between machines created… Continue reading
Posted by Stephan Gouws, Research Scientist, Google Brain Team and Mostafa Dehghani, University of Amsterdam PhD student and Google Research Intern
Last year we released the Transformer, a new machine learning model that showed remarkable success over existing algorithms for machine translation and other language understanding tasks. Before the Transformer, most neural network based approaches to machine translation relied on recurrent neural networks (RNNs) which operate sequentially (e.g. translating words in a sentence one-after-the-other) using recurrence (i.e. the output of each step feeds into the next). While RNNs are very powerful at modeling sequences, their sequential nature means that they are quite slow to train, as longer sentences need more processing steps, and their recurrent structure also makes them notoriously difficult to train properly.
In contrast to RNN-based approaches, the Transformer used no recurrence, instead processing all words or symbols in the sequence in parallel while making use of a self-attention mechanism to incorporate context from words farther away. By processing all words in parallel and letting each word attend to other words in the sentence over multiple processing steps, the Transformer was much faster to train than recurrent models. Remarkably, it also yielded much better translation results than RNNs. However, on smaller and more structured language understanding tasks, or even simple algorithmic tasks such as copying a string (e.g. to transform an input of “abc” to “abcabc”), the Transformer does not perform very well. In contrast, models that perform well on these tasks, like the Neural GPU and Neural Turing Machine, fail on large-scale language understanding tasks like translation.
In “Universal Transformers” we extend the standard Transformer to be computationally universal (Turing complete) using a novel, efficient flavor of parallel-in-time recurrence which yields stronger results across a wider range of tasks. We built on the parallel structure of the Transformer to retain its fast training speed, but we replaced the Transformer’s fixed stack of different transformation functions with several applications of a single, parallel-in-time recurrent transformation function (i.e. the same learned transformation function is applied to all symbols in parallel over multiple processing steps, where the output of each step feeds into the next). Crucially, where an RNN processes a sequence symbol-by-symbol (left to right), the Universal Transformer processes all symbols at the same time (like the Transformer), but then refines its interpretation of every symbol in parallel over a variable number of recurrent processing steps using self-attention. This parallel-in-time recurrence mechanism is faster than the serial recurrence used in RNNs, and it also makes the Universal Transformer more powerful than the standard feedforward Transformer.
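To make the idea of a single shared function applied over several steps concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the Tensor2Tensor implementation: the names `universal_step` and `universal_encode`, the toy `weight` matrix, and the tanh transition are hypothetical, and the sketch omits learned attention projections, multiple heads, and normalization.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(states):
    """Every position attends to every position (scaled dot-product, no learned projections)."""
    dim = len(states[0])
    out = []
    for query in states:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in states]
        weights = softmax(scores)
        out.append([sum(w * value[d] for w, value in zip(weights, states)) for d in range(dim)])
    return out

def transition(vec, weight):
    """Toy position-wise transition (assumption): one linear map followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weight]

def universal_step(states, weight):
    """One recurrent step: attention across positions, residual add, shared transition."""
    attended = self_attention(states)
    mixed = [[s + a for s, a in zip(sv, av)] for sv, av in zip(states, attended)]
    return [transition(v, weight) for v in mixed]

def universal_encode(states, weight, num_steps):
    """Apply the SAME step function num_steps times; the output of each step feeds the next."""
    for _ in range(num_steps):
        states = universal_step(states, weight)
    return states

# Three positions with 2-dimensional toy states, refined for three steps.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shared_weight = [[0.5, -0.2], [0.1, 0.3]]  # hypothetical parameters, reused at every step
refined = universal_encode(inputs, shared_weight, num_steps=3)
```

Because `shared_weight` is reused at every step, `num_steps` can change without adding parameters, which is the property the text contrasts with a fixed stack of distinct layers.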
At each step, information is communicated from each symbol (e.g. word in the sentence) to all other symbols using self-attention, just like in the original Transformer. However, now the number of times this transformation is applied to each symbol (i.e. the number of recurrent steps) can either be manually set ahead of time (e.g. to some fixed number or to the input length), or it can be decided dynamically by the Universal Transformer itself. To achieve the latter, we added an adaptive computation mechanism to each position which can allocate more processing steps to symbols that are more ambiguous or require more computations.
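The per-position halting idea can be sketched as follows. This is a simplified illustration, not the exact adaptive computation mechanism used in the paper: `adaptive_refine`, `step_fn`, `halt_fn`, the threshold, and the toy "ambiguity" numbers are all assumptions made for the example. Each position accumulates a halting score and stops being updated once that score crosses the threshold.

```python
def adaptive_refine(states, step_fn, halt_fn, threshold=0.9, max_steps=8):
    """Refine each position until its accumulated halting score reaches the threshold.

    step_fn maps the full list of states to a new list (one recurrent step).
    halt_fn maps a single position's state to a halting score in [0, 1].
    Returns the final states and the number of steps each position used.
    """
    n = len(states)
    cumulative = [0.0] * n
    steps_used = [0] * n
    active = [True] * n
    for _ in range(max_steps):
        if not any(active):
            break
        proposed = step_fn(states)
        new_states = []
        for i in range(n):
            if active[i]:
                new_states.append(proposed[i])
                steps_used[i] += 1
                cumulative[i] += halt_fn(proposed[i])
                if cumulative[i] >= threshold:
                    active[i] = False
            else:
                # A halted position keeps its state unchanged.
                new_states.append(states[i])
        states = new_states
    return states, steps_used

# Toy setup (assumption): each state is a scalar "ambiguity" that halves per step,
# and the halting score grows as ambiguity shrinks.
tokens = ["I", "bank", "river"]
ambiguity = [0.02, 0.9, 0.04]
final, steps = adaptive_refine(
    ambiguity,
    step_fn=lambda s: [x * 0.5 for x in s],
    halt_fn=lambda x: max(0.0, 1.0 - 2.0 * x),
)
print(dict(zip(tokens, steps)))  # prints {'I': 1, 'bank': 3, 'river': 1}
```

In this toy run the more ambiguous position receives more steps, while `max_steps` bounds the total computation if a position never reaches the threshold.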
As an intuitive example of how this could be useful, consider the sentence “I arrived at the bank after crossing the river”. In this case, more context is required to infer the most likely meaning of the word “bank” compared to the less ambiguous meaning of “I” or “river”. When we encode this sentence using the standard Transformer, the same amount of computation is applied unconditionally to each word. However, the Universal Transformer’s adaptive mechanism allows the model to spend increased computation only on the more ambiguous words, e.g. to use more steps to integrate the additional contextual information needed to disambiguate the word “bank”, while spending potentially fewer steps on less ambiguous words.
At first it might seem restrictive to allow the Universal Transformer to only apply a single learned function repeatedly to process its input, especially when compared to the standard Transformer which learns to apply a fixed sequence of distinct functions. But learning how to apply a single function repeatedly means the number of applications (processing steps) can now be variable, and this is the crucial difference. Beyond allowing the Universal Transformer to apply more computation to more ambiguous symbols, as explained above, it further allows the model to scale the number of function applications with the overall size of the input (more steps for longer sequences), or to decide dynamically how often to apply the function to any given part of the input based on other characteristics learned during training. This makes the Universal Transformer more powerful in a theoretical sense, as it can effectively learn to apply different transformations to different parts of the input. This is something that the standard Transformer cannot do, as it consists of fixed stacks of learned Transformation blocks applied only once.
But while increased theoretical power is desirable, we also care about empirical performance. Our experiments confirm that Universal Transformers are indeed able to learn from examples how to copy and reverse strings and how to perform integer addition much better than a Transformer or an RNN (although not quite as well as Neural GPUs). Furthermore, on a diverse set of challenging language understanding tasks the Universal Transformer generalizes significantly better and achieves a new state of the art on the bAbI linguistic reasoning task and the challenging LAMBADA language modeling task. But perhaps of most interest is that the Universal Transformer also improves translation quality by 0.9 BLEU over a base Transformer with the same number of parameters, trained in the same way on the same training data. Putting things in perspective, this adds almost another 50% in relative terms on top of the previous 2.0 BLEU improvement that the original Transformer showed over earlier models when it was released last year.
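The "almost 50%" figure can be checked with a line of arithmetic, assuming it is meant as the ratio of the two BLEU gains quoted above (the variable names are illustrative):

```python
ut_gain_bleu = 0.9           # Universal Transformer over the base Transformer
transformer_gain_bleu = 2.0  # original Transformer over earlier models

relative = ut_gain_bleu / transformer_gain_bleu
print(f"{relative:.0%}")  # prints 45%
```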
The Universal Transformer thus closes the gap between practical sequence models competitive on large-scale language understanding tasks such as machine translation, and computationally universal models such as the Neural Turing Machine or the Neural GPU, which can be trained using gradient descent to perform arbitrary algorithmic tasks. We are enthusiastic about recent developments on parallel-in-time sequence models, and in addition to adding computational capacity and recurrence in processing depth, we hope that further improvements to the basic Universal Transformer presented here will help us build learning algorithms that are more powerful, more data efficient, and able to generalize beyond the current state of the art.
If you’d like to try this for yourself, the code used to train and evaluate Universal Transformers can be found in the open-source Tensor2Tensor repository.
This research was conducted by Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Łukasz Kaiser. Additional thanks go to Ashish Vaswani, Douglas Eck, and David Dohan for their fruitful comments and inspiration.
Pre-order Following the worldwide success of the Avengers Infinity War movie, Sideshow and Iron Studios are proud to announce the latest from the Legacy Replica 1:4 Scale line – Iron Spider-Man! Based on the likeness of Tom Holland from the film, this … Continue reading
Pre-order Hot Toys MMS501 Shuri Black Panther 1/6th scale Collectible Figure at KGHobby (link HERE)
“The Black Panther fights for us. And I will be there beside him.”
Black Panther, one of the movies released by Marvel Studios earlier this year, keeps smashing records and exceeding box-office expectations. The Princess of Wakanda, Shuri, is the leader of the Wakandan Design Group, which is responsible for developing the African nation’s modern technology. When the young King T’Challa is drawn into a conflict that puts his homeland Wakanda and the entire world at risk, Shuri proves herself a great backup by creating new security systems and weapons, including the Panther Habits and her cool-looking panther-like gauntlets.
Following the tremendous attention it received at the recent exhibitions in San Diego and Hong Kong, Hot Toys is more than excited to introduce to fans today the long-awaited 1/6th scale collectible figure featuring T’Challa’s innovative little sister – Shuri of Black Panther.
Beautifully crafted based on the appearance of Letitia Wright as Shuri in the movie, the highly accurate collectible figure features a newly developed head sculpt with detailed hair sculpture, a newly developed body, an elaborately adorned new battle suit and neck ring, a Wakandan pattern sash, and a wide range of weapons and accessories including a pair of LED light-up Vibranium Gauntlets, a spear, a Kimoyo Beads bracelet and a movie-themed figure stand!
Hot Toys MMS501 1/6th scale Shuri Collectible Figure specially features: Newly developed head sculpt with authentic and detailed likeness of Letitia Wright as Shuri in Black Panther | Movie-accurate facial expression with detailed skin texture and makeup | Brown hair sculpture with braided hairstyle | Approximately 29 cm tall | Newly developed body with over 28 points of articulation | Seven (7) interchangeable hands including: pair of fists, pair of relaxed hands, pair of hands for holding spear, gesture right hand
Costume: meticulously tailored brown and blue colored patterned jumpsuit with neck ring, silver and blue colored arm bands, yellow colored Wakandan tribe sash with silver colored buckle
Weapon: pair of LED light up Vibranium Gauntlets (blue light, battery operated), spear
Accessories: Kimoyo Beads bracelet, Specially designed Black Panther themed hexagonal figure stand with character nameplate and movie logo
Release date: Approximately Q3 – Q4, 2019
Cable (Nathan Summers) is a fictional character appearing in American comic books published by Marvel Comics, commonly in association with X-Force and the X-Men. The character first appeared as a newborn infant in Uncanny X-Men #201 (Jan. 1986) created… Continue reading
Polaris (birth name Lorna Dane, portrayed by Emma Dumont) is a main character on The Gifted. She is a mutant with the ability to manipulate magnetism. She is also the daughter of Magneto. The Gifted is an American television series created for Fox by M… Continue reading
Pre-order SUPER DUCK SET034 1/6 Summoner Head and Costume Set at KGHobby (link HERE) Yuna is a fictional character from Square Enix’s Final Fantasy series. She was first introduced as the female protagonist and one of the main playable characters of the… Continue reading
Once again, members of the S Scale Workshop will be at Exporail – the Canadian Railway Museum – to take part in the museum’s annual model railway celebration, A Great Passion for Model Trains. This year’s event takes place August 18-19 – yes, next weeke… Continue reading