Transformer: A Novel Neural Network Architecture for Language Understanding

Posted by Jakob Uszkoreit, Software Engineer, Natural Language Understanding

Neural networks, in particular recurrent neural networks (RNNs), are now at the core of the leading approaches to language understanding tasks such as language modeling, machine translation and question answering. In “Attention Is All You Need”, we introduce the Transformer, a novel neural network architecture based on a self-attention mechanism that we believe to be particularly well-suited for language understanding.

In our paper, we show that the Transformer outperforms both recurrent and convolutional models on academic English to German and English to French translation benchmarks. On top of higher translation quality, the Transformer requires less computation to train and is a much better fit for modern machine learning hardware, speeding up training by up to an order of magnitude.

BLEU scores (higher is better) of single models on the standard WMT newstest2014 English to German translation benchmark.
BLEU scores (higher is better) of single models on the standard WMT newstest2014 English to French translation benchmark.

Accuracy and Efficiency in Language Understanding
Neural networks usually process language by generating fixed- or variable-length vector-space representations. After starting with representations of individual words or even pieces of words, they aggregate information from surrounding words to determine the meaning of a given bit of language in context. For example, deciding on the most likely meaning and appropriate representation of the word “bank” in the sentence “I arrived at the bank after crossing the…” requires knowing if the sentence ends in “… road.” or “… river.”

RNNs have in recent years become the typical network architecture for translation, processing language sequentially in a left-to-right or right-to-left fashion. Because they read one word at a time, RNNs must perform multiple steps to make decisions that depend on words far away from each other. Processing the example above, an RNN could only determine that “bank” is likely to refer to the bank of a river after reading each word between “bank” and “river” step by step. Prior research has shown that, roughly speaking, the more such steps decisions require, the harder it is for a recurrent network to learn how to make those decisions.

The sequential nature of RNNs also makes it more difficult to fully take advantage of modern fast computing devices such as TPUs and GPUs, which excel at parallel and not sequential processing. Convolutional neural networks (CNNs) are much less sequential than RNNs, but in CNN architectures like ByteNet or ConvS2S the number of steps required to combine information from distant parts of the input still grows with increasing distance.

The Transformer
In contrast, the Transformer only performs a small, constant number of steps (chosen empirically). In each step, it applies a self-attention mechanism which directly models relationships between all words in a sentence, regardless of their respective position. In the earlier example “I arrived at the bank after crossing the river”, to determine that the word “bank” refers to the shore of a river and not a financial institution, the Transformer can learn to immediately attend to the word “river” and make this decision in a single step. In fact, in our English-French translation model we observe exactly this behavior.
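To make the contrast in step counts concrete, here is a rough sketch in Python. The formulas are simplifying assumptions for illustration, not measurements from the paper: a recurrent network is taken to need one step per position between two words, a stack of ordinary convolutions with a hypothetical kernel width of 3 to extend its reach by `kernel - 1` positions per layer, a dilated convolution to multiply its reach by `kernel` per layer, and self-attention to connect any two positions in a single step. All function names are made up for this example.

```python
import math


def rnn_steps(distance):
    # Assumption: one recurrent step per position separating the two words.
    return distance


def stacked_conv_steps(distance, kernel=3):
    # Assumption: each ordinary convolution layer extends reach by (kernel - 1).
    return math.ceil(distance / (kernel - 1))


def dilated_conv_steps(distance, kernel=3):
    # Assumption: each dilated layer multiplies the receptive field by `kernel`,
    # so the number of layers grows logarithmically with distance.
    layers, reach = 0, 1
    while reach < distance + 1:
        reach *= kernel
        layers += 1
    return layers


def self_attention_steps(distance):
    # Self-attention relates any two positions directly, whatever the distance.
    return 1


for d in (4, 16, 64):
    print(d, rnn_steps(d), stacked_conv_steps(d),
          dilated_conv_steps(d), self_attention_steps(d))
```

The point of the sketch is only the growth pattern: linear, linear with a smaller constant, logarithmic, and constant.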

More specifically, to compute the next representation for a given word – “bank” for example – the Transformer compares it to every other word in the sentence. The result of these comparisons is an attention score for every other word in the sentence. These attention scores determine how much each of the other words should contribute to the next representation of “bank”. In the example, the disambiguating “river” could receive a high attention score when computing a new representation for “bank”. The attention scores are then used as weights for a weighted average of all words’ representations which is fed into a fully-connected network to generate a new representation for “bank”, reflecting that the sentence is talking about a river bank.
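The comparison-and-averaging step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the model from the paper: it uses a single attention head, compares raw word vectors with a scaled dot product instead of learned projections, lets each word also attend to itself, and leaves out the fully-connected network that follows. The three toy embeddings are invented for the example.

```python
import math


def softmax(scores):
    # Turn raw scores into positive weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    # Compare one word vector against every word vector (scaled dot product).
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def self_attention(vectors):
    # For each word, build a new vector as the weighted average of all vectors.
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        weights = attention_weights(query, vectors)
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(dim)])
    return outputs


# Toy 2-dimensional embeddings, invented for illustration only.
words = ["bank", "crossing", "river"]
embeddings = [[1.0, 1.0], [0.0, 0.5], [2.0, 2.0]]

bank_weights = attention_weights(embeddings[0], embeddings)
new_vectors = self_attention(embeddings)
```

With these made-up vectors, “river” receives the largest weight when the representation for “bank” is recomputed, which mirrors the disambiguation described in the text.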

The animation below illustrates how we apply the Transformer to machine translation. Neural networks for machine translation typically contain an encoder reading the input sentence and generating a representation of it. A decoder then generates the output sentence word by word while consulting the representation generated by the encoder. The Transformer starts by generating initial representations, or embeddings, for each word. These are represented by the unfilled circles. Then, using self-attention, it aggregates information from all of the other words, generating a new representation per word informed by the entire context, represented by the filled circles. This step is then repeated multiple times in parallel for all words, successively generating new representations.

The decoder operates similarly, but generates one word at a time, from left to right. It attends not only to the other previously generated words, but also to the final representations generated by the encoder.
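One common way to restrict a decoder to what it has already generated is to mask out later positions before normalizing the attention scores. The sketch below assumes that approach; it covers only the decoder's attention over its own output, omits the attention to the encoder, and uses a hypothetical function name and made-up scores.

```python
import math


def causal_attention_weights(scores):
    # scores[i][j] is the raw score of position i attending to position j.
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # position i sees itself and earlier positions
        peak = max(visible)
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        hidden = [0.0] * (len(row) - i - 1)  # later positions get zero weight
        weights.append([e / total for e in exps] + hidden)
    return weights


# Made-up raw scores for a three-word output, for illustration only.
scores = [[0.0, 5.0, 5.0],
          [1.0, 1.0, 9.0],
          [0.0, 0.0, 0.0]]
weights = causal_attention_weights(scores)
```

However large the score for a later word, its weight is zero, so each word's representation depends only on words generated up to that point.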

Flow of Information
Beyond computational performance and higher accuracy, another intriguing aspect of the Transformer is that we can visualize what other parts of a sentence the network attends to when processing or translating a given word, thus gaining insights into how information travels through the network.

To illustrate this, we chose an example involving a phenomenon that is notoriously challenging for machine translation systems: coreference resolution. Consider the following sentences and their French translations:

It is obvious to most that in the first sentence pair “it” refers to the animal, and in the second to the street. When translating these sentences to French or German, the translation for “it” depends on the gender of the noun it refers to – and in French “animal” and “street” have different genders. In contrast to the current Google Translate model, the Transformer translates both of these sentences to French correctly. Visualizing what words the encoder attended to when computing the final representation for the word “it” sheds some light on how the network made the decision. In one of its steps, the Transformer clearly identified the two nouns “it” could refer to and the respective amount of attention reflects its choice in the different contexts.

The encoder self-attention distribution for the word “it” from the 5th to the 6th layer of a Transformer trained on English to French translation (one of eight attention heads).

Given this insight, it might not be that surprising that the Transformer also performs very well on the classic language analysis task of syntactic constituency parsing, a task the natural language processing community has attacked with highly specialized systems for decades. In fact, with little adaptation, the same network we used for English to German translation outperformed all but one of the previously proposed approaches to constituency parsing.

Next Steps
We are very excited about the future potential of the Transformer and have already started applying it to other problems involving not only natural language but also very different inputs and outputs, such as images and video. Our ongoing experiments are accelerated immensely by the Tensor2Tensor library, which we recently open sourced. In fact, after downloading the library you can train your own Transformer networks for translation and parsing by invoking just a few commands. We hope you’ll give it a try, and look forward to seeing what the community can do with the Transformer.

This research was conducted by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin. Additional thanks go to David Chenell for creating the animation above.


Kumik KMF036 1/6th scale 12-year-old Girl Action Figure aka Mathilda in The Professional

Mathilda (Natalie Portman) is only 12 years old, but is already familiar with the dark side of life: her abusive father stores drugs for corrupt police officers, and her mother neglects her. Léon (Jean Reno), who lives down the hall, tends to his houseplants and works as a hired hitman for mobster Tony (Danny Aiello). When her family is murdered by crooked DEA agent Stansfield (Gary Oldman), Mathilda joins forces with a reluctant Léon to learn his deadly trade and avenge her family’s deaths.

Léon: The Professional (French: Léon; originally released in the United States as The Professional) is a 1994 English-language French thriller film written and directed by Luc Besson. It stars Jean Reno and Gary Oldman, and features the motion picture debut of Natalie Portman.

Kumik KMF036 1/6th scale 12-year-old Girl Action Figure Parts list: 1/6th scale child body, 1/6th scale Girl Head sculpt, Outfit set, Figure stand, Revolver, Plant


Related posts:
“No Women, No Kids” – 1/6th scale Léon: The Professional 12-inch Figure (pics HERE)
Updated 1/6th scale Léon: The Professional 12-inch Figure with HeadPlay Jean Reno Head sculpt posted on my toy blog HERE
Action Figure Review of Heroic “Perfection Killer” 1/6th scale Léon, the Professional posted HERE and HERE
Léon the hitman meets Norman Stansfield in 1:6 scale, the yin-yang of “The Professional” movie posted HERE


Hot Toys Star Wars: Episode III Revenge of the Sith 1/6th Anakin Skywalker collectible figure

“My powers have doubled since the last time we met, Count!” – Anakin Skywalker

Playing a core and decisive role in the Star Wars galaxy, Anakin Skywalker had the potential to become one of the most powerful Jedi ever, and was believed by some to be the prophesied Chosen One who would bring balance to the Force. Despite being a hero of the Clone Wars, Anakin’s fear of loss would prove to be his downfall and lead him on a path to the Dark Side.

Today Hot Toys is thrilled to officially present our incredible movie masterpiece – the highly anticipated 1/6th scale Anakin Skywalker collectible figure from Star Wars: Episode III Revenge of the Sith.

Masterfully crafted based on the appearance of Anakin Skywalker in the movie, the 1/6th scale collectible figure features a newly developed head sculpt with impressive likeness, a meticulously tailored Jedi outfit, an interchangeable mechno right arm, an LED light-up lightsaber, Count Dooku’s lightsaber as an additional weapon, and a specially designed figure base.


Hot Toys MMS437 Star Wars: Episode III Revenge of the Sith 1/6th scale Anakin Skywalker collectible figure features: Authentic and detailed likeness of Hayden Christensen as Anakin Skywalker in Star Wars: Episode III Revenge of the Sith | Newly developed head sculpt with movie-accurate facial expression and detailed skin texture | Detailed sculpt of Anakin Skywalker’s hair style | Approximately 31 cm tall body with over 30 points of articulation | Eight (8) pieces of interchangeable hands including (bare left hands and gloved right hands): pair of fists, pair of lightsaber-holding hands, pair of Force-using hands, relaxed right hand, opened left hand, interchangeable mechno right arm

Costume: brown-colored under-tunic, dark brown-colored leather-like tunic, brown-colored Jedi robe, dark brown-colored leather-like belt, pair of brown-colored pants, pair of dark brown-colored leather-like textured boots

Weapons: LED-lighted blue lightsaber (blue light, battery operated), Count Dooku’s red lightsaber (does not light up)

Accessory: Specially designed figure stand with character nameplate and Star Wars logo


Kumik 1/6th scale Chloë Grace Moretz head sculpt Review – not a bad sculpt at all :)

Chloë Grace Moretz is an American actress and model. She began her acting career in 2004 at age seven, and received her first award nomination the following year for The Amityville Horror. Her other film credits include (500) Days of Summer, The Poker House, Diary of a Wimpy Kid, Kick-Ass and Kick-Ass 2, Let Me In, Hugo, Dark Shadows, Carrie, If I Stay, The Equalizer, and The 5th Wave. Moretz provided the voice of Hit-Girl for Kick-Ass: The Game and Emily Kaldwin in Dishonored.

Chloë Grace Moretz really stood out for me in the 2010 British-American superhero black comedy film Kick-Ass. As 11-year-old Mindy Macready, she fights crime as the ruthless vigilante Hit-Girl alongside her father Big Daddy (Nicolas Cage), a former cop who intends to bring down the crime boss Frank D’Amico (Mark Strong) and his son, Red Mist (Christopher Mintz-Plasse).

This is the Kumik 1/6th scale Chloë Grace Moretz head sculpt with rooted hair which I got recently. I don’t have an appropriate body type for this head so I chose the Brother Production Zombie Killer “Alice” 12-inch figure body for this photo shoot. I had reviewed this Brother Production 1/6th scale Zombie Killer “Alice” figure some time back – see the action figure review posted on my toy blog HERE. I also have pictures of the Hot Toys Movie Masterpiece Series MMS139 “Resident Evil: Afterlife” 1/6 scale Alice head sculpt on this body as well – see the pics HERE.


Scroll down to see the close up pictures of the Kumik 1/6th scale Chloë Grace Moretz head sculpt

NEXT: Action Figure Review of 1/6th scale Chloë Grace Moretz as Hit-Girl in Kick-Ass


Challenges Ahead on Supply

A lot of news, but first and foremost, my thoughts go out to everyone affected by Hurricane Harvey.  That was/is a challenging storm, and I’m hoping for only the best for those in its path today and going forward.

Now on to the glass world, and it’s been a tough one on the float side.  The market is relatively busy right now save for some soft spots here and there, so the need for glass is pretty high.  Unfortunately some events at the float level have me very concerned about capacity, hence the warning.  (Katy Devlin had a fantastic take on it HERE) Glass is tight now and going to get tighter.  This is the time to get as organized as you can and understand your supply chains and the future orders you need to fill.  Proactivity is a must.  This is also a massive reason why you need to attend GlassBuild America in a few weeks, because if you are not networking and communicating you will be left behind.  I will be monitoring this glass supply issue and will continue to report when relevant.  And yes I know I am more reactionary than others out there, but I’d rather be safe than sorry.

–  A few weeks ago I mentioned another big glass deal was coming and that happened publicly last week with Glass Dynamics being sold to Press Glass.  Obviously the interesting angle here is the parent company of Press is in Europe so this is a new player having a location in the North American fab market.  Press has a great reputation in the areas they are already in, so there’s a positive from the market standpoint.

–  Congrats to Martin Bracamonte on his new position as President of IGE Glass Technologies.  Michael Spellman built a strong team over there and he’ll obviously still be involved but adding Martin as the President was an excellent move- he’s a good and talented person.

–  And while I am in the congratulatory mood, props to the folks at Conners Sales.  They launched a fabulous new website last week.  Really impressive work and great examples of the lines they represent.
–  The latest Architectural Billings Index was released this week and once again it was in the positive- that’s now 6 straight months on the good side of the ledger and overall a pretty amazing run over the last 2+ years.  All of the other indexes that are tracked (regional and new inquiries) were up as well.

–  I’m a big “Smart roads” guy and look forward to seeing if and how this works.  The next area for testing is in Kansas City along with two other to-be-determined sites.  Really curious to see how this holds up in a climate like KC…

–  Last this week, College Football is now back… can’t wait…. One of the first college games of the year will be held at the new stadium in Atlanta. (Which you can view close up when you attend GlassBuild)  Architectural Digest got a tour and broke it all down….

–  Naming your baby “Eclipse”?  Ugh.  I guess in 2024 that kid will have its day in the spotlight again.
–  This is brilliant- gamers are very sharp and good to see their skills going to something very meaningful.
Driving a BBQ Grill.  And the Grill is on.  And the driver decides it’s time for a smoke.  You can guess what is next.

Sinkhole happens in China.  Then a few minutes later a guy on his scooter comes by…. Uh oh
