The End Of My Era

For me, today is a sad day. At least in some ways. Today is the end of my career as a Microsoft Certified Trainer. My MCT credentials lapse today and, as I stopped taking/passing exams, I am no longer able to renew. So this is the end of my 26+ year road as an MCT and the end of my 51-year career in IT, largely focused on Microsoft technologies.

I first got excited about computers in 1968, when I found that the job of an operator paid something like $.50/hour more than washing dishes – and was way cleaner. After a degree from CMU in Computer Problem Solving (what the hip call AI today), I spent three years working in both the US and the UK for Comshare. They were a time-sharing company and were, in effect, at the beginnings of the Cloud.

After several years as an OS developer, I took a sabbatical for 7 months. The high point of that trip was seeing Mt Everest over my shoulder from 18192 feet.

After returning, I joined Arthur Andersen and a few weeks later was in Chicago at a training event when I read about the launch of the PC and was blown away by its potential. Within days of the announcement, I was bugging our partners to invest. Looking back, I became one of the first DOS trainers in the world! During the 1980s I taught a variety of audiences on subjects including PCs, DOS, dBase, Word and Windows. I left Andersen in 1988 and launched my own company specialising in a combination of training and direct mail. In 1992, one cool client was The Savoy Hotel Group, which was a lot of fun.

I discovered NT (as an early beta tester) and began doing NT training in 1993 – teaching at SkyTech in London. Happy memories. I also taught for Learning Tree. Their NT and TCP/IP courses were awesome.  I then became an MCT at the very opening of the programme and until today I have remained an active MCT (MCT ID 6851).

Those early days were very different from today. To become an MCT you had to attend an evaluation session (often dubbed Shelia York’s Day Of Hell). I failed my first attempt – I was just told to turn up and was never given the relevant information. Second time around I nailed it and became both one of the first MCTs and one of the first MCSEs in the world. You also had to attend a trainer prep course for any course you wanted to re-deliver – I taught a lot of these in my time, and it was a real honour to do this training.

I was also active first on MSN, then in the newsgroups, when there were newsgroups. I loved being able to help other MCTs. Like many MCTs, I spent time working for Microsoft both in Redmond and Europe. As an MCT I served on a couple of advisory boards too. Perhaps the most meaningful for me was the Certified Learning Consultant initiative – requiring learning partners to have suitably qualified learning consultants on staff.

As many of the early MCTs will know, I have a passion for quality, even when I all too often fall short myself. Such is life. When Lutz Ziob joined Microsoft Learning, he quickly outsourced much of the work that had previously been done internally (with LOTS of contractors – some of them awesome). The outsourcing made sense. Unfortunately, the quality of what was produced was incredibly bad. I started a discussion about quality which, I am glad to say, had a major effect. One thing the discussion surfaced was that students often rated the courseware far better than the MCTs did. If nothing else, this proved the value of the MCT.

I have many happy memories: meeting Bill Gates and getting him to sign my Windows 95 Gold CD.

I also met Steve Ballmer on a couple of occasions.

In my travels, I have had many adventures: lost luggage, horrible rooms, cancelled/rerouted flights. I even spent an evening in jail in Turkey during a military revolution. I have also had the good times – flying Concorde, staying in the Savoy, and eating at the Tuna House. A precious memory was being in the room when Jeffrey Snover launched PowerShell and waving a $20 at him saying “I’ll buy it.”

So now it’s over. I enter full retirement with a mixture of relief (I made it!) and sadness.

What a long strange trip it’s been. Thanks for all the fish.


Robust Neural Machine Translation

Posted by Yong Cheng, Software Engineer, Google Research

In recent years, neural machine translation (NMT) using Transformer models has experienced tremendous success. Based on deep neural networks, NMT models are usually trained end-to-end on very large parallel corpora (input/output text pairs) in an entirely data-driven fashion and without the need to impose explicit rules of language.

Despite this huge success, NMT models can be sensitive to minor perturbations of the input, which can manifest as a variety of different errors, such as under-translation, over-translation or mistranslation. For example, given the following German sentence, the state-of-the-art NMT model, Transformer, will yield a correct translation.

“Der Sprecher des Untersuchungsausschusses hat angekündigt, vor Gericht zu ziehen, falls sich die geladenen Zeugen weiterhin weigern sollten, eine Aussage zu machen.”

(Machine translation to English: “The spokesman of the Committee of Inquiry has announced that if the witnesses summoned continue to refuse to testify, he will be brought to court.”)

But when we apply a subtle change to the input sentence, say from geladenen to the synonym vorgeladenen, the translation becomes very different (and in this case, incorrect):

“Der Sprecher des Untersuchungsausschusses hat angekündigt, vor Gericht zu ziehen, falls sich die vorgeladenen Zeugen weiterhin weigern sollten, eine Aussage zu machen.”

(Machine translation to English: “The investigative committee has announced that he will be brought to justice if the witnesses who have been invited continue to refuse to testify.”)

This lack of robustness in NMT models prevents many commercial systems from being applicable to tasks that cannot tolerate this level of instability. Therefore, learning robust translation models is not just desirable, but is often required in many scenarios. Yet, while the robustness of neural networks has been extensively studied in the computer vision community, only a few prior studies on learning robust NMT models can be found in the literature.

In “Robust Neural Machine Translation with Doubly Adversarial Inputs” (to appear at ACL 2019), we propose an approach that uses generated adversarial examples to improve the stability of machine translation models against small perturbations in the input. We learn a robust NMT model to directly overcome adversarial examples generated with knowledge of the model and with the intent of distorting the model predictions. We show that this approach improves the performance of the NMT model on standard benchmarks.

Training a Model with AdvGen
An ideal NMT model would generate similar translations for separate inputs that exhibit small differences. The idea behind our approach is to perturb a translation model with adversarial inputs in the hope of improving the model’s robustness. It does this using an algorithm called Adversarial Generation (AdvGen), which generates plausible adversarial examples for perturbing the model and then feeds them back into the model for defensive training. While this method is inspired by the idea of generative adversarial networks (GANs), it does not rely on a discriminator network, but simply applies the adversarial example in training, effectively diversifying and extending the training set.

The first step is to perturb the model using AdvGen. We start by using Transformer to calculate the translation loss based on a source input sentence, a target input sentence and a target output sentence. Then AdvGen randomly selects some words in the source sentence, assuming a uniform distribution. Each word has an associated list of similar words, i.e., candidates that can be used for substitution, from which AdvGen selects the word that is most likely to introduce errors in Transformer output. Then, this generated adversarial sentence is fed back into Transformer, initiating the defense stage.

First, the Transformer model is applied to an input sentence (lower left) and, in conjunction with the target output sentence (above right) and target input sentence (middle right; beginning with the placeholder “<sos>”), the translation loss is calculated. The AdvGen function then takes the source sentence, word selection distribution, word candidates, and the translation loss as inputs to construct an adversarial source example.
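The attack step described above can be sketched in a few lines of Python. This is only a toy illustration, not the implementation from the paper: the names `adv_gen`, `loss_fn`, `candidates` and `ratio` are hypothetical, `loss_fn` stands in for the Transformer translation loss, and the sketch scores each candidate by re-evaluating the loss directly, which is an assumption made for clarity rather than a description of how AdvGen ranks candidates internally.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence


def adv_gen(
    sentence: Sequence[str],
    candidates: Dict[str, List[str]],
    loss_fn: Callable[[List[str]], float],
    ratio: float = 0.25,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Build an adversarial sentence by greedy word substitution (toy sketch).

    Positions are sampled uniformly; at each sampled position the candidate
    that raises the (stand-in) loss the most replaces the original word.
    """
    rng = rng or random.Random()
    words = list(sentence)
    n_swap = min(len(words), max(1, round(ratio * len(words))))
    positions = rng.sample(range(len(words)), k=n_swap)

    for pos in positions:
        best_word, best_loss = words[pos], loss_fn(words)
        for cand in candidates.get(words[pos], []):
            trial = words[:pos] + [cand] + words[pos + 1:]
            trial_loss = loss_fn(trial)
            if trial_loss > best_loss:  # keep the most damaging substitution
                best_word, best_loss = cand, trial_loss
        words[pos] = best_word
    return words


# Toy usage: a made-up loss that treats one synonym as "hard" for the model.
toy_loss = lambda ws: 1.0 if "vorgeladenen" in ws else 0.0
source = ["die", "geladenen", "Zeugen"]
similar = {"geladenen": ["vorgeladenen"]}
adversarial = adv_gen(source, similar, toy_loss, ratio=1.0, rng=random.Random(0))
```

With `ratio=1.0` every position is considered, so the toy example swaps geladenen for vorgeladenen and leaves the other words alone.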

During the defend stage, the adversarial sentence is fed back into the Transformer model. Again the translation loss is calculated, but this time using the adversarial source input. Using the same method as above, AdvGen uses the target input sentence, word replacement candidates, the word selection distribution calculated by the attention matrix, and the translation loss to construct an adversarial target example.

In the defense stage, the adversarial source example serves as input to the Transformer model, and the translation loss is calculated. AdvGen then uses the same method as above to generate an adversarial target example from the target input.

Finally, the adversarial sentence is fed back into Transformer, and the robustness loss is calculated using the adversarial source example, the adversarial target input example and the target sentence. If the perturbation leads to a significant loss, the loss is minimized so that when the model is confronted with similar perturbations, it will not repeat the same mistake. On the other hand, if the perturbation leads to a low loss, nothing happens, indicating that the model can already handle this perturbation.

Model Performance
We demonstrate the effectiveness of our approach by applying it to the standard Chinese-English and English-German translation benchmarks. We observed a notable improvement of 2.8 and 1.6 BLEU points, respectively, compared to the competitive Transformer model, achieving a new state-of-the-art performance.

Comparison of Transformer model (Vaswani et al., 2017) on standard benchmarks.

We then evaluate our model on a noisy dataset, generated using a procedure similar to that described for AdvGen. We take an input clean dataset, such as that used on standard translation benchmarks, and randomly select words for similar word substitution. We find that our model exhibits improved robustness compared to other recent models.
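As a rough picture of how such a noisy test set could be built, here is a minimal Python sketch. It is an assumption-laden toy, not the procedure used for the reported results: `add_synonym_noise`, `similar_words` and `noise_fraction` are hypothetical names, and the similar-word lists are supplied by hand rather than derived from a model.

```python
import random
from typing import Dict, List, Sequence


def add_synonym_noise(
    sentence: Sequence[str],
    similar_words: Dict[str, List[str]],
    noise_fraction: float,
    seed: int = 0,
) -> List[str]:
    """Randomly replace a fraction of words with one of their similar words."""
    rng = random.Random(seed)
    words = list(sentence)
    # Only positions whose word has at least one listed substitute can change.
    replaceable = [i for i, w in enumerate(words) if similar_words.get(w)]
    n_swap = min(len(replaceable), round(noise_fraction * len(words)))
    for i in rng.sample(replaceable, k=n_swap):
        words[i] = rng.choice(similar_words[words[i]])
    return words


clean = ["die", "geladenen", "Zeugen"]
similar = {"geladenen": ["vorgeladenen"]}
noisy = add_synonym_noise(clean, similar, noise_fraction=1.0)
```

Unlike the attack step, nothing here looks at the model's loss: words and substitutes are picked at random, which is what makes the result a neutral robustness test rather than a targeted attack.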

Comparison of Transformer, Miyato et al. and Cheng et al. on artificial noisy inputs.

These results show that our method is able to overcome small perturbations in the input sentence and improve the generalization performance. It outperforms competitive translation models and achieves state-of-the-art translation performance on standard benchmarks. We hope our translation model will serve as a robust building block for improving many downstream tasks, especially when those are sensitive or intolerant to imperfect translation input.

This research was conducted by Yong Cheng, Lu Jiang and Wolfgang Macherey. Additional thanks go to our leadership Andrew Moore and Julia (Wenli) Zhu.


Google at ACL 2019

Andrew Helton, Editor, Google Research Communications

This week, Florence, Italy hosts the 2019 Annual Meeting of the Association for Computational Linguistics (ACL 2019), the premier conference in the field of natural language understanding, covering a broad spectrum of research areas that are concerned with computational approaches to natural language.

As a leader in natural language processing and understanding, and a Diamond Level sponsor of ACL 2019, Google will be on hand to showcase the latest research on syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better systems using labeled and unlabeled data.

If you’re attending ACL 2019, we hope that you’ll stop by the Google booth to meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Our researchers will also be on hand to demo the Natural Questions corpus, the Multilingual Universal Sentence Encoder and more. You can also learn more about the Google research being presented at ACL 2019 below.

Organizing Committee includes:
Enrique Alfonseca

Accepted Publications
A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy
Genady Beryozkin, Yoel Drori, Oren Gilon, Tzvika Hartman, Idan Szpektor

Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study
Chinnadhurai Sankar, Sandeep Subramanian, Chris Pal, Sarath Chandar, Yoshua Bengio

Generating Logical Forms from Graph Representations of Text and Entities
Peter Shaw, Philip Massey, Angelica Chen, Francesco Piccinno, Yasemin Altun

Extracting Symptoms and their Status from Clinical Conversations
Nan Du, Kai Chen, Anjuli Kannan, Linh Trans, Yuhui Chen, Izhak Shafran

Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Le, Jason Baldridge

Meaning to Form: Measuring Systematicity as Information
Tiago Pimentel, Arya D. McCarthy, Damian Blasi, Brian Roark, Ryan Cotterell

Matching the Blanks: Distributional Similarity for Relation Learning
Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, Tom Kwiatkowski

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc Le, Ruslan Salakhutdinov

HighRES: Highlight-based Reference-less Evaluation of Summarization
Hardy Hardy, Shashi Narayan, Andreas Vlachos

Zero-Shot Entity Linking by Reading Entity Descriptions
Lajanugen Logeswaran, Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin, Honglak Lee

Robust Neural Machine Translation with Doubly Adversarial Inputs
Yong Cheng, Lu Jiang, Wolfgang Macherey

Natural Questions: a Benchmark for Question Answering Research
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Matthew Kelcey, Jacob Devlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov

Like a Baby: Visually Situated Neural Language Acquisition
Alexander Ororbia, Ankur Mali, Matthew Kelly, David Reitter

What Kind of Language Is Hard to Language-Model?
Sebastian J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner

How Multilingual is Multilingual BERT?
Telmo Pires, Eva Schlinger, Dan Garrette

Handling Divergent Reference Texts when Evaluating Table-to-Text Generation
Bhuwan Dhingra, Manaal Faruqui, Ankur Parikh, Ming-Wei Chang, Dipanjan Das, William Cohen

BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark, Minh-Thang Luong, Urvashi Khandelal, Christopher D. Manning, Quoc V. Le

Dynamically Composing Domain-Data Selection with Clean-Data Selection by “Co-Curricular Learning” for Neural Machine Translation
Wei Wang, Isaac Caswell, Ciprian Chelba

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, Colin Raffel

On the Robustness of Self-Attentive Models
Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, Cho-Jui Hsieh

Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B
Jiaming Luo, Yuan Cao, Regina Barzilay

How Large Are Lions? Inducing Distributions over Quantitative Attributes
Yanai Elazar, Abhijit Mahabal, Deepak Ramachandran, Tania Bedrax-Weiss, Dan Roth

BERT Rediscovers the Classical NLP Pipeline
Ian Tenney, Dipanjan Das, Ellie Pavlick

Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling
Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas Mccoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman

Robust Zero-Shot Cross-Domain Slot Filling with Example Values
Darsh Shah, Raghav Gupta, Amir Fayazi, Dilek Hakkani-Tur

Latent Retrieval for Weakly Supervised Open Domain Question Answering
Kenton Lee, Ming-Wei Chang, Kristina Toutanova

On-device Structured and Context Partitioned Projection Networks
Sujith Ravi, Zornitsa Kozareva

Incorporating Priors with Feature Attribution on Text Classification
Frederick Liu, Besim Avci

Informative Image Captioning with External Sources of Information
Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut

Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach
Zonghan Yang, Yong Cheng, Yang Liu, Maosong Sun

Synthetic QA Corpora Generation with Roundtrip Consistency
Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins

Unsupervised Paraphrasing without Translation
Aurko Roy, David Grangier

Widening NLP 2019
Organizers include: Diyi Yang

NLP for Conversational AI
Organizers include: Thang-Minh Luong, Tania Bedrax-Weiss

The Fourth Arabic Natural Language Processing Workshop
Organizers include: Imed Zitouni

The Third Workshop on Abusive Language Online
Organizers include: Zeerak Waseem

TyP-NLP, Typology for Polyglot NLP
Organizers include: Manaal Faruqui

Gender Bias in Natural Language Processing
Organizers include: Kellie Webster

Wikipedia as a Resource for Text Analysis and Retrieval
Organizer: Marius Pasca


Roundtable and Great Glazing

Tons of educational and technical topics lead off the blog this week. First off, if you missed the webinar from the National Glass Association and Architectural Record Magazine, you missed an incredible event. The “Professional Roundtable: Perspectives on Glass and Glazing in Design” featured five brilliant minds, all bringing up great point after great point with regard to energy codes, standards, performance and the glass and glazing products that can be utilized to satisfy all and advance our world. It truly was something to take in, and I especially enjoyed seeing the pictures of the projects that the panelists pointed to, showing great glass and glazing in action. The biggest takeaway for me was that we as an industry could do amazing things and reach incredible performance levels; all we have to do is actually do it. Welcome the change, welcome the new and push forward. There are many people and companies who take this tack, but there are still many that don’t, and that includes architects unwilling to try the “new” as well. In any case I think panels like this are only scratching the surface… which brings me to….

The final schedule for Express Learning at GlassBuild is now published and there’s a ton of incredible content, but one of the big keys is the “live podcast” that the guys from Edify Studios will be doing, which will focus on disruptive change in the glass industry. Basically, what is going to be our “Uber” or other breakthrough? It’s a huge and important subject and one that will be “must attend” at the show. The rest of the schedule is fantastic too: lots of very interesting subjects and engaging speakers.

–  On GlassBuild, seriously- if you haven’t registered yet- please make a note and do it soon.  And even bigger- get your hotel room taken care of… just go to and it’ll take you 5 minutes.  Thank you.

–  The latest NGA Tech and Advocacy bulletin was released this week and it’s astonishing how much work is being done by Urmilla Sowell and the folks at NGA. Here’s a quick smattering of what was covered and you’ll see there is a TON going on…

Laminated Glazing Reference Manual   

Products for Energy Applications 

Coastal Glazing and the Turtle Codes

Assessing the Durability of Decorative Glass

Glass Properties Pertaining to Photovoltaic Applications 

Glossary of Terms for Color and Appearance 

Proper Procedures for Cleaning Flat Glass Mirrors

Proper Procedures for Receiving, Storage and Transportation of Flat Glass Mirrors 

70 Glass Information Bulletins (GIBs) available

AIA-Approved Presentations

Glass Floors and Stairs Task Group

Measuring Color of Decorative Material in the Field

Point-Supported Glazing

Design Considerations for Use of Sealants/Adhesives with Coated Glass and Adhesives Compatibility

Understanding Reflected Solar Energy of Glazing Systems in Buildings

Updated Coated Glass AIA presentation

Engineering Standards Manual, 2019 edition

This work matters as it advances our industry… if you are interested join NGA and GET INVOLVED in the process!

–  Last this week, a fun one… so I was way behind the times TV-wise. I just recently finished “Breaking Bad” (AWESOME TV) and now I’m catching up on “Better Call Saul”, which is also fabulous. This week as I watched an episode in season three I was excited to see a familiar glass industry product… I am pretty sure that Wood’s Powr-Grip vacuum cups were utilized… it wasn’t for glazing, unfortunately, but I still enjoyed seeing a product that we in our industry use quite a bit. You know me, I watch everything for a connection back to the industry. Not sure anyone from Wood’s still reads this blog (I miss Joe Landsverk of Wood’s. He used to read, but he passed on a few years ago, and I’m guessing he would’ve loved this story) but if you do, let me know if that was yours…


Creative way that Japan is creating the medals for the next Olympics…
Comical story involving dating, sisters, and a stolen car
This is a great son doing super for his mom.  I could absolutely see my brother doing the same for our Mom when she was with us…

Couldn’t get this to post… so skipping the video for this week… sorry!


Learning Better Simulation Methods for Partial Differential Equations

Posted by Stephan Hoyer, Software Engineer, Google Research

The world’s fastest supercomputers were designed for modeling physical phenomena, yet they still are not fast enough to robustly predict the impacts of climate change, to design controls for airplanes based on airflow or to accurately simulate a fusion reactor. All of these phenomena are modeled by partial differential equations (PDEs), the class of equations that describe everything smooth and continuous in the physical world, and the most common class of simulation problems in science and engineering. To solve these equations, we need faster simulations, but in recent years, Moore’s law has been slowing. At the same time, we’ve seen huge breakthroughs in machine learning (ML) along with faster hardware optimized for it. What does this new paradigm offer for scientific computing?

In “Learning Data Driven Discretizations for Partial Differential Equations”, published in Proceedings of the National Academy of Sciences, we explore a potential path for how ML can offer continued improvements in high-performance computing, both for solving PDEs and, more broadly, for solving hard computational problems in every area of science.

For most real-world problems, closed-form solutions to PDEs don’t exist. Instead, one must find discrete equations (“discretizations”) that a computer can solve to approximate the continuous PDE. Typical approaches to solve PDEs represent equations on a grid, e.g., using finite differences. To achieve convergence, the mesh spacing of the grid needs to be smaller than the smallest feature size of the solutions. This often isn’t feasible because of an unfortunate scaling law: achieving 10x higher resolution requires 10,000x more compute, because the grid must be scaled in four dimensions—three spatial dimensions and time. Instead, in our paper we show that ML can be used to learn better representations for PDEs on coarser grids.
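The scaling arithmetic, and the reason resolution matters at all, can be made concrete with a short Python sketch. The function names here are illustrative only, and the sine function is just a convenient test case with a known derivative, not anything taken from the paper.

```python
import math


def compute_multiplier(resolution_factor, spatial_dims=3, include_time=True):
    """Factor by which compute grows when every grid dimension is refined."""
    dims = spatial_dims + (1 if include_time else 0)
    return resolution_factor ** dims


def central_difference(f, x, h):
    """Second-order finite-difference estimate of df/dx on a grid of spacing h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# 10x finer in three spatial dimensions plus time: 10 ** 4 = 10000
print(compute_multiplier(10))

# The derivative of sin at x = 1 is cos(1); the estimate improves as h shrinks.
exact = math.cos(1.0)
coarse_error = abs(central_difference(math.sin, 1.0, 0.1) - exact)
fine_error = abs(central_difference(math.sin, 1.0, 0.01) - exact)
print(coarse_error, fine_error)
```

The finer spacing gives a much smaller error, but on a full space-time grid that accuracy is paid for with the fourth-power growth in grid points computed by `compute_multiplier`.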

Satellite photo of a hurricane, at both full resolution and simulated resolution in a state-of-the-art weather model. Cumulus clouds (e.g., in the red circle) are responsible for heavy rainfall, but in the weather model the details are entirely blurred out. Instead, models rely on crude approximations for sub-grid physics, a key source of uncertainty in climate models. Image credit: NOAA

The challenge is to retain the accuracy of high-resolution simulations while still using the coarsest grid possible. In our work we’re able to improve upon existing schemes by replacing heuristics based on deep human insight (e.g., “solutions to a PDE should always be smooth away from discontinuities”) with optimized rules based on machine learning. The rules our ML models recover are complex, and we don’t entirely understand them, but they incorporate sophisticated physical principles like the idea of “upwinding”: to accurately model what’s coming towards you in a fluid flow, you should look upstream in the direction the wind is coming from. An example of our results on a simple model of fluid dynamics is shown below:

Simulations of Burgers’ equation, a model for shock waves in fluids, solved with either a standard finite volume method (left) or our neural network based method (right). The orange squares represent simulations with each method on low resolution grids. These points are fed back into the model at each time step, which then predicts how they should change. Blue lines show the exact simulations used for training. The neural network solution is much better, even on a 4x coarser grid, as indicated by the orange squares smoothly tracing the blue line.
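For readers who have not met upwinding before, here is the classic hand-written version of the idea for the simplest possible case, a quantity carried along at constant speed. This is a textbook first-order scheme used as an illustration, not the learned rule from our models; `upwind_step` and its parameters are names chosen for this sketch, and the grid is assumed to be periodic.

```python
def upwind_step(u, c, dx, dt):
    """One first-order upwind step for u_t + c * u_x = 0 on a periodic grid.

    The spatial difference always uses the neighbour on the side the flow
    comes from: the left cell when c >= 0, the right cell when c < 0.
    """
    n = len(u)
    if c >= 0:
        # Flow moves to the right, so look upstream to the left (u[i - 1]).
        return [u[i] - c * dt / dx * (u[i] - u[i - 1]) for i in range(n)]
    # Flow moves to the left, so look upstream to the right (u[i + 1]).
    return [u[i] - c * dt / dx * (u[(i + 1) % n] - u[i]) for i in range(n)]


pulse = [0.0, 1.0, 0.0, 0.0]
moved_right = upwind_step(pulse, c=1.0, dx=1.0, dt=1.0)
moved_left = upwind_step(pulse, c=-1.0, dx=1.0, dt=1.0)
```

With the time step chosen so that the flow crosses exactly one cell per step, the pulse simply shifts one cell in the direction of the flow, which is the behaviour the upstream-looking difference is designed to capture.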

Our research also illustrates a broader lesson about how to effectively combine machine learning and physics. Rather than attempting to learn physics from scratch, we combined neural networks with components from traditional simulation methods, including the known form of the equations we’re solving and finite volume methods. This means that laws such as conservation of momentum are exactly satisfied, by construction, and allows our machine learning models to focus on what they do best, learning optimal rules for interpolation in complex, high-dimensional spaces.
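To see why a finite volume formulation conserves the total "by construction", consider this small Python sketch for Burgers' equation. It uses the standard Godunov flux, which is a classical hand-designed choice and is swapped in here purely for illustration; in our method the quantities feeding the flux come from a neural network instead. The function names and the periodic grid are assumptions of the sketch.

```python
import math


def godunov_flux(u_left, u_right):
    """Godunov numerical flux for Burgers' equation, where f(u) = u**2 / 2."""
    f = lambda u: 0.5 * u * u
    return max(f(max(u_left, 0.0)), f(min(u_right, 0.0)))


def finite_volume_step(u, dx, dt):
    """Advance cell averages one step; flux[i] sits on the right face of cell i."""
    n = len(u)
    flux = [godunov_flux(u[i], u[(i + 1) % n]) for i in range(n)]
    # Each face flux is added to one cell and subtracted from its neighbour,
    # so the sum over all cells cannot change (up to rounding).
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]


n_cells = 32
dx = 1.0 / n_cells
u = [1.5 + math.sin(2.0 * math.pi * i * dx) for i in range(n_cells)]
total_before = sum(u) * dx
for _ in range(50):
    u = finite_volume_step(u, dx, dt=0.005)
total_after = sum(u) * dx
```

Whatever rule produces the face fluxes, hand-written or learned, the update only moves the conserved quantity between neighbouring cells, so `total_after` matches `total_before` up to floating-point rounding.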

Next Steps
We are focused on scaling up the techniques outlined in our paper to solve larger scale simulation problems with real-world impacts, such as weather and climate prediction. We’re excited about the broad potential of blending machine learning into the complex algorithms of scientific computing.

Thanks to co-authors Yohai Bar-Sinari, Jason Hickey and Michael Brenner; and Google collaborators Peyman Milanfar, Pascal Getreur, Ignacio Garcia Dorado, Dmitrii Kochkov, Jiawei Zhuang and Anton Geraschenko.


Hot Toys 1/6th scale "Avengers: Endgame" Don Cheadle as James Rhodes / Iron Patriot figure

With the blistering success of the epic screenplay by Marvel Studios, Avengers: Endgame has been well received worldwide, and its characters have all gained tremendous popularity, including James Rhodes a.k.a. Iron Patriot. Though the armor gets a few aesthetic changes, his new suit is still highly weaponised, with massive machine guns mounted on arms and shoulders.

In addition to the official unveiling of the Battle Damaged Version of Iron Man Mark LXXXV 1/6th scale collectible figure, Hot Toys is excited to present today the new Iron Patriot 1/6th scale collectible figure from the MMS Diecast Series, inspired by the final chapter of the 22-film MCU series, Avengers: Endgame, for our passionate fans!

The highly-accurate diecast collectible figure is specially crafted based on the appearance of Don Cheadle as James Rhodes/Iron Patriot in Avengers: Endgame. It features two interchangeable head sculpts including a newly developed head sculpt with remarkable likeness and a helmet head with LED light-up function, metallic blue and reddish-orange painted armor with streamlined armor design, LED light-up chest Arc Reactor and repulsors, Iron Patriot’s articulated weapons featuring back-mounted cannons and shoulder-mounted missile launchers, and a specially designed Avengers: Endgame themed figure base!

Scroll down to see all the pictures.
Click on them for bigger and better views.

Hot Toys MMS547D34 1/6th scale Iron Patriot Collectible Figure specially features:
Approximately 32.5 cm tall
Authentic and detailed likeness of Iron Patriot in Avengers: Endgame with over 30 points of articulations
Newly developed head sculpt with authentic likeness of Don Cheadle as James Rhodes in the movie
Movie-accurate facial features with detailed wrinkles and skin texture
Interchangeable helmeted head with LED light-up function (white light, battery operated)
Contains diecast material
Special features on armor:
  Metallic blue, reddish orange and grayish silver colored painting on the sleek and streamline armor design
  6 LED light-up points throughout parts of the armor (white light, battery operated)
  Pair of fully deployable air flaps at back of the armor
  Detachable chest armor to reveal interior mechanical design
  Set of attachable cannon (attachable to forearm or back of figure)
  Two (2) sets of interchangeable forearm cannon (normal and missile firing)
  Two (2) sets of interchangeable forearm armor (normal and missile firing)
  Six (6) pieces of interchangeable hands including: pair of fists, pair of hands with articulated fingers and light-up repulsors (white light, battery operated), pair of repulsor firing hands (white light, battery operated)
Articulations on waist armor which allow flexible movement

Weapons: pair of articulated back-mounted cannons, pair of articulated shoulder-mounted missile launchers

Accessories: specially designed movie-themed figure base with movie logo and character name


Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology

Posted by Narayan Hegde, Software Engineer, Google Health and Carrie J. Cai, Research Scientist, Google Research

Advances in machine learning (ML) have shown great promise for assisting in the work of healthcare professionals, such as aiding the detection of diabetic eye disease and metastatic breast cancer. Though high-performing algorithms are necessary to gain the trust and adoption of clinicians, they are not always sufficient—what information is presented to doctors and how doctors interact with that information can be crucial determinants in the utility that ML technology ultimately has for users.

The medical specialty of anatomic pathology, which is the gold standard for the diagnosis of cancer and many other diseases through microscopic analysis of tissue samples, can greatly benefit from applications of ML. Though diagnosis through pathology is traditionally done on physical microscopes, there has been a growing adoption of “digital pathology,” where high-resolution images of pathology samples can be examined on a computer. With this movement comes the potential to much more easily look up information, as is needed when pathologists tackle the diagnosis of difficult cases or rare diseases, when “general” pathologists approach specialist cases, and when trainee pathologists are learning. In these situations, a common question arises, “What is this feature that I’m seeing?” The traditional solution is for doctors to ask colleagues, or to laboriously browse reference textbooks or online resources, hoping to find an image with similar visual characteristics. The general computer vision solution to problems like this is termed content-based image retrieval (CBIR), one example of which is the “reverse image search” feature in Google Images, in which users can search for similar images by using another image as input.

Today, we are excited to share two research papers describing further progress in human-computer interaction research for similar image search in medicine. In “Similar Image Search for Histopathology: SMILY”, published in Nature Partner Journal (npj) Digital Medicine, we report on our ML-based tool for reverse image search for pathology. In our second paper, “Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making” (available as a preprint), which received an honorable mention at the 2019 ACM CHI Conference on Human Factors in Computing Systems, we explored different modes of refinement for image-based search, and evaluated their effects on doctor interaction with SMILY.

SMILY Design
The first step in developing SMILY was to apply a deep learning model, trained using 5 billion natural, non-pathology images (e.g., dogs, trees, man-made objects, etc.), to compress images into a “summary” numerical vector, called an embedding. The network learned during the training process to distinguish similar images from dissimilar ones by computing and comparing their embeddings. This model is then used to create a database of image patches and their associated embeddings using a corpus of de-identified slides from The Cancer Genome Atlas. When a query image patch is selected in the SMILY tool, the query patch’s embedding is similarly computed and compared with the database to retrieve the image patches with the most similar embeddings.

Schematic of the steps in building the SMILY database and the process by which input image patches are used to perform the similar image search.

The tool allows a user to select a region of interest, and obtain visually-similar matches. We tested SMILY’s ability to retrieve images along a pre-specified axis of similarity (e.g. histologic feature or tumor grade), using images of tissue from the breast, colon, and prostate (3 of the most common cancer sites). We found that SMILY demonstrated promising results despite not being trained specifically on pathology images or using any labeled examples of histologic features or tumor grades.

Example of selecting a small region in a slide and using SMILY to retrieve similar images. SMILY efficiently searches a database of billions of cropped images in a few seconds. Because pathology images can be viewed at different magnifications (zoom levels), SMILY automatically searches images at the same magnification as the input image.
Second example of using SMILY, this time searching for a lobular carcinoma, a specific subtype of breast cancer.
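
The retrieval step described in this section can be illustrated with a minimal sketch. It is not SMILY's implementation: the patch ids, the 3-dimensional embeddings, and the choice of cosine similarity are all assumptions made for illustration (the text above only says that patches with "the most similar embeddings" are returned), and a real system searching billions of patches would use an approximate nearest-neighbor index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, database, magnification, top_k=3):
    """Return ids of the top_k patches most similar to the query.

    `database` is a list of (patch_id, magnification, embedding) tuples.
    Only patches stored at the query's magnification are considered.
    """
    scored = [
        (cosine_similarity(query_embedding, emb), patch_id)
        for patch_id, mag, emb in database
        if mag == magnification
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [patch_id for _, patch_id in scored[:top_k]]

# Toy database: hypothetical patch ids and made-up embeddings.
database = [
    ("patch_a", 10, [1.0, 0.0, 0.0]),
    ("patch_b", 10, [0.8, 0.6, 0.0]),
    ("patch_c", 10, [0.0, 1.0, 0.0]),
    ("patch_d", 40, [1.0, 0.0, 0.0]),  # same embedding, different magnification
]

results = search([1.0, 0.05, 0.0], database, magnification=10, top_k=2)
print(results)  # ['patch_a', 'patch_b']
```

Note that `patch_d` is never returned even though its embedding matches the query closely, because it is stored at a different magnification.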

Refinement tools for SMILY
However, a problem emerged when we observed how pathologists interacted with SMILY. Specifically, users were trying to answer the nebulous question of “What looks similar to this image?” so that they could learn from past cases containing similar images. Yet, there was no way for the tool to understand the intent of the search: Was the user trying to find images that have a similar histologic feature, glandular morphology, overall architecture, or something else? In other words, users needed the ability to guide and refine the search results on a case-by-case basis in order to actually find what they were looking for. Furthermore, we observed that this need for iterative search refinement was rooted in how doctors often perform “iterative diagnosis”—by generating hypotheses, collecting data to test these hypotheses, exploring alternative hypotheses, and revisiting or retesting previous hypotheses in an iterative fashion. It became clear that, for SMILY to meet real user needs, it would need to support a different approach to user interaction.

Through careful human-centered research described in our second paper, we designed and augmented SMILY with a suite of interactive refinement tools that enable end-users to express what similarity means on-the-fly: 1) refine-by-region allows pathologists to crop a region of interest within the image, limiting the search to just that region; 2) refine-by-example gives users the ability to pick a subset of the search results and retrieve more results like those; and 3) refine-by-concept sliders can be used to specify that more or less of a clinical concept be present in the search results (e.g., fused glands). Rather than requiring that these concepts be built into the machine learning model, we instead developed a method that enables end-users to create new concepts post-hoc, customizing the search algorithm towards concepts they find important for each specific use case. This enables new explorations via post-hoc tools after a machine learning model has already been trained, without needing to re-train the original model for each concept or application of interest.
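
To make the last two refinement tools more concrete, here is one plausible way they could operate on embeddings. This is a sketch under stated assumptions, not the method from the paper: it assumes refine-by-example re-queries with the centroid of the selected results, and that a concept is represented as a direction in embedding space estimated from a handful of user-chosen examples with and without the concept. All function names and the toy vectors are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def refine_by_example(selected_embeddings):
    """Assumed approach: the new query is the centroid of the selected results."""
    return mean_vector(selected_embeddings)

def concept_direction(with_concept, without_concept):
    """Assumed approach: direction from examples lacking the concept
    toward examples showing it (difference of the two centroids)."""
    pos = mean_vector(with_concept)
    neg = mean_vector(without_concept)
    return [p - n for p, n in zip(pos, neg)]

def refine_by_concept(query_embedding, direction, slider):
    """Shift the query along a concept direction.

    A positive slider asks for more of the concept, a negative one for less.
    """
    return [q + slider * d for q, d in zip(query_embedding, direction)]

# A user builds a concept after training, from a few chosen patches.
direction = concept_direction(
    with_concept=[[0.0, 1.0], [0.2, 0.8]],
    without_concept=[[1.0, 0.0], [0.8, 0.2]],
)
refined_query = refine_by_concept([0.5, 0.5], direction, slider=0.5)
centroid_query = refine_by_example([[1.0, 0.0], [0.0, 1.0]])
```

The point of the sketch is that both operations only modify the query vector, so the embedding model itself never needs to be retrained when a user defines a new concept.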

Through our user study with pathologists, we found that the tool-based SMILY not only increased the clinical usefulness of search results, but also significantly increased users’ trust and likelihood of adoption, compared to a conventional version of SMILY without these tools. Interestingly, these refinement tools appeared to have supported pathologists’ decision-making process in ways beyond simply performing better on similarity searches. For example, pathologists used the observed changes to their results from iterative searches as a means of progressively tracking the likelihood of a hypothesis. When search results were surprising, many re-purposed the tools to test and understand the underlying algorithm, for example, by cropping out regions they thought were interfering with the search or by adjusting the concept sliders to increase the presence of concepts they suspected were being ignored. Beyond being passive recipients of ML results, doctors were empowered with the agency to actively test hypotheses and apply their expert domain knowledge, while simultaneously leveraging the benefits of automation.

With these interactive tools enabling users to tailor each search experience to their desired intent, we are excited for SMILY’s potential to assist with searching large databases of digitized pathology images. One potential application of this technology is to index textbooks of pathology images with descriptive captions, and enable medical students or pathologists in training to search these textbooks using visual search, speeding up the educational process. Another application is for cancer researchers interested in studying the correlation of tumor morphologies with patient outcomes, to accelerate the search for similar cases. Finally, pathologists may be able to leverage tools like SMILY to locate all occurrences of a feature (e.g. signs of active cell division, or mitosis) in the same patient’s tissue sample to better understand the severity of the disease to inform cancer therapy decisions. Importantly, our findings add to the body of evidence that sophisticated machine learning algorithms need to be paired with human-centered design and interactive tooling in order to be most useful.

This work would not have been possible without Jason D. Hipp, Yun Liu, Emily Reif, Daniel Smilkov, Michael Terry, Craig H. Mermel, Martin C. Stumpe and members of Google Health and PAIR. Preprints of the two papers are available here and here.


Parrotron: New Research into Improving Verbal Communication for People with Speech Impairments

Posted by Fadi Biadsy, Research Scientist and Ron Weiss, Software Engineer, Google Research

Most people take for granted that when they speak, they will be heard and understood. But for the millions who live with speech impairments caused by physical or neurological conditions, trying to communicate with others can be difficult and lead to frustration. While there have been a great number of recent advances in automatic speech recognition (ASR; a.k.a. speech-to-text) technologies, these interfaces can be inaccessible for those with speech impairments. Further, applications that rely on speech recognition as input for text-to-speech synthesis (TTS) can exhibit word substitution, deletion, and insertion errors. Critically, in today’s technological environment, limited access to speech interfaces, such as digital assistants that depend on directly understanding one’s speech, means being excluded from state-of-the-art tools and experiences, widening the gap between what those with and without speech impairments can access.

Project Euphonia has demonstrated that speech recognition models can be significantly improved to better transcribe a variety of atypical and dysarthric speech. Today, we are presenting Parrotron, an ongoing research project that continues and extends our effort to build speech technologies to help those with impaired or atypical speech to be understood by both people and devices. Parrotron consists of a single end-to-end deep neural network trained to convert speech from a speaker with atypical speech patterns directly into fluent synthesized speech, without an intermediate step of generating text—skipping speech recognition altogether. Parrotron’s approach is speech-centric, looking at the problem only from the point of view of speech signals—e.g., without visual cues such as lip movements. Through this work, we show that Parrotron can help people with a variety of atypical speech patterns—including those with ALS, deafness, and muscular dystrophy—to be better understood in both human-to-human interactions and by ASR engines.

The Parrotron Speech Conversion Model
Parrotron is an attention-based sequence-to-sequence model trained in two phases using parallel corpora of input/output speech pairs. First, we build a general speech-to-speech conversion model for standard fluent speech, followed by a personalization phase that adjusts the model parameters to the atypical speech patterns from the target speaker. The primary challenge in such a configuration lies in the collection of the parallel training data needed for supervised training, which consists of utterances spoken by many speakers and mapped to the same output speech content spoken by a single speaker. Since it is impractical to have a single speaker record the many hours of training data needed to build a high quality model, Parrotron uses parallel data automatically derived with a TTS system. This allows us to make use of a pre-existing anonymized, transcribed speech recognition corpus to obtain training targets.

The first training phase uses a corpus of ~30,000 hours that consists of millions of anonymized utterance pairs. Each pair includes a natural utterance paired with an automatically synthesized speech utterance that results from running our state-of-the-art Parallel WaveNet TTS system on the transcript of the first. This dataset includes utterances from thousands of speakers spanning hundreds of dialects/accents and acoustic conditions, allowing us to model a large variety of voices, linguistic and non-linguistic contents, accents, and noise conditions with “typical” speech all in the same language. The resulting conversion model projects away all non-linguistic information, including speaker characteristics, and retains only what is being said, not who, where, or how it is said. This base model is used to seed the second personalization phase of training.
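
As a rough illustration of how such a parallel corpus can be assembled, the sketch below pairs each natural utterance with audio synthesized from its transcript. The `synthesize` function is a hypothetical stand-in that returns a fake waveform; it is not the Parallel WaveNet system, and the data layout is an assumption.

```python
def synthesize(transcript):
    """Hypothetical stand-in for a TTS system.

    Returns a fake 'waveform' that depends only on the transcript,
    mimicking a single canonical output voice.
    """
    return [float(ord(ch)) for ch in transcript]

def build_parallel_corpus(transcribed_utterances):
    """Pair each natural utterance with speech synthesized from its transcript.

    `transcribed_utterances` is a list of (audio, transcript) tuples taken
    from an existing transcribed speech recognition corpus.
    """
    return [
        {"input_audio": audio, "target_audio": synthesize(transcript)}
        for audio, transcript in transcribed_utterances
    ]

# Two different (made-up) speakers saying the same sentence.
corpus = build_parallel_corpus([
    ([0.1, 0.3, 0.2], "turn on the lights"),
    ([0.4, 0.1, 0.9], "turn on the lights"),
])
```

Both entries share the same target audio even though the inputs differ, which is what pushes the trained model to keep what is said and discard who said it.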

The second training phase utilizes a corpus of utterance pairs generated in the same manner as the first dataset. In this case, however, the corpus is used to adapt the network to the acoustic/phonetic, phonotactic and language patterns specific to the input speaker, which might include, for example, learning how the target speaker alters, substitutes, and reduces or removes certain vowels or consonants. To model ALS speech characteristics in general, we use utterances taken from an ALS speech corpus derived from Project Euphonia. If instead we want to personalize the model for a particular speaker, then the utterances are contributed by that person. The larger this corpus is, the better the model is likely to be at correctly converting to fluent speech. Using this second smaller and personalized parallel corpus, we run the neural-training algorithm, updating the parameters of the pre-trained base model to generate the final personalized model.

We found that training the model with a multitask objective to predict the target phonemes while simultaneously generating spectrograms of the target speech led to significant quality improvements. Such a multitask trained encoder can be thought of as learning a latent representation of the input that maintains information about the underlying linguistic content.
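
A multitask objective of this kind is commonly written as a weighted sum of the two losses. The sketch below assumes a mean-absolute-error spectrogram loss, a cross-entropy phoneme loss, and a `phoneme_weight` hyperparameter; the text above does not specify the exact loss terms or weighting, so treat these as illustrative choices.

```python
import math

def spectrogram_loss(predicted, target):
    """Mean absolute error over all spectrogram frames and frequency bins."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    return total / count

def phoneme_loss(predicted_probs, target_ids):
    """Mean cross-entropy of the target phoneme under each predicted distribution."""
    log_likelihood = sum(
        math.log(probs[target]) for probs, target in zip(predicted_probs, target_ids)
    )
    return -log_likelihood / len(target_ids)

def multitask_loss(pred_spec, target_spec, pred_phonemes, target_phonemes,
                   phoneme_weight=1.0):
    """Weighted sum of the spectrogram loss and the auxiliary phoneme loss."""
    return (spectrogram_loss(pred_spec, target_spec)
            + phoneme_weight * phoneme_loss(pred_phonemes, target_phonemes))

# One frame with two bins, and one phoneme step over a two-phoneme inventory.
loss = multitask_loss(
    pred_spec=[[1.0, 2.0]], target_spec=[[1.5, 2.0]],
    pred_phonemes=[[0.5, 0.5]], target_phonemes=[0],
)
```

Because both terms are computed from the same encoder output, gradients from the phoneme term encourage the encoder to retain linguistic content.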

Overview of the Parrotron model architecture. An input speech spectrogram is passed through encoder and decoder neural networks to generate an output spectrogram in a new voice.

Case Studies
To demonstrate a proof of concept, we worked with our fellow Google research scientist and mathematician Dimitri Kanevsky, who was born in Russia to Russian speaking, normal-hearing parents but has been profoundly deaf from a very young age. He learned to speak English as a teenager, by using Russian phonetic representations of English words, learning to pronounce English using transliteration into Russian (e.g., The quick brown fox jumps over the lazy dog => ЗИ КВИК БРАУН ДОГ ЖАМПС ОУВЕР ЛАЙЗИ ДОГ). As a result, Dimitri’s speech is substantially distinct from that of native English speakers, and can be challenging to comprehend for systems or listeners who are not accustomed to it.

Dimitri recorded a corpus of 15 hours of speech, which was used to adapt the base model to the nuances specific to his speech. The resulting Parrotron system helped him be better understood by both people and Google’s ASR system. Running Google’s ASR engine on the output of Parrotron significantly reduced the word error rate from 89% to 32% on a held-out test set from Dimitri. Below is an example of Parrotron’s successful conversion of input speech from Dimitri:

Dimitri saying, “How far is the Moon from the Earth?”
Parrotron (male voice) saying, “How far are the Moon from the Earth?”
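
For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of the standard computation, not the evaluation code used in this work; the lowercasing and whitespace tokenization are simplifying assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate(
    "How far is the Moon from the Earth",
    "How far are the Moon from the Earth",
)
print(wer)  # 0.125
```

In this example a single substitution ("is" to "are") in an eight-word reference gives a word error rate of 12.5%.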

We also worked with Aubrie Lee, a Googler and advocate for disability inclusion, who has muscular dystrophy, a condition that causes progressive muscle weakness, and sometimes impacts speech production. Aubrie contributed 1.5 hours of speech, which has been instrumental in showing promising outcomes of the applicability of this speech-to-speech technology. Below is an example of Parrotron’s successful conversion of input speech from Aubrie:

Aubrie saying, “Is morning glory a perennial plant?”
Parrotron (female voice) saying, “Is morning glory a perennial plant?”
Aubrie saying, “Schedule a meeting with John on Friday.”
Parrotron (female voice) saying, “Schedule a meeting with John on Friday.”

We also tested Parrotron’s performance on speech from speakers with ALS by adapting the pretrained model on multiple speakers who share similar speech characteristics grouped together, rather than on a single speaker. We conducted a preliminary listening study and observed an increase in intelligibility when comparing natural ALS speech to the corresponding speech obtained from running the Parrotron model, for the majority of our test speakers.

Cascaded Approach
Project Euphonia has built a personalized speech-to-text model that has reduced the word error rate for a deaf speaker from 89% to 25%, and ongoing research is also likely to improve upon these results. One could use such a speech-to-text model to achieve a goal similar to Parrotron’s by simply passing its output into a TTS system to synthesize speech from the result. In such a cascaded approach, however, the recognizer may choose an incorrect word (roughly 1 out of 4 times, in this case); that is, it may yield words or sentences with unintended meaning and, as a result, the synthesized audio of these words would be far from the speaker’s intention. Given the end-to-end speech-to-speech training objective function of Parrotron, even when errors are made, the generated output speech is likely to sound acoustically similar to the input speech. The speaker’s original intention is therefore less likely to be significantly altered, and it is often still possible to understand what is intended:

Dimitri saying, “What is definition of rhythm?”
Parrotron (male voice) saying, “What is definition of rhythm?”
Dimitri saying, “How many ounces in one liter?”
Parrotron (male voice) saying, “Hey Google, How many unces [sic] in one liter?”
Google Assistant saying, “One liter is equal to thirty-three point eight one four US fluid ounces.”
Aubrie saying, “Is it wheelchair accessible?”
Parrotron (female voice) saying, “Is it wheelchair accecable [sic]?”

Furthermore, since Parrotron is not strongly biased to producing words from a predefined vocabulary set, input to the model may contain completely new invented words, foreign words/names, and even nonsense words. We observe that feeding Arabic and Spanish utterances into the US-English Parrotron model often results in output which echoes the original speech content with an American accent, in the target voice. Such behavior is qualitatively different from what one would obtain by simply running an ASR followed by a TTS. Finally, by going from a combination of independently tuned neural networks to a single one, we also believe there are improvements and simplifications that could be substantial.

Parrotron makes it easier for users with atypical speech to talk to and be understood by other people and by speech interfaces, with its end-to-end speech conversion approach more likely to reproduce the user’s intended speech. More exciting applications of Parrotron are discussed in our paper, and additional audio samples can be found on our GitHub repository. If you would like to participate in this ongoing research, please fill out this short form and volunteer to record a set of phrases. We look forward to working with you!

This project was joint work between the Speech and Google Brain teams. Contributors include Fadi Biadsy, Ron Weiss, Pedro Moreno, Dimitri Kanevsky, Ye Jia, Suzan Schwartz, Landis Baker, Zelin Wu, Johan Schalkwyk, Yonghui Wu, Zhifeng Chen, Patrick Nguyen, Aubrie Lee, Andrew Rosenberg, Bhuvana Ramabhadran, Jason Pelecanos, Julie Cattiau, Michael Brenner, Dotan Emanuel, Joel Shor, Sean Lee and Benjamin Schroeder. Our data collection efforts have been vastly accelerated by our collaborations with ALS-TDI.
