Google’s Next Generation Music Recognition

Posted by James Lyon, Google AI, Zürich

In 2017 we launched Now Playing on the Pixel 2, using deep neural networks to bring low-power, always-on music recognition to mobile devices. In developing Now Playing, our goal was to create a small, efficient music recognizer which requires a very small fingerprint for each track in the database, allowing music recognition to be run entirely on-device without an internet connection. As it turns out, Now Playing was not only useful for an on-device music recognizer, but also greatly exceeded the accuracy and efficiency of our then-current server-side system, Sound Search, which was built before the widespread use of deep neural networks. Naturally, we wondered if we could bring the same technology that powers Now Playing to the server-side Sound Search, with the goal of making Google’s music recognition capabilities the best in the world.

Recently, we introduced a new version of Sound Search that is powered by some of the same technology used by Now Playing. You can use it through the Google Search app or the Google Assistant on any Android phone. Just start a voice query, and if there’s music playing near you, a “What’s this song?” suggestion will pop up for you to press. Otherwise, you can just ask, “Hey Google, what’s this song?” With this latest version of Sound Search, you’ll get faster, more accurate results than ever before!

Now Playing versus Sound Search
Now Playing miniaturized music recognition technology such that it was small and efficient enough to be run continuously on a mobile device without noticeable battery impact. To do this we developed an entirely new system using convolutional neural networks to turn a few seconds of audio into a unique “fingerprint.” This fingerprint is then compared against an on-device database holding tens of thousands of songs, which is regularly updated to add newly released tracks and remove those that are no longer popular. In contrast, the server-side Sound Search system is very different, having to match against ~1000x as many songs as Now Playing. Making Sound Search both faster and more accurate with a substantially larger musical library presented several unique challenges. But before we go into that, a few details on how Now Playing works.

The Core Matching Process of Now Playing
Now Playing generates the musical “fingerprint” by projecting the musical features of an eight-second portion of audio into a sequence of low-dimensional embeddings, one for each of seven two-second clips taken at 1-second intervals, so that consecutive clips overlap by one second (0–2 s, 1–3 s, and so on up to 6–8 s).
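To make the segmentation concrete, here is a minimal sketch that only computes the clip boundaries for a buffer. The function name `clip_windows` and its parameters are illustrative and do not come from Now Playing itself.

```python
def clip_windows(buffer_seconds, clip_seconds, hop_seconds):
    """Return (start, end) times of every full clip that fits in the buffer."""
    # Number of hops that still leave room for a whole clip, plus the first clip.
    count = int((buffer_seconds - clip_seconds) / hop_seconds + 1e-9) + 1
    return [(i * hop_seconds, i * hop_seconds + clip_seconds) for i in range(count)]


windows = clip_windows(8.0, 2.0, 1.0)
print(len(windows))              # 7
print(windows[0], windows[-1])   # (0.0, 2.0) (6.0, 8.0)
```

Each of these windows would then be passed through the fingerprinting network to produce one embedding.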

Now Playing then searches the on-device song database, which was generated by processing popular music with the same neural network, for similar embedding sequences. The database search uses a two-phase algorithm to identify matching songs: the first phase uses a fast but inaccurate algorithm that searches the whole song database to find a few likely candidates, and the second phase does a detailed analysis of each candidate to work out which song, if any, is the right one.

  • Matching, phase 1: Finding good candidates: For every embedding, Now Playing performs a nearest neighbor search on the on-device database of songs for similar embeddings. The database uses a hybrid of spatial partitioning and vector quantization to efficiently search through millions of embedding vectors. Because the audio buffer is noisy, this search is approximate, and not every embedding will find a nearby match in the database for the correct song. However, over the whole clip, the chances of finding several nearby embeddings for the correct song are very high, so the search is narrowed to a small set of songs which got multiple hits.
  • Matching, phase 2: Final matching: Because the database search used above is approximate, Now Playing may not find song embeddings which are nearby to some embeddings in our query. Therefore, in order to calculate an accurate similarity score, Now Playing retrieves all embeddings for each song in the database which might be relevant to fill in the “gaps”. Then, given the sequence of embeddings from the audio buffer and another sequence of embeddings from a song in the on-device database, Now Playing estimates their similarity pairwise and adds up the estimates to get the final matching score.
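The two phases can be sketched in a few lines of plain Python. This is a toy illustration under several assumptions, not the production algorithm: a brute-force scan stands in for the spatial partitioning and vector quantization index, negative Euclidean distance stands in for the similarity estimate, and the names `find_candidates`, `match_score`, `identify`, `radius`, `min_hits` and `threshold` are all hypothetical.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two embeddings (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_candidates(query, database, radius, min_hits):
    """Phase 1: keep songs that are near at least `min_hits` query embeddings."""
    hits = Counter()
    for q in query:
        for song, embeddings in database.items():
            if any(distance(q, e) <= radius for e in embeddings):
                hits[song] += 1
    return [song for song, n in hits.items() if n >= min_hits]


def match_score(query, song_embeddings):
    """Phase 2: sum pairwise similarities at the best alignment of the sequences."""
    best = float("-inf")
    for offset in range(len(song_embeddings) - len(query) + 1):
        score = sum(-distance(q, song_embeddings[offset + i])
                    for i, q in enumerate(query))
        best = max(best, score)
    return best


def identify(query, database, radius=0.5, min_hits=2, threshold=-1.0):
    """Run both phases and return the best song, or None if nothing is convincing."""
    scored = [(match_score(query, database[song]), song)
              for song in find_candidates(query, database, radius, min_hits)]
    if not scored:
        return None
    score, song = max(scored)
    return song if score >= threshold else None


database = {
    "song_a": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    "song_b": [(0.0, 5.0), (1.0, 5.0), (2.0, 5.1), (3.0, 5.0)],
}
query = [(1.1, 0.1), (2.0, -0.1), (2.9, 0.0)]  # noisy excerpt of song_a
print(identify(query, database))  # song_a
```

Even in this toy version, the candidate filter relies on several query embeddings agreeing on the same song, which is why a single noisy embedding is not enough to produce a match.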

It’s critical to the accuracy of Now Playing to use a sequence of embeddings rather than a single embedding. The fingerprinting neural network is not accurate enough to allow identification of a song from a single embedding alone — each embedding will generate a lot of false positive results. However, combining the results from multiple embeddings allows the false positives to be easily removed, as the correct song will be a match to every embedding, while false positive matches will only be close to one or two embeddings from the input audio.

Scaling up Now Playing for the Sound Search server
So far, we’ve gone into some detail of how Now Playing matches songs to an on-device database. The biggest challenge in going from Now Playing, with tens of thousands of songs, to Sound Search, with tens of millions, is that there are a thousand times as many songs which could give a false positive result. To compensate for this without any other changes, we would have to increase the recognition threshold, which would mean needing more audio to get a confirmed match. However, the goal of the new Sound Search server was to be able to match faster, not slower, than Now Playing, so we didn’t want people to wait 10+ seconds for a result.

As Sound Search is a server-side system, it isn’t limited by processing and storage constraints in the same way Now Playing is. Therefore, we made two major changes to how we do fingerprinting, both of which increased accuracy at the expense of server resources:

  • We quadrupled the size of the neural network used, and increased each embedding from 96 to 128 dimensions, which reduces the amount of work the neural network has to do to pack the high-dimensional input audio into a low-dimensional embedding. This is critical in improving the quality of phase two, which is very dependent on the accuracy of the raw neural network output.
  • We doubled the density of our embeddings — it turns out that fingerprinting audio every 0.5s instead of every 1s doesn’t reduce the quality of the individual embeddings very much, and gives us a huge boost by doubling the number of embeddings we can use for the match.

We also decided to weight our index based on song popularity – in effect, for popular songs, we lower the matching threshold, and we raise it for obscure songs. Overall, this means that we can keep adding more (obscure) songs almost indefinitely to our database without slowing our recognition speed too much.
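One way to picture popularity weighting is a per-song threshold that shifts with popularity. The sketch below is purely illustrative: the logarithmic form, the `strength` parameter, and the idea of a relative popularity value where 1.0 means "average" are assumptions, since the actual weighting is not described here.

```python
import math


def matching_threshold(base_threshold, popularity, strength=0.1):
    """Lower the required score for popular songs and raise it for obscure ones.

    `popularity` is a hypothetical relative play count (1.0 = average song).
    """
    return base_threshold - strength * math.log(popularity)


def accept(score, base_threshold, popularity):
    """Accept a match only if its score clears the song's own threshold."""
    return score >= matching_threshold(base_threshold, popularity)


print(accept(0.8, 1.0, 100.0))  # True: a popular song needs a lower score
print(accept(0.8, 1.0, 0.01))   # False: an obscure song needs a higher score
```

With a rule of this shape, adding many obscure songs mostly adds entries with high thresholds, so they rarely produce false positives against short queries.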

With Now Playing, we originally set out to use machine learning to create a robust audio fingerprint compact enough to run entirely on a phone. It turned out that we had, in fact, created a very good all-round audio fingerprinting system, and the ideas developed there carried over very well to the server-side Sound Search system, even though the challenges of Sound Search are quite different.

We still think there’s room for improvement, though: we don’t always match when music is very quiet or in very noisy environments, and we believe we can make the system even faster. We are continuing to work on these challenges with the goal of providing the next generation in music recognition. We hope you’ll try it the next time you want to find out what song is playing! You can also put a shortcut on your home screen.

We would like to thank Micha Riser, Mihajlo Velimirovic, Marvin Ritter, Ruiqi Guo, Sanjiv Kumar, Stephen Wu, Diego Melendo Casado‎, Katia Naliuka, Jason Sanders, Beat Gfeller, Julian Odell, Christian Frank, Dominik Roblek, Matt Sharifi and Blaise Aguera y Arcas‎.

Continue reading

Posted in Uncategorized

Introducing the Unrestricted Adversarial Examples Challenge

Posted by Tom B. Brown and Catherine Olsson, Research Engineers, Google Brain Team

Machine learning is being deployed in more and more real-world applications, including medicine, chemistry and agriculture. When it comes to deploying machine learning in safety-critical contexts, significant challenges remain. In particular, all known machine learning algorithms are vulnerable to adversarial examples — inputs that an attacker has intentionally designed to cause the model to make a mistake. While previous research on adversarial examples has mostly focused on investigating mistakes caused by small modifications in order to develop improved models, real-world adversarial agents are often not subject to the “small modification” constraint. Furthermore, machine learning algorithms can often make confident errors when faced with an adversary, which makes the development of classifiers that don’t make any confident mistakes, even in the presence of an adversary which can submit arbitrary inputs to try to fool the system, an important open problem.

Today we’re announcing the Unrestricted Adversarial Examples Challenge, a community-based challenge to incentivize and measure progress towards the goal of zero confident classification errors in machine learning models. While previous research has focused on adversarial examples that are restricted to small changes to pre-labeled data points (allowing researchers to assume the image should have the same label after a small perturbation), this challenge allows unrestricted inputs, allowing participants to submit arbitrary images from the target classes to develop and test models on a wider variety of adversarial examples.

Adversarial examples can be generated through a variety of means, including by making small modifications to the input pixels, but also using spatial transformations, or simple guess-and-check to find misclassified inputs.

Structure of the Challenge
Participants can submit entries in one of two roles: as a defender, by submitting a classifier which has been designed to be difficult to fool, or as an attacker, by submitting arbitrary inputs to try to fool the defenders’ models. In a “warm-up” period before the challenge, we will present a set of fixed attacks for participants to design networks to defend against. After the community can conclusively beat those fixed attacks, we will launch the full two-sided challenge with prizes for both attacks and defenses.

For the purposes of this challenge, we have created a simple “bird-or-bicycle” classification task, where a classifier must answer the following: “Is this an unambiguous picture of a bird, a bicycle, or is it ambiguous / not obvious?” We selected this task because telling birds and bicycles apart is very easy for humans, but all known machine learning techniques struggle at the task when in the presence of an adversary.

The defender’s goal is to correctly label a clean test set of birds and bicycles with high accuracy, while also making no confident errors on any attacker-provided bird or bicycle image. The attacker’s goal is to find an image of a bird that the defending classifier confidently labels as a bicycle (or vice versa). We want to make the challenge as easy as possible for the defenders, so we discard all images that are ambiguous (such as a bird riding a bicycle) or not obvious (such as an aerial view of a park, or random noise).

Examples of ambiguous and unambiguous images. Defenders must make no confident mistakes on unambiguous bird or bicycle images. We discard all images that humans find ambiguous or not obvious. All images under CC licenses 1, 2, 3, 4.

Attackers may submit absolutely any image of a bird or a bicycle in an attempt to fool the defending classifier. For example, an attacker could take photographs of birds, use 3D rendering software, make image composites using image editing software, produce novel bird images with a generative model, or any other technique.

In order to validate new attacker-provided images, we ask an ensemble of humans to label the image. This procedure lets us allow attackers to submit arbitrary images, not just test set images modified in small ways. If the defending classifier confidently classifies as “bird” any attacker-provided image which the human labelers unanimously labeled as a bicycle, the defending model has been broken. You can learn more details about the structure of the challenge in our paper.
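The rule for when a defense counts as broken can be written as a small predicate. This is a sketch of the rule as described above, not the challenge’s evaluation code: the function name `defender_is_broken` and the `confidence_threshold` value of 0.8 are hypothetical.

```python
def defender_is_broken(predicted_label, confidence, human_labels,
                       confidence_threshold=0.8):
    """Return True if the model makes a confident mistake on an unambiguous image."""
    # Only images that every human labeler assigns to the same target class count.
    unanimous = (len(set(human_labels)) == 1
                 and human_labels[0] in ("bird", "bicycle"))
    if not unanimous:
        return False  # ambiguous or not-obvious images are discarded
    if confidence < confidence_threshold:
        return False  # an unconfident answer is not a confident error
    return predicted_label != human_labels[0]


print(defender_is_broken("bird", 0.95, ["bicycle", "bicycle", "bicycle"]))  # True
print(defender_is_broken("bird", 0.95, ["bicycle", "bird", "bicycle"]))     # False
print(defender_is_broken("bird", 0.40, ["bicycle", "bicycle", "bicycle"]))  # False
```

A defender can therefore stay unbroken by being unsure, but it must still label the clean test set with high accuracy, so it cannot simply be unsure about everything.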

How to Participate
If you’re interested in participating, guidelines for getting started can be found on the project on GitHub. We’ve already released our dataset, the evaluation pipeline, and baseline attacks for the warm-up, and we’ll be keeping an up-to-date leaderboard with the best defenses from the community. We look forward to your entries!

The team behind the Unrestricted Adversarial Examples Challenge includes Tom Brown, Catherine Olsson, Nicholas Carlini, Chiyuan Zhang, and Ian Goodfellow from Google, and Paul Christiano from OpenAI.


Check out HH Model 1/6th scale Imperial Roman Army 12-inch action figure preview pictures

The Roman army (Latin: exercitus Romanus) was the terrestrial armed forces deployed by the Romans throughout the duration of Ancient Rome, from the Roman Kingdom (to c. 500 BC) to the Roman Republic (500–31 BC) and the Roman Empire (31 BC – 395), and its medieval continuation the Eastern Roman Empire.

HH Model 1/6th scale Imperial Roman Army 12-inch figure features: 1/6th scale head sculpt, 12-inch figure body, six (6) Pieces of Interchangeable Palms, Red jacket, Brown pants, Red cloak, Red scarf, Brown wristband, Sundries bag, Marching backpack, blanket, belt, Shoes, helmet (metal), armor (metal), pauldrons (metal), Arm guard (metal), Long sword (metal), dagger (metal), Shield, Javelin (metal), kettle (metal), pot (metal), ladle (metal), ax (metal), figure display stand.

Scroll down to see all the pictures.
Click on them for bigger and better views.

Related posts:
Review of ACI Toys “Total Rome!” 1/6th scale Roman Legionary Optio 12-inch action figure posted on my toy blog HERE and HERE and HERE
ACI Toys Warriors III 1/6th scale Roman General Maximus Decimus Meridius 12-inch Figure reviewed HERE


Recycling as a hot topic at the Aluminium – World Trade Fair and Conference in Dusseldorf

Nine years ago I wrote the first post about “The good and the bad about recycled aluminium”. Since then recycled aluminium has become a hot topic, and this year’s Aluminium – World Trade Fair in Dusseldorf, Aluminium 2018, has dedicated a lot of energy to this subject.

One of the articles in their newsroom is about recycled aluminium: a short, informative note about the challenges, which mentions that there are no qualitative differences between aluminium alloys made from primary aluminium and those made from recycled aluminium.

Is this always true? Not really, especially when looking at the results obtained when anodizing aluminium.

Here you have to be aware of the following:

  • Heavy metals
  • Metallurgical structure
  • Traceability
  • Repeatability/Consistency
  • Consistent recycled stock

So it should not be difficult to use more recycled aluminium for anodizing. It only requires a minimal amount of adjustment to arrive at the present alloy composition, which works well for anodizing, so this should not be a reason to avoid recycled aluminium when anodizing.

From an environmental viewpoint, anodizing is a unique process. It does not require organic solvents, which may cause unwanted atmospheric emissions, and the amount of sludge can be reduced by using newer processes such as acid etching.

Finally anodized extrusions and castings can be readily recycled without the need for special emission control equipment, so no VOC or other hazardous chemicals are emitted to the air.

If you find this article useful and would like to know more, please contact me at [email protected]


Soldier Story 1/6th scale U.S. Army 28th Infantry Division Ardennes 1944 12-inch action figure

The Battle of the Bulge (16 December 1944 – 25 January 1945) was the last major German offensive campaign on the Western Front during World War II. It was launched through the densely forested Ardennes region of Wallonia in eastern Belgium, northeast France, and Luxembourg, towards the end of World War II.

The surprise attack caught the Allied forces completely off guard. American forces bore the brunt of the attack and incurred their highest casualties of any operation during the war. The battle also severely depleted Germany’s armored forces, and they were largely unable to replace them. Between 63,222 and 98,000 German soldiers were killed, missing, wounded in action, or captured. For the Americans, out of a peak of 610,000 troops, 89,000 became casualties out of which some 19,000 were killed. The “Bulge” was the largest and bloodiest single battle fought by the United States in World War II and the second bloodiest battle in American history.

Soldier Story SS-111 1/6th scale U.S. Army 28th Infantry Division Ardennes 1944 12-inch figure features: WWII US Army soldier life-like head sculpt, S2.5 BODY, Weapon bare Hand (1 Pair), Bendable bare Hand (1 Pair), Bare Feet (1 Pair). HEADGEAR – M-1 combat helmet (metal), M-1 combat helmet liner, Snow white helmet cover, Snow white face mask. UNIFORM – GI wool shirt, Brown T-shirt, M-41 field jacket (with 28th Infantry div. Patch), GI wool pants, Snow suit smock, Snow white mittens (1 pair), Wool mitten liners (1 pair), Wool scarf, Snow pants, M1944 high neck wool sweater (with 28th Infantry div. Patch), GI combat boots (sewing leather)

Scroll down to see the rest of the pictures.
Click on them for bigger and better views.

FIELD GEAR – M-1936 suspenders, M-1936 pistol belt, M-1910 canteen cover, M-1910 canteen (metal), M-1910 canteen cup (metal), M-1942 first aid pouch, M-43 entrenching tool cover, M-43 tool entrenching tool shovel (metal), M-1943 wire cutter pouch, M-1938 wire cutter, M-1923 .45 magazine pouch, TL-122 flashlight torch, M-2 grenade

WEAPON – M1919A6 Browning .30 machine gun (metal), .30 M1A1 ammo box (metal), CNC metal bullet / metal bullet link, M1911 .45 pistol, M1911 7rd magazine x 3, M-1916 pistol holster

Accessories: YANK Army weekly magazine, Hershey’s chocolate, Camel cigarette pack, Cigarette x 3, Ardennes 1944 exclusive figure stand


The What-If Tool: Code-Free Probing of Machine Learning Models

Posted by James Wexler, Software Engineer, Google AI

Building effective machine learning (ML) systems means asking a lot of questions. It’s not enough to train a model and walk away. Instead, good practitioners act as detectives, probing to understand their model better: How would changes to a datapoint affect my model’s prediction? Does it perform differently for various groups–for example, historically marginalized people? How diverse is the dataset I am testing my model on?

Answering these kinds of questions isn’t easy. Probing “what if” scenarios often means writing custom, one-off code to analyze a specific model. Not only is this process inefficient, it makes it hard for non-programmers to participate in the process of shaping and improving ML models. One focus of the Google AI PAIR initiative is making it easier for a broad set of people to examine, evaluate, and debug ML systems.

Today, we are launching the What-If Tool, a new feature of the open-source TensorBoard web application, which lets users analyze an ML model without writing code. Given pointers to a TensorFlow model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.

The What-If Tool, showing a set of 250 face pictures and their results from a model that detects smiles.

The What-If Tool has a large set of features, including visualizing your dataset automatically using Facets, the ability to manually edit examples from your dataset and see the effect of those changes, and automatic generation of partial dependence plots which show how the model’s predictions change as any single feature is changed. Let’s explore two features in more detail.

Exploring what-if scenarios on a datapoint.

With a click of a button you can compare a datapoint to the most similar point where your model predicts a different result. We call such points “counterfactuals,” and they can shed light on the decision boundaries of your model. Or, you can edit a datapoint by hand and explore how the model’s prediction changes. In the screenshot below, the tool is being used on a binary classification model that predicts whether a person earns more than $50k based on public census data from the UCI census dataset. This is a benchmark prediction task used by ML researchers, especially when analyzing algorithmic fairness — a topic we’ll get to soon. In this case, for the selected datapoint, the model predicted with 73% confidence that the person earns more than $50k. The tool has automatically located the most-similar person in the dataset for which the model predicted earnings of less than $50k and compares the two side-by-side. In this case, with just a minor difference in age and an occupation change, the model’s prediction has flipped.

Comparing counterfactuals.
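The counterfactual search described above amounts to a nearest-neighbor lookup restricted to points with a different prediction. The sketch below shows the idea on numeric features only, using an unnormalized L1 distance; the function name, the distance choice, and the tiny two-column dataset (age, hours per week) are assumptions made for illustration and are not taken from the tool.

```python
def nearest_counterfactual(index, features, predictions):
    """Return the index of the closest datapoint whose predicted label differs."""
    target = features[index]
    best_index, best_distance = None, float("inf")
    for i, (row, label) in enumerate(zip(features, predictions)):
        if label == predictions[index]:
            continue  # same prediction, so not a counterfactual
        d = sum(abs(a - b) for a, b in zip(target, row))  # L1 distance
        if d < best_distance:
            best_index, best_distance = i, d
    return best_index


features = [(39, 40), (41, 40), (25, 20), (60, 45)]  # hypothetical (age, hours/week)
predictions = [1, 0, 0, 1]                           # 1 = predicted to earn > $50k
print(nearest_counterfactual(0, features, predictions))  # 1
```

In practice, features on different scales would need normalizing and categorical features would need their own distance, otherwise a single large-valued column dominates the comparison.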

Analysis of Performance and Algorithmic Fairness
You can also explore the effects of different classification thresholds, taking into account constraints such as different numerical fairness criteria. The screenshot below shows the results of a smile detector model, trained on the open-source CelebA dataset, which consists of annotated face images of celebrities. The faces in the dataset are divided by whether they have brown hair, and for each of the two groups there is an ROC curve and a confusion matrix of the predictions, along with sliders for setting how confident the model must be before determining that a face is smiling. In this case, the confidence thresholds for the two groups were set automatically by the tool to optimize for equal opportunity.

Comparing the performance of two slices of data on a smile detection model, with their classification thresholds set to satisfy the “equal opportunity” constraint.
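Equal opportunity asks for the same true positive rate in every group, which generally means a different threshold per group. The following sketch picks, for each group, the highest threshold that reaches a shared target rate. It is a simplified illustration with made-up scores; the helper names and the fixed `target_tpr` are assumptions, and the tool’s own optimization is not described here.

```python
def true_positive_rate(scores, labels, threshold):
    """Fraction of actual positives scored at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(s >= threshold for s in positives) / len(positives)


def threshold_for_tpr(scores, labels, target_tpr):
    """Highest observed score that, used as a threshold, meets the target rate."""
    for t in sorted(set(scores), reverse=True):
        if true_positive_rate(scores, labels, t) >= target_tpr:
            return t
    return min(scores)


# Hypothetical model scores and true labels for two slices of a dataset.
group_a = ([0.9, 0.8, 0.6, 0.4, 0.3], [1, 1, 1, 0, 0])
group_b = ([0.7, 0.5, 0.45, 0.35, 0.2], [1, 1, 0, 1, 0])

threshold_a = threshold_for_tpr(*group_a, target_tpr=0.6)
threshold_b = threshold_for_tpr(*group_b, target_tpr=0.6)
print(threshold_a, threshold_b)  # 0.8 0.5
```

Here both groups end up with two of their three positives detected, even though the second group needs a much lower threshold to get there.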

To illustrate the capabilities of the What-If Tool, we’ve released a set of demos using pre-trained models:

  • Detecting misclassifications: A multiclass classification model, which predicts plant type from four measurements of a flower from the plant. The tool is helpful in showing the decision boundary of the model and what causes misclassifications. This model is trained with the UCI iris dataset.
  • Assessing fairness in binary classification models: The image classification model for smile detection mentioned above. The tool is helpful in assessing algorithmic fairness across different subgroups. The model was purposefully trained without providing any examples from a specific subset of the population, in order to show how the tool can help uncover such biases in models. Assessing fairness requires careful consideration of the overall context — but this is a useful quantitative starting point.
  • Investigating model performance across different subgroups: A regression model that predicts a subject’s age from census information. The tool is helpful in showing relative performance of the model across subgroups and how the different features individually affect the prediction. This model is trained with the UCI census dataset.

What-If in Practice
We tested the What-If Tool with teams inside Google and saw the immediate value of such a tool. One team quickly found that their model was incorrectly ignoring an entire feature of their dataset, leading them to fix a previously-undiscovered code bug. Another team used it to visually organize their examples from best to worst performance, leading them to discover patterns about the types of examples their model was underperforming on.

We look forward to people inside and outside of Google using this tool to better understand ML models and to begin assessing fairness. And as the code is open-source, we welcome contributions to the tool.

The What-If Tool was a collaborative effort, with UX design by Mahima Pushkarna, Facets updates by Jimbo Wilson, and input from many others. We would like to thank the Google teams that piloted the tool and provided valuable feedback and the TensorBoard team for all their help.


Check out VTS Toys VM-024 1/6th scale BloodHunter 12-inch action figure preview pics

“Fire is fought with fire, blood is fought with blood.”

Bloodhunters are Rangers who have taken up the Nightly Hunt; a vow to eradicate all traces of lycanthropy where they can. To better hunt the beasts, they have taken lycanthropic blood into their bodies. Many sinister or fearful lycanthropes will hide in the cities of the world to blend in with civilization. Bloodhunters have adapted somewhat to seeking them there as well as in the wilderness.

The monstrous origin and characteristically grim presence of Bloodhunters lead society to avoid them. However, most leaders and royalty will still employ them (often in secret) to root out one of the most violent curses the land has seen.

VTS Toys VM-024 1/6th BloodHunter 12-inch action figure Parts list: highly Detailed Head Sculpt, action Body, four (4) Pieces Of Interchangeable Gloved Hands Including: Pair Of Hands For Weapon, Pair Of Hands For Baseball Bat. costume: Custom-Made Imitation Leather Shawl (The Leather Won’t Rot), White Shirt, overcoat, customized Imitation Leather Shoulder (The Leather Won’t Rot), Leather Chest, trousers, belt, side Leg Hanger, blue Mask, scarf. weapons: hunter Pistol, hunter Blunder, busssaw Cleaver. accessories: hunter Hat, arm Wristbands, Kneecap, Leg Guard, Boots, Pocket Watch, figure Stand

Scroll down to see all the pictures.
Click on them for bigger and better views.


GlassBuild Week


Before I start, my thoughts and prayers are out to everyone on the East Coast with the impending hurricane. Very scary times and here’s hoping for the best.

Ok… we are finally here: GlassBuild America. One year ago we gathered in Atlanta and the hopes and expectations were very high, but unfortunately a hurricane timed itself just “right” and changed plans for so many who planned to attend. Now this year, we should see a very strong and excited crowd with the combo of people who missed the event last year and an overall positive energy surrounding the industry right now.

On the floor and in the surroundings here at the convention center there is a ton to see and do. It is exciting to see that not only can you walk the aisles and see the best exhibitors in the world, but you have education all over the place, including express learning and action demos. Plus many exhibitors are planning in-booth experiences: folks like IGE have a ton going on with their machinery and various demos, and Diamon Fusion has gone as far as having a specific meeting room set aside for a presentation on 9/13 upstairs in the hall. In addition, there are numerous companies having sales meetings, lunches, hospitality events etc. in combination with the show. It truly is ground zero for everything happening in our world right now and it’s quite exciting.

Initial impressions on a floor in progress… I am always amazed how this show gets built up from nothing. So many people work so hard to make it shine when the doors open. Plus it’s very hot out here, so these are not the most fun working conditions. Walking around, it’s good to see the equipment side of the hall on display. You name the equipment player and they are here. It’s incredible. And this show is not the big equipment one; usually the Atlanta version is. No matter what your role in this industry is, there is equipment here to support and advance your efforts!

I am also excited for Fall Conference to be integrated into GlassBuild. I know for some of those folks it makes for a longer week, but I think this format is worth a shot and quite frankly there are advantages to getting things all done in one fell swoop vs. having individual events.

Also I must give props to the many exhibitors who really brought the best out of their social media game. This year by far has been the best, with exhibitors utilizing the social medium fully to promote what they have going on. (Note: if you are coming to the show, I will be presenting on social media twice at the Express Learning area.) I have to give big credit to the team at FlexScreen. They had an amazing series using video snippets and their entire sales team helped push it. Very well done!!

As I always do after a show, on my next post I’ll be noting whom I was lucky enough to visit with and also some of my own personal takeaways from the show. Note if you are coming to the show, I won’t be in my traffic director vest this year (it’s been retired for now), but I still hope you’ll look for me and stop me to say hi!


My son is moving to Florida for college. He hates these sorts of animals. Better keep this from him.
Absolutely next level thinking here.
You know just when you think there’s no good left in the world a simple story like this hits.  Nice work young man!

The latest remake out there is “A Star is Born” and it’s getting a ton of hype. Will it live up to it? I’m not so sure. Trailer is below. Meanwhile I am hoping that the new Jennifer Garner movie “Peppermint” lasts long enough in theaters for me to see it post GlassBuild. Love Jen in a serious revenge role!
