Understanding Bias in Peer Review

Posted by Andrew Tomkins, Director of Engineering and William D. Heavlin, Statistician, Google Research

In the 1600s, a series of practices came into being known collectively as the “scientific method.” These practices encoded verifiable experimentation as a path to establishing scientific fact. Scientific literature arose as a mechanism to validate and disseminate findings, and standards of scientific peer review developed as a means to control the quality of entrants into this literature. Over the course of development of peer review, one key structural question remains unresolved to the current day: should the reviewers of a piece of scientific work be made aware of the identity of the authors? Those in favor argue that such additional knowledge may allow the reviewer to set the work in perspective and evaluate it more completely. Those opposed argue instead that the reviewer may form an opinion based on past performance rather than the merit of the work at hand.

Existing academic literature on this subject describes specific forms of bias that may arise when reviewers are aware of the authors. In 1968, Merton proposed the Matthew effect, whereby credit goes to the best established researchers. More recently, Knobloch-Westerwick et al. proposed a Matilda effect, whereby papers from male-first authors were considered to have greater scientific merit than those from female-first authors. But with the exception of one classical study performed by Rebecca Blank in 1991 at the American Economic Review, there have been few controlled experimental studies of such effects on reviews of academic papers.

Last year we had the opportunity to explore this question experimentally, resulting in “Reviewer bias in single- versus double-blind peer review,” a paper that just appeared in the Proceedings of the National Academy of Sciences. Working with Professor Min Zhang of Tsinghua University, we performed an experiment during the peer review process of the 10th ACM Web Search and Data Mining Conference (WSDM 2017) to compare the behavior of reviewers under single-blind and double-blind review. Our experiment ran as follows:

  1. We invited a number of experts to join the conference Program Committee (PC).
  2. We randomly split these PC members into a single-blind cadre and a double-blind cadre.
  3. We asked all PC members to “bid” for papers they were qualified to review, but only the single-blind cadre had access to the names and institutions of the paper authors.
  4. Based on the resulting bids, we then allocated two single-blind and two double-blind PC members to each paper.
  5. Each PC member read his or her assigned papers and entered reviews, again with only single-blind PC members able to see the authors and institutions.
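The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split_into_cadres`, `allocate`), the reviewer and paper identifiers, and the greedy "first bidders win" allocation are all hypothetical, and the actual WSDM 2017 assignment procedure is not described in this post.

```python
import random


def split_into_cadres(pc_members, seed=0):
    """Randomly split PC members into two halves (step 2).

    Returns (single_blind, double_blind). The seed is only here to make
    the illustration reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(pc_members)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def allocate(paper_ids, bids, single_blind, double_blind, per_cadre=2):
    """Assign up to `per_cadre` bidders from each cadre to every paper (step 4).

    `bids` maps a reviewer to the set of paper ids they bid on. This greedy
    rule ignores reviewer load and is an assumption made for illustration.
    """
    assignment = {}
    for paper in paper_ids:
        single = [r for r in single_blind if paper in bids.get(r, set())]
        double = [r for r in double_blind if paper in bids.get(r, set())]
        assignment[paper] = {
            "single_blind": single[:per_cadre],
            "double_blind": double[:per_cadre],
        }
    return assignment


# Hypothetical data: eight reviewers who all bid on both papers.
members = [f"reviewer_{i}" for i in range(8)]
single, double = split_into_cadres(members, seed=42)
bids = {r: {"paper_a", "paper_b"} for r in members}
plan = allocate(["paper_a", "paper_b"], bids, single, double)
```

Because every paper ends up with reviewers from both cadres, any difference in scores on the same paper can be compared across the two conditions.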

At this point, we closed our experiment and performed the remainder of the conference reviewing process under the single-blind model. As a result, we were able to assess the difference in bidding and reviewing behavior of single-blind and double-blind PC members on the same papers. We discovered a number of surprises.

Our first finding shows that compared to their double-blind counterparts, single-blind PC members tend to enter higher scores for papers from top institutions (the finding holds for both universities and companies) and for papers written by well-known authors. This suggests that a paper authored by an up-and-coming researcher might be reviewed more negatively (by a single-blind PC member) than exactly the same paper written by an established star of the field.

Digging a little deeper, we show some additional findings related to the “bidding process,” in which PC members indicate which papers they would like to review. We found that single-blind PC members (a) bid for about 22% fewer papers than their double-blind counterparts, and (b) bid preferentially for papers from top schools and companies. Finding (a) is especially intriguing; with no author information, reviewers have less information, arguably making the job of weighing the merit of each paper more difficult. Yet, the double-blind reviewers bid for more work, not less, than their single-blind counterparts. This suggests that double-blind reviewers become more engaged in the review process. Finding (b) is less surprising, but nonetheless enlightening: In the presence of author names and institutions, this information is incorporated into the reviewers’ bids. All else being equal, the odds that single-blind reviewers bid on papers from top institutions are about 15 percent above parity.

We also studied whether the actual or perceived gender of authors influenced the behavior of single-blind versus double-blind reviewers. Here the results are a little more nuanced. Compared to double-blind reviewers, we saw about a 22% decrease in the odds that a single-blind reviewer would give a female-authored paper a favorable review, but due to the smaller count of female-authored papers this result was not statistically significant. In an extended version of our paper, we consider our study as well as a range of other studies in the literature and perform a “meta-analysis” of all these results. From this larger pool of observations, the combined results do show a significant finding for the gender effect.
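Effects stated in terms of odds can be hard to read as probabilities, so here is a small worked example. It assumes that "15 percent above parity" corresponds to an odds ratio of 1.15 and that a "22% decrease in the odds" corresponds to an odds ratio of 0.78; the baseline probability of 0.5 is purely hypothetical and does not come from the study.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Convert a probability to odds, scale the odds, and convert back."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline: a double-blind reviewer bids (or votes favorably)
# with probability 0.5.
bid_top_institution = apply_odds_ratio(0.5, 1.15)   # about 0.535
favorable_female_author = apply_odds_ratio(0.5, 0.78)  # about 0.438
```

At other baselines the same odds ratio produces a different shift in probability, which is why the findings are reported as odds.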

To conclude, we see that the practice of double-blind reviewing yields a denser landscape of bids, which may result in a better allocation of papers to qualified reviewers. We also see that reviewers who see author and institution information tend to bid more for papers from top institutions, and are more likely to vote to accept papers from top institutions or famous authors than their double-blind counterparts. This offers some evidence to suggest that a particular piece of work might be accepted under single-blind review if the authors are famous or come from top institutions, but rejected otherwise. Of course, the situation remains complex: double-blind review imposes an administrative burden on conference organizers, reduces the opportunity to detect several varieties of conflict of interest, and may in some cases be difficult to implement due to the existence of pre-prints or long-running research agendas that are well-known to experts in the field. Nonetheless, we recommend that journal editors and conference chairs carefully consider the merits of double-blind review.

Please take a look at our full paper for more details of our study.

Continue reading

Posted in Uncategorized

Ace Toys 1/6th scale US (Mobile Strike Force Command) MIKE Force “Baron” 12-inch figure

The Mobile Strike Force Command, or MIKE Force, was a key component of United States Army Special Forces in the Vietnam War. They served with indigenous soldiers selected and trained through the largely minority Civilian Irregular Defense Group (CIDG) and were led by American SF and Australian Army Training Team Vietnam (AATTV) personnel. MIKE Force was a force multiplier, operating what is today called a Foreign Internal Defense mission.

MIKE Force was active under MACV, Army Special Forces, from 1964 to 1970 and under ARVN until 1974. MIKE Force waged special warfare against the Viet Minh, NLF (Viet Cong), and PAVN (North Vietnamese Army) liberation forces in various detachments, volunteering in support of MIKE Force missions. MIKE Force’s mission was to act as a country-wide quick reaction force for securing, reinforcing, and recapturing CIDG A Camps, as well as to conduct special reconnaissance patrols. Search and rescue and search and destroy missions were also assigned. The conventional unit alternative to Special Forces detachments like MIKE was Tiger Force, which was primarily tasked with counter-guerrilla warfare against enemies from behind their lines that emphasized body-count rather than force multiplication.

Ace Toys new product (#13032) MIKE Force “Baron” 12-inch figure features: Head sculpt, Man Mk I body, Painted bare hands, Beret, Advisor’s type sparse pattern – ADS (Golden Tiger) boonie hat (short brim), Advisor’s type sparse pattern – ADS (Golden Tiger) combat jacket, Advisor’s type sparse pattern – ADS (Golden Tiger) combat trousers, MIKE force scarf, OD green tee, GI trouser belt, Patches (MIKE force & name tab), US spike protective jungle boots 3rd pattern DMS spike sole, Suspenders for BAR belt, M1937 BAR belt, M1956 compass pouches x 2, M1942 Carlisle bandage pouch x 2, Jungle first aid pouch, 1 quart canteens x 2, M1956 1 quart canteen covers (modified) x 2, M1961 combat field pack “butt pack”, CIDG rucksack, M16 A1 rifle “three prong flash hider version”, Rifle sling, 20-rd magazines x 12 + 1 pc, Ka-bar combat knife w/ sheath, White phosphorus smoke grenade, M18 smoke grenade violet, INCEN TH grenade, M26 grenades x 2, MX-991/U flashlight, USGI carabiner, GI watch, Dog Tags

Scroll down to see all the pictures.
Click on them for bigger and better views.

The head sculpt seems to resemble a young Sean Penn who starred as Sergeant Tony Meserve opposite Michael J. Fox as Private First Class Max Eriksson in the 1989 American war drama film “Casualties of War”, based on the actual events of the incident on Hill 192 in 1966 during the Vietnam War. An article written by Daniel Lang for The New Yorker in 1969, and a subsequent book were the movie’s primary sources.

Related posts:
Vietnam War, 1959 to 1975 posted on my toy blog HERE
Action Figure Review of Soldier Story 1/6th scale US MACV-SOG 12-inch military figure posted HERE and HERE
Toy Soldier’s 7th Anniversary figure – a USMC Force Recon Rifleman/Corpsman in Vietnam 1970 reviewed HERE


Interpreting Deep Neural Networks with SVCCA

Posted by Maithra Raghu, Google Brain Team

Deep Neural Networks (DNNs) have driven unprecedented advances in areas such as vision, language understanding and speech recognition. But these successes also bring new challenges. In particular, contrary to many previous machine learning methods, DNNs can be susceptible to adversarial examples in classification, catastrophic forgetting of tasks in reinforcement learning, and mode collapse in generative modelling. In order to build better and more robust DNN-based systems, it is critically important to be able to interpret these models. In particular, we would like a notion of representational similarity for DNNs: can we effectively determine when the representations learned by two neural networks are the same?

In our paper, “SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability,” we introduce a simple and scalable method to address these points. Two specific applications of this that we look at are comparing the representations learned by different networks, and interpreting representations learned by hidden layers in DNNs. Furthermore, we are open sourcing the code so that the research community can experiment with this method.

Key to our setup is the interpretation of each neuron in a DNN as an activation vector. As shown in the figure below, the activation vector of a neuron is the scalar output it produces on the input data. For example, for 50 input images, a neuron in a DNN will output 50 scalar values, encoding how much it responds to each input. These 50 scalar values then make up an activation vector for the neuron. (Of course, in practice, we take many more than 50 inputs.)

Here a DNN is given three inputs, x1, x2, x3. Looking at a neuron inside the DNN (bolded in red, right pane), this neuron produces a scalar output zi corresponding to each input xi. These values form the activation vector of the neuron.
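To make the idea of an activation vector concrete, here is a minimal NumPy sketch. The toy one-layer network, its random weights, and the input dimensions are all assumptions chosen for illustration; they are not taken from the paper or its released code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 inputs with 8 features each, and a toy hidden
# layer of 4 ReLU neurons with random weights.
inputs = rng.normal(size=(50, 8))
weights = rng.normal(size=(8, 4))

# Each column of `hidden` holds one neuron's scalar output on every input.
hidden = np.maximum(inputs @ weights, 0.0)   # shape (50, 4)

# Transpose so that each row is the activation vector of one neuron.
activation_vectors = hidden.T                # shape (4, 50)
neuron_0 = activation_vectors[0]             # 50 scalars, one per input
```

A whole layer is then represented by the set of its neurons' activation vectors, which is the object that gets compared between networks.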

With this basic observation and a little more formulation, we introduce Singular Vector Canonical Correlation Analysis (SVCCA), a technique for taking in two sets of neurons and outputting aligned feature maps learned by both of them. Critically, this technique accounts for superficial differences such as permutations in neuron orderings (crucial for comparing different networks), and can detect similarities where other, more straightforward comparisons fail.
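The two-stage idea suggested by the name (a singular value decomposition followed by canonical correlation analysis) can be sketched with NumPy as below. This is an illustrative sketch under stated assumptions, not the authors' released implementation: the function name `svcca`, the `keep_variance` threshold of 0.99, and the random test data are all hypothetical.

```python
import numpy as np


def svcca(acts1, acts2, keep_variance=0.99, eps=1e-10):
    """Return canonical correlations between two sets of activation vectors.

    acts1, acts2: arrays of shape (num_neurons, num_datapoints), evaluated
    on the same datapoints. Values near 1 indicate aligned directions.
    """

    def top_singular_subspace(acts):
        # Center each neuron, then keep the top singular directions that
        # explain `keep_variance` of the total variance.
        centered = acts - acts.mean(axis=1, keepdims=True)
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        cumulative = np.cumsum(s**2) / np.sum(s**2)
        k = int(np.searchsorted(cumulative, keep_variance)) + 1
        k = min(k, int(np.sum(s > eps * s[0])))  # drop near-zero directions
        return vt[:k]  # orthonormal rows spanning the kept subspace

    q1 = top_singular_subspace(acts1)
    q2 = top_singular_subspace(acts2)
    # For orthonormal bases, the singular values of q1 @ q2.T are the
    # canonical correlations between the two subspaces.
    correlations = np.linalg.svd(q1 @ q2.T, compute_uv=False)
    return np.clip(correlations, 0.0, 1.0)


rng = np.random.default_rng(0)
layer = rng.normal(size=(5, 200))        # 5 neurons, 200 datapoints
permuted_layer = layer[::-1]             # same neurons, reordered
unrelated_layer = rng.normal(size=(5, 200))

same_score = svcca(layer, permuted_layer).mean()
unrelated_score = svcca(layer, unrelated_layer).mean()
```

Reordering the neurons leaves the mean correlation at essentially 1, while an unrelated layer scores much lower, which matches the claim that the comparison is insensitive to neuron permutations.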

As an example, consider training two convolutional neural nets (net1 and net2, below) on CIFAR-10, a medium scale image classification task. To visualize the results of our method, we compare activation vectors of neurons with the aligned features output by SVCCA. Recall that the activation vector of a neuron is the raw scalar outputs on input images. The x-axis of the plot consists of images sorted by class (gray dotted lines showing class boundaries), and the y axis the output value of the neuron.

On the left pane, we show the two highest activation (largest Euclidean norm) neurons in net1 and net2. Examining the highest-activation neurons has been a popular method to interpret DNNs in computer vision, but in this case, the highest-activation neurons in net1 and net2 have no clear correspondence, despite both being trained on the same task. However, after applying SVCCA (right pane), we see that the latent representations learned by both networks do indeed share some very similar features. Note that the top two rows representing aligned feature maps are close to identical, as are the second highest aligned feature maps (bottom two rows). Furthermore, these aligned mappings in the right pane also show a clear correspondence with the class boundaries, e.g. we see the top pair give negative outputs for Class 8, with the bottom pair giving a positive output for Class 2 and Class 7.

While SVCCA can be applied across networks, it can also be applied to the same network across time, enabling the study of how different layers in a network converge to their final representations. Below, we show panes that compare the representation of layers in net1 during training (y-axes) with the layers at the end of training (x-axes). For example, in the top left pane (titled “0% trained”), the x-axis shows layers of increasing depth of net1 at 100% trained, and the y-axis shows layers of increasing depth at 0% trained. Each (i,j) square then tells us how similar the representation of layer i at 100% trained is to layer j at 0% trained. The input layer is at the bottom left, and is (as expected) identical at 0% to 100%. We make this comparison at several points through training, at 0%, 35%, 75% and 100%, for convolutional (top row) and residual (bottom row) nets on CIFAR-10.

Plots showing learning dynamics of convolutional and residual networks on CIFAR-10. Note the additional structure also visible: the 2×2 blocks in the top row are due to batch norm layers, and the checkered pattern in the bottom row due to residual connections.

We find evidence of bottom-up convergence, with layers closer to the input converging first, and layers higher up taking longer to converge. This suggests a faster training method, Freeze Training (see our paper for details). Furthermore, this visualization also helps highlight properties of the network. In the top row, there are a couple of 2×2 blocks. These correspond to batch normalization layers, which are representationally identical to their previous layers. On the bottom row, towards the end of training, we can see a checkerboard-like pattern appear, which is due to the residual connections of the network having greater similarity to previous layers.

So far, we’ve concentrated on applying SVCCA to CIFAR-10. But by applying preprocessing techniques with the Discrete Fourier transform, we can scale this method to ImageNet-sized models. We applied this technique to the ImageNet ResNet, comparing the similarity of latent representations to representations corresponding to different classes:

SVCCA similarity of latent representations with different classes. We take different layers in ImageNet ResNet, with 0 indicating input and 74 indicating output, and compare representational similarity of the hidden layer and the output class. Interestingly, different classes are learned at different speeds: the firetruck class is learned faster than the different dog breeds. Furthermore, the two pairs of dog breeds (a husky-like pair and a terrier-like pair) are learned at the same rate, reflecting the visual similarity between them.

Our paper gives further details on the results we’ve explored so far, and also touches on different applications, e.g. compressing DNNs by projecting onto the SVCCA outputs, and Freeze Training, a computationally cheaper method for training deep networks. There are many followups we’re excited about exploring with SVCCA — moving on to different kinds of architectures, comparing across datasets, and better visualizing the aligned directions are just a few ideas we’re eager to try out. We look forward to presenting these results next week at NIPS 2017 in Long Beach, and we hope the code will also encourage many people to apply SVCCA to their network representations to interpret and understand what their network is learning.


GATE TOYS "League of Demon Hunters" Chapter 2 Maoshan Taoist Priest Juan Tin Ming

When I was ten, on the day of the dead, I did something I could never forgive myself for…

Out of curiosity, I took my little brother “Tin Bao” out to play in the wilderness that night… something happened and the spirit of my little brother was taken; I was saved by a Taoist Priest. To preserve the soul of Tin Bao, they made a mini robotic vampire and kept him in it… and I joined the League of Demon Hunters, hoping one day I could retrieve his spirit from the demonic force.

GATE TOYS original design series “League of Demon Hunters” Chapter 2 Maoshan Taoist Priest – Juan Tin Ming is now available for Pre-Order HERE (click for the link)

There are 4 options available: (1) DELUX version (30 sets only worldwide) with the massive Mythical Maoshan Diorama Base, Robotic Baby Vamp and a 1:1 wearable Twin Dragon Jade Bracelet (shown in above picture) – (2) Juan Tin Ming + Robotic Baby Vamp set – (3) Juan Tin Ming standalone – (4) Robotic Baby Vamp standalone

*Tin Ming, Tin means heaven, Ming means fate, the name means destiny
**Tin Bao, Tin means heaven, Bao means treasure, the name means heavenly treasure

Scroll down to see all the pictures.
Click on them for bigger and better views.

GATE TOYS original design series “League of Demon Hunters” Chapter 2 Maoshan Taoist Priest – Juan Tin Ming DELUX version (30 sets ONLY worldwide)

Product Specifications: 1/6 Scale Lifelike Headsculpt, Summon Masks x 2, Fully Articulated Action Figure Body, 6 Styles of Interchangeable Hands. Costume: Daoshi (Taoist Priest) Hat, Bone Bracelet, Short Sleeves Han Style Clothing, Twin Dragon Jade Pendant, Lantern Pants, Seal of League of Demon Hunters, Gadget Belt Set, Daoshi’s Shoes, Daoshi’s Waist Bag, Daoshi’s Rope “Zhong Kui”

Weapon: Feng Shui Compass, Spiritual Bell, “Ba Gua” (I Ching) Mirror, Inked Cord, Peach Wood Sword, Magic Oil Paper Umbrella, Horsetail Whisk, Cinnabarite Ink plate and Bai Ze (White Marsh) Brush Set, Coin Sword, Scrolls of Fire and Water, Gourd of Heaven’s Eye, Heaven’s Command, Heavenly Bird with Bird Cage set, Taoist Priest Backpack with Altar set. Plus: Mythical Maoshan Diorama Base, Robotic Baby Vamp “Tin Bao” collectible Figurine, Twin Dragon Jade Bracelet (1:1 wearable scale)

GATE TOYS original design series “League of Demon Hunters” Chapter 2 Maoshan Taoist Priest – Juan Tin Ming + Robotic Baby Vamp set
GATE TOYS original design series “League of Demon Hunters” Chapter 2 Maoshan Taoist Priest – Juan Tin Ming Standalone set
GATE TOYS original design series “League of Demon Hunters” Chapter 2 Maoshan Taoist Priest – Robotic Baby Vamp “Tin Bao” Standalone

Related posts:
GATE TOYS Original Design Series 1/6th scale League Of Demon Hunters Taoist Priest (Daoshi) first posted on my toy blog HERE


Almost Done with 2017

We’re now entering the home stretch of 2017.  For many people the year is over.  The amount of work that gets done between now and New Year’s is usually on the low side.  So productivity is not a positive right now.  Still, work needs to be done, and weather is cooperating in the traditional winter weather spots of the US.  That could change pretty quickly but nasty, job-slowing precipitation has not been a factor yet.  This year though has absolutely flown right by…


–  The latest Architectural Billings Index (ABI) came in with a return to the positive.  The key with tracking the ABI is that it doesn’t touch our world for at least 9 months, so these consistent upbeat reports, along with healthy backlogs really are momentum builders for 2018.  There’s obviously the daily worry that some geopolitical nightmare that bursts from a Twitter post could bring all of this to a halt, but so far so good metric wise.

–  A while back I wrote briefly on the “fake marketing” that is happening in our industry.  Sadly that trend continues, with companies attaching their names to projects online and leaving the reader to incorrectly assume that said company supplied the entire noted product on the building.  It’s a lazy and weak approach and quite frankly distasteful to try and take credit for something you don’t deserve.  As you can see it makes me somewhat crazy, mostly because companies can and should be better than that.

–  Congrats to my friends at Walker Glass for completing the HPD/EPD process for several of their products.  That is not an easy or inexpensive process but it is meaningful in the big picture of transparency with regards to sustainable building.  Well done gang!

–  The home of the 2018 edition of GlassBuild America, the Las Vegas Convention Center, is getting some serious upgrades and additions.  $860 million will be spent to make the center the 2nd largest in the US right behind Chicago’s McCormick.  My guess is Vegas saw how GlassBuild is growing and with the merger of GANA and NGA, they needed to add more space to take care of us glass folk!
–  Does anyone out there watch the new ABC show “The Good Doctor” – if you do you see an amazing looking hospital featuring tons of glass.  I’m curious is that a real structure (or Hollywood magic) and if it is real, where is it and who did the glass and glazing… because it is sharp!

–  Are you a Top Glass Fabricator?  If so, submissions are now open for Glass Magazine’s Top Fabricator list.  Get your details in and you can also nominate some great glaziers along the way too!
–  I think “Black Friday” may be the biggest day of the year for promotional e-mails.  I was floored by the amount that I received to start the day and how they just kept on coming in.  I think every list I have ever been on from a retail outlet reached out to me as well as many others that I have no idea how they found me.

–  Last week, before the Thanksgiving holiday in the US, I posted a handful of things I am thankful for within our industry.  If you are interested you can find that post here…

This story shows a sad part of our society: people confusing Charles Manson with music star Marilyn Manson
Part 2 of society frustrations… how people can be mad at a professor for telling the simple truth
Deep and long story on the last of the Iron Lungs in operation.  I honestly had no idea this was still out there.

Fun video- guy does one of those typical contests during an NBA game and wins food for entire crowd… backstory they don’t mention, the shooter recently survived cancer.  Very cool.
