The International Conference on Harmonization (ICH) M7 guideline allows in silico approaches to be used to predict Ames mutagenicity in the initial assessment of impurities in pharmaceuticals. It is the first international guideline to address the use of quantitative structure-activity relationship (QSAR) models in lieu of actual toxicological studies for human health assessment. QSAR models for Ames mutagenicity therefore now need higher predictive power for identifying mutagenic chemicals, and achieving it requires larger experimental datasets from reliable sources. The Division of Genetics and Mutagenesis, National Institute of Health Sciences (DGM/NIHS) of Japan recently established a unique proprietary Ames mutagenicity database containing 12,140 new chemicals that had not previously been used to develop QSAR models. The DGM/NIHS provided this Ames database to QSAR vendors so that they could validate and improve their QSAR tools. The Ames/QSAR International Challenge Project was initiated in 2014, with 12 QSAR vendors testing 17 QSAR tools against these compounds in three phases. We now present the final results. All tools improved considerably through participation in this project. Most tools achieved >50% sensitivity (the fraction of Ames-positive chemicals correctly predicted as positive), and predictive power (accuracy) reached as high as 80%, almost equivalent to the inter-laboratory reproducibility of Ames tests. Further increases in the predictive power of QSAR tools will require both the accumulation of additional Ames test data and the re-evaluation of some previous Ames test results. Some chemicals may have been incorrectly classified as Ames-positive or Ames-negative because of methodological weaknesses, and such misclassifications lead to false-positive or false-negative predictions by QSAR tools. These incorrect data hamper prediction and are a source of noise in the development of QSAR models.
It is thus essential to establish a large benchmark database consisting only of well-validated Ames test results to build more accurate QSAR models.
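The two performance measures quoted above, sensitivity and accuracy, can be made concrete with a short calculation. The Python sketch below assumes that each chemical has a binary experimental Ames call and a binary QSAR prediction, with `True` meaning Ames-positive. The function name `ames_metrics` and the toy data are hypothetical illustrations. They are not the Challenge Project's evaluation code or data.

```python
from typing import Sequence


def ames_metrics(observed: Sequence[bool], predicted: Sequence[bool]) -> dict:
    """Sensitivity and accuracy for binary Ames calls (hypothetical helper).

    observed:  experimental Ames results (True = Ames-positive)
    predicted: QSAR predictions for the same chemicals, in the same order
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one chemical is required")

    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # positives predicted positive
    fn = sum(1 for o, p in pairs if o and not p)      # positives missed by the model
    tn = sum(1 for o, p in pairs if not o and not p)  # negatives predicted negative

    n_positive = tp + fn
    return {
        # Sensitivity: correctly predicted positives among all Ames positives.
        "sensitivity": tp / n_positive if n_positive else float("nan"),
        # Accuracy: correct predictions (positive or negative) among all chemicals.
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: 4 Ames-positive and 6 Ames-negative chemicals.
observed = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, False]
print(ames_metrics(observed, predicted))  # {'sensitivity': 0.75, 'accuracy': 0.8}
```

In this toy example, the model misses one of the four positives, so its sensitivity is 0.75. Eight of the ten calls are correct, so its accuracy is 0.8. Because accuracy pools positives and negatives, a tool can post high accuracy on a negative-rich dataset while still missing many mutagens. Reporting sensitivity separately guards against that.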