How the Reproducibility Crisis Is Eating Away at the Credibility of Machine Learning Technology

Atman Rathod
Published in Analytics Vidhya · 7 min read · Aug 23, 2019


The core value of any scientific finding is that it can be reproduced. When scientific inquiry into some physical reality uncovers a natural law, that law should produce similar effects whenever similar conditions hold. This reproducibility of results is what governs scientific inquiry and its principles. If a particular cause-effect relationship is observed once and never again, the relationship between cause and effect has not been established at all.

What happens when something similar occurs in Machine Learning development, an evolving technology that claims to train computing algorithms to produce analytical output? For some time now, reproducibility of results has been a major challenge for Machine Learning models. In this post, we explain the problem.

The Rosy Picture and the Reality

The promise of Machine Learning for the scientific and professional community has been huge. Training machines on relevant user data and extracting data-driven insights for different user contexts and purposes has made Machine Learning one of the most sought-after technologies for interactive interfaces across niches. Employed in the mobile or web solutions of eCommerce stores and other digitally exposed enterprises, Machine Learning offered the promise of addressing customer concerns proactively by learning from their interactions and on-screen behaviour.

But it did not take long for that rosy picture to fade and look bleak. Recently, a statistician from Rice University argued that discoveries made using Machine Learning cannot automatically be treated as trusted and sacrosanct. A Nature survey conducted back in 2016 found that more than 70% of researchers had tried and failed to reproduce another scientist's experiments. The reproducibility crisis continues to loom large over data scientists, especially where Machine Learning models are involved.

While data scientists and researchers across many disciplines still use Machine Learning to take advantage of refined data analysis, make discoveries and reach more accurate findings, doubts about the technology's credibility are becoming increasingly common because of this reproducibility crisis. Several experts are of the opinion that scientific discoveries made using Machine Learning have lost credibility owing to the lack of reproducibility, and that such discoveries need to be tested and evaluated by other scientific methods before they can be held credible.

On the other side of this growing dilemma and uncertainty about Machine Learning, next-generation research is already underway to advance the field and address the reproducibility crisis. Data scientists in the world's finest labs are working to assess what goes wrong with the ability to reproduce results based on data-centric learning. Though Machine Learning is now at the centre of a controversy about credibility, there is no question of writing off the technology as something short-lived and gone.

Where Are We Heading? Are We in Total Control of the Data-Driven Decision-Making Process?

When referring to the shortcomings of Machine Learning technology, we need to treat every aspect cautiously and explain them one by one. Think of a startup that builds its Machine Learning models by layering the work of various team members on top of each other. If a model, working with the same raw data, suggests different solutions on different occasions, how can you positively assess the credibility of that model? This is precisely the crisis of not being able to produce the same output from the same training data. So, when it comes to building data-centric models from scratch, we have not progressed very far.

The Core Challenge: Collaboration on ML Models

How do you track changes and maintain a reliable register of all the source code and the respective modifications made to it? This challenge, the counterpart of version control in conventional software, seems to create the biggest roadblock. As of now, the Machine Learning environment has been one of the most difficult environments in which to collaborate on changes and keep track of them in an organised manner.

Why does a Machine Learning model make it so hard to collaborate and keep track of changes? Walking through the typical lifecycle of a Machine Learning model is helpful here. Let's see how such models are usually built in practice (a minimal bookkeeping sketch follows the list).

● For example, a researcher investigates a problem by trying to build an architecture to classify images.

● To make preparing the required data set easier, he incorporates some data from another completed project.

● The data set borrowed from the other system is stored in one of the network folders, so that all changes made to it are registered.

● After tweaking a few things and doing some extra work, he ends up copying the entire source file to the GPU cluster, specifically to run the full training there.

● In this way he executes several training runs. He keeps the process going while working on other things on his computer, because the runs may take weeks or even months to complete.

● If he finds a bug while a cluster job is running, he has to modify the code and copy-paste the fix to all the other machines before restarting the job.

● While one run is in progress, he may take its partially trained weights and use them as the starting point for another run with different code.

● He also keeps the evaluation scores and the various model weights from all the runs.

● When he can no longer run experiments, he picks one set of trained model weights to release as the final model. These final weights may come from any of the runs stored on his machine, or from an entirely different one.

● He then checks the final code into a personal folder designated for source control.

● Finally, he is ready to publish the results along with the trained weights and the source code.
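To make the bookkeeping in a workflow like this less error-prone, each run can at least record which code version, which data, which seed and which settings produced which scores. The sketch below is a minimal illustration of that idea, not a specific toolchain: the file names, the `runs.jsonl` log and the assumption that the project lives in a Git repository are all made up for the example.

```python
import hashlib
import json
import subprocess
import time


def dataset_fingerprint(path: str) -> str:
    """Hash the raw data file so every run records exactly which data it saw."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_run(data_path: str, seed: int, hyperparams: dict, metrics: dict) -> None:
    """Append one record per training run: code version, data hash, seed, settings, scores."""
    record = {
        "timestamp": time.time(),
        # Assumes the project is a Git repository, so the exact code version is recoverable.
        "code_version": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip(),
        "data_sha256": dataset_fingerprint(data_path),
        "seed": seed,
        "hyperparams": hyperparams,
        "metrics": metrics,
    }
    with open("runs.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")


# Hypothetical usage: train() stands in for the researcher's own training code.
# metrics = train(data_path="images.csv", seed=42, lr=0.01)
# log_run("images.csv", seed=42, hyperparams={"lr": 0.01}, metrics=metrics)
```

Even this small amount of logging means that, months later, any published number can be traced back to the exact code, data and settings that produced it.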

This way of building a Machine Learning model involves real hardship and time. To get similar results, each and every step of this process would need to be repeated meticulously and without exception. If any step is carried out even slightly differently, inconsistencies can easily creep in and the numerical outcome changes. To make things even more challenging, ML frameworks often trade away exact numerical determinism for performance. So even if, miraculously, someone manages to reproduce the steps exactly as described above, a small difference in the outcome cannot be avoided entirely.
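Part of that residual variation can be reduced by pinning every source of randomness before training starts. Below is a minimal sketch assuming a NumPy/PyTorch stack (the article does not name a specific framework); even with these settings, some GPU kernels remain non-deterministic.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Pin the common sources of randomness so repeated runs start from the same state."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG used for shuffles, splits, initialisation
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch GPU RNGs (no-op if CUDA is unavailable)
    # Prefer deterministic kernels; this trades some speed for repeatability and
    # still does not cover every operation on every backend.
    torch.use_deterministic_algorithms(True, warn_only=True)


set_seed(42)
```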

In most real-world scenarios, researchers do not keep notes of all these development steps, and hence they are very unlikely to be able to reproduce the model at all.

Reproducibility and Data: Revisiting the Key Concerns

When talking about the reproducibility crisis in Machine Learning, a few concerns keep coming up. These challenges remain common roadblocks and contributing factors to the crisis. Let's look at them one by one.

Version control with large data sets is the biggest problem when creating a Machine Learning model. You can create versions and keep version history for models and data sets, but you are unlikely to have a version for every small change and little tweak made along the way, and copying all the assets to the server again and again is genuinely painful. When you work with a small volume of data and tweak things here and there, the challenge is not so big. But Machine Learning models need to work on large data sets, and so controlling and registering versions across all those changes becomes a serious problem.
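One lightweight way to get version history for data that is too large for ordinary source control is to commit a small manifest of content hashes instead of the data itself: the files stay wherever they live, but any tweak to any of them changes the manifest. A minimal sketch, assuming the data sits in a local `data/` directory:

```python
import hashlib
import json
from pathlib import Path


def build_manifest(data_dir: str) -> dict:
    """Map every file under data_dir to a content hash; the manifest is small enough to commit."""
    manifest = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(data_dir))] = digest
    return manifest


# Commit the manifest next to the code; the raw files themselves never enter Git.
manifest = build_manifest("data")
Path("data_manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

Dedicated tools such as DVC or Git LFS automate this pattern, but the underlying idea is the same: the code repository records which exact data a given version of the model was trained on.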

If controlling versions by registering every small tweak and change is one key pain point, handling large data sets is the other end of the problem. When you train a machine on a small subset of data, reproducibility is not a big challenge; but when you scale up to a large data set, even after re-applying all the little tweaks and changes of the earlier model, the result can be very different.

The same model that works for one person may not work for other team members, because the data can look different even when the attributes are the same; dates are a classic example. Excel, for instance, reformats dates and will silently switch them to the local date format, i.e. the date as perceived in the local context. For a big team of developers spread across continents, this can be very confusing.
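The date problem is avoidable if the data never passes through locale-dependent formatting in the first place: parse dates with an explicit format and store them in one unambiguous representation. A small sketch using pandas (the file and column names here are made up for the example):

```python
import pandas as pd

# Parse the date column with an explicit format instead of letting the reader guess,
# so "03/04/2019" cannot silently flip between 3 April and 4 March across locales.
df = pd.read_csv("sales.csv")
df["order_date"] = pd.to_datetime(df["order_date"], format="%d/%m/%Y")

# Store dates as ISO 8601 text so every team member reads the same values back.
df.to_csv("sales_clean.csv", index=False, date_format="%Y-%m-%d")
```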

Conclusion

Do not come away from these challenges with the impression that Machine Learning was a short-lived bubble that crashed in the face of mighty obstacles. Quite the contrary. The reproducibility challenge in Machine Learning will eventually be solved by addressing the issues of version control on large data sets.

The challenge may lead data scientists to devise well-equipped mechanisms for organising large data sets, allowing developers to keep track of each and every tweak across different versions effortlessly. Like all earlier challenges to revolutionary technologies, this too will pass as Machine Learning continues to mature and become better equipped.
