This gigantic catalog of information contains genetic blueprints extracted from virus samples studied at a laboratory in Wuhan, China, that some authorities believe may have been the origin of the Covid-19 outbreak, several people familiar with the matter have said.
It is not known exactly how or when U.S. intelligence agencies gained access to the information, but the machines involved in creating and processing this type of virus genetic data are often connected to external servers in the cloud, leaving them open to hacking, the sources said.
However, turning this mountain of raw data into usable information - which is just one part of the intelligence community's 90-day effort to uncover the origins of the pandemic - comes with several challenges, including marshaling enough computing power to process it all. For that, the intelligence community relies on supercomputers at the Department of Energy's National Laboratories, which comprise 17 elite government research institutes.
There is also the problem of manpower. Intelligence agencies need government scientists who are not only skilled enough to interpret the complex genetic sequencing data and hold the proper clearance, but who can also speak Mandarin, since the information is written in Chinese with a specialized vocabulary.
"Obviously there are scientists who have clearance," says a source familiar with the intelligence. "But Mandarin speakers who have clearance? That's a very small group. And not just scientists, but scientists who specialize in biology? So you can see how quickly it gets complicated."
Officials conducting the 90-day review hope the information will help answer the question of how the virus passed from animals to humans. Solving this mystery is key to finally determining whether Covid-19 leaked from a laboratory or was transmitted to humans from animals in the wild.
Researchers both inside and outside the government have long been searching for genetic data from 22,000 samples of the virus, which were studied at the Wuhan Institute of Virology. That data was removed from the internet by Chinese officials in September 2019, and China has since refused to hand over that and other raw data on early cases of the coronavirus to the World Health Organization and the United States.
The question for researchers is whether WIV or other laboratories in China had samples of the virus or other contextual information that could help them trace the evolutionary history of the coronavirus.
Two scientists who study coronaviruses are skeptical that there is any genetic data in the 22,000-sample tranche, or in any other WIV database, that scientists don't already know about.
"Basically, in [a scientific paper published in 2020 in the journal Nature], WIV told us all the sequences they had up to a certain point; that's what most virological scientists believe, that's pretty much what they had," said Robert Garry, Ph.D., a virologist at Tulane University School of Medicine.
A source familiar with the U.S. investigation would neither confirm nor deny that the data relating to these 22,000 samples were among those currently being analyzed by U.S. intelligence agencies.