Youyang Gu: From his bedroom to the CDC

CDC stands for the Centers for Disease Control and Prevention. It is the leading research center on infectious diseases in the United States, and perhaps in the world. One of its tasks is to provide forecasts on the evolution of the COVID-19 epidemic. It does so with the help of about fifty university research groups, government agencies and consulting firms, which, every Monday before 6 pm, deliver forecasts on the numbers of positive cases, hospitalizations and deaths associated with COVID-19 for the following weeks. By averaging all of these predictions, the CDC has been able to anticipate the pandemic’s progress with a fair amount of accuracy.

But who are the best forecasters among those consulted by the CDC? Medical statistician Nicholas Reich of the University of Massachusetts has compared the predictions sent to the CDC with the actual data. In the end, the winner of this special “contest” was not one of the teams of academic researchers or a private study center, but the amateur Youyang Gu, a 27-year-old computer scientist born in China who arrived in the U.S. at the age of 7, a graduate of MIT with no experience in the health field. His predictions did not come to the CDC from a research lab, but from the small bedroom in his parents’ home in Santa Clara where Gu was caught by the lockdown in March 2020, after a couple of years working as a financial analyst in New York City.

Until last May, Gu was completely unknown to the CDC. Initially, his predictions only circulated on Twitter. His followers quickly noticed that they were more accurate than the official ones circulating in the media. They were more accurate, for example, than those of the Institute for Health Metrics and Evaluation (IHME), a research center at the University of Washington on whose optimistic predictions Donald Trump relied heavily. In its April forecasts, the IHME estimated that the epidemic would be over by July and that the number of deaths in the U.S. would not exceed 80,000 for the whole of 2020. According to Gu, however, that threshold would already be reached on May 9. He was right: on that day, 79,926 deaths were officially registered.

After the first correct predictions, the interest around Gu’s work grew until it reached the official scientific community. At the end of April, the biologist Carl Bergstrom of the University of Washington and Nicholas Reich himself suggested to the CDC that they should include Gu among the experts to be consulted. The young computer scientist also became a regular participant in the meetings of the most important experts on the management of the pandemic at the international level: thanks to his skill with data, the World Health Organization recruited him to the technical committee in charge of estimating the mortality of COVID-19. Alongside him were world-class scientists who rely on research teams of dozens for their predictions. Gu, on the other hand, had only one “helper” (his computer) and he didn’t need much else.

His algorithm is quite simple. It simulates a pandemic according to a classic model for epidemiology, the so-called SEIR model, in which each individual is a node in a network and can be in the state of “susceptible” (to the virus), “exposed”, “infectious” or “recovered”. So far, nothing original: it is the most used method in the field, and it depends on a series of parameters. To estimate these, however, Youyang Gu relies on machine learning, and this is the secret of the method’s success. “There are probably not many infectious disease experts with expertise in machine learning,” Gu tells il manifesto.
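To make the four states concrete, here is a minimal sketch of an SEIR simulation in Python. It is not Gu's code: it uses the common compartmental simplification (counting how many people are in each state rather than tracking individuals on a network), a one-day time step, and illustrative values for the hypothetical parameters `beta` (transmission rate), `sigma` (rate at which exposed people become infectious) and `gamma` (recovery rate).

```python
def simulate_seir(population, beta, sigma, gamma, days, initial_infectious=1):
    """Discrete-time SEIR sketch with a one-day step.

    Returns one (susceptible, exposed, infectious, recovered) tuple per day,
    starting with day 0. All parameter names and values are illustrative.
    """
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        # People move S -> E -> I -> R; the total stays constant.
        new_exposed = beta * s * i / population
        new_infectious = sigma * e
        new_recovered = gamma * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative run: one million people, one initial case, 120 days.
history = simulate_seir(1_000_000, beta=0.5, sigma=0.2, gamma=0.1, days=120)
peak_day = max(range(len(history)), key=lambda day: history[day][2])
print(f"Infectious count peaks on day {peak_day}")
```

The shape of the resulting curve depends entirely on the three rates, which is why estimating them well matters so much.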

Machine learning is also a kind of model: a family of artificial intelligence techniques, some of them inspired by how real neurons work, that use past data to predict future data. For the pandemic, Gu did not use official data, but the data published by the Covid Tracking Project, another project run on a voluntary basis by hundreds of users via the internet. Such a model is able to perform “fast calculations with limited resources,” as Gu explains: “All projections can be generated in less than half an hour on a laptop.” For those who want to try it, all the software is open source and free to download from the web.
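The idea of learning a model's parameters from past data can be illustrated with a toy example. The sketch below is only an assumption-laden illustration, not Gu's software: it tries a grid of candidate transmission rates in a simplified SEIR simulation, keeps the one whose curve best matches the "observed" history, and then projects two weeks ahead. The observed series is synthetic (generated with a known rate of 0.35), and every function and parameter name is hypothetical.

```python
def infectious_curve(beta, days, population=1_000_000, sigma=0.2, gamma=0.1):
    """Daily infectious counts from a simplified discrete-time SEIR model."""
    s, e, i = population - 10.0, 0.0, 10.0
    curve = []
    for _ in range(days):
        new_exposed = beta * s * i / population
        new_infectious = sigma * e
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - gamma * i
        curve.append(i)
    return curve


def squared_error(predicted, observed):
    """Sum of squared differences between two equally long series."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_beta(observed, candidates):
    """Grid search: return the candidate beta that best reproduces the past."""
    return min(
        candidates,
        key=lambda beta: squared_error(infectious_curve(beta, len(observed)), observed),
    )


# Synthetic "past data": 60 days generated with a known beta of 0.35.
observed = infectious_curve(0.35, days=60)

# Candidate values 0.10, 0.15, ..., 1.00.
candidates = [round(0.05 * k, 2) for k in range(2, 21)]
best_beta = fit_beta(observed, candidates)

# Project 14 days beyond the observed window using the fitted parameter.
forecast = infectious_curve(best_beta, days=74)[60:]
print(f"Fitted beta: {best_beta}, forecast length: {len(forecast)} days")
```

A search this small runs in a fraction of a second on a laptop. With real, noisy data and several parameters per region the search is larger, but the principle is the same: keep the parameters that best explain what has already happened.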

IHME director Christopher Murray tried to explain to Bloomberg Businessweek that Gu’s model “works well in short-term forecasting, but it doesn’t understand what’s going on. Past-based algorithms can’t account for viral variants and the impact of vaccines on them.” He’s not the only one who thinks this way: many compare machine learning to an oracle, which gets predictions right but offers no insight into how the underlying system, in this case the pandemic, actually works.

Is developing a scientific theory something completely different? “The only experts who think so are those who don’t understand machine learning,” Gu replies curtly, with a hint of dismissiveness, perhaps motivated by all the past attacks against him. “I greatly appreciate it when an expert gives a positive review of my work. But the worst comments come from those who see me as an opportunist who exploited the pandemic. I try to take it in a positive way: I saw an opportunity to make my expertise available to everyone, and I took it.”

Despite his successes, after six months of getting the numbers right, Gu stopped publishing his predictions in November 2020 because, quite simply, there was no longer a need. “In March and April 2020, I was amazed at the absence of high-quality models cited in the media. Predictions ranged from 60,000 to 2.2 million deaths in the U.S. by August,” he wrote on his blog. “In the months since, several other accurate models have emerged. Therefore, the time is right to stop.” Not least because the project began almost as a hobby. “All I used to develop the model was a laptop, a Twitter account and $20 to buy the domain covid19-projections.com.” When he asked his followers for support, however, he received $52,000 in donations.

After he stopped making predictions, Gu devoted himself to estimating the true number of infected people (in the U.S., too, swabs detect only a fraction of them) and to monitoring vaccinations. But on March 7, he put a stop to those projects as well, with the last prediction being of a return to normal for the U.S. in the summer. “The work of modeling and forecasting COVID-19 is properly the work of epidemiologists and the greater public health community,” he explains. Does he have any new projects? “I don’t know yet,” he replies from New York, where he has since returned. “I’ll take some time to think about it.”

Gu’s story seems tailor-made for a movie about the American dream. He confessed in an interview on ABC: “I can’t think of many other countries where the work of a 27-year-old immigrant with no experience can get the attention and respect of scientists and the public.”

Could this ever happen in Italy? Readers will have noticed the similarities with the story of Alberto Giovanni Gerli, a 40-year-old expert in street lighting and the game of bridge, catapulted into the Italian scientific technical committee on the pandemic thanks to mathematical models that were often wrong but appreciated by the administration of Lombardy for their unshakable optimism. Both were admitted among the experts in the absence of relevant experience, and, at different times and in different circumstances, both have since stepped back.

But there are also big differences between the two successful self-taught individuals. Gu sat at that table thanks to the recognition of the scientific community, which didn’t care about which country his passport was from or how much his figures supported White House policies. Gerli, on the other hand, is still an irrelevant figure to those professionally involved in epidemiology, and his predictions have served mainly to pander to the anti-pandemic plans of incompetent administrators.

Originally published at https://ilmanifesto.it/dalla-cameretta-al-cdc/ on 2021-03-26
