How many times have you done an online search for medical information you were too embarrassed to ask your doctor?
New research suggests that this search data can give us unique insight into how people think about illness, what they really want to know, and how we might use that data to develop more relevant public health interventions.
Fear and stigma surrounding disease limit people’s ability to seek and receive appropriate medical advice and care. They lead to secrecy and denial, which can increase the likelihood of further spreading infection. Effective medical care depends on openness, honesty, acceptance, a lack of judgment, and good information. However, public health interventions are not one size fits all; they must take into account the cultural context of the people in need of care.
It’s easy to assume that – whether it’s information, treatment, or cure – a health care worker can simply go into a country, town, or home and educate or treat people. Many have tried and failed, spending millions along the way, because they did not take into account issues like culture, religion, gender, and economic status, or establish understanding and trust with potential patients.
Successful public health interventions require, at minimum, knowledge of how patients understand disease and the kinds of questions they need answered before participating in treatment. This important anthropological work is not only expensive and time-consuming, but still depends on the complete honesty of the subjects. If that doesn’t sound like a monumental challenge, consider how often you’ve withheld information from your physician or failed to ask a question you really wanted the answer to.
Using search data in public health
Researchers from Cornell University, SUNY Stony Brook, Microsoft Research, and the Rockefeller Foundation Fellowship have just published a study that may give public health organizations new tools to improve their education campaigns and programming efforts.
Internet searches provide a wealth of information about people’s genuine concerns about disease and the kinds of treatments they seek. With that in mind, Cornell computer science researcher Rediet Abebe and her team set out to discover what online searches can tell us about public health on the African continent, and whether that information can reveal what people really want to know about disease but are afraid to ask.
Abebe began this research and did much of the technical analysis during an internship at Microsoft Research. Using proprietary data provided by Microsoft’s search engine Bing, the project turned into what she called “the first work to use large Web- or social media-based data to study health in all 54 nations in Africa.” (While Bing is more popular in Africa than it is in the U.S., Google is still the most popular search engine – however, the data gathered was enough to run the statistical analysis needed for their preliminary research.)
The research team set out to see what search data could tell them about the information needs, concerns, and misconceptions of people throughout Africa when it came to HIV/AIDS, malaria, and tuberculosis. These three diseases account for 22% of the disease burden in sub-Saharan Africa.
In their new paper, “Using Search Queries to Understand Health Information Needs in Africa,” which is freely accessible, the team collected common search themes and topics from Bing search data. They cataloged how people searched for disease symptoms, drugs, and concerns about breastfeeding, as well as for information that might have been difficult to glean from surveys, such as ideas about stigma and curiosity about natural cures.
Abebe told me in an interview that this project wasn’t simply confined to the computer science lab:
The research was informed by extensive bottom-up work from public health using surveys, talking to health experts and other people working on the front-lines, and also my own experiences growing up in Addis Ababa, Ethiopia.
When she arrived at Microsoft, she knew she wanted to use this data to ask questions about improving lives in Africa. Devoting much of her time to reading the public health literature, she embarked on the project to gather data to inform on-the-ground efforts:
Given the lack of computationally-driven work on mapping health information needs of individuals in African nations, we settled on asking whether we can shed more light on the varied and unmet health information needs of individuals in Africa and how those needs are being met online.
Roughly 31% of people living on the African continent have Internet access, a number that is constantly growing. Although the research covers only a small part of the population, the data gathered provides unique insight into issues of concern that public health professionals might not otherwise have known about.
Methods and data
Using a statistical model that extracts sets of semantically-related words from online searches, the team was able to identify common topics. While the data was anonymized to some extent, the researchers had access to gender, sex, age, and location data in order to see who was searching for what and where, as well as how that compared to disease statistics. The data was scrubbed to remove names, physical addresses, IP addresses, phone numbers, Bing user IDs, and any HIPAA identifiers.
Data was gathered from searches made in 2016 and 2017. Each search analyzed contained at least one of the terms “HIV,” “AIDS,” “tuberculosis” (or “tb”), or “malaria,” plus at least one other word. The data revealed the following themes:
- A high correlation between searches about a disease and the rate of disease in that area.
- A relationship between searches related to stigma and the prevalence of HIV in the area the searches came from.
- Searches related to stigma are more popular among women and users 18–24 years of age.
- Topics related to news on HIV/AIDS cures are more popular among men, as well as among those 35–49 years old.
- Topics related to breastfeeding, pregnancy, and family care are more popular among women.
- Topics related to symptoms are more popular in the 18–24 and 25–34 age groups.
- Topics related to the socioeconomic implications of HIV/AIDS, such as gender inequality, are more popular among 18–24-year-olds.
- Topics related to concerns about transmission to partner and child are more popular among the 25–34 age group.
- Searches for natural cures were most popular among the oldest age group (35–49), which also showed the most interest in drugs related to treatment.
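The query selection rule and the comparison between search volume and disease rates described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers’ code. The disease terms come from the article; the function names `is_relevant` and `pearson` are hypothetical, and reading “at least one other word” as “at least one word that is not itself a disease term” is an assumption.

```python
import math
import re

# Disease terms named in the article. Everything else here is a
# hypothetical illustration, not the study's actual pipeline.
DISEASE_TERMS = {"hiv", "aids", "tuberculosis", "tb", "malaria"}


def is_relevant(query: str) -> bool:
    """True if the query names a disease term plus at least one other word.

    Assumption: "other word" means a word that is not a disease term.
    """
    words = re.findall(r"[a-z0-9]+", query.lower())
    has_disease = any(w in DISEASE_TERMS for w in words)
    has_other = any(w not in DISEASE_TERMS for w in words)
    return has_disease and has_other


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Given per-region counts of relevant searches and per-region disease rates from a public health source, `pearson(search_counts, disease_rates)` would return a value near 1 when the two rise together. Any real analysis would also need to normalize for how many people in each region use the search engine.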
It is important not only to understand what knowledge people seek when they’re looking for health information, but also to understand how these information needs vary by region and demographic group.
The authors note that search engine data provides insight into “people’s real-time activities, experiences, concerns, and misconceptions relatively cheaply…,” especially in “data-sparse regions.” By starting with the data instead of with a question or goal in mind, the researchers’ bottom-up approach is also easier to scale, cheaper, and less time-consuming than other types of research (though this is not meant to replace face-to-face data-gathering).
And as Abebe told me:
Search data cannot replace door-to-door surveys or other in-person conversations with stakeholders. Surveys are invaluable in understanding and mapping health information needs of communities. In this study, we were sensitive to this fact and were focused on how we can best supplement existing efforts.
The main thing I hope to avoid is to see these as a replacement of surveys and more traditional methods or to see this as a panacea.
The team also assessed the quality of search results on HIV/AIDS to get a sense of what people ended up reading while doing online research. To do this, they looked at which webpages users clicked on, how much time they spent on a page, and how much time they spent looking at the results page.
Time spent on the results page and the number of clicks were both low for those who made searches related to natural cures. The researchers also found that searches for natural remedies were more likely to turn up untrustworthy blogs and websites.
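To make the engagement measures concrete, here is a rough Python sketch of how clicks and time on the results page might be averaged per topic. The record layout of `(topic, clicks, seconds_on_results_page)` and the function name `engagement_by_topic` are hypothetical; the article does not describe how the team stored or computed these figures.

```python
from collections import defaultdict
from statistics import mean


def engagement_by_topic(sessions):
    """Average clicks and results-page time for each topic.

    `sessions` is an iterable of (topic, clicks, seconds_on_results_page)
    tuples. This record layout is an assumption made for illustration.
    """
    grouped = defaultdict(list)
    for topic, clicks, seconds in sessions:
        grouped[topic].append((clicks, seconds))
    return {
        topic: {
            "mean_clicks": mean(c for c, _ in rows),
            "mean_seconds": mean(s for _, s in rows),
        }
        for topic, rows in grouped.items()
    }
```

Because the function returns only per-topic averages, no individual session appears in the output, though aggregates over very small groups can still be revealing.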
The researchers were wise not to draw conclusions from this limited and preliminary data, but the study does allow us insight into behaviors that can help us ask better questions in the future and perhaps address the need for more reliable information on certain topics.
The future of search engine research
Issues that need to be addressed in future research range from technical to linguistic to big-picture ethical questions about the endeavor itself.
Search engine companies collect a tremendous amount of data about their users, but they’re under no obligation to share it – in fact, many argue that it should never be shared – with researchers or organizations, even those attempting to save lives. Concerns about data being de-anonymized are all too real, and the more data is transmitted and shared, the more opportunities there are for it to be hacked.
But one might argue that if the information exists in the first place – and if it can truly help people – it should be put to use. If research such as this can help improve education campaigns and bring better health care to regions where it’s lacking, is it irresponsible to let the data be used only for internal product research and marketing and not to improve health?
The researchers don’t address these questions in the paper, but they do discuss the use of search engines as a platform for public health education. They suggest that someday we might enlist tech companies to deliver targeted advertisements linking to the most reliable information to those searching for specific health topics.
Public health has always been closely tied to surveillance. On the one hand, surveillance is integral to epidemiology; on the other, it has been viewed as unnecessary overreach by the state. Combining it with 21st-century Internet surveillance could very well be a privacy disaster. Even in cases where health care workers are monitoring public health campaigns, tracking medication usage, or planning outreach projects, the projects are riddled with ethical hurdles involving privacy, dignity, autonomy, and choice.
The researchers are concerned about these issues. Abebe, in particular, has been active in discussions about AI ethics and how algorithms can be used for social good. Maintaining privacy is a priority for her as she continues her work:
I think there are ways to do this so that you are not giving up user privacy…You can do this in such a way that is only taking into account aggregate previous searches and not tracking any individual’s search. Of course, the line may not always be so clean, and in those cases, priority should be given to the users’ privacy and any campaigns should be done in collaboration with health experts.
She’s also aware that computer scientists add only one piece to a complex picture of public health, stating that “good follow up work to this would be to focus specifically on this topic and use not just search data but many other data sources to get even deeper insights both into patterns by geography and demographic groups, as well as whether it appears to be impacted by off-line events or experiences such as proximity to hospitals and health centers.”
But while we’ve seen many attempts at interdisciplinary research over the last decade, we’re still falling short when it comes to integrating the social sciences and humanities into computer science research. Abebe told the audience at this year’s EmTech Digital event that despite algorithms being pervasive in nearly every area of life, “There’s a disconnect between researchers and practitioners and communities.”
Sadly, all the good research in the world won’t help if experts can’t work together to build good programs and implement them effectively and ethically.