New software safeguards research participants’ privacy

De-ID allows easy removal of identifiable details from datasets, enabling safe sharing

by Marta Wegorzewska, April 2, 2026

Image: graphic of an open laptop with a screen showing color coding of text. (Sara Moser/WashU Medicine)

Which details in a de-identified scientific record are enough to still identify a person? If, for example, the record includes that a person is a CEO, the abundance of CEOs in the world would make identification nearly impossible. If the person is a CEO living in Missouri, the list becomes notably shorter but likely extensive enough to safeguard identity. But the name of a CEO living in St. Louis and working at a particular company is easy to figure out.

Some scientific research studies collect qualitative data — information provided through focus groups, surveys and interviews — that harbor potentially telling information, including cities lived in, work histories, personal anecdotes and other details. Privacy concerns have kept such data largely inaccessible to the wider scientific community and the public, with only the study’s researchers having access.

But now, a team at Washington University School of Medicine in St. Louis has developed software that flags sensitive information in interview text, such as where a person works or lives, making it easier for scientists to remove or modify identifiable details. Reducing the risk of identifying study participants opens the door to sharing qualitative data with other researchers and the public, which has several benefits, according to James DuBois, DSc, PhD, executive director of the Bioethics Research Center within the WashU Institute of Clinical and Translational Sciences.

“Sharing data can be critical for improving public trust in science,” said DuBois, who led the development of the software, which has become the De-ID App. “Greater access to data also supports new research with existing data, allows for verification of results, and provides teaching opportunities. Our software will guide researchers in the safe and ethical sharing of qualitative data that have historically remained hidden.”

DuBois developed the software with fellow researchers at WashU Medicine and other institutions. The group then worked with WashU’s Office of Technology Management to license the software to Socio-Cultural Research Consultants, a company that creates tools for researchers in various sectors, for further development and commercialization. De-ID became commercially available in early 2026.

As much as necessary and as little as possible

Under a National Institutes of Health (NIH) data-sharing policy updated in 2023, researchers are required to share all data gathered during NIH-funded research, including qualitative data, with the public. But to meet that requirement without sacrificing participants’ privacy or making their information vulnerable to misuse, researchers must painstakingly identify and remove personal details from the text before sharing it, a process that can be time-consuming and prone to human error.

With funding from the NIH’s National Human Genome Research Institute, DuBois’ team collaborated with programmers and researchers in the WashU Medicine Institute for Informatics, Data Science & Biostatistics (I2DB), as well as researchers and data curators at a large data repository at the University of Michigan, to develop resources that help scientists meet the NIH’s new requirements. They created a toolkit to guide researchers through the intricacies of qualitative data sharing and developed the new software to help researchers de-identify private information quickly and correctly.

De-ID highlights potentially identifying details and suggests generic replacements using a color-coding scheme: red for the 18 identifiers protected under the Health Insurance Portability and Accountability Act (HIPAA), including names, Social Security numbers and phone numbers; yellow for information that has a medium risk of identifying a participant on its own or when combined with other details contained in the file, such as dates or locations; and blue for low-risk information that is probably safe to keep but can occasionally identify an individual when combined with other information in the file, such as age references or driver’s license numbers.

The original owner of the data is prompted to review the flagged details and accept the suggested changes or otherwise alter identifiable details as warranted. DuBois cautioned that this process requires care and discernment: Mindlessly deleting or modifying potential identifiers could negatively impact the usefulness of the data, he warned.
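For readers curious how a tiered flag-and-review workflow of this kind might look in code, here is a minimal Python sketch. It is not De-ID's implementation: the patterns, the names `RULES`, `Flag`, `flag_text` and `apply_accepted`, and the bracketed placeholders are all hypothetical, chosen only to mirror the red, yellow and blue tiers described above and the step where the data owner accepts or rejects each suggestion.

```python
import re
from dataclasses import dataclass

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical rules for illustration only; a real tool would need far
# broader coverage (names, places, employers and so on).
# Each rule: (tier, kind, pattern, suggested generic replacement)
RULES = [
    ("red", "phone number", re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    ("red", "Social Security number", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    ("yellow", "date", re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b"), "[DATE]"),
    ("blue", "age reference", re.compile(r"\b\d{1,3} years old\b"), "[AGE]"),
]


@dataclass
class Flag:
    tier: str        # "red", "yellow" or "blue"
    kind: str        # human-readable label for the reviewer
    start: int       # character offset where the match begins
    end: int         # character offset just past the match
    text: str        # the matched text
    suggestion: str  # generic replacement offered to the reviewer


def flag_text(text: str) -> list[Flag]:
    """Return every rule match in the text, ordered by position."""
    flags = []
    for tier, kind, pattern, suggestion in RULES:
        for m in pattern.finditer(text):
            flags.append(Flag(tier, kind, m.start(), m.end(), m.group(), suggestion))
    return sorted(flags, key=lambda f: f.start)


def apply_accepted(text: str, flags: list[Flag], accepted: list[int]) -> str:
    """Replace only the flags the reviewer accepted (given by index)."""
    # Work right to left so earlier offsets stay valid after each edit.
    for i in sorted(accepted, key=lambda i: flags[i].start, reverse=True):
        f = flags[i]
        text = text[: f.start] + f.suggestion + text[f.end :]
    return text


if __name__ == "__main__":
    sample = "I was 42 years old when I called 314-555-0100 on March 3, 2020."
    found = flag_text(sample)
    for i, f in enumerate(found):
        print(i, f.tier, f.kind, repr(f.text), "->", f.suggestion)
    # The reviewer accepts only the red-tier phone number (index 1).
    print(apply_accepted(sample, found, [1]))
```

Keeping the replacement step separate from the flagging step reflects the "as much as necessary and as little as possible" rule: nothing is changed until a person has reviewed the flag.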

“We follow the rule of as much as necessary and as little as possible,” said DuBois, who is also the Steven J. Bander Professor of Medical Ethics and Professionalism at WashU Medicine. “It’s a fine line. If you want other researchers using the data to understand the context and who is speaking, you want to leave in as much as you can — but you absolutely don’t want to risk identifying any individuals.”

Aditi Gupta, PhD, an assistant professor of biostatistics at the I2DB, and Albert M. Lai, PhD, a professor of medicine in the Division of General Medical Sciences at WashU Medicine, also were leaders in the software’s development.

“The software knows to look for certain details,” Lai said. “But it doesn’t know how to combine details and determine whether they can identify a person. For example, it can flag a person’s age, but it would be challenging to identify someone based on that information. However, when combined with information such as their profession and workplace institution, it could make someone look unique.”
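Lai's point about combinations can be made concrete with a toy calculation. The Python sketch below counts how many records in a made-up population match a growing set of known details, echoing the CEO example at the top of this article. The data, the `PEOPLE` list and the `matching_count` function are invented for illustration and say nothing about how De-ID itself works.

```python
# A made-up population for illustration only.
PEOPLE = [
    {"job": "CEO", "state": "Missouri", "city": "St. Louis", "employer": "Company A"},
    {"job": "CEO", "state": "Missouri", "city": "St. Louis", "employer": "Company B"},
    {"job": "CEO", "state": "Missouri", "city": "Kansas City", "employer": "Company C"},
    {"job": "CEO", "state": "Illinois", "city": "Chicago", "employer": "Company D"},
    {"job": "Nurse", "state": "Missouri", "city": "St. Louis", "employer": "Company A"},
]


def matching_count(population: list[dict], known: dict) -> int:
    """Count the people who match every known detail."""
    return sum(
        all(person.get(key) == value for key, value in known.items())
        for person in population
    )


if __name__ == "__main__":
    known = {}
    # Add one detail at a time and watch the candidate pool shrink.
    for key, value in [
        ("job", "CEO"),
        ("state", "Missouri"),
        ("city", "St. Louis"),
        ("employer", "Company A"),
    ]:
        known[key] = value
        print(sorted(known), "->", matching_count(PEOPLE, known), "match(es)")
```

Each detail alone leaves several candidates, but together they narrow the pool to one person, which is why a human reviewer still has to judge the details in combination.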

A scientific manuscript serves as a vehicle for disseminating discoveries and fostering innovation. It allows other researchers to build upon a study’s findings, advancing societal understanding and progress — in theory. But the challenge of sharing sensitive, qualitative data has limited its potential impact, DuBois explained.

“Most research participants’ voices never get heard,” said DuBois, also a professor of psychology and brain sciences. “Researchers performing qualitative studies control how the story is told. There is huge room for bias in research articles that rely on data collected from interviews, focus groups and surveys. Sharing full data can help correct for this, but it must be done in a way that respects confidentiality. That was the aim in developing this software.”

 

About WashU Medicine

WashU Medicine is a global leader in academic medicine, including biomedical research, patient care and educational programs with more than 3,000 faculty. Its National Institutes of Health (NIH) research funding portfolio is the second largest among U.S. medical schools and has grown 83% since 2016. Together with institutional investment, WashU Medicine commits well over $1 billion annually to basic and clinical research innovation and training. Its faculty practice is consistently among the top five in the country, with more than 2,000 faculty physicians practicing at 130 locations. WashU Medicine physicians exclusively staff Barnes-Jewish and St. Louis Children’s hospitals — the academic hospitals of BJC HealthCare — and Siteman Cancer Center, a partnership between BJC HealthCare and WashU Medicine and the only National Cancer Institute-designated comprehensive cancer center in Missouri. WashU Medicine physicians also treat patients at BJC’s community hospitals in our region. With a storied history in MD/PhD training, WashU Medicine recently dedicated $100 million to scholarships and curriculum renewal for its medical students, and is home to top-notch training programs in every medical subspecialty as well as physical therapy, occupational therapy, and audiology and communications sciences.

Marta covers pathology & immunology, pediatrics, obstetrics & gynecology, anesthesiology, ophthalmology and technology management, among other topics. She holds a bachelor’s degree in biology from Georgetown University and a PhD in immunology from the University of California, San Francisco. She did her postdoctoral work in Washington University’s Department of Pathology & Immunology. Marta joined WashU Medicine Marketing & Communications in 2023 after working as a science writer in the Department of Biology on the Danforth Campus for five years.