# Greater genetic diversity is needed in human pluripotent stem cell models – Nature Communications

While efforts are underway to increase diversity, it is worth taking a moment to consider how different populations are determined and described in hPSC collections. When it comes to verification, most cell banks use self-reported race or ethnicity rather than genetically inferred ancestry (HipSci being a notable exception). Self-reported race or ethnicity reflects categories of identity that may change over time, whereas genetically inferred ancestry (e.g., quantitative estimates of ancestral components by continent) reflects aspects of underlying biology that remain static for a given individual. Both types of data provide relevant information, but relying on self-reported race or ethnicity alone has several specific limitations. The 2020 United States Census provides a timely example of how changing social, political and cultural factors can influence self-reporting24, which is less reliable for populations composed of multiple ancestries and for individuals who identify with multiple races or ethnicities25. Indeed, as discussed in ref. 26, an individual’s racial or ethnic identity may correspond poorly to their genetic ancestry. A study examining the accuracy of self-reporting in over 9,000 people found that the method of data collection itself, in this case a request form versus a consultation, was sufficient to affect the level of concordance with genetic ancestry27. Another study of nearly 2,000 people in a pediatric HIV/AIDS cohort, who were asked to identify as “Black/African American”, “White” or “Hispanic”, found that 9.5% of subjects were misidentified by self-report when each individual was assigned to the population contributing the highest percentage of their genetic ancestry, and that 26% were misidentified when $\geq 75\%$ of genetic ancestry from a specific population was required28.
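The two assignment rules used in the pediatric cohort study can be made concrete with a minimal Python sketch. It assumes each donor has a self-reported label and a dictionary of genetically inferred ancestry proportions summing to 1 (as produced by admixture-style estimation tools). The population labels (`pop_A`, `pop_B`, `pop_C`), the toy donor data and the helper names are hypothetical, the mapping of self-reported categories onto reference populations is itself an assumption, and counting donors with no component reaching the threshold as discordant is an assumed reading of the $\geq 75\%$ rule rather than the cited study's exact method.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: population label -> inferred ancestry proportion (sums to 1).
Ancestry = Dict[str, float]


def assign_by_max(ancestry: Ancestry) -> str:
    """Assign the population contributing the largest share of ancestry."""
    return max(ancestry, key=ancestry.get)


def assign_by_threshold(ancestry: Ancestry, threshold: float = 0.75) -> Optional[str]:
    """Assign a population only if its share is >= threshold; otherwise return None."""
    label, proportion = max(ancestry.items(), key=lambda item: item[1])
    return label if proportion >= threshold else None


def discordance_rate(
    donors: List[Tuple[str, Ancestry]],
    rule: Callable[[Ancestry], Optional[str]],
) -> float:
    """Fraction of donors whose self-reported label differs from the genetic assignment.

    Assumption: a donor left unassigned by the rule (None) counts as discordant.
    """
    if not donors:
        raise ValueError("need at least one donor")
    mismatches = sum(1 for self_reported, anc in donors if rule(anc) != self_reported)
    return mismatches / len(donors)


# Toy, hypothetical donors: (self-reported label, inferred ancestry proportions).
DONORS: List[Tuple[str, Ancestry]] = [
    ("pop_A", {"pop_A": 0.90, "pop_B": 0.10}),  # concordant under both rules
    ("pop_A", {"pop_A": 0.60, "pop_B": 0.40}),  # no component reaches 0.75
    ("pop_B", {"pop_A": 0.55, "pop_B": 0.45}),  # largest component disagrees
    ("pop_C", {"pop_C": 0.80, "pop_A": 0.20}),  # concordant under both rules
]

print(f"highest-ancestry rule: {discordance_rate(DONORS, assign_by_max):.2f}")        # 0.25
print(f">=75% rule:            {discordance_rate(DONORS, assign_by_threshold):.2f}")  # 0.50
```

Even on these toy data, the stricter threshold rule reports more discordance than the highest-ancestry rule, because admixed donors with no dominant component are no longer assigned to any single population. This mirrors the direction of the difference (9.5% versus 26%) reported in the cohort study.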
These studies and others highlight how reliance on self-reported race or ethnicity in hPSC collections can affect both the accuracy and the longevity of resources, particularly when identity labels, some of which have a contentious history, change. Including genetically inferred ancestry is one strategy to improve the accuracy of cellular resources, to better understand the genetic architectures of specific subpopulations, and to ensure that resources retain their utility as identity labels change. Coupling this information with self-reported race or ethnicity, recorded using standardized nomenclature, will provide a more complete picture of individual donors. This will, of course, require clear communication to ensure that donors understand and consent to genetic testing to infer ancestry.