Although the need for more diverse hPSC resources is clear, significant challenges remain in expanding cell collections and implementing these resources in the laboratory. First, it is critical to recognize that current efforts toward greater inclusion exist within a historical context of discrimination, in which actions as well as inactions have eroded trust in scientific and medical establishments (for additional discussion of historical issues of race and ancestry in medicine, see ref. 22). Therefore, conscious efforts to rebuild trust and increase participation are essential. These include ensuring informed consent; providing wide access to the resources, data and results collected; engaging deliberately and continuously with all stakeholders contributing to cell collections; and using clear and precise language to describe race, ethnicity, ancestry and their potential roles in specific biological findings. As global cell collections grow, it is essential to invest in training and capacity building, specifically in currently underrepresented countries, and to establish scientific partnerships that facilitate the use of hPSC resources in the communities from which they come. At the same time, countries with established cell banking capabilities should continue to improve the representation of the diverse donor backgrounds available within their own populations. Cell line distribution and data sharing may also be subject to country-specific limitations. This highlights the importance of simultaneously advocating for greater diversity of collections within a given country and for increased global research collaboration between countries. Data governance is another important consideration. 
While deeper and more extensive clinical phenotyping and metadata would be extremely valuable when combined with genetic and cellular resources to enable genotype-phenotype associations, it can be difficult to make such sensitive data available to the scientific community. These data often constitute protected health information, a category of information that, if shared, could compromise patient privacy and increase identifiability. Thus, the depth of shareable information must be appropriately balanced with donor confidentiality. Increasing the diversity of cellular models also raises obvious questions of feasibility, as it requires laboratories to invest time and financial resources to incorporate additional hPSC lines into experimental paradigms. As noted above, some labs can use diverse cell lines to interrogate known alleles in different genetic backgrounds using targeted approaches and therefore require relatively small sample sets. Others may engage in discovery studies, such as mapping the effects of genetic variants on cellular phenotypes, that require substantial scale. Here, three elements will be essential for the practical implementation of these resources: repositories with well-characterized, diverse and accessible hPSC lines; additional support from research funding mechanisms dedicated to incorporating hPSC lines from underrepresented populations; and clear reporting of donor ancestry in individual studies (see additional guidance in ref. 23).
While efforts are underway to increase diversity, it is worth taking a moment to consider how different populations are determined and described in hPSC collections. In terms of ascertainment, most cell banks use self-reported race or ethnicity rather than genetically inferred ancestry (HipSci being a notable exception). Self-reported race or ethnicity reflects categories of identity that may change over time. Genetically inferred ancestry (e.g., quantitative estimates of continental ancestry components), by contrast, reflects aspects of underlying biology that remain static for a given individual. Both types of data provide relevant information, but relying on self-reported race or ethnicity alone has several specific limitations. The 2020 United States Census provides a timely example of how changing social, political and cultural factors can influence self-reporting24. Self-reporting is also less reliable for populations of mixed ancestry and for individuals who identify with multiple races or ethnicities25. Indeed, as discussed in ref. 26, an individual’s racial or ethnic identity may correspond poorly to their genetic ancestry. A study examining the accuracy of self-reporting in over 9000 people found that the method of data collection itself, in this case a form versus a consultation, was sufficient to affect the level of concordance with genetic ancestry27. Another study analyzed nearly 2000 people in a pediatric HIV/AIDS cohort who were asked to identify as “Black/African American”, “White” or “Hispanic”. When individuals were assigned by their highest percentage of genetic ancestry, 9.5% were misclassified by self-report. When ≥75% genetic ancestry from a specific population was required, 26% were misclassified by self-report28. 
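The two assignment rules described above (largest ancestry component versus a ≥75% cutoff) can be sketched in Python as follows. This is a minimal illustration, not the cited study's method. The donor records are invented toy values, and the mapping from self-reported categories to single ancestry components is a hypothetical simplification. The sketch also assumes that a donor with no component reaching the cutoff counts as discordant with self-report.

```python
from typing import Callable, Optional

# Hypothetical, simplified mapping from self-reported categories to a single
# reference ancestry component (for illustration only).
SELF_REPORT_TO_COMPONENT = {
    "White": "European",
    "Black/African American": "African",
}

# Each toy donor: (self-reported category, estimated ancestry proportions).
# Values are invented and are NOT taken from the cited study.
DONORS = [
    ("White", {"European": 0.95, "African": 0.05}),
    ("Black/African American", {"African": 0.80, "European": 0.20}),
    ("Black/African American", {"African": 0.60, "European": 0.40}),
    ("White", {"European": 0.45, "African": 0.55}),
]


def assign_max(props: dict[str, float]) -> Optional[str]:
    """Assign the ancestry component with the largest estimated proportion."""
    return max(props, key=props.get)


def assign_threshold(props: dict[str, float], cutoff: float = 0.75) -> Optional[str]:
    """Assign the top component only if it reaches the cutoff; otherwise None."""
    label = max(props, key=props.get)
    return label if props[label] >= cutoff else None


def discordance_rate(donors, assign: Callable[[dict[str, float]], Optional[str]]) -> float:
    """Fraction of donors whose genetic assignment differs from their self-report.

    A donor left unassigned (None) by the threshold rule counts as discordant,
    which is an assumption of this sketch.
    """
    mismatches = sum(
        1
        for self_report, props in donors
        if assign(props) != SELF_REPORT_TO_COMPONENT[self_report]
    )
    return mismatches / len(donors)


print(discordance_rate(DONORS, assign_max))        # 0.25
print(discordance_rate(DONORS, assign_threshold))  # 0.5
```

The threshold rule either returns the same label as the maximum rule or no label at all. For that reason, it can never report fewer discordant donors than the maximum rule. This matches the direction of the 9.5% versus 26% figures above, although the toy numbers say nothing about the study itself.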
These and other studies highlight how reliance on self-reported race or ethnicity in hPSC collections can affect both the accuracy and the longevity of resources, particularly as identity labels change, some of which have a fraught history. Including genetically inferred ancestry is one strategy to improve the accuracy of cellular resources, to better understand the genetic architectures of specific subpopulations, and to ensure that resources retain their utility as identity labels change. Coupling this information with self-reported race or ethnicity using standardized nomenclature will provide a more complete picture of individual donors. This will, of course, require clear communication to ensure that donors understand and consent to genetic testing to infer ancestry.
With respect to the language used to describe race, ethnicity and/or ancestry, there is a general lack of concordance between different hPSC banks, between different genomic studies, and between hPSC banks and genomic studies (Fig. 2a). Additionally, some hPSC banks rely on terms such as “Other” or “More than one race”. These terms fail to capture the increasing degree of ancestral complexity in global populations and effectively exclude such individuals from accurate representation. These issues complicate the identification of relevant hPSC lines for following up insights from human genomic datasets. For example, the HLA-B*5701 allele is associated with hypersensitivity to abacavir, a drug used to treat HIV. It has a frequency of 13.6% among individuals of the Masai group in Kenya, 0% among individuals of the Yoruba group in Nigeria and 5.8% among people of European descent29. Here, this difference in allele frequency is not resolved by the population terms used in hPSC banks. Different studies will require different levels of granularity in the populations studied. Even so, current estimates place the number of subcontinental ancestries at a minimum of 21, with 97.3% of individuals harboring ancestral heterogeneity30. Thus, although many hPSC collections were started before or at the same time as large-scale genomics initiatives (Fig. 1a), it is critical that hPSC collections now consider how best to adapt to the rapidly expanding genomic knowledge of diverse ancestral populations (Fig. 1c).

a On the left, examples of how individuals of European (blue) and Asian (green) ancestry are reported in current hPSC banks, including CIRM (US), WiCell (US), Coriell (US), SKiP (Japan) and HipSci (UK). On the right, examples of how individuals of European (blue) and Asian (green) ancestry are labeled in human genomic studies, including Bergstrom et al. 2020 (Human Genome Diversity Project (HGDP))37, Karczewski et al. 2020 (gnomAD)38 and Smedley et al. 2021 (100,000 Genomes pilot)39. b Key recommendations for expanding hPSC diversity. Map adapted from templates by Yourfreetemplates.com/.
Ideally, for participants who have consented to iPSC derivation, material for iPSC reprogramming and banking would be collected alongside material for genomic and/or phenotypic studies. This would provide a direct link between these data and the available cellular resources (Fig. 2b). Notably, such initiatives should be combined with community engagement efforts. These efforts should ensure that participants understand how their samples will be used and stored, the possible scientific and medical benefits that could result, and that these benefits may not be immediate and/or personal. Efforts such as NeuroDev have succeeded in establishing sample collections for exome sequencing as well as a lymphocyte cell bank from individuals in South Africa31, and similar approaches could be undertaken for future hPSC collections. In collaboration with participants of the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil), Brazilian laboratories derived iPSC lines and performed ancestry analyses. The resulting cellular resources better reflect the Brazilian population and are linked to clinical phenotyping data32. Alternatively, groups such as TOPMed33 and 1000 Genomes34 are expanding their reference populations, and cell banks such as the California Institute for Regenerative Medicine use SNP arrays to assess genomic integrity. These data could be combined to make more accurate ancestry predictions instead of relying solely on self-reported race or ethnicity. These more diverse reference panels will also enable deeper analyses of experimental models from underrepresented populations. At a minimum, standardized and more accurate descriptions of race, ethnicity and ancestry should be used in future hPSC collections. Here, frameworks developed for reporting ancestry data in genomic studies could be leveraged to provide greater harmonization across disciplines (e.g., Morales et al. 2018, Genome Biology)35.
Greater genetic diversity needed in human pluripotent stem cell models. Nature Communications.