Summarries

I need a 2 article summaries. each summary need to be 2-3 paragraph per article.

The screenshot attached shows the instructions for the articles under “alternate extra credit opportunity”.

Also attached are articles to be summarized.

I need this in the next 15 hours.

A rticle

A Genetic Mechanism for Convergent Skin Lightening during Recent Human Evolution Zhaohui Yang,†,1,2,13 Hua Zhong,†,3 Jing Chen,†,4 Xiaoming Zhang,†,1 Hui Zhang,1 Xin Luo,1,2 Shuhua Xu,5

Hua Chen,6 Dongsheng Lu,5 Yinglun Han,7 Jinkun Li,8 Lijie Fu,8 Xuebin Qi,1 Yi Peng,1 Kun Xiang,1

Qiang Lin,1,2 Yan Guo,1 Ming Li,1 Xiangyu Cao,1 Yanfeng Zhang,1 Shiyu Liao,1 Yingmei Peng,1 Lin Zhang,4

Xiaosen Guo,9,10 Shanshan Dong,9 Fan Liang,9 Jun Wang,9,10 Andrew Willden,1 Hong Seang Aun,11

Bun Serey,11 Tuot Sovannary,11 Long Bunnath,11 Ham Samnom,12 Graeme Mardon,3 Qingwei Li,7

Anming Meng,*,4 Hong Shi,*,1,13 and Bing Su*,1 1State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China 2Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China 3Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 4State Key Laboratory of Membrane Biology, School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, China 5Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society (CAS- MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China 6Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China 7College of Life Science, Liaoning Normal University, Dalian, China 8Department of Urology, First Affiliated Hospital of Kunming Medical University, Kunming, China 9BGI-Shenzhen, Shenzhen, China 10Department of Biology, University of Copenhagen, Copenhagen, Denmark 11Geography and Land Management, Royal University of Phnom Penh, Phnom Penh, Kingdom of Cambodia 12Capacity Development Facilitator for Handicap International Federation and Freelance Researcher, Battambang, Kingdom of Cambodia 13Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, Yunnan, China †These authors contributed equally to this work.

*Corresponding author: E-mail: sub@mail.kiz.ac.cn; mengam@mail.tsinghua.edu.cn; shihong@kbimed.com.

Associate editor: Anna Di Rienzo

Abstract

Skin lightening among Eurasians is thought to have been a convergence occurring independently in Europe and East Asia as an adaptation to high latitude environments. Among Europeans, several genes responsible for such lightening have been found, but the information available for East Asians is much more limited. Here, a genome-wide comparison between dark-skinned Africans and Austro-Asiatic speaking aborigines and light-skinned northern Han Chinese identified the pigmentation gene OCA2, showing unusually deep allelic divergence between these groups. An amino acid substi- tution (His615Arg) of OCA2 prevalent in most East Asian populations—but absent in Africans and Europeans—was significantly associated with skin lightening among northern Han Chinese. Further transgenic and targeted gene mod- ification analyses of zebrafish and mouse both exhibited the phenotypic effect of the OCA2 variant manifesting decreased melanin production. These results indicate that OCA2 plays an important role in the convergent skin lightening of East Asians during recent human evolution.

Key words: OCA2, East Asians, skin lightening, pigmentation genes, adaptation, natural selection.

Introduction Light skin is thought to be an adaptation gained by modern humans needed to cope with the new environmental and biological factors encountered after exiting Africa about 100,000 years ago (Loomis 1967; Jablonski and Chaplin 2000; Chaplin 2004). Among many environmental factors, ultraviolet

(UV) is required for the synthesis of vitamin D, a key nutrient for human bone development (Loomis 1967), and selection on vitamin D synthesis would favor a lighter skin for people living in high latitude regions with relatively weak UV radiation.

Previous investigations into the bases of skin lightening among Europeans found a set of pigmentation genes (e.g.,

� The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com Open Access Mol. Biol. Evol. 33(5):1177–1187 doi:10.1093/molbev/msw003 Advance Access publication January 6, 2016 1177

http://creativecommons.org/licenses/by-nc/4.0/
SLC24A5 and SLC45A2) responsible for their light skin (Graf et al. 2005; Lamason et al. 2005; Izagirre et al. 2006; Yuasa et al. 2006; Lao et al. 2007; Norton et al. 2007; Sabeti et al. 2007; Soejima and Koda 2007; Han et al. 2008; Beleza et al. 2013; Wilde et al. 2014). The Ala111Thr variant of SLC24A5 proved the most striking, being nearly fixed in Europeans (495%) due to local Darwinian positive selection, but extremely rare in Africans and East Asians (Lamason et al. 2005). Curiously, while East Asians share similarly light skin, the identified mu- tations of the pigmentation genes found in Europeans are rare in East Asians; instead, other pigmentation genes (e.g., OCA2, DCT, and ATRN) have been reported under selection in East Asians. For example, the variants of OCA2 (rs1800414 and rs74653330) were shown to be associated with skin melanin content (Edwards et al. 2010; Eaton et al. 2015). The DCT gene harbored a signature of positive selection in Chinese using multiple neutrality tests (Lao et al. 2007; Myles et al. 2007), and many loci in ATRN showed extended blocks in East Asians, an indication of selection (Norton et al. 2007). These data suggest convergent evolution of light skin in both East Asians and Europeans (Lao et al. 2007; Myles et al. 2007; Norton et al. 2007; Edwards et al. 2010). However, no functional analysis has been conducted to reveal the underlying genetic mechanism of how these pig- mentation genes affect melanin production in the skin and cause skin lightening in East Asians.

Compared with Europeans north of the Mediterranean, East Asian populations living between the equator and North Pole manifest a wider spectrum of skin pigmentation (fig. 1A). Previous studies proposed that in the population tree, the dark-skinned Austro-Asiatic speaking populations (abbreviated as A-A) currently residing in southwestern China and Southeast Asia occupy a more basal clade than the light-skinned northern populations who migrated north- ward some 25,000–30,000 years ago (supplementary fig. S1, Supplementary Material online) (Su et al. 1999; Shi et al. 2005; Consortium et al. 2009). By extension, the A-A group may represent the skin type of early modern human settlers in eastern Asia. Accordingly, a comparative genomic compari- son between light-skinned northern populations (e.g., north- ern Han Chinese, CHN) and dark-skinned A-A as well as African populations may help identify the pigmentation gene(s) responsible for skin lightening in East Asians. Meanwhile, the phenotypic effect of selected mutations in the pigmentation gene(s) can be modeled in laboratory an- imals to reveal the genetic mechanism.

Results

Skin Pigmentation in CHN and A-A Populations

We measured the level of darkness of the constitutive skin areas (underarm and buttock, unexposed to sunlight) and the facultative skin area (hand back, exposed to sunlight) of 1,518 unrelated individuals from A-A (1,159 individuals from 15 geographic regions) and CHN (359 individuals from 8 northern provinces of China) populations (supplementary table S1, Supplementary Material online). The L* value (a measurement of skin pigmentation reflectance) was used to quantify the level

of darkness, that is, lower L* values denote darker skin. A-A populations showed markedly darker skin than the CHN pop- ulation for both the constitutive and facultative areas (fig. 1 and supplementary fig. S2, Supplementary Material online). After adding published data from African Americans and European Americans (underarm) (Shriver and Parra 2000), there was a clear similarity between A-A and African Americans and also between CHN and European Americans (fig. 1B).

Genome-Wide Analysis of Sequence Variations

Using an Affymetrix Genome-wide Human Single Nucleotide Polymorphism (SNP) Array 6.0 (4900,000 SNPs), we geno- typed 48 individuals randomly selected from A-A populations

47.50 53.84 65.44 69.86

0 10 20 30 40 50 60 70 80 90 100

Africans(9) A-A(1,159) CHN(359) Europeans(55)

L* (u nd er ar m )

A

B

FIG. 1. Skin pigmentation variation in world populations. (A) Skin pig- mentation comparison between Austro-Asiatic speakers (left) and north- ern Han Chinese (right); these individuals represent the middle of skin color values of each population. (B) Comparison of skin darkness among world populations. The y axis indicates the L* value of underarm. The average L* value of each population is shown on the histogram, and the error bar indicates standard deviation. The L* values of Africans (African Americans), Europeans (European Americans) were from a previous report (Shriver and Parra 2000). The data of A-A (Austro-Asiatic speak- ers) and CHN (northern Han Chinese) were collected in this study. Sample size is indicated in the parenthesis.

1178

Yang et al. . doi:10.1093/molbev/msw003 MBE

http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
of southwestern China. Paired with published Affymetrix 6.0 data of other populations (International HapMap et al. 2010), including Han Chinese from Beijing (CHB, 89 individuals), Han Chinese from Denver (CHD, 90 individuals), Japanese (JPT, 90 individuals), Europeans (CEU, 118 individuals), and Africans (YRI, 120 individuals), principal component analysis (PCA) indicated a close relationship among Asian populations (A- A, CHB, CHD, and JPT) (supplementary fig. S3A, Supplementary Material online), with some local divergence (supplementary fig. S3B, Supplementary Material online).

To identify the genes responsible for skin pigmentation divergence, we looked for genes showing extraordinarily large divergence between CHN (with CHB used to represent CHN) and A-A and Africans (with YRI used to represent Africans) by calculating between-population allele frequency difference of each SNP (measured by FST) (Weir and Hill 2002). Compared with the randomly selected SNPs, SNPs located in pigmentation genes showed larger enrichment scores (P < 0.001, �2 test) (supplementary fig. S4, Supplementary Material online), suggesting that pigmentation genes may be under selection in CHB.

In total, we identified 1,283 (CHB vs. A-A), 1,640 (CHB vs. YRI), and 1,706 gene regions (YRI vs. A-A) showing large di- vergences (FST, top 1% regions). Overlapping these gene re- gions with 171 known pigmentation genes from the human genome (Color Genes database, http://www.espcr.org/mice mut/, last accessed September 03, 2015) yielded three gene- sets (15, 21 and 19 genes, respectively), among which three pigmentation genes (OCA2, EDAR, and PAH) showed consis- tent signals (supplementary table S2, Supplementary Material online). In the genome-wide FST analysis, we pooled the sam- ples of three A-A populations from China, and the potential population substructure may cause false negatives. Consequently, the listed gene sets are conservative callings of genes under selection (supplementary table S2, Supplementary Material online). The identified candidate pig- mentation genes are mostly different from those in Europeans, supporting the hypothesized convergent evolu- tion of skin pigmentation in East Asia and Europe.

Compared with EDAR and OCA2, the signal of PAH is rel- atively weak and its functional role in pigmentation remains elusive (Schallreuter et al. 1999). While EDAR has accumu- lated East Asian specific mutations due to local selection, its functional effect lies in an increased number of active eccrine glands (Kamberov et al. 2013). In contrast, OCA2 showed the largest between-population (light-skinned vs. dark-skinned) divergence and was previously reported to be associated with melanin content (Edwards et al. 2010; Eaton et al. 2015). For further analysis, we focused on OCA2.

OCA2 Accounts for Light Skin in CHN

To search for functional mutation(s) of OCA2, we conducted amplicon sequencing of 40 randomly selected A-A individuals (25 from China and 15 from Cambodia), and obtained se- quences of the entire genomic region of OCA2 (344.5 kb) which we compared to data from other populations collected within the 1000 Human Genomes Project (Genomes Project

et al. 2012). First, the maps of linkage disequilibrium (LD) (supplementary fig. S5, Supplementary Material online) showed that CHB exhibited the strongest LD, which could be explained either by Darwinian positive selection or popu- lation history (e.g., bottleneck effect).

Within the gene region of OCA2, we observed a peak of allele frequency divergence (measured by FST) (Weir and Hill 2002) between CHB and A-A, spanning about 44.2 kb, in which rs1800414 was among a group of SNPs showing large FST values (fig. 2A). Further tests via extended haplotype ho- mozygosity (EHH) (Sabeti et al. 2002) and Fay and Wu’s H statistics (Fay and Wu 2000) (the 44.2 kb region was ana- lyzed) supported the signal of selection in CHB (supplemen- tary fig. S6 and table S3, Supplementary Material online), in line with the previous report using other tests of selection (LSBL statistics, InRH, and Tajima’D test) (Edwards et al. 2010). Conversely, no selective signal was observed in A-A and Europeans (supplementary table S3, Supplementary Material online). rs1800414 is a nonsynonymous SNP located in exon 17 of OCA2, and an A to G mutation causes an amino acid change from histidine to arginine (His615Arg). This mis- sense mutation is located in the melanosomal cytoplasmic topological domain (http://www.uniprot.org/, last accessed September 15, 2015). The geographic distribution of this SNP across world populations (data from HGDP database: http://hgdp.uchicago.edu, last accessed July 10, 2015) showed that the derived G allele of rs1800414 is highly prev- alent in most East Asian populations (G allele 450% in Han Chinese), as well as in American Indians, but is in low fre- quency in A-A populations (G allele = 16.0% [A-A from China]; 21.3% [A-A from Cambodia]), totally absent in Africans, and absent or extremely rare in western Europeans (G allele <0.3%) (fig. 2B). This piece of evidence suggests that rs1800414 is likely an East Asian-specific muta- tion, consistent with the proposed local selection on this SNP and an independent evolution of skin pigmentation in the area (Edwards et al. 2010).

Assuming rs1800414 being the functional SNP under se- lection, we estimated the selective intensity and the time of selection using 1000 Human Genomes Project data (Genomes Project et al. 2012). In CHB, the selective coefficient was estimated to be 0.0157 (95% confidence interval [CI]: 0.0127–0.0186). Assuming a generation time of 25 years, the time of selection onset was estimated at 15,224 years ago (95% CI: 12,818–18,750 years ago), which is slightly older than a recently reported age (10,660 years ago with 95% CI of 8,070–15,780 years ago) using a Hidden Markov model (Chen and Slatking, 2015). The estimated age falls into the Upper Paleolithic, following the proposed northward migra- tion of modern humans in East Asia some 25,000–30,000 years ago (Shi et al. 2005).

We conducted skin phenotyping of CHN (359 individuals) to determine whether the OCA2 SNPs affect skin pigmenta- tion. We analyzed five SNPs located in the 44.2 kb region and three SNPs outside this region with strong selective signals. Under the additive genetic model, we detected a significant association of rs1800414 with skin pigmentation for both ex- posed (P = 5.12� 10�4, hand back) and unexposed (P = 5.34

1179

Skin Lightening during Recent Human Evolution . doi:10.1093/molbev/msw003 MBE

http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://www.espcr.org/micemut/
http://www.espcr.org/micemut/
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://www.uniprot.org/
http://hgdp.uchicago.edu
� 10�6, underarm and P = 1.75 � 10�8, buttock) areas (as- sociation remained significant after Bonferroni correction; fig. 3A), consistent with a previous association analysis of rs1800414 (Edwards et al. 2010; Eaton et al. 2015). No signif- icant association was observed for the other seven SNPs (P4 0.05), except for rs12440942, with a marginal significance for skin on the buttock (P = 0.0395) (supplementary table S4, Supplementary Material online). When conditioning for rs1800414, rs12440942 became nonsignificant (P = 0.632).

Human Skins of Different rs1800414 Genotypes Have Different Melanin Productivity

To test whether rs1800414 affects skin pigmentation, we con- ducted primary cell culture and purification of melanocytes (human foreskin tissue was used) according to published pro- tocols (Cook et al. 2003), and established 11 melanocyte cell lines covering the three genotypes of rs1800414 (4 AA, 3 AG, and 4 GG). We quantified the relative abundance of melanin

in each cell line using the total amount of proteins as a control according to published protocols (Cook et al. 2009). As pre- dicted, homozygotes of the derived G allele (Arg615) had the least amount of melanin (correspondent to a relatively lighter skin), whereas the homozygotes of the ancestral A allele (His615) had the highest amount (correspondent to a rela- tively darker skin), leaving the heterozygotes between the two types of homozygotes (fig. 3B). Surprisingly, the ancestral allele of rs1800414 (His615) is highly conserved among species from zebrafish to human (fig. 3C). The measurement of GERP’s (Genomic Evolutionary Rate Profiling) score suggested a strong functional constraint on this site during evolution (GERP’s score = 366, P = 2.93� 10�72) (http://mendel.stan- ford.edu/SidowLab, last accessed October 20, 2015; Cooper et al. 2005), implying that the Arg615 substitution may change OCA2 function.

Equivalent A and G Alleles of rs1800414 in Zebrafish Have Different Functions

The deficiency of Oca2 leads to hypopigmentation in the skin melanophores and the retinal pigment epithelium, indicating an essential role in melanin synthesis (Beirl et al. 2014). The zebrafish Oca2 protein has His610 residue equivalent to His615 of human OCA2. The underpigmentation of melano- phores depleted of oca2 allows us to test functional relevance of the His610Arg mutation genocopying the A to G substitute at rs1800414 in human. The expression of oca2 could be ef- fectively blocked by injection of an oca2-specific antisense morpholino (oca2-MO) (supplementary fig. S7A, Supplementary Material online) and oca2 morphants ex- hibited progressive loss of melanophore pigmentation in a dose-dependent manner at 36 or 48 h postfertilization (sup- plementary fig. S7B, Supplementary Material online), which phenocopied oca2 mutants (Beirl et al. 2014). Next, we per- formed rescue experiments using wild-type zebrafish oca2 mRNA (wt(A) with His610) or mutant oca2 mRNA (mt(G) with His610Arg mutation). Results indicated that coinjection of 100 pg oca2 wt (A) mRNA largely recovered melanophore pigmentation in oca2 morphants whereas coinjection of 100 pg oca2 mt (G) mRNA failed to do so (fig. 4A and B). Hence, His610 of zebrafish Oca2 protein is important for melanin synthesis, supporting the idea that the rs1800414 variant of human OCA2 may be functionally associated with skin coloration.

Mice Homozygous for G Allele-Like Oca2 (His610Arg) Display Light Coloration

To further test the phenotypic effect of rs1800414 on melanin production and skin pigmentation, using the CRISPR/Cas9 approach (Li et al. 2013; Wang et al. 2013; Yang et al. 2013), we generated two gene-modified mouse strains using black hair C57BL/6 mice (WT), including the Oca2 knockout strain (KO) (a frame-shift mutation leading to a truncated Oca2) and the rs1800414 targeted modification strain with His610Arg mutation resembling human His615Arg mutation (#1 mutation, supplementary fig. S8A and B, Supplementary Material online). We also generated a control strain carrying

Ancestral Allele: A Derived Allele: G

0° 30° 60° 90° 120° 150° -30°

30°

60°

270° 300°

30°

28000 28050 28100 28150 28200 28250 28300 0

0.05

0.10

0.15

0.20

0.25

0.30

Position (kb)

rs1800414

A v er

ag ed

F S

T

44.2 kb

A

B

Reverse strand 344.48 kb

SNP: rs1800414

FIG. 2. Genetic divergence between A-A and CHN and geographic distribution of rs1800414 in world populations. (A) Genetic divergence between A-A and CHN in the sequenced OCA2 gene region (~344.5 kb) (measured by the averaged FST). The averaged FST of each position is the simple moving average with a sliding window size of 10 kb. The rs1800414 (showing the largest FST value) is labeled. (B) Geographic distribution of rs1800414 derived allele frequencies in world popula- tions. The rs1800414 genotyping data of one CHN population and two A-A populations (Cambodia and China, marked with “red circles”) were from this study. Data of the other populations were from the HGDP database (http://hgdp.uchicago.edu/, last accessed July 10, 2015).

1180

Yang et al. . doi:10.1093/molbev/msw003 MBE

http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mendel.stanford.edu/SidowLab
http://mendel.stanford.edu/SidowLab
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://hgdp.uchicago.edu/
two Oca2 synonymous mutations (WT’) (2 and 3 mutations, supplementary fig. S8A and B, Supplementary Material online) in order to rule out any potential effect of these two synonymous mutations introduced when generating the targeted modification using CRISPR/Cas9. No off-target effect was detected when sequencing the top five predicted potential off-targeted sites in all mouse strains (supplemen- tary table S5, Supplementary Material online).

As expected, the KO mice displayed white body hair, white tail skin, white paws, and pink iris (fig. 4C), in line with the phenotypes reported in previous Oca2 studies (Gardner et al. 1992; Rinchik et al. 1993). These results indicate that Oca2 is indeed a key gene for melanin production. Conversely, mice carrying the two Oca2 synonymous mutations (WT’) showed the same coloration with the wild-type controls, that is, black body hair, dark tail skin, dark paws, and black iris (fig. 4C), implying that the two synonymous mutations do not inter- fere Oca2 function.

In line with the phenotypic effect observed in the zebrafish, the targeted modification mice (TM), homozygous for the derived allele of rs1800414, showed light coloration, including brown/gray body hair, light tail skin and light paws, but with no obvious color change seen in the iris (fig. 4C). We con- ducted quantitative measurements of hair and skin pigmen- tation of the four mouse strains. The TM mice showed

a significantly lighter coloration than the WT/WT’ mice (fig. 4E, supplementary fig. S9 and tables S6 and S7, Supplementary Material online). Sequencing of Oca2 RNA transcripts did not detect any change of gene alternative splicing among the mouse strains (supplementary fig. S8C, Supplementary Material online), suggesting that the pheno- typic effect of rs1800414 is caused by protein functional al- teration due to the observed amino acid change.

We compared the distributions of melanosomes in the tail skin among different mouse strains by transmission electron microscope. At cellular level, the TM mice showed a clear reduction of mature melanosomes compared with the WT and WT’ mice and the KO mice appeared to lack mature melanosomes (fig. 4D). These results imply that the derived allele of rs1800414 alters the function of Oca2, ultimately af- fecting the maturation and/or transportation of melano- somes in the melanocytes.

Discussion Here, we identified a set of pigmentation genes showing large genetic divergence between dark-skinned (A-A and YRI) and light-skinned populations (CHN) (supplementary fig. S4 and table S2, Supplementary Material online), which comprise potential candidates contributing to skin lightening among East Asians. The sustained dark skin of A-A people is likely due

P = 0.016

P = 0.015 Human

Chimpanzee

Gibbon

Macaque

Marmoset Rat

Mouse Dog Wild boar Horse

Rabbit

Elephant Chicken King Cobra

Xenopus

Zebrafish

A

B C

*rs1800414

L * L *L *

Hand back Underarm Buttock P = 5.12×10-4 P = 5.34×10-6 P = 1.75×10-8

70

65

60

55

50

45

80

75

70

65

60

55

AA(77) AG(174) GG(108)

AA(4) AG(3) GG(4)

M el

an in

/P ro

te in

(u g/

m g)

100

AA(77) AG(174) GG(108) AA(77) AG(174) GG(108) 40 50

80

75

70

65

60

55

50

80

60

40

20

0

FIG. 3. Genetic association of rs1800414 with skin pigmentation in northern Han Chinese. (A) Measurements of skin darkness (by L* value) of three rs1800414 genotypes. The P values (additive genetic model) are shown in the scatted plots. The y axis represents the L* value; the x axis shows three rs1800414 genotypes with sample sizes in the parentheses. The “error bar” indicates standard deviation. (B) The levels of melanin production in human foreskin melanocytes with different rs1800414 genotypes. Two-tailed t-test was used for statistical evaluation. The y axis represents relative abundance of melanin (Melanin/Protein, mg/mg); the x axis shows the three genotypes with sample size in parentheses. The error bar indicates standard deviation. (C) OCA2 protein sequence (human protein position 614–642) alignment among vertebrate species and rs1800414 (labeled with an asterisk) is highly conserved. The protein sequences were from NCBI (http://www.ncbi.nlm.nih.gov/, last accessed July 10, 2015).

1181

Skin Lightening during Recent Human Evolution . doi:10.1093/molbev/msw003 MBE

http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://www.ncbi.nlm.nih.gov/
to their settling in low latitude tropical areas of Asia, which are similar to those inhabited by our African ancestors. When these groups migrated northward in East Asia about 25,000–30,000 years (Jin and Su 2000; Shi et al. 2005), natural selection would seemingly favor lighter skin, due to the re- duced UV radiation at high latitudes. Though multiple

pigmentation genes might be involved in this transition, our results strongly indicate that OCA2 served as a major- effect gene conferring skin lightening of East Asians.

OCA2 is one of the key players in the pigmentation path- way, and previous reports associated OCA2 with abnormal skin pigmentation, oculocutaneous albinism type 2 (OCA2)

A

B

Class I

Class III Class IV

Class II

Class I

Class III

Class IV

Class II

N = 137 111 107 96

% o

f e m

b ry

o s

0

20

40

60

80

100

oca2-MO

oca2-mRNA – – – 2 ng 2 ng 2 ng

mt(G) 100 pg wt(A) 100 pg

C

D

WT WT’

KO TM

WT WT’

KO TM

WT WT’ KOTM

L *

P 0.05 P < 0.05

E

Hair (backside-left)

Hair (backside-middle)

Hair (backside-right)

Skin (tail-up)

80

70

60

50

40

30

20

10

FIG. 4. Transgenic and targeted gene modification analyses of zebrafishes and mice. (A and B) Effect of oca2 wt(A) or mt(G) mRNA overexpression in oca2 morphants. (A) The embryos at 36 h postfertilization were classified into four classes based on melanophore pigmentation. Class IV, wild-type pigmentation; Class I–III, different levels of pigmentation reduction. (B) Ratios of different classes of embryos. N, the total number of observed embryos. Note that, unlike wt(A) mRNA, mt(G) mRNA coinjection could not efficiently restore pigmentation in oca2-MO injected embryos. (C) Coloration comparison among different mouse strains, including the wild-type strain (WT), the strain containing two synonymous mutations (WT’), the knockout strain (KO), and the targeted modification strain (TM). (D) The distribution of melanosomes in the tail skin of different mouse strains. The “arrow” indicates the melanosomes in sectioned skin. (E) Quantification of hair and skin pigmentation among the mouse strains (WT, WT’, TM, and KO). The y axis indicates the L* value of the four measured body parts with the error bars indicating standard deviations. The P values are shown in the plot (Wilcoxon rank sum test). More details are presented in supplementary tables S6 and S7, Supplementary Material online.

1182

Yang et al. . doi:10.1093/molbev/msw003 MBE

http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
(Gardner et al. 1992; Rinchik et al. 1993; Lee et al. 1994). The protein encoded by OCA2 (a 12-transmembrane protein) is localized on the melanosomal membrane (Rinchik et al. 1993; Rosemblat et al. 1994), and may be involved in the control of L-tyrosine transport (Toyofuku et al. 2002), pH adjustment in melanosome (Puri et al. 2000), glutathione metabolism (Staleva et al. 2002), and tyrosinase processing (Chen et al. 2002). We showed that the nonsynonymous SNP rs1800414 is the most diverged variant between CHN and A-A, and signif- icantly associated with skin pigmentation for the tested skin areas in CHN (fig. 3A). Moreover, the derived G allele homo- zygotes produced the least amount of melanin in the cultured melanocytes, and showed the lightest skin in northern Han Chinese (fig. 3B). This phenotypic effect was confirmed in the gene-modified zebrafishes and mice (fig. 4), suggesting that the His615Arg mutation at rs1800414 affects the maturation and/or transportation of melanosomes, eventually leading to skin lightening.

Given the demonstrated role of OCA2 for skin lightening of East Asians and previous reports of different pigmentation genes responsible for light skin in Europeans (SLC24A5 and SLC45A2) (Graf et al. 2005; Lamason et al. 2005; Izagirre et al. 2006; Yuasa et al. 2006; Lao et al. 2007; Norton et al. 2007; Sabeti et al. 2007; Soejima and Koda 2007; Han et al. 2008; Beleza et al. 2013; Wilde et al. 2014), our data support a con- vergent evolution of skin pigmentation in East Asians (Lao et al. 2007; Myles et al. 2007; Norton et al. 2007; Edwards et al. 2010). Although OCA2 also showed a potential signal of se- lection in Europeans (supplementary table S2, Supplementary Material online; Lao et al. 2007; Norton et al. 2007), the causal SNP rs1800414 enriched in East Asians is nearly absent in Europeans. Intriguingly, we found that the proteins of OCA2, SLC24A5, and SLC45A2 are all trans-membrane pro- teins located on the melanosomal membrane, implying that similar molecular checkpoints were acted on by natural se- lection, eventually leading to independent skin-lightening in both East Asians and Europeans, who live in high latitude areas with weak light conditions.

There are also pigmentation genes (DCT, EGFR, DRD2, and MC1R) reportedly under selection in dark-skinned people (e.g., Africans) (Harding et al. 2000; John et al. 2003; Lao et al. 2007), which have been explained by the protection conferred by melanin against the harmful effects of UV radi- ation, including protection against sunburn and folate de- struction. For light-skinned people, because UV is required for the synthesis of vitamin D, a key nutrient for bone devel- opment (Loomis 1967), and selection on vitamin D synthesis would favor a lighter skin at these high latitude regions and thereby explain the selective signal of OCA2 in northern Han Chinese. However, we cannot rule out other environmental and biological factors (e.g., diet change and/or sexual selec- tion) that may also create selective pressures on skin lighten- ing. Despite differences in the underlying causes, our evidence for a convergent evolution of skin lightening in Europe and East Asia represents an excellent case of how natural selection has shaped our phenotypic diversity in recent human evolu- tionary history.

Materials and Methods

Sample Collection and Skin Pigmentation Measurement

We collected blood samples from 1,159 Austro-Asiatic speak- ing individuals (716 females and 443 males) living in south- western China (Yunnan province) and northeastern Cambodia as well as 359 blood samples (210 females and 149 males) from northern Han Chinese of China (supplemen- tary table S1, Supplementary Material online). Sample collec- tion was conducted randomly and biologically related individuals were excluded from this study. DNA samples were prepared using the standard phenol–chloroform method.

We used a CR-400 tristimulus colorimeter (Konica Minolta, Tokyo, Japan) to measure skin pigmentation, which generates three numerical readings that can express the color of any object: L* (levels of darkness), a* (amount of green or red), and b* (amount of yellow or blue). Both con- stitutive skin pigmentation from skin not exposed to sunlight (underarm and buttock) and facultative skin pigmentation that is exposed to sunlight (back of hand) were measured for comparisons. To minimize technical variations, each area was measured three times, and the average used for statistical analysis. Due to the known seasonal variation in skin pheno- type, skin pigmentation measurements were conducted in the same season (summer of 2011–2012) for all populations, the same seasons with the published data (Shriver and Parra 2000). GraphPad Prism 5 was used to plot L* distribution.

Written informed consent was obtained from each indi- vidual prior to their inclusion in the study. Additional written consent was obtained for use of portrait photos. All protocols of this study were approved by the Institutional Review Board of Kunming Institute of Zoology, Chinese Academy of Sciences (approval No: KMDWS-SEYX-20120101010).

Genome-Wide SNP Analysis

We employed an Affymetrix Genome-wide Human SNP Array 6.0, which includes 906,600 SNPs with an average inter-SNP distance of 1.5 kb, to scan 48 randomly selected Austro-Asiatic individuals from southwestern China (20 Wa, 10 Bulang, and 18 De’ang) (supplementary table S1, Supplementary Material online). The published Affymetrix data of Han Chinese from Beijing (CHB, 89 individuals) (International HapMap et al. 2010) was used to represent northern Han Chinese and compared with the A-A population.

PCA was used to show the relationship of the A-A samples with other populations. The SNPs were pruned based on LD for PCA, no particular genomic regions were removed prior to the PCA.

We also calculated the enrichment score for each FST bin when comparing A-A and CHB among genes involved in the pigmentation pathways and those not involved. We divided the FST values into five bins with the value interval of 0.05 in each bin. Simply, the FST values assigned to each bin were (0–0.05), (0.05–0.1), (0.1–0.15), (0.15–0.2), and (�0.2), as

1183

Skin Lightening during Recent Human Evolution . doi:10.1093/molbev/msw003 MBE

http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
shown in supplementary fig. S4, Supplementary Material online. The genes spanning the SNP were assigned to the relevant FST bin based on the FST value of that SNP. The enrichment score in each bin was calculated based on 1,000 bootstrapping data sets from random SNPs or pigmentation- associated SNPs, the score was the fraction of SNPs in each bin, divided by the fraction of all SNPs from known genes except for the pigmentation associated genes.

To identify genomic regions showing deep divergence be- tween dark-skinned (A-A and YRI) and light-skinned popu- lations (CHN), we used the moments estimate of FST described by Weir and Hill to measure between-population genetic divergence (Weir and Hill 2002). Overlapping with 171 known pigmentation genes from the human genome (Color Genes database, http://www.espcr.org/micemut/, last accessed September 20, 2015) yielded 15 (CHB vs. A-A), 21 (CHB vs. YRI), and 19 (YRI vs. A-A) pigmentation gene regions with large divergence (FST, top 1% gene regions).

Resequencing of OCA2 and Molecular Evolution Analysis

For sequencing, we randomly selected 40 A-A individuals (25 from China and 15 from Cambodia) and designed a series of primers to cover the entire 344.483 kb genomic region of OCA2. Using data from the 1000 Human Genomes Project (Genomes Project et al. 2012), we extracted all OCA2 SNPs of individuals from populations of Han Chinese (CHB 97), west- ern Europeans (CEU 85), and Africans (YRI 88).

Haploview software (D’ algorithm) (Bandelt et al. 1999) was used to analyze LD pattern of OCA2 in different popula- tions. Genetic divergence between A-A and CHN in the se- quenced OCA2 genomic region (~344.5 kb) was measured by the averaged FST. The averaged FST of each position is the simple moving average with a sliding window size of 10 kb. We also used the EHH test described by Sabeti et al. (2002) for detecting recent positive selection displaying incomplete se- lective sweep, that is, the selected allele has not reached fix- ation. We carried out tests of neutrality following the method developed by Fay and Wu (2000), using the program DnaSPv5 (Librado and Rozas 2009). A significant negative Fay and Wu’s H indicates an excess of high-frequency derived alleles due to Darwinian positive selection.

Inferring Allele Age and Intensity of Selection of rs1800414

The SNP data covering a region of 942.7 kb for 97 CHB indi- viduals was obtained from the 1000 Genomes Project (Genomes Project et al. 2012). We used fastPHASE (Scheet and Stephens 2006) to infer the haplotype phase from SNP genotypes. The inferred haplotypes were then used in further analysis. Starting from the putatively selected mutant posi- tion, we ran a hidden Markov model to identify ancestral haplotypes retained around the vicinity of the selected mutant. The extent of ancestral haplotypes was recorded and the likelihood function was built based on the haplotype extent. We adopted the importance sampling approach by Chen and Slatkin (2013) to infer the parameters, including

selection intensity and allele age (see the original paper for the details of the Materials and Methods). A generation time of 25 years was used.

Genetic Association Analysis

According to the result of molecular evolution analysis of OCA2, we chose eight weak linkage SNPs as possible for asso- ciation analysis, including five SNPs located in the 44.2 kb region with strong selective signal and other three SNPs out- side this region. A total of 359 blood samples (210 females and 149 males) from Han Chinese individuals (CHN, light-skinned) from northern China (supplementary table S1, Supplementary Material online) were used for skin pigmentation association analysis. Genotyping was conducted by the SnapShots method on an ABI 3130 sequencer (Applied Biosystems, Forster city, CA, USA). Skin pigmentation association of single SNP with L* (hand back, underarm, and buttock) were tested by utilizing PLINK v1.07 (Purcell et al. 2007), using the linear regression option, with age and sex as covariates.

Melanocyte Culture and In Vitro Analysis of Melanin Synthesis

We collected foreskin tissue samples of male individuals (8–20 years old) who had circumcisions at the local hospitals. Written informed consent was obtained from each donor or donor’s parents prior to sample collection. The published method by Cook et al. (2003) was used to culture and purify melanocytes. We used the melanin analytical method by Cook et al. (2009) to calculate the total amount of melanin and protein of melanocytes in each cell culture dish. Total melanin and protein were isolated from cells with different rs1800414 genotypes (4 AA, 3 AG, and 4 GG). The protocol of this study was approved by the Institutional Review Board of Kunming Institute of Zoology, Chinese Academy of Sciences (approval No: KMDWS-SEYX-20120101010).

Functional Study of oca2 in Zebrafish Embryos

Wild-type embryos of Tuebingen strain were used. Embryos were incubated in Holtfreter’s solution at 28.5�C and staged as previously described (Kimmel et al. 1995). Ethical approval was obtained from the Animal Care and Use Committee of Tsinghua University.

The coding sequence of oca2 was amplified and cloned into the pXT7 vector using enzymatic assembly method (Gibson et al. 2009), with Kozak sequence added to enhance the translation efficiency. The A4G nucleotide substitution at + 1829, which resulted in His610Arg mutation, was made with site direct mutation.

The oca2-MO, synthesized by Gene Tools, LLC, targets 50-C TGCCAAATAAGTGAATGAAATGAT-30 in the 50-untrans- lated region (UTR) of oca2. The reporter construct poca2- EGFP was made by fusing 50-UTR and adjacent coding sequence to EGFP cds in frame and used to test the effective- ness of oca2-MO. Capped mRNAs were synthesized using T7 mMessage mMachine Kit (Ambion) and purified using RNeasy Mini Kit (QIAGEN) according to the manufacturer’s instructions. MO was injected alone or in combination with

1184

Yang et al. . doi:10.1093/molbev/msw003 MBE

http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://www.espcr.org/micemut/
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
mRNA into the yolk of zebrafish embryos at the one-cell stage. Pigment phenotypes were observed and imaged with stereomicroscope (Nikon SMZ 1500). Because higher doses (4150 pg) of oca2 mRNA impaired early embryonic develop- ment, we chose to combine a lower dose (100 pg) of mRNA with a lower dose of oca2-MO.

Targeted Gene Modification Experiment in Mouse

Using a CRISPR design tool (http://crispr.mit.edu/, last accessed July 10, 2015), we selected the sgRNA target se- quence (TAAAGCACAGGATTTCAGAC) for murine Oca2 to direct Cas9 cleavage. The murine Oca2 sgRNA sequence was cloned into pDR274 (a sgRNA cloning vector from Addgene, Cambridge MA) to form a T7 promoter-mediated sgRNA expression vector (Hwang et al. 2013). With DraI digestion, the linearized expression vector was purified using a QIAquick Gel Purification Kit (QIAGEN) and used as a DNA template to produce sgRNA by a MAXIscript T7 kit (Life Technologies). For Cas9 mRNA production, T7 promoter-contained px330 vector (Cong et al. 2013) was digested with NotI and then purified using a QIAquick Gel Purification Kit (QIAGEN). Subsequently, the linearized and purified vector was used as a DNA template to synthesize Cas9 mRNA using a mMESSAGE mMACHINE T7 Ultra Kit (Life Technologies). Cas9 mRNA and sgRNA were purified using RNA Clean & Concentrator-25 (ZYMO Research) and dissolved in RNase- free water. RNA concentrations were measured using a NanoDrop ND1000. Finally, 2 ml each of sgRNA and Cas9 mRNA were mixed with an equal volume of formamide, re- spectively, and the denatured mixtures were run on a DNA agarose gel to evaluate RNA quality.

We synthesized 180-bp single-stranded donor oligonucle- otides (ssODN) to introduce the designed mutations by ho- mologous recombination mediated repair. As shown in the supplementary figure S8, Supplementary Material online, mu- tation #1 is the expected mutation to cause His610Arg sub- stitution (by mutating Oca2 A!G); mutation #2 to delete protospacer adjacent motif sequence of NGG to avoid re- peated Cas9 cleavage of target DNA; mutation #3 to delete PshAI cut site (GACNNNNGTC) for convenience of subse- quent genotyping. Mutations #2 and #3 do not cause amino acid changes. According to the literature(Wang et al. 2013; Yang et al. 2013; Singh et al. 2015) and our previous injection experience (Zhong et al. 2015), sgRNA and Cas9 mRNA mix- ture (containing 50 ng/ml sgRNA, 100 ng/ml Cas9 mRNA, and 100 ng/ml ssODN) were prepared using RNase-free water. RNA was microinjected into the cytoplasm of C57BL/6 inbred zygotes. After injection, surviving zygotes were imme- diately transferred into oviducts of ICR albino pseudopreg- nant females. All mice were maintained under cycles of 12-h light and 12-h dark. The Oca2 genotypes of the generated mouse strains were confirmed by sequencing (supplementary fig. S8B, Supplementary Material online). All animal opera- tions were approved by the Institutional Animal Care and the Institutional Review Board of Baylor College of Medicine and Kunming Institute of Zoology, Chinese Academy of Sciences (approval No: KMDWS-SEYX-20120101010).

Quantitative measurements of hair and skin pigmentation were conducted for the four mouse strains. Adult mice (~5 months) were measured (4 WT, 4 WT’, 6 TM, and 6 KO) with a CR-400 tristimulus colorimeter. Hair pigmentation was measured in three different regions of the back (left, middle, and right), and we also obtained skin pigmentation measures of the tail (supplementary fig. S9, Supplementary Material online).

Transmission electron microscopic analysis was performed. Mouse skin tissues were prepared by conventional methods: fixations, staining, and dehydration, embedding in resin and sectioned to obtain thin slices and viewed under a JEOL JEM- 1101 transmission electron microscope at room temperature. Refer to the published method of Varani et al. (2001) for details.

Supplementary Material Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to all the voluntary donors in this study. Special thanks to persons who agreed to provide their portrait photos for publication in this paper. The authors thank Prof. Yongtang Zheng for his help in collecting blood samples. They also thank Xiaofeng Ma, Shuangjuan Yang, Wenting Wan and Xiaolu Li for their technical assistances. This study was supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13010000); the National Natural Science Foundation of China (91131001, 91231203, 31321002, 31123005, 31371268, and 31371269); the Natural Science Foundation of Yunnan Province (2010CI044); State Key Laboratory of Genetic Resources and Evolution grant (GREKF15-06); and the Personnel Training Project of Yunnan Province (KKSY201526061).

References 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD,

DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, et al. 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491:56–65.

Bandelt HJ, Forster P, Rohl A. 1999. Median-joining networks for infer- ring intraspecific phylogenies. Mol Biol Evol. 16:37–48.

Beirl AJ, Linbo TH, Cobb MJ, Cooper CD. 2014. oca2 regulation of chro- matophore differentiation and number is cell type specific in zebra- fish. Pigment Cell Melanoma Res. 27:178–189.

Beleza S, Johnson NA, Candille SI, Absher DM, Coram MA, Lopes J, Campos J, Araujo II, Anderson TM, Vilhjalmsson BJ, et al. 2013. Genetic architecture of skin and eye color in an African-European admixed population. PLoS Genet. 9:e1003372.

Chaplin G. 2004. Geographic distribution of environmental factors influencing human skin coloration. Am J Phys Anthropol. 125:292–302.

Chen H, Hey J, Slatkin M. 2015. A hidden Markov model for investigating recent positive selection through haplotype structure. Theor Popul Biol. 99:18–30.

Chen H, Slatkin M. 2013. Inferring selection intensity and allele age from multilocus haplotype structure. G3 3:1429–1442.

Chen K, Manga P, Orlow SJ. 2002. Pink-eyed dilution protein controls the processing of tyrosinase. Mol Biol Cell. 13:1953–1964.

1185

Skin Lightening during Recent Human Evolution . doi:10.1093/molbev/msw003 MBE

http://crispr.mit.edu/
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw003/-/DC1
http://www.mbe.oxfordjournals.org/
Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA, et al. 2013. Multiplex genome engineering using CRISPR/Cas systems. Science 339:819–823.

Cook AL, Chen W, Thurber AE, Smit DJ, Smith AG, Bladen TG, Brown DL, Duffy DL, Pastorino L, Bianchi-Scarra G, et al. 2009. Analysis of cultured human melanocytes based on polymorphisms within the SLC45A2/MATP, SLC24A5/NCKX5, and OCA2/P loci. J Invest Dermatol. 129:392–405.

Cook AL, Donatien PD, Smith AG, Murphy M, Jones MK, Herlyn M, Bennett DC, Leonard JH, Sturm RA. 2003. Human melanoblasts in culture: expression of BRN2 and synergistic regulation by fibroblast growth factor-2, stem cell factor, and endothelin-3. J Invest Dermatol. 121:1150–1159.

Cooper GM, Stone EA, Asimenos G, Program NCS, Green ED, Batzoglou S, Sidow A. 2005. Distribution and intensity of constraint in mam- malian genomic sequence. Genome Res. 15:901–913.

Eaton K, Edwards M, Krithika S, Cook G, Norton H, Parra EJ. 2015. Association study confirms the role of two OCA2 polymorphisms in normal skin pigmentation variation in East Asian populations. Am J Hum Biol. 27:520–525.

Edwards M, Bigham A, Tan J, Li S, Gozdzik A, Ross K, Jin L, Parra EJ. 2010. Association of the OCA2 polymorphism His615Arg with melanin content in East Asian populations: further evidence of convergent evolution of skin pigmentation. PLoS Genet. 6:e1000867.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Gardner JM, Nakatsu Y, Gondo Y, Lee S, Lyon MF, King RA, Brilliant MH. 1992. The mouse pink-eyed dilution gene: association with human Prader-Willi and Angelman syndromes. Science 257:1121–1124.

Gibson DG, Young L, Chuang RY, Venter JC, Hutchison CA III, Smith HO. 2009. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods. 6:343–345.

Graf J, Hodgson R, van Daal A. 2005. Single nucleotide polymorphisms in the MATP gene are associated with normal human pigmentation variation. Hum Mutat. 25:278–284.

Han J, Kraft P, Nan H, Guo Q, Chen C, Qureshi A, Hankinson SE, Hu FB, Duffy DL, Zhao ZZ, et al. 2008. A genome-wide association study identifies novel alleles associated with air color and skin pigmenta- tion. PLoS Genet. 4:e1000074

Harding RM, Healy E, Ray AJ, Ellis NS, Flanagan N, Todd C, Dixon C, Sajantila A, Jackson IJ, Birch-Machin MA, et al. 2000. Evidence for variable selective pressures at MC1R. Am J Hum Genet. 66:1351–1361.

HUGO Pan-Asian SNP Consortium, Abdulla MA, Ahmed I, Assawamakin A, Bhak J, Brahmachari SK, Calacal GC, Chaurasia A, Chen CH, Chen J, et al. 2009. Mapping human genetic diversity in Asia. Science 326:1541–1545.

Hwang WY, Fu Y, Reyon D, Maeder ML, Tsai SQ, Sander JD, Peterson RT, Yeh JR, Joung JK. 2013. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nat Biotechnol. 31:227–229.

International HapMap Consortium, Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, et al. 2010. Integrating common and rare genetic variation in diverse human populations. Nature 467:52–58.

Izagirre N, Garcia I, Junquera C, de la Rua C, Alonso S. 2006. A scan for signatures of positive selection in candidate loci for skin pigmenta- tion in humans. Mol Biol Evol. 23:1697–1706.

Jablonski NG, Chaplin G. 2000. The evolution of human skin coloration. J Hum Evol. 39:57–106.

Jin L, Su B. 2000. Natives or immigrants: modern human origin in East Asia. Nat Rev Genet. 1:126–133.

John PR, Makova K, Li WH, Jenkins T, Ramsay M. 2003. DNA polymor- phism and selection at the melanocortin-1 receptor gene in nor- mally pigmented southern African individuals. Ann N Y Acad Sci. 994:299–306.

Kamberov YG, Wang S, Tan J, Gerbault P, Wark A, Tan L, Yang Y, Li S, Tang K, Chen H, et al. 2013. Modeling recent human evolution in mice by expression of a selected EDAR variant. Cell 152:691–702.

Kimmel CB, Ballard WW, Kimmel SR, Ullmann B, Schilling TF. 1995. Stages of embryonic development of the zebrafish. Dev Dyn. 203:253–310.

Lamason RL, Mohideen MA, Mest JR, Wong AC, Norton HL, Aros MC, Jurynec MJ, Mao X, Humphreville VR, Humbert JE, et al. 2005. SLC24A5, a putative cation exchanger, affects pigmentation in zeb- rafish and humans. Science 310:1782–1786.

Lao O, de Gruijter JM, van Duijn K, Navarro A, Kayser M. 2007. Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms. Ann Hum Genet. 71:354–369.

Lee ST, Nicholls RD, Bundey S, Laxova R, Musarella M, Spritz RA. 1994. Mutations of the P gene in oculocutaneous albinism, ocular albi- nism, and Prader-Willi syndrome plus albinism. N Engl J Med. 330:529–534.

Li D, Qiu Z, Shao Y, Chen Y, Guan Y, Liu M, Li Y, Gao N, Wang L, Lu X, et al. 2013. Heritable gene targeting in the mouse and rat using a CRISPR-Cas system. Nat Biotechnol. 31:681–683.

Librado P, Rozas J. 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451–1452.

Loomis WF. 1967. Skin-pigment regulation of vitamin-D biosynthesis in man. Science 157:501–506.

Myles S, Somel M, Tang K, Kelso J, Stoneking M. 2007. Identifying genes underlying skin pigmentation differences among human popula- tions. Hum Genet. 120:613–621.

Norton HL, Kittles RA, Parra E, McKeigue P, Mao X, Cheng K, Canfield VA, Bradley DG, McEvoy B, Shriver MD. 2007. Genetic evidence for the convergent evolution of light skin in Europeans and East Asians. Mol Biol Evol. 24:710–722.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analy- ses. Am J Hum Genet. 81:559–575.

Puri N, Gardner JM, Brilliant MH. 2000. Aberrant pH of melanosomes in pink-eyed dilution (p) mutant melanocytes. J Invest Dermatol. 115:607–613.

Rinchik EM, Bultman SJ, Horsthemke B, Lee ST, Strunk KM, Spritz RA, Avidano KM, Jong MT, Nicholls RD. 1993. A gene for the mouse pink-eyed dilution locus and for human type II oculocutaneous albinism. Nature 361:72–76.

Rosemblat S, Durham-Pierre D, Gardner JM, Nakatsu Y, Brilliant MH, Orlow SJ. 1994. Identification of a melanosomal membrane protein encoded by the pink-eyed dilution (type II oculocutaneous albi- nism) gene. Proc Natl Acad Sci U S A. 91:12071–12075.

Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, et al. 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832–837.

Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, et al. 2007. Genome-wide de- tection and characterization of positive selection in human popu- lations. Nature 449:913–918.

Schallreuter KU, Moore J, Tobin DJ, Gibbons NJ, Marshall HS, Jenner T, Beazley WD, Wood JM. 1999. alpha-MSH can control the essential cofactor 6-tetrahydrobiopterin in melanogenesis. Ann N Y Acad Sci. 885:329–341.

Scheet P, Stephens M. 2006. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 78:629–644.

Shi H, Dong YL, Wen B, Xiao CJ, Underhill PA, Shen PD, Chakraborty R, Jin L, Su B. 2005. Y-chromosome evidence of southern origin of the East Asian-specific haplogroup O3-M122. Am J Hum Genet. 77:408–419.

Shriver MD, Parra EJ. 2000. Comparison of narrow-band reflectance spectroscopy and tristimulus colorimetry for measurements of skin and hair color in persons of different biological ancestry. Am J Phys Anthropol. 112:17–27.

1186

Yang et al. . doi:10.1093/molbev/msw003 MBE

Singh P, Schimenti JC, Bolcun-Filas E. 2015. A mouse geneticist’s practical guide to CRISPR applications. Genetics 199:1–15.

Soejima M, Koda Y. 2007. Population differences of two coding SNPs in pigmentation-related genes SLC24A5 and SLC45A2. Int J Legal Med. 121:36–39.

Staleva L, Manga P, Orlow SJ. 2002. Pink-eyed dilution protein modulates arsenic sensitivity and intracellular glutathione metabolism. Mol Biol Cell. 13:4206–4220.

Su B, Xiao J, Underhill P, Deka R, Zhang W, Akey J, Huang W, Shen D, Lu D, Luo J, et al. 1999. Y-Chromosome evidence for a northward mi- gration of modern humans into Eastern Asia during the last Ice Age. Am J Hum Genet. 65:1718–1724.

Toyofuku K, Valencia JC, Kushimoto T, Costin GE, Virador VM, Vieira WD, Ferrans VJ, Hearing VJ. 2002. The etiology of oculocutaneous albinism (OCA) type II: the pink protein modulates the processing and transport of tyrosinase. Pigment Cell Res. 15:217–224.

Varani J, Spearman D, Perone P, Fligiel SE, Datta SC, Wang ZQ, Shao Y, Kang S, Fisher GJ, Voorhees JJ. 2001. Inhibition of type I procollagen synthesis by damaged collagen in photoaged skin and by collagenase-degraded collagen in vitro. Am J Pathol. 158:931–942.

Wang H, Yang H, Shivalila CS, Dawlaty MM, Cheng AW, Zhang F, Jaenisch R. 2013. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153:910–918.

Weir BS, Hill WG. 2002. Estimating F-statistics. Annu Rev Genet. 36:721–750. Wilde S, Timpson A, Kirsanow K, Kaiser E, Kayser M, Unterlander M,

Hollfelder N, Potekhina ID, Schier W, Thomas MG, et al. 2014. Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 y. Proc Natl Acad Sci U S A. 111:4832–4837.

Yang H, Wang H, Shivalila CS, Cheng AW, Shi L, Jaenisch R. 2013. One- step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering. Cell 154:1370–1379.

Yuasa I, Umetsu K, Harihara S, Kido A, Miyoshi A, Saitou N, Dashnyam B, Jin F, Lucotte G, Chattopadhyay PK, et al. 2006. Distribution of the F374 allele of the SLC45A2 (MATP) gene and founder-haplotype analysis. Ann Hum Genet. 70:802–811.

Zhong H, Chen Y, Li Y, Chen R, Mardon G. 2015. CRISPR-engineered mosaicism rapidly reveals that loss of Kcnj13 function in mice mimics human disease phenotypes. Sci Rep. 5:8366.

1187

Skin Lightening during Recent Human Evolution . doi:10.1093/molbev/msw003 MBE
Genetic evidence for natural selection in humans in the contemporary United States Jonathan P. Beauchampa,1

aDepartment of Economics, Harvard University, Cambridge, MA 02138

Edited by Kenneth W. Wachter, University of California, Berkeley, CA, and approved May 20, 2016 (received for review January 20, 2016)

Recent findings from molecular genetics now make it possible to test directly for natural selection by analyzing whether genetic variants associated with various phenotypes have been under selection. I leverage these findings to construct polygenic scores that use indi- viduals’ genotypes to predict their body mass index, educational at- tainment (EA), glucose concentration, height, schizophrenia, total cholesterol, and (in females) age at menarche. I then examine associ- ations between these scores and fitness to test whether natural se- lection has been occurring. My study sample includes individuals of European ancestry born between 1931 and 1953 who participated in the Health and Retirement Study, a representative study of the US population. My results imply that natural selection has been slowly favoring lower EA in both females and males, and are suggestive that natural selection may have favored a higher age at menarche in females. For EA, my estimates imply a rate of selection of about −1.5 mo of education per generation (which pales in comparisonwith the increases in EA observed in contemporary times). Although they cannot be projected over more than one generation, my results pro- vide additional evidence that humans are still evolving—albeit slowly, especially compared with the rapid changes that have occurred over the past few generations due to cultural and environmental factors.

natural selection | human evolution | educational attainment | menarche | polygenic scores

Whether natural selection has been operating and still oper-ates in modern humans—and at what rate—has been the subject of much debate. Until recently, it was often held that human evolution had come to an end about 40,000–50,000 y ago (see, e.g., ref. 1). However, new evidence that has been accumulating over the last decade suggests that natural selection has been operating in humans over the past few thousand years (2–4) and that a number of adaptations—such as lactase persistence (5), resistance to malaria (6), and adaptation to high altitude (7)—have occurred relatively recently. It has also been shown that height (HGT) and body mass index (BMI) have been under selection in Europeans (8). In parallel, a number of recent studies have sought to examine

the association between lifetime reproductive success (LRS)—the number of children an individual ever gave birth to or fathered— and various phenotypes in contemporary human populations. [In modern populations with low mortality, fitness can be reasonably approximated by LRS (9, 10), notwithstanding some caveats summarized in Discussion.] These studies have typically found that natural selection has been operating in contemporary humans (9, 11–14). It has also been shown that there was significant variance in relative fitness in a preindustrial human population, such that there was much potential for natural selection (15). However, this literature has analyzed the relationship between

phenotypes and LRS, and natural selection occurs only when genotypes that are associated with the phenotypes covary with re- productive success. This literature’s conclusions regarding ongoing natural selection are thus particularly sensitive to assumptions that are needed to estimate the relationship between genotypes and phenotypes and to the inclusion in the analysis of all correlated phenotypes with causal effects on fitness (16, 17). Some of those assumptions have been criticized and debated (e.g., ref. 18), and it has proven challenging to include all relevant correlated phenotypes in analyses of selection in natural populations (17). (The latter point

matters because if, for example, phenotypes 1 and 2 are pheno- typically but not genetically correlated and if only phenotype 2 is under selection, an analysis based on phenotypic data that does not include phenotype 2 or that fails to account for the lack of genetic correlation may erroneously conclude that phenotype 1 is also under selection.) Recent advances in molecular genetics now make it possible to

look directly at the relationship between LRS and genetic vari- ants associated with various phenotypes, thus eliminating those potential confounds. Here, I examine the association between relative LRS (rLRS)—the ratio of LRS to the mean LRS of individuals of the same gender born in the same years—and genetic variants associated with various phenotypes, for a sample of females and males in the Health and Retirement Study (HRS). Using rLRS instead of LRS as the measure of fitness helps control for the effects of time trends in LRS and makes it possible to interpret my estimates as rates of natural selection (9, 16). (My results are robust to using LRS instead of rLRS as the measure of fitness.) The phenotypes I analyze are BMI, educational attainment

(EA), fasting glucose concentration (GLU), HGT, schizophrenia (SCZ), plasma concentrations of total cholesterol (TC), and age at menarche (AAM; in females). These phenotypes were selected on the basis of previous evidence showing that selection acts on some of them (see, e.g., ref. 9) and because summary statistics [i.e., the estimated effects of the single-nucleotide polymorphisms (SNPs) on the phenotypes] from previous large-scale genome-wide asso- ciation studies (GWAS) are available for them (19–25). The HRS is a representative longitudinal panel study of ∼20,000

Americans shortly before and during retirement. It is well suited for this study for several reasons. First, the HRS was designed to be representative of the US population over the age of 50 y (26), which makes it possible to generalize my results to the entire US population of European ancestry born in the years of my study sample. In addition, individuals in the study are in the later stages of their lives, when they have typically completed their lifetime reproduction. Nonetheless, as I explain in Discussion, selection

Significance

I leverage recent advances in molecular genetics to test directly whether genetic variants associated with a number of pheno- types have been under natural selection in the contemporary United States. My finding that natural selection has been slowly occurring for genetic variants associated with educa- tional attainment and (suggestively, in females) for variants associated with age at menarche provides additional evidence that humans are still evolving—albeit slowly and at a rate that cannot account for more than a small fraction of the large changes that have occurred over the past few generations.

Author contributions: J.P.B. designed research, performed research, analyzed data, and wrote the paper.

The author declares no conflict of interest.

This article is a PNAS Direct Submission.

See Commentary on page 7693. 1Email: jonathan.pierre.beauchamp@gmail.com.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1600398113/-/DCSupplemental.

7774–7779 | PNAS | July 12, 2016 | vol. 113 | no. 28 www.pnas.org/cgi/doi/10.1073/pnas.1600398113

http://crossmark.crossref.org/dialog/?doi=10.1073/pnas.1600398113&domain=pdf
mailto:jonathan.pierre.beauchamp@gmail.com
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental
www.pnas.org/cgi/doi/10.1073/pnas.1600398113
bias due to incomplete genotyping of the study participants and differential survival remains a concern, although genotyped indi- viduals do not appear to differ markedly from nongenotyped indi- viduals in my study sample. To mitigate the risks of confounding by population stratifica-

tion, my analyses focus on unrelated individuals of European ancestry and control for the top 20 principal components of the genetic relatedness matrix [which capture the main dimensions along which the ancestry of the individuals in the dataset vary (27)]. To mitigate the risk of selection bias due to differential mortality, and to ensure that the LRS variable is a good proxy for completed fertility, I limit my analyses to individuals born between 1931 and 1953 who were at least 45 y old (for females) or 50 y old (for males) when asked the number of children they ever gave birth to or fathered. I refer to the resulting sample as the “study sample.” I performed my main analyses separately for females and males, as different selection gradients can operate across genders. In a recent paper, Tropf et al. (28) used genetic data to study the

relationship between LRS and age at first birth in a sample of fe- males, and found that the two phenotypes are negatively genetically correlated. My analyses complement theirs in several important ways: My analyses cover both females and males and seven different phenotypes; they include childless individuals (who can have an important impact on the gene pool by foregoing reproduction); they translate selected estimates into interpretable measures of the rate at which natural selection has been operating; and they indirectly leverage the statistical power of previous large-scale GWAS to estimate the relationship between rLRS and genetic variants associated with the phenotypes, thus increasing the pre- cision of my estimates for some phenotypes. [An alternative to my score regression approach (described in Genetic Evidence for Natural Selection) is the bivariate GREML method (29) (used by Tropf et al.), but it is not well suited for the present study. It has very low power and yields imprecise estimates of the genetic cor- relation in samples of moderate size like the HRS, given the low SNP heritability of rLRS (28), according to the GCTA-GREML Power Calculator (30). It also requires a dataset with phenotypic data for every studied phenotype and assumes normally distributed phenotypes (which is not realistic for rLRS).]

Phenotypic Evidence for Natural Selection I begin by looking at the phenotypic evidence for natural selection in the HRS. The HRS contains phenotypic variables for four of the seven phenotypes I study: BMI, EA, HGT, and TC. (The phenotypic variable for TC is an indicator for a self-reported health problem with high cholesterol, and not plasma concentra- tions of TC as in the GWAS of TC.) Table S1 reports summary statistics for these and for the other phenotypic variables I use, both for all individuals in the study sample and for the genotyped individuals in the study sample. As can be seen, the two samples look remarkably similar. Table 1 reports estimates from separate regressions of rLRS on each of these phenotypic variables (and on control variables) for the sample of all individuals (genotyped and not genotyped) in the study sample. Stouter females and males, less educated females and males, and smaller females have sig- nificantly higher rLRS (P < 0.001 in all cases). The estimates for the sample of genotyped individuals are very similar (Table S2), thus suggesting that the two samples are similar in terms of the selection gradients that were operating on the various phenotypes. As mentioned, without assumptions to estimate the relation-

ship between genotypes and phenotypes and without considering all correlated phenotypes with possible causal effects on fitness, it is not possible to translate these estimates into estimates of evolutionary change—even over a single generation. Previous research, however, has established that these phenotypes are all moderately to highly heritable (31); notwithstanding the possible effects of correlated phenotypes, this is suggestive that genotypes associated with BMI, EA, and HGT covary with fitness and that natural selection has been operating on these phenotypes.

Genetic Evidence for Natural Selection To test directly whether natural selection has been operating on the genetic variants associated with BMI, EA, GLU, HGT, SCZ, TC, and AAM, the summary statistics from the latest GWAS of these phenotypes were used to construct polygenic scores that partially predict the genotyped individuals’ phenotypes based on their genotyped SNPs. To avoid overfitting (32), the GWAS summary statistics used are all based on meta-analyses that ex- clude the HRS. For the main analyses, LDpred (33) was used to construct the scores. LDpred uses a prior on the SNPs’ effect sizes and adjusts summary statistics for linkage disequilibrium (LD) between SNPs to produce scores that have higher pre- dictive power than the alternatives. (My results are robust to using scores constructed with PLINK (34), which does not adjust the summary statistics for LD between SNPs.) The scores were standardized to have mean zero and an SD of 1. Additional details on the construction of the scores are provided inMaterials and Methods and in SI Materials and Methods. Fig. 1 shows the highest previously reported R2 of the scores from

the articles reporting the GWAS whose summary statistics were used to construct the scores, as well as the incremental R2 of the scores of BMI, EA, HGT, and TC (for which there are phenotypic variables in the HRS) in the study sample. (The incremental R2 of the score of a phenotype is defined as the difference between the R2 of the re- gression of the phenotype on controls for sex and birth year, the top 20 principal components of the genetic relatedness matrix, and the score, and the R2 of the same regression but without the score.) The incremental R2 estimates range from 0.012 (for TC) to 0.174 (for HGT) and are all significantly larger than zero; nonetheless, they are all much smaller than estimates of the phenotypes’ herita- bility in the existing literature (31), implying that the scores are very imperfect proxies for the individuals’ true genetic scores (defined as the sum of the true average causal effects of all their alleles) for the various phenotypes. [The R2 of the polygenic score of a phenotype is bounded by the phenotype’s heritability (32) and depends, in part, on the precision with which the effects of the individual genetic variants were estimated in the GWAS of that phenotype, which, in turn, depends on the GWAS sample size. Future, larger GWAS should allow more precise estimation of the effects of the genetic variants and the construction of more precise scores.] Table 2 reports estimates from separate regressions of rLRS on

the polygenic scores of the various phenotypes [and on control variables, which include the top 20 principal components of the genetic relatedness matrix (27)]. The score of EA is significantly negatively associated with rLRS for both females (P = 0.002) and males (P = 0.013). The association remains significant after Bonferroni correction for 13 tests (the number of estimates reported in Table 2) for females, but not for males (Bonferroni- corrected P = 0.022 for females, = 0.174 for males). The estimates for females and males are similar, and the association is also sig- nificant in the sample of females and males together (Table S3; P = 1.2 × 10−5) and remains significant after Bonferroni correction for seven tests (the number of phenotypes) (Bonferroni-corrected P = 8.3 × 10−5). Fig. 2 shows the mean polygenic score of EA as a

Table 1. Estimates from separate regressions of rLRS on each phenotypic variable, for all individuals

Variable

Females Males

Coefficient estimate N Coefficient estimate N

BMI 0.008*** (0.001) 6,396 0.006*** (0.002) 5,431 EA −0.057*** (0.003) 6,403 −0.022*** (0.003) 5,419 HGT −0.006*** (0.001) 6,411 −0.001 (0.001) 5,435 TC 0.000 (0.021) 4,152 −0.027 (0.026) 3,078

This table shows estimates of the coefficients on the phenotypic variables and their SEs (in parentheses) from separate regressions of rLRS on each phenotypic variable and on control variables, for all individuals (genotyped and not genotyped) in the study sample. (***P < 0.01.)

Beauchamp PNAS | July 12, 2016 | vol. 113 | no. 28 | 7775

SO CI A L SC

IE N CE

S EV

O LU

TI O N

SE E CO

M M EN

TA RY

http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST1
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST2
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST3
function of LRS, by sex. Both females and males who had no children have a significantly higher mean score of EA than those who had one or more children (P < 0.005 in both cases, unpaired t tests). Thus, the negative association between rLRS and the score of EA appears to be driven primarily by score differences between individuals with and without children. The polygenic score of AAM is also significantly and positively

associated with rLRS for females at the 10% level (P = 0.080), but this association does not remain significant after Bonferroni correction for 13 tests. I therefore interpret it as being weakly suggestive that genetic variants associated with higher AAM may have been selected for. The polygenic scores of the other pheno- types (BMI, GLU, HGT, SCZ, and TC) are not robustly signifi- cantly associated with rLRS. Although these estimates are small in magnitude and insignificant, this could be due to lack of statistical power and because my polygenic scores are imperfect proxies for the true genetic scores, and does not prove that natural selection has not been operating on genetic variants associated with those phenotypes. According to the Robertson−Price identity (35, 36), the di-

rectional selection differential of a “character” is equal to the genetic covariance between the character and relative fitness. (A character is an observable feature of an organism, and its di- rectional selection differential is the change in its mean value due to natural selection in one generation.) As I show in SI Materials and Methods, if we define the polygenic scores as the characters of interest, it follows that the coefficients on the scores reported in Table 2 can be interpreted as the directional selection differentials of the scores themselves, expressed in Haldanes—i.e., each co- efficient equals the implied change in the score that will occur due to natural selection in one generation, expressed in SDs of the score per generation. Hence, the estimates from Table 2 imply that natural selection has been operating on the score of EA at a rate of −0.033 Haldanes among females and of −0.031 Haldanes among males in the study sample. (Even if the mechanism that underlies the negative association between rLRS and the score of EA is that more educated people choose to have fewer children, it would still be the case that natural selection has been operating.) I rescaled these estimates of the directional selection differ-

ential of the score of EA to express them in years of educa- tion per generation rather than in Haldanes (SI Materials and

Methods). My rescaled estimates imply that natural selection has been operating on the score of EA at rates of −0.022 [95% confidence interval (CI): −0.036 to −0.009] and −0.022 (95% CI: −0.040 to −0.004) years of education per generation for females and males—or about −1 wk of education per gener- ation for both sexes. I also obtained estimates of the directional selection differ-

ential of EA (or, equivalently, of the true genetic score of EA), which is equal (under some assumptions) to the directional se- lection differential of the polygenic score of EA multiplied by the ratio of the heritability of EA to the R2 of the score of EA (SI Materials and Methods). [I assume the heritability of EA to be 0.40, based on a recent meta-analysis of existing heritability es- timates of EA (37).] Generalizing my results from the study sample to the general population, my estimates imply that nat- ural selection has been operating on EA at rates of −1.30 (95% CI: −2.12 to −0.54) and −1.53 (95% CI: −2.85 to −0.31) months of education per generation among US females and males of European ancestry born between 1931 and 1953. As I discuss in Discussion, these rates are small relative to the increases in EA that have been observed over the past few generations. I performed a number of checks to verify the robustness of my

results. First, I repeated the analyses with LRS instead of rLRS. Second, I used polygenic scores constructed with PLINK (34) in- stead of LDpred. Third, I only included individuals aged no more than 70 y in 2008 (the last year in which individuals were genotyped) and at least 50 y old (for females) or 55 y old (for males) when asked their number of children—to mitigate the risk of selection bias due to differential mortality and to ensure that almost every individual had completed fertility when asked his or her number of children. Fourth, I included the HRS0 cohort of individuals born between 1924 and 1930 together with the study sample. (As detailed in Materials and Methods, I define cohorts based on the individuals’ birth years; the study sample includes the HRS1, HRS2, and HRS3 cohorts, but excludes the HRS0 cohort because of possible selection bias based on mortality.) Table S4 presents the results of those checks. In all cases, the results for EA are robust. Further, for the score of EA for both females and males and for the score of AAM (for females), the estimates are not significantly different from one another at the 5% level across the HRS1, HRS2, and HRS3 cohorts (Table S3 and t tests of the interactions between the coefficients on the scores and cohort dummies). Following Lande and Arnold (16), I also estimated quadratic

regressions of rLRS on all of the polygenic scores and their squares and interactions together and on control variables, to test for nonlinear selection (Table S5 and SI Materials and Methods). I found no convincing evidence that nonlinear selec- tion has been operating on the genetic variants associated with the various phenotypes.

0

0.04

0.08

0.12

0.16

0.20

R 2

BMI EA HGT SCZ TC row

Previously Reported Estimated in HRS

95% C.I.

(No data)

(No data)

Fig. 1. Predictive power (R2) of polygenic scores of the various phenotypes. “Previously reported” denotes the highest previously reported R2 of scores from prediction analyses from the articles reporting the GWAS whose summary statis- tics were used to construct the scores of BMI (19), EA (20), HGT (22), and SCZ (23) (the R2 of the score was not reported for GLU, AAM, and TC). The previously reported R2 for SCZ is the R2 on the liability scale. “Estimated in HRS” denotes estimates of the incremental R2 of the LDpred scores used in this article and for which there are phenotypic variables in the HRS, with percentile confidence in- tervals estimatedwith the nonparametric bootstrapwith 1,000 bootstrap samples.

Table 2. Estimates from separate regressions of rLRS on the polygenic score of each phenotype

Score Females Males

Score of BMI 0.006 (0.010) 0.016 (0.013) Score of EA −0.033*** (0.010) −0.031** (0.012) Score of GLU 0.009 (0.010) −0.013 (0.013) Score of HGT −0.011 (0.014) −0.005 (0.018) Score of SCZ −0.001 (0.011) 0.009 (0.013) Score of TC −0.012 (0.011) −0.003 (0.013) Score of AAM 0.018* (0.011) — N 3,416 2,571

This table shows estimates of the coefficients on the polygenic scores and their SEs (in parentheses) from separate regressions of rLRS on the polygenic score of each phenotype and on control variables, for the study sample. All regressions for each sex had the same number of observations. (*P < 0.10; **P < 0.05; ***P < 0.01.)

7776 | www.pnas.org/cgi/doi/10.1073/pnas.1600398113 Beauchamp

http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST4
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST3
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST5
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
www.pnas.org/cgi/doi/10.1073/pnas.1600398113
Discussion My results suggest that natural selection has been operating on the genetic variants associated with EA, and possibly with AAM. Although I find no evidence that natural selection has been oper- ating on the genetic variants associated with the other phenotypes or that nonlinear selection has been operating, I emphasize that this could be because my polygenic scores are imperfect proxies for the true genetic scores, which limits the statistical power of my analyses. My estimates of the negative associations between rLRS and

both phenotypic EA and the polygenic score of EA are consistent with previous findings of negative associations between LRS and phenotypic EA in samples of females (38–41), males (38, 41), and females and males together (42) in contemporary Western pop- ulations, although positive phenotypic associations have also been reported for males (39). My estimates are also consistent with concurrent findings of a negative phenotypic association for females and males together and of a negative correlation between LRS and a score of EA (constructed with the summary statistics from an earlier, smaller GWAS of EA) (43). To my knowledge, few articles have investigated the relationship between phenotypic AAM and LRS in contemporary Western populations. Kirk et al. (44) find a quadratic relationship that is suggestive of stabilizing selection, but they find no genetic covariation (using behavioral genetic techniques in a sample of twins). Consistent with the results of my regressions of rLRS on phenotypic BMI and HGT (Table 1), there is previous phenotypic evidence of positive selection for weight and negative selection for HGT in females (9, 45). Pre- vious studies have also documented a positive (9) and an inverted-U (45) relationship between LRS and phenotypic HGT in males, and have found evidence of negative selection for TC and of stabilizing selection for GLU in females (11), in contem- porary Western human populations. Consistent with the results from previous studies with phenotypic

data (e.g., ref. 9), my results suggest that natural selection has been operating slowly relative to the rapid changes that have occurred over the past few generations, presumably due to cultural and environmental factors. For instance, my estimate of a directional selection differential of EA of about −1.5 mo of education per generation pales in comparison with the increase of 6.2 y in the mean level of EA that took place for native-born Americans born between 1876 and 1951 (46) (which is equivalent to about 2 y of education per generation). Moreover, although I find suggestive evidence that genetic variants associated with higher AAM may have been selected for, AAM has substantially decreased in con- temporary Western populations (47). Also, although I find no ev- idence of selection for the genetic variants associated with BMI

and HGT, both phenotypes have markedly increased over the past century (48). Thus, although natural selection is still operating, the environment appears to have achieved an “evolutionary override” (28) on the measurable phenotypes I study. As shown in Okbay et al. (20), the association between the score

of EA and EA is not likely to be driven by the effects of culture, the environment, or population stratification, and is likely to re- flect the true causal effects of multiple genetic variants. For in- stance, in cohorts that are independent of those used in the GWAS of EA, the score remains significant in regressions of EA on the score when family fixed effects are also included. More- over, estimates from an LD score regression (49)—which disen- tangles the signal due to the genetic variants’ causal effects from the signal due to confounding biases—suggest that stratification is not a major source of bias in the GWAS summary statistics of EA. Okbay et al. also analyzed the summary statistics of EA and obtained sizeable and significant estimates of the genetic corre- lation between EA and several neuropsychiatric and cognitive phenotypes, as well as of the genetic variance of EA accounted for by SNPs annotated to the central nervous system relative to other SNPs. Thus, although it is not possible to rule out with certainty that my results are (at least partly) confounded by stratification, stratification is unlikely to be an important concern. Several additional caveats should be kept in mind when inter-

preting my results. First, rLRS is not a perfect proxy for long-term genetic contribution. Among other possible reasons for this, a tradeoff between the quantity and quality of children has been documented in preindustrial human societies and may still exist in modern societies (50). In the presence of such a tradeoff, the number of grandchildren or third-generation descendants is a better measure of fitness—although most datasets (including the HRS) lack such data, and it has been shown that LRS and number of grandoffspring were perfectly genetically correlated in a postindustrial human population (10). Also, in growing populations, individuals who suc- cessfully reproduce earlier in life tend to have higher fitness (51), but rLRS does not account for fertility timing. In the case of EA, indi- viduals with high EA typically have children at a more advanced age, which may further reduce their fitness. Alternative measures of fit- ness—such as the intrinsic rate of increase (the exponentiated Lotka’s r)—account for fertility timing, but they require data on the age at birth of every offspring and do not always perform better in natural populations (52). A second caveat is that it is not possible to translate my estimates into projected evolutionary changes over more than one generation, because my results do not account for the effects of all phenotypes that correlate genetically with the pheno- types I study and that also have causal effects on fitness (16). (Se- lection on phenotypes that are genetically correlated with the phenotypes of interest impacts their genetic covariance, which, in turn, impacts the selection gradients on the phenotypes of interest in future generations.) Furthermore, because the cultural environment changes through time, the selection gradients that existed from 1931 to 1953 may not apply to earlier and subsequent periods, which makes long-term projections problematic. For instance, it has been shown that the demographic transition has significantly changed the selective forces in some populations (41, 53–55). Lastly, there are several reasons why my results in the study

sample of genotyped individuals might not be fully generalizable to the entire US population of European ancestry born between 1931 and 1953. First, the HRS only targets individuals who survived until age 50 y, and about 10% of female and 15% of male Americans born in 1940 died before reaching age 50 y, based on data from the US Social Security Administration (56). Second, in the study sample, only 85% of the participants were still alive in 2008 (the last year when they could be genotyped), 69% were asked to be genotyped, and 59% consented to be genotyped. That being said, a comparison of the summary sta- tistics for all individuals in the study sample and for the geno- typed individuals in the study sample (Table S1) suggests that there are no important differences between the two samples, and the results of the phenotypic regressions are very similar across the two samples (Table 1 and Table S2).

-0.2

-0.1

0

0.1

0.2

S co

re o

f E A

0 1 2 3 4 5+ LRS

Females

0 1 2 3 4 5+ LRS

Males

Mean score 95% C.I.

Fig. 2. Mean polygenic score of EA as a function of LRS, for females and males in the study sample.

Beauchamp PNAS | July 12, 2016 | vol. 113 | no. 28 | 7777

SO CI A L SC

IE N CE

S EV

O LU

TI O N

SE E CO

M M EN

TA RY

http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST1
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST2
In sum, and keeping those limitations in mind, my results strongly suggest that genetic variants associated with EA have slowly been selected against among both female and male Americans of European ancestry born between 1931 and 1953, and that natural selection has thus been occurring in that pop- ulation—albeit at a rate that pales in comparison with the rapid changes that have occurred in recent generations. My results also suggest that genetic variants positively associated with AAM may have been positively selected for among females in that pop- ulation. As larger GWAS are conducted and better estimates of genetic variants’ effects on various phenotypes become available, polygenic scores will become more precise. The eventual com- pletion of a GWAS of LRS will also make it possible to use other methods, such as LD score regressions (57), to estimate the genetic covariance between LRS and other phenotypes. Future studies that address the above-mentioned limitations will be able to leverage these developments to replicate my results and to obtain more precise estimates of the rate at which natural se- lection has been and is occurring in humans.

Materials and Methods The Study Sample and the Cohorts. The HRS is a longitudinal panel study for which a representative sample of ∼20,000 Americans have been surveyed every 2 y since 1992. My main analyses focus on individuals born between 1931 and 1953. To reduce the risks of confounding by population stratifi- cation, I restrict the analyses to unrelated individuals of European ancestry (i.e., non-Hispanic White individuals). To ensure that the LRS variable is a good proxy for completed fertility, I only include females who were at least 45 y old and males who were at least 50 y old when asked the number of children they ever gave birth to or fathered. Further, to ensure that the sample of individ- uals who have been successfully genotyped (whose DNA samples were col- lected between 2006 and 2008) is comparable to the sample of individuals who have not, I only include individuals who were enrolled in the HRS and asked the number of children they ever gave birth to or fathered in 2008 or earlier. This left 6,414 females and 5,436 males with phenotypic data and 3,416 unrelated females and 2,571 unrelated males who have been successfully genotyped and who passed the quality control filters described in SI Materials and Methods (and for whom I could thus construct polygenic scores). I refer to the resulting sample as the “study sample.”

For some specifications, I divided the study sample into three nonoverlapping cohorts based on the individuals’ birth years. This allowed me to test the ro- bustness of my results across cohorts (my definition of the cohorts resembles the definition used by the HRS, which recruited its different cohorts at different times). Table S6 summarizes the three cohorts—which I label HRS1 (birth years 1931–1941), HRS2 (birth years 1942–1947), and HRS3 (birth years 1948–1953)— as well as the HRS0 cohort of individuals born between 1924 and 1930. To mitigate the risk of selection bias based on mortality, I excluded the HRS0 co- hort of individuals born between 1924 and 1930 from the study sample (Table S6 and SI Materials and Methods), but my main results are robust to the in- clusion of that cohort (Table S4). (Table S7 reports results for the HRS0 cohort.) For the same reason, I excluded individuals born before 1924 from the study sample. I also excluded individuals born after 1953 from the study sample, as very few of them have been genotyped.

Phenotypic Variables. For my baseline analyses, I operationalize relative fit- ness with the rLRS variable. As Fig. S1 shows, LRS for females and males declined gradually between 1931 and 1953, from around three children in the early 1930s to two children around 1950. Table S1 presents summary statistics for birth year, LRS, and childlessness, as well as for the phenotypic variables for BMI, EA, HGT, and TC. SI Materials and Methods provide details on the construction of these variables. The HRS does not contain phenotypic variables for GLU, SCZ, and AAM (in females).

Quality Control of the Genotypic Data and Polygenic Scores. I followed the HRS recommendations regarding the use of the genotypic data (58). The indi- viduals’ genotyped SNPs that passed the quality control filters and that were present in the phenotypes’ GWAS summary statistics files were used to construct the polygenic scores. Depending on the phenotype, there were between 505,254 and 544,493 such overlapping SNPs (except for GLU, for which there were only 22,895 such overlapping SNPs). The average sample sizes across the SNPs used to construct the scores are �NBMI = 232,186, �NEA = 386,098, �NHGT = 243,630, and �NTC = 92,793 individuals; the summary statistics for GLU, SCZ, and MEN did not contain sample size information, but the reported samples sizes for the main GWAS of these phenotypes are

NGLU = 133,010, NSCZ ≈80,000, and NMEN = 132,989 individuals. The GWAS summary statistics used to construct the scores are all based on meta- analyses that exclude the HRS.

For the main analysis, I used LDpred (33) to construct the polygenic scores; for a robustness check, I also constructed polygenic scores with PLINK (34). [The polygenic scores of EA were constructed and provided to me by the Social Science Genetic Association Consortium (SSGAC), following the pro- cedure described here and which I used to construct the other scores.] Both the LDpred and the PLINK scores for an individual are weighted sums of the individual’s genotype across all SNPs. For the PLINK scores, the weight for each SNP is the GWAS estimate of the SNP’s effect, which captures the causal effects of both the SNP and of SNPs that are in LD; for the LDpred scores, the weight for each SNP is the LDpred estimate of the SNP’s causal effect, which LDpred calculates by adjusting the SNPs’ GWAS estimates with a prior on the SNPs’ effect sizes and information on the LD between the SNPs from a ref- erence panel. The LDpred prior on the SNPs’ effect sizes depends on an as- sumed Gaussian mixture weight. For each phenotype, LDpred scores were constructed for a range of Gaussian mixture weights, and I selected the score with the weight that maximizes the incremental R2 of the score or the correlations between the score and known correlates of the phenotype. Both the LDpred and PLINK scores were standardized to have mean zero and an SD of 1. SI Materials and Methods provide more information on the quality control steps and the construction of the polygenic scores, and Table S8 shows the parameters used to construct the scores and the sources for each phenotype’s summary statistics.

Association Analyses. For each of BMI, EA, HGT, and TC—for which phenotypic variables are available in the HRS—I regressed rLRS on the corresponding phenotypic variable, separately for females and males; those regressions included birth year dummies and HRS-defined cohort dummies and were estimated by ordinary least squares (OLS). For all phenotypes, I also regressed rLRS on the polygenic score of the phenotype in various samples; the regressions also included birth year dummies, HRS-defined cohort dummies, and the top 20 principal components of the genetic relatedness matrix [to control for population stratification (27); see also section 5 of the supplemental material of ref. 59)], and were also estimated by OLS. For the regressions in the sample of females and males together, I also controlled for sex and only included the respondent with the lowest person number (PN, an HRS identifier) in each household, as spouses very often have the same number of children, which induces a complex correlation structure between the error terms (the results for the score of EA are robust to alternative ways of selecting one respondent per household). In most results tables, I report the coefficient estimates and SEs, with asterisks to indicate statistical sig- nificance; P values for the main results are reported in Table S2. The SEs and P values of my estimates from regressions of rLRS on the LDpred scores do not account for the uncertainty stemming from the selection of the Gaussian mixture weights for the LDpred scores; however, the fact that my main re- sults are robust to the use of the PLINK scores instead of the selected LDpred scores implies that my results are not driven by this weight selection procedure.

Directional Selection Differentials. Based on the Robertson−Price identity (35, 36), the directional selection differential of a character is equal to its genetic covariance with relative fitness. As I show in SI Materials and Methods, it follows that the estimates of the coefficients on the polygenic scores reported in Table 2 can be interpreted as directional selection dif- ferentials of the scores, expressed in Haldanes (1 Haldane is 1 SD per generation). SI Materials and Methods also show how to rescale the esti- mates of the directional selection differential for the score of EA to ex- press them in years of education per generation instead of in Haldanes, and shows how to obtain estimates of the directional selection differential of EA (or, equivalently, of the true genetic score of EA—rather than of the polygenic score of EA) expressed in years of education per generation (under some assumptions). I used the nonparametric bootstrap method with 1,000 bootstrap samples to obtain percentile confidence intervals for the estimates of the directional selection differentials.

Institutional Review of this Project. This project was reviewed and approved by the National Bureau of Economic Research (NBER) Institutional Review Board. The Harvard University Committee on the Use of Human Subjects also reviewed the protocol for this project and determined that it is not human subjects research.

ACKNOWLEDGMENTS. I thank David Cesarini, Joseph Henrich, Lawrence Katz, Iain Mathieson, Steven Pinker, Alkes Price, Stephen Stearns, and Peter

7778 | www.pnas.org/cgi/doi/10.1073/pnas.1600398113 Beauchamp

http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST6
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST6
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST6
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST4
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST7
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=SF1
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST1
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST8
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST8
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=ST2
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398SI.pdf?targetid=nameddest=STXT
www.pnas.org/cgi/doi/10.1073/pnas.1600398113
Visscher for helpful comments. I also thank Dan Benjamin and David Laibson for helpful comments and postdoctoral supervision. The polygenic scores of EA were accessed under Section 4 of the Data Sharing Agreement of the Social Science Genetic Association Consortium (SSGAC) and were constructed and provided by

the SSGAC; I thank Aysu Okbay for constructing these scores on behalf of the SSGAC. I contributed to the GWAS of EA reported in Okbay et al. (20); in accor- dance with SSGAC policy, I acknowledge the remaining authors of that paper in the SI Appendix.

1. Gould SJ (2000) The spice of life. Leader Leader 15(Winter):14–19. 2. Voight BF, Kudaravalli S, Wen X, Pritchard JK (2006) A map of recent positive selection

in the human genome. PLoS Biol 4(3):e72. 3. Mathieson I, et al. (2015) Genome-wide patterns of selection in 230 ancient Eurasians.

Nature 528(7583):499–503. 4. Pickrell JK, et al. (2009) Signals of recent positive selection in a worldwide sample of

human populations. Genome Res 19(5):826–837. 5. Tishkoff SA, et al. (2007) Convergent adaptation of human lactase persistence in

Africa and Europe. Nat Genet 39(1):31–40. 6. Kwiatkowski DP (2005) How malaria has affected the human genome and what hu-

man genetics can teach us about malaria. Am J Hum Genet 77(2):171–192. 7. Yi X, et al. (2010) Sequencing of 50 human exomes reveals adaptation to high alti-

tude. Science 329(5987):75–78. 8. Turchin MC, et al.; Genetic Investigation of ANthropometric Traits (GIANT) Consor-

tium (2012) Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat Genet 44(9):1015–1019.

9. Stearns SC, Byars SG, Govindaraju DR, Ewbank D (2010) Measuring selection in con- temporary human populations. Nat Rev Genet 11(9):611–622.

10. Zietsch BP, Kuja-Halkola R, Walum H, Verweij KJH (2014) Perfect genetic correlation between number of offspring and grandoffspring in an industrialized human pop- ulation. Proc Natl Acad Sci USA 111(3):1032–1036.

11. Byars SG, Ewbank D, Govindaraju DR, Stearns SC (2010) Colloquium papers: Natural selection in a contemporary human population. Proc Natl Acad Sci USA 107(Suppl 1):1787–1792.

12. Milot E, et al. (2011) Evidence for evolution in response to natural selection in a contemporary human population. Proc Natl Acad Sci USA 108(41):17040–17045.

13. Elguero E, et al. (2015) Malaria continues to select for sickle cell trait in Central Africa. Proc Natl Acad Sci USA 112(22):7051–7054.

14. Stulp G, Barrett L, Tropf FC, Mills M (2015) Does natural selection favour taller stature among the tallest people on Earth? Proc Biol Sci 282(1806):20150211.

15. Courtiol A, Pettay JE, Jokela M, Rotkirch A, Lummaa V (2012) Natural and sexual selection in a monogamous historical human population. Proc Natl Acad Sci USA 109(21):8044–8049.

16. Lande R, Arnold SJJ (1983) The measurement of selection on correlated characters. Evolution 37(6):1210–1226.

17. Morrissey MB, Kruuk LEB, Wilson AJ (2010) The danger of applying the breeder’s equation in observational studies of natural populations. J Evol Biol 23(11):2277–2288.

18. Kamin LJ, Goldberger AS (2002) Twin studies in behavioral research: A skeptical view. Theor Popul Biol 61(1):83–95.

19. Locke AE, et al.; LifeLines Cohort Study; ADIPOGen Consortium; AGEN-BMIWorking Group; CARDIOGRAMplusC4D Consortium; CKDGen Consortium; GLGC; ICBP;MAGIC Investigators; MuTHER Consortium; MIGen Consortium; PAGE Consortium; ReproGen Consortium; GENIE Consortium; International Endogene Consortium (2015) Genetic studies of bodymass index yield new insights for obesity biology. Nature 518(7538):197–206.

20. Okbay A, et al. (2016) Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533(7604):539–542.

21. Scott RA, et al.; DIAbetes Genetics Replication and Meta-analysis (DIAGRAM) Consor- tium (2012) Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nat Genet 44(9):991–1005.

22. Wood AR, et al.; Electronic Medical Records and Genomics (eMEMERGEGE) Consor- tium; MIGen Consortium; PAGEGE Consortium; LifeLines Cohort Study (2014) De- fining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet 46(11):1173–1186.

23. Ripke S, et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014) Biological insights from 108 schizophrenia-associated genetic loci. Nature 511(7510):421–427.

24. Teslovich TM, et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466(7307):707–713.

25. Perry JRB, et al.; Australian Ovarian Cancer Study; GENICA Network; kConFab; Life- Lines Cohort Study; InterAct Consortium; Early Growth Genetics (EGG) Consortium (2014) Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature 514(7520):92–97.

26. Sonnega A, et al. (2014) Cohort profile: The Health and Retirement Study (HRS). Int J Epidemiol 43(2):576–585.

27. Price AL, et al. (2006) Principal components analysis corrects for stratification in ge- nome-wide association studies. Nat Genet 38(8):904–909.

28. Tropf FC, et al. (2015) Human fertility, molecular genetics, and natural selection in modern societies. PLoS One 10(6):e0126821.

29. Lee SH, Yang J, Goddard ME, Visscher PM, Wray NR (2012) Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics 28(19):2540–2542.

30. Visscher PM, et al. (2014) Statistical power to detect genetic (co)variance of complex traits using SNP data in unrelated samples. PLoS Genet 10(4):e1004269.

31. Polderman TJC, et al. (2015) Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet 47(7):702–709.

32. Wray NR, et al. (2013) Pitfalls of predicting complex traits from SNPs. Nat Rev Genet 14(7):507–515.

33. Vilhjálmsson BJ, et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study (2015) Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am J Hum Genet 97(4):576–592.

34. Purcell S, Chang C (2015) PLINK 1.9. Available at https://www.cog-genomics.org/plink2. 35. Robertson A (1966) A mathematical model of the culling process in dairy cattle. Anim

Prod 8(1):95–108. 36. Price GR (1970) Selection and covariance. Nature 227(5257):520–521. 37. Branigan AR, McCallum KJ, Freese J (2013) Variation in the heritability of educational

attainment: An international meta-analysis. Soc Forces 92(1):109–140. 38. Weeden J, Abrams MJ, Green MC, Sabini J (2006) Do high-status people really have fewer

children?: Education, income, and fertility in the contemporary U.S. Hum Nat 17(4):377–392. 39. Fieder M, Huber S (2007) The effects of sex and childlessness on the association between

status and reproductive output in modern society. Evol Hum Behav 28(6):392–398. 40. Huber S, Bookstein FL, Fieder M (2010) Socioeconomic status, education, and reproduction

in modern women: An evolutionary perspective. Am J Hum Biol 22(5):578–587. 41. Skirbekk V (2008) Fertility trends by social status. Demogr Res 18(5):145–180. 42. Hopcroft RL (2006) Sex, status, and reproductive success in the contemporary United

States. Evol Hum Behav 27(2):104–120. 43. Conley D, et al. (2016) Assortative mating and differential fertility by phenotype and

genotype across the 20th century. Proc Natl Acad Sci USA 113(24):6647–6652. 44. Kirk KM, et al. (2001) Natural selection and quantitative genetics of life-history traits

in Western women: A twin study. Evolution 55(2):423–435. 45. Stulp G, Barrett L (2016) Evolutionary perspectives on human height variation. Biol

Rev Camb Philos Soc 91(1):206–234. 46. Goldin CD, Katz LF (2009) The Race Between Education and Technology (Harvard Univ

Press, Cambridge, MA). 47. Wyshak G, Frisch RE (1982) Evidence for a secular trend in age of menarche. N Engl

J Med 306(17):1033–1035. 48. Cole TJ (2003) The secular trend in human physical growth: A biological view. Econ

Hum Biol 1(2):161–168. 49. Bulik-Sullivan BK, et al.; Schizophrenia Working Group of the Psychiatric Genomics

Consortium (2015) LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47(3):291–295.

50. Lawson DW, Mace R (2011) Parental investment and the optimization of human family size. Philos Trans R Soc Lond B Biol Sci 366(1563):333–343.

51. Jones JH, Bird RB (2014) The marginal valuation of fertility. Evol Hum Behav 35(1):65–71. 52. Brommer JE, Gustafsson L, Pietiäinen H, Merilä J (2004) Single-generation estimates

of individual fitness as proxies for long-term genetic contribution. Am Nat 163(4): 505–517.

53. Courtiol A, et al. (2013) The demographic transition influences variance in fitness and selection on height and BMI in rural Gambia. Curr Biol 23(10):884–889.

54. Moorad JA (2013) A demographic transition altered the strength of selection for fitness and age-specific survival and fertility in a 19th century American population. Evolution 67(6):1622–1634.

55. Vogl TS (2016) Differential fertility, human capital, and development. Rev Econ Stud 83(1):365–401.

56. Bell FC, Miller ML (2005) Life Tables for the United States Social Security Area 1900−2100, Actuarial Study (Soc Secur Admin, Baltimore), Vol 120.

57. Bulik-Sullivan B, et al. (2015) An atlas of genetic correlations across human diseases and traits. Nat Genet 47(11):1236–1241.

58. University of Washington (2012) Quality control report for genotypic data. Available at hrsonline.isr.umich.edu/sitedocs/genetics/HRS_QC_REPORT_MAR2012.pdf. Ac- cessed August 5, 2015.

59. Rietveld CA, et al.; Social Science Genetics Association Consortium (2014) Replicability and robustness of genome-wide-association studies for behavioral traits. Psychol Sci 25(11):1975–1986.

60. Van Os J, Jones PB (2001) Neuroticism as a risk factor for schizophrenia. Psychol Med 31(6):1129–1134.

61. Kendler KS, Ohlsson H, Sundquist J, Sundquist K (2015) IQ and schizophrenia in a Swedish national sample: their causal relationship and the interaction of IQ with genetic risk. Am J Psychiatry 172(3):259–265.

62. Okasha M, McCarron P, McEwen J, Smith GD (2001) Age at menarche: secular trends and association with adult anthropometric measures. Ann Hum Biol 28(1):68–78.

63. Efron B, Tibshirani RJ (1994) An Introduction to the Bootstrap (CRC, New York).

Beauchamp PNAS | July 12, 2016 | vol. 113 | no. 28 | 7779

SO CI A L SC

IE N CE

S EV

O LU

TI O N

SE E CO

M M EN

TA RY

http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600398113/-/DCSupplemental/pnas.201600398.sapp.pdf
https://www.cog-genomics.org/plink2
http://hrsonline.isr.umich.edu/sitedocs/genetics/HRS_QC_REPORT_MAR2012.pdf

 
Do you need a similar assignment done for you from scratch? We have qualified writers to help you. We assure you an A+ quality paper that is free from plagiarism. Order now for an Amazing Discount!
Use Discount Code "Newclient" for a 15% Discount!

NB: We do not resell papers. Upon ordering, we do an original paper exclusively for you.

Buy Custom Nursing Papers