Selective sweep with significant positive selection serves as the driving force for the differentiation of japonica and indica rice cultivars

Selection of the rice accessions

Although the 3,000 rice genomes dataset [21] provides an unprecedented resource for detecting selective sweep regions in rice cultivars, most of its accessions were sequenced at low coverage or inadequate depth, and the resulting frequent coverage gaps make them difficult to use in high-resolution detection analyses. Therefore, 330 cultivars with ≥15× sequencing depth were downloaded from the 3,000 samples (Additional file 1: Table S1). To detect regions with significant differentiation between japonica and indica, only cultivars that were clearly differentiated according to the diversity calculation and PCA were retained (Additional file 1: Figure S1 and Table S2; see Methods). In total, 296 of the 330 rice cultivars, comprising 154 indica and 142 japonica cultivars from 45 countries (or regions), were used for further analysis; their average sequencing depth was 23.8× (ranging from 15.0× to 51.1×; Additional file 1: Table S1). The other 34 cultivars were removed because of potential genetic admixture between the two subspecies (Additional file 1: Table S1). The retained cultivars were geographically scattered. About one third were collected from China and India, both of which are geographically large. Another third came from countries in Southeast Asia, including Thailand, Burma and Cambodia. The remaining third came from other countries, such as Japan and South Korea.

In addition, the sequences of 39 diverse wild rice accessions of O. rufipogon or O. nivara (also referred to as the annual form of O. rufipogon), which are believed to be the immediate progenitors of the Asian cultivated rice O. sativa, were collected from a previous report [10] (Additional file 1: Table S3). These sequences have an average sequencing depth of 11× (ranging from 4.0× to 68.6×). The whole-genome sequences of 20 accessions of the African rice O. glaberrima, which is closely related to O. sativa and O. rufipogon, were also downloaded (average sequencing depth 42.0×, ranging from 5.9× to 120.3×; Additional file 1: Table S3) and used as outgroups in further analyses.

Nucleotide diversity/divergence within or between these two subspecies

After trimming and removal of low-quality bases, the clean reads of 296 Asian cultivated rice, 39 wild rice, and 20 African cultivated rice accessions were mapped to the Nipponbare reference genome. Joint SNP calling and genotyping of the sequenced samples were then performed (see Methods). After stringent filtering, a total of 23,147,437 SNPs across all 355 rice samples were called and used to estimate diversity among samples and divergence among groups.

Based on the SNP data, π was estimated at 0.00196 within japonica and 0.00265 within indica (Additional file 1: Figure S2), both higher than the values estimated for japonica (0.0006) and indica (0.0016) in 517 Chinese indica and japonica landraces [22]. These discrepancies may be due to (i) higher genetic diversity in worldwide cultivars than in Chinese landraces alone, or (ii) an underestimation of the genetic diversity in the 517 Chinese landraces caused by low-coverage sequencing (on average 1× per sample). Dxy between indica and japonica was about 0.00617 in our selected samples (Additional file 1: Figure S2), suggesting significant differentiation between the two subspecies, which was also confirmed by PCA. As expected, the diversity within the wild rice accessions was about 0.0067 (Additional file 1: Figure S2), significantly higher than that in japonica or indica, suggesting a strong bottleneck during domestication, as also reported by other studies [10, 23, 24]. Short-read sequencing technology is known to generate many erroneous SNP calls if not handled properly. However, such false signals are unlikely to mimic truly differentiated SNPs: erroneous calls would be distributed more or less randomly in both japonica and indica and would therefore rarely result in low diversity within each subspecies. Furthermore, we did not observe any inflation in the estimated diversities or any significant departure in the PCA. This implies a conservative SNP call set, given the high-coverage dataset and the stringent criteria used in this study.
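As a rough illustration of the two statistics used in this section, the sketch below computes per-site nucleotide diversity (π) within one group and absolute divergence (Dxy) between two groups from alternate-allele counts at biallelic SNPs. The function names, the input layout (alternate-allele count, number of sampled chromosomes) and the toy counts are assumptions made for illustration; this is not the pipeline used in this study.

```python
from typing import Sequence, Tuple

# Hypothetical input layout: one (alt_allele_count, n_chromosomes) pair per SNP.
Site = Tuple[int, int]


def pi_per_site(sites: Sequence[Site], region_length: int) -> float:
    """Mean pairwise differences per base within one group."""
    total = 0.0
    for alt, n in sites:
        if n < 2:
            continue  # at least two chromosomes are needed for a comparison
        p = alt / n
        # Unbiased expected heterozygosity at a biallelic site.
        total += 2.0 * p * (1.0 - p) * n / (n - 1)
    return total / region_length


def dxy_per_site(sites_a: Sequence[Site], sites_b: Sequence[Site],
                 region_length: int) -> float:
    """Mean pairwise differences per base between two groups.

    sites_a and sites_b must describe the same SNPs in the same order.
    """
    total = 0.0
    for (alt_a, n_a), (alt_b, n_b) in zip(sites_a, sites_b):
        if n_a == 0 or n_b == 0:
            continue  # site not genotyped in one of the groups
        p_a = alt_a / n_a
        p_b = alt_b / n_b
        # Probability that one chromosome from each group differs at this site.
        total += p_a * (1.0 - p_b) + p_b * (1.0 - p_a)
    return total / region_length


if __name__ == "__main__":
    # Toy 10-kb window with three SNPs (made-up counts, not study data).
    japonica = [(0, 284), (284, 284), (2, 284)]
    indica = [(308, 308), (0, 308), (5, 308)]
    print("pi japonica:", pi_per_site(japonica, 10_000))
    print("pi indica:  ", pi_per_site(indica, 10_000))
    print("Dxy:        ", dxy_per_site(japonica, indica, 10_000))
```

In this toy window the first two SNPs are fixed for opposite alleles in the two groups, so they contribute nothing to π but one full difference each to Dxy, which is the pattern expected in strongly differentiated regions.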

Detection of regions that underwent selective sweep

The large SNP data set detected in our samples provides an opportunity to identify artificially selected regions by comparing polymorphism levels in the cultivated and wild rice accessions. The signature of a selective sweep arising from domestication includes a reduction in nucleotide diversity/divergence and altered allele frequencies at the domestication loci. As mentioned above, the average diversity within japonica or indica is about 0.002. Therefore, 0.002 was used as the cutoff for detecting divergent regions between japonica and indica. Among these divergent regions, only those with at least 10-fold lower diversity (≤0.0002) within each of the two subspecies were defined as selective sweep regions in both japonica and indica. Compared with previous studies, our definition has two advantages: (i) the 10-fold-lower diversity is a more stringent criterion than that of a previous study [10]; (ii) by using 0.002, we can easily exclude regions that have very low divergence between the two subspecies and very low diversity within each of them, which may result from sequencing or analytical errors rather than from divergence and selection.

Therefore, two types of domesticated regions were detected (Tables 1, 2 and 3): domesticated region type I (DR-I), which has undergone a selective sweep within each subspecies (π ≤ 0.0002) but shows high divergence between the two subspecies (Dxy ≥ 0.002); and domesticated region type II (DR-II), in which low diversity is shared by both subspecies (π ≤ 0.0002 within each subspecies and Dxy ≤ 0.0002 between them) while diversity within wild rice remains high (π ≥ 0.001) (see Methods for details).
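The two definitions can be written as a simple rule applied to one genomic window at a time. The sketch below is only an illustration of the thresholds stated above; the function name, argument names and example values are hypothetical, and the window scanning and merging steps described in Methods are not reproduced here.

```python
from typing import Optional

SWEEP_PI_MAX = 0.0002       # maximum π within each subspecies for a sweep
DIVERGENT_DXY_MIN = 0.002   # minimum Dxy between subspecies for DR-I
SHARED_DXY_MAX = 0.0002     # maximum Dxy between subspecies for DR-II
WILD_PI_MIN = 0.001         # minimum π in wild rice for DR-II


def classify_window(pi_japonica: float, pi_indica: float,
                    dxy: float, pi_wild: float) -> Optional[str]:
    """Return "DR-I", "DR-II" or None for a single window."""
    swept_in_both = pi_japonica <= SWEEP_PI_MAX and pi_indica <= SWEEP_PI_MAX
    if not swept_in_both:
        return None
    if dxy >= DIVERGENT_DXY_MIN:
        # Low diversity within each subspecies, high divergence between them.
        return "DR-I"
    if dxy <= SHARED_DXY_MAX and pi_wild >= WILD_PI_MIN:
        # Low diversity shared by both subspecies, normal diversity in wild rice.
        return "DR-II"
    return None


if __name__ == "__main__":
    # Made-up window statistics for illustration only.
    print(classify_window(0.00004, 0.00014, 0.0024, 0.0009))   # DR-I pattern
    print(classify_window(0.00010, 0.00015, 0.00014, 0.0014))  # DR-II pattern
    print(classify_window(0.00150, 0.00200, 0.0060, 0.0067))   # no sweep
```

Windows that pass the within-subspecies cutoff but fall between the two Dxy cutoffs are left unclassified in this sketch.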

Table 1

| Blocks with selective sweep | DR-I | DR-II |
|---|---|---|
| Block numbers | 28 | 28 |
| Average length (kb) | 31.8 | 23 |
| Total length (Mb) | 0.89 | 0.64 |
| Including genes | 163 | 110 |

Table 2

Genome-wide identification of selective sweep regions with low diversity (π ≤ 0.0002) within each of the two subspecies but with high divergence (Dxy ≥ 0.002) between them, defined as DR-I. Six regions showed slightly higher π values (≥0.0002 but ≤0.0003; flagged by * in this table) because they were merged regions (see Methods)

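| No. | Chr | Region | Length (bp) | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | chr01 | 35499730–35510003 | 10274 | 4 | 2 | 6 | 14 | 242 | 246 | 241 | 89 | 178 | 121 |
| 2* | chr01 | 35839043–35890683 | 51641 | 19 | 10 | 13 | 21 | 468 | 478 | 467 | 232 | 320 | 263 |
| 3 | chr02 | 13139975–13160003 | 20029 | 18 | 5 | 6 | 18 | 295 | 305 | 301 | 133 | 209 | 142 |
| 4 | chr02 | 13349983–13360044 | 10062 | 13 | 2 | 4 | 11 | 241 | 249 | 246 | 101 | 149 | 131 |
| 5 | chr02 | 14944936–14962544 | 17609 | 15 | 3 | 4 | 14 | 250 | 260 | 252 | 121 | 157 | 145 |
| 6 | chr02 | 27708263–27721727 | 13465 | 11 | 4 | 11 | 14 | 263 | 271 | 263 | 87 | 143 | 145 |
| 7 | chr03 | 1579192–1602557 | 23366 | 16 | 4 | 15 | 14 | 302 | 307 | 311 | 153 | 183 | 193 |
| 8* | chr03 | 2483329–2542460 | 59132 | 16 | 20 | 9 | 25 | 434 | 444 | 435 | 198 | 275 | 239 |
| 9 | chr03 | 2706518–2724355 | 17838 | 13 | 18 | 7 | 18 | 440 | 453 | 440 | 188 | 310 | 210 |
| 10 | chr03 | 2832279–2854311 | 22033 | 3 | 3 | 4 | 16 | 304 | 310 | 306 | 214 | 210 | 243 |
| 11 | chr03 | 2896026–2923841 | 27816 | 5 | 3 | 5 | 15 | 321 | 327 | 323 | 151 | 178 | 220 |
| 12* | chr03 | 2997394–3195565 | 198172 | 18 | 11 | 22 | 21 | 578 | 590 | 580 | 260 | 356 | 316 |
| 13 | chr03 | 3479756–3498823 | 19068 | 3 | 2 | 3 | 14 | 241 | 245 | 243 | 132 | 180 | 138 |
| 14* | chr03 | 24189350–24230136 | 40787 | 20 | 14 | 28 | 22 | 504 | 511 | 508 | 289 | 341 | 306 |
| 15 | chr03 | 28469725–28489493 | 19769 | 14 | 21 | 7 | 12 | 429 | 437 | 427 | 245 | 327 | 308 |
| 16 | chr04 | 34409908–34420007 | 10100 | 13 | 6 | 15 | 19 | 314 | 316 | 319 | 159 | 216 | 176 |
| 17 | chr04 | 34469603–34500654 | 31052 | 14 | 6 | 16 | 16 | 262 | 264 | 265 | 123 | 169 | 136 |
| 18* | chr05 | 21659599–21670204 | 10606 | 16 | 7 | 25 | 21 | 289 | 295 | 291 | 156 | 177 | 202 |
| 19 | chr05 | 22789981–22800012 | 10032 | 17 | 6 | 16 | 15 | 319 | 328 | 331 | 472 | 379 | 421 |
| 20 | chr05 | 22869880–22880236 | 10357 | 6 | 2 | 3 | 17 | 295 | 305 | 302 | 137 | 155 | 208 |
| 21 | chr05 | 24026607–24070766 | 44160 | 20 | 9 | 35 | 18 | 604 | 614 | 616 | 318 | 438 | 349 |
| 22 | chr05 | 24300403–24330214 | 29812 | 16 | 8 | 13 | 11 | 343 | 356 | 350 | 348 | 346 | 329 |
| 23 | chr05 | 26824033–26840483 | 16451 | 11 | 2 | 4 | 17 | 233 | 243 | 239 | 131 | 172 | 137 |
| 24 | chr07 | 4149981–4163826 | 13846 | 9 | 8 | 2 | 18 | 214 | 220 | 217 | 420 | 365 | 365 |
| 25 | chr07 | 25598750–25620217 | 21468 | 8 | 10 | 5 | 16 | 323 | 327 | 323 | 155 | 219 | 190 |
| 26 | chr09 | 22759164–22771433 | 12270 | 6 | 1 | 3 | 18 | 251 | 258 | 254 | 171 | 164 | 204 |
| 27 | chr09 | 22910542–22930021 | 19480 | 14 | 8 | 11 | 18 | 345 | 356 | 349 | 242 | 251 | 256 |
| 28* | chr10 | 21289972–21400480 | 110509 | 18 | 10 | 12 | 23 | 517 | 534 | 521 | 250 | 336 | 263 |
| Average | | | 31829 | 13 | 7 | 11 | 17 | 344 | 352 | 347 | 203 | 247 | 227 |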

Table 3

Genome-wide identification of the DR-II type selective sweep regions with low diversity (π ≤ 0.0002) within cultivars, but high diversity (π >0.001) in wild rice

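| No. | Chr | Region | Length (bp) | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | chr01 | 8645929–8660126 | 14198 | 1 | 18 | 16 | 13 | 118 | 104 | 108 | 106 |
| 2 | chr03 | 26199939–26210003 | 10065 | 10 | 15 | 14 | 13 | 141 | 140 | 144 | 142 |
| 3 | chr04 | 25959995–26040007 | 80013 | 10 | 15 | 14 | 13 | 181 | 228 | 229 | 229 |
| 4 | chr04 | 26089892–26100078 | 10187 | 6 | 8 | 7 | 7 | 155 | 250 | 249 | 249 |
| 5 | chr04 | 26229998–26260103 | 30106 | 10 | 15 | 15 | 14 | 230 | 327 | 325 | 326 |
| 6 | chr04 | 26359991–26370003 | 10013 | 5 | 6 | 6 | 6 | 125 | 168 | 166 | 167 |
| 7 | chr04 | 26789966–26810053 | 20088 | 10 | 16 | 14 | 14 | 155 | 251 | 247 | 249 |
| 8 | chr04 | 26949945–26970083 | 20139 | 9 | 14 | 13 | 12 | 141 | 191 | 190 | 190 |
| 9 | chr04 | 27009995–27060009 | 50015 | 7 | 20 | 17 | 16 | 286 | 360 | 356 | 358 |
| 10 | chr04 | 32529159–32540000 | 10842 | 18 | 13 | 16 | 16 | 109 | 104 | 123 | 114 |
| 11 | chr04 | 33889971–33900005 | 10035 | 15 | 10 | 20 | 16 | 173 | 228 | 222 | 225 |
| 12 | chr04 | 33979989–33990038 | 10050 | 15 | 10 | 16 | 14 | 100 | 135 | 145 | 140 |
| 13 | chr04 | 34219891–34249999 | 30109 | 12 | 7 | 12 | 11 | 277 | 316 | 319 | 318 |
| 14 | chr05 | 29729953–29740204 | 10252 | 9 | 5 | 7 | 7 | 115 | 115 | 103 | 109 |
| 15 | chr07 | 2739991–2770008 | 30018 | 15 | 13 | 18 | 16 | 406 | 456 | 451 | 453 |
| 16 | chr07 | 2799992–2810002 | 10011 | 11 | 6 | 9 | 9 | 262 | 311 | 305 | 308 |
| 17 | chr07 | 2879923–2910069 | 30147 | 14 | 10 | 13 | 13 | 280 | 437 | 432 | 434 |
| 18 | chr07 | 2979996–2991842 | 11847 | 1 | 4 | 3 | 3 | 140 | 104 | 105 | 105 |
| 19 | chr07 | 3096102–3109999 | 13898 | 10 | 9 | 11 | 10 | 110 | 116 | 108 | 112 |
| 20 | chr07 | 3699995–3710053 | 10059 | 12 | 7 | 10 | 10 | 364 | 394 | 396 | 395 |
| 21 | chr07 | 3759873–3790006 | 30134 | 9 | 7 | 10 | 9 | 264 | 273 | 272 | 273 |
| 22 | chr07 | 4019981–4030093 | 10113 | 12 | 6 | 12 | 10 | 138 | 131 | 124 | 127 |
| 23 | chr07 | 4070000–4090006 | 20007 | 9 | 17 | 15 | 14 | 236 | 256 | 252 | 254 |
| 24 | chr08 | 23749999–23770091 | 20093 | 12 | 14 | 14 | 14 | 246 | 269 | 263 | 266 |
| 25 | chr08 | 23809970–23860036 | 50067 | 9 | 16 | 15 | 14 | 549 | 500 | 501 | 501 |
| 26 | chr08 | 23919999–23940003 | 20005 | 5 | 10 | 9 | 8 | 206 | 283 | 282 | 283 |
| 27 | chr08 | 23989975–24040080 | 50106 | 7 | 14 | 11 | 11 | 273 | 315 | 312 | 313 |
| 28 | chr12 | 24930000–24950016 | 20017 | 13 | 1 | 11 | 9 | 168 | 124 | 121 | 122 |
| Average | | | 22951 | 10 | 11 | 12 | 11 | 213 | 247 | 246 | 246 |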

DR-I showed that differentiation had already occurred in the wild rice, and that a selective sweep was imposed independently during the domestication of indica and of japonica. Therefore, the indica-japonica differentiation was driven not only by genetic drift or possible natural selection in wild rice, but also by artificial selection in cultivated rice. Artificial selection is expected to have been much more powerful in accelerating the differentiation process. For this type of domesticated region, a total of 28 blocks, covering about 890 kb of DNA (average: 31.8 kb; ranging from 10 kb to 198 kb) and 163 genes, were detected (Tables 1 and 2). Meanwhile, because japonica has been subdivided into temperate japonica and tropical japonica [25, 26], π within temperate japonica and within tropical japonica, as well as Dxy between temperate japonica and indica and between tropical japonica and indica, were calculated (Table 2). All the data showed a similar result, suggesting that the DR-I regions are conserved in temperate japonica and tropical japonica, and divergent between temperate japonica and indica and between tropical japonica and indica.

For DR-II, normal diversity was detected within the wild rice accessions (π ≥ 0.001), whereas very low diversity was detected not only within each subspecies (π ≤ 0.0002) but also between the two subspecies (Dxy ≤ 0.0002). This suggests that the selective sweep was imposed simultaneously on both indica and japonica, and that these shared DR-IIs may have originated only once, or through selection and subsequent introgression from one subspecies into the other. For this type of domesticated region, a total of 28 blocks, covering about 640 kb of genomic sequence (average: 23.0 kb; ranging from 10 kb to 80 kb), were detected (Tables 1 and 3).

Origin of DR-I

When the SNPs of DR-I were used to reconstruct a phylogenetic tree, as expected, all japonica and all indica samples clustered within their own clades, and the japonica clade was distantly located from the indica clade (Fig. 1), which was consistent with the PCA result that the japonica varieties clearly segregate from the indica individuals (Additional file 1: Figure S1). Furthermore, Tajima's D and Fst statistics also indicated significant differences in genetic structure between the japonica and indica populations in the DR-I regions (Table 4). On the other hand, some wild rice accessions clustered within the japonica or indica clade, and some were scattered outside the two cultivated rice clades, which was also consistent with the findings of a previous study [10].
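For readers unfamiliar with Tajima's D, the sketch below implements the standard definition of the statistic from three summary quantities: the number of sampled sequences, the number of segregating sites, and the mean number of pairwise differences. The function name and arguments are assumptions for illustration; the software and settings actually used to produce Table 4 are not stated in this section.

```python
import math


def tajimas_d(n: int, seg_sites: int, mean_pairwise_diff: float) -> float:
    """Tajima's D for n sequences.

    seg_sites is the number of segregating sites (S) and mean_pairwise_diff
    is the average number of pairwise differences (k). Returns NaN when the
    statistic is undefined (fewer than four sequences or no variable sites).
    """
    if n < 4 or seg_sites <= 0:
        return float("nan")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return float("nan")
    # Difference between the two estimators of theta, scaled by its
    # expected standard deviation under neutrality.
    return (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Made-up inputs: many rare variants relative to pairwise diversity
    # give a negative D, the direction expected after a selective sweep.
    print(tajimas_d(n=10, seg_sites=16, mean_pairwise_diff=3.888889))
```

Strongly negative values indicate an excess of low-frequency variants within a population, whereas positive values indicate an excess of intermediate-frequency variants, as can arise when two differentiated groups are pooled.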

Fig. 1

Phylogenetic tree of the 28 DR-I blocks. Low-depth wild rice accessions (whole-genome depth <4×) were not used in this tree

Table 4

Results of the analysis by Tajima’s D and Fst statistics between the japonica and indica populations in the DR-I region

| No. | Chr | Region | | | | |
|---|---|---|---|---|---|---|
| 1 | chr01 | 35499730–35510003 | −2.344** | −2.319** | 2.564** | 0.93668** |
| 2 | chr01 | 35839043–35890683 | −2.284** | −2.388** | 1.470 | 0.88126** |
| 3 | chr02 | 13139975–13160003 | −1.971** | −1.896 | 1.050 | 0.93705** |
| 4 | chr02 | 13349983–13360044 | −1.940** | −2.265** | 0.953 | 0.93638** |
| 5 | chr02 | 14944936–14962544 | −1.871** | −1.518 | 1.551 | 0.8978** |
| 6 | chr02 | 27708263–27721727 | −1.921** | −1.917** | 2.404** | 0.94298** |
| 7 | chr03 | 1579192–1602557 | −1.794 | −2.420** | 1.862* | 0.90987** |
| 8 | chr03 | 2483329–2542460 | −2.171** | −1.616 | 1.587* | 0.93704** |
| 9 | chr03 | 2706518–2724355 | −2.379** | −2.458** | 1.533* | 0.84626** |
| 10 | chr03 | 2832279–2854311 | −2.412** | −2.235* | 1.182 | 0.58542 |
| 11 | chr03 | 2896026–2923841 | −2.343** | −2.108** | 1.708* | 0.90956** |
| 12 | chr03 | 2997394–3195565 | −2.220** | −1.027 | 2.526** | 0.94684** |
| 13 | chr03 | 3479756–3498823 | −2.313** | −0.066 | 2.738** | 0.9228** |
| 14 | chr03 | 24189350–24230136 | −2.058** | −2.162** | 1.219 | 0.92474** |
| 15 | chr03 | 28469725–28489493 | −2.303** | −1.961** | 1.236 | 0.8877** |
| 16 | chr04 | 34409908–34420007 | −2.097** | −1.729 | 2.262* | 0.91761** |
| 17 | chr04 | 34469603–34500654 | −1.915** | −2.145** | 1.701* | 0.93408** |
| 18 | chr05 | 21659599–21670204 | −1.757 | −2.137** | 0.900 | 0.89436** |
| 19 | chr05 | 22789981–22800012 | −1.857 | −2.009** | 2.331** | 0.91203** |
| 20 | chr05 | 22869880–22880236 | −1.993** | −2.020** | 2.582** | 0.85427** |
| 21 | chr05 | 24026607–24070766 | −2.116** | −1.458 | 2.510** | 0.74986** |
| 22 | chr05 | 24300403–24330214 | −2.017** | −2.275** | 2.936** | 0.94534** |
| 23 | chr05 | 26824033–26840483 | −1.731 | −1.795 | 2.738** | 0.93111** |
| 24 | chr07 | 4149981–4163826 | −1.766 | −1.686 | 2.518** | 0.92099** |
| 25 | chr07 | 25598750–25620217 | −2.310** | −2.049** | 2.955** | 0.89888** |
| 26 | chr09 | 22759164–22771433 | −2.131** | −2.444** | 0.349 | 0.93668** |
| 27 | chr09 | 22910542–22930021 | −1.892 | −2.106** | 2.634** | 0.88126** |
| 28 | chr10 | 21289972–21400480 | −2.120** | −1.634 | 2.594** | 0.93705** |

In these 28 DR-I blocks, the average diversity was 0.00013 (ranging from 0.00003 to 0.00020) within japonica and 0.00017 (ranging from 0.00011 to 0.00025) within indica (Table 2). However, the average divergence between the two groups was 0.0034, which was 11- to 101-fold higher (~27-fold on average) than the diversity within each group (Table 2), suggesting significant differentiation between indica and japonica in these regions. The average diversity was ~15-fold lower within japonica and ~16-fold lower within indica than the corresponding genome-wide diversity, suggesting strong selection with selective sweeps in these regions. In addition, in these blocks, significantly positive correlations were observed between the diversity within indica and that within japonica, between D_ind-jap and π_jap or π_ind, and between D_jap-wild and D_ind-wild (Additional file 1: Figure S3). This is consistent with the possibility that japonica and indica underwent strong artificial selection from different isolates of the wild rice O. rufipogon during domestication, at least in these domesticated blocks.
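The fold reductions relative to genome-wide diversity follow directly from the averages reported in the text. The short sketch below simply repeats that arithmetic; the variable and function names are hypothetical, and only values quoted in this section are used.

```python
# Genome-wide π and mean π in the 28 DR-I blocks, as quoted in the text.
GENOME_WIDE_PI = {"japonica": 0.00196, "indica": 0.00265}
DR1_MEAN_PI = {"japonica": 0.00013, "indica": 0.00017}


def fold_reduction(genome_wide: float, regional: float) -> float:
    """How many times lower the regional diversity is than genome-wide."""
    return genome_wide / regional


if __name__ == "__main__":
    for subspecies, genome_pi in GENOME_WIDE_PI.items():
        fold = fold_reduction(genome_pi, DR1_MEAN_PI[subspecies])
        print(f"{subspecies}: ~{fold:.0f}-fold lower diversity in DR-I blocks")
```

The per-block ratios of divergence to diversity (11- to 101-fold) cannot be recomputed this way, because they depend on the individual block values rather than on the averages.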

For this reason, a phylogenetic tree was reconstructed using SNPs of the 28 DR-Is to determine which wild rice accessions were most similar to the 28 domesticated blocks that were fixed or nearly fixed in japonica or indica. Similar to the findings of a previous study [10], four clear clades were detected: the African rice clade (the outgroup clade), a japonica clade with some O. rufipogon accessions, an indica clade with some O. rufipogon accessions, and an independent O. rufipogon clade (Fig. 1). All japonica samples clustered together, as did all indica samples, and the japonica samples were far from the indica samples. This matched the PCA result that the japonica varieties clearly segregated from the other groups, and suggests that, at least in these 28 DR-I regions, the two subspecies were probably domesticated independently from different wild rice isolates. Interestingly, the wild rice accessions with the highest similarity to japonica varieties were W1725, W2198, P46 and W3046 (Fig. 1), which were collected from Thailand to Southern China, suggesting that these japonica domesticated regions may have originated independently in an area ranging from Southern China [10, 27] to the Indo-China Peninsula. The wild rice accessions with the highest similarity to indica were W1559, CA97, L89_12 and IRGC88812, all of which were collected from the Indo-China Peninsula, suggesting that these indica domesticated regions may have originated independently in the Indo-China Peninsula. Therefore, the Indo-China Peninsula might be the area where the closest wild relatives of both japonica and indica overlap.

Because of the limited sample size of wild rice, one possible explanation for the topology of the phylogenetic tree is that the DR-Is are retained through frequent introgression from cultivars into some wild rice accessions. Testing this hypothesis requires more wild rice samples. If most other wild rice accessions show a pattern similar to that described above, introgression from cultivars into wild rice can be excluded in these regions. Conversely, if many wild rice accessions show a dissimilar pattern in these regions, introgression from cultivars into wild rice may have affected our selected wild rice samples. Therefore, more than 400 whole-genome sequenced wild rice samples, most of them with low sequencing depth (~1×), were collected to enlarge the sample panel (Additional file 1: Table S4). Additional file 1: Table S5 again shows that wild rice accessions from the Indo-China Peninsula (e.g., W1725 and W1506 from Thailand) or Southern China (e.g., W3093, P46 and W3040) have the highest identity to the japonica genotype, whereas most of the wild rice accessions from the Indo-China Peninsula (e.g., W1559, W1086 and W1930) have the highest identity to the indica genotype. This is consistent with the above result that, at least in these 28 DR-I regions, the closest wild relatives of japonica and indica overlap in the Indo-China Peninsula, suggesting that the two domesticated subspecies may share a common place of origin.

Origin of DR-II

In the 28 DR-II blocks, the average diversity was ~0.0001 within japonica, within indica, and across all cultivars, which was about 21-fold (6.1- to 168-fold) lower than that of the wild rice accessions (Table 3), suggesting a strong selective sweep in both indica and japonica accessions. This result also suggests that these 28 DR-II blocks had a common origin and were subsequently introgressed from one subspecies into the other [10]. The diversity in these regions within japonica, indica, or both was significantly lower than the diversity in the 28 DR-I blocks within japonica (t-test, P = 0.011) or indica (t-test, P < 0.0001), suggesting that the DR-I blocks may have been fixed independently in indica or japonica before the DR-II blocks were fixed.

A phylogenetic tree was also reconstructed using the SNPs from the 28 DR-II blocks. As expected, all japonica and indica samples clustered together as a single cultivar clade (Fig. 2), which differed from the topology of the DR-I tree, in which japonica samples were distantly located from indica samples and fell in different clades. The wild rice accession with the highest similarity to both japonica and indica in this tree was W1559, which had the highest similarity only to indica, and not to japonica, samples in the DR-Is, suggesting that these 28 DR-II blocks might have been introgressed from indica into japonica.

Fig. 2

Phylogenetic tree of the 28 DR-II regions

To further investigate the direction of introgression in these DR-IIs, additional whole-genome sequenced wild rice samples, including those with low depth, were used as described earlier. The wild rice accessions W1086 (India), W0178 (Thailand), W1090 (India), CA97_053 (Cambodia), and W0639 (Burma) (Additional file 1: Table S6) have the highest identity to the DR-IIs in both indica and japonica. Moreover, all of these wild rice accessions clustered within the indica-type wild rice clade in the phylogenetic tree. This result further suggests that the 28 DR-IIs were introgressed from indica into japonica, in contrast to the reported 55 major domestication sweeps that were introgressed from japonica into indica [10]. Thus, introgression in both directions, from japonica to indica and from indica to japonica, can be detected in our rice cultivars.

Functional classification and selection pressure of specific genes

In the DR-I and DR-II regions, 163 and 110 genes were detected, respectively (Additional file 1: Tables S7 and S8). To determine the functional categories of these genes, we summarized their possible functions using rice Gene Ontology (GO) annotation (http://geneontology.org/) (Figure S4).

Most of the genes in the DR-I regions show significant functional enrichment in cell growth, anatomical structure morphogenesis, cellular component organization, DNA metabolic process, reproduction, embryo development, and photosynthesis (Additional file 1: Figure S4). Examples include genes involved in seed or coleoptile development (Os01g61380, Os03g05820, Os03g06010–Os03g06060, Os03g06120, Os03g06360, Os03g06890, and Os05g41030), flower, pollen, or anther development (Os03g05140, Os05g38990, Os07g08170, and Os10g39880), and root development (Os03g43400 and Os03g43410) (Additional file 1: Table S7). This functional enrichment is consistent with previous reports that the differentiation of morphological and physiological traits between indica and japonica was an adaptation to distinct climatic, ecogeographic, and cultural conditions [28–30].

Because selective sweeps were detected separately in indica and japonica, and significant differentiation between the two subspecies was observed, we expected that many DR-I genes would have been under positive selection during the differentiation of indica and japonica. Therefore, Ka/Ks between indica and japonica samples was used to evaluate positive selection on these genes. Interestingly, 43 of the 163 genes (25.9%) had Ka/Ks > 1 (or Ks = 0 and Ka > 0) (Additional file 1: Table S7), a significantly higher proportion (χ² with Yates correction = 32.4, d.f. = 1, P < 0.0001) than the genome-wide proportion reported previously (2,678 of 27,384, ~8.7%) [7]. This suggests that a large number of the DR-I genes were under positive selection during the differentiation of indica and japonica, and that the differentiation of the two subspecies was driven by both artificial and natural selection with strong positive selection, which acted directly on many characteristics.
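The kind of comparison described above can be sketched as a 2 × 2 chi-square test with Yates continuity correction. The functions below are a minimal, standard-library illustration with hypothetical names; the exact contingency table behind the reported statistic is not given in the text, so the value printed for the quoted counts should not be expected to reproduce it.

```python
import math


def yates_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    # Continuity correction: shrink |ad - bc| by n/2, but not below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2.0)
    return n * corrected ** 2 / (row1 * row2 * col1 * col2)


def chi_square_p_value_df1(statistic: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Counts quoted in the text: genes with Ka/Ks > 1 versus the rest,
    # in the DR-I regions and in the genome-wide set.
    dr1_positive, dr1_total = 43, 163
    genome_positive, genome_total = 2678, 27384
    stat = yates_chi_square(
        dr1_positive, dr1_total - dr1_positive,
        genome_positive, genome_total - genome_positive,
    )
    print("chi-square:", stat)
    print("P value:", chi_square_p_value_df1(stat))
```

With one degree of freedom the P value has a closed form through the complementary error function, which avoids any dependency beyond the standard library.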

Genes in the DR-II regions were enriched in cell differentiation, anatomical structure morphogenesis, cell growth, photosynthesis, awn development, flower development, cell cycle, embryo development, and reproduction (Additional file 1: Figure S5 and Table S8). This functional enrichment is consistent with reports on the differentiation of morphological and physiological traits between cultivars and wild rice. For example, sh4 [19], a gene related to shattering that was included in the selective sweep regions reported by Huang et al. [10], was also found in our DR-II regions. Remarkably, a more recent study reported a highly conserved region on chromosome 4, which was also included in our DR-II regions [26]. LABA1 (LOC_Os04g43840) and GAD1 (LOC_Os08g37890), which are associated with long, barbed awns in wild rice, were also detected in our results (Table S8) [31, 32].
