Your conditions: 刘志刚
  • The impacts of reference database selection, indicator threshold determination and target data preparation in the sequence data analysis of eDNA monitoring -- taking fish as the target in middle Yangtze River

    Subjects: Biology >> Ecology submitted time 2024-01-23

    Abstract: In metabarcoding-based eDNA monitoring, the analysis and annotation of eDNA sequencing data are the foundation for obtaining accurate and reliable monitoring results. The selection of reference databases, the determination of analysis and annotation indicator thresholds, and the preparation of target data are the most critical technical steps in this process. To clarify the impacts of these three steps and provide scientific support for the standardization of eDNA monitoring, this study used two sets of COI gene sequence data from eDNA monitoring in the middle reach of the Yangtze River and designed three sets of experiments to test 1) the impacts of different reference databases and species annotation algorithms on the annotation results, 2) the impacts of different OTU clustering sequence similarity thresholds and species annotation classification confidence thresholds (sequence consistency and sequence coverage) on the annotation results, and 3) the impacts of the richness of target sequence data for each species on the annotation results. The results showed that: 1) With the Blast algorithm, the species annotated against three versions of the NCBI nt library were generally consistent (72%~78%); those annotated against two local sequence reference libraries were also generally consistent (91%~96%); and the consistency among the species annotated against all five reference libraries was 52%~68%. With the nt libraries, the species annotated by the RDP Classifier algorithm covered over 95% of those annotated by the Blast algorithm and included 151%~443% more species, but most of the additional species were misannotations. With the local sequence reference libraries, the species annotated by the RDP Classifier algorithm covered 66%~85% of those annotated by the Blast algorithm, and several results were annotated only to the family or genus level. 
2) Setting the OTU clustering sequence similarity threshold to 0.999 yielded 154%~209% more OTUs, and 240%~490% more OTUs annotated as fish, than setting it to 0.99. Changing the classification confidence threshold (Blast algorithm) from 0.8 to 0.99 had little effect on species composition, with over 94% consistency, but setting it to 0.7 produced a significant difference. 3) When the OTU clustering sequence similarity threshold was 0.999 and the classification confidence threshold was 0.9, annotation with multiple-sequence data produced the largest numbers of fish species and OTUs and the highest species annotation accuracy (81.49%), exceeding annotation with single-sequence data by 7% in fish species, 215% in OTUs, and 5% in accuracy. In eDNA sequencing data analysis and annotation, accuracy can be improved by establishing and improving local reference databases, optimizing the OTU clustering sequence similarity and species annotation classification confidence thresholds (sequence consistency and sequence coverage), and increasing the richness of target sequence data. However, owing to the limitations of species annotation algorithms, problems such as species annotation errors and omissions may persist in eDNA sequencing data analysis and annotation. Consequently, the species annotation accuracy of eDNA monitoring based on the COI gene is expected to remain below 85%.
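
The abstract above describes two operations that can be sketched in a few lines: filtering Blast hits by a classification confidence threshold (sequence consistency and sequence coverage), and measuring how consistent two annotated species lists are. The following is a minimal illustration only, not the authors' pipeline. The `Hit` record, the function names, the sample hits, and the use of shared species divided by the union of species as the consistency measure are all assumptions; the abstract does not state how its consistency percentages were computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One Blast hit for an OTU (hypothetical record layout)."""
    otu: str
    species: str
    identity: float  # sequence consistency, as a fraction in [0, 1]
    coverage: float  # sequence coverage, as a fraction in [0, 1]


def annotate(hits, min_identity=0.9, min_coverage=0.9):
    """Return {otu: species}, keeping the best hit that passes both thresholds."""
    best = {}
    for h in hits:
        if h.identity < min_identity or h.coverage < min_coverage:
            continue  # below the classification confidence threshold
        cur = best.get(h.otu)
        if cur is None or (h.identity, h.coverage) > (cur.identity, cur.coverage):
            best[h.otu] = h
    return {otu: h.species for otu, h in best.items()}


def consistency(species_a, species_b):
    """Shared species divided by all species seen (an assumed measure)."""
    a, b = set(species_a), set(species_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up hits, for illustration only.
hits = [
    Hit("otu1", "sp_A", 0.99, 0.95),
    Hit("otu1", "sp_B", 0.92, 0.99),
    Hit("otu2", "sp_C", 0.85, 0.99),
    Hit("otu3", "sp_D", 0.75, 0.90),
]

strict = annotate(hits, min_identity=0.9)
loose = annotate(hits, min_identity=0.7)
print(strict, loose, consistency(strict.values(), loose.values()))
```

Lowering `min_identity` admits more OTUs, which is the kind of shift in species composition the abstract reports when the threshold drops to 0.7.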

  • Quantifying the spatial resolution of eDNA monitoring: a case study in Middle Yangtze River in mean-flow period

    Subjects: Biology >> Ecology submitted time 2023-03-28

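    Abstract: The middle Yangtze River is an extremely important free-flowing reach of the Yangtze and provides critical habitat for aquatic organisms such as the Chinese sturgeon and the Yangtze finless porpoise. Routine, systematic eDNA (environmental DNA) monitoring is therefore of great significance for assessing and conserving aquatic biodiversity in the region. However, the spatial resolution of eDNA monitoring has not been quantified, which limits the implementation of routine eDNA monitoring in the middle Yangtze River. To quantify this resolution, we explored and established a quantification method based on a black-box model, simplified processes, and probabilistic expression. In June 2020 (mean-flow period), we set up 30 sampling sections in the middle Yangtze River at intervals of about 30 km, collected eDNA samples, and performed high-throughput sequencing (16S rRNA gene amplicon sequencing for prokaryotes and mitochondrial COI gene amplicon sequencing for eukaryotes). Following the watershed biological information flow analysis framework, we calculated the quantitative characteristics of the transport of biological information detectable by eDNA and determined a series of eDNA monitoring spatial resolution values together with their credibility and coverage. The results showed that, in the mean-flow period in the middle Yangtze River, the transport capacity of prokaryotic biological information detectable by eDNA was 99.91%/km, the share of biological information transported by non-living individuals was 23.83%, and the half-decay distance of biological information transported by non-living individuals was 48.45 km. For eukaryotes, the eDNA transport capacity was 99.85%/km, the share of biological information transported by non-living individuals was 67.93%, and the half-decay distance of biological information transported by non-living individuals was 30.00 km. There is a trade-off between the credibility and the coverage of eDNA monitoring spatial resolution: for prokaryotes, the balance point between credibility and coverage was at 39 km, with a characteristic value of about 86%; for eukaryotes, the balance point was at 28 km, with a characteristic value of about 65%. We suggest that the monitoring spatial resolution be chosen according to the monitoring purpose: monitoring that targets species composition within a river-reach unit can prioritize coverage at the expense of credibility, whereas monitoring that targets the spatial structure of biodiversity can prioritize credibility at the expense of coverage. The spatial resolution corresponding to coverage above 90% was 27 km for prokaryotes (credibility 84.18%) and 6 km for eukaryotes (credibility 41.38%), and that corresponding to coverage above 80% for eukaryotes was 13 km (credibility 50.64%). The spatial resolution corresponding to credibility above 90% was 58 km for prokaryotes (coverage 82.30%) and 78 km for eukaryotes (coverage 38.61%), and that corresponding to credibility above 80% for eukaryotes was 50 km (coverage 49.70%). This study provides a quantitative reference for setting eDNA monitoring sections in the middle Yangtze River and a methodological reference for estimating eDNA monitoring resolution in other rivers or river reaches.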
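
The purpose-dependent choice of spatial resolution recommended in this abstract can be expressed as a simple lookup. The sketch below only tabulates the values reported in the abstract; the `Reported` type, the `REPORTED` table, and the `pick_resolution` helper are hypothetical names introduced for illustration, and nothing here reproduces the authors' quantification method.

```python
from typing import NamedTuple


class Reported(NamedTuple):
    resolution_km: int       # spatial resolution reported in the abstract
    other_metric_pct: float  # the metric that was traded away, in percent


# Keys: (group, prioritized metric, minimum percentage of that metric).
# All values are copied from the abstract; no interpolation is attempted.
REPORTED = {
    ("prokaryote", "coverage", 90): Reported(27, 84.18),
    ("prokaryote", "credibility", 90): Reported(58, 82.30),
    ("eukaryote", "coverage", 90): Reported(6, 41.38),
    ("eukaryote", "coverage", 80): Reported(13, 50.64),
    ("eukaryote", "credibility", 90): Reported(78, 38.61),
    ("eukaryote", "credibility", 80): Reported(50, 49.70),
}


def pick_resolution(group, priority, min_pct):
    """Look up the reported resolution for a group and a prioritized metric."""
    try:
        return REPORTED[(group, priority, min_pct)]
    except KeyError:
        raise KeyError(
            f"no value reported for {group}, {priority} >= {min_pct}%"
        ) from None


# Species-composition survey of eukaryotes: prioritize coverage.
print(pick_resolution("eukaryote", "coverage", 90))
# Spatial-structure survey of prokaryotes: prioritize credibility.
print(pick_resolution("prokaryote", "credibility", 90))
```

Combinations the abstract does not report, such as prokaryotes at 80% coverage, raise a `KeyError` rather than returning a guessed value.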