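Categories: Information Science and Systems Science >> Basic Disciplines of Information Science and Systems Science; Biology >> Theory of Biological Evolution; Biology >> Biomathematics; Physics >> Interdisciplinary Physics and Related Areas of Science and Technology; Biology >> Genetics. Submitted: 2023-10-15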
Abstract:

Background: In bioinformatics, tools such as multiple sequence alignment and entropy-based methods probe the information in sequences and the evolutionary relationships between species. Although powerful, they may miss crucial hierarchical relationships formed by the reuse of repetitive subsequences, such as duplicons and transposable elements. These relationships are governed by "evolutionary tinkering", as described by François Jacob. The recently developed Ladderpath theory provides a quantitative framework for describing such hierarchical relationships.

Results: Building on this theory, we introduce two indicators: the order-rate $\eta$, which characterizes repetitions and regularities in sequence patterns, and the ladderpath-complexity $\kappa$, which characterizes the hierarchical richness within a sequence while accounting for its length. Statistical analyses of real amino acid sequences showed that: (1) among the typical species analyzed, humans have relatively more sequences with large $\kappa$ values; (2) proteins with a substantial proportion of intrinsically disordered regions have higher $\eta$ values; and (3) very long sequences with low $\eta$ are almost absent. We hypothesize that these patterns arise from differing duplication and mutation frequencies across evolutionary stages, which in turn suggests a zigzag pattern in the evolution of protein complexity. This hypothesis is supported by our simulations and by examples from protein families such as Ubiquitin and NBPF.

Conclusions: Our method emphasizes "how objects are generated", capturing the essence of evolutionary tinkering and reuse. The findings hint at a connection between sequence orderliness and structural uncertainty, and suggest that different species, or species in different environments, may adopt distinct protein elongation strategies. These insights highlight the value of our method for further in-depth applications in evolutionary biology.