De novo sequencing is a bioinformatics technique used mainly to determine the amino acid sequences of new or unknown proteins and peptides. Unlike re-sequencing, which relies on a known reference, it targets proteins or peptides that have not been identified or are absent from existing databases.
Why Do We Need De Novo Sequencing?
Although a large amount of protein sequence data is already known, the sequences and functions of many proteins and peptides have yet to be discovered. De novo sequencing can help researchers determine the amino acid sequences of these unidentified proteins or peptides.
How Is De Novo Sequencing Performed?
De novo sequencing relies mainly on mass spectrometry, especially tandem mass spectrometry (MS/MS). The peptides are ionized and introduced into the mass spectrometer, where they are fragmented into smaller pieces. The mass-to-charge ratios of these fragments are measured, and the mass differences between them are used to infer the amino acid sequence of the original peptide.
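The idea of reading a sequence from fragment masses can be illustrated with a minimal Python sketch. It assumes an idealized spectrum: a complete, noise-free ladder of singly charged b-ions, which real MS/MS data rarely provides. The function names b_ion_ladder and read_sequence are hypothetical and chosen for illustration only. The residue masses are standard monoisotopic values. Isoleucine is left out of the table because it has the same mass as leucine.

```python
# Monoisotopic residue masses in daltons. Isoleucine (I) is omitted on
# purpose: it has the same mass as leucine (L).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of the charge-carrying proton


def b_ion_ladder(peptide):
    """Return theoretical singly charged b-ion m/z values b1..bn."""
    ladder, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(round(total, 5))
    return ladder


def read_sequence(ladder, tol=0.01):
    """Infer residues from the gaps between consecutive b-ion peaks.

    A gap that matches no residue, or more than one, is reported as '?'.
    """
    sequence, previous = [], PROTON
    for mz in ladder:
        gap = mz - previous
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        sequence.append(matches[0] if len(matches) == 1 else "?")
        previous = mz
    return "".join(sequence)


peaks = b_ion_ladder("SAMPLE")
print(read_sequence(peaks))  # prints SAMPLE
```

Each gap between neighboring peaks corresponds to the mass of one residue, so walking the ladder from the lightest peak to the heaviest spells out the peptide. With missing or noisy peaks, some gaps would span several residues and the simple lookup above would return '?'.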
Analyzing the Structure and Function of Unknown Peptides
1. Structural Analysis
The amino acid sequences obtained through de novo sequencing can be used to predict the three-dimensional structure of proteins. Modern structure prediction algorithms such as AlphaFold can predict a protein's three-dimensional structure from its amino acid sequence, often with high accuracy.
2. Functional Analysis
Once structural information for the peptide or protein is available, researchers can carry out functional experiments, such as binding assays or enzyme activity measurements, to determine its biological function. Its function can also be inferred by comparing it with known proteins or functional domains.
Challenges of De Novo Sequencing
1. Complexity of Fragment Analysis
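Inferring the sequence of the original peptide from fragment masses is complex, and several candidate sequences may be consistent with the same spectrum. For example, leucine and isoleucine have identical masses and cannot be distinguished by mass alone.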
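A short Python sketch can make this ambiguity concrete. It lists every residue composition of one or two amino acids whose total mass matches an observed gap between two fragment peaks. The function name explain_gap and the 0.01 Da tolerance are assumptions made for illustration. The masses are standard monoisotopic residue masses.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def explain_gap(gap, tol=0.01, max_residues=2):
    """List residue compositions (order ignored) whose mass matches gap."""
    hits = []
    for n in range(1, max_residues + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            total = sum(RESIDUE_MASS[aa] for aa in combo)
            if abs(total - gap) <= tol:
                hits.append("".join(combo))
    return hits


print(explain_gap(113.08406))  # ['I', 'L']
print(explain_gap(114.04293))  # ['N', 'GG']
print(explain_gap(128.05858))  # ['Q', 'AG']
```

A gap of about 114.04 Da can be a single asparagine or two glycines, and a gap of about 128.06 Da can be a glutamine or an alanine plus a glycine in either order. When a fragment peak is missing, the software has to keep all such alternatives open, which is one reason several candidate sequences can fit one spectrum.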
2. Difficulty of Sequencing Long Peptides
Long peptides may produce a large number of fragments, making sequence inference more difficult.
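The growth with peptide length can be put into rough numbers. The sketch below assumes a simplified model in which each backbone bond breaks once to give one b-ion and one y-ion, and in which any of the 20 standard amino acids can occupy each position. The function names are hypothetical.

```python
def backbone_fragments(n_residues):
    """Theoretical b- and y-ions for a peptide: one pair per backbone bond."""
    return 2 * (n_residues - 1)


def candidate_sequences(n_residues, alphabet=20):
    """Number of possible sequences of a given length before any filtering."""
    return alphabet ** n_residues


for length in (5, 10, 20):
    print(length, backbone_fragments(length), candidate_sequences(length))
```

Under these assumptions the number of theoretical fragments grows linearly with length, while the number of candidate sequences grows exponentially. Real spectra add further ion types and noise and often lack some of the expected peaks, so practical tools restrict the search rather than enumerate every candidate.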
3. Large Amount of Data
De novo sequencing usually generates a large amount of data, and analyzing it requires substantial computing power and specialized software.