Python DNA sequence analysis

5/21/2023

With the explosive growth of biological sequences in the post-genomic age, we are facing many binary classification problems. For DNA/RNA sequences, these problems concern how to identify recombination spots, nucleosome positioning, promoters, microRNA precursors, enhancers, translation initiation sites, various PTRM (post-replication modification) sites in DNA and PTCM (post-transcription modification) sites in RNA, RNA pseudouridine sites, DNA origins of replication, adenosine-to-inosine editing sites in RNA, and many other topics mentioned in a recent review article. For protein/peptide sequences, they concern how to identify various PTM (posttranslational modification) sites, anticancer peptides, interactions between drugs and target proteins, PPI (protein-protein interaction), and PPBS (protein-protein binding sites), as well as the many topics covered by the references cited in a recent comprehensive review.

Dealing with these problems is laborious even with computational approaches, because developing each computational predictor involves the following five steps: (1) prepare the benchmark dataset, (2) optimize the sample formulation, (3) optimize the operation engine, (4) conduct cross-validation, and (5) establish a web server. Each of the five steps is time-consuming and tedious, particularly selecting the optimal parameters for the samples concerned and for the operation engine adopted. To speed up this process, we propose a Python package called Pse-Analysis, which is based on the LIBSVM framework and automatically generates the predictor desired by the user. Users only need to input their benchmark dataset and the query biological sequences, and then obtain the desired results from the output of the Pse-Analysis system. The Pse-Analysis Python package is freely accessible to the public and can be run directly on Windows, Linux, and Unix.
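To make the "sample formulation" step more concrete, here is a minimal sketch of one common way to turn a DNA sequence into a fixed-length feature vector: normalized k-mer composition. This is an illustration only and is not taken from the Pse-Analysis source code; the function name `kmer_composition`, its parameters, and the toy sequence are all hypothetical.

```python
from itertools import product


def kmer_composition(sequence, k=2, alphabet="ACGT"):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    The vector has len(alphabet) ** k entries in lexicographic k-mer order.
    Windows containing characters outside the alphabet (e.g. 'N') are skipped.
    This is an illustrative helper, not part of the Pse-Analysis API.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence and count each k-mer.
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical toy sequence, not taken from any benchmark dataset.
features = kmer_composition("ACGTACGT", k=2)
print(len(features))
```

Each sequence, whatever its length, becomes a vector with 4^k entries, which is the kind of input a support vector machine expects. Larger values of k capture more sequence-order information at the cost of a longer, sparser vector.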
The package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross-validation, and (5) evaluation of prediction quality. All a user needs to do is input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis automatically constructs an ideal predictor and then yields the predicted results for the submitted query samples. All of these tedious jobs are done automatically by the computer. Moreover, multiprocessing was adopted to increase computational speed about sixfold.
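The cross-validation and prediction-quality procedures can be sketched with the standard library alone. The example below is a rough illustration under stated assumptions, not the package's implementation: a simple nearest-centroid classifier stands in for the LIBSVM-based support vector machine, and all function names and the toy data are hypothetical. The four metrics shown (sensitivity, specificity, accuracy, and Matthews correlation coefficient) are common choices for binary classifiers.

```python
import math
import random


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and MCC for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT an SVM): pick the class with the closer mean."""
    negatives = [x for x, y in zip(train_x, train_y) if y == 0]
    positives = [x for x, y in zip(train_x, train_y) if y == 1]
    if not negatives or not positives:
        raise ValueError("training fold must contain both classes")
    c0, c1 = centroid(negatives), centroid(positives)
    return [1 if squared_distance(x, c1) < squared_distance(x, c0) else 0
            for x in test_x]


def k_fold_cross_validate(x, y, k=5, seed=0):
    """Shuffle the samples, split them into k folds, and pool the predictions."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    y_true, y_pred = [], []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        preds = nearest_centroid_predict(
            [x[i] for i in train_idx],
            [y[i] for i in train_idx],
            [x[i] for i in fold],
        )
        y_true.extend(y[i] for i in fold)
        y_pred.extend(preds)
    return confusion_metrics(y_true, y_pred)


# Hypothetical, well-separated toy data: 10 negative and 10 positive samples.
toy_x = [[0.01 * i, 0.1] for i in range(10)] + [[1.0 + 0.01 * i, 0.9] for i in range(10)]
toy_y = [0] * 10 + [1] * 10
metrics_result = k_fold_cross_validate(toy_x, toy_y, k=5)
print(metrics_result)
```

In a real workflow the stand-in classifier would be replaced by an SVM, and the same cross-validation loop would be repeated for each candidate parameter setting so that the best-scoring setting can be selected.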

To expedite the pace in conducting genome/proteome analysis, we have developed a Python package called Pse-Analysis.

Received: Decem
Accepted: Decem
Published: January 05, 2017

Keywords: sequence analysis, pseudo components, support vector machine, genome/proteome analysis

Bin Liu (1, 2, 3), Hao Wu (1), Deyuan Zhang (4), Xiaolong Wang (1, 2), Kuo-Chen Chou (3, 5)

1 School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
2 Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
3 Gordon Life Science Institute, Boston, Massachusetts, USA
4 School of Computer, Shenyang Aerospace University, Shenyang, Liaoning, China
5 Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China

Bin Liu, email:
