Title
Prediction of alphabets of local protein structures using data mining methods: doctoral dissertation
Creator
Maljković, Mirjana M., 1986-
CONOR:
103251977
Copyright date
2021
License
Attribution-NonCommercial-NoDerivs 3.0 Serbia (CC BY-NC-ND 3.0)
License description
You permit only downloading and distribution of the work, provided the author's name is properly credited, with no changes to the work and no right to commercial use of the work. This is the most restrictive CC license. Basic description of the license: http://creativecommons.org/licenses/by-nc-nd/3.0/rs/deed.sr_LATN. Full text of the license: http://creativecommons.org/licenses/by-nc-nd/3.0/rs/legalcode.sr-Latn
Language
Serbian
Cobiss-ID
Theses Type
Doctoral dissertation
description
Defense date: 15.10.2019.
Other responsibilities
Academic Expertise
Natural sciences and mathematics
Academic Title
-
University
Univerzitet u Beogradu (University of Belgrade)
Faculty
Matematički fakultet (Faculty of Mathematics)
Alternative title
Predviđanje alfabeta lokalne strukture proteina primenom metoda istraživanja podataka
Publisher
[M. Maljković]
Format
XV, 149 pp.
description
computer science - data mining, bioinformatics
Abstract (en)
Proteins are linear biological polymers composed of amino acids, and their structure and function are determined by the number and order of those amino acids. Protein structure has three levels: primary, secondary and tertiary (three-dimensional, 3D) structure. Since the experimental determination of protein 3D structure is expensive and time-consuming, it is important to develop predictors that derive properties of the 3D structure, such as the 3D structure of the protein backbone, from the amino acid sequence (primary structure). The 3D structure of the backbone can be described using prototypes of local protein structure, i.e. prototypes of protein fragments a few amino acids long. A set of local structure prototypes forms a library of local protein structures, also called a structural alphabet. A structural alphabet is defined as a set of N prototypes, each L amino acids long. The subject of this dissertation is the development of models that predict structural alphabet prototypes for a given amino acid sequence using different data mining approaches. One of the best-known structural alphabets, Protein Blocks (PBs), was used in one part of the doctoral research. The PB alphabet consists of 16 prototypes defined using fragments of 5 consecutive amino acids. The input to the prediction model combines the amino acid sequence with structural properties of the protein that can be determined from the sequence (occurrence of repeats in the amino acid sequence) and with the outputs of predictors of protein structural properties (backbone angles, secondary structures, occurrence of disordered regions, accessible surface area of amino acids). Besides developing models for predicting the prototypes of an existing structural alphabet, the dissertation examines the feasibility of developing new structural alphabets by applying the TwoStep clustering algorithm, and constructs models for predicting the prototypes of these new alphabets. Several structural alphabets, which differ in the length and number of prototypes, were constructed and analyzed. Fragments of a large number of proteins with experimentally determined structures were used to construct the new structural alphabets.
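As a rough illustration of the definition above (a set of N prototypes, each L amino acids long), the following minimal sketch encodes a backbone as a sequence of prototype labels. It assumes that each residue is described by a pair of backbone dihedral angles and that a fragment is assigned to the prototype with the smallest root mean square deviation of angles; the abstract specifies neither choice. The function names, the two-letter toy alphabet and all angle values are hypothetical and are not the dissertation's data or method.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def rmsda(u, v):
    """Root mean square deviation between two equal-length angle vectors."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(u, v)) / len(u))

def encode(angles, prototypes, length):
    """Slide a window of `length` residues over per-residue (phi, psi) pairs
    and label each window with the nearest prototype."""
    labels = []
    for start in range(len(angles) - length + 1):
        # Flatten the window into one angle vector: phi1, psi1, phi2, psi2, ...
        window = [x for pair in angles[start:start + length] for x in pair]
        best = min(prototypes, key=lambda name: rmsda(window, prototypes[name]))
        labels.append(best)
    return labels

# Toy alphabet with N = 2 prototypes of L = 3 residues (hypothetical values).
HELIX = (-60.0, -45.0)
STRAND = (-120.0, 130.0)
PROTOTYPES = {
    "a": list(HELIX) * 3,
    "b": list(STRAND) * 3,
}

# Hypothetical backbone: three helix-like residues, then three strand-like ones.
backbone = [HELIX, HELIX, HELIX, STRAND, STRAND, STRAND]
print(encode(backbone, PROTOTYPES, 3))  # ['a', 'a', 'b', 'b']
```

A protein of n residues yields n - L + 1 overlapping fragments, so the encoded string is slightly shorter than the amino acid sequence. A prediction model of the kind the abstract describes would aim to output such labels from sequence-derived features alone, without access to the angles.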
Abstract (sr)
Proteins are linear biological polymers composed of amino acids, whose number and order determine the structure and function of the protein. Protein structure is defined at three levels: primary, secondary and tertiary (three-dimensional, 3D) structure. Since experimental determination of protein 3D structure is expensive and time-consuming, there is a need to develop programs that predict properties of the 3D structure, such as the 3D structure of the protein backbone, from the amino acid sequence (primary structure). The 3D structure of the protein backbone can be described using prototypes of local protein structure, i.e. parts of a protein made up of several consecutive amino acids. A set of defined local structure prototypes forms a library of local protein structures, also called a structural alphabet. Every structural alphabet is defined as a set of N prototypes of length L amino acids. The subject of this dissertation is the construction of models for predicting structural alphabet prototypes for a given amino acid sequence by applying different data mining algorithms. As one of the best known, the structural alphabet Protein Blocks was used in one part of the dissertation research. The Protein Blocks structural alphabet consists of 16 prototypes built from protein parts of 5 consecutive amino acids. The input to the model for predicting structural alphabet prototypes consists of structural properties of the protein that can be determined from the amino acid sequence (location of repeated strings in the amino acid sequence) and the results of predicting certain structural properties of the protein (backbone angles, secondary structures, occurrence of disordered regions, accessible surface area). In addition to developing models for predicting prototypes of an existing structural alphabet, the dissertation analyzes the possibility of developing new structural alphabets by applying the TwoStep clustering algorithm and builds models for predicting prototypes of the new structural alphabets. For the analysis, several structural alphabets were constructed with different numbers of prototypes and different prototype lengths. Parts of a large number of proteins with experimentally determined structure were used to investigate the new structural alphabets.
Authors Key words
data mining, structural alphabet, prediction model, Protein Blocks
Authors Key words
data mining, structural alphabets, prediction model, Protein Blocks
Classification
004.6:577.112(043.3)
Type
Text