Publication details

Creation and visualization of secondary structure consensus for protein families

Authors	MIDLIK Adam, HUTAŘOVÁ VAŘEKOVÁ Ivana, HUTAŘ Jan, NAVRÁTILOVÁ Veronika, KOČA Jaroslav, BERKA Karel, SVOBODOVÁ VAŘEKOVÁ Radka
Year of publication	2019
Type	Appeared in Conference without Proceedings
MU Faculty or unit	Faculty of Science
Citation
Description	Protein structural data deposited in the Protein Data Bank are a valuable source of information, and their number is continuously growing (currently more than 150 000 structures). Most protein structures can be classified into protein families based on their similarity [1]. The systematic study of these families is gaining importance and can yield interesting research results. Every protein family has a set of characteristic secondary structure elements (SSEs, namely helices and β-strands), whose arrangement is well defined and consistent throughout the whole family. However, there will always be some differences between the members of a family, so a single structure is not enough to represent the whole family of structures, just as a single amino acid sequence is not enough to represent the whole family of sequences. For sequences, this problem is solved by multiple sequence alignment, which produces a consensus sequence that can be visualized by a sequence logo [2]; this extracts the essential features of the family and shows the similarities and differences within it. For secondary structure, such an approach is currently missing. In this work, we introduce computational methods for extracting and visualizing the secondary structure consensus for a given protein family. Apart from giving an overview of the family, this consensus can also be used as an annotation template for our previously developed program SecStrAnnotator [3]. This allows annotation of SSEs in any family and opens the possibility of automated annotation of key regions (e.g. active sites and channels) based on their position relative to the SSEs. [1] Dawson NL, Lewis TE, Das S, Lees JG, Lee D, Ashford P, Orengo CA, Sillitoe I (2017) CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Res 45(D1):D289-D295. https://doi.org/10.1093/nar/gkw1098