Andrea Tagarelli
Assistant Professor (Ricercatore) of Computer Science
Department of Electronics, Computer and Systems Sciences
University of Calabria
Via P. Bucci, 41C, 87036 Arcavacata di Rende, Italy

+39 0984 49 4751
+39 0984 49 4713
tagarelli(at)deis.unical.it


Text mining search engine

A customized Google search engine for scholarly resources on knowledge discovery and data mining from text data, by Andrea Tagarelli, 2010.

 Research interests

Areas: Knowledge Discovery and Data Mining - Information Retrieval - Web Databases and Semistructured Data Management - Artificial Intelligence - Bioinformatics
Main topics: Clustering of Unstructured and Semistructured Text Data, Web Wrapping and Print-oriented Document Wrapping, Similarity Detection in Time Series Databases, Clustering of Uncertain Data, Subspace Clustering, Web Mining and Personalization
 

 Software tools  

SemXClust

A data mining system prototype for clustering semantically related XML documents according to both structure and content information.


Main Features:
- transactional representation of XML data;
- XML structure and content features enriched with knowledge from a lexical ontology;
- flexible clustering goals: structure-driven, content-driven, and structure/content-driven clustering.
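To give a concrete idea of what a transactional representation of XML data can look like, here is a minimal Python sketch. It assumes, for illustration only, that each item of a document's transaction is a root-to-leaf tag path, optionally paired with the leaf's text, and it compares transactions with plain Jaccard similarity. The names `to_transaction` and `jaccard` are hypothetical; the sketch omits SemXClust's lexical-ontology enrichment and its clustering algorithm, so it is not the system's actual feature model.

```python
import xml.etree.ElementTree as ET
from typing import FrozenSet

# Illustrative sketch only: NOT SemXClust's actual feature model.


def to_transaction(xml_text: str, with_content: bool = True) -> FrozenSet[str]:
    """Map an XML document to a transaction: a set of items, one per leaf element.

    Each item is the root-to-leaf tag path (e.g. "book/title"); when
    with_content is True, the leaf's whitespace-normalized, lowercased
    text is appended after a colon.
    """
    root = ET.fromstring(xml_text)
    items = set()

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        children = list(node)
        if not children:  # leaf element: emit one item for it
            text = " ".join((node.text or "").split()).lower()
            items.add(f"{path}:{text}" if with_content and text else path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return frozenset(items)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity between two transactions (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

With `d1 = "<book><title>XML</title><author>Smith</author></book>"` and `d2 = "<book><title>Data</title><author>Smith</author></book>"`, the structure-only transactions (`with_content=False`) are identical, giving similarity 1.0, while the structure-plus-content transactions share only the author item, giving 1/3. Toggling `with_content` loosely mirrors the choice between structure-driven and structure/content-driven clustering; a purely content-driven view would need a different item definition.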


SCRAP - SChema-based wRAPper for web data

A wrapping system prototype for extracting information from HTML documents.

SCRAP is based on a novel wrapping approach that exploits both extraction rules and the desired schema of the extracted information for wrapper definition and evaluation.

See the SCRAP web page for further details.


PDFWrap

A wrapping system prototype for extracting information from print-oriented documents such as Acrobat PDF documents.

PDFWrap is based on a novel bottom-up wrapping approach that extracts information tokens and integrates them into groups related according to the logical structure of a PDF document.

See the PDFWrap web page for further details.


MSPTool

A tool for preprocessing mass spectrometry data.

Main Features:
- standard MS preprocessing operations, including range cutting, peak smoothing, valid peak recognition, baseline correction, quantization, and normalization;
- step-by-step wizard.
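Several of the preprocessing operations listed above are standard signal-processing steps. The following minimal Python sketch chains four of them (range cutting, smoothing, baseline correction, normalization), assuming a spectrum is a list of (m/z, intensity) pairs sorted by m/z. The function names and the specific choices (moving-average smoothing, a constant minimum-intensity baseline, total-intensity normalization) are illustrative assumptions, not MSPTool's actual implementation; valid peak recognition and quantization are omitted.

```python
from typing import List, Tuple

# A spectrum as (m/z, intensity) pairs sorted by m/z (assumed representation).
Spectrum = List[Tuple[float, float]]


def range_cut(spec: Spectrum, lo: float, hi: float) -> Spectrum:
    """Keep only the points whose m/z lies in [lo, hi]."""
    return [(mz, i) for mz, i in spec if lo <= mz <= hi]


def smooth(spec: Spectrum, half_window: int = 1) -> Spectrum:
    """Moving-average smoothing over 2*half_window+1 points, truncated at the edges."""
    n = len(spec)
    out = []
    for k, (mz, _) in enumerate(spec):
        lo, hi = max(0, k - half_window), min(n, k + half_window + 1)
        window = [spec[j][1] for j in range(lo, hi)]
        out.append((mz, sum(window) / len(window)))
    return out


def baseline_correct(spec: Spectrum) -> Spectrum:
    """Subtract a constant baseline (the minimum intensity): a deliberately crude stand-in."""
    if not spec:
        return spec
    base = min(i for _, i in spec)
    return [(mz, i - base) for mz, i in spec]


def normalize(spec: Spectrum) -> Spectrum:
    """Scale intensities so that they sum to 1 (total-intensity normalization)."""
    total = sum(i for _, i in spec)
    if total == 0:
        return spec
    return [(mz, i / total) for mz, i in spec]


def preprocess(spec: Spectrum, lo: float, hi: float, half_window: int = 1) -> Spectrum:
    """Range cut, then smooth, then baseline-correct, then normalize."""
    return normalize(baseline_correct(smooth(range_cut(spec, lo, hi), half_window)))
```

For example, `preprocess([(100.0, 5.0), (200.0, 7.0), (300.0, 9.0), (400.0, 100.0)], 100.0, 300.0)` drops the point at m/z 400, smooths the remaining intensities to 6, 7, 8, subtracts the baseline 6, and normalizes to approximately 0, 0.333, 0.667. Truncating the smoothing window at the edges keeps the number of points unchanged, so each step can be applied independently, as a step-by-step wizard would allow.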

See the MSPTool web page for further details.


XRep

A data mining system prototype for clustering XML documents by structure based on tree matching and tree merging techniques.

Main Features:
- organization of documents into structurally homogeneous groups/hierarchies (Clustering module);
- incremental visualization of cluster dendrograms (Visualization module);
- narrowing of query search space based on the notion of XML cluster representative (Query Optimization module).

The XRep system was presented at the "Industrial Day" (Roma Tre University, June 10, 2004), organized by the participants in the "Technologies and Services for Enhanced Content Delivery" project.


AMCo - Automatic Mail Category organizer

A data mining system prototype for classifying email messages, based on clustering and pattern discovery techniques.

Main Features:
- organization of messages into homogeneous groups/hierarchies (Clustering module);
- redirection of further incoming messages according to an initial organization (Incremental Update module);
- generation and maintenance of reliable descriptions of the discovered message groups (Cluster Labelling module).

The AMCo system was presented at the "Industrial Day" (University of Pisa, February 7, 2003), organized by the participants in the "Technologies and Services for Enhanced Content Delivery" project.


Biographical sketch

Andrea Tagarelli is an Assistant Professor of Computer Science at the University of Calabria, Italy. He graduated magna cum laude in Computer Engineering in 2001 and obtained his Ph.D. in Computer and Systems Engineering in 2006; his Ph.D. thesis focused on information and knowledge extraction from semistructured text data. From March to September 2007, he was a research fellow in George Karypis's Data Mining Lab at the Department of Computer Science & Engineering, University of Minnesota, Minneapolis, USA.
His research interests include knowledge discovery and text/data mining, information extraction, Web and semistructured data management, spatio-temporal databases, and applications in biomedicine. On these topics he has coauthored journal articles, conference papers, and book chapters, and has developed practical software tools. He has served as a reviewer and program committee member for leading journals and conferences in the fields of information systems, knowledge and data management, and artificial intelligence. He has been a SIAM member since 2008 and an ACM member since 2009. His professional activities are in the areas of business strategy and information technology, and he is a cofounder of the IT company Ithea.


 Publications (also listed in DBLP)

Journals
S. Flesca, E. Masciari, A. Tagarelli. A Fuzzy Logic Approach to Wrapping PDF Documents. IEEE Transactions on Knowledge and Data Engineering, accepted April 2010.
A. Tagarelli, S. Greco. Semantic Clustering of XML Documents. ACM Transactions on Information Systems 28(1), 2010
B. Fazzinga, S. Flesca, A. Tagarelli. Schema-based Web Wrapping. Knowledge and Information Systems, published on-line December 8, 2009. DOI 10.1007/s10115-009-0275-2. [In press 2010]
F. Gullo, G. Ponti, A. Tagarelli, S. Greco. A Time Series Representation Model for Accurate and Fast Similarity Detection. Pattern Recognition 42(11):2998-3014, 2009.
F. Gullo, G. Ponti, A. Tagarelli, G. Tradigo, P. Veltri. MaSDA: A System for Analyzing Mass Spectrometry Data. Computer Methods and Programs in Biomedicine 95(2):S12-21, 2009.
D. Bonofiglio, S. Catalano, A. Perri, M. P. Baldini, S. Marsico, A. Tagarelli, D. Conforti, R. Guido, S. Andò. Beneficial effects of iodized salt prophylaxis on thyroid volume in an iodine deficient area of Southern Italy. Clinical Endocrinology 71:124-129, 2009.
G. Manco, E. Masciari, A. Tagarelli. Mining Categories for Emails via Clustering and Pattern Discovery. Journal of Intelligent Information Systems 30(2):153-181, 2008.
S. Flesca, S. Greco, A. Tagarelli, E. Zumpano. Mining User Preferences, Page Content and Usage to Personalize Website Navigation. World Wide Web: Internet and Web Information Systems 8(3):317-345, 2005.
A. Tagarelli, I. Trubitsyna, S. Greco. Combining Linear Programming and Clustering Techniques for the Classification of Research Centers. The European Journal on Artificial Intelligence, AI Communications 17(3):111-122, 2004.
S. Flesca, G. Manco, E. Masciari, E. Rende, A. Tagarelli. Web Wrapper Induction: A Brief Survey. The European Journal on Artificial Intelligence, AI Communications 17(2):57-61, 2004.
Conferences
F. Gullo, C. Domeniconi, A. Tagarelli. Projective Clustering Ensembles. 9th IEEE International Conference on Data Mining (ICDM '09), pp. 794-799. Miami, Florida, USA, December 6-9, 2009.
G. Ponti, A. Tagarelli. Topic-based Hard Clustering of Documents using Generative Models. 18th International Symposium on Methodologies for Intelligent Systems (ISMIS '09), pp. 231-240. Prague, Czech Republic, September 14-17, 2009.
F. Gullo, G. Ponti, A. Tagarelli, S. Greco. Collaborative XML Document Clustering. 1st International Workshop on Distributed XML Processing. Vienna, Austria, September 22-25, 2009.
F. Gullo, G. Ponti, A. Tagarelli, S. Iiritano, M. Ruffolo, D. Labate. Low-voltage Electricity Customer Profiling based on Load Data Clustering. 13th International Database Engineering and Applications Symposium (IDEAS '09), pp. 330-333. Cetraro, Italy, September 16-18, 2009.
F. Gullo, G. Ponti, A. Tagarelli, G. Tradigo, P. Veltri. Hierarchical Clustering of Microarray Data with Probe-level Uncertainty. 22nd IEEE International Symposium on Computer-Based Medical Systems (CBMS '09). Albuquerque, New Mexico, USA, August 3-4, 2009.
F. Gullo, G. Ponti, A. Tagarelli, S. Greco. Information-Theoretic Hierarchical Clustering of Uncertain Data. 17th Italian Symposium on Advanced Database Systems (SEBD '09), pp. 273-280. Camogli (Genova), Italy, June 21-24, 2009.
A. Tagarelli, M. Longo, S. Greco. Word Sense Disambiguation for XML Structure Feature Generation. 6th European Semantic Web Conference (ESWC '09), pp. 143-157. Heraklion, Greece, May 31-June 4, 2009.
F. Gullo, A. Tagarelli, S. Greco. Diversity-based Weighting Schemes for Clustering Ensembles. 9th SIAM International Conference on Data Mining (SDM '09), pp. 437-448. Sparks, Nevada, USA, April 30-May 2, 2009.
F. Gullo, G. Ponti, A. Tagarelli, S. Greco. A Hierarchical Algorithm for Clustering Uncertain Data via an Information-Theoretic Approach. 8th IEEE International Conference on Data Mining (ICDM '08), pp. 821-826. Pisa, Italy, December 15-19, 2008.
F. Gullo, G. Ponti, A. Tagarelli. Clustering Uncertain Data Via K-Medoids. 2nd International Conference on Scalable Uncertainty Management (SUM '08), pp. 229-242, LNAI 5291. Naples, Italy, October 1-3, 2008.
A. Tagarelli, M. Longo, S. Greco. Extracting Structural Semantic Features for XML Data. 16th Italian Symposium on Advanced Database Systems (SEBD '08), pp. 144-155. Mondello (Palermo), Italy, June 22-25, 2008.
F. Gullo, G. Ponti, A. Tagarelli, G. Tradigo, P. Veltri. MSPtool: A versatile tool for Mass Spectrometry Data Preprocessing. 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS '08), pp. 209-214. Jyväskylä, Finland, June 17-19, 2008.
A. Tagarelli, G. Karypis. A Segment-based Approach To Clustering Multi-Topic Documents. Workshop on Text Mining, in conjunction with the 8th SIAM International Conference on Data Mining (SDM '08). Atlanta, Georgia, USA, April 24-26, 2008.
B. Fazzinga, S. Flesca, S. Garruzzo, E. Masciari, A. Tagarelli. A Wrapper Generation System for PDF Documents. 23rd ACM Symposium on Applied Computing (SAC '08), pp. 442-446. Fortaleza, Brazil, March 16-20, 2008.
F. Gullo, G. Ponti, A. Tagarelli, G. Tradigo, P. Veltri. A Time Series Based Approach for Classifying Mass Spectrometry Data. 20th IEEE International Symposium on Computer-Based Medical Systems (CBMS '07), pp. 412-420. Maribor, Slovenia, June 20-22, 2007.
F. Gullo, G. Ponti, A. Tagarelli, S. Greco. Accurate and Fast Similarity Detection in Time Series. 15th Italian Symposium on Advanced Database Systems (SEBD '07), pp. 172-183. Fasano (Brindisi), Italy, June 17-20, 2007.
S. Greco, M. Ruffolo, A. Tagarelli. Effective and Efficient Similarity Search in Time Series. 15th ACM Conference on Information and Knowledge Management (CIKM '06), pp. 808-809. Arlington, VA, USA, November 6-11, 2006.
A. Tagarelli, S. Greco. SemXClust: A System for Semantic XML Clustering. 14th Italian Symposium on Advanced Database Systems (SEBD '06), pp. 72-79. Portonovo (Ancona), Italy, June 18-21, 2006.
S. Flesca, S. Garruzzo, E. Masciari, A. Tagarelli. Wrapping PDF Documents Exploiting Uncertain Knowledge. 18th Conference on Advanced Information Systems Engineering (CAiSE '06), pp. 175-189. Luxembourg, June 5-9, 2006.
A. Tagarelli, S. Greco. Toward Semantic XML Clustering. 6th SIAM International Conference on Data Mining (SDM '06), pp. 188-199. Bethesda, Maryland, USA, April 20-22, 2006.
S. Greco, A. Scicchitano, A. Tagarelli, E. Zumpano. A Mobile-Aware System for Website Personalization. 6th International Conference on Web-Age Information Management (WAIM '05), pp. 810-815. Hangzhou, China, October 11-13, 2005.
B. Fazzinga, S. Flesca, A. Tagarelli. Learning Robust Web Wrappers. 16th International Conference and Workshop on Database and Expert Systems Applications (DEXA '05), pp. 736-745. Copenhagen, Denmark, August 22-26, 2005.
S. Flesca, S. Garruzzo, E. Masciari, A. Tagarelli. Wrapping PDF Documents: A Preliminary Study. 13th Italian Symposium on Advanced Database Systems (SEBD '05), pp. 272-283. Brixen-Bressanone, Italy, June 20-22, 2005.
A. Tagarelli, S. Greco. Clustering Transactional XML Data with Semantically-Enriched Content and Structural Features. 5th International Conference on Web Information Systems Engineering (WISE '04), LNCS 3306, pp. 266-278. Brisbane, Australia, November 22-24, 2004.
S. Flesca, A. Tagarelli. Schema-based Web Wrapping. 23rd International Conference on Conceptual Modeling (ER '04), LNCS 3288, pp. 286-299. Shanghai, China, November 8-12, 2004.
G. Costa, G. Manco, R. Ortale, A. Tagarelli. A Tree-based Approach to Clustering XML Documents by Structure. 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '04), LNAI 3202, pp. 137-148. Pisa, Italy, September 20-24, 2004.
S. Flesca, S. Greco, A. Tagarelli, E. Zumpano. Non-Invasive Support for Personalized Navigation of Websites. 8th International Database Engineering and Applications Symposium (IDEAS '04), pp. 183-192. Coimbra, Portugal, July 7-9, 2004.
G. Costa, G. Manco, R. Ortale, A. Tagarelli. Clustering of XML Documents by Structure based on Tree Matching and Merging. 12th Italian Symposium on Advanced Database Systems (SEBD '04), pp. 314-325. S. Margherita di Pula (Cagliari), Italy, June 21-23, 2004.
A. Tagarelli, I. Trubitsyna, S. Greco, E. Zumpano. A System Supporting Website Navigation. 12th Italian Symposium on Advanced Database Systems (SEBD '04), pp. 142-149. S. Margherita di Pula (Cagliari), Italy, June 21-23, 2004.
A. Tagarelli, I. Trubitsyna, S. Greco. Mining scientific results through the combined use of clustering and linear programming techniques. 6th International Conference on Enterprise Information Systems (ICEIS '04), vol. 2, pp. 84-91. Porto, Portugal, April 14-17, 2004.
F. De Francesca, G. Gordano, R. Ortale, A. Tagarelli. Distance-based Clustering of XML Documents. 1st International Workshop on Mining Graphs, Trees and Sequences (MGTS '03) - in conjunction with the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '03), pp. 75-78. Cavtat-Dubrovnik, Croatia, September 22-26, 2003.
E. Cesario, F. Folino, G. Manco, R. Ortale, D. Saccà, A. Tagarelli. Un Sistema Adattativo per Servizi Bancari in Rete basato su Web Mining [An Adaptive System for Online Banking Services Based on Web Mining]. Associazione Italiana per l'Informatica e il Calcolo Automatico (AICA '03), pp. 207-216. Trento, Italy, September 2003.
A. Tagarelli, I. Trubitsyna, A. Mecchia, T. Mostardi, R. Pupo. Mining Scientific Results to Measure the Efficiency of Research Centers. 11th Italian Symposium on Advanced Database Systems (SEBD '03), pp. 147-160. Cetraro, Italy, June 2003.
G. Manco, E. Masciari, A. Tagarelli. A Framework for Adaptive Mail Classification. 14th International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 387-392. Washington DC, USA, November 4-6, 2002.
G. Manco, E. Masciari, M. Ruffolo, A. Tagarelli. Towards an Adaptive Mail Classifier. Workshop on "Machine Learning: Methods and Applications" (Apprendimento Automatico: Metodi ed Applicazioni), 8th Conference of the Italian Association for Artificial Intelligence (AI*IA '02). Siena, Italy, September 10-13, 2002.
Book Chapters
A. Tagarelli. XML Document Clustering. In book: Encyclopedia of Database Technologies and Applications, 2nd edition. Edited by Viviana E. Ferraggine, Jorge H. Doorn, Laura C. Rivero. Accepted December 2006 [To appear]
G. Manco, R. Ortale, A. Tagarelli. The Scent of a Newsgroup - Providing Personalized Access to Usenet News through Web Mining. In book: Handbook of Research on Text and Web Mining Technologies (2-vols), chap. XXXIV, pp. 393-414. Edited by M. Song and Y.-F. Brook Wu, New Jersey Institute of Technology, USA. Published by Information Science Reference. Copyright 2009.

G. Manco, R. Ortale, A. Tagarelli. The Scent of a Newsgroup - Providing Personalized Access to Usenet News through Web Mining. In book: Web Mining: Applications and Techniques, chap. XIX, pp. 393-414. Edited by A. Scime, State University of New York, USA. Published by Idea Group Inc., August 2004. Copyright 2005.



 Courses 

Office hours: Tuesday, 15:30-17:30

[Lecture calendar (since 2008)]

2009/10
Programming Algorithms and Managing Data, 2nd year Laurea course (DM 270) in Computer Engineering, School of Engineering, University of Calabria. 
Data Mining and Knowledge Discovery in Biology and Medicine, 2nd year of 2nd level Laurea course in Computer and Medical Systems Engineering, University Magna Graecia of Catanzaro.
Classical Approaches to Data Analysis, 2nd year of the 5-year post-Laurea course in Clinical Pathology, School of Pharmacy, University of Calabria.
Mathematics (Calculus), 1st year of 1st level Laurea courses in Nutritional Sciences and in Environmental Toxicology, School of Pharmacy, University of Calabria.
Mathematics (Calculus), 1st year of 1st level Laurea courses in Scientific Drug Information and in Cosmetic Technology, School of Pharmacy, University of Calabria.
2008/09
Data Mining and Knowledge Discovery in Biology and Medicine, 2nd year of 2nd level Laurea course in Computer and Medical Systems Engineering, University Magna Graecia of Catanzaro.
Object Oriented Programming, 1st year of Laurea course (DM 270) in Management Engineering, School of Engineering, University of Calabria.
Data and Text Mining, 1st year of 2nd level Laurea course in Computer Science for Humanities, School of Letters and Philosophy, University of Calabria.
Classical Approaches to Data Analysis, 2nd year of the 5-year post-Laurea course in Clinical Pathology, School of Pharmacy, University of Calabria.
Mathematics (Calculus), 1st year of 1st level Laurea courses in Nutritional Sciences and in Environmental Toxicology, School of Pharmacy, University of Calabria.
Mathematics (Calculus), 1st year of 1st level Laurea courses in Scientific Drug Information and in Cosmetic Technology, School of Pharmacy, University of Calabria.
2007/08
Data Mining and Knowledge Discovery in Biology and Medicine, 2nd year of 2nd level Laurea course in Computer and Medical Systems Engineering, University Magna Graecia of Catanzaro.
Object Oriented Programming, 2nd year of 1st level Laurea course in Management Engineering, School of Engineering, University of Calabria.   Web site (in Italian)
Data and Text Mining, 1st year of 2nd level Laurea course in Computer Science for Humanities, School of Letters and Philosophy, University of Calabria.
Mathematics (Calculus), 1st year of 1st level Laurea courses in Nutritional Sciences and in Environmental Toxicology, School of Pharmacy, University of Calabria.
Mathematics (Calculus), 1st year of 1st level Laurea courses in Scientific Drug Information and in Cosmetic Technology, School of Pharmacy, University of Calabria.
2006/07
Object Oriented Programming, 2nd year of 1st level Laurea course in Management Engineering, School of Engineering, University of Calabria.   Web site (in Italian)
Mathematics (Calculus), 1st year of 1st level Laurea courses in Nutritional Sciences and in Environmental Toxicology, School of Pharmacy, University of Calabria.
Mathematics (Calculus), 1st year of 1st level Laurea courses in Scientific Drug Information and in Cosmetic Technology, School of Pharmacy, University of Calabria.
2005/06
Programming Laboratory, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.   Web site (in Italian)
Introduction to Computer Science (F), 1st year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.    Web site (in Italian)
Data and Text Mining, 1st year of 2nd level Laurea course in Computer Science for Humanities, School of Letters and Philosophy, University of Calabria.
Web Mining, 2nd year of 2nd level Laurea course in Computer Science for Humanities, School of Letters and Philosophy, University of Calabria.
2004/05 (Teaching Assistant)
Formal Languages and Compilers, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.  Web site (in Italian)
Databases, 3rd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.
Programming Laboratory, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.
Information Processing, 2nd year of Laurea course in Philosophy and Communication Sciences, School of Letters and Philosophy, University of Calabria.  Web site (in Italian)
2003/04 (Teaching Assistant)
Databases, 5th year of Laurea course in Computer Engineering, School of Engineering, University of Calabria.
Programming Laboratory, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.
2002/03 (Teaching Assistant)
Programming Laboratory, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.
Foundations of Computer Science I, 1st year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.
Introduction to Computer Science (D, H), 1st year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.
Foundations of Computer Science II, 2nd year of 1st level Laurea course in Social Service Sciences, School of Political Sciences, University of Calabria.
