Department of Electronics, Computer and Systems Sciences
Andrea Tagarelli
Assistant Professor [Ricercatore] of Computer Science
University of Calabria

Via P. Bucci, 41C, 87036
Arcavacata di Rende, Italy
+39 0984 49 4751
+39 0984 49 4713
tagarelli(at)deis.unical.it

Text Mining Search Engine

A Google search engine customized for scholarly resources on knowledge discovery and data mining from text data, by Andrea Tagarelli, 2010.

Research Interests

Areas:
Knowledge Discovery and Data Mining - Information Retrieval - Web Databases and Semistructured Data Management - Artificial Intelligence - Bioinformatics
Main Topics:
Clustering of Unstructured and Semistructured Text Data, Web Wrapping and Print-oriented Document Wrapping, Similarity Detection in Time Series Databases, Clustering of Uncertain Data, Subspace Clustering, Web Mining and Personalization

Software Tools

SemXClust
A data mining system prototype for clustering semantically related XML documents according to both structure and content information.
Main Features:
- transactional representation of XML data;
- structure and content XML features enriched with lexical ontology knowledge;
- flexible clustering goals: structure-driven, content-driven, and structure/content-driven clustering.

SCRAP - SChema-based wRAPper for web data
A wrapping system prototype for extracting information from HTML documents. SCRAP is based on a novel wrapping approach that exploits both extraction rules and the desired schema of the extracted information in wrapper definition and evaluation.
See the SCRAP web page for further details.

PDFWrap
A wrapping system prototype for extracting information from print-oriented documents such as Acrobat PDF documents. PDFWrap is based on a novel bottom-up wrapping approach that extracts information tokens and integrates them into groups related according to the logical structure of a PDF document.
See the PDFWrap web page for further details.

MSPTool
A tool for preprocessing mass spectrometry data.
Main Features:
- standard MS preprocessing operations, including range cutting, peak smoothing, valid peak recognition, baseline correction, quantization, and normalization;
- a step-by-step wizard.
See the MSPTool web page for further details.

XRep
A data mining system prototype for clustering XML documents by structure, based on tree matching and tree merging techniques.
Main Features:
- organization of documents into structurally homogeneous groups/hierarchies (Clustering module);
- incremental visualization of cluster dendrograms (Visualization module);
- narrowing of the query search space based on the notion of XML cluster representative (Query Optimization module).
The XRep system was presented at the "Industrial Day" (University Roma 3, June 10, 2004), organized by the participants in the "Technologies and Services for Enhanced Content Delivery" project.

AMCo - Automatic Mail Category Organizer
A data mining system prototype for classifying email messages, based on clustering and pattern discovery techniques.
Main Features:
- organization of messages into homogeneous groups/hierarchies (Clustering module);
- redirection of further incoming messages according to an initial organization (Incremental Update module);
- generation and maintenance of reliable descriptions of the discovered message groups (Cluster Labelling module).
The AMCo system was presented at the "Industrial Day" (University of Pisa, February 7, 2003), organized by the participants in the "Technologies and Services for Enhanced Content Delivery" project.

Biographical Sketch

Andrea Tagarelli is an Assistant Professor of Computer Science at the University of Calabria, Italy. He graduated magna cum laude in Computer Engineering in 2001 and obtained his Ph.D. in Computer and Systems Engineering in 2006. His Ph.D. thesis focused on information and knowledge extraction from semistructured text data. From March to September 2007, he was a research fellow at the Department of Computer Science & Engineering, University of Minnesota at Minneapolis, USA, working in George Karypis's Data Mining Lab. His research interests include knowledge discovery and text/data mining, information extraction, Web and semistructured data management, spatio-temporal databases, and applications in biomedicine. On these topics, he has coauthored journal articles, conference papers, and book chapters, and has developed practical software tools. He has served as a reviewer and as a program committee member for leading journals and conferences in the fields of information systems, knowledge and data management, and artificial intelligence. He has been a SIAM member since 2008 and an ACM member since 2009.
His professional activities are in the areas of business strategy and information technology. He is a cofounder of the IT company Ithea.

Publications
(publications listed in DBLP)

Journals

S. Flesca, E. Masciari, A. Tagarelli. A Fuzzy Logic Approach to Wrapping PDF Documents. IEEE Transactions on Knowledge and Data Engineering, accepted April 2010.

A. Tagarelli, S. Greco. Semantic Clustering of XML Documents. ACM Transactions on Information Systems 28(1), 2010.

B. Fazzinga, S. Flesca, A. Tagarelli. Schema-based Web Wrapping. Knowledge and Information Systems, published online December 8, 2009. DOI 10.1007/s10115-009-0275-2. [In press 2010]

F. Gullo, G. Ponti, A. Tagarelli, S. Greco. A Time Series Representation Model for Accurate and Fast Similarity Detection. Pattern Recognition 42(11):2998-3014, 2009.

F. Gullo, G. Ponti, A. Tagarelli, G. Tradigo, P. Veltri. MaSDA: A System for Analyzing Mass Spectrometry Data. Computer Methods and Programs in Biomedicine 95(2):S12-21, 2009.

D. Bonofiglio, S. Catalano, A. Perri, M. P. Baldini, S. Marsico, A. Tagarelli, D. Conforti, R. Guido, S. Andò. Beneficial effects of iodized salt prophylaxis on thyroid volume in an iodine deficient area of Southern Italy. Clinical Endocrinology 71:124-129, 2009.

G. Manco, E. Masciari, A. Tagarelli. Mining Categories for Emails via Clustering and Pattern Discovery. Journal of Intelligent Information Systems 30(2):153-181, 2008.

S. Flesca, S. Greco, A. Tagarelli, E. Zumpano. Mining User Preferences, Page Content and Usage to Personalize Website Navigation. World Wide Web: Internet and Web Information Systems 8(3):317-345, 2005.

A. Tagarelli, I. Trubitsyna, S. Greco. Combining Linear Programming and Clustering Techniques for the Classification of Research Centers. The European Journal on Artificial Intelligence, AI Communications 17(3):111-122, 2004.

S. Flesca, G. Manco, E. Masciari, E. Rende, A. Tagarelli. Web Wrapper Induction: A Brief Survey. The European Journal on Artificial Intelligence, AI Communications 17(2):57-61, 2004.

Conferences

F. Gullo, C. Domeniconi, A. Tagarelli. Projective Clustering Ensembles. 9th IEEE International Conference on Data Mining (ICDM '09), pp. 794-799. Miami, Florida, USA, December 6-9, 2009.

G. Ponti, A. Tagarelli. Topic-based Hard Clustering of Documents using Generative Models. 18th International Symposium on Methodologies for Intelligent Systems (ISMIS '09), pp. 231-240. Prague, Czech Republic, September 14-17, 2009.

F. Gullo, G. Ponti, A. Tagarelli, S. Greco. Collaborative XML Document Clustering. 1st International Workshop on Distributed XML Processing. Vienna, Austria, September 22-25, 2009.

F. Gullo, G. Ponti, A. Tagarelli, S. Iiritano, M. Ruffolo, D. Labate. Low-voltage Electricity Customer Profiling based on Load Data Clustering. 13th International Database Engineering and Applications Symposium (IDEAS '09), pp. 330-333. Cetraro, Italy, September 16-18, 2009.

F. Gullo, G. Ponti, A. Tagarelli, G. Tradigo, P. Veltri. Hierarchical Clustering of Microarray Data with Probe-level Uncertainty. 22nd IEEE International Symposium on Computer-Based Medical Systems (CBMS '09). Albuquerque, New Mexico, USA, August 3-4, 2009.

F. Gullo, G. Ponti, A. Tagarelli, S. Greco. Information-Theoretic Hierarchical Clustering of Uncertain Data. 17th Italian Symposium on Advanced Database Systems (SEBD '09), pp. 273-280. Camogli (Genova), Italy, June 21-24, 2009.

A. Tagarelli, M. Longo, S. Greco. Word Sense Disambiguation for XML Structure Feature Generation. 6th European Semantic Web Conference (ESWC '09), pp. 143-157. Heraklion, Greece, May 31-June 4, 2009.

F. Gullo, A. Tagarelli, S. Greco. Diversity-based Weighting Schemes for Clustering Ensembles. 9th SIAM International Conference on Data Mining (SDM '09), pp. 437-448. Sparks, Nevada, USA, April 30-May 2, 2009.

F. Gullo, G. Ponti, A. Tagarelli, S. Greco. A Hierarchical Algorithm for Clustering Uncertain Data via an Information-Theoretic Approach. 8th IEEE International Conference on Data Mining (ICDM '08), pp. 821-826. Pisa, Italy, December 15-19, 2008.

F. Gullo, G. Ponti, A. Tagarelli. Clustering Uncertain Data Via K-Medoids. 2nd International Conference on Scalable Uncertainty Management (SUM '08), LNAI 5291, pp. 229-242. Naples, Italy, October 1-3, 2008.

A. Tagarelli, M. Longo, S. Greco. Extracting Structural Semantic Features for XML Data. 16th Italian Symposium on Advanced Database Systems (SEBD '08), pp. 144-155. Mondello (Palermo), Italy, June 22-25, 2008.

F. Gullo, G. Ponti, A. Tagarelli, G. Tradigo, P. Veltri. MSPtool: A versatile tool for Mass Spectrometry Data Preprocessing. 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS '08), pp. 209-214. Jyväskylä, Finland, June 17-19, 2008.

A. Tagarelli, G. Karypis. A Segment-based Approach To Clustering Multi-Topic Documents. Workshop on Text Mining, in conjunction with the 8th SIAM International Conference on Data Mining (SDM '08). Atlanta, Georgia, USA, April 24-26, 2008.

B. Fazzinga, S. Flesca, S. Garruzzo, E. Masciari, A. Tagarelli. A Wrapper Generation System for PDF Documents. 23rd ACM Symposium on Applied Computing (SAC '08), pp. 442-446. Fortaleza, Brazil, March 16-20, 2008.

F. Gullo, G. Ponti, A. Tagarelli, G. Tradigo, P. Veltri. A Time Series Based Approach for Classifying Mass Spectrometry Data. 20th IEEE International Symposium on Computer-Based Medical Systems (CBMS '07), pp. 412-420. Maribor, Slovenia, June 20-22, 2007.

F. Gullo, G. Ponti, A. Tagarelli, S. Greco. Accurate and Fast Similarity Detection in Time Series. 15th Italian Symposium on Advanced Database Systems (SEBD '07), pp. 172-183. Fasano (Brindisi), Italy, June 17-20, 2007.

S. Greco, M. Ruffolo, A. Tagarelli. Effective and Efficient Similarity Search in Time Series. 15th ACM Conference on Information and Knowledge Management (CIKM '06), pp. 808-809. Arlington, VA, USA, November 6-11, 2006.

A. Tagarelli, S. Greco. SemXClust: A System for Semantic XML Clustering. 14th Italian Symposium on Advanced Database Systems (SEBD '06), pp. 72-79. Portonovo (Ancona), Italy, June 18-21, 2006.

S. Flesca, S. Garruzzo, E. Masciari, A. Tagarelli. Wrapping PDF Documents Exploiting Uncertain Knowledge. 18th Conference on Advanced Information Systems Engineering (CAiSE '06), pp. 175-189. Luxembourg, June 5-9, 2006.

A. Tagarelli, S. Greco. Toward Semantic XML Clustering. 6th SIAM International Conference on Data Mining (SDM '06), pp. 188-199. Bethesda, Maryland, USA, April 20-22, 2006.

S. Greco, A. Scicchitano, A. Tagarelli, E. Zumpano. A Mobile-Aware System for Website Personalization. 6th International Conference on Web-Age Information Management (WAIM '05), pp. 810-815. Hangzhou, China, October 11-13, 2005.

B. Fazzinga, S. Flesca, A. Tagarelli. Learning Robust Web Wrappers. 16th International Conference and Workshop on Database and Expert Systems Applications (DEXA '05), pp. 736-745. Copenhagen, Denmark, August 22-26, 2005.

S. Flesca, S. Garruzzo, E. Masciari, A. Tagarelli. Wrapping PDF Documents: A Preliminary Study. 13th Italian Symposium on Advanced Database Systems (SEBD '05), pp. 272-283. Brixen-Bressanone, Italy, June 20-22, 2005.

A. Tagarelli, S. Greco. Clustering Transactional XML Data with Semantically-Enriched Content and Structural Features. 5th International Conference on Web Information Systems Engineering (WISE '04), LNCS 3306, pp. 266-278. Brisbane, Australia, November 22-24, 2004.

S. Flesca, A. Tagarelli. Schema-based Web Wrapping. 23rd International Conference on Conceptual Modeling (ER '04), LNCS 3288, pp. 286-299. Shanghai, China, November 8-12, 2004.

G. Costa, G. Manco, R. Ortale, A. Tagarelli. A Tree-based Approach to Clustering XML Documents by Structure. 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '04), LNAI 3202, pp. 137-148. Pisa, Italy, September 20-24, 2004.

S. Flesca, S. Greco, A. Tagarelli, E. Zumpano. Non-Invasive Support for Personalized Navigation of Websites. 8th International Database Engineering and Applications Symposium (IDEAS '04), pp. 183-192. Coimbra, Portugal, July 7-9, 2004.

G. Costa, G. Manco, R. Ortale, A. Tagarelli. Clustering of XML Documents by Structure based on Tree Matching and Merging. 12th Italian Symposium on Advanced Database Systems (SEBD '04), pp. 314-325. S. Margherita di Pula (Cagliari), Italy, June 21-23, 2004.

A. Tagarelli, I. Trubitsyna, S. Greco, E. Zumpano. A System Supporting Website Navigation. 12th Italian Symposium on Advanced Database Systems (SEBD '04), pp. 142-149. S. Margherita di Pula (Cagliari), Italy, June 21-23, 2004.

A. Tagarelli, I. Trubitsyna, S. Greco. Mining scientific results through the combined use of clustering and linear programming techniques. 6th International Conference on Enterprise Information Systems (ICEIS '04), vol. 2, pp. 84-91. Porto, Portugal, April 14-17, 2004.

F. De Francesca, G. Gordano, R. Ortale, A. Tagarelli. Distance-based Clustering of XML Documents. 1st International Workshop on Mining Graphs, Trees and Sequences (MGTS '03), in conjunction with the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '03), pp. 75-78. Cavtat-Dubrovnik, Croatia, September 22-26, 2003.

E. Cesario, F. Folino, G. Manco, R. Ortale, D. Saccà, A. Tagarelli. Un Sistema Adattativo per Servizi Bancari in Rete basato su Web Mining [An Adaptive System for Online Banking Services based on Web Mining]. Associazione Italiana per l'Informatica e il Calcolo Automatico (AICA '03), pp. 207-216. Trento, Italy, September 2003.

A. Tagarelli, I. Trubitsyna, A. Mecchia, T. Mostardi, R. Pupo. Mining Scientific Results to Measure the Efficiency of Research Centers. 11th Italian Symposium on Advanced Database Systems (SEBD '03), pp. 147-160. Cetraro, Italy, June 2003.

G. Manco, E. Masciari, A. Tagarelli. A Framework for Adaptive Mail Classification. 14th International Conference on Tools with Artificial Intelligence (ICTAI '02), pp. 387-392. Washington DC, USA, November 4-6, 2002.

G. Manco, E. Masciari, M. Ruffolo, A. Tagarelli. Towards an Adaptive Mail Classifier. Workshop "Apprendimento Automatico: Metodi ed Applicazioni" [Machine Learning: Methods and Applications], 8th Conference of the Associazione Nazionale per l'Intelligenza Artificiale [National Association for Artificial Intelligence] (AI*IA '02). Siena, Italy, September 10-13, 2002.

Book Chapters

A. Tagarelli. XML Document Clustering. In book: Encyclopedia of Database Technologies and Applications, 2nd edition. Edited by Viviana E. Ferraggine, Jorge H. Doorn, Laura C. Rivero. Accepted December 2006 [to appear].

G. Manco, R. Ortale, A. Tagarelli. The Scent of a Newsgroup - Providing Personalized Access to Usenet News through Web Mining. In book: Handbook of Research on Text and Web Mining Technologies (2 vols.), chap. XXXIV, pp. 393-414. Edited by M. Song and Y.-F. Brook Wu, New Jersey Institute of Technology, USA. Published by Information Science Reference. Copyright 2009.

G. Manco, R. Ortale, A. Tagarelli. The Scent of a Newsgroup - Providing Personalized Access to Usenet News through Web Mining. In book: Web Mining: Applications and Techniques, chap. XIX, pp. 393-414. Edited by A. Scime, State University of New York, USA. Published by Idea Group Inc., August 2004. Copyright 2005.

Hot Links

Recent or upcoming Conferences, Workshops, Schools
Communities

Courses

Office hours: Tuesday, 15:30-17:30

2009/10

Programming Algorithms and Managing Data, 2nd year of Laurea course (DM 270) in Computer Engineering, School of Engineering, University of Calabria.

Data Mining and Knowledge Discovery in Biology and Medicine, 2nd year of 2nd level Laurea course in Computer and Medical Systems Engineering, University Magna Graecia of Catanzaro.

Classical Approaches to Data Analysis, 2nd year of the 5-year post-Laurea course in Clinical Pathology, School of Pharmacy, University of Calabria.

Mathematics (Calculus), 1st year of 1st level Laurea courses in Nutritional Sciences and in Environmental Toxicology, School of Pharmacy, University of Calabria.

Mathematics (Calculus), 1st year of 1st level Laurea courses in Scientific Drug Information and in Cosmetic Technology, School of Pharmacy, University of Calabria.

2008/09

Data Mining and Knowledge Discovery in Biology and Medicine, 2nd year of 2nd level Laurea course in Computer and Medical Systems Engineering, University Magna Graecia of Catanzaro.

Object Oriented Programming, 1st year of Laurea course (DM 270) in Management Engineering, School of Engineering, University of Calabria.

Data and Text Mining, 1st year of 2nd level Laurea course in Computer Science for Humanities, School of Letters and Philosophy, University of Calabria.

Classical Approaches to Data Analysis, 2nd year of the 5-year post-Laurea course in Clinical Pathology, School of Pharmacy, University of Calabria.

Mathematics (Calculus), 1st year of 1st level Laurea courses in Nutritional Sciences and in Environmental Toxicology, School of Pharmacy, University of Calabria.

Mathematics (Calculus), 1st year of 1st level Laurea courses in Scientific Drug Information and in Cosmetic Technology, School of Pharmacy, University of Calabria.

2007/08

Data Mining and Knowledge Discovery in Biology and Medicine, 2nd year of 2nd level Laurea course in Computer and Medical Systems Engineering, University Magna Graecia of Catanzaro.

Object Oriented Programming, 2nd year of 1st level Laurea course in Management Engineering, School of Engineering, University of Calabria. Web site (in Italian)

Data and Text Mining, 1st year of 2nd level Laurea course in Computer Science for Humanities, School of Letters and Philosophy, University of Calabria.

Mathematics (Calculus), 1st year of 1st level Laurea courses in Nutritional Sciences and in Environmental Toxicology, School of Pharmacy, University of Calabria.

Mathematics (Calculus), 1st year of 1st level Laurea courses in Scientific Drug Information and in Cosmetic Technology, School of Pharmacy, University of Calabria.

2006/07

Object Oriented Programming, 2nd year of 1st level Laurea course in Management Engineering, School of Engineering, University of Calabria. Web site (in Italian)

Mathematics (Calculus), 1st year of 1st level Laurea courses in Nutritional Sciences and in Environmental Toxicology, School of Pharmacy, University of Calabria.

Mathematics (Calculus), 1st year of 1st level Laurea courses in Scientific Drug Information and in Cosmetic Technology, School of Pharmacy, University of Calabria.

2005/06

Programming Laboratory, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria. Web site (in Italian)

Introduction to Computer Science (F), 1st year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria. Web site (in Italian)

Data and Text Mining, 1st year of 2nd level Laurea course in Computer Science for Humanities, School of Letters and Philosophy, University of Calabria.

Web Mining, 2nd year of 2nd level Laurea course in Computer Science for Humanities, School of Letters and Philosophy, University of Calabria.

2004/05 (Teaching Assistant)

Formal Languages and Compilers, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria. Web site (in Italian)

Databases, 3rd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.

Programming Laboratory, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.

Information Processing, 2nd year of Laurea course in Philosophy and Communication Sciences, School of Letters and Philosophy, University of Calabria. Web site (in Italian)

2003/04 (Teaching Assistant)

Databases, 5th year of Laurea course in Computer Engineering, School of Engineering, University of Calabria.

Programming Laboratory, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.

2002/03 (Teaching Assistant)

Programming Laboratory, 2nd year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.

Foundations of Computer Science I, 1st year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.

Introduction to Computer Science (D, H), 1st year of 1st level Laurea course in Computer Engineering, School of Engineering, University of Calabria.

Foundations of Computer Science II, 2nd year of 1st level Laurea course in Social Service Sciences, School of Political Sciences, University of Calabria.