Publications

2018
Classification of large DNA methylation datasets for identifying cancer drivers. Fabrizio Celli, Fabio Cumbo, Emanuel Weitschek. Big Data Research
Abstract: DNA methylation is a well-studied genetic modification crucial for regulating the functioning of the genome. Its alterations play an important role in tumorigenesis and tumor-suppression. Thus, studying DNA methylation data may help biomarker discovery in cancer. Since public data on DNA methylation have become abundant, and considering the high number of methylated sites (features) present in the genome, it is important to have a method for efficiently processing such large datasets. Relying on big data technologies, we propose BIGBIOCL, an algorithm that can apply supervised classification methods to datasets with hundreds of thousands of features. It is designed for the extraction of alternative and equivalent classification models through iterative deletion of selected features. We run experiments on DNA methylation datasets extracted from The Cancer Genome Atlas, focusing on three tumor types: breast, kidney, and thyroid carcinomas. We perform classifications extracting several methylated sites and their associated genes with accurate performance (accuracy > 97%). Results suggest that BIGBIOCL can perform hundreds of classification iterations on hundreds of thousands of features in a few hours. Moreover, we compare the performance of our method with other state-of-the-art classifiers and with a widespread DNA methylation analysis method based on network analysis. Finally, we are able to efficiently compute multiple alternative classification models and extract, from large DNA methylation datasets, a set of candidate genes to be further investigated to determine their active role in cancer. BIGBIOCL, results of experiments, and a guide to carrying out new experiments are freely available on GitHub at https://github.com/fcproj/BIGBIOCL.
2016
Enabling Multilingual Search through Controlled Vocabularies: the AGRIS Approach. Fabrizio Celli, Johannes Keizer. Metadata and Semantics Research, pp. 237-248 (Springer) [Best Paper Award, MTSR 2016]
Abstract: AGRIS is a bibliographic database of scientific publications in the food and agricultural domain. The AGRIS web portal is highly visited, reaching peaks of 350,000 visits/month from more than 200 countries and territories. Considering the variety of AGRIS users, supporting cross-language information retrieval is crucial to improving the usefulness of the website. This paper describes a lightweight approach adopted to enable the aforementioned feature in the AGRIS system. The proposed approach relies on the adoption of a controlled vocabulary. Furthermore, we discuss how expanding user queries with synonyms increases the sensitivity of a search engine and how we can use a controlled vocabulary to achieve this result.
2015
Discovering, Indexing and Interlinking Information Resources. Fabrizio Celli, Johannes Keizer, Yves Jaques, Stasinos Konstantopoulos, Dušan Vudragović. F1000 Research
Abstract: The social media revolution is having a dramatic effect on the world of scientific publication. Scientists now publish their research interests, theories and outcomes across numerous channels, including personal blogs and other thematic web spaces where ideas, activities and partial results are discussed. Accordingly, information systems that facilitate access to scientific literature must learn to cope with this valuable and varied data, evolving to make this research easily discoverable and available to end users. In this paper we describe the incremental process of discovering web resources in the domain of agricultural science and technology. Making use of Linked Open Data methodologies, we interlink a wide array of custom-crawled resources with the AGRIS bibliographic database in order to enrich the user experience of the AGRIS website. We also discuss the SemaGrow Stack, a query federation and data integration infrastructure used to estimate the semantic distance between crawled web resources and AGRIS.
AGRIS: providing access to agricultural research data exploiting open data on the web. Fabrizio Celli, Thembani Malapela, Karna Wegner, Imma Subirats, Elena Kokoliou, Johannes Keizer. F1000 Research
Abstract: AGRIS is the International System for Agricultural Science and Technology. It is supported by a large community of data providers, partners and users. AGRIS is a database that aggregates bibliographic data, and through this core data, related content across online information systems is retrieved by taking advantage of Semantic Web capabilities. AGRIS is a global public good and its vision is to be a service responsive to its users’ needs by facilitating contributions and feedback regarding the AGRIS core knowledgebase, AGRIS’s future and its continuous development. Periodic AGRIS e-consultations, partner meetings and user feedback are assimilated into the development of the AGRIS application and content coverage. This paper outlines the current AGRIS technical set-up, its network of partners, data providers and users as well as how AGRIS’s responsiveness to clients’ needs inspires the continuous technical development of the application. The paper concludes by providing a use case of how the AGRIS stakeholder input and the subsequent AGRIS e-consultation results influence the development of the AGRIS application, knowledgebase and service delivery.
2014
The role of AGRIS in providing global agricultural information to boost productivity and food security. Thembani Malapela, Fabrizio Celli, Imma Subirats, Johannes Keizer. IFLA WLIC 2014
Abstract: Access to agricultural information ensures that stakeholders in the farming system can make informed decisions towards increasing agricultural productivity. Agricultural information and data are stored in distributed systems and repositories (using different pieces of software and databases) that expose a variety of research outputs with various metadata formats. Often grey literature, journal articles and technical reports are lost as technologies are not employed to ensure that such resources are online, accessible and shared widely among agricultural stakeholders. This paper reviews AGRIS (International System for Agricultural Science and Technology) as an international collaboration and partnership in collecting and using agricultural bibliographic data to enable researchers and policy makers to retrieve related agricultural and scientific information and data. The AGRIS database therefore uses bibliographic data as an aggregator for locating related content across information systems available on the Web by taking advantage of Semantic Web and Linked Open Data technologies. Besides linking related content, AGRIS has the potential to gather like-minded researchers around AGRIS content to discuss and share ideas; this implies creating a social media layer for AGRIS.
2013
Migrating bibliographic datasets to the Semantic Web: The AGRIS case. Stefano Anibaldi, Yves Jaques, Fabrizio Celli, Armando Stellato, Johannes Keizer. Semantic Web journal
Abstract: AGRIS is among the most comprehensive online collections of agricultural and related sciences information. It is a growing global catalog of 5 million high-quality structured bibliographic records indexed from a worldwide group of providers. AGRIS relies heavily on the AGROVOC thesaurus for its indexing. Following the conversion of that thesaurus into a SKOS concept-scheme and its publication as Linked Open Data (LOD), the entire set of AGRIS records was also triplified and released as LOD. As part of this exercise, OpenAGRIS, a semantic mashup application, was developed to dynamically combine AGRIS data with external data sources, using a mixture of SPARQL queries and web services. The re-engineering of AGRIS for the Semantic Web raised numerous issues regarding the relative lack of administrative metadata required to compellingly address the proof and trust layers of the Semantic Web stack, both within the AGRIS repository and in the external data pulled into OpenAGRIS. The AGRIS team began a process of disambiguation and enrichment to continue moving toward an entity-based view of its resources, beginning with the tens of thousands of journals attached to its records. The evolution of the system, the issues raised during the triplification process and the steps necessary for publishing the result as LOD content are hereby discussed and evaluated.
The agINFRA Science Gateway for Agricultural Sciences Speaker. R Bruno, G Allegri, G Andronico, R Barbera, F Bitelli, A Budano, A Calanducci, F Celli. ISGC2013
Abstract: agINFRA (www.aginfra.eu) is a project co-funded by the European Commission under its Seventh Framework Programme that tries to introduce the agricultural scientific communities into the vision of open and participatory data-intensive science. agINFRA aims to remove existing obstacles concerning data sharing and open access to scientific information and agricultural data, as well as to improve the preparedness of agricultural scientific communities to face, manage and exploit the abundance of relevant data that is available and can support agricultural research.
The agricultural domain includes a wide variety of increasingly complex, multi-disciplinary topics. Subjects vary from plant science and horticulture to agricultural engineering and agricultural economics to the environment generally and include an ever-growing array of interrelated research issues such as the linkages between climate change on the one hand and food security, or the loss of agro-biodiversity, or pressure on individual species on the other.
Scientists from all over the world are extensively researching those different subjects and thereby consuming as well as producing large volumes of data.
The integration process of the services accessing those data requires a registry of all the existing systems, a challenge that has been under way since the beginning of the project (agINFRA started on the 15th of October 2011 and will last three years). Many of those systems will be efficiently and securely accessed through single web entry points by both end users and system/data maintainers.
This contribution aims to demonstrate how the adoption of the Catania Science Gateway Framework (www.catania-science-gateways.it) can have a key role during and also beyond the agINFRA project lifetime providing a unique environment able to deal with this heterogeneity of systems. This work will describe the Science Gateway (http://aginfra-sg.ct.infn.it/) developed by the INFN Dpt. of Catania and registered as a Service Provider of several Identity Federations, which together with the adoption of the CLEVER cloud middleware, can provide a unique interface able to seamlessly access the different services of the project. Among others, the integration and use of the WebGIS-enabled Italian Soil Information System (ISIS), developed by the Agrobiology and Pedology Research Centre of the Italian Agricultural Research Council, will be shown.
This very challenging target could be reached only thanks to the adoption of widely accepted standards such as SAGA and SAML that ensure the sustainability, reliability and scalability of the proposed architecture.
The role of vocabularies for estimating carbon footprint for food recipes using Linked Open Data. Ahsan Morshed, Fabrizio Celli. SML2OD2013
Abstract: Standard terms with known meanings are often called controlled vocabularies or lightweight ontologies. These play a vital role in the Linked Open Data cloud. These vocabularies capture a central notion of context for a specific domain in the knowledge cloud. Additional information co-exists with these controlled vocabularies. This short paper shows the role of these vocabularies in calculating the carbon footprint of food recipes using Linked Open Data.
Pushing, Pulling, Harvesting, Linking - Rethinking Bibliographic Workflows for the Semantic Web. Fabrizio Celli, Yves Jaques, Stefano Anibaldi, Johannes Keizer. EFITA-2013
Abstract: In this paper we describe the ongoing move of the AGRIS repository toward a decentralized approach based on Linked Open Data (LOD) (Bizer, et al., 2008). This move has progressively required modifications and enhancements to data, models and workflows. The growing demand for freely accessible data has brought a rise in data distributed using LOD, which combines Resource Description Framework (RDF) (McBride, 2004a) and RDF Schema (McBride, 2004b) with vocabularies such as Dublin Core (DC) (Miles, et al., 2009) and Simple Knowledge Organisation System, together with interfaces such as SPARQL query language for RDF (Prud'hommeaux, et al., 2008). While LOD implementations are by now a well-established pattern, the impacts that such approaches have on underlying business processes are less well understood. The openness of the LOD paradigm can expose flaws in information management workflows. Poor metadata, lack of metrics and vague provenance can all contribute to the inability of an LOD-enabled system to satisfy the demands of the Semantic Web.
2012
Proof and Trust in the OpenAGRIS Implementation. Yves Jaques, Stefano Anibaldi, Fabrizio Celli, Imma Subirats, Armando Stellato, Johannes Keizer. DC-2012
Abstract: The AGRIS repository is a bibliographic database covering almost forty years of agricultural research. Following the conversion of its indexing thesaurus AGROVOC into a concept-based vocabulary, the decision was made to express the entire AGRIS repository in RDF as Linked Open Data. As part of this exercise, a semantic mashup named OpenAGRIS was developed in order to access the records and use them to dynamically display related data from external systems through both SPARQL queries and traditional web services. The overall process raised numerous issues regarding the relative lack of administrative metadata required to compellingly address the top proof and trust layers of the semantic web stack, both within the AGRIS repository and in external data dynamically pulled into OpenAGRIS. The team began by disambiguating the journals in which the articles were published and converting them into RDF but quickly realized this was only the beginning of a series of necessary steps in moving from a closed to an open world paradigm. Further disambiguation of institutions, authors and AGRIS Centres as well as the use of the VoiD vocabulary and of quality indicator models are discussed and evaluated.
[POSTER] The OpenAGRIS (s)mashup : Using linked data and web services to augment AGRIS repository content. Fabrizio Celli, Stefano Anibaldi, Yves Jaques, Imma Subirats, Armando Stellato, Lim Ying-Sean, Johannes Keizer. The 7th International Conference on Open Repositories (2012)
Abstract: “The AGRIS Network is an international initiative based on a collaborative network of institutions, whose aim is to promote free access to information on science and technology in agriculture and related subjects” [1]. It has for over 35 years indexed and given access to bibliographic metadata used in the Food and Agriculture Organization of the United Nations (FAO) efforts to end world hunger and today contains almost three million records from over a hundred centres.
The quickly improving precision of commercial search engines coupled with the increasing availability of full-text documents was a mixed blessing for AGRIS. While on the one hand site traffic has never been higher, much of it is indexing activity from an army of web-bots and it remains difficult to increase users’ average time-on-site. Thus, there was the need to re-evaluate the possible uses to which bibliographic metadata may in the 21st century be put.
The team decided to transform AGRIS into a concept-based repository, requiring concept disambiguation and leading to the publication of records as Linked Data by carefully selecting RDF properties to allow reuse and interoperability. The process of disambiguation was challenging but during 2011 all keywords and journals were disambiguated and 60 million triples were published. A set of challenges remains, such as disambiguating institutions and authors. After this conversion step, the team built a semantic mashup over the data, named OpenAGRIS [2]. This application uses AGRIS records to dynamically determine when it can retrieve related content from other providers such as the World Bank [3], Global Biodiversity Information Facility [4] and DBpedia [5], which is then automatically displayed with the bibliographic record.
The process of converting toward a concept-based repository and building services that exploit the additional capabilities such models provide has proven invaluable in seeing with a fresh eye the repository itself. Revisiting the data not only helped the team to clean, disambiguate and improve the content, it also brought an updated selection of metadata properties that ensured the content would be as widely interpretable as possible. Finally, moving towards a machine-readable language like RDF brought up interesting questions involving the proof and trust layers of the semantic web stack. These questions are now helping to fuel another round of improvements and extensions to AGRIS repository data as the team considers how best to disseminate administrative metadata such as provenance, licensing and quality assurance. The implementation of a mashup also made it clear how few repositories and other information systems are truly ready for the Semantic Web. The promise of dynamic discovery and dissemination of data through standard models and interfaces like RDF is still a dream for all but a handful of pioneers. While OpenAGRIS was in the end constrained to go beyond SPARQL and add internal ad-hoc metadata alignments and custom Web Services, it may encourage data providers to publish their data as Linked Data, providing SPARQL endpoints and services to access, reason and extract implicit knowledge.
References:
[1] Subirats, I., Onyancha, I., Salokhe, G., et al. (2007). Towards an architecture for open archive networks in Agricultural Sciences and Technology. International Conference on Semantic Web & Digital Libraries, Bangalore (India).
[2] Celli, F., Anibaldi, S., Folch, M., Jaques, Y., and Keizer, J. (2011). OpenAGRIS: using bibliographical data for linking into the agricultural knowledge web. AOS 2011.
[3] http://data.worldbank.org/indicator
[4] http://data.gbif.org/welcome.htm
[5] http://dbpedia.org/About
A runtime approach to model-generic translation of schema and data. Paolo Atzeni, Luigi Bellomarini, Francesca Bugiotti, Fabrizio Celli, Giorgio Gianforme. Inf. Syst. 37(3): 269-287 (2012)
Abstract: Supporting heterogeneity is a major requirement in current approaches to integration and transformation of data. This paper proposes a new approach to the translation of schema and data from one data model to another, and we illustrate its implementation in the tool MIDST-RT. We build on our previous work on MIDST, a platform conceived to perform translations in an off-line fashion. In such an approach, the source database (both schema and data) is imported into a repository, where it is stored in a universal model. Then, the translation is applied within the tool as a composition of elementary transformation steps, specified as Datalog programs. Finally, the result (again both schema and data) is exported into the operational system.
Here we illustrate a new, lightweight approach where the database is not imported. MIDST-RT needs only to know the schema of the source database and the model of the target one, and generates views on the operational system that expose the underlying data according to the corresponding schema in the target model. Views are generated in an almost automatic way, on the basis of the Datalog rules for schema translation.
The proposed solution can be applied to different scenarios, which include data and application migration, data interchange, and object-to-relational mapping between applications and databases.
2011
OpenAGRIS: using bibliographical data for linking into the agricultural knowledge web. Fabrizio Celli, Stefano Anibaldi, Maria Folch, Yves Jaques, Johannes Keizer.
AOS 2011
Abstract: Spreading and exchanging agricultural information is a critical issue to allow researchers to access and use the knowledge in this sector. As a contribution to this goal, we propose a new approach that allows merging and integrating all information available on the Web about a specific agricultural topic through the use of the most modern Linked Open Data technologies. We build on our previous work on AGRIS, a public domain database with nearly 3 million structured bibliographical records on agricultural science and technology. In this paper we illustrate a new Semantic Web platform, OpenAGRIS, which aggregates information stored in various sources available on the Web, providing as much data as possible about a topic or a bibliographical resource.
Design and implementation of a Solr plug-in for Chinese-English cross-language query expansion based on SKOS thesauri. Wei Sun, Fabrizio Celli, Ahsan Morshed, Yves Jaques, Johannes Keizer.
Informatics in Control, Automation and Robotics, Volume 2, LNEE 133, pp. 359-367, Springer 2011
Abstract: Given that existing studies of query expansion techniques for Chinese-English are relatively few and their level of standardization low, in order to improve the efficiency of Chinese-English cross-language retrieval, this paper discusses the design and implementation of a Solr plug-in for Chinese-English cross-language query expansion based on SKOS thesauri and used within the AGRIS agricultural bibliographic system. The paper also elaborates the key techniques involved in the plug-in. Finally, taking the AGRIS data resources as an example, the paper shows application examples for segmentation of mixed Chinese and English, user query parsing and AGRIS retrieval system etc., techniques that have improved the Chinese-English cross-language retrieval efficiency to a certain extent, and laid a technical foundation for research about knowledge retrieval and discovery in related fields.
2010
Where you stop is who you are: understanding people’s activities by places visited. Laura Spinsanti, Fabrizio Celli, Chiara Renso.
BMI 2010.
Abstract: The increasing availability of people traces - collected by portable devices - poses new possibilities and challenges for the study of people’s mobile behaviour. However, the raw data produced by such portable devices is poor from a semantic point of view, thus the gap between the person’s activity and the raw collected data generated by the activity is still too wide. The work presented in this paper aims to define an algorithm to understand the activity of a moving person from the sequence of places she visited. The contribution is twofold. On the one hand, an algorithm to associate each stop of the traveling person to a list of probable visited places is introduced. On the other hand, the obtained sequence of places is classified into a possible activity performed by the moving person. Preliminary experimental results on a dataset of people moving by car in the city of Milan are reported.
Where you stop is who you are (BMI 2010 presentation)


Awards and Achievements

  • MTSR-2016 Best Paper Award [PDF]: "Enabling Multilingual Search through Controlled Vocabularies: the AGRIS Approach", Fabrizio Celli, Johannes Keizer.
  • Third Place at the Linked Up Vici competition, ISWC 2014 (19-23/10/2014), with the entry: "AGRIS, the hub for agricultural science".