2018 | |
---|---|
Classification of large DNA methylation datasets for identifying cancer
drivers
Fabrizio Celli, Fabio Cumbo, Emanuel Weitschek
Big Data Research
Abstract:
DNA methylation is a well-studied genetic modification crucial to regulate the
functioning of the genome. Its alterations
play an important role in tumorigenesis and tumor-suppression. Thus, studying DNA
methylation data may help
biomarker discovery in cancer. Since public data on DNA methylation are becoming abundant
– and considering the high
number of methylated sites (features) present in the genome – it is important to
have a method for efficiently
processing such large datasets. Relying on big data technologies, we propose
BIGBIOCL, an algorithm that can apply
supervised classification methods to datasets with hundreds of thousands of
features. It is designed for the extraction
of alternative and equivalent classification models through iterative deletion of
selected features. We run experiments
on DNA methylation datasets extracted from The Cancer Genome Atlas, focusing on
three tumor types: breast,
kidney, and thyroid carcinomas. We perform classifications extracting several
methylated sites and their associated
genes with accurate performance (accuracy>97%). Results suggest that BIGBIOCL can
perform hundreds of
classification iterations on hundreds of thousands of features in a few hours.
Moreover, we compare the performance of
our method with other state-of-the-art classifiers and with a widespread DNA
methylation analysis method based on
network analysis. Finally, we are able to efficiently compute multiple alternative
classification models and extract, from
large DNA methylation datasets, a set of candidate genes to be further investigated
to determine their active role in
cancer. BIGBIOCL, the results of the experiments, and a guide to carrying out new experiments
are freely available on GitHub at
https://github.com/fcproj/BIGBIOCL.
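The iterative deletion of selected features described in this abstract can be illustrated with a small, self-contained sketch. It is not the BIGBIOCL implementation: the one-feature threshold rule (`best_stump`), the function names, and the `min_accuracy` stopping rule are assumptions chosen to keep the example short. Each iteration fits a model, records the feature it used, deletes that feature, and refits, so that every returned model relies on different features.

```python
from typing import List, Optional, Sequence, Tuple

# (accuracy, feature index, threshold, label predicted above the threshold)
Stump = Tuple[float, int, float, int]


def best_stump(X: Sequence[Sequence[float]], y: Sequence[int],
               features: Sequence[int]) -> Optional[Stump]:
    """Return the most accurate one-feature threshold rule over `features`.

    A stand-in classifier (assumption): any supervised model that reports
    the features it selected could take its place.
    """
    best: Optional[Stump] = None
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for label_above in (0, 1):
                correct = sum(
                    1 for row, label in zip(X, y)
                    if (label_above if row[f] > t else 1 - label_above) == label
                )
                acc = correct / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, label_above)
    return best


def alternative_models(X: Sequence[Sequence[float]], y: Sequence[int],
                       min_accuracy: float = 0.9) -> List[Stump]:
    """Extract alternative models by iteratively deleting selected features."""
    remaining = list(range(len(X[0])))
    models: List[Stump] = []
    while remaining:
        stump = best_stump(X, y, remaining)
        if stump is None or stump[0] < min_accuracy:
            break  # no remaining feature yields an accurate enough model
        models.append(stump)
        remaining.remove(stump[1])  # delete the selected feature, then refit
    return models
```

On a toy dataset where two of three features separate the classes, `alternative_models` returns two models, one per informative feature, and stops when only the uninformative feature remains.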
ScienceDirect Article
|
|
2016 | |
Enabling Multilingual Search through Controlled Vocabularies: the AGRIS
Approach
Fabrizio Celli, Johannes Keizer
Metadata and Semantics Research, pp.237-248
(Springer)
[Best Paper Award, MTSR 2016]
Abstract:
AGRIS is a bibliographic database of scientific publications in the food and
agricultural domain.
The AGRIS web portal is highly visited, reaching peaks of 350,000 visits/month from
more than 200
countries and territories. Considering the variety of AGRIS users, the possibility
to support
cross-language information retrieval is crucial to improve the usefulness of the
website.
This paper describes a lightweight approach adopted to enable the aforementioned
feature in
the AGRIS system. The proposed approach relies on the adoption of a controlled
vocabulary.
Furthermore, we discuss how expanding user queries with synonyms increases the
sensitivity of
a search engine and how we can use a controlled vocabulary to achieve this result.
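As a rough illustration of how a controlled vocabulary can drive synonym and cross-language query expansion, the sketch below maps every label to its concept and expands a query term to all labels of the matching concepts. The vocabulary entries, concept identifiers, and function names are hypothetical; this is not the AGRIS code or real thesaurus data.

```python
from typing import Dict, List, Set

# Hypothetical miniature vocabulary: concept id -> language -> labels.
Vocabulary = Dict[str, Dict[str, List[str]]]

VOCABULARY: Vocabulary = {
    "c_maize": {"en": ["maize", "corn"], "fr": ["maïs"], "es": ["maíz"]},
    "c_rice": {"en": ["rice"], "fr": ["riz"], "es": ["arroz"]},
}


def build_index(vocab: Vocabulary) -> Dict[str, Set[str]]:
    """Map each case-folded label to the concepts that carry it."""
    index: Dict[str, Set[str]] = {}
    for concept, labels_by_lang in vocab.items():
        for labels in labels_by_lang.values():
            for label in labels:
                index.setdefault(label.casefold(), set()).add(concept)
    return index


def expand_query(term: str, vocab: Vocabulary,
                 index: Dict[str, Set[str]]) -> List[str]:
    """Return the term plus every label, in any language, of its concepts."""
    expanded = {term.casefold()}
    for concept in index.get(term.casefold(), ()):
        for labels in vocab[concept].values():
            expanded.update(label.casefold() for label in labels)
    return sorted(expanded)
```

A search engine would then match documents against any of the expanded terms. This raises sensitivity (recall), at the cost of some precision when a label is ambiguous.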
PDF,
e-LIS
|
|
2015 | |
Discovering, Indexing and Interlinking Information Resources
Fabrizio Celli, Johannes Keizer, Yves Jaques, Stasinos Konstantopoulos, Dušan
Vudragović.
F1000 Research
Abstract:
The social media revolution is having a dramatic effect on the world of scientific
publication.
Scientists now publish their research interests, theories and outcomes across
numerous channels,
including personal blogs and other thematic web spaces where ideas, activities and
partial
results are discussed. Accordingly, information systems that facilitate access to
scientific
literature must learn to cope with this valuable and varied data, evolving to make
this research
easily discoverable and available to end users. In this paper we describe the
incremental process
of discovering web resources in the domain of agricultural science and technology.
Making use of
Linked Open Data methodologies, we interlink a wide array of custom-crawled
resources with the AGRIS
bibliographic database in order to enrich the user experience of the AGRIS website.
We also discuss
the SemaGrow Stack, a query federation and data integration infrastructure used to
estimate the
semantic distance between crawled web resources and AGRIS.
Read Article,
PDF
|
|
AGRIS: providing access to agricultural research data exploiting open data on
the web
Fabrizio Celli, Thembani Malapela, Karna Wegner, Imma Subirats, Elena Kokoliou, Johannes
Keizer.
F1000 Research
Abstract:
AGRIS is the International System for Agricultural Science and Technology.
It is supported by a large community of data providers, partners and users.
AGRIS is a database that aggregates bibliographic data, and through this core data,
related content across online information systems is retrieved by taking advantage
of Semantic Web capabilities. AGRIS is a global public good and its vision is to be
a responsive service to its user needs by facilitating contributions and feedback
regarding the AGRIS core knowledgebase, AGRIS’s future and its continuous
development.
Periodic AGRIS e-consultations, partner meetings and user feedback are assimilated
to the development of the AGRIS application and content coverage. This paper
outlines
the current AGRIS technical set-up, its network of partners, data providers and
users
as well as how AGRIS’s responsiveness to clients’ needs inspires the continuous
technical
development of the application. The paper concludes by providing a use case of how
the
AGRIS stakeholder input and the subsequent AGRIS e-consultation results influence
the
development of the AGRIS application, knowledgebase and service delivery.
Read Article,
PDF
|
|
2014 | |
The role of AGRIS in providing global agricultural information to boost
productivity and food security
Thembani Malapela, Fabrizio Celli, Imma Subirats, Johannes Keizer.
IFLA WLIC 2014
Abstract:
Access to agricultural information ensures that stakeholders in the farming system
can make informed
decisions towards increasing agricultural productivity. Agricultural information and
data are stored
in distributed systems and repositories (using different pieces of software and
databases) that expose
a variety of research outputs with various metadata formats. Often grey literature,
journal articles
and technical reports are lost as technologies are not employed to ensure that such
resources are online,
accessible and shared widely among agricultural stakeholders. This paper
reviews AGRIS (International
System for Agricultural Science and Technology) as an international collaboration
and partnership in
collecting and using agricultural bibliographic data to enable researchers and
policy makers to retrieve
related agricultural and scientific information and data. The AGRIS database, therefore,
uses bibliographic
data as an aggregator for locating related content across information systems
available on the Web by
taking advantage of Semantic Web and Linked Open Data technologies. Besides
linking related content,
AGRIS has the potential to gather like-minded researchers around AGRIS content to
discuss and share ideas;
this implies building a social media layer for AGRIS.
Read Article
|
|
2013 | |
Migrating bibliographic datasets to the Semantic Web: The AGRIS case
Stefano Anibaldi, Yves Jaques, Fabrizio Celli, Armando Stellato, Johannes Keizer.
Semantic Web journal
Abstract:
AGRIS is among the most comprehensive online collections of agricultural and related
sciences information. It is a growing global catalog of 5 million high-quality
structured bibliographic records indexed from a worldwide group of providers. AGRIS
relies heavily on the AGROVOC thesaurus for its indexing. Following the conversion
of that thesaurus into a SKOS concept-scheme and its publication as Linked Open Data
(LOD), the entire set of AGRIS records was also triplified and released as LOD. As
part
of this exercise, OpenAGRIS, a semantic mashup application, was developed to
dynamically
combine AGRIS data with external data sources, using a mixture of SPARQL queries and
web services. The re-engineering of AGRIS for the Semantic Web raised numerous
issues
regarding the relative lack of administrative metadata required to compellingly
address
the proof and trust layers of the Semantic Web stack, both within the AGRIS
repository
and in the external data pulled into OpenAGRIS. The AGRIS team began a process of
disambiguation and enrichment to continue moving toward an entity-based view of its
resources, beginning with the tens of thousands of journals attached to its records.
The evolution of the system, the issues raised during the triplification process and
the
steps necessary for publishing the result as LOD content are hereby discussed and
evaluated.
Link2,
Link3
|
|
The agINFRA Science Gateway for Agricultural Sciences.
R Bruno, G Allegri, G Andronico, R Barbera, F Bitelli, A Budano, A Calanducci, F Celli.
ISGC2013
Abstract:
agINFRA (www.aginfra.eu) is a project co-funded by the European Commission under its
Seventh Framework Programme that tries to introduce the agricultural scientific
communities
into the vision of open and participatory data-intensive science. agINFRA aims to
remove
existing obstacles concerning data sharing and open access to scientific
information and
agricultural data, as well as to improve the preparedness of agricultural scientific
communities to
face, manage and exploit the abundance of relevant data that is available and can
support
agricultural research.
The agricultural domain includes a wide variety of increasingly complex, multi-disciplinary topics. Subjects vary from plant science and horticulture to agricultural engineering and agricultural economics to the environment generally and include an ever-growing array of interrelated research issues such as the linkages between climate change on the one hand and food security, or the loss of agro-biodiversity, or pressure on individual species on the other. Scientists from all over the world are extensively researching those different subjects and thereby consuming as well as producing large volumes of data. The integration process of the services accessing those data requires a registry of all the existing systems, a challenge that has started since the beginning of the project (agINFRA started on the 15th of October 2011 and will last three years). Many of those systems will be efficiently and securely accessed through single web entry points by both end users and system/data maintainers. This contribution aims to demonstrate how the adoption of the Catania Science Gateway Framework (www.catania-science-gateways.it) can have a key role during and also beyond the agINFRA project lifetime providing a unique environment able to deal with this heterogeneity of systems. This work will describe the Science Gateway (http://aginfra-sg.ct.infn.it/) developed by the INFN Dpt. of Catania and registered as a Service Provider of several Identity Federations, which together with the adoption of the CLEVER cloud middleware, can provide a unique interface able to seamlessly access the different services of the project. Among others, the integration and use of the WebGIS-enabled Italian Soil Information System (ISIS), developed by the Agrobiology and Pedology Research Centre of the Italian Agricultural Research Council, will be shown. 
This very challenging target could be reached only thanks to the adoption of widely accepted standards such as SAGA and SAML that ensure the sustainability, reliability and scalability of the proposed architecture. Read Article |
|
The role of vocabularies for estimating carbon footprint for food recipes using
Linked Open Data.
Ahsan Morshed, Fabrizio Celli.
SML2OD2013
Abstract:
Standard terms with known meanings are often called controlled vocabularies or
lightweight
ontologies. They play a vital role in the Linked Open Data cloud. These
vocabularies
capture a central notion of context for a specific domain in the knowledge cloud.
Extra
information coexists with these controlled vocabularies. This short paper shows
the role of these
vocabularies in calculating the carbon footprint of food recipes using Linked Open
Data.
Read Article |
|
Pushing, Pulling, Harvesting, Linking - Rethinking Bibliographic Workflows for
the Semantic Web.
Fabrizio Celli, Yves Jaques, Stefano Anibaldi, Johannes Keizer.
EFITA-2013
Abstract:
In this paper we describe the ongoing move of the AGRIS repository toward a
decentralized
approach based on Linked Open Data (LOD) (Bizer, et al., 2008). This move has
progressively
required modifications and enhancements to data, models and workflows. The growing
demand
for freely accessible data has brought a rise in data distributed using LOD, which
combines
Resource Description Framework (RDF) (McBride, 2004a) and RDF Schema (McBride,
2004b)
with vocabularies such as Dublin Core (DC) (Miles, et al., 2009) and Simple
Knowledge
Organisation System, together with interfaces such as SPARQL query language for RDF
(Prud'hommeaux, et al., 2008). While LOD implementations are by now a
well-established
pattern, the impacts that such approaches have on underlying business processes are
less
well understood. The openness of the LOD paradigm can expose flaws in information
management
workflows. Poor metadata, lack of metrics, vague provenance; all can contribute to
the
inability of an LOD-enabled system to satisfy the demands of the Semantic Web.
Read Article |
|
2012 | |
Proof and Trust in the OpenAGRIS Implementation.
Yves Jaques, Stefano Anibaldi, Fabrizio Celli, Imma Subirats, Armando Stellato, Johannes
Keizer.
DC-2012
Abstract:
The AGRIS repository is a bibliographic database covering almost forty years of
agricultural
research. Following the conversion of its indexing thesaurus AGROVOC into a
concept-based
vocabulary, the decision was made to express the entire AGRIS repository in RDF as
Linked
Open Data. As part of this exercise, a semantic mashup named OpenAGRIS was developed
in
order to access the records and use them to dynamically display related data from
external
systems through both SPARQL queries and traditional web services. The overall
process raised
numerous issues regarding the relative lack of administrative metadata required to
compellingly
address the top proof and trust layers of the semantic web stack, both within the
AGRIS
repository and in external data dynamically pulled into OpenAGRIS. The team began by
disambiguating the journals in which the articles were published and converting them
into RDF
but quickly realized this was only the beginning of a series of necessary steps in
moving from a
closed to an open world paradigm. Further disambiguation of institutions, authors
and AGRIS
Centres as well as the use of the VoiD vocabulary and of quality indicator models
are discussed
and evaluated.
Read Article |
|
[POSTER] The OpenAGRIS (s)mashup : Using linked data and web services to augment
AGRIS repository content.
Fabrizio Celli, Stefano Anibaldi, Yves Jaques, Imma Subirats, Armando Stellato, Lim
Ying-Sean, Johannes Keizer.
The 7th International Conference on Open Repositories
(2012)
Abstract:
“The AGRIS Network is an international initiative based on a collaborative network
of institutions, whose aim is to
promote free access to information on science and technology in agriculture and
related subjects” [1]. It has for
over 35 years indexed and given access to bibliographic metadata used in the Food
and Agriculture Organization of
the United Nations (FAO) efforts to end world hunger and today contains almost three
million records from over a
hundred centres.
The quickly improving precision of commercial search engines, coupled with the increasing availability of full-text documents, was a mixed blessing for AGRIS. While on the one hand site traffic has never been higher, much of it is indexing activity from an army of web-bots, and it remains difficult to increase users' average time-on-site. Thus, there was a need to re-evaluate the possible uses to which bibliographic metadata may be put in the 21st century. The team decided to transform AGRIS into a concept-based repository, requiring concept disambiguation and leading to the publication of records as Linked Data by carefully selecting RDF properties to allow reuse and interoperability. The process of disambiguation was challenging, but during 2011 all keywords and journals were disambiguated and 60 million triples were published. A set of challenges remains, such as disambiguating institutions and authors. After this conversion step, the team built a semantic mashup over the data, named OpenAGRIS [2]. This application uses AGRIS records to dynamically determine when it can retrieve related content from other providers such as the World Bank [3], Global Biodiversity Information Facility [4] and DBPedia [5], which is then automatically displayed with the bibliographic record. The process of converting toward a concept-based repository and building services that exploit the additional capabilities such models provide has proven invaluable in seeing the repository itself with a fresh eye. Revisiting the data not only helped the team to clean, disambiguate and improve the content, it also brought an updated selection of metadata properties that ensured the content would be as widely interpretable as possible. Finally, moving towards a machine-readable language like RDF brought up interesting questions involving the proof and trust layers of the semantic web stack. 
These questions are now helping to fuel another round of improvements and extensions to AGRIS repository data as the team considers how best to disseminate administrative metadata such as provenance, licensing and quality assurance. The implementation of a mashup also made it clear how few repositories and other information systems are truly ready for the Semantic Web. The promise of dynamic discovery and dissemination of data through standard models and interfaces like RDF is still a dream for all but a handful of pioneers. While OpenAGRIS was in the end constrained to go beyond SPARQL and add internal ad-hoc metadata alignments and custom Web Services, it may encourage data providers to publish their data as Linked Data, providing SPARQL endpoints and services to access, reason and extract implicit knowledge. References: [1] Subirats, I., Onyancha, I., Salokhe, G., et al. Towards an architecture for open archive networks in Agricultural Sciences and Technology, 2007. In International Conference on Semantic Web & Digital Libraries, Bangalore (India). [2] Celli, F., Anibaldi, S., Folch, M., Jaques, Y., and Keizer, J. (2011). OpenAGRIS: using bibliographical data for linking into the agricultural knowledge web. AOS 2011. [3] http://data.worldbank.org/indicator [4] http://data.gbif.org/welcome.htm [5] http://dbpedia.org/About Read Article |
|
A runtime approach to model-generic translation of schema and data.
Paolo Atzeni, Luigi Bellomarini, Francesca Bugiotti, Fabrizio Celli, Giorgio Gianforme.
Inf. Syst. 37(3): 269-287 (2012)
Abstract: To support heterogeneity is a major requirement in current
approaches to integration and transformation
of data. This paper proposes a new approach to the translation of schema and data
from one data model to another,
and we illustrate its implementation in the tool MIDST-RT.
We build on our previous work on MIDST, a platform conceived to perform
translations in an off-line fashion.
In such an approach, the source database (both schema and data) is imported into a
repository, where it is stored
in a universal model. Then, the translation is applied within the tool as a
composition of elementary
transformation steps, specified as Datalog programs. Finally, the result (again both
schema and data)
is exported into the operational system.
Here we illustrate a new, lightweight approach where the database is not imported. MIDST-RT needs only to know the schema of the source database and the model of the target one, and generates views on the operational system that expose the underlying data according to the corresponding schema in the target model. Views are generated in an almost automatic way, on the basis of the Datalog rules for schema translation. The proposed solution can be applied to different scenarios, which include data and application migration, data interchange, and object-to-relational mapping between applications and databases. Read Article , Link2 |
|
2011 | |
OpenAGRIS: using bibliographical data for linking into the agricultural
knowledge web.
Fabrizio Celli, Stefano Anibaldi, Maria Folch, Yves Jaques, Johannes Keizer. AOS 2011
Abstract: Spreading and exchanging agricultural information is a critical
issue to allow researchers to
access and use the knowledge in this sector. As a contribution to this goal, we
propose a
new approach that allows merging and integrating all information available on the
Web about a
specific agricultural topic by the usage of the most modern Linked Open Data
technologies. We
build on our previous work on AGRIS, a public domain database with nearly 3
million structured
bibliographical records on agricultural science and technology. In this paper we
illustrate a new
Semantic Web platform, OpenAGRIS, which aggregates information stored in various
sources available on
the Web, providing as much data as possible about a topic or a bibliographical
resource.
Read Article
|
|
Design and implementation of a Solr plug-in for Chinese-English cross-language
query expansion based on SKOS thesauri.
Wei Sun, Fabrizio Celli, Ahsan Morshed, Yves Jaques, Johannes Keizer. Informatics in Control, Automation and Robotics, Volume 2, LNEE 133, pp. 359-367, Springer 2011
Abstract: Given that existing studies for query expansion techniques for
Chinese-English
are relatively few and their level of standardization low, in order to improve
efficiency of
Chinese-English cross-language retrieval, this paper discusses the design and
implementation of
a Solr plug-in for Chinese-English cross-language query expansion based on SKOS
thesauri and used
within the AGRIS agricultural bibliographic system. The paper also elaborates the
key techniques
involved in the plug-in. Finally, taking the AGRIS data resources as an example, the
paper shows
application examples for segmentation of mixed Chinese and English, user query
parsing and AGRIS
retrieval system etc., techniques that have improved the Chinese-English
cross-language retrieval
efficiency to a certain extent, and laid a technical foundation for research about
knowledge
retrieval and discovery in related fields.
Read Article
|
|
2010 | |
Where you stop is who you are: understanding people’s activities by places
visited.
Laura Spinsanti, Fabrizio Celli, Chiara Renso. BMI 2010.
Abstract: The increasing availability of people traces - collected by
portable devices -
poses new possibilities and challenges for the study of people's mobile behaviour.
However, the
raw data produced by such portable devices is poor from a semantic point of view,
thus the gap
between the person’s activity and the raw collected data generated by the activity
is still too
wide. The work presented in this paper aims to define an algorithm to understand the
activity of a
moving person from the sequence of places she visited. The contribution is twofold.
On one hand, an
algorithm to associate each stop of the traveling person to a list of probable
visited places is
introduced. On the other hand, the obtained sequence of places is classified into a
possible activity
performed by the moving person. Preliminary experimental results on a dataset of
people moving by car
in the city of Milan are reported.
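The first step described in this abstract, associating a stop with a list of probable visited places, might look like the sketch below. The paper's actual scoring is not given here, so the search radius, the inverse-distance weighting, and all names are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Place = Tuple[str, float, float]  # (name, latitude, longitude)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def probable_places(stop: Tuple[float, float], places: Sequence[Place],
                    radius_m: float = 300.0) -> List[Tuple[str, float]]:
    """Rank places within `radius_m` of a stop by a normalised score.

    Assumption: closer places are more likely, weighted by 1 / (1 + distance).
    """
    weights = []
    for name, lat, lon in places:
        d = haversine_m(stop[0], stop[1], lat, lon)
        if d <= radius_m:
            weights.append((name, 1.0 / (1.0 + d)))
    total = sum(w for _, w in weights)
    ranked = [(name, w / total) for name, w in weights]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

The ranked lists produced for consecutive stops would then feed the second step, classifying the sequence of places into an activity, which is not sketched here.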
Where you stop is who you are (BMI 2010 presentation) Download |