Dr Anusuriya Devaraju¹, Uwe Schindler¹, Dr Michael Diepenbroek¹
¹ MARUM, University of Bremen, Germany
As the amount of open research data in data portals grows, users find it increasingly difficult to discover new, relevant, and interesting datasets in these portals. Conventional data search, for example full-text search, may return results that are either too broad or too narrow, depending on the richness of the metadata and on how well the search engine is tuned to produce the desired results. Furthermore, this search mechanism suits mainly users who can clearly express their information needs; it is inadequate for users who simply want to browse or discover potential datasets for developing or testing their applications. The PANGAEA data portal holds more than 380,000 environmental datasets published with DOIs. We present a data recommender system that improves the discovery of PANGAEA datasets. The system was developed based on text analytics and usage mining, and it produces two types of recommendations: ‘Datasets with similar metadata’ and ‘Users that were interested in this dataset were also interested in’. The first type resembles text-matching search; however, rather than a user-entered query, it uses the metadata of a target dataset (e.g., title, author, location, time, publication) to identify similar datasets. The second type uses user interactions such as data page views, clicks and downloads to produce ‘new’ data recommendations. We evaluated the system online; the results highlight its effectiveness in improving user engagement (e.g., click-through rates) on the data portal.
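The abstract does not state which text-analytics or usage-mining techniques the system uses. As a minimal sketch only, assuming TF-IDF cosine similarity over concatenated metadata text for ‘Datasets with similar metadata’ and simple session co-occurrence counts for ‘Users that were interested in this dataset were also interested in’, the two recommendation types could be illustrated as follows. All function names, dataset IDs, metadata strings and sessions are hypothetical toy values, not PANGAEA records or the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) per dataset (assumed weighting)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    df = Counter()  # document frequency of each term
    for tokens in tokenized.values():
        df.update(set(tokens))
    vectors = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        # Smoothed IDF so that no weight becomes zero.
        vectors[doc_id] = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_metadata(target, vectors, k=3):
    """Content-based: datasets whose metadata vectors are closest to the target."""
    scores = [(other, cosine(vectors[target], vec))
              for other, vec in vectors.items() if other != target]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, score in scores[:k] if score > 0]


def co_usage_counts(sessions):
    """Usage-based: count how often two datasets occur in the same user session."""
    counts = defaultdict(Counter)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_interested(target, counts, k=3):
    """Datasets most often co-viewed/downloaded with the target."""
    ranked = sorted(counts[target].items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical toy metadata (title/keywords/location flattened into one string).
docs = {
    "ds1": "sea surface temperature north atlantic ctd",
    "ds2": "sea surface temperature arctic ocean",
    "ds3": "foraminifera abundance sediment core",
    "ds4": "sediment core grain size north atlantic",
}
vectors = tfidf_vectors(docs)
print(similar_metadata("ds1", vectors))  # → ['ds2', 'ds4']

# Hypothetical user sessions (page views, clicks, downloads).
sessions = [["ds1", "ds3"], ["ds1", "ds3", "ds4"], ["ds1", "ds2"]]
counts = co_usage_counts(sessions)
print(also_interested("ds1", counts))  # → ['ds3', 'ds2', 'ds4']
```

The sketch also shows why the second type can surface ‘new’ datasets: ds3 shares no metadata terms with ds1, so the content-based ranking never returns it, yet it ranks first by co-usage.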
Dr. Anusuriya Devaraju is a data scientist with a PhD in Geoinformatics from the University of Muenster, where she specialized in the semantic integration of geospatial information. She holds a Master’s degree in System Design for Internet Applications from Newcastle University, UK. She currently works at the Centre for Marine Environmental Sciences (MARUM), University of Bremen, where she is responsible for developing and implementing data mining and machine learning techniques to improve the discovery of PANGAEA datasets. Before joining the research centre, she worked at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia, where she developed information models and tools to improve the discovery of CSIRO research assets. During her postdoctoral time at Forschungszentrum Jülich, she contributed significantly to TERENO data management and integration.