Thank you for the ideas submitted.

On the first day of the hackathon, we will vote on which ideas we’ll hack.

1. Open science indicators

How to measure open science? Coming up with measurable indicators that cover all research outputs (publications, data, software, hardware, teaching material, as well as workflows) and address all FAIR aspects of open science is critical for comparing progress on the topic.
Step 1) Definition of key indicators.
Step 2) Identification of data sources and automation of collection and monitoring process where feasible.
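As a sketch of what Step 1 might produce (every indicator, output type, and data source below is an invented example, not an agreed definition), the key indicators could be captured as simple records that already point at where the data would come from:

```python
from dataclasses import dataclass

# Hypothetical sketch: each indicator pairs a research output type with a
# FAIR aspect and the data source it could be collected from.
@dataclass
class Indicator:
    name: str
    output_type: str   # publications, data, software, hardware, teaching, workflow
    fair_aspect: str   # Findable, Accessible, Interoperable, Reusable
    source: str        # where the raw numbers would come from (Step 2)

INDICATORS = [
    Indicator("share of datasets with a DOI", "data", "Findable", "repository API"),
    Indicator("share of software with an open license", "software", "Reusable", "code forge API"),
    Indicator("share of publications in open access", "publications", "Accessible", "OpenAIRE"),
]

def coverage(indicators):
    """Which output types are already covered by at least one indicator?"""
    return sorted({i.output_type for i in indicators})

covered = coverage(INDICATORS)
```

A coverage check like this makes gaps visible early: output types with no indicator yet are exactly the ones Step 1 still has to address.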

2. ORD graph

Some attempts to visualize connections between research labs have proven useful (graphsearch.epfl.ch, CERN’s Collaboration Spotting tool…).

Could we automate creating a graph of ORD? Each node is a dataset’s version; a new use of the data might combine it with another dataset… This would make it easier to gauge the use of major datasets.
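A minimal sketch of such a graph, assuming nodes are identified by DOI strings (the DOIs below are fabricated) and edges record that one dataset version was derived from or combined with another:

```python
from collections import defaultdict

class ORDGraph:
    """Toy ORD graph: dataset versions as nodes, reuse as directed edges."""

    def __init__(self):
        self.edges = defaultdict(set)   # source DOI -> DOIs of derived datasets

    def add_use(self, source_doi, derived_doi):
        self.edges[source_doi].add(derived_doi)

    def reuse_count(self, doi):
        """Gauge how often a dataset has been directly reused."""
        return len(self.edges[doi])

g = ORDGraph()
g.add_use("10.5281/zenodo.1", "10.5281/zenodo.2")   # fabricated example DOIs
g.add_use("10.5281/zenodo.1", "10.5281/zenodo.3")
```

From here, walking the edges transitively would also reveal indirect reuse chains, which a simple citation count cannot show.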

3. Quick data explorer tool
Before downloading large datasets from a data repository (e.g. Zenodo), wouldn’t it be nice if one could skim through them first? We envision a tool that takes data in a common format (say tabular; this could later be expanded to hierarchical and spatial data) and outputs quick key metrics: missing data, distributions, the data type of each column…
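For the tabular case, the core of such a tool is small; here is a stdlib-only sketch (the sample CSV is invented) that reports missing values and a crudely inferred type per column:

```python
import csv
import io

def quick_metrics(csv_text):
    """Per-column summary of a CSV: missing-value count and inferred type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    metrics = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        missing = sum(1 for v in values if v == "")
        present = [v for v in values if v != ""]
        try:
            [float(v) for v in present]   # every non-missing value parses as a number?
            inferred = "numeric"
        except ValueError:
            inferred = "text"
        metrics[col] = {"missing": missing, "type": inferred}
    return metrics

sample = "name,age\nAda,36\nGrace,\n"   # fabricated two-row example
metrics = quick_metrics(sample)
```

A real tool would stream the file instead of loading it, and add distributions and quantiles, but the interface could look much like this.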

4. Tool to display datasets published on repos
An online tool allowing institutions to display the datasets published by their researchers on repositories.

The use case would be:
– a librarian or other manager finds a relevant dataset on a FAIR repository (Zenodo, Figshare, Dataverse, Yareta, FORS, etc.)
– they enter the DOI in the tool
– the tool scrapes metadata from the FAIR repository (authors, title, description, publication date, license, etc.)
– the metadata is added to a database and displayed on a list that can be embedded in the institution’s website
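The scraping step could be sketched as follows, assuming the repository exposes a JSON API (Zenodo does, at https://zenodo.org/api/records/&lt;id&gt;). The HTTP fetch itself is omitted; `extract()` works on an already-downloaded record, and the sample record below is fabricated:

```python
def extract(record):
    """Pull the display fields out of a Zenodo-style JSON record (assumed shape)."""
    meta = record["metadata"]
    return {
        "title": meta.get("title", ""),
        "authors": [c["name"] for c in meta.get("creators", [])],
        "publication_date": meta.get("publication_date", ""),
        "license": meta.get("license", {}).get("id", "unknown"),
        "doi": record.get("doi", ""),
    }

sample_record = {
    "doi": "10.0000/example",   # fabricated record
    "metadata": {
        "title": "Example dataset",
        "creators": [{"name": "Doe, Jane"}],
        "publication_date": "2021-06-01",
        "license": {"id": "cc-by-4.0"},
    },
}
info = extract(sample_record)
```

Since each repository (Figshare, Dataverse, Yareta, FORS, etc.) has its own response shape, the tool would need one small adapter like this per repository, all mapping into the same database schema.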

This tool would enhance the visibility of the institution’s open data efforts and of their researchers’ datasets.

5. Web-based platform to build FAIR structured archive files

Goal: Facilitate the generation of structured archived files.
Solution: Create a web-based platform to build FAIR structured archive files. The field-specific information would be coded by the different communities.
Key features: The “archive forger” would provide pre-defined drop areas, check the format of the files, include literature references, facilitate the creation of links between files (workflow, pairs, etc.), and generate the .zip file.
Possible extensions: file conversion (to open formats), previews, connection to a validator, and submission to a data repository.
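The final packaging step can be sketched with the standard library alone; the drop-area layout, file names, and reference below are all invented placeholders:

```python
import io
import json
import zipfile

def forge_archive(files, references):
    """Build a structured .zip in memory: files land in pre-defined drop
    areas, and a manifest records the literature references (sketch)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for area, (name, content) in files.items():
            z.writestr(f"{area}/{name}", content)   # pre-defined drop area
        z.writestr("manifest.json", json.dumps({"references": references}))
    return buf.getvalue()

archive = forge_archive(
    {"data": ("measurements.csv", "t,v\n0,1\n")},   # fabricated content
    ["doi:10.0000/example"],                         # fabricated reference
)
```

In the real platform, the drop areas and the manifest schema would come from the field-specific definitions coded by each community, not be hard-wired as here.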

6. Legal BitTorrent tracker for research data
Connecting data across different repositories requires classification, tokenization, stemming, tagging, parsing, and semantic reasoning over each dataset. The idea: provide a tool that does this, store the resulting classification, then expose a master Search API to look for data. It’s a legal BitTorrent tracker for research data!
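A toy version of the indexing side of that pipeline (the dataset descriptions are invented, and the one-line “stemmer” is a deliberate stand-in for real NLP):

```python
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    # Naive plural stripping; a real pipeline would use a proper stemmer.
    return token[:-1] if token.endswith("s") else token

def build_index(datasets):
    """Inverted index from stemmed tokens to dataset identifiers."""
    index = defaultdict(set)
    for doi, description in datasets.items():
        for tok in tokenize(description):
            index[stem(tok)].add(doi)
    return index

index = build_index({
    "doi:A": "climate measurements from alpine stations",   # fabricated entries
    "doi:B": "climate model outputs",
})
```

The master Search API would then be a thin query layer over this stored index, pointing searchers at the repositories that actually hold the data, tracker-style.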

7. Open research data queryable by location
Humanities data is full of information about locations (city names, buildings, etc.) given in various languages. Researchers need powerful analysis tools to query all events that happened in a certain place, or to retrieve data in which a certain place is mentioned. To provide such tools, my idea is to automatically extract geolocation information (names in different languages, coordinates, etc.) from textual data and metadata, homogenize it, and store it in RDF format under a unique identifier such as those provided by the GeoNames API. Let’s make open research data queryable by location.
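The storage end might look like this: one place, identified by a GeoNames URI, serialized as N-Triples with multilingual labels and WGS84 coordinates (the GeoNames id and labels here are illustrative, and a real pipeline would use an RDF library rather than string formatting):

```python
def place_triples(geonames_id, labels, lat, lng):
    """Serialize one extracted place as N-Triples under its GeoNames URI."""
    s = f"<https://sws.geonames.org/{geonames_id}/>"
    triples = [
        f'{s} <http://www.w3.org/2004/02/skos/core#prefLabel> "{label}"@{lang} .'
        for lang, label in labels.items()
    ]
    triples.append(f'{s} <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "{lat}" .')
    triples.append(f'{s} <http://www.w3.org/2003/01/geo/wgs84_pos#long> "{lng}" .')
    return "\n".join(triples)

nt = place_triples("2661552", {"en": "Bern", "fr": "Berne"}, 46.948, 7.447)
```

Because every mention of the same place collapses onto one URI, a SPARQL query over such triples can retrieve all data mentioning that place regardless of which language or spelling the source used.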

8. DMLawTool

The web-based DMLawTool guides researchers through the most relevant legal issues related to research data management. As the legal liability of research data usually lies with the researchers, the tool empowers them to deal with their data in a legally compliant way throughout the whole data life-cycle and thus favors open access and re-use of research data.

As potential users of the DMLawTool, we invite you to try it out and provide your feedback. Does it answer your questions? Is it comprehensible and easy to use? What could be further improved? Is there something missing?

9. Guidelines to ease DMP template creation

DMP generators are a great help for researchers nowadays, and DMPs in general can help promote an Open Science culture. However, they need to be adapted to each domain and institution. An idea would be to start from the DMP Canvas Generator at https://dmp.vital-it.ch/ and brainstorm a process, with guidelines, to ease DMP template creation for specific domains and institutions.
We could build a team of Vital-IT members, domain researchers, Open Science consultants, developers, etc. to come up with a solution proposal.

10. SWISSUbase – Taking Quality to the Next Level

Imagine a researcher who needs to deposit research data. She wants easy access to a list of widely used, high-quality, and well-documented repositories so that she can choose which one is the best for her needs.

An overview of available repositories, including their maturity level, is essential to assess the quality of the open research data infrastructure. Additionally, it would act like a central hub to provide guidance on standards. Let’s hack on this!

11. Research Data Connectome

Scientific data is stored in many repositories that are not necessarily interoperable. Integrating their metadata into a Linked Data knowledge graph solves this, which is a goal of the Connectome project.

We are looking for diverse teams of researchers and developers with Text-Data-Mining or Natural-Language-Processing expertise. The aim is to implement features for automatic entity-extraction from abstracts and their integration into the Connectome’s Linked Data Pipeline.
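To make the entity-extraction step concrete, here is a deliberately rough stand-in (the abstract is invented, and a production pipeline would use a trained NER model rather than this capitalized-phrase heuristic):

```python
import re

def extract_entities(abstract):
    """Crude entity candidates: multi-word capitalized phrases in the text."""
    candidates = re.findall(r"(?:[A-Z][a-z]+\s)+[A-Z][a-z]+", abstract)
    return sorted(set(c.strip() for c in candidates))

abstract = ("We analyse snow cover in the Swiss Alps using data "
            "from the Federal Office of Meteorology.")
entities = extract_entities(abstract)
```

The interesting hackathon work starts after this step: linking each candidate to an identifier in the knowledge graph so the abstracts become nodes in the Connectome’s Linked Data pipeline.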

12. App to import all metadata in a reference management software 

Research data are often not correctly cited in scientific articles and are often omitted from the bibliography, even though repositories suggest a data citation and attribute a PID. The idea is to build an application that imports into a reference management software (e.g. Zotero, EndNote) all the metadata needed to build a data citation in different bibliographic standards, so that researchers can save and use data citations as they would for any other type of document (article, chapter, etc.). This would ease in-text referencing and correct data citation in the bibliography.
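One concrete route (an assumption on our part, not part of the pitch) is to emit CSL-JSON, the item format Zotero can import; the input metadata shape and all values below are invented for illustration:

```python
def to_csl_json(meta):
    """Map repository metadata (assumed shape) to a CSL-JSON dataset item."""
    return {
        "type": "dataset",
        "title": meta["title"],
        "author": [{"family": f, "given": g} for f, g in meta["authors"]],
        "issued": {"date-parts": [[meta["year"]]]},
        "DOI": meta["doi"],
        "publisher": meta.get("repository", ""),
    }

item = to_csl_json({
    "title": "Example measurement dataset",   # fabricated metadata
    "authors": [("Doe", "Jane")],
    "year": 2021,
    "doi": "10.0000/example",
    "repository": "Zenodo",
})
```

Once the item is in the reference manager as `type: dataset`, the existing citation styles take over, so the bibliography entry comes for free.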

13. Machine learning to cluster images based on their visual content

Image databases are mostly text databases with attached images. However, many image collections have not yet been cataloged and are therefore not searchable. Providing descriptive metadata for images is cumbersome and tedious. It would be extremely helpful if images could be pre-sorted, that is clustered, based on their visual content, e.g. grouped into clusters of visually similar images. Let’s explore how machine learning can help cluster images according to shape, color, form, etc. The DaSCH will provide some large collections of digitized photographic images for this purpose.
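As a toy illustration of the clustering step (everything here is fabricated: the three-bin “histograms” stand in for real image features, and the two centroids are hand-picked rather than learned), nearest-centroid assignment groups visually similar feature vectors:

```python
import math

def assign(features, centroids):
    """Assign each feature vector to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(f, centroids[k]))
        for f in features
    ]

features = [
    [0.9, 0.1, 0.0],   # "image" dominated by dark pixels
    [0.8, 0.2, 0.0],   # similar dark image
    [0.0, 0.1, 0.9],   # "image" dominated by bright pixels
]
clusters = assign(features, centroids=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
```

On real collections one would extract richer features (color histograms over actual pixels, or embeddings from a pretrained network) and learn the centroids with k-means; the assignment logic stays the same.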