Data management: data publishing
When your research is complete, you publish your research output and your research data. Publishing the data enables future reuse, for example for further research or educational purposes. The Hague University of Applied Sciences advises researchers to deposit their research data in DANS Data Stations. DANS is the national expertise centre for research data in the Netherlands.
Findable
Data are findable if:
- The data set is provided with a persistent identifier
- The data set provides data citation
- The data set is also linked to the persistent identifier of the author(s) (ORCID, ISNI)
- The data set is provided with complete metadata according to the appropriate standards
- The data set is linked to the publication based on the data set
Accessible
Data are accessible if:
- The data set has been included in and made available through a reliable data archive
- The metadata is publicly accessible, even if the data set itself is not
- Access conditions (access protocols, contact details, embargo) are clearly stated
Interoperable
Data are interoperable if:
- The (meta)data have been made accessible via an API (Application Programming Interface)
- The (meta)data contain correct terms and relevant vocabulary according to the standards of your profession
- The (meta)data are of the highest quality
- The data are available in open standardised data formats
Reusable
Data are reusable if:
- The data set provides complete documentation (see Data Management (storage and organisation))
- The data set is licensed (see the section on licensing data below)
Data publishing
The public sharing of data of completed (parts of) research supports transparency and openness of research. You will meet any requirements of subsidy providers or publishers and you will respect codes of conduct and declarations. The impact of your research will be increased within and outside your specialisation and it benefits the visibility of you as a researcher.
In essence, four things are important when sharing data: (1) place the reusable data in a data archive that uses (2) a metadata standard and (3) a persistent identifier and (4) licence the data. (1) is necessary for accessibility, (2) is necessary for exchangeability, (3) is necessary for (re)findability and citability and (4) is necessary to make actual reuse possible (and can also be used for exchangeability).
DANS, the data archive recommended by The Hague University of Applied Sciences, offers all four. Metadata standards are discussed below in the Data Archiving section. In the following, we look at licensing and citation. But first we will discuss the reasons for restricting data sharing.
DANS Data Stations and domain-specific archives
The Hague University of Applied Sciences advises researchers to archive their research data in the DANS Data Stations data archive after completion of their research. DANS has compiled a manual listing points to consider when depositing data in its archive. Please read it carefully. When you deposit your data, you enter into an agreement with the archive:
- DANS is granted the right to include the data set in its archive and to make it available under the conditions indicated.
- The agreement is a "non-exclusive" licence. This means that the owner of the data set remains free to deposit it and/or make it available elsewhere.
- You declare that you are the rights holder, or that you have permission from any other rights holders to deposit the data set and make it available. Think of copyright, database rights or patent rights.
- You do not waive database rights or any copyrights, unless you choose to place the data set in the public domain.
DANS Data Stations is a general data archive but has delivery specifications for certain disciplines, including social and behavioural sciences. You can also choose a domain-specific data archive within your field, if available. The advantage of a discipline-specific data archive is that the possibilities are even more tailored to the respective research community. The data can be described in a richer way by using discipline-specific metadata standards. The re3data.org website offers an overview of general and disciplinary data archives worldwide. You can filter by subject, specialisation or country. You also have the option to search for data archives with a data quality mark. Such a quality mark indicates that the archive is a Trusted Digital Repository according to third parties and that the research data deposited there can also be found and shared in the future. A data archive with a data quality mark such as the CoreTrustSeal puts long-term access, security, findability and standardisation first.
File Formats
The file format in which data is stored is of great importance for long-term access to research data. DANS Data Stations, the preferred data archive for researchers of The Hague University of Applied Sciences, works with various preferred formats for different types of research data. The deposit of research data in these preferred formats will be accepted by DANS without question. Therefore, please read their table of preferred formats carefully.
As a general guideline, DANS states that the file formats best suited for long-term sustainability and accessibility are the ones that:
- are widely used;
- have open specifications;
- are independent of specific software, developers or suppliers.
Metadata
By describing your data and providing accompanying metadata (data about data, characteristics or properties of data), you ensure the findability of the data. Search engines make use of these metadata fields.
DANS Data Stations, the preferred data archive for researchers of The Hague University of Applied Sciences, uses the Dublin Core metadata standard. This is a very common, general international metadata standard that describes your research in approximately 15 fields, such as title, author, date, subject, description and publisher. DANS provides an overview and explanation of the metadata fields offered by DANS Data Stations. The more fields you fill in, the greater the findability of the data. The metadata is public. The fields should therefore contain only the personal data needed to account for the data set, and no personal data of research subjects. There are different types of metadata, and metadata is classified at both file level and data level. Read more about the types and levels in this guide. The use of a metadata standard also guarantees interchangeability between systems. This makes your data more widely accessible.
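To make the idea of a metadata record concrete, here is a minimal sketch in Python that writes a handful of Dublin Core fields as XML. The field values and the `metadata` wrapper element are hypothetical, and the function name `build_dublin_core` is ours. In practice you fill in these fields through the deposit form of the archive rather than by hand. Note that the field called "author" above corresponds to the Dublin Core element `creator`.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dublin_core(fields):
    """Return a simple Dublin Core record as an XML string.

    `fields` maps Dublin Core element names to one value or a list of values.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values, for illustration only.
record = build_dublin_core({
    "title": "Example survey data set",
    "creator": "Jansen, A.",
    "date": "2024",
    "subject": ["survey", "education"],
    "description": "Anonymised responses to a hypothetical survey.",
    "publisher": "DANS",
})
print(record)
```

Repeatable fields such as `subject` are passed as a list, so each keyword ends up in its own element and can be indexed separately by search engines.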
In addition to general metadata standards such as Dublin Core, there are also domain-specific metadata standards. These are metadata fields that relate to, for example, numerical data (social sciences), material objects and their visualisations (archaeology), primary biodiversity data (biology) or tools to capture data (engineering).
Domain-specific data archives naturally use domain-specific metadata standards, but domain-specific metadata can also serve as a supplement to a general metadata standard (used by a general data archive). For example, DANS Data Stations contains specific fields for archaeology data that refer to the Archaeological Basis Register.
There are many different domain-specific metadata standards, depending on the research community, the purpose and the function in the domain. The UK-based Digital Curation Centre provides a good overview. The Research Data Alliance, launched with support from the European Commission among others, also maintains a list.
All or nothing?
Sharing research data is not an all or nothing choice. It ranges from making data completely open on the one hand to keeping it completely closed on the other, with various possible forms of restricted/controlled access in between.
Open research data are data that 'can be freely used, modified and shared by anyone for any purpose' (opendefinition.org). Closed research data are data that are temporarily embargoed or cannot be shared at all. Restricted/controlled research data are data that are not shared in a fully open manner, but made available under more restricted access and use conditions. This means that there are limits to who can access and use the data, how and/or for what purpose. Access to data can be restricted in various ways:
- First of all, a login or authentication related to a certain institution/organisation or with a membership can be used.
- You can also choose to work with an agreement between you as the provider of the data and those who want to reuse your data, a Data Use Agreement. You agree on the conditions and the ways in which your data may be reused.
- A data archive such as DANS also offers the possibility to (temporarily) embargo your data: during the embargo period, the description of the data set is often published, but the data itself is not available for reuse by others. If you want to configure an embargo on the data in DANS (maximum of two years), you can do so in the field 'Date available'.
At DANS you can choose the access category 'Restricted Access'. Others can then request permission from you to view and download your data via the data archive. They have to justify their request. Before granting access, you can impose additional conditions on the requester, such as a review by the ethics committee of the requester's institution.
Whether you choose open, closed or restricted/controlled depends largely on what is appropriate given the nature of the data and ownership (whether you have the right permissions). Reasons to limit data sharing:
- The data constitute or contain personal data, i.e. any information relating to an identified or identifiable living individual (directly or indirectly). If possible, anonymise this data.
- You otherwise have a duty or have agreed to keep the data confidential (for example, by signing a confidentiality agreement or an agreement with a confidentiality clause).
- The data could potentially cause damage (e.g. to endangered species, vulnerable locations or groups, public health, national security, etc.) if made public.
- The data are not generated in the course of your own research project, but are provided by another party (e.g. commercial provider, government agency, etc.).
- Research data, or rather the form in which they are expressed, may under certain circumstances be protected by copyright and/or database rights.
- The research data may constitute a patentable invention or contain commercially valuable know-how. If they are shared (prematurely), this could jeopardise your valorisation efforts.
Is there a legitimate reason as described here? Then subsidy providers, institutions and reputable journals/publishers will accept a deviation from their data sharing conditions. However, as a researcher, you are expected to provide the appropriate justification, for example in the data management plan or in a data accessibility statement that you include in your publication. A data accessibility statement is usually included in the 'Acknowledgment' section of your article. Such a statement indicates where and how the data on which the article is based can be consulted and, if the data cannot be made available, why not.
Licensing data
When publishing research data, it is important to let potential users know in advance what they are allowed to do with the data. Licences are an effective way of doing this. A good data archive will normally apply a licence to each data set it contains. Usually, you can make a choice when you deposit data. DANS Data Stations offers a list of licences. Each licence in the list links to more information about that specific licence.
Good practice is to apply a standard and open licence for open research data, as this ensures legal interchangeability and the widest possible reuse. One of the standard licences that is widely used for research data is the series of Creative Commons (CC) licences.
- The CC Attribution Licence (CC BY) gives others maximum freedom to reuse the data (i.e. copy, redistribute, adapt), provided they give proper acknowledgement.
- The CC Attribution-ShareAlike (CC BY-SA) licence gives the same freedom to others as CC BY, but requires redistribution of derivative works (based on your data) under the same licence.
- You could use the CC Attribution-NonCommercial (CC BY-NC) licence if you are applying for a patent or otherwise commercialising your research. But in the setting of research at The Hague University of Applied Sciences, where the research is done with public funds, using your data commercially yourself is not common.
- The CC Attribution-NoDerivatives (CC BY-ND) licence allows others to use the data and share it as is, but not to modify or transform it in any way. For data, however, this comes close to 'All rights reserved', because others can only verify results already derived from the data. This is a more restrictive licence.
The CC licences are general licences. They are very well suited for data, but also for publications, among other things. There are also licences that apply specifically to data. These are the Open Data Commons licences, of which there are three:
- Public Domain Dedication and Licence (PDDL)
- Attribution Licence (ODC-By)
- Open Database Licence (ODC-ODbL)
If your data set contains software, you can use an open source licence.
Need help selecting a suitable standard licence? Check out this EUDAT licence selection tool.
Outreach
There are various ways of increasing the awareness and accessibility of your research data. When you create research output, you can add your data as additional material. This works for posters, papers or other publications. An enhanced publication is an online publication in which an article is accompanied by e.g. (links to) research data, illustrations, visualisations, internet sources and comments.
Have you created or collected a special data set? Or is the methodology used innovative and worthy of more extensive discussion than just a short paragraph in an article? Then you might consider publishing an article in a data journal. This is valuable for the following reasons:
- Together with your article, your data set is peer-reviewed and thus receives scientific accreditation for reuse.
- With the article about your data set, you make your methodology and results even more transparent.
- Publication in a data journal is another accessible route leading to your data set. This will increase awareness of your publication.
Examples of data journals:
Data citation
The publication (public sharing) of data sets increasingly counts as a citable contribution to the research curriculum. The citation of research data is part of the Altmetrics movement (alternative metrics), which states that the impact of your research is determined by (the references to) a wide range of research outputs such as data sets, software, blog posts, presentations, etc.
Data citation:
- makes data easier to find;
- promotes reproducibility;
- promotes the reuse of data;
- makes it possible to track the impact of the research data;
- creates a publication structure that allows for long-term availability of data;
- provides a structure within which the impact of the data can be traced back to the researchers who created it.
To be citable, a data set needs a persistent identifier (PID). When publishing data to a data archive such as DANS Data Stations, a PID is automatically assigned to the data set. The PID means that your data can always be found, even if you change the name and location. Broken links or 'page not found' messages are prevented by the use of a PID in data retrieval. Digital Object Identifiers (DOIs) are widely accepted as the persistent identifier for data citations. DANS Data Stations also uses DOI.
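As an illustration of how a DOI leads to the data, the sketch below turns a DOI written in any of several common notations into a link via the `https://doi.org/` resolver. The function name `doi_to_url` is ours, and the list of accepted prefixes is an assumption about the notations you are likely to encounter. The check is deliberately simple: it only tests that the identifier starts with `10.`, as all DOIs do.

```python
def doi_to_url(doi):
    """Return the https://doi.org/ resolver link for a DOI.

    Accepts a bare DOI, a 'doi:' prefixed DOI, or an existing resolver link.
    """
    doi = doi.strip()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.17026/dans-xns-un6c"))
```

Because the resolver redirects to wherever the archive currently hosts the data set, the link keeps working even if the landing page moves.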
An example of a standard data citation to a data set in DANS Data Stations:
Coenen, M.J.H. (Radboud University) (1) (2022) (2): Data from: Genome-wide association study of nociceptive musculoskeletal pain treatment response in UK Biobank. (3) DANS. (4) https://doi.org/10.17026/dans-xns-un6c (5)
- (1) Author: the person who has created the data set, individuals and/or organisation.
- (2) Date: the year or exact date the data set was published.
- (3) Title: the name given to the data set or the name of the research project.
- (4) Publisher: the archive responsible for making the data set available.
- (5) Online location: DOI or other persistent identifier.
Other possible elements in data citation are:
- Editor: person or persons (other than the author) responsible for the compilation, editing and correction of the data set.
- The format of the data files.
- Version, if more than one version of the data set has been deposited.
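The citation elements listed above can be assembled mechanically. The following Python sketch follows the pattern of the DANS example: author, year, title, publisher, persistent identifier. The function name `format_data_citation` and the placement of the optional version are our own assumptions. Always check the citation style that your journal or archive prescribes.

```python
def format_data_citation(author, year, title, publisher, identifier, version=None):
    """Assemble a data citation: Author (Year): Title. Publisher. Identifier."""
    citation = f"{author} ({year}): {title}"
    # Close the title with a full stop unless it already ends in punctuation.
    if not citation.endswith((".", "?", "!")):
        citation += "."
    # Assumption: the version, if given, is placed directly after the title.
    if version:
        citation += f" Version {version}."
    return f"{citation} {publisher}. {identifier}"


citation = format_data_citation(
    author="Coenen, M.J.H. (Radboud University)",
    year=2022,
    title=(
        "Data from: Genome-wide association study of nociceptive "
        "musculoskeletal pain treatment response in UK Biobank"
    ),
    publisher="DANS",
    identifier="https://doi.org/10.17026/dans-xns-un6c",
)
print(citation)
```

Keeping the elements as separate arguments makes it easy to reuse the same metadata for different citation styles.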
Data Archiving
If you have already applied good data management during your research, transferring your research data to an archive is not complicated. You make the final considerations about which data should or can be kept. You archive the research data that you were not able to publish in a public data archive. In addition to the data, you archive project documents, informed consent forms or privacy-sensitive information.
Data selection
Which data are suitable for archiving?
To decide on this, first consider the following general criteria:
- Research results that have a high social impact must always be verifiable (e.g. clinical trials)
- If applicable, observe the requirements, stipulations or conditions of your subsidy provider and publisher
- Archive data with a high potential reuse value
- Consider the scientific, cultural or historical significance of data. Data that are valuable for scientific-historical research, for example, are eligible for archiving
- When the value of data comes from the complexity of recreating the data, this data also qualifies for archiving. The value of the data retention is then greater than the cost of creating the data
- Look at the usability of the data: data format, sufficient documentation and metadata, clarity of ownership
Then you can make a detailed choice with the help of the following points:
- In general, the best practice is to archive the raw data as much as possible. But there are reasons to share the processed data, for example, if your research is intended to demonstrate a new method. Archiving interviews in audio or video formats is also discouraged because it is difficult to anonymise such data
- In simulations, it is better to archive the data used for the simulation (instead of the data resulting from the simulation)
- In the case of experiments, it is always wise to archive all the data necessary to repeat your experiment
- When archiving completed questionnaires, it is not necessary to archive the empty questionnaires as well. But if you do not want to share the answers because of the sensitivity of the data, archive the empty questionnaires so that they can be shared
- Raw data from interviews containing personal data and sensitive information should be destroyed immediately after they have been anonymised
- Software and code are important to archive so that you can repeat the simulations yourself and so that other researchers can validate and further develop your code
- It is usually not necessary to archive intermediate data or auxiliary data. The final data, on the other hand, are important when they form the basis of your results. These data are crucial for the integrity and verification of your results
- Archive data from an external party only when the licence or the conditions allow you to archive the data. If not, make sure you document this data properly and archive this documentation. If the external party has archived the data themselves, you can refer to these data
Deleting data
For the data that you do not archive, you must take follow-up action. This includes, for example, deleting the data carefully. Pay special attention to sensitive data. When deleting data, you must prevent it from being restored and you must ensure that the data are deleted from all your storage locations. The most reliable way of destroying data is to render the carrier completely physically unusable. For device-independent storage or if you want to keep your device, files must be overwritten to make them inaccessible. This is called data erasure, data clearing, data wiping or data destruction.
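The overwriting step can be sketched in Python as follows. This is an illustration of the principle only, with a function name (`overwrite_and_delete`) of our own choosing. Overwriting a file in place does not guarantee that all traces are gone: on SSDs, on file systems that keep snapshots or journals, and on storage that is synchronised or backed up, older copies of the content may survive. For sensitive data, use the erasure tools and procedures approved by your institution.

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # overwrite in blocks of 1 MiB


def overwrite_and_delete(path, passes=1):
    """Overwrite a file with random bytes, then delete it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                block = os.urandom(min(CHUNK_SIZE, remaining))
                handle.write(block)
                remaining -= len(block)
            # Ask the operating system to write the new content to disk.
            handle.flush()
            os.fsync(handle.fileno())
    os.remove(path)


# Usage with a throwaway file, for illustration only.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sensitive interview notes")
overwrite_and_delete(tmp.name)
print(os.path.exists(tmp.name))
```

Remember that the same file may also exist in other storage locations, such as cloud folders, e-mail attachments or backups, each of which needs its own follow-up action.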
Data retention period
The retention period of data depends on the field of study, its developments, the costs of storage and access, and the expected (re)use. Data sets that are considered to be heritage, such as the results of archaeological research, are generally preserved for eternity. In some cases, it is legally stipulated how long data must be kept. The General Data Protection Regulation (GDPR) does not specify a concrete retention period for personal data, but it does state that such data may not be kept longer than necessary. According to the Netherlands Code of Conduct for Research Integrity, ten years is a minimum retention period for raw research data in the Netherlands.
Consortium
When cooperating with other institutions or organisations, it will be necessary to examine together which institutions archive which data and where, and whether and how the sharing of data is facilitated. These agreements must be included in the (joint) data management plan but also laid down in writing in a consortium agreement. Periodically check that all parties continue to observe the procedures that have been agreed upon.
Support by a Data Steward
Researchers can receive support in research data management. The research data steward(s) of THUAS can be contacted at [email protected].