
FAQ


Introduction

 


 


What is research data?

 

The term “research data” generally refers to all kinds of (digital) data that represent the result of scientific work or that serve as a basis for such work. Research data is generated using a wide variety of methods, such as measurements, source research or surveys. Therefore, it is always subject- and project-specific. For additional information on defining research data, see here.

 


 

 

What is research data management?

 

Research data management comprises all measures that make research data sustainable and keep it that way. It is therefore relevant throughout the entire data lifecycle (Fig. 1).

 

Fig. 1: Research data lifecycle

Ideally, planning for research data management begins at the start of a research project and is updated regularly. Research data management is not limited to data storage and archiving. It also ensures that data can be found, accessed and understood, so that you and others can use the data well into the future. For further information on research data management see here.



 

 

Why should research data management be important to me?

 

There are several good reasons for systematically tackling research data management, which likewise stress the importance of good scientific practice:

 

In the figure below, you can see different possible aims of research data management:

Fig. 2: Aims of research data management

 


 

 

What do I have to keep in mind when planning my project?

 

    1. Appoint the people responsible for setting up and controlling research data management at your institution.
    2. Check whether there are institute- and discipline-specific or general requirements and recommendations on how to handle research data.
    3. Check as early as possible which requirements on archiving and publishing research data you have to meet. (Which specific demands do sponsors, publishers, and universities have?)
    4. Check which research data is collected during your research project.
    5. Think about which research data is to be published and provided for reuse.
    6. Think about how to store and archive your research data. (Data storage and digital preservation)
    7. Find out which options you have for storing and archiving research data. Could you use a generalist or discipline-specific repository? (How can I find a suitable repository?)
    8. Clear up all legal questions on storing and sharing research data. You might have to consider data protection and copyright.
    9. Create a data management plan to document your decisions. It will also serve as validation of your progress and project implementation. (How do I create a good data management plan?)
    10. Update your data management plan regularly during the course of your research.

 


 

 

How do I create a good data management plan?

 

Examples and Templates:

Wizards:

  • CLARIN-D (Common Language Resources and Technology Infrastructure)
  • KomFor (the centre of competence for research data in the earth and environmental sciences)

 

Exemplary Data Management Plans:

 There is also Humboldt-University’s video tutorial (Ger) to give you a brief introduction to DMPs.

 


 

 

Which specific demands do sponsors, publishers, and universities have?

 

  • Deutsche Forschungsgemeinschaft (DFG) (= German Research Foundation)

In its “Proposals for Safeguarding Good Scientific Practice” (Eng + Ger), the DFG states that “Primary data as the basis for publications shall be securely stored for ten years in a durable form in the institution of their origin”. It continues:

“In the interest of transparency and to enable research to be referred to and reused by others, whenever possible researchers make the research data and principal materials on which a publication is based available in recognised archives and repositories in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable).”

The DFG Guidelines on the Handling of Research Data, adopted in 2015, give further recommendations on providing data and planning data-driven projects, for example:

“Applicants should consider during the planning stage whether and how much of the research data resulting from a project could be relevant for other research contexts and how this data can be made available to other researchers for reuse. Applicants should therefore detail in the proposal what research data will be generated or evaluated during a scientific research project. Concepts and considerations appropriate to the specific discipline for quality assurance and the handling and long-term archiving of research data should be taken as a basis.”

 

  • European Commission (EC)

The EC’s “Open Research Data Pilot” is part of the EU research and innovation program Horizon 2020. Its aim is to improve access to and the reusability of research data originating in Horizon 2020 projects. The pilot’s guiding principle is “as open as possible, as restricted as necessary”. (EC Guidelines on FAIR Data Management in Horizon 2020, p. 4)

From 2014 to 2016, the pilot covered only selected areas of Horizon 2020. Since the revised 2017 version of the program was released, it covers all areas.

There are three main obligations:

Data management plan: You have to create a data management plan according to the template. It has to be submitted within the first six months and updated whenever relevant changes occur (or at least at the interim and final evaluations).

Data storage: Your research data has to be stored in an institutional, project-specific or discipline-specific data repository as early as possible (‘underlying data’) or according to the data management plan (‘other data’).

Publication: If possible, your data should be published under an open license (preferably CC-BY or CC0) without use restrictions. The publication has to include the necessary contextual information and tools.

However, if there are legitimate reasons, a partial or complete waiver of these requirements is possible.

You can find further information by looking at the following:

Guidelines on FAIR Data Management in Horizon 2020

Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020

Horizon 2020 Online Manual: Open Access and Data Management

Horizon 2020: Annotated Model Grant Agreement (AGA)

OpenAIRE Research Data Management Briefing Paper

 

  • Publishers

Public Library of Science (PLOS): Data Availability Policy / Materials and Software Sharing Policy

Nature Publishing Group: Availability of Data, Material and Methods Policy

Science: Data and Materials Availability Policy and Preparing Your Supplementary Materials

BioMed Central: Availability of Supporting Data

Elsevier: Text and Data Mining; Research Data Policy

 

  • Justus-Liebig-University Giessen


 

 

Data Storage and Digital Preservation

How can I structure my data?

 

At the different stages of modifying your data (e.g. raw data, cleaned data, data ready for analysis) you should create write-protected versions. You should only use copies of those original files for further processes. 

Naming conventions can vary widely depending on your discipline and your kind of data. However, names should always reflect the kind of data (raw data, cleaned data, analytical data) and the data format (work file, result file etc.).

The file name should always include the date of storage in YYYYMMDD format, placed at the beginning or end of the name to ease sorting. Do not use special characters, umlauts or spaces; use underscores instead. Names should always be uniform, clear and meaningful.

Examples of data file names (see also: HU Berlin: Structure files):

  • [sediment]_[sample]_[instrument]_[YYYYMMDD].dat
  • [experiment]_[reagent]_[instrument]_[YYYYMMDD].csv
  • [experiment]_[experiment_set-up]_[test_subject]_[YYYYMMDD].sav
  • [observation]_[location]_[YYYYMMDD].mp4
  • [interview_partner]_[interviewer]_[YYYYMMDD].mp3
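The naming rules above can be sketched in a few lines of Python. This is only an illustration: the function name `build_file_name` and the sample values are hypothetical, and the sketch simply drops characters outside A-Z, a-z, 0-9, underscore and hyphen instead of transliterating them.

```python
import re
from datetime import date


def build_file_name(parts, storage_date, extension):
    """Join descriptive parts and a YYYYMMDD date with underscores."""
    cleaned = []
    for part in parts:
        # Replace runs of whitespace with a single underscore.
        token = re.sub(r"\s+", "_", part.strip())
        # Drop special characters and umlauts (assumption: dropping is
        # acceptable; a real project may prefer to transliterate them).
        token = re.sub(r"[^A-Za-z0-9_-]", "", token)
        if not token:
            raise ValueError(f"part {part!r} is empty after cleaning")
        cleaned.append(token)
    stamp = storage_date.strftime("%Y%m%d")  # date of storage, YYYYMMDD
    return "_".join(cleaned + [stamp]) + "." + extension.lstrip(".")


print(build_file_name(["sediment", "sample 12", "XRF"], date(2024, 3, 5), "dat"))
# sediment_sample_12_XRF_20240305.dat
```

Placing the date at the end keeps files of the same sample next to each other when sorted by name; placing it first would sort them chronologically instead.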

You can specify the file version in the file name in order to easily identify changes to your data. A well-known concept of versioning based on the DDI (Data Documentation Initiative) standard is: Major.Minor.Revision.

Starting from version “1.0.0”, increment:

1. the first position if cases, variables, waves or samples are added or deleted;

2. the second position if data is corrected in a way that affects the analysis of your data;

3. the third position if there are only minor revisions that do not affect the interpretation of your data.

Versioning can also be supported by using appropriate software (Free Version Control Software, e.g. Git).
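As a minimal sketch of the Major.Minor.Revision scheme, the following Python function increments the position that matches the kind of change. The change labels (`"structure"`, `"correction"`, `"revision"`) are hypothetical names chosen for this example, and resetting the lower positions to zero is a common convention assumed here, not something the scheme above prescribes.

```python
def bump_version(version, change):
    """Return the next Major.Minor.Revision string for a given kind of change."""
    major, minor, revision = (int(number) for number in version.split("."))
    if change == "structure":
        # Cases, variables, waves or samples were added or deleted.
        return f"{major + 1}.0.0"
    if change == "correction":
        # Data was corrected in a way that affects the analysis.
        return f"{major}.{minor + 1}.0"
    if change == "revision":
        # Minor revision with no consequence for interpretation.
        return f"{major}.{minor}.{revision + 1}"
    raise ValueError(f"unknown kind of change: {change!r}")


print(bump_version("1.0.0", "structure"))   # 2.0.0
print(bump_version("1.2.3", "correction"))  # 1.3.0
print(bump_version("1.2.3", "revision"))    # 1.2.4
```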

 


 

 

 Which file formats should I choose?

Data Format | Recommended Formats | Less Suitable / Unsuitable
Audio | .wav / .flac | .mp3
Computer-Aided Design (CAD) | .dwg / .dxf / .x3d / .x3db / .x3dv | ---
Databases | .sql / .xml | .accdb / .mdb
Raster Graphics | .tif (uncompressed) / .jp2 / .jpg2 / .png | .gif / .jpeg / .jpg / .psd
Statistical Data | .por | .sav (IBM®SPSS)
Tables | .csv / .tsv / .tab | .xls / .xlsx / .xlx
Texts | .odf / .rtf / .txt / PDF/A | .docx / .doc / PDF
Vector Graphics | .svg / .svgz | .cdr
Videos | .mp4 / .mkv / .mj2 / .avi (uncompressed) | .mov / .wmv
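The recommendations above can be turned into a simple check on file extensions. The Python sketch below is an illustration only: the function name `classify_format` and the three category labels are hypothetical. Since an extension cannot reveal whether a .tif or .avi file is uncompressed, or whether a .pdf file is PDF/A, those three are flagged for manual checking.

```python
from pathlib import Path

# Extensions taken from the format overview above.
RECOMMENDED = {
    ".wav", ".flac", ".dwg", ".dxf", ".x3d", ".x3db", ".x3dv", ".sql", ".xml",
    ".jp2", ".jpg2", ".png", ".por", ".csv", ".tsv", ".tab", ".odf", ".rtf",
    ".txt", ".svg", ".svgz", ".mp4", ".mkv", ".mj2",
}
LESS_SUITABLE = {
    ".mp3", ".accdb", ".mdb", ".gif", ".jpeg", ".jpg", ".psd", ".sav",
    ".xls", ".xlsx", ".xlx", ".docx", ".doc", ".cdr", ".mov", ".wmv",
}
# Suitability depends on properties the extension does not show
# (compression for .tif and .avi, PDF/A conformance for .pdf).
CHECK_MANUALLY = {".tif", ".avi", ".pdf"}


def classify_format(file_name):
    """Classify a file by its extension, ignoring upper/lower case."""
    suffix = Path(file_name).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in LESS_SUITABLE:
        return "less suitable"
    if suffix in CHECK_MANUALLY:
        return "check manually"
    return "not listed"


for name in ["interview.MP3", "results.csv", "report.pdf"]:
    print(name, "->", classify_format(name))
```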

 


 

 

Where do I store my data during my work process?

 

It is of the utmost importance to back up your data regularly in case of technical or human errors. It is the responsibility of the researcher to secure data. The Hochschulrechenzentrum (HRZ) (= university computer center) offers several possibilities for data storage:


 

 

 

If you need more data storage for larger research projects, please contact the HRZ at an early stage.

 


 

 

What should I consider when backing up my data?

 

Good research data management also means being prepared for data loss. You should therefore create a backup plan at the beginning of your research project, ideally including regular backup routines. A backup plan should answer the following questions:

  • Which backup tool do you use?
  • Which data should be backed up?
  • Where do you back up your data?
  • How often do you back up your data?

You should also follow the so-called 3-2-1 backup rule (see Fig. 3). This rule states that you should always keep at least 3 copies of your data on 2 different storage devices (e.g. a USB stick and an external hard disk) as well as 1 copy at a decentralized storage location (e.g. the JLUbox or winfile). It is important that all 3 copies are always up to date with the original file, which is why automated backup routines are best. Instructions on how to create automated backup routines using Windows Task Scheduler can be found here.


Fig. 3: 3-2-1 Backup Rule
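A backup routine along the lines of the 3-2-1 rule could look like the following Python sketch. It is an illustration under assumptions: the function names `sha256_of` and `back_up` are hypothetical, the demo uses temporary folders as stand-ins for a USB stick, an external hard disk and a decentralized location, and the code cannot check that the target folders really lie on different devices. Each copy is verified against the original with a SHA-256 checksum.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, target_dirs):
    """Copy `source` into every target directory and verify each copy."""
    if len(target_dirs) < 3:
        raise ValueError("the 3-2-1 rule asks for at least 3 copies")
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for target_dir in target_dirs:
        target_dir = Path(target_dir)
        target_dir.mkdir(parents=True, exist_ok=True)
        copy_path = target_dir / source.name
        shutil.copy2(source, copy_path)  # copy2 also preserves timestamps
        if sha256_of(copy_path) != expected:
            raise OSError(f"checksum mismatch for {copy_path}")
        copies.append(copy_path)
    return copies


if __name__ == "__main__":
    # Temporary folders stand in for the real storage locations (assumption).
    with tempfile.TemporaryDirectory() as root:
        source = Path(root) / "raw_data_20240305.csv"
        source.write_text("id,value\n1,0.5\n")
        targets = [Path(root) / name
                   for name in ("usb_stick", "external_disk", "offsite")]
        for copy_path in back_up(source, targets):
            print("verified copy:", copy_path)
```

Run on a schedule, a script like this keeps all copies in step with the original file.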

If you are working with personal data or other legally sensitive data, keep in mind that a backup to a decentralized storage location usually also ends up on backup tapes that you no longer control. For example, if you back up your data to the JLUbox, the IT Service Centre (HRZ) also makes backups of it. This makes it difficult for you to comply with a request to delete the data. So please encrypt legally sensitive data before storing it at a decentralized storage location. To do this, you can either create a password-protected zip folder or use the VeraCrypt or Rohos MiniDrive tools. (Which restrictions by data protection laws do I have to consider?)

 


 

 

Where can I archive my data on a long-term basis?

 

According to good scientific practice, research data should be stored for a minimum of 10 years. In order to do that, there are several discipline-specific and generalist repositories. (How can I find a suitable repository?)

Keep in mind that uploading your data into a repository is not the same as publishing it. For example, you can define a period of time during which a data package is not yet accessible, but the metadata is already visible. Such embargo periods can be extended by a curator. For further information on embargoes see: Embargo (Ger). If you decide to publish your data, data access and editing rights can also be regulated in contracts or licenses. (Can I control the use of my data? / Which license should I choose?)

Always note the respective requirements of research funders and publishers and data protection regulations. (Who can decide on whether to share or to publish data? / Which restrictions by data protection laws do I have to consider?)

 


 

 

Publishing and Sharing Research Data

Why should I publish my data?

 

There are personal as well as scientific benefits to publishing your data. Firstly, published data is citable as independent scientific work, which increases the visibility of your own research. As studies have shown, publications are cited more often if the underlying data has been published (see Piwowar / Vision 2013).

Secondly, data sharing enables researchers to reuse existing data. New types of research questions can therefore be investigated without duplicating work or incurring unnecessary costs.

 


 

 

Is there any reason against publishing?

 

Your data could be confidential or personal data that can only be published in anonymized form or with the consent of the persons affected. (Which restrictions by data protection laws do I have to consider?)

If you decide to publish with a publisher, choose the publisher carefully and do not fall for predatory publishing. This short presentation by Werner Dees (Ger) gives a brief overview of how to recognize predatory publishers.

 


 

 

Which restrictions by data protection laws do I have to consider?

 

If you want to process personal data, you usually need the consent of the persons affected. The aim of your research has to be clearly defined and the persons affected have to be able to estimate the consequences.

Moreover, research data such as company data can contain confidential information (protection of undisclosed know-how and trade secrets). Additionally, non-disclosure agreements might prohibit a data publication.

 


 

 

Who can decide on whether to share or publish data?

 

 


 

 

Do I own the copyright to my data?

 

Research objects (some research data, too) can be protected by the copyright act as creative works. This includes literary works, computer programs, musical works, pantomimes (including choreographic works), works of the fine arts (including architecture and applied arts), photographic works, cinematographic works, and scientific and technical representations.

Usually, research data does not reach the necessary threshold of originality and therefore does not count as a creative work. Nevertheless, there are exceptions, such as data protected by ancillary copyright, e.g. photographs, moving pictures or sound recordings.

Often, however, research data is protected by copyright as part of a database or by the ancillary copyright for databases.

Research data not protected by property rights can normally be used by anyone for any purpose without permission or obligation to pay for it.

 


 

 

Can I control the use of my data?

 

If you own the copyright or ancillary copyright to your data, you can contractually stipulate several aspects of its use, such as how it is used, who uses it, and the period and purpose of use. Since case-by-case contractual regulations are very complex, there are several approaches to standardizing rights of use. For example, the Leibniz Institute for Psychology Information (ZPID) offers standardized contracts for using data gained in psychological research. Another example is the GESIS user contracts (access restrictions for particularly sensitive social science data). If your data is not to be subject to any specific access or use restrictions, it is advisable to use standardized licenses such as Creative Commons or Open Data Commons. (Which license should I choose?)

 


 

 

Which license should I choose?

 

Publishing data under a specific license allows you to specify in detail how your data can be used. This creates legal certainty for both data provider and data user. Even if you impose no restrictions at all, it is therefore important to document this waiver clearly.

Although data is usually not subject to copyright law, you should nevertheless treat it as potentially worth protecting. There are various license models for this purpose. The most popular one is Creative Commons. CC licenses are independent of the licensed content and cover copyright, ancillary copyright and, where they exist, sui generis database rights.

The Open Knowledge Foundation’s “Open Data Commons” license package has been specifically created for publishing data. Apart from the unconditional license (Open Data Commons Public Domain Dedication and License (PDDL)), it offers three other packages:

Regardless of its legal liability, the CC-BY license best fulfills the idea of Open Access and Open Science, whereas “Share-Alike” can lead to compatibility issues with other licenses, and the prohibition of processing can lead to restrictions on use (e.g. data mining, issues regarding long-term storage). Prohibiting commercial use will complicate using commercial databases, which reduces the potential visibility of your research.

Whichever license you choose, choose wisely. An in-depth analysis of the legal issues can be found here: Andreas Wiebe & Lucie Guibault (2013). Frank Waldschmidt-Dietz’ presentation (Ger) and video on Open Educational Resources (OER) will give you further information on Creative Commons licenses, licensing in general and the benefits of Open Access licenses for education.

Regardless of the terms of use, of course you have to meet the rules of good scientific practice, which require citing your sources.

 


 

 

How can I publish my data?

 

 


 

 

How can I find a suitable repository?

 

This set of questions can help you decide which repository to choose:

 In order to find a suitable repository, you can use the Registry of Research Data Repositories (re3data.org). Re3data is a web-based directory in which repositories are made accessible. You can simply search for a suitable repository. Numerous filters allow you to narrow down your search, e.g. by subject area or data type.

 


 

 

What do I have to consider when uploading data into a repository?

 

  • File format:

It is important to use the right file format. Some repositories have strict requirements on which format to use, while others only make recommendations or are open to all formats. Therefore, you should decide on the right format at an early stage of your research process (How do I create a good data management plan?). General information and specific links on file formats can be found here: Which file formats should I choose?

 

  • Metadata:

Metadata has to be documented precisely in order to make your data traceable and usable. (What are Metadata, Metadata Schemes, Controlled Vocabularies and Documentations)

 

  • Publication:

Uploading data into a repository does not equal instant publication. There might be reasons for an embargo period or for publishing only part of the data. Embargoes are especially common in business-related academic fields. You therefore have to consider possible reasons against immediate publication (Is there any reason against publishing?).

 

  • Conditions:

Consider the conditions under which you want to publish your data. There are different types of licensing models to choose from (Which license should I choose?).

 


 

 

What are Metadata, Metadata Schemes and Documentations?

 

Metadata is data about other data or resources, in this case about research data. It describes research data in order to

  • optimize data findability,
  • ensure that the data can be understood by subsequent users,
  • enable linking similar research data using the same standardized metadata schema.

The most basic information includes Title, Author/Main Researcher, Institution, Persistent Identifier, Location & Time, Topic, Rights, File Name, Format etc.

Metadata schemata (i.e. metadata standards) are compilations of categories describing data. There are interdisciplinary / independent as well as discipline-specific / dependent standards. Metadata schemata are to ensure that every researcher uses the same vocabulary to describe their data, and thus to guarantee the interoperability and comparability of data sets.

The table below will give you an overview on exemplary metadata standards for several disciplines. If your specific discipline is not listed, you can have a look at the Digital Curation Center's list on disciplinary metadata.

Discipline | Metadata Standard
Interdisciplinary standards | DataCite Schema, Dublin Core, MARC21 (Ger), RADAR
Humanities | EAD, TEI P5, TEI Lex-0
Earth sciences | AgMES, CSDGM, ISO 19115
Climate science | CF Conventions
Arts & cultural studies | CDWA, MIDAS-Heritage
Natural sciences | CIF, CSMD, Darwin Core, EML, ICAT Schema
X-ray, neutron and muon research | NeXus
Social and economic sciences | DDI

Before starting to document your data, you should search for existing metadata schemata. This will improve the interoperability of the research data to be created with already existing data, and it will save you the effort of developing your own metadata schema. If there is no existing metadata standard to provide the description categories necessary for your research, it is still worth using a renowned, already existing subject-specific standard as a basis to build on, e.g. by including additional categories and informing those responsible for the standard so that they can extend the schema. Metadata standards are living entities that can be adapted and enriched with new categories according to the needs of researchers. Be careful not to make any changes to existing elements or attributes, so as not to jeopardize interoperability.

It is possible, and usually even necessary, to use several metadata schemata. You should always use at least one subject-independent metadata schema (preferably Dublin Core) to describe your data, because this covers the general description categories mentioned in the first paragraph of this section. Fig. 4 shows a metadata section in the Dublin Core standard. Subject-specific metadata standards, on the other hand, allow you to structure your data using descriptive strategies that are more content-oriented and may differ from discipline to discipline.


Fig. 4: Example of metadata in the Dublin Core Standard
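To illustrate what a basic Dublin Core description can look like in machine-readable form, the following Python sketch writes a small XML record with the standard library. The function name `dublin_core_record`, the wrapping `metadata` element and all field values (including the DOI) are hypothetical placeholders; the element names and the namespace are those of the Dublin Core Metadata Element Set. Mapping "Institution" to `publisher` and "Location & Time" to `coverage` and `date` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def dublin_core_record(fields):
    """Build an XML string from (element name, value) pairs."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# All values below are invented placeholders for illustration.
record = dublin_core_record([
    ("title", "Example sediment measurements"),
    ("creator", "Doe, Jane"),
    ("publisher", "Example University"),
    ("identifier", "https://doi.org/10.5555/example"),
    ("coverage", "Example Lake"),
    ("date", "2024-03-05"),
    ("subject", "Sedimentology"),
    ("rights", "CC-BY 4.0"),
    ("format", "text/csv"),
])
print(record)
```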

Consequently, metadata determine which information is to be presented. To get the best possible results when searching for and using data, this information has to be presented using a uniform vocabulary. In order to do so, there are several discipline-specific and interdisciplinary controlled vocabularies, like thesauri, classifications and authority controls.

Types of controlled vocabularies

Names (unambiguous identification of persons, items or places) | The Integrated Authority File (GND) / GeoNames / International Standard Name Identifier (ISNI, ISO 27729) / Open Researcher and Contributor ID (ORCID)
General, interdisciplinary classification systems | Dewey Decimal Classification (DDC) / Library of Congress Classification (LCC)
Disciplinary classification systems | Social Sciences (Ger) / Mathematics Subject Classification (MSC)
Disciplinary vocabularies | Agricultural Information Management Standard (AGROVOC) / Getty Vocabularies / STW Thesaurus for Economics / Thesaurus for the Social Sciences (TheSoz) / Thesaurus Technology and Management (Ger) (TEMA)

 

You can find an overview of different systems in the Basel Register of Thesauri, Ontologies & Classifications (BARTOC) and in Taxonomy Warehouse.

Documentation is usually more than a mere metadata description. It is a deeper, subject-specific description in which variables, instruments, methods etc. are described in detail, so that the origin of the data becomes apparent. Such a description is often essential to understand, verify and use the data.

The JISC Guide on Metadata and the University of Edinburgh’s interactive Mantra Course on Documentation, Metadata, Citation will offer you further introductory information on metadata.

 


 

 

What are Persistent Identifiers?

 

Reputable publication platforms, such as Zenodo and Figshare, automatically reserve a DOI for your data set when you publish your data. If you publish in a different, discipline-specific repository, you should make sure that this repository also offers DOIs or another kind of PID. (How can I publish my data? / How can I find a suitable data repository?)

 


 

 

Finding and Using Research Data

Where can I find research data?

 

Research data is increasingly available for reuse, not least because funders, publishers and institutions require or recommend making data accessible. To find suitable research data for your own research, first look at the relevant services in your own discipline. These can be institutional or specialized repositories as well as Data Journals (List). Repositories sorted by discipline can be found here: re3data.org

Furthermore, you can search with generic search engines. Their disadvantage is that they often cannot adequately represent the detailed metadata schemata of their sources. Moreover, the metadata differ greatly in what they identify: individual data items, data sets or data collections.

When reusing data, the respective rights (licenses, license agreements) are binding. Among other things, they can determine who may use the data, for which purpose and for how long.

If you cannot or do not want to use existing research data, you can of course collect your own data using reputable research practices. For example, Dr. Samuel de Haas’ and Jan Thomas Schäfer’s Coffee Lecture (Ger) explains how to collect data using web scraping and text mining and how to handle big data.

 


 

 

How do I cite research data?


You have to cite data correctly to comply with good scientific practice and to make research data usable and reusable. A data citation should include the following elements:

Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier.

Further, possibly useful, optional additions are Edition, URI, Resource Type, Publisher, Unique Numeric Fingerprint (UNF) and Location (see Alex Ball & Monica Duke 2015: How to Cite Datasets and Link to Publications).
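The required elements can be assembled into a citation string, as in the Python sketch below. The function name `cite_dataset`, the punctuation between the elements and all sample values (including the DOI) are assumptions made for illustration; follow the citation style of your discipline or publisher where one exists.

```python
def cite_dataset(authors, year, title, repository, version, identifier):
    """Assemble a data citation from the required elements, in order."""
    author_text = "; ".join(authors)
    return (f"{author_text} ({year}): {title}. {repository}. "
            f"Version {version}. {identifier}")


# Invented placeholder values, not a real data set.
print(cite_dataset(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2024,
    title="Example sediment data set",
    repository="Example Repository",
    version="1.0.0",
    identifier="https://doi.org/10.5555/example",
))
```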

 
