





What is research data?


The term “research data” generally refers to all kinds of (digital) data that represent the result of scientific work or that serve as a basis for such work. Research data is generated using a wide variety of methods, such as measurements, source research or surveys. Therefore, it is always subject- and project-specific. For additional information on defining research data, see here.





What is research data management?


Research data management includes measures that create and preserve sustainable data. Thus, it is relevant throughout the entire data lifecycle (fig. 1).


Fig. 1: Research data lifecycle

Ideally, planning for research data management starts at the beginning of a research project and is updated regularly. Research data management does not only refer to data storage and archiving. It also enables you to find, access and comprehend data, and – as a result – use your data well into the future. For further information on research data management see here.




Why should research data management be important to me?


There are several good reasons for systematically tackling research data management, which likewise stress the importance of good scientific practice:


In the figure below, you can see different possible aims of research data management:

Fig. 2: Aims of research data management





What do I have to keep in mind when planning my project?


    1. Appoint the people responsible for setting up and controlling research data management at your institution.
    2. Check whether there are institute- and discipline-specific or general requirements and recommendations on how to handle research data.
    3. Check as early as possible which requirements on archiving and publishing research data you have to meet. (Which specific demands do sponsors, publishers, and universities have?)
    4. Check which research data is collected during your research project.
    5. Think about which research data is to be published and provided for reuse.
    6. Think about how to store and archive your research data. (Data storage and digital preservation)
    7. What options do you have for storing and archiving research data? Could you use a generalist or discipline-specific repository? (How can I find a suitable repository?)
    8. Clear up all legal questions on storing and sharing research data. You might have to consider data protection and copyright.
    9. Create a data management plan to document your decisions. It will also serve as validation of your progress and project implementation. (How do I create a good data management plan?)
    10. Update your data management plan regularly during the course of your research.





How do I create a good data management plan?


Examples and Templates:


  • CLARIN-D (Common Language Resources and Technology Infrastructure)
  • KomFor (The centre of competence for research data in the earth and environmental sciences)


Exemplary Data Management Plans:

 There is also Humboldt-University’s video tutorial (Ger) to give you a brief introduction to DMPs.





Which specific demands do sponsors, publishers, and universities have?


  • Deutsche Forschungsgemeinschaft (DFG) (= German Research Foundation)

In its “Proposals for Safeguarding Good Scientific Practice” (Eng + Ger), the DFG states that “Primary data as the basis for publications shall be securely stored for ten years in a durable form in the institution of their origin”. It continues:

“In the interest of transparency and to enable research to be referred to and reused by others, whenever possible researchers make the research data and principal materials on which a publication is based available in recognised archives and repositories in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable).”

In 2015, the DFG Guidelines on the Handling of Research Data were passed. They give further recommendations on providing data and planning data-driven projects, for example:

“Applicants should consider during the planning stage whether and how much of the research data resulting from a project could be relevant for other research contexts and how this data can be made available to other researchers for reuse. Applicants should therefore detail in the proposal what research data will be generated or evaluated during a scientific research project. Concepts and considerations appropriate to the specific discipline for quality assurance and the handling and long-term archiving of research data should be taken as a basis.”


  • European Commission (EC)

The EC’s “Open Research Data Pilot” is part of the EU research and innovation program Horizon 2020. Its aim is to improve access to and reusability of research data originating in Horizon 2020 projects. The Open Research Data Pilot’s basic principle is the motto “as open as possible, as closed as necessary”. (EC Guidelines on FAIR Data Management in Horizon 2020, p. 4)

From 2014 to 2016, only selected areas of Horizon 2020 were included in the pilot. Since the revised 2017 version of the program, all areas are covered.

There are three main obligations:

You have to create a data management plan according to the template. It has to be handed in within the first six months and updated according to relevant adjustments (or at least at interim and final evaluations).

Data storage: Your research data has to be stored in an institutional, project-specific or discipline-specific data repository as early as possible (‘underlying data’) or according to the data management plan (‘other data’).

Publication: If possible, your data should be published using an open license (preferably CC-BY or CC0) without use restrictions. The publication has to include the necessary contextual information and tools.

However, if there are legitimate reasons, a partial or complete waiver of these requirements is possible.

You can find further information by looking at the following:

Guidelines on FAIR Data Management in Horizon 2020

Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020

Horizon 2020 Online Manual: Open Access and Data Management

Horizon 2020: Annotated Model Grant Agreement (AGA)

OpenAIRE Research Data Management Briefing Paper


  • Publishers

Public Library of Science (PLOS): Data Availability Policy / Materials and Software Sharing Policy

Nature Publishing Group: Availability of Data, Material and Methods Policy

Science: Data and Materials Availability Policy and Preparing Your Supplementary Materials

BioMed Central: Availability of Supporting Data

Elsevier:  Text and Data Mining; Research Data Policy


  • Justus-Liebig-University Giessen




Data Storage and Digital Preservation

How can I structure my data?


At the different stages of modifying your data (e.g. raw data, cleaned data, data ready for analysis) you should create write-protected versions. You should only use copies of those original files for further processes. 
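The following is a minimal sketch of this workflow, not a prescribed tool: it write-protects an original file and hands back a writable copy for further processing. The function name `freeze_and_copy`, the folder `work` and the file name are hypothetical and only serve as illustration.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def freeze_and_copy(original: Path, work_dir: Path) -> Path:
    """Write-protect `original` and return a writable working copy."""
    # Clear every write-permission bit on the original file.
    mode = original.stat().st_mode
    original.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

    # copyfile() copies the content only, so the copy stays writable.
    work_dir.mkdir(parents=True, exist_ok=True)
    working_copy = work_dir / original.name
    shutil.copyfile(original, working_copy)
    return working_copy


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "survey_raw_20240131.csv"  # hypothetical file
        raw.write_text("id,value\n1,42\n")
        copy = freeze_and_copy(raw, Path(tmp) / "work")
        copy.write_text(copy.read_text() + "2,43\n")  # edit the copy only
        print(raw.read_text())  # the original content is unchanged
        raw.chmod(raw.stat().st_mode | stat.S_IWUSR)  # allow clean-up
```

Write protection at file level only guards against accidental changes; it does not replace a backup.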

Naming conventions can vary widely depending on your discipline and your kind of data. However, names should always reflect the kind of data (raw data, cleaned data, analytical data) and the data format (work file, result file etc.).

The file name should always include the date of storage in YYYYMMDD format, placed at the beginning or end of the name to ease sorting. Do not use special characters, umlauts or spaces – use underscores instead. The names should always be uniform, clear and meaningful.

Examples for naming data files (see also: HU Berlin: Structure files):

  • [sediment]_[sample]_[instrument]_[YYYYMMDD].dat
  • [experiment]_[reagent]_[instrument]_[YYYYMMDD].csv
  • [experiment]_[experiment_set-up]_[test_subject]_[YYYYMMDD].sav
  • [observation]_[location]_[YYYYMMDD].mp4
  • [interview_partner]_[interviewer]_[YYYYMMDD].mp3
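As a rough illustration of the naming rules above, the following sketch builds a file name from descriptive parts and a date. It assumes that underscores join the parts, that spaces inside a part become underscores, and that only ASCII letters, digits, hyphens and underscores are allowed. The function name `build_file_name` and the sample values are hypothetical.

```python
import re
from datetime import date

# Assumed whitelist: ASCII letters, digits, hyphens and underscores.
_ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def build_file_name(parts: list[str], day: date, extension: str) -> str:
    """Join descriptive parts and a YYYYMMDD date with underscores."""
    cleaned = []
    for part in parts:
        token = part.strip().replace(" ", "_")  # spaces become underscores
        if not _ALLOWED.fullmatch(token):
            # Rejects umlauts and special characters instead of guessing.
            raise ValueError(f"unsupported characters in {part!r}")
        cleaned.append(token)
    cleaned.append(day.strftime("%Y%m%d"))  # date goes last to ease sorting
    return "_".join(cleaned) + "." + extension.lstrip(".")


if __name__ == "__main__":
    print(build_file_name(["sediment", "sample 12", "XRF"], date(2024, 1, 31), "dat"))
```

Rejecting unsupported characters, rather than silently replacing them, keeps the resulting names predictable; a project could just as well agree on a transliteration rule.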

You can specify the file version in the file name in order to easily identify changes to your data. A well-known concept of versioning based on the DDI (Data Documentation Initiative) standard is: Major.Minor.Revision.

Starting from version “1.0.0”, you increment:

1. the first position, if cases, variables, waves or samples are added or deleted

2. the second position, if data are corrected in a way that affects the analysis of your data

3. the third position, if there are minor revisions only that are of no consequence to interpreting your data

Versioning can also be supported by using appropriate software (Free Version Control Software, e.g. Git).
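The Major.Minor.Revision rules can be sketched in a few lines. The change labels `structure`, `correction` and `revision` are hypothetical names for the three cases described above, and resetting the lower positions to zero after a higher-level change is an assumption that the text does not state.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next Major.Minor.Revision string for a kind of change."""
    major, minor, revision = (int(part) for part in version.split("."))
    if change == "structure":
        # Cases, variables, waves or samples were added or deleted.
        return f"{major + 1}.0.0"
    if change == "correction":
        # Data were corrected in a way that affects the analysis.
        return f"{major}.{minor + 1}.0"
    if change == "revision":
        # Minor revision with no consequence for interpretation.
        return f"{major}.{minor}.{revision + 1}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(bump_version("1.0.0", "correction"))
```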





 Which file formats should I choose?


Data Format | Recommended | Less Suitable | Unsuitable
Audio, Sound | *.flac / *.wav | *.mp3 |
Computer-aided Design (CAD) | *.dwg / *.dxf / *.x3d / *.x3db / *.x3dv | |
Databases | *.sql / *.xml | *.accdb | *.mdb
Raster Graphics & Images | *.dng / *.jp2 (lossless compression) / *.jpg2 (lossless compression) / *.png / *.tif (uncompressed) | *.bmp / *.gif / *.jp2 (lossy compression) / *.jpeg / *.jpg / *.jpg2 (lossy compression) / *.tif (compressed) | *.psd
Raw Data and Workspace | *.cdf (NetCDF) / *.h5 / *.hdf5 / *.he5 / *.mat (from version 7.3) / *.nc (NetCDF) | *.mat (binary) / *.rdata |
Spreadsheets and Tables | *.csv / *.tsv / *.tab | *.odc / *.odf / *.odg / *.odm / *.odt / *.xlsx | *.xls / *.xlsb
Statistical Data | *.por | *.sav (IBM® SPSS) |
Text | *.txt / *.pdf (PDF/A) / *.rtf / *.tex / *.xml | *.docx / *.odf / *.pdf | *.doc
Vector Graphic | *.svg / *.svgz | *.ait / *.cdr / *.eps / *.indd / *.psd |
Video [1] | *.mkv | *.avi / *.mp4 / *.mpeg / *.mpg | *.mov / *.wmv

[1] Besides the file format (or container format), the codec used and the compression type also play an important role.
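A small lookup can flag files whose format may cause problems for long-term use. The sketch below is illustrative only: it covers just three rows of the table (Databases, Spreadsheets and Tables, Video), read from left to right as recommended, less suitable and unsuitable, and the names `SUITABILITY` and `classify` are hypothetical.

```python
from pathlib import Path

# Extensions taken from three rows of the table above (not the full table).
SUITABILITY = {
    "recommended": {".sql", ".xml", ".csv", ".tsv", ".tab", ".mkv"},
    "less suitable": {".accdb", ".odc", ".odf", ".odg", ".odm", ".odt",
                      ".xlsx", ".avi", ".mp4", ".mpeg", ".mpg"},
    "unsuitable": {".mdb", ".xls", ".xlsb", ".mov", ".wmv"},
}


def classify(file_name: str) -> str:
    """Return the suitability category for a file, or 'unknown'."""
    suffix = Path(file_name).suffix.lower()  # compare case-insensitively
    for category, extensions in SUITABILITY.items():
        if suffix in extensions:
            return category
    return "unknown"


if __name__ == "__main__":
    for name in ["results_20240131.csv", "results_20240131.XLS", "notes.xyz"]:
        print(name, "->", classify(name))
```

The extension alone says nothing about codecs or compression, so a result of "recommended" is a first hint rather than a guarantee.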




Where do I store my data during my work process?


It is of the utmost importance to back up your data regularly in case of technical or human errors. It is the responsibility of the researcher to secure data. The Hochschulrechenzentrum (HRZ) (= university computer center) offers several possibilities for data storage:




If you need more data storage for larger research projects, please contact the HRZ at an early stage.





What should I consider when backing up my data?


Good research data management also means that you, as a researcher, are as well prepared as possible for data loss. You should therefore create a backup plan at the beginning of your research project, ideally one that includes regular backup routines. A backup plan should answer the following questions:

  • Which backup tool do you use?
  • Which data should be backed up?
  • Where do you back up your data?
  • How often do you back up your data?

You should also follow the so-called 3-2-1 backup rule (see Fig. 3). This rule states that you should always keep at least 3 copies of your data on 2 different storage devices (e.g. a USB stick and an external hard disk) as well as 1 copy at a decentralized storage location (e.g. the JLUbox or winfile). It is important that all 3 copies are always up to date with the original file, which is why automated backup routines are best. Instructions on how to create automated backup routines using Windows Task Scheduler can be found here.


Fig. 3: 3-2-1 Backup Rule
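Whether all copies are still up to date with the original can be checked by comparing checksums. The following sketch assumes SHA-256 as the checksum and uses hypothetical names (`sha256_of`, `stale_copies`) and paths; it only reports outdated or missing copies and does not perform the backup itself.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_copies(original: Path, copies: list[Path]) -> list[Path]:
    """Return the copies that are missing or differ from the original."""
    reference = sha256_of(original)
    return [c for c in copies if not c.is_file() or sha256_of(c) != reference]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "data.csv"
        usb = Path(tmp) / "usb_data.csv"    # stands in for a USB stick
        disk = Path(tmp) / "disk_data.csv"  # stands in for an external disk
        original.write_text("id,value\n1,42\n")
        usb.write_text("id,value\n1,42\n")
        disk.write_text("id,value\n")       # outdated copy
        print([p.name for p in stale_copies(original, [usb, disk])])
```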

If you are working with personal data or other legally sensitive data, keep in mind that backing up to a decentralized storage location usually also means that the data ends up on backup tapes that you no longer control. For example, if you back up your data to the JLUbox, the IT Service Centre (HRZ) will make further backups of it. This makes it difficult for you to comply with a possible request to delete the data. So please encrypt such legally sensitive data before storing it at a decentralized storage location. To do this, you can either create a password-protected zip folder or use the VeraCrypt or Rohos MiniDrive tools. (Which restrictions by data protection laws do I have to consider?)





Where can I archive my data on a long-term basis?


According to good scientific practice, research data should be stored for a minimum of 10 years. In order to do that, there are several discipline-specific and generalist repositories. (How can I find a suitable repository?)

Keep in mind that uploading your data into a repository is not the same as publishing it. For example, you can define a period of time during which a data package is not yet accessible, but the metadata is already visible. Such embargo periods can be extended by a curator. For further information on embargoes see: Embargo (Ger). If you decide to publish your data, data access and editing rights can also be regulated in contracts or licenses. (Can I control the use of my data? / Which license should I choose?)

Always note the respective requirements of research funders and publishers and data protection regulations. (Who can decide on whether to share or to publish data? / Which restrictions by data protection laws do I have to consider?)





Publishing and Sharing Research Data

Why should I publish my data?


There are personal as well as scientific benefits to publishing your data. Firstly, published data is citable as independent scientific work, which increases the visibility of your own research. As studies have shown, publications are cited more often if the underlying data has been published (see Piwowar / Vision 2013).

Secondly, data sharing enables you to reuse existing data. New types of research questions can therefore be investigated without duplicating work or adding unnecessary costs.





Is there any reason against publishing?


Your data could be confidential or personal data that can only be published in anonymized form or with the consent of the persons affected. (Which restrictions by data protection laws do I have to consider?)

If you decide to publish with a publisher, make sure to choose the publisher carefully and not to fall for predatory publishing. This short presentation by Werner Dees (Ger) will give you a brief overview of how to recognize predatory publishers.





Which restrictions by data protection laws do I have to consider?


If you want to process personal data, you usually need the consent of the persons affected. The aim of your research has to be clearly defined and the persons affected have to be able to estimate the consequences.

Moreover, research data such as company data can contain confidential information (protection of undisclosed know-how and trade secrets). Additionally, non-disclosure agreements might prohibit publishing the data.





Who can decide on whether to share or publish data?






Do I own the copyright to my data?


Research objects (some research data, too) can be protected by the copyright act as creative works. This includes literary works, computer programs, musical works, pantomimes (including choreographic works), works of the fine arts (including architecture and applied arts), photographic works, cinematographic works, and scientific and technical representations.

Usually, research data does not reach the necessary threshold of originality and therefore does not count as a creative work. Nevertheless, there are some exceptions, such as data protected by ancillary copyright, e.g. photographs, moving pictures or sound carriers.

Often, however, research data is protected by copyright as part of a database or by the ancillary copyright for databases.

Research data not protected by property rights can normally be used by anyone for any purpose without permission or obligation to pay for it.





Can I control the use of my data?


If you own the copyright or ancillary copyright to your data, you can contractually stipulate several aspects of using your data, such as how it is used, who uses it, and the period and purpose of use. Since regulating each individual case by contract is very complex, there are several solutions for standardizing regulations on rights of use. For example, the Leibniz Institute for Psychology Information (ZPID) offers standardized contracts for using data that has been gained in psychological research. Another example is the GESIS user contracts (access restrictions for particularly sensitive social science data). If your data is not to be subject to any specific access or use restrictions, it is advisable to use standardized licenses such as Creative Commons or Open Data Commons. (Which license should I choose?)





Which license should I choose?


Publishing data under a specific license allows you to specify in detail how your data can be used. This creates legal certainty for both data provider and data user. Even if you do not want to impose any restrictions, it is therefore important to document this waiver clearly.

Although data are usually not subject to copyright law, you should nevertheless treat them as potentially worth protecting. Various license models exist for this purpose. The most popular one is Creative Commons. CC licenses are independent of the licensed content and cover copyright, ancillary copyright and, where applicable, sui generis database rights.

The Open Knowledge Foundation’s “Open Data Commons” license package has been specifically created for publishing data. Apart from the unconditional license (Open Data Commons Public Domain Dedication and License (PDDL)), it offers three other packages:

Regardless of its legal liability, the CC-BY license best fulfills the idea of Open Access and Open Science, whereas “Share-Alike” can lead to compatibility issues with other licenses, and the prohibition of processing can lead to restrictions on use (e.g. data mining, issues regarding long-term storage). Prohibiting commercial use will complicate using commercial databases, which reduces the potential visibility of your research.

Whichever license you choose – choose wisely. An in-depth analysis of legal issues can be found here: Andreas Wiebe & Lucie Guibault (2013). Frank Waldschmidt-Dietz’ presentation (Ger) and video on Open Educational Resources (OER) will give you further information on Creative Commons licenses, licensing in general and the benefits of Open Access licenses for education.

Regardless of the terms of use, of course you have to meet the rules of good scientific practice, which require citing your sources.





How can I publish my data?






How can I find a suitable repository?


This set of questions can help you decide which repository to choose:

To find a suitable repository, you can use the Registry of Research Data Repositories (re3data), a web-based directory of repositories. You can simply search it for a suitable repository. Numerous filters allow you to narrow down your search, e.g. by subject area or data type.





What do I have to consider when uploading data into a repository?


  • File format:

It is important to use the right file format. Some repositories have strict requirements on which format to use, while others only make recommendations or are open to all formats. Therefore, you should decide on the right format at an early stage of your research process (How do I create a good data management plan?). General information and specific links on file formats can be found here: Which file formats should I choose?


  • Metadata:

Metadata has to be documented precisely in order to make your data traceable and usable. (What are Metadata, Metadata Schemes, Controlled Vocabularies and Documentations)


  • Publication:

Uploading data into a repository does not equal instant publication. There might be reasons for an embargo period or for only partial publication. Embargoes are especially common in business-related academic fields. You therefore have to consider possible reasons against immediate publication (Is there any reason against publishing?).


  • Conditions:

Consider the conditions under which you want to publish your data. There are different types of licensing models to choose from (Which license should I choose?).





What are Metadata, Metadata Schemes and Documentations?


Metadata is data about other data or resources – in this case, about research data. It describes research data in order to

  • optimize data findability,
  • ensure that the data can be understood by subsequent users,
  • enable linking similar research data using the same standardized metadata schema.

The most basic information includes Title, Author/Main Researcher, Institution, Persistent Identifier, Location & Time, Topic, Rights, File Name, Format etc.

Metadata schemata (i.e. metadata standards) are compilations of categories describing data. There are interdisciplinary / independent as well as discipline-specific / dependent standards. Metadata schemata are meant to ensure that every researcher uses the same vocabulary to describe their data, and thus to guarantee the interoperability and comparability of data sets.

The table below gives you an overview of exemplary metadata standards for several disciplines. If your specific discipline is not listed, you can have a look at the Digital Curation Centre's list of disciplinary metadata.


Metadata Standard

Interdisciplinary standards

DataCite Schema, Dublin Core, MARC21 (Ger), RADAR



Earth sciences


Climate science

CF Conventions

Arts & cultural studies


Natural sciences


X-ray, neutron and muon research


Social and economic sciences


Before starting to document your data, you should search for existing metadata schemata. This will improve the interoperability of the research data to be created with already existing data, and it will save you the effort of developing your own metadata schema. If there is no existing metadata standard to provide the description categories necessary for your research, it is still worth using a renowned, already existing subject-specific standard as a basis to build on, e.g. by including additional categories and informing those responsible for the standard so that they can extend the schema. Metadata standards are living entities that can be adapted and enriched with new categories according to the needs of researchers. Be careful not to make any changes to existing elements or attributes, so as not to jeopardize interoperability.

It is possible, usually even necessary, to use several metadata schemata. You should always use at least one subject-independent metadata schema (preferably Dublin Core) to describe your data, because this way you can cover the general description categories as mentioned in the first paragraph of this section. Fig. 4 depicts a metadata section using Dublin Core Standard. Subject-specific metadata standards, on the other hand, allow you to structure your data using descriptive strategies, which are more based on content and may differ from discipline to discipline.


Fig. 4: Example of metadata in the Dublin Core Standard
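To make the idea of a Dublin Core description more concrete, the following sketch writes a few of the fifteen Dublin Core elements to XML using Python's standard library. It is an illustration under assumptions, not a repository-ready export: the wrapper element `metadata`, the function name `dublin_core_record` and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set 1.1
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize element/value pairs as a simple Dublin Core XML record."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All values below are invented for illustration.
    print(dublin_core_record({
        "title": "Example sediment measurements",
        "creator": "Doe, Jane",
        "date": "2024-01-31",
        "format": "text/csv",
        "rights": "CC BY 4.0",
    }))
```

Restricting the keys to the standard element names reflects the advice above not to alter existing elements of a schema.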

Consequently, metadata determine which information is to be presented. To get the best possible results when searching for and using data, this information has to be presented using a uniform vocabulary. In order to do so, there are several discipline-specific and interdisciplinary controlled vocabularies, like thesauri, classifications and authority controls.

Types of controlled vocabularies

Unambiguous identification of persons, items or places:

  • The Integrated Authority File (GND)
  • GeoNames
  • International Standard Name Identifier (ISNI, ISO 27729)
  • Open Researcher and Contributor ID (ORCID)

General, interdisciplinary classification systems:

  • Dewey Decimal Classification (DDC)
  • Library of Congress Classification (LCC)

Disciplinary classification systems:

  • Social Sciences (Ger)
  • Mathematics Subject Classification (MSC)

Disciplinary vocabularies:

  • Agricultural Information Management Standard (AGROVOC)
  • Getty Vocabularies
  • STW Thesaurus for Economics
  • Thesaurus for the Social Sciences (TheSoz)
  • Thesaurus Technology and Management (Ger) (TEMA)


You can find an overview of different systems in the Basel Register of Thesauri, Ontologies & Classifications (BARTOC) and the Taxonomy Warehouse.

Documentation is usually more than a mere metadata description. It is a more thorough (subject-specific) description in which, for example, variables, instruments and methods are described in detail, so that the origin of the data becomes apparent. Such a description is often essential to understand, verify and use the data.

The JISC Guide on Metadata and the University of Edinburgh’s interactive Mantra Course on Documentation, Metadata, Citation will offer you further introductory information on metadata.





What are Persistent Identifiers?


Reputable publication platforms, such as Zenodo and Figshare, automatically reserve a DOI for your data set when you publish your data. If you publish in a different, discipline-specific repository, you should make sure that this repository also offers DOIs or another kind of persistent identifier (PID). (How can I publish my data? / How can I find a suitable data repository?)





Finding and Using Research Data

Where can I find research data?


Research data is increasingly available for reuse, not least because funders, publishers and institutions require or recommend making data accessible. To find suitable research data for your own research, you should first have a look at relevant offers from your own discipline. There can be institutional or specialized repositories as well as Data Journals (List). Repositories sorted by discipline can be found here:

Furthermore, you can search using generic search engines. Their disadvantage is that they often cannot adequately represent the detailed metadata schemata of their sources. Moreover, the respective metadata differ greatly in what they identify: single data items, data sets or data collections.

When reusing data, the respective rights (licenses, license agreements) are binding. Among other things, they can determine who can use the data, for which purpose and for which period of time.

If you cannot or do not want to use existing research data, you can of course collect your own data using reputable research practices. For example, Dr. Samuel de Haas’ and Jan Thomas Schäfer’s Coffee Lecture (Ger) will give you information on how to collect data using web scraping and text mining and on how to handle big data.





How do I cite research data?

You have to cite your data correctly to meet good scientific practice and make research data usable and reusable.

A data citation should include the following elements: Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier.

Further, possibly useful, optional additions are Edition, URI, Resource Type, Publisher, Unique Numeric Fingerprint (UNF) and Location (see Alex Ball & Monica Duke 2015: How to Cite Datasets and Link to Publications).
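The core citation elements can be assembled programmatically, for example when exporting a reference list. The sketch below is one possible rendering: the class name `DatasetCitation`, the punctuation between the elements and all sample values are assumptions, since the required citation style depends on your discipline or journal.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Holds the core elements of a data citation."""
    authors: list[str]
    year: int
    title: str
    repository: str
    version: str
    identifier: str  # global persistent identifier, e.g. a DOI

    def format(self) -> str:
        """Render the elements in the order listed above."""
        names = "; ".join(self.authors)
        return (f"{names} ({self.year}): {self.title}. {self.repository}. "
                f"Version {self.version}. {self.identifier}")


if __name__ == "__main__":
    # Invented example values, including the placeholder DOI.
    citation = DatasetCitation(
        authors=["Doe, Jane", "Roe, Richard"],
        year=2024,
        title="Example Survey Data",
        repository="Example Repository",
        version="1.0.0",
        identifier="https://doi.org/10.1234/example",
    )
    print(citation.format())
```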

