for Employees, Students

Joint NHR Data Management Training

Data Management, Employees, Students, Online

Event content

This is a joint NHR training held by five NHR centers. It consists of several sessions covering a diverse set of challenges that arise when managing data properly within HPC workloads. Although the sessions build on each other, they can be taken individually. However, to participate effectively in selected sessions, participants should be reasonably familiar with the concepts taught in the preceding ones. The entire course takes place online and spans two days.

The course starts with a basic introduction to data management on HPC systems and its specific challenges. This includes the concept of storage tiering and how HPC workflows can be designed to make optimal use of the different storage tiers. Important permission concepts for efficiently organizing larger consortia and isolating users within their own well-defined space are also explained, along with further techniques for data sharing and data cataloging. All of these concepts are supplemented by hands-on sessions.

Then, further details on metadata and their extraction are given, followed by the introduction of dedicated data management systems, with a specific focus on Coscine.

The second day starts with a deep dive into the Research Data Management Organizer (RDMO), a well-established tool for creating Data Management Plans (DMPs).

The course concludes with a detailed and holistic overview of storage systems. It starts by explaining what I/O, inodes, and files are. The differences between local file systems (like ext4) and parallel file systems (like BeeGFS or Lustre) and their implications are then discussed. Next, different access patterns for parallel I/O are introduced, and analysis tools like Darshan and Score-P are demonstrated. The session concludes with a summary of I/O best practices.

Everyone can join this course for free.

Learning goals

  • Understand the basic functionality of a storage system
  • Apply tiered storage systems to optimize the I/O of an HPC workflow
  • Classify metadata and use it efficiently to organize data sets
  • Design data management plans based on the Research Data Management Organizer
  • Assess the efficiency of different storage access patterns


Information about the event

Max. participants

50

Requirements

  • Basic Linux knowledge
  • HPC account and usage experience

Speakers
Hendrik Nolte
Sebastian Oeste

Details

Number: 10005
Format: Block Course
Language: English

Location

Online (BigBlueButton)


Contact

GWDG Academy
support@gwdg.de

Registration

Registration is not possible via this portal.
Link to the provider

Dates

This event includes the following dates:

Date                          Location
1. 05.11.2024 08:05 - 17:05   Online (BigBlueButton)
2. 06.11.2024 09:00 - 17:00   Online (BigBlueButton)