Analytics and Insights About Cultivating a Software Engineering Community at DLR

Software development has increasingly become part of the daily work of many researchers in science and engineering. They face software engineering challenges for which they are not trained. In 2005, the German Aerospace Center (DLR) started the “DLR Software Engineering Initiative” to support its researchers in addressing these challenges. One of the initiative's core elements is to set up and establish an active software engineering community within DLR. Improving the activities of the DLR software engineering initiative is an ongoing challenge. For this purpose, a good understanding of the software engineering community within DLR is required. We present insights about the DLR software engineering community through an analysis of the participation in the annual software engineering knowledge exchange workshops. These workshops can be considered the annual software engineering community event and therefore offer a good starting point for analyzing the community. The results show that the analyzed active part of the community consists of a small, stable core group, some non-regular visitors, and one-time participants. Participants from nearly all DLR locations attend the workshops. Most participants of the workshop series originate from large research-oriented DLR sites. The workshop topic seems to influence the workshop attendance. The results indicate that we need to pursue the ongoing topic specialization carefully to cultivate the established core member group.


INTRODUCTION
The usage and development of software have increasingly become part of the daily work of most researchers [1], [2]. While research and the reproducibility of its results rely on software [3], [4], [5], the development of sustainable software is still not recognized as an important goal in many research organizations. Software is often seen merely as a tool that should fulfill its task.
In this context, researchers increasingly face software engineering challenges for which they are not trained. In 2005, the German Aerospace Center (DLR) started the DLR software engineering initiative [6] to support its researchers in addressing these challenges. A core element of the initiative is the creation of an active software engineering community [7].
Improving the activities of the DLR software engineering initiative is an ongoing challenge. For this purpose, a good understanding of the DLR software engineering community is required. We start this process by analyzing the annual software engineering knowledge exchange workshops. These workshops can be considered the annual DLR software engineering community event and therefore offer a good starting point for analyzing the community.
The remainder of the paper is structured as follows:
• We characterize software development at DLR to illustrate the context and introduce the concept of the knowledge exchange workshops (Sect. 2).
• We describe the specific research questions and explain our analysis approach (Sect. 3).
• We present the results (Sect. 4) and discuss the research questions in their context (Sect. 5).
• We summarize the major findings and indicate future work directions (Sect. 6).

SOFTWARE DEVELOPMENT AT DLR
DLR is a large research organization in Germany with over 8,000 employees. It conducts research in aeronautics, space, energy, transportation, and security. Software development plays an increasing role in DLR's research activities, just as it does for most research organizations nowadays. Based on an internal survey from 2005, we know that more than 25% of the personnel costs within DLR are spent on software development.
The range of developed software includes various areas such as simulation and modeling, flight control, signal and data processing, knowledge and data management, visualization, communication, and administration. This diversity is also reflected in the programming languages and frameworks in use. They include Python, R, Perl, C, C++, Fortran, IDL, Matlab, LabView, Ada, Java, .Net, and others.
In particular, research software is still often developed using an ad-hoc, code-and-fix approach without documentation, source code version control, or issue tracking. This approach is useful for learning and experimenting but fails when the aim is to develop sustainable software.

DLR Software Engineering Initiative
In 2005, the DLR software engineering initiative was started to improve the quality and sustainability of research software at DLR on a broad level [6]. The initiative is integrated into DLR's quality management policy via the DLR software engineering directive. The directive defines the overall software engineering policy and is mandatory for all DLR research institutes. Another core aspect of the initiative is the DLR software engineering network. It consists of representatives from the different DLR research institutes and forms DLR's central exchange forum about software engineering.
The software engineering initiative consists of the following supplementary activities, which focus on direct support for researchers developing software as part of their job at DLR:
1) The DLR software engineering guidelines [8] have been published as part of the directive. The guidelines focus on supporting DLR researchers in assessing the status of their developed software and in improving it with better software development and documentation practice. In addition, essential software development tools are provided centrally to make it easy to implement the guidelines.
3) Consulting is focused on researchers with limited background in software development. In this context, we provide a pool of experienced software engineers who participate in concrete projects. Their primary role is usually to build and configure the development environment, to propose a software development process, and to support researchers in creating high-quality software.
4) The DLR SoftwareEngineering.Wiki forms the single point of access for software engineering related information at DLR and, being a wiki, offers possibilities for collaboration.
5) Building a self-reliant software engineering community is an important aspect of the DLR software engineering initiative [7]. Therefore, annual knowledge exchange workshops have been introduced which focus on networking, face-to-face knowledge exchange, and discussions. They are described in more detail in the following sections.
The DLR software engineering network drives the direction of the different activities while taking into account the demands and opinions of the DLR research institutes.

Concept of Knowledge Exchange Workshops
Knowledge exchange workshops have been held at DLR since 2013. These workshop series are interdisciplinary and focus on the long-term exchange of knowledge and experiences on overlapping topics. The goal is to establish professional networks at DLR (e.g., for Open Source software [9]).
Every year a new workshop series is started. Topics are proposed by DLR employees, and every employee has the chance to vote for a proposal. A selection committee, which includes the chair of the DLR Executive Board, makes the final decision based on these votes. Topics can be anything related to the work of researchers within DLR, for example, data visualization or unmanned aircraft. The initial workshop is organized by the proposing employees with support of the central knowledge management department. The organization of potential follow-up workshops is community-driven. In 2018, eight workshops were organized: one regularly selected workshop and seven follow-up workshops. Over the years, three additional workshop series have been established by committed employees although their proposed workshop topic had not been officially selected. The knowledge exchange workshop on software engineering is one of these workshop series.
The knowledge exchange workshop on software engineering started in 2014 and is one of the best-attended workshop series. Each year the workshop focuses on a different software engineering related main topic and takes place at a different DLR location in Germany. These workshops are the annual DLR software engineering community event. Therefore, they offer a good starting point for analyzing the community.
Each workshop takes one and a half days and is highly interactive to create an active network and a living community. Its program typically consists of the following elements:
1) External expert talks: We usually try to find and invite external experts, particularly to support the introduction of the main workshop topic. Besides their knowledge, external experts help to get new views and ideas on the given topic.
2) Interactive sessions: Interactive sessions are one of the core elements of the workshop to support direct interactions among the participants. Typically, they are used during the first workshop day to involve the participants. We experiment a lot with the concrete format to improve on this crucial aspect constantly.
3) Social event: In the evening of the first workshop day, a social event is organized. It sets a positive environment to get together and supports direct interactions as well as networking between the participants.
4) Participant talks: Participant talks are another characteristic workshop element to support knowledge transfer and to present new ideas. The talks are usually about the main workshop topic, but we encourage participants to submit talks about any software engineering related topic. There are two established talk formats. Classical talks either present a technical topic or focus on sharing experiences. Lightning talks are short, very focused talks that briefly show new ideas or directions.
The workshops are organized by members of the community. The organization committee is typically formed directly after the current workshop. It consists of persons who have regularly acted as workshop organizers as well as volunteers who are interested in helping and gaining experience in this regard. The committee is responsible for organizing all aspects of the next workshop, including the selection of its main topic and location.

RESEARCH QUESTIONS AND APPROACH
Good insights into the DLR software engineering community are essential to steer the support activities of the DLR software engineering initiative (Sect. 2). We want to start this process by analyzing the participants of the knowledge exchange workshop series on software engineering. The series represents the annual DLR software engineering community event. Therefore, its participants represent the active part of the DLR software engineering community. For this purpose, we analyze the attendance data of the workshop series regarding the following questions:
1) How stable is the represented part of the DLR software engineering community?
2) What is the influence of the workshop main topic on the workshop attendance?
3) Where are the workshop participants coming from?
These questions shall give us a better understanding of the current status of the community as well as a basis for designing future workshops of the series.

Analysis Approach
We analyze the participant attendance data of the five knowledge exchange workshops which took place between 2014 and 2018. In particular, we want to consider the development of the following two groups of participants:
1) Core group: They are defined as participants who attended more than one workshop and did not skip more than one workshop in a row while still working at DLR. The core group includes researchers who continually attend the workshops, showing constant interest and thereby contributing to the DLR software engineering community.
2) One-time participants: They are defined as participants attending only one workshop. One-time participants are interested in the topic of software engineering but do not contribute to the DLR software engineering community by taking part in this workshop series.
The development of both groups shall give us a better understanding of the stability of the active part of the DLR software engineering community. Participants not belonging to either of these groups are still part of the analysis but are not considered separately.
In addition, we consider the work location of the participants. Here we focus especially on the workshop locations and how they influence the attendance rate of the group of one-time participants.
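The two group definitions above can be sketched as a small classification function. This is only an illustrative reading of the definitions, not the published analysis code [11]: the function name `classify`, the label "non-regular" for everyone else, and the assumption that every participant was still employed at DLR through the last workshop (so that workshops missed after the last attendance count as skipped) are ours.

```python
from typing import Iterable


def classify(attended: Iterable[int], n_workshops: int = 5) -> str:
    """Classify a participant from the workshop numbers (1..n) they attended.

    Assumption (not from the paper's data): the participant worked at DLR
    until the last workshop, so trailing absences count as skipped workshops.
    """
    workshops = sorted(set(attended))
    if not workshops:
        raise ValueError("participant attended no workshop")
    if len(workshops) == 1:
        return "one-time"

    # Longest run of consecutively skipped workshops after the first attendance,
    # including the stretch between the last attendance and the final workshop.
    longest_gap = 0
    for previous, following in zip(workshops, workshops[1:] + [n_workshops + 1]):
        longest_gap = max(longest_gap, following - previous - 1)

    return "core" if longest_gap <= 1 else "non-regular"
```

For example, a participant who attended workshops 1, 3, and 5 never skipped two workshops in a row and is classified as a core member, while one who attended only workshops 1 and 4 is not.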

Data Preparation
The data sources for this analysis consist of the registration and attendance lists of each workshop that were collected during the organization of the workshops. However, the first workshop is an exception because these lists are not available. For this workshop, we reconstructed the attendance data on the basis of the organizational emails that were sent to all registered participants. While we were able to reconstruct the data, there is some uncertainty as to whether the derived data set fully represents the actual participants.
The data sources have been prepared for analysis as follows:
1) First, a data set consisting of each participant's name and work location has been created for each workshop.
2) Then, every data set has been checked for multiple entries and the work location has been double-checked for correctness. In addition, we checked participant names across all data sets for different spellings and name changes.
This data forms the basis for our analysis. The participant names have been stripped from all data sets to anonymize the data. In addition, the work location and participant data have been separated to prevent cross-identification of the participants. The derived data sets have been further analyzed using Python Pandas [10] and Jupyter Notebook. The data sets and the notebook containing the analysis details have been published separately [11].
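The preparation steps above could look roughly like the following sketch. It uses only the Python standard library to stay self-contained (the published analysis uses Pandas [10], [11]); the function names, the `aliases` mapping for known name changes, and the sequential anonymous IDs are assumptions made for illustration, and all names in the usage example are invented.

```python
import unicodedata
from collections import Counter


def normalize(name: str) -> str:
    """Fold case, accents, and extra whitespace so spelling variants match."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


def prepare(workshops, aliases=None):
    """Return (attendance, locations) with participant names removed.

    workshops: {workshop_number: [(name, work_location), ...]}
    aliases:   hypothetical mapping {new_name: old_name} for known name changes
    """
    aliases = {normalize(k): normalize(v) for k, v in (aliases or {}).items()}
    ids = {}          # normalized name -> anonymous ID (discarded afterwards)
    attendance = {}   # workshop -> set of anonymous IDs
    locations = {}    # workshop -> Counter of work locations, without IDs
    for workshop, entries in workshops.items():
        seen = {}
        for name, location in entries:
            key = normalize(name)
            key = aliases.get(key, key)
            seen[key] = location  # multiple entries of one person collapse here
        attendance[workshop] = {ids.setdefault(key, len(ids) + 1) for key in seen}
        locations[workshop] = Counter(seen.values())
    return attendance, locations


# Invented example data: one duplicate entry and one name change.
raw = {
    1: [("Anna Müller", "BS"), ("anna  muller", "BS"), ("Ben Koch", "OP")],
    2: [("Anna Schmidt", "BS")],
}
attendance, locations = prepare(raw, aliases={"Anna Schmidt": "Anna Müller"})
```

Keeping the attendance sets and the location counts in two separate structures mirrors the separation described above: the location data no longer carries any participant identifier.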

RESULTS
Table 1 provides an overview of all workshops held between 2014 and 2018. It includes information about the workshop main topic, the total number of participants, the workshop date, the workshop location, and the current number of employees working at the workshop location.
The kick-off workshop was used to select the topics of interest for the DLR software engineering community. Those topics have been addressed in the following workshops by focusing on the top priority topics first. Regarding the location, the biggest research-oriented DLR locations in every region of Germany were intentionally selected first. Braunschweig represents the northern region, Cologne represents the western region, Oberpfaffenhofen represents the southern region, and Berlin represents the eastern region.
The last workshop in Bremen is an exception in both aspects. Its topic is not based on the kick-off workshop. In addition, significantly fewer DLR employees work at that location.

General Attendance Development
In total, the workshop series had 265 participants. There are 189 unique participants [11] among them. On average, the workshops have 53 participants. Over time, the total number of participants decreased slightly.
The number of participants was limited to 60 for all workshops due to organizational constraints such as the capacity of available meeting rooms. While the number of participants never reached this limit, the number of applications was slightly above the limit in the first two years. However, these workshops did not reach the participant limit due to last-minute cancellations which could not be compensated with persons from the waiting list.
Figure 1 shows the return rates of participants of successive workshops. For example, the return rate of 35% for the first workshop indicates that 35% of its participants also attended the second workshop. In addition, the figure indicates the total number of workshop participants as well as the number of those participants who attended the direct follow-up workshop. Data for the fifth workshop is not shown because its follow-up workshop has not taken place yet.
Figure 1. Return rates of participants of successive workshops for the first four workshops. The return rate of 35% for the first workshop indicates that 35% of its participants also attended the second workshop.
On average, 16 workshop participants also attended the direct follow-up workshop. The return rates lie within a range of 10 percentage points. The development of the return rates over the workshops shows a first larger drop between the second and the third workshop (26%, 15 participants). Between the third and the fourth workshop, the rate stabilizes at a similar absolute level (30%, 16 participants) but reaches its current minimum between the fourth and the fifth workshop (25%, 13 participants).
Overall, the first workshop has the highest return rate (35%, 20 participants) and the fourth workshop has the lowest (25%, 13 participants).
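The return rate described above can be sketched as a set intersection between successive workshops. The function name `return_rates` and the toy attendance sets are illustrative assumptions; only the four returning-participant counts at the end are taken from the text.

```python
def return_rates(attendance):
    """Map each workshop (except the last) to (returning count, return rate).

    attendance: {workshop_number: set of anonymous participant IDs}
    The return rate of workshop k is the share of its participants who also
    attended workshop k + 1.
    """
    order = sorted(attendance)
    rates = {}
    for current, following in zip(order, order[1:]):
        returned = attendance[current] & attendance[following]
        rates[current] = (len(returned), len(returned) / len(attendance[current]))
    return rates


# Toy data: two of the four participants of workshop 1 come back for workshop 2.
toy = {1: {1, 2, 3, 4}, 2: {1, 2, 5}, 3: {5, 6}}
toy_rates = return_rates(toy)

# Returning participants reported in the text for workshops 1 to 4.
reported_returning = [20, 15, 16, 13]
average_returning = sum(reported_returning) / len(reported_returning)
```

With the reported counts, the average number of returning participants evaluates to 16, which matches the value stated above.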

Work Location of the Participants
DLR is located at 20 different sites across Germany. In total, researchers from 16 of these DLR locations participated in the workshops [11].
Figure 2 shows the distribution of participants over the locations, sorted in descending order by the number of participants. For clarity, we only included locations with at least five participants. Oberpfaffenhofen (OP) and Braunschweig (BS) clearly represent the work location of most workshop participants. Overall, about 55% of all workshop participants work at one of these locations. Berlin (BA) and Cologne (KP) represent the work location of 35 and 22 participants, respectively. Finally, the locations Stuttgart (ST), Göttingen (GO), Bremen (HB), Bonn-Oberkassel (BO), and Neustrelitz (NZ) follow with not more than 13 participants each. In the following, we focus on the five locations that have already hosted a workshop. We want to consider which workshops the participants from these locations attended. Figure 3 shows the distribution of the participants working at a location which hosted a workshop.
Overall, the number of participants from a location stands out if the workshop is located there. In this context, we want to consider how "strong" this peak is. For that purpose, we calculate a scaling factor which indicates how much the peak differs from the average participation [11]. For example, the scaling factor of 2.5 for Braunschweig means that the number of participants from Braunschweig is about 2.5 times higher if the workshop is located in Braunschweig.
The scaling factors for Braunschweig, Oberpfaffenhofen, and Berlin are quite similar and lie in a range of 2.5 to 3.4. The scaling factor for Bremen is 9.3, and Cologne has the highest scaling factor with 13.4.
Figure 3 also indicates that most participants of a specific workshop originate from the workshop location, except for Bremen. So far, only ten participants of the whole workshop series originated from there.
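One plausible way to compute such a scaling factor is to divide the number of participants from a location at the workshop it hosted by that location's average number of participants at the other workshops. This formula is our assumption for illustration; the exact definition used for the reported values is given in the published notebook [11]. The counts in the example are made up and were chosen so that the factor comes out at 2.5.

```python
from statistics import mean


def scaling_factor(counts, host_index):
    """Ratio of the hosted-workshop peak to the mean of all other workshops.

    counts:     participants from one location, one entry per workshop
    host_index: position of the workshop hosted at that location
    Assumed definition, not necessarily the one used in [11].
    """
    peak = counts[host_index]
    others = [c for i, c in enumerate(counts) if i != host_index]
    baseline = mean(others)
    if baseline == 0:
        raise ValueError("no participation outside the hosted workshop")
    return peak / baseline


# Made-up counts for a location that hosted the first of five workshops.
example_counts = [25, 10, 8, 12, 10]
example_factor = scaling_factor(example_counts, host_index=0)
```

Under this reading, a location that sends hardly anyone to workshops elsewhere has a very small baseline and therefore a very large factor, which is consistent with the high values reported for Bremen and Cologne.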

Participant Groups
Concerning the core group and the group of one-time participants, we find the following results:
1) Core group: Overall, we identified 30 participants as members of the core group, that is, about 16% of the unique participants. Ten participants did not fulfill the requirement of not having skipped more than one workshop in a row. Finally, the 29 one-time participants of the last workshop have the potential to become members of the core group.
2) One-time participants: Overall, we identified 149 participants as one-time participants, that is, about 79% of the unique participants. The 29 potentially new members of the core group are counted as one-time participants.
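The reported group sizes can be checked against the number of unique participants with a few lines of arithmetic. The numbers are taken from the text; the label "non-regular" for the ten participants who skipped more than one workshop in a row and the variable names are ours.

```python
# Group sizes and totals as reported in the text.
groups = {"core": 30, "non-regular": 10, "one-time": 149}
unique_participants = 189
total_participations = 265
n_workshops = 5

# The three groups together account for all unique participants.
assert sum(groups.values()) == unique_participants

# Share of each group among the unique participants, rounded to whole percent.
shares = {name: round(100 * size / unique_participants) for name, size in groups.items()}

# Average number of participants per workshop.
average_per_workshop = total_participations / n_workshops
```

The rounded shares reproduce the 16% and 79% stated above, and the average per workshop is the 53 participants reported earlier.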
Figure 4 shows the development of the attendance rates of the core group members and one-time participants for all workshops.
1) Core group: The attendance rate of the core group members increases steadily for the first four workshops. However, for the last workshop, the attendance rate drops by about 10% in comparison to the preceding workshop. Overall, the fourth workshop has the highest core group attendance rate (44%, 23 participants) and the first workshop has the lowest (28%, 16 participants). The core group attendance rates span a range of 16 percentage points.
2) One-time participants: The attendance rate of one-time participants initially decreases slowly and then increases steadily from the third workshop onward. However, the absolute number of one-time participants after the third workshop remains constant at 29 participants while the total number of workshop participants decreases. Overall, the fifth workshop has the highest one-time participant attendance rate (62%, 29 participants) and the third workshop has the lowest (53%, 28 participants). Over all workshops, the one-time participant attendance rates span a range of 9 percentage points.
Finally, we consider the influence of the work location on the group of one-time participants. Figure 5 shows the number of local one-time participants, local participants, and one-time participants for each workshop. Generally, most of the local participants belong to the group of local one-time participants. The second and the last workshop show this aspect particularly clearly.
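The attendance rates per group discussed above are the share of a workshop's participants that belongs to each group. A minimal sketch follows, assuming anonymous participant IDs and an already computed group label per participant; the function name and the toy data are illustrative and not taken from the published notebook [11].

```python
def group_attendance_rates(attendance, group_of):
    """Share of each workshop's participants per participant group.

    attendance: {workshop_number: set of anonymous participant IDs}
    group_of:   {participant ID: group label, e.g. "core" or "one-time"}
    """
    rates = {}
    for workshop, participants in sorted(attendance.items()):
        counts = {}
        for participant in participants:
            label = group_of[participant]
            counts[label] = counts.get(label, 0) + 1
        total = len(participants)
        rates[workshop] = {label: n / total for label, n in counts.items()}
    return rates


# Toy data: participants 1 and 2 attend both workshops, the others only one.
toy_attendance = {1: {1, 2, 3, 4}, 2: {1, 2, 5}}
toy_groups = {1: "core", 2: "core", 3: "one-time", 4: "one-time", 5: "one-time"}
toy_group_rates = group_attendance_rates(toy_attendance, toy_groups)
```

Because the rate is relative to the workshop size, a constant absolute number of one-time participants yields a rising rate when the total attendance shrinks, which is the effect noted for the last three workshops.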

DISCUSSION
In the following, we discuss the results in the context of the research questions (Sect. 3).

Stability of the DLR Software Engineering Community
The results show that the active and visible part of the software engineering community consists of a small, stable core group (30 persons), some non-regular visitors (10 persons), and one-time participants (149 persons). On average, the workshops are attended by 36% community core members, 8% non-regular visitors, and 56% one-time visitors.
The members of the core group play an important role because they constantly contribute to the workshops and the community. In addition, they potentially function as multipliers for the topics of software development and engineering at their work location. Their workshop attendance rate increases steadily over the first four workshops but shows a larger decrease for the fifth workshop. We assume that this drop in the attendance rate is caused by the very specific topic "Embedded Systems", which is less closely associated with the general software engineering theme of the series. This assumption is partially supported by the overall lowest participant return rate between the fourth and the fifth workshop (25%, 13 participants). In addition, at least one core member stated to the authors that the topic was the main reason for not attending the last workshop.
One-time participants only attended one workshop. In the context of the last workshop, this group represents the persons who are new to the workshop series. The attendance rate of the one-time participants and their overall number for each workshop are quite constant and do not differ much between the workshops (i.e., the workshop series constantly attracts new participants). The slight increase in the rate of one-time participants for the last three workshops seems to be an effect of the decreasing total number of participants rather than an indication of a substantial development. The concrete reasons for attending only one workshop are still unclear and require further investigation. We assume that the workshop location and topic play an important role. However, further explanations, such as unmet expectations of first-time attendees and minor aspects like scheduling conflicts, may play a role as well.
The group of non-regular visitors is rather small. They might be interested only in selected specific topics. In addition, aspects like a change of responsibilities or even a change of employment might have caused the observed non-regular visits. However, we need to further investigate this group to better understand the concrete causes.

Influence of the Workshop Main Topic
The main workshop topic seems to influence the participant attendance. We found some evidence in the results for such an influence in two cases.
In the context of the first two workshops, we assume that many participants experienced the second workshop as a direct continuation of the first workshop. In other words, the special topic combination of both workshops (the kick-off via the first workshop and the direct in-depth focus on the most important topic in the second workshop) motivated many participants to attend the second as well. The short time span of only six months between these workshops might have even increased this effect. The high participant return rate between the first and the second workshop (35%, 20 participants) supports this assumption.
Another example of this influence is the last workshop on the topic "Embedded Systems." This workshop particularly demonstrates the trend of the ongoing topic specialization of the workshop series. Software engineering related questions with regard to embedded systems are of interest for DLR researchers working in the aeronautics and space domains. Although these domains are highly relevant for DLR, the decision for this topic apparently moved the workshop series away from its core software engineering themes. The results show a relatively high number of (potential) one-time participants for this workshop. Thus, the workshop attracted many new participants. We assume that many of them participated because of the workshop topic. This assumption is supported by the fact that the majority of these new participants do not originate from Bremen and had to travel to attend the workshop. Another indicator for the influence of the topic change is the notable drop in the attendance rate of the identified core group.

Origin of the Workshop Participants
Overall, participants from nearly all DLR locations attended the workshop series. Thus, it can be said that software development is relevant at nearly all DLR locations. This aspect illustrates the increasing importance of software development at DLR. The number of workshop participants from a specific location stands out if the workshop takes place at this location. The effect occurred for all workshops. In addition, for large DLR locations, the local participants also represent the majority of all participants. Finally, the group of local participants consists mainly of one-time participants. Therefore, participants are more likely to attend workshops of the series that take place at their work location. The obvious reasons are that participants need to invest less time and do not require a travel budget. In our experience, this aspect lowers the barrier to participation considerably, as travel budgets are usually a quite limited resource.
When considering how much the number of workshop participants from a specific location stands out if the workshop takes place there, we found notable peaks for Bremen and Cologne. Normally, the results indicate a number of participants that is about three times higher than the average number of participants. For Bremen and Cologne, we found a participation that is about nine times and about thirteen times higher, respectively. Thus, nearly all participants from these locations only attended the workshop if it took place at their work location. For Bremen, we assume that the specific topic in combination with the matching research interest caused this peak. For Cologne, we assume that its administrative focus influenced this finding, but further investigations are required.

CONCLUSIONS AND FUTURE WORK
We analyzed the participation data of the DLR knowledge exchange workshop series on software engineering to get insights into the DLR software engineering community. The workshop series represents the annual DLR software engineering community event. Therefore, its participants represent the active part of the DLR software engineering community. The results show that this part of the community consists of a small, stable core group, some non-regular visitors, and, as the main group, one-time participants. While we think that the analysis provides good initial hints on the community, a closer investigation of the motivations for being part of it is required. In particular, we need to further analyze the unknown group of persons who are concerned with software development but do not attend these workshops. The other activities of our software engineering initiative (Sect. 2) can serve as a basis for such an analysis.
The main workshop topic seems to influence the workshop attendance. The results indicate a positive influence for the kick-off and the direct follow-up workshop, which focused on topics identified in the kick-off workshop. In addition, we found that we need to pursue the currently ongoing topic specialization carefully. While more specialized topics attract new participants, we might risk losing the interest of the core group members and thereby destabilizing our community.
Participants from nearly all DLR locations attended the workshop series. Most participants of the workshop series originate from large research-oriented DLR locations. The results show that the number of workshop participants from a specific location stands out if the workshop takes place at this location. In this regard, our initial preference for large locations and the regular switch between DLR locations seem reasonable. However, we have to carefully consider the effects of the current switch to smaller workshop locations.
In the future, we will take the results into account when conceptualizing new workshops of the series and try to further involve our community in this process. As a next step, we plan to conduct interviews and surveys among the previous participants to better understand the observed effects. In addition, we want to complement this analysis by investigating the remaining unknown part of the DLR software engineering community. Finally, we hope that our results can help other organizations to form active internal communities.

2) Regular trainings are offered as part of DLR's education program. They focus on structured software development and the practical application of typically used software development tools. The target group of the trainings is researchers working alone or in small development teams. The main goal of the trainings is to provide knowledge and hands-on experience in the development of sustainable software.

Figure 2. Distribution of participants over the locations, sorted in descending order by the number of participants. For clarity, only locations with at least five participants are shown.

Figure 3. Distribution of the participants working at a location which previously hosted a workshop. It shows which workshops participants from these locations attended.

Figure 4. Participant attendance rates of the two main groups for each workshop. The core group consists of participants attending at least two workshops and not dropping out while still working at DLR. One-time participants are participants attending only one workshop.

Figure 5. Number of local one-time participants, local participants, and one-time participants for each workshop.

Table 1. Overview of the knowledge exchange workshop series on software engineering since 2014. For each workshop, the topic, total number of participants, date (month, year), location, and the current number of employees working at the workshop location (as of September 2018) are listed.