Getting to Know nf-core (Open Source Project Tours)

Open Source Project Tours are recorded sessions where an open source maintainer functions as a tour guide to their project, showing me the lay of the land and answering my questions about how their design reflects the relationships and needs of contributors and users. You can watch the full tour here. I've also created five clips of this interview featuring discussions about open source project lifecycles, a custom bot that rolls out syntax updates to over a hundred repos, the importance of reaching out to a project before making big contributions, learning from BDFL burnout, and stickers.

This transcript was automatically generated and is *not* a completely accurate transcription of the video. It does cover all discussion points and I've read through it while listening to the interview to make sure there are no major misrepresentations. If you need exact quotes, please get them from the video itself.

Intros & History of nf-core

Shauna: Hi there. I’m Shauna Gordon-McKeon and I'm here today for the very first open source project tour. So let me do a bit of an intro to what I'm envisioning these project tours to be, with the caveat that because this is the first one, I kind of have no idea. But the general goal of the project tours is to really explore how open source projects are structured, specifically with an eye to how they facilitate collaboration or don't. Basically, thinking about open source projects as living communities of people who are interacting with code, with documentation, with other things. And seeing how the structure of the project as a whole meets the goals of the project or, in some cases, doesn't.

So we're going to start with a bit of a discussion about today's project, which is called nf-core: its history and goals. We'll talk a little bit about the different stakeholders involved and the specific spaces that the project lives in, online or maybe offline, though we can only explore the ones that are online. Maybe someday we can do a project tour that involves going to a conference or something. Then we'll go into the tour itself and do a little bit of a wrap-up at the end. So that's the plan. We'll see how it goes.

Let's get started with an intro. Do you want to introduce yourself and nf-core?

Maxime: Yes. Hello, my name is Maxime Garcia. I'm a bioinformatician. I've been working with Nextflow for almost 10 years now, developing Nextflow pipelines, first at the Karolinska Institute here in Sweden and then at Seqera, the company created by the developers of Nextflow. Nextflow is a workflow manager that really simplifies the way we write analysis pipelines, and a whole community was built around it: nf-core.

Shauna: What kind of pipelines? I know nothing about this project, which I think may be helpful for anyone watching who also doesn't know anything about it, because I will ask very simple questions like “what kind of pipelines”, “what kind of workflows”, “is there a specific programming language or domain area this is arising out of”.

Maxime: At the beginning it was mainly bioinformatics, targeted toward genomics. So basically analyzing DNA, RNA, or anything that comes out of a sequencer.

Shauna: Gotcha. And so was it coming out of academia or a private lab or…?

Maxime: The people behind Nextflow were organizing events for other people who were developing with Nextflow, so they could meet and exchange ideas. Phil Ewels, who had been working on MultiQC and other open source projects, met two guys at one of these events in Barcelona: Andreas and Alex. And they decided, “Okay, we are doing some stuff that is pretty similar, we are working on the same kind of data, we have the same kind of pipelines. It would be much easier for everyone if we just joined forces.”

Phil came back from Barcelona. I was sitting next to him at the time and he just asked me, "Oh, Maxime, by the way, there's this guy in Germany who wants to collaborate on some pipelines." That's how I started working on it, and that’s how we started everything.

Shauna: So you were involved from pretty early on.

Maxime: I was involved before the community was even started, but I was not part of the original team that came up with the idea. So I'm not one of the founders of nf-core. I'm part of the building blocks, I think.

Shauna: Gotcha. And that was about 10 years ago?

Maxime: No, I started developing with Nextflow in 2016. NGI, which is the sequencing platform where Phil was working, started developing pipelines a couple of years prior to that, and the idea for nf-core came about in late 2017. We really started to do some proper stuff in 2018. So seven years ago, more or less.

Shauna: Let me try to repeat that back to you, and you can correct the parts I've inevitably gotten wrong. So the founders of the project were separately working on pipelines. They got together and decided, oh, let's collaborate on pipelines. And you got looped in early on. Is Nextflow a separate project, or a different iteration of the project, or what's the distinction?

Maxime: So Nextflow is the workflow manager, and the language, that we all use to write our pipelines. Basically, it allows us to interface with whatever executors we want, so people can work locally on their laptop and develop their pipeline quite easily. It can work on your own HPC cluster at your university. It can work on the cloud, like AWS, Microsoft Azure, or Google Life Sciences, and it's pretty simple. It allows you to work with Conda, Docker, Singularity, or any other kind of virtual environment or containerization you want. That way we are sure that whatever we are doing is reproducible, because for me that's one of the key points of doing science. If no one can reproduce it, what are we doing?

Shauna: Amazing. Yeah, I have a background in the sciences and in psychology, which has had some really extreme issues with reproducibility. So I totally get the importance of that.

What about the name nf-core?

Maxime: nf-core comes from Nextflow, pretty simple there, and “core” is because it was started by core facilities, sequencing platforms. We were just doing basic analysis pipelines, and all of the pipelines we had at NGI at the time started as quality control. That way we helped our users kickstart their analyses.

Shauna: What does NGI stand for?

Maxime: National Genomics Infrastructure. That's one of the multiple infrastructures within SciLifeLab (Science for Life Laboratory), which is the main support of research here in Sweden.

Shauna: It sounds like reproducibility is a big goal for the project. What are some other high level goals for the project?

Maxime: For me the main stuff is reproducibility. Second would be collaboration, because it all started by collaborating - not trying to do the same stuff at different locations but really trying to collaborate together. So for me reproducibility is the main thing, collaboration is the second.

Shauna: And who would you say are the different stakeholders in the project, whether that's users, contributors, or funders? Feel free to paint with a broad brush, because you could say people who benefit from bioinformatics research are in some sense stakeholders.

Maxime: The people who are interested in this project are the people who are using it. So researchers, sequencing platforms, or even some clinicians are using some of the pipelines that we're writing.

We decided when we launched nf-core to release all of our pipeline work under the MIT license. That way, if people want to commercialize just running the pipelines, they can. So private companies running nf-core pipelines sometimes participate in the project and sometimes don't, but we are fully aware of that and we embrace it. Even some big pharma companies are interested in nf-core, and there are some collaborations ongoing.

Shauna: Do you have a sense of how the companies that use nf-core are using it?

Maxime: Not much. Everything is on GitHub, so for every public pipeline we know how many times it's been forked or cloned. But apart from that, we don't know. I know for sure that there is a lot of stuff happening that we are not aware of. But I think that's good.

Shauna: That's something that's come up in some of the projects that I've participated in or maintained. We know how many downloads we have, we don't know anything about what anyone is doing with those downloads and it feels sort of like a black hole sometimes.

So you said that researchers, platforms, clinicians, and some companies are using it. Which of those stakeholders do you think is the largest group, and which stakeholders would you say you interact with the most, or have the most direct impact on you?

Maxime: Personally, I interact mostly with the people who develop the pipelines, so researchers or pipeline maintainers. I also interact quite a lot with users, and I would definitely say users are the main group of people.

Shauna: Are there any other stakeholders that we haven't talked about?

Maxime: I think we covered it: researchers who are using it, sequencing platforms or labs, sometimes even universities, depending on how things are structured in different countries. Maybe some governmental agencies. So I think it's pretty broad. Usually we know when stuff isn’t working, because people come back to us, but we don't always know when stuff is working. Sometimes people come back to us saying “Okay, what you've done is great and we're using it,” but sometimes we just don't know.

Shauna: Are there any tensions between stakeholders? Not necessarily in terms of like, active drama, but just places where the needs or desires of different groups of stakeholders might conflict.

Maxime: No, I haven't noticed that. The main issue I think we have in terms of conflict is trying to come to a decision at some point and getting people to collaborate. We have one major rule in nf-core, which is: collaborate. Getting people to collaborate can sometimes be difficult, especially in clinical settings where people have specific logic they need to use in their pipelines locally, and it's hard to collaborate with someone who has a different set of logic imposed on them.

Shauna: It sounds like you’re describing a technical conflict, where some people need certain kinds of logic and other people need other kinds. I'm curious whether this is something we could discuss when we get to looking at the repository.

Maxime: For me it's more that one infrastructure needs to use a certain set of tools, another infrastructure needs a different set of tools, and trying to put both sets together can be difficult.

Shauna: Okay. And it isn't always trivially easy to say, oh, we'll design it so that people can decide which of the tools they use, because designing that kind of interface is sometimes easy, but sometimes it's actually way more complicated.

Maxime: Yeah. I think it’s important to mention that all of the stuff that we are producing is on the command line. So some company can provide a graphical interface for this kind of thing but most of the work that we do is just on the command line side.

Shauna: Gotcha. One last question before we dive into the tour itself is about spaces. What kind of spaces does the project exist in? Feel free to name things that we can't actually explore, like “oh, we get together once a year at this particular conference” or “we have our own convening” or whatever. Any sort of space where people in the project are interacting.

Maxime: I will start with the in-person meetings. We have a steering committee of, I think, five people at the moment, which helps us decide the direction we are going. We have a group of admins, I think 12 to 15 people, who really administer the project day to day, solve all of the issues, and figure out how to help people, how to decide things, and so on. We have three people in our safety group who are there whenever there is an issue. They don't report to us; they report directly to the steering committee. I think that's good, because it's important for a community to be aware that bad things can happen, and that if they do, they're handled well.

We have one group dedicated to outreach, our social networks, and helping organize the events we run.

One slightly bigger group of maintainers helps maintain the whole community, and then everyone else is a contributor.

We take advantage of the Nextflow community quite a lot. On the community side, Nextflow exists mainly on Slack, with the Nextflow Slack, and on Discourse, hosted by Seqera at community.seqera.io, which is basically a forum that is great for asking questions. They also organize a summit twice a year: once in Europe (this year it will be virtual) and once in the US, which will probably be in Boston again.

Just prior to the summit, there’s a hackathon, and that’s where people can meet up and work together on improving whatever we’re doing with nf-core. Or we help people work on their own projects using the nf-core framework or the tooling that we’ve developed.

Shauna: How long is that? Is that like a day or a week?

Maxime: Usually, the summit is for about two to three days, and the hackathon would be around two days. In parallel with the hackathon, there’s also a team from Seqera that organizes some training as well.

Shauna: Got it. And this is like a stand-alone event?

Maxime: Yes, it’s a conference. In terms of participants, there should be around a hundred people for the hackathon, more or less, and something similar for the training. For the actual summit, there will be way more. This year, since it’s virtual, I’m not fully aware of the numbers. We also organize a distributed hackathon all over the world so people can contribute together, and we try to organize that nicely. Also, every 15 to 18 months, all of the admins try to get together so we can discuss, collaborate, and sync on things. We just met three weeks ago here in Stockholm.

Shauna: Cool. Do you find that having that focused time is helpful? Because it seems like, if you’re also putting on a big summit with a hackathon and trainings, having a separate time to really get heads down might help.

Maxime: Yes, definitely. That’s useful. We talk to each other quite a lot in nf-core. We use Slack as our main communication channel. We have multiple channels, including one for administrators, and we talk there frequently. Since COVID, we’ve learned how great it can be to work remotely and communicate through Zoom and other tools. But sometimes, nothing beats face-to-face discussions.

Shauna: Yeah, I was thinking about that. I’m always looking for patterns in what seems to work for projects. nf-core seems like a relatively large open-source project. For example, Python used to do a lot of language work during PyCon, the flagship conference, but later decided to separate that work because the conference itself got too big. They realized they needed a space just to focus on technical discussions or working through issues and conflicts. When I say “conflicts,” I don’t mean drama, just situations where people have different needs and have to work out how to get those needs met. So I think there’s a kind of life cycle. Smaller projects often start by co-locating at another conference, like, “We’re all going to be at PyCon or some scientific meeting, so let’s join the hackathon or sprint, or let’s all meet a day before.” As projects grow, they start hosting their own events. But eventually those might grow big enough that “their own thing” isn’t really the best place to get stuff done anymore, and instead it becomes a space for the community as a whole. That’s the life cycle for projects as they get bigger and bigger.

Maxime: Yeah, I think at the moment we’re still very tied to Nextflow. The ties between Nextflow and nf-core are pretty strong. Years ago, we used to handle training within nf-core, but now training is managed by Seqera, so that’s one less thing for us. What’s fully independent now is the distributed hackathon, which is disconnected from the main Nextflow conference. We still have local hackathons tied to conferences with smaller, focused groups, but the global distributed hackathon allows local groups to self-organize. So yes, I agree: it’s part of the life cycle of large projects.

The Tour

Shauna: Do you have a sense of what space would be best to start touring?

Maxime: I think the best place to start is just our website.

Shauna: All right, go ahead and start sharing.

Maxime: Okay, let me share my screen. So, the best place to begin is our website. That’s where all the information about nf-core is presented.

We’re a community that collaborates on open-source Nextflow components and pipelines. We help organizations, users, and developers. Everything we do is community-owned, and that’s really important to us. Our values are reproducibility, collaboration, and openness, with openness being one of the biggest.

We list our events on the site. We have bi-weekly “Bytesize” talks where we present anything that happens within the community - new pipelines, community updates. We also have a community blog for sharing news.

Upcoming events include hackathons, and some trainings organized in collaboration with the Nextflow community, since many of us are part of both.

We also host special interest groups where members collaborate on particular aspects of nf-core, like animal genomics, which was the first special interest group.

Shauna: Cool. How many of these special interest groups are there? It looks like there are a few domain-specific ones.

Maxime: Yes, very domain-specific. We have groups for cancer, core facilities, education, immunology. We have a process for creating new ones on GitHub.

[Goes to GitHub repo]

Our community is quite large—around 1,600 members and just under 200 repositories. One of the newest repos is for proposals, where anyone can suggest new pipelines, special interest groups, or community improvements.

Shauna: I saw about a dozen special interest groups. How many pipelines do you have?

Maxime: Currently, we have 84 released pipelines, 43 in development, and 12 archived. Pipelines are archived if they’ve been merged into another or if the technology is no longer available. Originally, we focused on bioinformatics and genomics, but we’ve branched out into other areas like proteomics, imaging, satellite imagery, and even economics.

Shauna: Very cool. I noticed you had a neuroimaging group—that’s actually how I learned to code back in 2008 or 2009 while working in a neuroimaging lab.

Maxime: We have a pipeline being developed for cell painting to help analyze imaging data. It’s quite interesting.

Shauna: Neat. It’s been about 15 years since I’ve done any neuroimaging, but it’s fascinating how much has changed in open-source tooling and scientific software since 2010.

Maxime: Oh, yes. The field evolves quite a lot and quite fast.

Shauna: Where are your RFCs?

Maxime: At the moment, we have 13 open RFCs. I’ve created a few since I’m quite involved in the community. The latest one that was merged introduced an advisory system cataloging regressions, incompatibilities, and security issues across pipelines.

Shauna: That’s great. It can be hard to pinpoint upstream problems, so having a dedicated area for those is really useful.

Maxime: Definitely. That RFC was proposed in June this year—shortly after we started working on RFCs in May—and it was accepted quickly because it filled a real need.

Shauna: Sometimes a good idea just clicks, and everyone agrees to do it right away.

Maxime: Yes, though other times it takes longer.

Shauna: Yeah, that’s true, and it’s not necessarily because it’s a worse idea, sometimes you just need to take longer discussing something. Can we go back to the proposals? There was one about “code owners for pipelines” that intrigued me.

Maxime: Sure. All of our pipeline code is hosted on GitHub under nf-core.

We use Slack daily, GitHub for hosting code, and we communicate via Mastodon, Bluesky, LinkedIn, and YouTube. For meetings, we used to use Gather Town but are transitioning to something else, possibly WorkAdventure.

Regarding the “code owner” RFC, it’s about assigning responsibility. For example, I’m a maintainer on a pipeline called “Sarek.” (In nf-core, we name pipelines descriptively, like “RNAseq” for RNA sequencing. “Sarek” is an exception because it predates nf-core.)

This RFC proposes that certain people (like me and my co-developer, Rick) be marked as responsible for reviewing pull requests. It’s mostly a way to formalize what some pipelines already do, ensuring maintainers can safeguard the pipeline’s direction. I’m not sure the RFC is strictly necessary, since some pipelines already work this way, but for me it was a way to make the idea official.

Shauna: I guess I’m curious what kinds of problems having a designated code owner solves. For example if you have like 80 pipelines and 40 more in progress, that’s got to be a lot of different lead maintainers of those pipelines, and if something needs updating and those maintainers are not available, maybe there’s not another person who can do it. So maybe it makes sense to have code owners from a reliability perspective, and maybe require a certain amount of responsiveness from them? But that’s a little bit me projecting onto you and the project, based on my experiences, so I’m curious if that’s something you’ve thought about, or are there different goals.

Maxime: It’s more about safe-keeping. It’s about making sure that the people who are responsible for the pipeline can safeguard the pipeline. For me it’s more a way to make sure that nothing too extraordinary is happening in the pipeline without the main maintainer of the pipeline being aware of it. Everything is community-owned, so whatever people want to do in the pipeline is completely open, and even I often make small pull requests to pipelines I’m not responsible for. But as an administrator I have permissions to do this.

Shauna: So is each pipeline its own GitHub repository within the organization?

Maxime: Exactly. Most repositories are pipelines. For example, “MCMICRO” is another pipeline, and then we have our website and configuration repos. The configuration repository is useful even outside nf-core: it helps users parameterize their Nextflow pipelines. We also have modules, which are smaller building blocks of pipelines.

Shauna: Got it. It looked like your repositories get updated frequently.

Maxime: Yes, some of them, like the modules repository, are updated very often because they’re used by everyone. nf-core provides both pipelines and the components and tooling that support them. Modules are the smallest units, and subworkflows are assemblies of modules.

Shauna: Can we look at the list of repositories again?

Maxime: Sure.

Shauna: So, for example, “differential-abundance” is a pipeline, and “tools” is a Python helper package?

Maxime: Exactly. The “tools” repo contains helper scripts, and the “template” provides a standardized skeleton for pipelines to ensure structural consistency and ease of review, even if you don’t know the language.

Shauna: Can we open one of the pipelines? Maybe “differential-abundance.”

Maxime: Sure. On the master branch, you’ll see the latest release. The dev branch shows current work—some commits were made just yesterday. You can see it’s one of our top pipelines.

Shauna: Does each pipeline have its own release process? Is it standardized across the pipelines?

Maxime: Yes.

Shauna: And do you have it delegated - is there someone who is a maintainer on each repository that has the ability to click the merge button on the pull requests, or how is that structured?

Maxime: The way we work now, and this is something the core team is thinking about, is that each release is a PR that merges all of the commits from dev into master. Release frequency depends on pipeline size and activity.

Shauna: And the modules and tools repositories—are they used within the pipelines?

Maxime: Yes. Each pipeline imports modules from the modules repo. They’re copied into the pipeline, and a JSON file (modules.json) tracks where each one came from and at which commit. We decided to copy them rather than use symlinks or git submodules because it’s more efficient.
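To make the modules.json idea concrete, here is a minimal Python sketch, not nf-core's own tooling, of how one might read such a manifest and list which commit each copied module came from. The nested layout (`repos`, `modules`, `git_sha`, `branch`) is an assumption loosely modeled on the file nf-core uses, and the pipeline name `nf-core/example` and the commit hashes are made up for illustration.

```python
import json

# Illustrative manifest. The layout (repos -> modules -> org -> module name)
# is an assumption loosely modeled on nf-core's modules.json; the pipeline
# name and commit hashes are made up.
EXAMPLE_MANIFEST = """
{
  "name": "nf-core/example",
  "repos": {
    "https://github.com/nf-core/modules.git": {
      "modules": {
        "nf-core": {
          "fastqc": {"branch": "master", "git_sha": "abc1234"},
          "multiqc": {"branch": "master", "git_sha": "def5678"}
        }
      }
    }
  }
}
"""


def list_vendored_modules(manifest: dict) -> list[tuple[str, str, str]]:
    """Return (module, source repository, commit) for each copied module."""
    rows = []
    for repo_url, repo in manifest.get("repos", {}).items():
        for org, modules in repo.get("modules", {}).items():
            for name, info in modules.items():
                rows.append((f"{org}/{name}", repo_url, info["git_sha"]))
    return sorted(rows)


if __name__ == "__main__":
    manifest = json.loads(EXAMPLE_MANIFEST)
    for module, repo_url, sha in list_vendored_modules(manifest):
        print(f"{module} copied from {repo_url} at commit {sha}")
```

In this sketch the copied code plus the recorded commit is enough to say where each module came from; a real update tool would also need to fetch and diff the upstream code, which is out of scope here.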

Shauna: When did you come up with this modules.json syntax? This feels like a solution to something that wouldn’t have been there at the very beginning.

Maxime: It came with the shift from Nextflow DSL1 to DSL2. With DSL2, we introduced shared modules. That required a lot of discussion on naming, structure, and organization. This started around 2020.

Shauna: I assume DSL1 and DSL2 weren’t compatible? And switching from one to another required a whole migration process?

Maxime: Correct. Migrating from DSL1 to DSL2 was a major effort—it took several months for some pipelines. If a pipeline still uses DSL1, it must use an older Nextflow version, but that’s manageable.

Shauna: Migrations like that can be tough. Any lessons learned or best practices?

Maxime: Yes, we’ve gone through several migrations. DSL1 to DSL2 was the biggest. For “Sarek,” it took months since everything had to be migrated at once. Other migrations were easier, though rolling them out fully was still complex. We emphasize testing: every module is unit-tested independently before being merged. A couple of years ago, we migrated our testing framework from pytest to nf-test, a testing tool developed by community members specifically for Nextflow. That migration took about two years and finished recently.

Shauna: Interesting. What did nf-test add that pytest didn’t have?

Maxime: Previously, with pytest, we wrapped Nextflow modules as single units and manually checked output MD5 hashes. With nf-test, tests are written directly in the framework, and results are stored as snapshots. It’s much easier to maintain and update since we no longer hardcode checksums; it’s all handled automatically.
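The difference between hardcoded checksums and snapshots can be sketched in a few lines of Python. This is a generic illustration of snapshot testing, not how nf-test is implemented (nf-test is its own tool with its own snapshot files); the function name `match_snapshot`, the JSON snapshot format, and the file names are assumptions made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def md5sum(path: Path) -> str:
    """MD5 of a file's bytes, the kind of checksum that used to be hardcoded."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def match_snapshot(outputs: list[Path], snapshot_file: Path, update: bool = False) -> bool:
    """Compare output checksums against a stored snapshot.

    On the first run (or when update=True) the checksums are recorded instead
    of checked, so nobody has to paste expected hashes into the test by hand.
    """
    current = {p.name: md5sum(p) for p in outputs}
    if update or not snapshot_file.exists():
        snapshot_file.write_text(json.dumps(current, indent=2, sort_keys=True))
        return True
    return json.loads(snapshot_file.read_text()) == current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp, "counts.tsv")
        snapshot = Path(tmp, "counts.snap.json")
        output.write_text("gene\tcount\nA\t10\n")
        print(match_snapshot([output], snapshot))  # first run records: True
        output.write_text("gene\tcount\nA\t11\n")
        print(match_snapshot([output], snapshot))  # output changed: False
        print(match_snapshot([output], snapshot, update=True))  # re-record: True
```

The design point matches what Maxime describes: when an output legitimately changes, you re-record the snapshot instead of editing checksums inside the test code.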

Shauna: Gotcha. Switching gears back to the RFCs for a moment — I’m curious about one of the most controversial or most-discussed community decisions you’ve had to make. Maybe that’s an RFC with the most comments, or maybe it’s not actually an RFC at all. It could be something that just required a lot of discussion. It doesn’t necessarily have to have been full of drama, but something that people really had to talk through.

Maxime: Yeah. I think if I look at something that might have been more controversial, it would probably be a pipeline proposal.

This one was accepted, but the discussion went on for quite a while — people debated whether there was overlap with other pipelines, and if so, what that overlap was. We asked questions like, “Can we contribute to the existing pipeline instead?” and, “What exactly is this new pipeline doing?” We also discussed how it should be organized.

In the end, we decided that while there were some similarities with existing pipelines, it was different enough to justify being its own pipeline. The discussion involved multiple community members, and it was a good example of collaborative decision-making in nf-core.

Also, we really like making subway maps for all of our pipelines.

Shauna: Oh, that’s so cute. So, what was the end result? It got created as a separate pipeline?

Maxime: Yes, the end result was that it was accepted as its own pipeline — you can even see the “accepted” label. I think we had just started to roll out a new system for approving things, and at that time, we were still working out the quorum requirements for approval. The bot wasn’t functioning properly on that particular pull request, so there were some technical issues to fix.

Shauna: Is that a custom bot you created?

Maxime: Yeah, we created a custom bot. That’s one of the additional repositories we maintain in nf-core. A lot of our repos are pipelines, but we also have others that support the pipelines, for infrastructure, automation, and documentation.

Shauna: Right, you also had a website repo and some other non-pipeline repos, right?

Maxime: Exactly. We have GitHub Actions that help us set up tests, specific scripts for infrastructure, and an infrastructure team that supports everything. We also recently created a documentation team dedicated to improving our written resources.

Shauna: Cool. What kinds of things does the bot do in general?

Maxime: The bot helps us open new PRs that bring updates to the skeleton template into the pipelines. The bot creates the PR, and when it’s ready I can have a look and fix anything that needs fixing.

Shauna: Can you scroll up to the original post — just to see what the bot created?

Maxime: Sure. Here you can see that the bot created all these commits. We’ve been using this pipeline as a testing repo, but another example — like the “differential-abundance” pipeline — might show the bot’s work more clearly. There’s an open pull request there, though some merge conflicts still need to be resolved.

Shauna: And the “template” — is that like a PR template, or an NF-Core template?

Maxime: It’s what we call the pipeline skeleton: the structure we use for every nf-core pipeline. We keep it updated with new Nextflow syntax and features. Basically, we’re usually about six months behind each Nextflow release. Once new features are stable, we incorporate them into nf-core.

Shauna: So when there’s a syntax update to the template that impacts all the pipelines, once the decision’s been made (and I don’t know whether that’s always an RFC or one of the other committees or teams you talked about earlier), you can roll it out to all of the pipelines through this bot, which opens a pull request with the syntax changes on every repo.

Maxime: Exactly. We try to automate as much as possible. Most of us come from bioinformatics, and I like to joke that we’re “lazy” — we automate because it’s efficient.

Shauna: In the U.S., there’s something called a “lazy Susan,” which is a rotating tray you put on a table for things like salt, pepper, or condiments. Instead of passing items around, you just spin it. I like to call them “efficient Susans.”

Maxime: [laughs] I like that! So, in this repository, we have a lot of Python code and a pipeline template that anyone can use to build their own pipelines, whether for nf-core or personal use.

Many people outside nf-core use this template because it’s built with useful features and designed to be highly modular. You can enable or disable features as needed.

[demonstrates CLI for template]

For example, when you create a new pipeline, you can specify whether it’s for nf-core or custom use, choose options like badges, licenses, changelogs, or tests, and configure everything interactively. The template is modular and user-friendly, which makes it great for newcomers.

Shauna: Yeah, this looks incredible. I also really appreciate that the very first page says, “If you want this pipeline to be an nf-core pipeline, please talk to us first.”

Maxime: Yes, that’s one of the main issues we face. Often people come to us with a fully completed pipeline. But since nf-core is a collaboration-first community, we prefer that people discuss their ideas early so it can be a true collaboration from the beginning. That’s why we emphasize that message everywhere: we want to encourage discussion before development starts.

Shauna: That makes perfect sense. I think that’s a near-universal best practice. Talking to the community and maintainers early helps avoid big reworks later — issues that could have been prevented by simply having a conversation upfront.

Maxime: Exactly. And one of the big benefits of making a pipeline proposal early is that you can find collaborators and contributors right away. That really improves the code quality in the long run.

Shauna: Yeah. All right, we’re getting close to time — I booked about 90 minutes for this, and I think there’s only so long anyone could keep watching. So before we wrap up, is there any area or topic you think we should cover that we haven’t yet?

Maxime: Yes — one last thing I wanted to mention is documentation. We have detailed documentation on community contributions, coding guidelines, tutorials, and component usage. We also collaborate frequently with the Nextflow community and with Seqera. There’s a great training program they’ve made that we highly recommend for newcomers to Nextflow. We also have documentation on graphic design — especially workflow diagrams and subway maps. Here’s an example — this animated subway map is one of the things I’m most proud of.

Shauna: Oh wow, that’s incredible. I love it.

Maxime: Yes! It’s made entirely in pure SVG, so it works flawlessly.

Shauna: Incredible. Sorry, I’m completely distracted by this animation. I feel like I could just watch it for hours.

Maxime: [laughs] Sorry!

Shauna: No, it’s great. I might come back later just to stare at it and meditate.

Maxime: So this shows how we’re organized, both as maintainers and as an overall community. I think we also have some statistics. Obviously, Phil loves statistics, so that’s why we have a lot of them, showing the NF-Core community in numbers: Slack, GitHub, Twitter, and so on. I think the pipeline numbers are interesting. We can see that we have a lot of pipelines that have been released or are in development. I think we’re almost reaching a plateau, maybe. I don’t know; I guess only time will tell.

Shauna: I had a couple more questions. It looks like you have a bunch of different ways for people to start contributing. Do you feel like there are some common new contributor pathways that are particularly successful, or common routes people tend to take?

Maxime: I would say usually the first way people communicate with us is either on GitHub — by creating an issue when something isn’t working — or by reaching out to us on Slack.

Oh, I didn’t show Slack yet. So, this is Slack. I’m on every possible Slack channel that we have in this workspace, so I see a lot of things going on. We have a dedicated Slack channel for every pipeline, and for some pipelines we have a dedicated development channel, so the people actively developing that pipeline can work together.

I also have some specific private groups because I like creating channels — and I’ve created many, many other ones. We even created a channel just to request reviews, because we’re a community-first organization. So, if you need someone to review your code for an NF-Core pipeline, you can ask people there. We have a channel where people can vent when something isn’t working and they just need to rant a bit, and another one where we thank people — where we highlight when someone has been really helpful or amazing.

Maxime: Yes, I think we’re quite a good community. I haven’t seen many issues — at least not any that reached us at the core member level. I know that we have a safety group, and they’ve responded to some situations, but from my understanding, nothing major that required us to intervene directly.

Shauna: It’s also true that what helps people be happy in a community is knowing that there are spaces for conflicts to get worked out and for issues to be handled. Often, communities that have spaces for people to be happy and celebrate each other also need to have spaces for handling issues. Any community of significant size — and NF-Core definitely qualifies — will have some kinds of conflicts or issues.

Shauna: A couple of other quick questions. You have maintainers for pipelines — I’m curious, do you ever have problems getting people to step into those maintainer or leadership roles? And if so, do you have any systems, processes, or cultural practices to help people move from being new contributors to taking on leadership roles?

Maxime: I don’t think we have major issues there. The main thing is that pipelines depend heavily on the people who work on them. Often, we have contributors who are very dedicated to a pipeline because they’re doing a PhD or a postdoc, or working on infrastructure where that pipeline is useful. If they find collaborators in that area, then multiple people can work on it together. For some pipelines, you can see contributors shift over time from one person to another. For example, if we look at the contributors list for the rnaseq pipeline, we can see Phil was very active early on, then Harshil joined and was active from around 2020 to 2022, then John joined, and later I joined as well. People also move from one pipeline to another over time. For me, the pipeline belongs to the community: it’s not owned by a single person but by a group of people. What we try to do is make sure people can work together within a good framework, so if someone stops maintaining a pipeline, others can step in and take over.

Shauna: Right. Do you have a process for that, or is it more on a case-by-case basis?

Maxime: It’s definitely a case-by-case basis. For example, once there was a proposal to create a new pipeline that would replace an outdated one. So, we decided to use that new proposal to take over the older pipeline.

Shauna: Got it. So, you wouldn’t rename the old pipeline but rather merge the changes and keep the original name?

Maxime: Yes, I think we decided to keep the original name — to stick with the same repository. For example, we kept the name “KIC” instead of changing it, because it was simpler and more consistent.

Shauna: In a case where a pipeline gets completely replaced, does that cause breaking changes for people using it?

Maxime: No. Every release ever made stays listed on NF-Core; we never delete anything. Everything that has worked continues to work unless something external, like Bioconda or BioContainers, breaks.

Shauna: Gotcha. And users usually pin to a specific version, right?

Maxime: Definitely. That’s what we recommend. For example, when I run a pipeline with nextflow run nf-core/pipeline-name -r followed by a release number, the -r flag pins it to that specific version. Nextflow then automatically pulls the correct version and all the right containers. In this example, it’s using Docker to pull all the images we need to run the pipeline. This particular run uses just a test profile with small data, so it finishes pretty fast.

Shauna: Amazing.

Maxime: That’s also what we use for our CI tests.

Shauna: One last question, and then I’ll really end this, even though I don’t want to; we did say 90 minutes. You have a fairly developed governance structure. I’m curious when that developed and whether it came about in response to any particular need or event.

Maxime: Basically, Phil created MultiQC, which is a Python tool that collects reports and metrics from various tools and generates a unified final report — a MultiQC report. It’s one of the most common tools in bioinformatics because every tool produces metrics, and everyone wants to see how their experiment performed. The idea is simple but super efficient — it’s what everyone wants. That project became huge, and Phil experienced firsthand how hard it was to handle such a big project alone. So, when NF-Core was created, he decided from the start that it should be a team effort, not a one-person project. At the beginning, NF-Core had Phil, some administrators, and other team members who joined gradually as needed. We’re still creating new teams — for example, we recently created a dedicated documentation team.

Shauna: When would you say that change — the formal team structure — really started happening?

Maxime: MultiQC itself started years ago, separate from NF-Core. But Phil’s experience managing it alone is what inspired him to structure NF-Core differently from the start — as a shared governance model.

Shauna: Gotcha. That makes sense. I think that’s relatively rare for project founders — to have that level of foresight about governance. The more common model is someone starting alone, then realizing they’re burning out, and only afterward trying to create a governance structure.

Maxime: Yes, exactly. One of my colleagues and I have been working on some visualizations of contributor growth over time. Let me share my screen. Here, you can see that there were only a few of us at the beginning, and now there are many more. Some people join for shorter periods, others stay long-term. This shows the growth of the core team over time.

Shauna: Cool. It seems like you started with a solid group and then had a few successive waves of contributors. Overall, you’ve grown, even though individual people have moved on.

Maxime: Yes. Some people moved on because they joined other companies or changed jobs. For example, Kevin switched fields completely and isn’t working in bioinformatics anymore, so he’s now listed among the alumni. UNS felt the project was too big for him, so he didn’t continue. Andreas was one of the founders with Phil and Alex, but unfortunately he couldn’t continue, so he left pretty early. Alex also left when his new position made it too difficult to stay involved.

Shauna: Those are all pretty normal reasons. I think it’s great that you’ve set up a system and community where anyone can step back when their life or career changes, rather than feeling stuck maintaining something alone. So many projects rely too much on one person, and that person ends up staying even when it’s no longer right for them. I really appreciate your structure.

Maxime: Usually, what we try to do is list, for each team, who the leads are, who the members are, and who used to be involved. The maintainers team is one of the biggest. Since everything we do is open source and no one gets paid directly (we don’t hold any money ourselves), that structure is important. We did receive some funding years ago from the ELIXIR-Transparg Initiative, but that’s no longer available.

Shauna: So most people who contribute are funded through their jobs — graduate students, professors, or company employees?

Maxime: Yeah, basically. For example, I have contributors working with me on some pipelines because it’s easier for them to collaborate with us on a shared project than to develop their own internally. It’s more, “Let’s contribute to an open-source project together,” rather than doing it in isolation.

Shauna: So in some sense, most contributors are paid — in that it’s part of their job — even though they’re not paid by you directly. It’s a collaborative effort.

Maxime: Exactly — a collaborative process. We get some support from private companies, especially for our infrastructure. I don’t remember who pays for Slack, but someone does. AWS helps a lot — they fund our CI runners and provide infrastructure for bigger tests. Seqera also contributes people and resources from time to time. We’re starting to see a few freelancers or paid contributors supported by companies to work on specific projects. I think the field might evolve further in that direction.

Shauna: Do you ever have issues where contributors’ paid work priorities conflict with community needs? For instance, I’ve seen projects where people’s employers push them to focus on things directly tied to company success, while community management, governance, or documentation — the “soft” but essential work — can get overlooked.

Maxime: Yes, definitely. That’s a real issue. We lack people for outreach and documentation because it’s hard to explain to an employer why that kind of work is important. It’s difficult to justify to your manager that your open-source contribution benefits many people but doesn’t directly profit your company. Still, most of us at my level are passionate people. It’s 8:30 p.m. for me right now — so you can see that this kind of work often happens outside regular hours. But I’m used to that since we collaborate globally — with colleagues in Canada, New Zealand, and elsewhere — so finding time to meet is always a challenge.

Shauna: Yeah, well, I really appreciate you taking the time to talk to me for over an hour and a half about your project! I can see from the documentation, the structure, the bot, all of these thoughtful ways that you’ve really been taking a sort of global, collaborative perspective. It sounds like that was something that was part of the culture from the beginning, with the founder being like, “Let’s not make this all about one person’s needs or put it all on one person’s shoulders,” but instead make it a group effort. Yeah, this was really interesting. I’ve never done one of these before, so I don’t entirely know how to wrap this up. Is there anything else you wanted to add or say?

Maxime: No, I think... yes, I should have presented more of what we’re doing, but we don’t have enough time. What’s also fun, on the outreach side, are the bytesize talks that we’re doing. On the Nextflow side, I know that the Seqera community team is doing some podcasts that are quite interesting. So yes, we collaborate a lot with each other and also with other communities. For me, what’s key in all of this is being as open and transparent as possible and contributing together. All of that goes toward the goal of being reproducible. For me, those three concepts are what drive NF-Core and what drive my work every day.

Shauna: Excellent. Well, if you have anything else you want to add, or any links, whether it’s your podcast stuff or anything else: at some point, hopefully soon, I’ll make a YouTube channel and probably a Fediverse video channel too. Is it PeerTube? I think it’s PeerTube.

Maxime: PeerTube.

Shauna: Right. I’ll make a channel and post this video at some point, but I haven’t actually made the channel yet, so I can’t post it immediately. But in any case, if you have any links beyond your website—maybe a new contributor area of the docs or ways for people to learn more, or just things you want to highlight—I’m happy to include them in the description as a bunch of links.

Maxime: Yeah, definitely. I think I’ll check back with my colleagues on the core team, let them know what we did and how we did it, and see if they want to add anything, because I probably forgot tons of stuff.

Shauna: I also gave you absolutely no structure to this. Hopefully once I do a couple more of these, I’ll get a sense for what’s helpful to talk about. Then I can be like, “Hey, prepare to talk about these things.” But for anyone viewing this, I gave you absolutely no guidance because I didn’t know what we were going to talk about—it’s the first one.

Maxime: No, but it was super fun, and I think it was good to have such an unguided structure. I had no slides prepared; I just used the website. Oh, one last thing that I really want to show—let me see if I have that somewhere. It’s sad that you cannot see my computer at the moment, because the thing I want to show is—

Shauna: You can share your screen again if you want.

Maxime: No, it’s not my—yeah, I’ll share my screen, but it’s not my screen that I wanted to share.

Shauna: Sorry, right.

Maxime: Okay, this is my computer.

Shauna: Oh, it’s not showing it—oh!

Maxime: Okay, sorry.

Shauna: Oh, there it is! Sorry, I’m just too impatient. Oh, I love that.

Maxime: This is my computer. We make stickers—quite a lot.

Shauna: I was going to ask about that because you have a drop-down that’s like, maybe in the “About” section—it says “Governance” and then “Stickers.” And I was like, “Stickers? Why are stickers at the same level as governance?”

Maxime: Yes! Stickers—because we are highly into stickers. So this is my pipeline; I designed the sticker and printed it myself. The golden stickers are the ones we make for specific events.

We had one for a quiz. When we meet each other at hackathons, we put sock stickers around, and the people who find the most socks get a sticker for that. We even made a tiny video game. These are for the teams I’m in: Outreach, Maintainers, Infrastructure. These are all from events that have happened in the community.

I’ve been a mentor, and I’m a Nextflow Ambassador as well. We print other specific stickers too, like MergeCo, Documentation Robot, and more. We try to keep things fun, and at every event you go to, you’ll get stickers.

Shauna: I love this so much. That’s so cute.

Maxime: And I think my colleagues will be proud that I didn’t forget about the stickers.

Shauna: Excellent. All right, I’m going to go ahead and stop the recording. Goodbye to anyone who’s watching this—see you next time.

The Relational Tech Blog


Relational technology is built by and for people in relationship with each other.