There are many ways for newcomers to participate in a project, from testing onboarding documentation to reviewing pull requests to showing up to community events. But it can be surprisingly difficult to find ways for newcomers to contribute code.
This can lead to frustration, as newcomers are often eager to make code contributions specifically. It's worth unpacking this dynamic, but also...it's nice to help people do what they want to do.
So: how can we make it easier for newcomers to contribute code? There's a long-standing tradition among open source maintainers of labeling certain issues in their issue tracker 'good first task' or similar. But what makes for a good first task?
You can use the acronym FRAME to help you frame good tasks for newcomers. This stands for:

- Feature-based
- Repeatable
- Abstracted
- Modular
- Explainable
The most important element of a good first task is that it doesn't require a lot of project-specific knowledge. Newcomers by definition don't have project-specific knowledge. So a task that requires interacting with all the weird internals of your project is not going to be a good first task.
Many projects will have some part of their codebase that is abstracted away from that internal complexity. Often this is done intentionally through modular design.
For example, I maintain a project called Parsons, which is a Python package that helps progressive data professionals build pipelines. The package contains over forty "connectors" to various third-party platforms. To build a new connector, contributors only have to deal with a few Parsons-specific abstractions. They have to collect authentication credentials in a specific way, they have to make requests through our client, and they have to return a specific class of object called a Parsons Table.
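For a sense of what that looks like in practice, here's a minimal sketch of a connector. The class name, endpoint, and environment variable are invented for illustration, and real connectors go through Parsons' own client and authentication conventions rather than bare requests calls; the point is just the three steps: collect credentials, make requests, return a Parsons Table.

```python
import os

import requests

from parsons import Table  # Parsons' standard tabular return type


class ExampleConnector:
    """Illustrative connector sketch, not a real Parsons connector.

    The class, endpoint, and environment variable names below are
    hypothetical; actual connectors follow Parsons' own client and
    authentication conventions.
    """

    def __init__(self, api_key=None):
        # 1. Collect authentication credentials in the expected way:
        #    accept them as an argument, falling back to an env variable.
        self.api_key = api_key or os.environ["EXAMPLE_API_KEY"]

    def get_people(self):
        # 2. Make requests to the third-party API (simplified here to a
        #    bare `requests` call rather than Parsons' own client).
        resp = requests.get(
            "https://api.example.com/v1/people",
            headers={"Authorization": f"Bearer {self.api_key}"},
        )
        resp.raise_for_status()
        # 3. Return the results as a Parsons Table (built here from a
        #    list of dicts).
        return Table(resp.json()["people"])
```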
This is not trivial to learn, but it is only a sliver of the total complexity of Parsons, and crucially it is predictable what people will need to learn. Which brings us to the next attribute of a good first task...
Because we know ahead of time what parts of Parsons contributors need to understand to add a new connector, we can proactively explain it to them. We can say, "hey you need to follow this authentication protocol, here's exactly how to do so".
In fact, I wrote up a 20+ page guide to adding new connectors. In addition to explaining the relevant parts of Parsons, it also explains relevant Python concepts and the basics of using git. Many new contributors to Parsons are also new to Python, open source, and/or software development. They often need help beyond my written guide, but the guide offers them a great starting point, and allows them to communicate exactly where they're stuck.
You don't need to explain everything - there are places in my guide where I could've linked to existing explanations of things like unit testing. But you should be able to explain the steps of the contribution in a reasonable amount of detail. If you can't, it's a sign the task may be too confusing for newcomers.
If you're going to spend all this time writing up a guide for the task, it can't be a one-off task. You want something that can be used by many people over time. This is where modularity comes in again - if you've got a modular design, people can extend the project repeatedly by following the same steps.
The first version of our new connector guide was published in August 2020. People are still using the guide as of November 2024. During that time, people have added nearly twenty new connectors—and for several people, it was their first-ever open source contribution.
Bugs are, by definition, edge cases. They're not repeatable - or at least, you hope they don't repeat.
Bugs have a way of escaping our neat abstractions. You might start looking at a bug thinking the problem is in one place in your codebase, and then it turns out the issue is in another (or in the code of a dependency, or some weird interaction of the two).
For this reason, bugfixes tend to make for bad first tasks. People can have good experiences with them; it's just risky. Sure, a maintainer can go through and double-check that the bug is exactly what they think it is, but that's a lot of work for the maintainer to do for a single, non-repeatable, one-off task.
Feature additions can also be bad first tasks, if they're not abstracted, explainable, and repeatable. But they're more likely to be good tasks than bugs are.
As I mentioned several times above, Parsons is a Python package that helps progressive data professionals build ETL/ELT (extract-transform-load / extract-load-transform) pipelines. It's built around the package PETL and provides some basic data classes and utilities to standardize and transform data while moving it.
Parsons has a modular design, with most of the project's codebase existing in one of 40+ "connectors" - data classes corresponding to specific third-party platforms like NGPVan, Twilio, GoogleSheets, etc. Users frequently request that new connectors be added, so it is an evergreen, repeated need.
More than half of Parsons' contributors are new to open source, and many are new to programming generally. So we wrote up a new connector guide which covers not only Parsons-specific information but also an introduction to basic concepts like unit testing.
The first version of our new connector guide was published in August 2020. People are still using the guide as of November 2024. During that time, people have added nearly twenty new connectors—and for several people, it was their first-ever open source contribution.
PlasmaPy is a Python package for plasma physicists and physics students.
One element of their package is the Formulary. It's named after the Naval Research Laboratory's Plasma Formulary, which is "an eclectic compilation of mathematical and scientific formulas, and contains physical parameters pertinent to a variety of plasma regimes, ranging from laboratory devices to astrophysical objects". The NRL's Formulary has been compiled over nearly fifty years and is widely used in plasma physics; PlasmaPy's Formulary encodes those formulas into easily usable utilities.
While implementing a new formula from the formulary in PlasmaPy takes significant domain knowledge, it does not require a ton of intricate knowledge about PlasmaPy. Some knowledge is of course needed, such as understanding how PlasmaPy handles names and units. People adding to the formulary also need to use a decorator called validate_quantities to, well, validate quantities.
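As a rough illustration, a formulary-style function is mostly physics plus unit annotations. The function below is a simplified stand-in, not PlasmaPy's actual implementation, and the import path for validate_quantities is an assumption on my part; PlasmaPy's real formulary functions handle many more cases.

```python
import astropy.units as u
import numpy as np
from astropy import constants as const

# Assumed import path for the decorator; check PlasmaPy's docs for the
# current location.
from plasmapy.utils.decorators import validate_quantities


@validate_quantities
def electron_plasma_frequency(n_e: u.m**-3) -> u.rad / u.s:
    """Angular electron plasma frequency: omega_pe = sqrt(n_e e^2 / (eps_0 m_e)).

    Simplified stand-in for illustration, not PlasmaPy's real function.
    """
    omega = np.sqrt(n_e * const.e.si**2 / (const.eps0 * const.m_e))
    return omega * u.rad
```

The decorator takes care of checking that callers pass a density with compatible units, which is exactly the kind of narrow, predictable constraint a newcomer can pick up quickly.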
The knowledge of PlasmaPy needed is limited, explainable, and predictable.
Unfortunately, the task of adding to the formulary is not infinitely repeatable. There are only so many formulas to be added, and the easier ones have generally already been added, leaving mostly esoteric ones that only a small number of people have the domain knowledge to implement.
If you know of a project with feature-based, repeatable, abstracted, modular, and explainable first tasks, please let me know; I'd love to add them to this list of examples.