Dialects in Code: Part 1

August 26, 2020 9 minute read

For a long time, I’ve been interested in how different folks can use the same programming language in radically different ways. I’ve privately used the term “dialects” to describe these different approaches. In this series, I want to talk about recognizing dialects in code and how we can make them work for us.

Before we get into concrete benefits though, let’s focus on dialects, where they come from and how they find their way into our code.

The Big Bad Theory

Telepathy doesn’t exist.

To communicate our thoughts, we need to encode them into language. The process of encoding thoughts into a language produces an artifact, like written text, spoken words, sign language, subtle facial expressions or even a combination of these.

This artifact is always an imperfect copy of the author’s original thoughts. You can say “X is like Y” or “I am sad today” but this is always a simplification that doesn’t have the exact connotations, knowledge or context that remain in your brain. Those thoughts remain hidden to us. Sharing the full state of your brain would take too long, be too complex and might not be possible since the other person still has their own thoughts. Like saving a JPEG, encoding a thought into a language is always a lossy process. The artifact that comes out of the encoding process is different than the thought that went into it.

The advantage of artifacts is that they can be shared. By simplifying our thoughts, we reduce the information we’re transmitting to a manageable amount. Sometimes the thought even becomes clearer because we’ve simplified it. Artifacts can also be communicated or stored for further use. That’s a huge advantage.

On the other side, the person consuming the artifact will understand it differently depending on their own thoughts, experiences and knowledge. I could say something like “Monads are monoids in the category of endofunctors.” This might be a concise, correct statement but how you perceive it depends on your knowledge of functional programming, whether you recognize it as a meme/joke, or just how you feel about me in general.

Language has rules but they’re very fuzzy rules on both the creating and receiving side.

Now in Code

The same is true for programming. When we begin writing code, we have an idea we want to express, like a rough sketch of an API, or a fix for a bug, or a full-fledged model. If we want to automate the execution of that idea, we need to express it in a programming language.

Programming languages tend to offer less expressiveness than natural languages. That means it takes extra effort to encode information in them and it takes a lot of contextual knowledge to understand the authorial intent. On the flip side, programming languages are significantly less ambiguous than natural languages. If expressing an idea in English is like saving a photo as a JPEG, expressing an idea in code is like saving a photo as a GIF with max 256 colors: you lose detail but gain conciseness.

That tradeoff allows a computer to execute our idea with incredible speed and accuracy, which is a heck of a thing.

Information Through Convention

The loss of information about business requirements or programmer intent still presents a major challenge for maintenance and future development. We can work around this by supplementing the code with discussion and documentation, and teams frequently do. However, the code itself remains one of our most vital and useful artifacts. Even very human-centric methodologies like Domain-Driven Design acknowledge code as the primary artifact your team uses.

So to help preserve our mental well-being, programmers created practices, tricks, and conventions to help impart more meaning into code.

For example, long-term PHP’ers may remember that classes didn’t always have visibility modifiers for properties or methods. This was frustrating when writing a class because things could change or be called in unexpected ways and it was frustrating when using a class because it wasn’t always clear which things you should or shouldn’t touch.

// If a user changes the name property directly here, the changed name may get saved
// in the database but it will fail to change the lastUpdated time. Oops!
class User
{
    var $name;
    var $lastUpdated;

    // Sets the name and records when the change happened
    function name($name)
    {
        $this->name = $name;
        $this->lastUpdated = time();
    }
}

To work around this, PHP developers borrowed an existing convention that private or protected class features should be prefixed with an underscore. By adding and understanding this convention, we can convey extra information that prevents bugs:

// Here the _ tells the user “don’t change this property directly” so
// hopefully they’ll scroll farther down and find the name() method
class User
{
    var $_name;
    var $_lastUpdated;

    // The intended way to change the name: it also records the time
    function name($name)
    {
        $this->_name = $name;
        $this->_lastUpdated = time();
    }
}

It may seem crude today but for developers at the time, this convention added a lot of safety and intent to the code. You could rely more readily on autocomplete without having to read the entire class and that was a big productivity gain.

Still, it wasn’t perfect: you couldn’t differentiate between protected and private. Some folks tried a double underscore, others did it the other way around and most didn’t differentiate between the two at all. Eventually, this was formalized into the PHP language itself as public/protected/private visibility modifiers, which are now the most widely used way of expressing this.
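To make the contrast concrete, here is a rough sketch of how the same User class might look with visibility modifiers. It assumes PHP 7.4 or later (for typed properties), and the rename() method and the constructor are names I’ve made up for illustration rather than anything from the original class.

```php
<?php
declare(strict_types=1);

// Hypothetical modern rewrite of the User example (assumes PHP 7.4+).
class User
{
    // private: only code inside User may read or write these directly.
    private string $name;
    private int $lastUpdated;

    public function __construct(string $name)
    {
        $this->name = $name;
        $this->lastUpdated = time();
    }

    // The one supported way to change the name, so lastUpdated stays in sync.
    public function rename(string $name): void
    {
        $this->name = $name;
        $this->lastUpdated = time();
    }

    public function name(): string
    {
        return $this->name;
    }

    public function lastUpdated(): int
    {
        return $this->lastUpdated;
    }
}

$user = new User('Ross');
$user->rename('Frank');
echo $user->name(), PHP_EOL;

// Writing to the property from outside is now rejected by the language
// itself instead of relying on readers noticing an underscore.
$blocked = false;
try {
    $user->name = 'Shauna';
} catch (Error $e) {
    $blocked = true;
}
var_dump($blocked);
```

The underscore only asked callers to behave. Here the engine enforces the rule, which is the kind of meaning a convention can gain once it moves into the language.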

Most practices will never end up as part of the core language itself. They’ll remain userland conventions like a variable naming scheme, a specific directory structure, a preference for one design pattern over another, etc. And that’s good: userland conventions are typically cheaper to produce and iterate on. Even features that make their way into the language itself (like our visibility modifiers) may change or even be removed. Languages are always in motion, evolving towards the future.

Enter the Dialect

Over time, these practices group, combine, split and merge into semi-stable clusters which are used and recognized by a group of people. Keeping with our “Programming languages as a language” metaphor, I refer to these groups as “dialects” of the programming language.

In human language, a dialect is a version of a language (say, English) with its own grammar, vocabulary, idioms and other features. For example, I’m originally from the Southern part of the US where we say “y’all” and “over yonder”, which are characteristic of our region.

Dialects within the same base language are usually still intelligible to each other. I can communicate with folks from the US Midwest who have their own dialect. I may have more trouble with someone from the UK who speaks Cockney but given time, we can probably work it out.

All of this is true for programming communities as well. Different communities will have unique practices, unique ways of combining them and occasionally different difficulty levels in understanding each other, even if they’re using the same programming language.

Because of its wide variety of uses, PHP has a host of dialects and, like human languages, most are associated with and propagated by a community of some kind. The boundaries between dialects can be blurry, as they’re essentially artificial, but they can still be useful to talk about. We can identify them by a particular framework (Laravel vs Symfony dialects) or a product (Drupal vs WordPress dialects). They can be generational (PHP devs from before 2010), geographic (PHP’ers from The London School of TDD), or even comparative (“Java-style” PHP). A dialect can be as small as one person (a personal style) or as large as several million.

It’s important to realize that just as with natural languages, there is no “true” dialect of a programming language. Even a standardized dialect is still just a dialect. Therefore, when we talk about encoding our thoughts into an artifact, we’re doing this by applying not just a language but a specific dialect of that language and that will impact the resulting artifact, i.e. code.

How do dialects impact my code?

Even something as simple as naming a class is impacted by our choice of dialect. Let’s say you had a class that receives an event and dispatches it to 0 or more listeners. Would you go with:

class EventDispatcher
class DispatchesEvents

EventDispatcher is probably the most common choice, as the bulk of dialects I’ve seen name classes after nouns. That said, there are communities in PHP that prefer to name classes after the responsibilities that class has, using a form that focuses on verbs. It’s a very distinctive choice and like my use of “y’all”, it’s often a good clue as to the origin of the code I’m looking at.

As we said before, both dialects are equally valid, there is no right or wrong choice here! However, confusion arises when consumers of the artifact are unaware of the implicit rules of the dialect used or, worse, apply conventions from the wrong dialect.

For example, in some communities, the verb-first form is never used for classes but is used for traits. If I’m looking at a file tree of a library and assume that DispatchesEvents.php is a trait, that can confuse my understanding of the overall library.
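As a small sketch of why the filename alone can mislead, here are both spellings side by side: a noun-style class, and a verb-style name used for a trait. The listen() and dispatch() methods and the Checkout class are hypothetical, invented only for this illustration.

```php
<?php
declare(strict_types=1);

// Noun-style name: a standalone class you instantiate and pass around.
final class EventDispatcher
{
    /** @var callable[] */
    private array $listeners = [];

    public function listen(callable $listener): void
    {
        $this->listeners[] = $listener;
    }

    // Hands the event to zero or more listeners.
    public function dispatch(string $event): void
    {
        foreach ($this->listeners as $listener) {
            $listener($event);
        }
    }
}

// Verb-style name, here used the way some communities reserve it:
// a trait that mixes the behaviour into another class.
trait DispatchesEvents
{
    /** @var callable[] */
    private array $listeners = [];

    public function listen(callable $listener): void
    {
        $this->listeners[] = $listener;
    }

    protected function dispatch(string $event): void
    {
        foreach ($this->listeners as $listener) {
            $listener($event);
        }
    }
}

// Hypothetical class that gains dispatching through the trait.
final class Checkout
{
    use DispatchesEvents;

    public function complete(): void
    {
        $this->dispatch('checkout.completed');
    }
}

$seen = [];
$record = function (string $event) use (&$seen): void {
    $seen[] = $event;
};

$dispatcher = new EventDispatcher();
$dispatcher->listen($record);
$dispatcher->dispatch('user.registered');

$checkout = new Checkout();
$checkout->listen($record);
$checkout->complete();

print_r($seen);
```

Both versions do the same job, but they are consumed differently: one is constructed with new, the other is pulled in with use. A reader who guesses wrong from the name starts with the wrong mental model of the library.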

So, should we value mutual intelligibility of dialects above all else and always use the most common convention? There’s certainly value in that but allowing changes like this is a key element in innovation. It’s a balancing act and heavily dependent on the intended audience for our code. Is this only used in a specific team? Will it be a widely consumed library? Is it mainly consumed as a binary without ever looking at the code?

Let’s look at another example. You need to iterate over a list of numbers and perform some operation to each of them. How do you prefer to iterate over them?

for ($i = 0; $i < count($list); $i++)
foreach ($list as $number)
map($list, /*...*/)

While we may each have a personal preference, the choice here will depend on the dialect you’re writing in. In a C-ish dialect, the for loop is the most idiomatic choice, though it has mostly died out in modern PHP. foreach is by far the most common construct, while map is the most concise but also requires some FP knowledge.
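Fleshed out, the three options might look something like the sketch below, where each one doubles the numbers in a list. Two assumptions on my part: the doubling operation is just a stand-in, and I’ve used PHP’s built-in array_map (which takes the callback first) in place of the generic map helper, with a PHP 7.4+ arrow function.

```php
<?php
declare(strict_types=1);

$list = [1, 2, 3];

// C-ish dialect: we manage the index and the bounds ourselves.
$doubledFor = [];
for ($i = 0; $i < count($list); $i++) {
    $doubledFor[] = $list[$i] * 2;
}

// Mainstream PHP dialect: the language walks the list for us.
$doubledForeach = [];
foreach ($list as $number) {
    $doubledForeach[] = $number * 2;
}

// FP-leaning dialect: describe the transformation, get a new list back.
$doubledMap = array_map(fn (int $number): int => $number * 2, $list);

// All three produce the same result.
var_dump($doubledFor === $doubledForeach && $doubledForeach === $doubledMap); // bool(true)
```

The results are identical. What changes is how much bookkeeping the reader has to follow, and how much vocabulary they need to already know.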

You can also see how some dialects trade general applicability for increased precision of language:

for repeats any code any number of times based on an arbitrary condition
foreach repeats any code once for each item in a list
map applies any function once to each element of a list and returns a new list

By using more and more specific jargon in our dialect, we can create shorter, easier-to-read code, at the expense of a higher barrier to entry.

Summary

When we write code, we filter our thoughts through a dialect, a collection of practices that help shape the code we produce.

We’ve focused on theory but in the next few articles, we’ll look at concrete benefits of using dialects, such as:

Understanding when practices work together and when they don’t
Easier communication and onboarding
Using dialects to indicate levels of quality
Improving expressiveness in code

Many thanks to Shauna Gordon and Frank de Jonge for reviewing this post.

Ross Tuck