War is Peace and Bugs are Features

In the book “1984”, George Orwell highlighted the power of words to shape thought and behavior. The Ministry of Truth controlled news, media, and historical records. Newspeak was the official language and was used to control people’s thoughts and reality by controlling words. Confusion was deliberately created through slogans like “War is Peace”. Orwell showed us that words and vocabulary matter. When language becomes inconsistent or unstable, chaos and confusion follow.

In the software world, we can drift into similarly confusing situations. It happens not by intent but through neglect. If we’re not careful, we end up with phrases like “it’s not a bug, it’s a feature” that obfuscate reality.

An antidote to Orwellian Newspeak can be found in the concept of a “Ubiquitous Language” from Domain-Driven Design (DDD). A ubiquitous language establishes a shared vocabulary for entities, behaviors, and relationships within the domains of a system. It is difficult to get this right, but an imperfect yet consistent vocabulary is often more valuable than a more correct but changing one. Newspeak seeks control through ambiguity; a Ubiquitous Language seeks precision through shared truth.

If a project starts out on the right foot and has agreeable people, it is possible to define a strong Ubiquitous Language. Much guidance exists on achieving this. However, as systems grow and as people come and go, the language inevitably degrades or evolves. This happens even within individual domains. The problem is compounded when systems are created without an initial effort to normalize the vocabulary. People naturally cling to the terms they first learned, and over time the code becomes littered with multiple names for the same concepts. At best, the code base settles into two or three competing vocabularies. Once this muddled state has been established, it can be difficult to unwind.

Three primary strategies can be used to deal with inconsistent vocabularies. The first is to enforce consistency through gating mechanisms. The second is to allow divergence, but control it with explicit mappings between the different vocabularies. The third is to let the vocabulary evolve organically and hope it stabilizes on its own. The best approach depends on the context.

Canonical Language Enforcement

Vocabulary can sometimes be best managed through automation and templates. If a Ubiquitous Language has been agreed upon but the code does not conform, then API templates and code style enforcement can be used to sanitize the code base and ensure ongoing adherence.

API Templates: A vocabulary can be embedded in required templates used across a domain or system. Common mechanisms include RAML or OpenAPI definitions (typically written in YAML) for REST services, Protocol Buffers for gRPC, WSDLs for SOAP services, shared libraries, and C header files for C code. These approaches are well understood, but achieving consistent adoption is difficult.
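As a minimal sketch of the shared-library flavor of this idea, the canonical terms can live in one module that every service imports. The names `OrderStatus` and `parse_status`, and the terms themselves, are hypothetical and chosen only for illustration.

```python
from enum import Enum


class OrderStatus(Enum):
    """Canonical order states (hypothetical terms for illustration)."""

    PLACED = "placed"
    SHIPPED = "shipped"
    DELIVERED = "delivered"


def parse_status(raw: str) -> OrderStatus:
    """Accept only canonical terms; reject anything outside the vocabulary."""
    try:
        return OrderStatus(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(status.value for status in OrderStatus)
        raise ValueError(
            f"unknown status {raw!r}; expected one of: {allowed}"
        ) from None


print(parse_status(" Shipped "))  # OrderStatus.SHIPPED
```

A synonym such as "dispatched" fails loudly at the boundary instead of quietly entering the code base as a second name for the same state.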

Source of Truth APIs: Metadata lookup APIs owned by domain services can serve as authoritative references for vocabulary and relationships. These APIs are especially helpful when vocabularies evolve over time. For example, a list of supported geographic regions may expand or shrink as an organization changes. Using shared templates requires updates and rollout cycles. APIs can be queried dynamically, allowing client services to adapt automatically, without going through an update.
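The difference from a shared template can be sketched as follows. Here `RegionVocabulary` is a hypothetical client-side helper, and `fetch_regions` stands in for a call to the owning service's metadata API; a plain in-memory list plays the role of that service so the example runs without a network.

```python
from typing import Callable, Iterable


class RegionVocabulary:
    """Client-side view of a domain-owned list of supported regions."""

    def __init__(self, fetch_regions: Callable[[], Iterable[str]]) -> None:
        # fetch_regions is an assumption: a stand-in for the metadata API call.
        self._fetch_regions = fetch_regions
        self._regions = frozenset()

    def refresh(self) -> None:
        """Pull the current list from the source of truth."""
        self._regions = frozenset(self._fetch_regions())

    def is_supported(self, region: str) -> bool:
        return region in self._regions


# Stand-in for the domain service; in practice this would be a network call.
published = ["emea", "apac"]
vocab = RegionVocabulary(lambda: list(published))
vocab.refresh()
print(vocab.is_supported("latam"))  # False

published.append("latam")  # the owning domain adds a region
vocab.refresh()            # the client picks it up without a new release
print(vocab.is_supported("latam"))  # True
```

The trade-off is a runtime dependency on the owning service, so a real client would likely cache the list and decide how to behave when a refresh fails.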

Code Validators: Code validators, such as linters and style checkers, enforce consistency with templates and the agreed vocabulary. While these tools can feel burdensome during rapid development, they help maintain code consistency in the long term. For code bases that are already inconsistent, these tools can surface discrepancies to be resolved in a refactoring effort.
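A vocabulary check of this kind can be quite small. The sketch below assumes a hypothetical glossary, `DISCOURAGED`, that maps unwanted synonyms to canonical terms, and it scans raw text rather than parsing the language, so it will also flag words inside comments and strings.

```python
import re

# Hypothetical glossary: discouraged synonym -> canonical term.
DISCOURAGED = {"client": "customer", "purchaser": "customer", "cart": "basket"}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits snake_case and camelCase identifiers into their component words.
WORD = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")


def find_violations(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, identifier, canonical term) per discouraged word."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for identifier in IDENTIFIER.findall(line):
            for word in WORD.findall(identifier):
                canonical = DISCOURAGED.get(word.lower())
                if canonical:
                    violations.append((lineno, identifier, canonical))
    return violations


sample = "def load_client(clientId):\n    return basket_for(clientId)\n"
for lineno, identifier, canonical in find_violations(sample):
    print(f"line {lineno}: {identifier!r} should use {canonical!r}")
```

Run as a gating step, a check like this blocks new synonyms; run once over an existing code base, it produces the list of discrepancies to work through during refactoring.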

Data Warehouses: Data warehouses are often overlooked in enforcement strategies. Strong mechanisms are rarely used to keep code and data warehouse vocabularies aligned. While it can be argued that data warehouses do not need to conform to application vocabularies, communication and reporting problems can occur without some alignment. When warehouses are populated from event streams, consistency can be facilitated through message schemas. When data warehouses are populated directly from application database change events, consistency is harder to maintain. Application databases may take performance shortcuts that break consistency with a vocabulary. Additionally, because warehouse data invariably crosses domains, the warehouse will likely have to accommodate several vocabularies even if each domain is internally consistent.
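For the event stream case, a message schema can act as the gate. The following is a simplified sketch: `ORDER_PLACED_SCHEMA` and its field names are hypothetical, and a real pipeline would more likely rely on a schema registry or a serialization format with built-in schemas than on a hand-written check.

```python
# Hypothetical schema for an "order placed" event, shared by the producing
# service and the warehouse loader.
ORDER_PLACED_SCHEMA = {"order_id": str, "customer_id": str, "total_cents": int}


def schema_errors(event: dict, schema: dict) -> list[str]:
    """List the ways an event departs from the agreed field names and types."""
    errors = []
    for field, expected in schema.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    # Sorted so the report is stable when several unknown fields appear.
    for field in sorted(event.keys() - schema.keys()):
        errors.append(f"unknown field: {field}")
    return errors


event = {"order_id": "o-1", "client_id": "c-9", "total_cents": 1250}
print(schema_errors(event, ORDER_PLACED_SCHEMA))
# ['missing field: customer_id', 'unknown field: client_id']
```

The producer's use of "client" instead of "customer" is caught before it becomes a warehouse column name.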

Pros:

  • Provides the highest degree of consistency
  • Reduces cognitive load for new team members and other teams
  • Encourages stronger tooling and automation
  • Simplifies regulatory and oversight requirements
  • Encourages strong domain modeling and architectural discipline

Cons:

  • Requires strong and sustained architectural leadership
  • Adds overhead and can slow the development process
  • Vocabulary changes can become bottlenecks
  • Can increase friction for experimentation
  • Risks premature standardization
  • Introduces additional dependencies on tooling and process

Semantic Mapping

DDD explicitly allows different vocabularies across domain contexts. This is reasonable and can be useful because not every domain has the same considerations. However, at domain boundaries, the vocabularies need to be mapped to each other. These mappings effectively form a thesaurus implemented through transformations. A similar mechanism can be used within a single domain that does not conform to a single language.

In practice, it is rare to find a non-trivial code base that does not have mapping logic. Web and database interfaces almost always require translation of data types and vocabulary. Data Abstraction Layers (DALs) exist largely for this purpose. Integration with independent third-party libraries inevitably introduces foreign languages as well. In these situations, which cannot be controlled, explicit mappings provide the translation.
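A boundary translation of this kind might look like the sketch below. The two domains, the `Customer` and `Account` types, and the `BILLING_TO_SALES` table are all hypothetical; the point is only that the thesaurus is written down in one explicit place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Term used inside the (hypothetical) sales domain."""

    customer_id: str
    full_name: str


@dataclass(frozen=True)
class Account:
    """Term the (hypothetical) billing domain uses for the same concept."""

    account_ref: str
    holder: str


# Field-level thesaurus: billing term -> sales term.
BILLING_TO_SALES = {"account_ref": "customer_id", "holder": "full_name"}


def to_customer(account: Account) -> Customer:
    """Translate a billing Account into the sales vocabulary at the boundary."""
    translated = {
        sales_field: getattr(account, billing_field)
        for billing_field, sales_field in BILLING_TO_SALES.items()
    }
    return Customer(**translated)


print(to_customer(Account(account_ref="A-17", holder="Winston Smith")))
# Customer(customer_id='A-17', full_name='Winston Smith')
```

Code inside the sales domain then only ever sees `Customer`, and the billing terms stay confined to the translation function.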

Pros:

  • Allows multiple vocabularies to coexist
  • Mitigates the unavoidable differences within and across domains
  • More resilient to organizational and third-party changes
  • Encourages explicit boundaries and contracts
  • Supports coexistence with legacy systems
  • Scales better and can enable faster development

Cons:

  • Inconsistent vocabularies can persist
  • Mapping complexity can become unmanageable
  • Transformations may introduce noticeable performance overhead
  • Vocabularies can unintentionally and unknowingly drift over time
  • End-to-end behaviors become more difficult to understand
  • Synonyms may proliferate, causing confusion
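The drift risk listed above can be partly contained with a coverage check that runs in the test suite. This is a sketch under assumptions: `CrmStage`, `SalesStatus`, and `CRM_TO_SALES` are hypothetical, and the check only detects terms that were added on one side without a corresponding mapping, not terms whose meaning changed.

```python
from enum import Enum


class CrmStage(Enum):
    """Vocabulary of a hypothetical third-party CRM."""

    PROSPECT = "prospect"
    CLOSED_WON = "closed_won"
    CLOSED_LOST = "closed_lost"
    ON_HOLD = "on_hold"  # added upstream after the mapping was written


class SalesStatus(Enum):
    """Vocabulary of the hypothetical internal sales domain."""

    OPEN = "open"
    WON = "won"
    LOST = "lost"


CRM_TO_SALES = {
    CrmStage.PROSPECT: SalesStatus.OPEN,
    CrmStage.CLOSED_WON: SalesStatus.WON,
    CrmStage.CLOSED_LOST: SalesStatus.LOST,
}


def unmapped_terms(source: type[Enum], mapping: dict) -> list[str]:
    """Names in the source vocabulary that the mapping does not translate."""
    return [term.name for term in source if term not in mapping]


print(unmapped_terms(CrmStage, CRM_TO_SALES))  # ['ON_HOLD']
```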

Emergent Vocabulary

A nihilistic or evolutionary approach is to let the vocabulary self-organize. In theory, well-intentioned people will converge on common terms. In some organizations, allowing natural selection to produce a vocabulary can yield results comparable to or better than a top-down approach.

While this approach may seem lazy or unsophisticated, it can have advantages. In broad organizations without strict hierarchical structures, it can work well, especially if the majority of the staff are experienced developers. It avoids the perception of authoritarianism or ivory tower control and enables a crowdsourced solution. It is also well suited to small organizations that are trying to move quickly or that have short life spans.

Pros:

  • Allows multiple vocabularies to coexist
  • Maximizes autonomy and potentially developer satisfaction
  • Enables a crowdsourced and naturally selected solution
  • Encourages strong abstraction layers
  • Highly adaptable to rapidly changing domains and environments
  • Minimizes the need for overarching architectural authority
  • Advantageous for fast-paced and short-lived projects
  • Reduces the risk of ossification and stale standards
  • Zero initial cost

Cons:

  • A consistent, unambiguous language may take a long time to emerge, or may never emerge
  • Increases integration cost with other domains and organizations
  • Inconsistent mental models can proliferate
  • Knowledge becomes tribal instead of explicit
  • Incident response and debugging can suffer

Conclusion

In Orwell’s “1984”, Newspeak was used deliberately to erode the meaning of words through ambiguity and a mercurial vocabulary. In software development we can unintentionally arrive in the same place. When the domain language is neglected, a single word can come to mean several things and several words can refer to the same thing. Clarity is eroded and confusion arises. Avoiding this confusing and fragile state requires ongoing and intentional effort.

A carefully cultivated Ubiquitous Language can anchor the vocabulary used within a system. Complete adherence to a Ubiquitous Language in organizations of more than one person is probably unrealistic. However, enforcement mechanisms, translation mappings, organic evolutionary processes, or a combination of the three can lead to the adoption of a shared truth and allow software to scale with confidence.