Language Design – Do ML-Style Modules Need Packages?

Tags: code-reuse, language-design, language-features, modules, packages

This is a clarification of a closed question. I've limited the scope as requested.

First, a few definitions, following, for example, "A modular module system". Consider any programming language with a selected subtheory of its native type theory. Let a signature be two collections of named types, allowing for polymorphism; the first collection is the imports (also called the givens) and the second collection is the exports (also, confusingly, called the module type). An ML-style module is a collection of named sentences in the language that can be judged to have the same types as the exports when checked in a context of values with the same types as the imports. An ML-style functor is an ML-style module whose imports include module types; that is, the imports and exports of one module appear among the imports of another module.
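As a rough illustration of these definitions, the sketch below uses TypeScript, whose structural interfaces behave somewhat like ML signatures (an analogy only; TypeScript has no ML-style module system). `ORDERED` and `SET` stand in for signatures, `IntOrder` is a module checked against `ORDERED<number>`, and `MakeSet` is a functor whose import is a module of type `ORDERED<T>`, loosely modelled on OCaml's `Set.Make`. All names, and the sorted-array representation of sets, are assumptions made for this example.

```typescript
// Signature (exports): what any module of type ORDERED<T> must provide.
interface ORDERED<T> {
  compare(a: T, b: T): number;
}

// Signature of the functor's result: a persistent set over T.
interface SET<T> {
  empty: readonly T[];
  add(x: T, s: readonly T[]): readonly T[];
  mem(x: T, s: readonly T[]): boolean;
}

// A module: named values judged against the signature ORDERED<number>.
const IntOrder: ORDERED<number> = {
  compare: (a, b) => a - b,
};

// A functor: its import is a module of type ORDERED<T>, its export a SET<T>.
// Sets are sorted arrays without duplicates (an assumption of this sketch).
function MakeSet<T>(Ord: ORDERED<T>): SET<T> {
  return {
    empty: [],
    add(x, s) {
      // Find the first element that is >= x according to the imported module.
      const i = s.findIndex((y) => Ord.compare(x, y) <= 0);
      if (i === -1) return [...s, x];
      if (Ord.compare(x, s[i]) === 0) return s; // already present: no change
      return [...s.slice(0, i), x, ...s.slice(i)];
    },
    mem(x, s) {
      return s.some((y) => Ord.compare(x, y) === 0);
    },
  };
}

// Functor application: a new module built from IntOrder.
const IntSet = MakeSet(IntOrder);
const nums = IntSet.add(2, IntSet.add(3, IntSet.add(1, IntSet.empty)));
```

Here `nums` holds `[1, 2, 3]`; applying `MakeSet` to a different ordering module would yield a different set module without touching `MakeSet` itself.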

Given all of this, we'll say that a language admits ML-style modules when we can choose a type theory such that its ML-style modules and ML-style functors form a category, we can embed that category in the language's original syntax, and:

  • there is an initial object (an empty signature)
  • there are subobjects (the Liskov substitution principle always holds)
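Both conditions can be sketched in the same TypeScript analogy, with hypothetical names throughout: an empty interface that every module trivially satisfies, and width subtyping, under which a module exporting more than a signature requires can be used anywhere that signature is expected. Note that in TypeScript this restriction is only static; the hidden exports still exist at runtime.

```typescript
// The empty signature: it demands no exports, so every module satisfies it.
interface EMPTY {}

// A small signature with a single export.
interface COUNTER {
  next(n: number): number;
}

// A richer signature: everything COUNTER demands, plus one more export.
interface RESETTABLE_COUNTER extends COUNTER {
  reset(): number;
}

// A module judged against the richer signature.
const Counter: RESETTABLE_COUNTER = {
  next: (n) => n + 1,
  reset: () => 0,
};

// Code written against COUNTER accepts the richer module unchanged
// (width subtyping), which is the substitution property asked for above.
function advanceTwice(C: COUNTER, start: number): number {
  return C.next(C.next(start));
}

// Viewing the same module through smaller signatures.
const asCounter: COUNTER = Counter;
const forgotten: EMPTY = Counter;
```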

Finally, we'll say that an ML-style package is a named collection of ML-style modules. Packages do not need to obey any compositional rules; they do not have type signatures.

For example, most flavors of ML admit ML-style modules, including SML, OCaml, and ATS. Haskell can do it with Backpack, and Racket can do it with units. Additionally, languages like ECMAScript and Java can be restricted to admit ML-style modules using their existing module systems.

Suppose that a programming language admits ML-style modules, but its ecosystem does not have ML-style packages. Which software engineering tasks — if any — are expensive or impossible without packages?

Best Answer

Modules and packages solve different problems, though they certainly overlap, and can be made nearly synonymous.

  1. Packages provide a critical source of namespacing. Module names aren't guaranteed to be unique, and even with the right combination of imports and exports, at some point you'd need disambiguation tools if you are drawing from two modules with the same name. If module names are made universally unique, you can begin to blur these two concepts.

  2. Packages provide control of versioning. This becomes especially critical when multiple modules each depend on different versions of the same module (e.g., module A depends on C-0.8 and B on C-1.2). A language's package management system must disambiguate these and link each module to its intended target. If modules carry optional version information in their names, you can begin to blur these two concepts.

  3. Packages provide control of sourcing. Where a package comes from and how it is fetched are both decided at the package level: do you pull it from GitHub? From a public package repository? From your company's private artifact store? This part of the problem could theoretically be handled at the module level, but doing so becomes increasingly noisy. However, if there are tools to abstract away package retrieval, you can indeed begin to blur these two concepts.
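The versioning case in point 2 (A depends on C-0.8, B on C-1.2) can be sketched as a registry keyed by package name and version, where each package pins the exact versions of its dependencies and every module lookup goes through those pins. This is a hypothetical toy, not the design of any real package manager: modules are placeholder strings, and the `source` field merely records where a package would be fetched from (point 3).

```typescript
// A package: a named, versioned collection of modules with no signature of its own.
interface Package {
  name: string;
  version: string;
  source: string; // where the package would be fetched from (hypothetical URL)
  modules: Map<string, unknown>;
  deps: Map<string, string>; // dependency name -> exact pinned version
}

class Registry {
  private pkgs = new Map<string, Package>();

  // Package name plus version forms the unique key (the namespacing part).
  private key(pkgName: string, version: string): string {
    return `${pkgName}@${version}`;
  }

  publish(p: Package): void {
    this.pkgs.set(this.key(p.name, p.version), p);
  }

  // Resolve module `mod` of dependency `dep` as seen from package `from`.
  // Lookup goes through `from`'s pinned versions, so two packages can use
  // different versions of the same dependency side by side.
  resolve(from: Package, dep: string, mod: string): unknown {
    const version = from.deps.get(dep);
    if (version === undefined) throw new Error(`${from.name} does not depend on ${dep}`);
    const target = this.pkgs.get(this.key(dep, version));
    if (target === undefined) throw new Error(`${dep}@${version} is not published`);
    if (!target.modules.has(mod)) throw new Error(`${dep}@${version} has no module ${mod}`);
    return target.modules.get(mod);
  }
}

// Hypothetical helper for building packages in this example.
function pkg(
  pkgName: string,
  version: string,
  modules: [string, unknown][],
  deps: [string, string][] = [],
): Package {
  return {
    name: pkgName,
    version,
    source: "https://packages.example.org/",
    modules: new Map(modules),
    deps: new Map(deps),
  };
}

const registry = new Registry();
registry.publish(pkg("C", "0.8", [["Parser", "C-0.8 Parser"]]));
registry.publish(pkg("C", "1.2", [["Parser", "C-1.2 Parser"]]));
const A = pkg("A", "1.0", [], [["C", "0.8"]]);
const B = pkg("B", "1.0", [], [["C", "1.2"]]);
```

With this setup, resolving `Parser` from `A` yields C-0.8's module and from `B` yields C-1.2's, even though both modules share the name `Parser`.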

The primary place I've seen this expressed at the moment is the Unison language. It makes a number of very unusual tradeoffs to get there, but in return you get unique global identifiers for code objects. I'm not sure whether it can fully replicate ML modules, but it does solve the namespacing and versioning problems quite compactly. A similar concept may serve your intended aims well.
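A toy sketch of the content-addressing idea: each definition is stored under a hash of its content, dependencies are referenced by hash, and human-readable names are just pointers to hashes. This is not Unison's actual implementation; the `Codebase` class and its methods are hypothetical, FNV-1a is a non-cryptographic stand-in for a proper cryptographic hash, and a real system would hash a normalized syntax tree rather than raw text.

```typescript
type Hash = string;

// Non-cryptographic 32-bit FNV-1a, used only as a compact stand-in
// for the cryptographic hash a real content-addressed system would use.
function fnv1a(text: string): Hash {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// A definition refers to its dependencies by hash (embedded in the body as
// "#<hash>"), never by name.
interface Definition {
  body: string;
  deps: Hash[];
}

class Codebase {
  private defs = new Map<Hash, Definition>();
  private names = new Map<string, Hash>();

  // Store a definition under its content hash and point a name at it.
  add(label: string, def: Definition): Hash {
    const h = fnv1a(def.body);
    this.defs.set(h, def);
    this.names.set(label, h);
    return h;
  }

  // Renaming only moves the name pointer; the hash and all references to it stay put.
  rename(oldLabel: string, newLabel: string): void {
    const h = this.names.get(oldLabel);
    if (h === undefined) throw new Error(`unknown name ${oldLabel}`);
    this.names.delete(oldLabel);
    this.names.set(newLabel, h);
  }

  hashOf(label: string): Hash | undefined {
    return this.names.get(label);
  }

  has(h: Hash): boolean {
    return this.defs.has(h);
  }
}

const cb = new Codebase();
const sq1 = cb.add("square", { body: "x -> x * x", deps: [] });
const useSq = cb.add("sumSquares", { body: `xs -> sum (map #${sq1} xs)`, deps: [sq1] });
// "Updating" square stores a new definition under a new hash.
const sq2 = cb.add("square", { body: "x -> x ** 2", deps: [] });
cb.rename("square", "sq");
```

Because `sumSquares` refers to the first `square` by hash, publishing a new `square` or renaming it neither breaks nor silently changes `sumSquares`; both versions coexist, which is how this approach sidesteps the namespacing and versioning problems described above.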

