R – Static/strong typing and refactoring

refactoringstatic-typingstrong-typingweak-typing

It seems to me that the most invaluable thing about a static/strongly-typed programming language is that it helps refactoring: if/when you change any API, then the compiler will tell you what that change has broken.

I can imagine writing code in a runtime/weakly-typed language … but I can't imagine refactoring without the compiler's help, and I can't imagine writing tens of thousands of lines of code without refactoring.

Is this true?

Best Answer

I think you're conflating when types are checked with how they're checked. Runtime typing isn't necessarily weak.

The main advantage of static types is exactly what you say: they're exhaustive. You can be confident all call sites conform to the type just by letting the compiler do it's thing.

The main limitation of static types is that they're limited in the constraints they can express. This varies by language, with most languages having relatively simple type systems (c, java), and others with extremely powerful type systems (haskell, cayenne).

Because of this limitation types on their own are not sufficient. For example, in java types are more or less restricted to checking type names match. This means the meaning of any constraint you want checked has to be encoded into a naming scheme of some sort, hence the plethora of indirections and boiler plate common to java code. C++ is a little better in that templates allow a bit more expressiveness, but don't come close to what you can do with dependent types. I'm not sure what the downsides to the more powerful type systems are, though clearly there must be some or more people would be using them in industry.

Even if you're using static typing, chances are it's not expressive enough to check everything you care about, so you'll need to write tests too. Whether static typing saves you more effort than it requires in boilerplate is a debate that's raged for ages and that I don't think has a simple answer for all situations.

As to your second question:

How can we re-factor safely in a runtime typed language?

The answer is tests. Your tests have to cover all the cases that matter. Tools can help you in gauging how exhaustive your tests are. Coverage checking tools let you know wether lines of code are covered by the tests or not. Test mutation tools (jester, heckle) can let you know if your tests are logically incomplete. Acceptance tests let you know what you've written matches requirements, and lastly regression and performance tests ensure that each new version of the product maintains the quality of the last.

One of the great things about having proper testing in place vs relying on elaborate type indirections is that debugging becomes much simpler. When running the tests you get specific failed assertions within tests that clearly express what they're doing, rather than obtuse compiler error statements (think c++ template errors).

No matter what tools you use: writing code you're confident in will require effort. It most likely will require writing a lot of tests. If the penalty for bugs is very high, such as aerospace or medical control software, you may need to use formal mathematical methods to prove the behavior of your software, which makes such development extremely expensive.

Related Solutions

Can someone tell me what Strong typing and weak typing means and which one is better

That'll be the theory answers taken care of, but the practice side seems to have been neglected...

Strong-typing means that you can't use one type of variable where another is expected (or have restrictions to doing so). Weak-typing means you can mix different types. In PHP for example, you can mix numbers and strings and PHP won't complain because it is a weakly-typed language.

$message = "You are visitor number ".$count;

If it was strongly typed, you'd have to convert $count from an integer to a string, usually with either with casting:

$message = "you are visitor number ".(string)$count;

...or a function:

$message = "you are visitor number ".strval($count);

As for which is better, that's subjective. Advocates of strong-typing will tell you that it will help you to avoid some bugs and/or errors and help communicate the purpose of a variable etc. They'll also tell you that advocates of weak-typing will call strong-typing "unnecessary language fluff that is rendered pointless by common sense", or something similar. As a card-carrying member of the weak-typing group, I'd have to say that they've got my number... but I have theirs too, and I can put it in a string :)

The difference between statically typed and dynamically typed languages

Statically typed languages

A language is statically typed if the type of a variable is known at compile time. For some languages this means that you as the programmer must specify what type each variable is; other languages (e.g.: Java, C, C++) offer some form of type inference, the capability of the type system to deduce the type of a variable (e.g.: OCaml, Haskell, Scala, Kotlin).

The main advantage here is that all kinds of checking can be done by the compiler, and therefore a lot of trivial bugs are caught at a very early stage.

Examples: C, C++, Java, Rust, Go, Scala

Dynamically typed languages

A language is dynamically typed if the type is associated with run-time values, and not named variables/fields/etc. This means that you as a programmer can write a little quicker because you do not have to specify types every time (unless using a statically-typed language with type inference).

Examples: Perl, Ruby, Python, PHP, JavaScript, Erlang

Most scripting languages have this feature as there is no compiler to do static type-checking anyway, but you may find yourself searching for a bug that is due to the interpreter misinterpreting the type of a variable. Luckily, scripts tend to be small so bugs have not so many places to hide.

Most dynamically typed languages do allow you to provide type information, but do not require it. One language that is currently being developed, Rascal, takes a hybrid approach allowing dynamic typing within functions but enforcing static typing for the function signature.