Python – the definition of implementation in programming languages? What is CPython?

implementations, python

I came across this word "implementation". CPython is one of the most common implementations of Python. What exactly is an implementation?

I researched a bit on how Python code runs. First, it is compiled into bytecode, and then, using the PVM or an interpreter (not sure which), it is converted into machine language, which is then executed by the CPU.

Where in this process does CPython (or Jython, PyPy, etc.) fit?

Are the PVM and the interpreter the same thing?

Secondly, I read somewhere that the PVM is written in C. Is that true? Is that what it means when they say the implementation is in C?

Best Answer

Think about how a human language works. In theory, there's some kind of document(s) that lays down what the rules are for what constitutes "English". There's a set of definitions for words, rules for how grammar works to assemble those words into meaningful sentences, and so forth. Each of us communicates in English using our own ideas of what those words mean and how English grammar works.

So we can analogize this to programming languages. You have a language like Python. The "people" who "speak" Python in this analogy are not programmers; the "speakers" of Python are what we call "implementations". These are the tools which understand Python and make the computer actually do what the Python instructions say to do.

CPython is one such "speaker" of Python. Once, it was the only Python implementation. CPython is a specific implementation that is largely written in C. Jython is an implementation of Python written in Java. They both effectively do the same thing, understanding the same grammar for the same purpose. But they do it in different ways.
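To make the "speaker" idea concrete, here is a minimal sketch that asks the running interpreter which implementation it is. It assumes Python 3.6 or later; `describe_implementation` is a hypothetical helper name made up for this illustration, while `platform.python_implementation()` and `sys.implementation` come from the standard library.

```python
import platform
import sys


def describe_implementation():
    """Return (name, version) of the Python implementation running this code."""
    # Hypothetical helper for illustration only.
    name = platform.python_implementation()  # e.g. "CPython", "PyPy", "Jython"
    info = sys.implementation  # metadata about the running implementation
    version = ".".join(str(part) for part in info.version[:3])
    return name, version


name, version = describe_implementation()
print(f"This code is being run by {name} {version}")
```

The same source file should run unchanged on any implementation that supports this Python version; only the reported name and version would differ.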

Note that in the first paragraph, I only said that "in theory" languages have some document defining what they mean. That's because there are many human languages that don't have formal rules. And even within formal rules, there will always be dialects, neologisms, and other things as languages evolve naturally. Note that this makes it possible for one person who speaks one language to fail to understand someone who ostensibly speaks the same language.

This happens because the two people, the two "implementations," disagree about what the language actually is.

Programming languages tend to take one of two routes with regard to definition, or "standardization". They can have a de-facto standard or a de-jure standard. Indeed, the word "implementation" in this context means to "implement" a "standard".

Languages with a de-jure standard have a document (or set of documents) that lays down everything about what the language is. Ideally, it formally specifies everything about the language, detailing in depth the behavior of every syntactic construct. If two implementations differ in behavior when given the same code, then one of the following is happening:

  1. One or both of them is not implementing the standard correctly.
  2. The standard is poorly specified for that particular piece of language. That is, it doesn't say what the behavior ought to be, or is confusingly worded such that it is ambiguous as to what should actually happen.
  3. The standard explicitly says that the behavior of the code is either defined by the implementation or completely undefined (C++ loves saying this).
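Python has its own cases of the third kind. As a rough sketch: the language guarantees that two equal integers compare equal with `==`, but whether they are the very same object (`is`) is left to the implementation. The helper `make_number` is a hypothetical name used only for this example, and no particular result for the identity check should be assumed.

```python
def make_number():
    # Hypothetical helper: parse the value at run time instead of writing
    # the literal 1000 twice, so the comparison is between two separately
    # produced results.
    return int("1000")


a = make_number()
b = make_number()

equal = (a == b)        # guaranteed by the language: equal values compare equal
same_object = (a is b)  # left to the implementation: may be True or False

print("equal:", equal)
print("same object:", same_object)
```

Code that relies on the second result is depending on one implementation's behavior rather than on the language itself.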

Note that there is a huge difference between a formal specification for a language and reference documentation.

A de-facto standard is where you define the language by picking one implementation and saying "whatever that thing says is language X is language X". This means that if two implementations differ, the de-facto standard one is what is correct.

This also means that if the de-facto standard has quirks in it, every implementation of that language must reproduce those quirks. For example, if the de-facto standard has a hard limit of only 32 function parameters for some reason, then implementations of that language which allow more function parameters are technically wrong.

If the de-facto standard has (non-crashing) bugs in it... well, that's the thing: it can't have bugs. "Bug" is defined relative to the standard. And the de-facto standard implementation is the standard; "bugs" in it are therefore features, until the next version comes out that changes things. And such things are, on some level, changes to language features, not bug fixes.

Many languages start with de-facto standards and evolve towards de-jure standards. Both C# and Java started without formal language specifications, but both of them have them now. C++ was an ill-defined mish-mash of stuff until C++98 formalized what it meant to be C++.

Python is kind of half-and-half at this point. It has a language reference document that purports to define the language, but it isn't a "formal specification". From the language reference:

"I chose to use English rather than formal specifications for everything except syntax and lexical analysis. This should make the document more understandable to the average reader, but will leave room for ambiguities. Consequently, if you were coming from Mars and tried to re-implement Python from this document alone, you might have to guess things and in fact you would probably end up implementing quite a different language."

So in the case of ambiguity, you generally would defer to CPython's behavior. Then again, there are plenty of formal specifications that have ambiguous areas as well.
