Data Types – What is a Primitive Data Type?

data types

My understanding of a primitive datatype is that

It is a datatype provided by a language implicitly (Others are user defined classes)

So different languages have different sets of datatypes which are considered primitive for that particular language. Is that right?

And what is the difference between a "basic datatype" and "built-in datatype". Wikipedia says a primitive datatype is either of the two.

PS – Why is "string" type considered as a primitive type in SNOBOL4 and not in Java ?

Best Answer

It kind of depends on the language.

For example, in languages like C and C++, you have a number of built-in scalar types - int, float, double, char, etc. These are "primitive" in the sense that they cannot be decomposed into simpler components. From these basic types you can define new types - pointer types, array types, struct types, union types, etc.

Then you have a language like old-school Lisp, where everything is either an atom or a list. Again, by the above definition, an atom is "primitive" in the sense that it cannot be decomposed into something simpler.

Edit

As far as I'm concerned, the terms "primitive", "basic", and "built-in" are pretty much interchangeable. If you want to get really pedantic, though, you can distinguish between types that are "built-in" (those explicitly provided by the language definition) and types derived from the built-in types that are still "primitive" or "basic" in that they cannot be decomposed into simpler elements. C's typedef facility allows you to create new type names for existing types. Ada allows you to create new scalar types that have constraints on them. For example, you can derive a Latitude type from the built-in floating type, with the constraint that it can't take on values outside the range [-90.0, 90.0]. It's still a primitive or basic type in that it cannot be broken down into any simpler components, but since it's user-defined, it's not considered a "built-in" type.

Again, these concepts are a little fuzzy, and it really depends on context. For example, the notion of a "built-in" type is meaningless for a typeless language like BLISS.

Values

Values are the concrete pieces of data that programs evaluate and juggle. Nothing fancy, some examples might be

1
true
"fizz buzz foo bar"

Types

A nice description for a type is "a classifier for a value". A type is a little bit of information about what that value will be at runtime, but indicated at compile time.

For example if you tell me that e : bool at compile time, and I'll know that e is either true or false during runtime, nothing else! Because types classify values nicely like this, we can use this information to determine some basic properties of your program.

For example, if I ever see you adding e and e' when e : int and e' : String, then I know something is a bit off! In fact I can flag this and throw an error at compile time, saying "Hey, that doesn't make any sense at all!".

A more powerful type system allows for more interesting types which classify more interesting values. For example, let's consider some function

f = fun x -> x

It's pretty clear that f : Something -> Something, but what should that Something be? In a boring type system, we'd have to specify something arbitrary, like Something = int. In a more flexible type system, we could say

f : forall a. a -> a

That is to say "for any a, f maps an a to an a". This let's us use f more generally and write more interesting programs.

Moreover, the compiler is going to check actually satisfying the classifier we've given it, if f = fun x -> true then we have a bug and the compiler will say so!

So as a tldr; a type is a compile time constraint on the values an expression can be at runtime.

Type Constructor

Some types are related. For example a list of integers is very similar to a list of strings. This is almost like how sort for integers is almost like sort for strings. We can imagine a sort of factory that builds these almost-the-same types by generalizing over their differences and building them upon demand. That's what a type constructor is. It's kind of like a function from types to types, but a little more limited.

The classic example is a generic list. A type constructor for is just the generic definition

 data List a = Cons a (List a) | Nil

Now List is a function which maps a type a to a list of values of that type! In Java-land I think these are perhaps called "generic classes"

Type Parameters

A type parameter is just the type passed to a type constructor (or function). Just like in the value level we'd say foo(a) has a parameter a just like how List a has a type parameter a.

Kinds

Kinds are a bit tricky. The basic idea is that certain types are similar. For example, we have all the primitive types in java int, char, float... which all behave as if they have the same "type". Except, when we're speaking of the classifiers for types themselves, we call the classifiers kinds. So int : Prim, String : Box, List : Boxed -> Boxed.

This system gives nice concrete rules about what sort of types we can use where, just like how types govern values. It'd clearly be nonsense to say

 List<List>

 List<int>

In Java since List needs to be applied to a concrete type to be used like that! If we look at their kinds List : Boxed -> Boxed and since Boxed -> Boxed /= Boxed, the above is a kind error!

Most of the time we don't really think about kinds and just treat them as "common sense", but with fancier type systems it's something important to think about.

A little illustration of what I've been saying so far

 value   : type : kind  : ...
 true    : bool : Prim  : ...
 new F() : Foo  : Boxed : ...

Better Reading Than Wikipedia

If you're interested in this sort of thing, I'd highly recommend investing a good textbook. Type theory and PLT in general is pretty vast and without a coherent base of knowledge you (or at least I) can wander around without getting anywhere for months.

Two of my favorite books are

Types and Programming Language - Ben Pierce
Practical Foundations of Programming Languages - Bob Harper

Both are excellent books that introduce what I've just talked about and much more in beautiful, well explained detail.

Can an interpreted statically typed language be considered type safe

Statically typed language: a language that does not allow you to change the type of a variable at run-time.

That's not the definition of statically typed. Statically typed means that type checking (and type inference) happens before runtime. The opposite is dynamic typing, where type checking happens at runtime.

It is perfectly possible to design a statically typed language in which either identifiers can change types or types can change.

Type safety: type safety means that you are not allowed to mix incompatible data types. For example, you cannot assign a float to an int, and you cannot assign an int to a function pointer, and you cannot add a user-defined object to an int (unless you use operator overloading), etc.

There is no universally accepted definition of type safety. Yours is a very sensible one.

Back to my question, let's say that there is an interpreted statically typed language,

There is no such thing as an interpreted statically typed language, simply because there is no such thing as an interpreted language, period. Interpretation and compilation are traits of the, well, interpreter or compiler (duh!), i.e. the implementation, not the language. Every language can be implemented with an interpreter and every language can be implemented with a compiler. In fact, many languages have both interpreted and compiled implementations. For example, Haskell has several compiled implementations (Ghc, Jhc, Yhc, Lhv) and an interpreted implementation (Hugs). ECMAScript has pure compiled implementations (V8), and hybrid mixed-mode implementations (e.g. SpiderMonkey AOT-compiles ECMAScript to SpiderMonkey bytecode, then both compiles and interprets this bytecode)

Saying that a language is an "interpreted language" is not just wrong, it is even more than wrong, it is simply non-sensical. The idea of "language" and the idea of "interpretation" live on two different levels of abstraction. If English were a typed language, "interpreted language" would be a type error.

in such a language I can write code that assigns a float to an int, but when this code is executed, a (run-time) type error will occur.

You are inconsistent with yourself here.

You say "statically typed" means "does not allow to change the type of a variable", which means you can not write such code. But now you say that you can write such code in a statically typed language.

But if you can write such code, then by your own definition, it is not statically typed. And if it is statically typed, then by your own definition, you cannot write such code.

So, it's not surprising that you get a contradiction, because you are contradicting yourself.

Is such a language considered to be type safe, or are type safe languages can only be statically typed and compiled, and so type errors must be caught at the compilation stage?

Whether or not type safe languages can only be statically typed or not really only depends on how you define "type safe". Another common definition of "type safe" is that you cannot subvert the type system. Ruby is dynamically typed, but it is usually described as type safe. On the other hand, C is statically typed but nobody would describe it as type safe.

As for whether a type safe language can only be compiled, I already explained above that compilation is an implementation detail, it is not a property of the language itself.