1. Introduction
In this tutorial, we’ll discuss the concept of type safety in programming languages.
Type safety in a programming language is an abstract construct that enables the language to avoid type errors.
Every programming language has some level of type safety, whether or not it states it explicitly. When we compile a program, the compiler validates the types and reports an error if we try to assign a value of the wrong type to a variable. Type safety is validated not only at compile time but also at runtime.
2. Concept of Type Safety
Type safety is a language-level control that ensures that any variable accesses only the memory locations it is authorized to access, and only in a well-defined and permissible way. In other words, type safety ensures that the code doesn’t perform any invalid operation on the underlying object.
2.1. Type Error
Let’s first understand the concept of a type error. A type error is an error or undefined behavior that arises when a program attempts to perform an operation on a value for which that operation is not defined. For example, some languages let us treat a Boolean as an integer and perform addition on it. Even when the result isn’t meaningful, such a language raises no compile-time or runtime error.
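As a small illustration, here is a sketch in Python (chosen only for brevity; the text above doesn’t name a language). A Boolean takes part in arithmetic without any compile-time or runtime error. In Python the result happens to be well defined, because bool is a subclass of int, but the operation still passes silently even when it is probably a logic mistake. The variable names are hypothetical.

```python
# bool is a subclass of int in Python, so arithmetic on it is accepted silently.
is_ready = True
total = is_ready + 1  # no error is raised

print(total)                      # 2
print(isinstance(is_ready, int))  # True
```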
2.2. Type Safety Control
The type safety of a programming language is the extent to which it prevents type errors. A language can prevent a type error either at compile time or at runtime.
Let’s understand this concept with a simple C++ example:
#include <iostream>
#include <string>
using namespace std;

int main()
{
    /*
     * This line gives a compile-time error about
     * conversion from int to a non-scalar type
     */
    string greeting = 1;
    cout << greeting;
    /*
     * This line gives a compile-time error about an invalid
     * conversion from const char* to int
     */
    int counter = "fails";
    return 0;
}
In this example, we try to assign an integer value to a string variable, which results in a compile-time error about an illegal type conversion. Similarly, we try to assign a string literal to an integer variable, which also results in an invalid type conversion error at compile time.
In essence, type safe variables are key pillars of a safe and robust program. The algorithms that use these variables can rely on them taking values only from a well-defined domain, which protects the integrity and quality of both the data and the program.
3. Type Safety and Type Checking
Type safety is a broad term, and we can easily confuse it with other similar-sounding terms. Let’s differentiate type safety from four of these terms.
3.1. Static Type
We can divide type checking into static and dynamic type checking based on when the check takes place. A compiler that performs static type checking verifies the types at compile time. This means that it checks the type of every variable before the program runs. Some common examples of statically typed languages include Ada, C, C++, C#, JADE, Java, Fortran, Haskell, ML, Pascal, and Scala.
For example, in the following code, we’ll get an error when we try to assign a character string to an integer variable, count:
#include <iostream>
using namespace std;

int main()
{
    int count = 1;
    cout << "Value of count is " << count << endl;
    /*
     * This line gives a compile-time error about an invalid
     * conversion from const char* to int
     */
    count = "OverThreshold";
    return 0;
}
Statically typed programming languages are generally faster than dynamically typed languages. This is because the compiler knows the exact data type of each variable before the program runs. It can therefore generate optimized code that typically runs faster and uses less memory.
3.2. Dynamic Type
A language that supports dynamic type checking performs type checking only at runtime. That means it checks the type of a value only while executing the code that uses it. For example, Python is a dynamically typed language that allows the type of a variable to change over its lifetime. Some other dynamically typed languages are Perl, Ruby, PHP, and JavaScript.
For example, in the following Python code, we first assign an integer value to count and later assign it a string value. We get no error since Python allows dynamic typing:
def main():
    count = 1
    print(f"Value of count is {count}")
    count = "OverThreshold"
    print(f"Value of count is {count}")


main()
Dynamic type checking generally results in less optimized code, since the checks are repeated every time the program executes. It also makes runtime type errors more likely, because a type error surfaces only when the offending code actually runs.
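The runtime nature of these checks can be sketched with a function whose faulty branch fails only when it actually runs. The function name describe and its parameters are hypothetical and serve only as an illustration.

```python
def describe(count, verbose=False):
    """Return a description of count; the verbose branch contains a type error."""
    if verbose:
        # Bug: str + int is not defined, but Python notices only when this line runs.
        return "Value of count is " + count
    return f"Value of count is {count}"


# The non-faulty path works, so the bug can go unnoticed.
print(describe(1))

# The faulty path fails only at the moment it is executed.
try:
    describe(1, verbose=True)
except TypeError as err:
    print(f"TypeError at runtime: {err}")
```

A static type checker would be expected to flag the faulty branch before the program runs, whereas here it stays hidden until some caller passes verbose=True.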
3.3. Strong Type
Whether a language is statically or dynamically typed is independent of whether it is strongly or weakly typed. So, a statically typed language can be strongly or weakly typed. A strongly typed language is one in which a variable is strongly bound to a specific data type. We may note that most statically typed languages are strongly typed, since they fix the data type of a variable when it is declared.
Next, let’s consider strong typing in a dynamically typed language. Such a language can also be strongly typed, but it implements this differently: it infers the type from the value the variable currently holds and uses that type to decide which operations are valid.
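A short Python sketch of this inference: the type belongs to the value, and the name simply takes on the type of whatever it currently holds. The variable names are hypothetical.

```python
count = 1
first_type = type(count)     # inferred from the value 1

count = "OverThreshold"
second_type = type(count)    # inferred from the new value

print(first_type.__name__)   # int
print(second_type.__name__)  # str
```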
3.4. Weak Type
A weakly typed language is one in which the variable type isn’t bound to a specific data type. So, in essence, the variable has a type, but its type constraint is lower than that of a strongly typed programming language.
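For example, the following PHP script adds 10 to a string variable $str that already stores “candies”. In PHP versions before 8, the non-numeric string is silently treated as 0, so the script outputs 10, whereas PHP 8 rejects this operation with a TypeError: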
<!DOCTYPE html>
<html>
<body>
<?php
$str = "candies";
$str = $str + 10;
echo ($str);
?>
</body>
</html>
We can conclude that a strongly typed language has a high degree of type safety, whereas a weakly typed language has a low degree of type safety control.
4. Type Safety in Modern Programming Languages
Now, let’s revisit type safety in three popular high-level languages, i.e., C++, Python, and Java.
4.1. Type Safety in C++
Although C++ enforces type safety in many contexts, it contains several features that are not type safe. For instance, we can temporarily change the data type of an expression with an explicit cast. The main problem with this approach is that a C-style cast is not checked at runtime for type compatibility.
To explain further, if the types are incompatible, the compiler simply reinterprets the in-memory bit pattern of the expression being cast as if it belonged to the target type. For example, in the following C++ program, the function func() casts a pointer to a char to a pointer to a double. It then treats the memory as if it held a double and stores 5.0 in it, even though only a single byte was allocated, so the write goes past the allocation and the behavior is undefined:
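#include <iostream>
using namespace std;

void func(char* char_ptr);  // declared before its use in main()

int main()
{
    char *tmp = new char;
    *tmp = 'Y';
    cout << "Value of pointer before calling func(): " << (*tmp) << endl;
    func(tmp);
    return 0;
}

void func(char* char_ptr)
{
    // Unchecked cast: the single allocated byte is now treated as a double
    double* d_ptr = (double*) char_ptr;
    // Undefined behavior: writes sizeof(double) bytes into a 1-byte allocation
    (*d_ptr) = 5.0;
    cout << "Value of pointer after cast in func(): " << *d_ptr << endl;
}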
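The effect of reinterpreting a bit pattern can be observed safely in Python with the standard struct module, which, unlike the C++ cast, never touches memory outside its own buffer. This is only an analogy to the cast above, not a translation of it, and the variable names are hypothetical.

```python
import struct

# Encode 1.0 as a 4-byte IEEE 754 single-precision float (little-endian).
raw = struct.pack("<f", 1.0)

# Reinterpret the same four bytes as an unsigned 32-bit integer.
(as_int,) = struct.unpack("<I", raw)

print(raw.hex())    # 0000803f
print(hex(as_int))  # 0x3f800000
```

The bytes never change; only the type used to read them does, which is why the float 1.0 shows up as a large, unrelated integer.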
4.2. Type Safety in Python
Python is a semi type safe language. It is dynamically and strongly typed, with a high degree of type safety control built in. However, type checking is done only at runtime, so type errors can go unnoticed until the offending code executes. So, we can say that Python is not a 100% type safe language.
For example, the following Python code raises a TypeError when we run it:
print('a' + 1)
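A minimal sketch of how this plays out at runtime: the mixed-type addition is rejected with a TypeError, and the operation succeeds once the conversion is made explicit. The helper name safe_concat is hypothetical.

```python
def safe_concat(text, value):
    """Concatenate text with value, converting value to str explicitly."""
    return text + str(value)


try:
    result = 'a' + 1          # str + int is rejected at runtime
except TypeError as err:
    result = None
    print(f"Rejected: {err}")

print(safe_concat('a', 1))    # a1
```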
4.3. Type Safety in Java
The Java language, by design, enforces type safety. It implies that Java prevents the programs from accessing memory in inappropriate ways by controlling the memory access of each object. Java does this by using objects (instantiated from classes) to perform operations.
For example, this Java code fails to compile with an incompatible types error because we try to cast a String variable to an int:
public class TypeCastingExample
{
    public static void main(String[] args)
    {
        String d = "Nikhil";
        System.out.println("Before conversion: " + d);
        // Compile-time error: a String cannot be cast to the primitive type int
        int num = (int) d;
        System.out.println("After conversion into int: " + num);
    }
}
5. Issues with Type Safety
In this section, let’s explore some common issues associated with type safety in programming languages.
5.1. Memory Access
A char typically requires 1 byte (8 bits), whereas an int typically requires 4 bytes (32 bits). A type safe language preserves the integrity of data throughout its lifetime. This means it won’t allow an int (or a value of any other incompatible data type) to be stored in a char. It’ll usually reject the operation at compile time or raise a runtime error such as a class cast exception.
On the other hand, a type unsafe language would allow us to store an int in a char variable by overwriting the data in the three adjacent bytes of memory. As a result, we’ll get undefined behavior when other parts of the program try to interpret those three memory locations.
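The overwrite described above can be simulated in Python, with a bytearray standing in for raw memory. This is a sketch that assumes a 4-byte little-endian int; Python doesn’t permit this kind of access to its own variables, and the names are hypothetical.

```python
import struct

# Four adjacent one-byte "char" slots.
memory = bytearray(b"ABCD")

# Write a 4-byte little-endian int at offset 0, where only 'A' was meant to live.
struct.pack_into("<i", memory, 0, 1)

print(memory)  # bytearray(b'\x01\x00\x00\x00')
```

The neighbouring values 'B', 'C', and 'D' are gone, even though the write targeted only the first slot.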
5.2. Datatype
The main issue here is how the language interprets a data type without considering the memory allocated to it. Let’s consider a signed int vs. an unsigned int. Both typically use 32 bits, but a signed int uses one bit to store the sign. So, in essence, a signed int has a maximum value of 2,147,483,647 (2^31 - 1), whereas an unsigned int can go up to 4,294,967,295 (2^32 - 1).
In a type unsafe language, we can read all 32 bits under either interpretation. Hence, we’ll get an unexpected value when we read a large unsigned integer as a signed integer.
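A minimal Python sketch of this misreading, using the standard struct module and assuming 32-bit little-endian integers in two’s complement form (the variable names are hypothetical):

```python
import struct

unsigned_value = 4_294_967_295             # 2^32 - 1, all 32 bits set
raw = struct.pack("<I", unsigned_value)    # store as an unsigned 32-bit int

(signed_value,) = struct.unpack("<i", raw)  # read the same bits as signed

print(signed_value)  # -1
```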
5.3. Speed vs. Safety
Type unsafe languages are generally faster and produce efficient code, but they are prone to memory errors and security holes. In many situations, bypassing type safety constructs in C/C++ helps the compiler generate CPU-efficient code. For example, in many scientific applications that demand near real-time response, we usually avoid array bounds checking and garbage collection.
We do this because these are expensive operations that increase the turnaround time of the application. However, if we use these powerful but type unsafe constructs incorrectly, the result can be unexpected program behavior and errors that are difficult to detect.
6. Conclusion
In this article, we discussed type safety in programming languages.
We started by defining the concept of type safety. Then we moved to the categories of type checking and their benefits. After that, we examined how C++, Python, and Java handle type safety. Finally, we looked at some trade-offs of type safety, such as speed versus safety.
We conclude that type safe languages catch a large share of type errors, which in turn helps developers write code that is not only robust but also less vulnerable.