1. Overview
In this tutorial, we’ll start by briefly reviewing the general category types that Unicode assigns to its code points, so that we can understand the difference between letters and alphabetic characters.
Further, we’ll look at the isAlphabetic() and isLetter() methods of the Character class in Java. Finally, we’ll cover the similarities and distinctions between these methods.
2. General Category Types of Unicode Characters
The Unicode codespace contains 1,114,112 code points, from U+0000 to U+10FFFF. Each code point is assigned a general category.
The Character class provides two overloaded versions of the getType() method, each returning an int that indicates the character’s general category type.
Let’s look at the signature of the first method:
public static int getType(char ch)
This method cannot handle supplementary characters, that is, code points above U+FFFF, which don’t fit in a single char. To handle all Unicode characters, including supplementary ones, the Character class provides an overloaded getType() method with the following signature:
public static int getType(int codePoint)
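To make the difference between the two overloads concrete, here is a minimal sketch. It assumes U+1D400 (mathematical bold capital A) as the sample supplementary character, and the class and helper names are hypothetical, chosen only for illustration:

```java
public class GetTypeOverloads {

    // Hypothetical helper: general category of the first code point in the text.
    static int categoryOfFirstCodePoint(String text) {
        return Character.getType(text.codePointAt(0));
    }

    public static void main(String[] args) {
        // U+1D400 lies above U+FFFF, so a String stores it as a surrogate pair.
        String boldA = new String(Character.toChars(0x1D400));

        // The char overload only sees the first half of the pair.
        System.out.println(Character.getType(boldA.charAt(0)) == Character.SURROGATE);

        // The int overload sees the whole code point.
        System.out.println(categoryOfFirstCodePoint(boldA) == Character.UPPERCASE_LETTER);
    }
}
```

Both comparisons should print true: the char overload reports SURROGATE for half of the pair, while the int overload reports the category of the actual character.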
Next, let’s start looking at some general category types.
2.1. UPPERCASE_LETTER
The UPPERCASE_LETTER general category type represents upper-case letters.
When we call the Character#getType method on an upper-case letter, for example, ‘U’, the method returns the value 1, which is the value of the Character.UPPERCASE_LETTER constant:
assertEquals(Character.UPPERCASE_LETTER, Character.getType('U'));
2.2. LOWERCASE_LETTER
The LOWERCASE_LETTER general category type is associated with lower-case letters.
When we call the Character#getType method on a lower-case letter, for instance, ‘u’, the method returns the value 2, which is the value of the Character.LOWERCASE_LETTER constant:
assertEquals(Character.LOWERCASE_LETTER, Character.getType('u'));
2.3. TITLECASE_LETTER
Next, the TITLECASE_LETTER general category represents title case characters.
Some titlecase letters are digraphs: single characters that look like a pair of Latin letters with only the first one in upper case. When we call the Character#getType method on such a character, it returns the value 3, which is the value of the Character.TITLECASE_LETTER constant:
assertEquals(Character.TITLECASE_LETTER, Character.getType('\u01f2'));
Here, the Unicode character ‘\u01f2’ is the single character ‘ǲ’, named Latin capital letter D with small letter Z.
2.4. MODIFIER_LETTER
A modifier letter, in the Unicode Standard, is “a letter or symbol typically written next to another letter that it modifies in some way”.
The MODIFIER_LETTER general category type represents such modifier letters.
For example, when we pass the modifier letter small H, ‘ʰ’, to the Character#getType method, it returns the value 4, which is the value of the Character.MODIFIER_LETTER constant:
assertEquals(Character.MODIFIER_LETTER, Character.getType('\u02b0'));
The Unicode character ‘\u02b0’ represents the modifier letter small H.
2.5. OTHER_LETTER
The OTHER_LETTER general category type represents an ideograph or a letter in a unicase alphabet. An ideograph is a graphic symbol representing an idea or a concept, independent of any particular language.
A unicase alphabet has just one case for its letters. For example, Hebrew is a unicase writing system.
Let’s look at an example using the Hebrew letter Alef, ‘א’. When we pass it to the Character#getType method, it returns the value 5, which is the value of the Character.OTHER_LETTER constant:
assertEquals(Character.OTHER_LETTER, Character.getType('\u05d0'));
The Unicode character ‘\u05d0‘ represents the Hebrew letter Alef.
2.6. LETTER_NUMBER
Finally, the LETTER_NUMBER category is associated with numerals composed of letters or letterlike symbols.
For example, the Roman numerals fall under the LETTER_NUMBER general category. When we call the Character#getType method with Roman Numeral Five, ‘Ⅴ’, it returns the value 10, which is the value of the Character.LETTER_NUMBER constant:
assertEquals(Character.LETTER_NUMBER, Character.getType('\u2164'));
The Unicode character ‘\u2164‘ represents the Roman Numeral Five.
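The six categories above can be summarized in one place. The following is a small sketch, not part of the JDK: the class and the categoryName helper are hypothetical names that simply map the int returned by getType() back to the name of the matching constant:

```java
public class CategoryNames {

    // Hypothetical helper: names the letter-related general categories
    // covered in this section and lumps everything else under "OTHER".
    static String categoryName(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER: return "UPPERCASE_LETTER";
            case Character.LOWERCASE_LETTER: return "LOWERCASE_LETTER";
            case Character.TITLECASE_LETTER: return "TITLECASE_LETTER";
            case Character.MODIFIER_LETTER:  return "MODIFIER_LETTER";
            case Character.OTHER_LETTER:     return "OTHER_LETTER";
            case Character.LETTER_NUMBER:    return "LETTER_NUMBER";
            default:                         return "OTHER";
        }
    }

    public static void main(String[] args) {
        // The sample characters used in this section, plus the digit '1'.
        int[] samples = {'U', 'u', 0x01F2, 0x02B0, 0x05D0, 0x2164, '1'};
        for (int codePoint : samples) {
            System.out.printf("U+%04X -> %s%n", codePoint, categoryName(codePoint));
        }
    }
}
```

The digit ‘1’ is included only as a contrast: its category (DECIMAL_DIGIT_NUMBER) is not one of the six discussed here, so the sketch reports it as OTHER.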
Next, let’s look at the Character#isAlphabetic method.
3. Character#isAlphabetic
First, let’s look at the signature of the isAlphabetic() method:
public static boolean isAlphabetic(int codePoint)
This takes the Unicode code point as the input parameter and returns true if the specified Unicode code point is alphabetic and false otherwise.
A character is alphabetic if its general category type is any of the following:
- UPPERCASE_LETTER
- LOWERCASE_LETTER
- TITLECASE_LETTER
- MODIFIER_LETTER
- OTHER_LETTER
- LETTER_NUMBER
Additionally, a character is alphabetic if it has the contributory property Other_Alphabetic, as defined by the Unicode Standard.
Let’s look at a few examples of alphabetic characters:
assertTrue(Character.isAlphabetic('A'));
assertTrue(Character.isAlphabetic('\u01f2'));
In the above examples, we pass the UPPERCASE_LETTER ‘A’ and the TITLECASE_LETTER ‘\u01f2’ (the Latin capital letter D with small letter Z) to the isAlphabetic() method, which returns true in both cases.
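The Other_Alphabetic case mentioned above is easier to see with an example. The sketch below assumes two sample characters that the Unicode Character Database lists as Other_Alphabetic: U+24B6 (circled Latin capital letter A, a symbol) and U+0345 (combining Greek ypogegrammeni, a combining mark). The class and helper names are hypothetical:

```java
public class OtherAlphabeticDemo {

    // True if the general category is one of the six listed above.
    static boolean hasAlphabeticCategory(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.LETTER_NUMBER:
                return true;
            default:
                return false;
        }
    }

    // Hypothetical helper: alphabetic, but not because of its general category,
    // which leaves the Other_Alphabetic property as the reason.
    static boolean isAlphabeticOnlyByProperty(int codePoint) {
        return Character.isAlphabetic(codePoint) && !hasAlphabeticCategory(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(isAlphabeticOnlyByProperty(0x24B6)); // circled capital A
        System.out.println(isAlphabeticOnlyByProperty(0x0345)); // combining ypogegrammeni
        System.out.println(isAlphabeticOnlyByProperty('A'));    // alphabetic by category
    }
}
```

For the first two characters, the general category alone wouldn’t qualify them, yet isAlphabetic() still accepts them. The character ‘A’ is alphabetic through its category, so the helper reports false for it.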
4. Character#isLetter
Java’s Character class provides the isLetter() method to determine if a specified character is a letter. Let’s look at the method signature:
public static boolean isLetter(char ch)
It takes a character as an input parameter and returns true if the specified character is a letter and false otherwise.
A character is considered to be a letter if its general category type, provided by Character#getType method, is any of the following:
- UPPERCASE_LETTER
- LOWERCASE_LETTER
- TITLECASE_LETTER
- MODIFIER_LETTER
- OTHER_LETTER
However, this method cannot handle supplementary characters. To handle all Unicode characters, including supplementary characters, Java’s Character class provides an overloaded version of the isLetter() method:
public static boolean isLetter(int codePoint)
This method can handle all the Unicode characters as it takes a Unicode code point as the input parameter. Furthermore, it returns true if the specified Unicode code point is a letter as we defined earlier.
Let’s look at a few examples of characters that are letters:
assertTrue(Character.isLetter('a'));
assertTrue(Character.isLetter('\u02b0'));
In the above examples, we pass the LOWERCASE_LETTER ‘a’ and the MODIFIER_LETTER ‘\u02b0’ (the modifier letter small H) to the isLetter() method, which returns true in both cases.
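The practical effect of the two isLetter() overloads shows up when a string contains supplementary characters. The following sketch assumes U+10400 (Deseret capital letter long I) as the sample supplementary letter. Both counting helpers are hypothetical, and the code-point version relies on String#codePoints, so it needs Java 8 or later:

```java
public class IsLetterOverloads {

    // Counts letters one char at a time, so surrogate halves are tested separately.
    static long countLettersByChar(String text) {
        long count = 0;
        for (char ch : text.toCharArray()) {
            if (Character.isLetter(ch)) {
                count++;
            }
        }
        return count;
    }

    // Counts letters one code point at a time, using the isLetter(int) overload.
    static long countLettersByCodePoint(String text) {
        return text.codePoints().filter(Character::isLetter).count();
    }

    public static void main(String[] args) {
        // 'a', then U+10400 stored as a surrogate pair, then the digit '1'
        String text = "a" + new String(Character.toChars(0x10400)) + "1";

        System.out.println(countLettersByChar(text));      // only 'a' is recognized
        System.out.println(countLettersByCodePoint(text)); // 'a' and U+10400
    }
}
```

When text may contain characters outside the Basic Multilingual Plane, iterating over code points rather than chars is usually the safer choice.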
5. Compare and Contrast
Finally, we can see that all letters are alphabetic characters, but not all alphabetic characters are letters.
In other words, the isAlphabetic() method returns true if a character is a letter or has the general category LETTER_NUMBER. It also returns true if the character has the Other_Alphabetic property defined by the Unicode Standard.
First, let’s look at an example of a character that is both a letter and alphabetic, the character ‘a’:
assertTrue(Character.isLetter('a'));
assertTrue(Character.isAlphabetic('a'));
Both the isLetter() and isAlphabetic() methods return true when we pass them the character ‘a’.
Next, let’s look at an example of a character that is alphabetic but not a letter. In this case, we’ll use the Unicode character ‘\u2164’, which represents the Roman Numeral Five:
assertFalse(Character.isLetter('\u2164'));
assertTrue(Character.isAlphabetic('\u2164'));
When we pass the Unicode character ‘\u2164’ to the isLetter() method, it returns false. On the other hand, when we pass it to the isAlphabetic() method, it returns true.
For plain English text, the distinction makes no difference, since every letter of the English alphabet is both a letter and alphabetic. For other scripts and symbols, however, the two methods can give different results.
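The examples above test single characters, but the relationship can also be checked across the whole Unicode range. The sketch below is one way to do that. The class and method names are hypothetical, and the second count is not asserted to be any particular number because it depends on the Unicode version that the running JDK supports:

```java
public class LetterSubsetCheck {

    // Counts code points that isLetter() accepts but isAlphabetic() rejects.
    static int lettersThatAreNotAlphabetic() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isLetter(cp) && !Character.isAlphabetic(cp)) {
                count++;
            }
        }
        return count;
    }

    // Counts code points that isAlphabetic() accepts but isLetter() rejects.
    static int alphabeticsThatAreNotLetters() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isAlphabetic(cp) && !Character.isLetter(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Letters that are not alphabetic: " + lettersThatAreNotAlphabetic());
        System.out.println("Alphabetic non-letters: " + alphabeticsThatAreNotLetters());
    }
}
```

The first count should be zero, since every letter category is also in the alphabetic list. The second should be greater than zero, because it includes the LETTER_NUMBER and Other_Alphabetic characters.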
6. Conclusion
In this article, we learned about several general categories of Unicode code points. Moreover, we covered the similarities and differences between the isAlphabetic() and isLetter() methods.
As always, all these code samples are available over on GitHub.