1. Introduction

We often need to convert between a String and a byte array in Java. In this tutorial, we’ll examine these operations in detail.

First, we’ll look at various ways to convert a String to a byte array. Then we’ll look at similar operations in reverse.

2. Converting a String to Byte Array

A String is stored as a sequence of Unicode characters in Java. To convert it to a byte array, we translate the sequence of characters into a sequence of bytes. For this translation, we use an instance of Charset. This class specifies a mapping between a sequence of chars and a sequence of bytes.

We refer to the above process as encoding.

In Java, we can encode a String into a byte array in multiple ways. Let’s look at each of them in detail with examples.

2.1. Using String.getBytes()

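The String class provides three overloaded getBytes methods to encode a String into a byte array:

  • getBytes() – encodes using the platform’s default charset
  • getBytes(String charsetName) – encodes using the named charset
  • getBytes(Charset charset) – encodes using the provided charset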

First, let’s encode a string using the platform’s default charset:

String inputString = "Hello World!";
byte[] byteArray = inputString.getBytes();

The above method is platform-dependent, as it uses the platform’s default charset. We can get this charset by calling Charset.defaultCharset().
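As a small sketch of this relationship, the example below prints the default charset and checks that the no-argument overload produces the same bytes as passing Charset.defaultCharset() explicitly. The class and method names are our own, chosen purely for illustration. On JDK 18 and later, the default is UTF-8 unless it’s overridden, while older versions derive it from the operating system’s settings:

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class, not part of the original article's code.
class DefaultCharsetDemo {

    // Encodes with the no-argument overload, which relies on the default charset.
    static byte[] encodeImplicitly(String input) {
        return input.getBytes();
    }

    // Encodes by naming the default charset explicitly.
    static byte[] encodeExplicitly(String input) {
        return input.getBytes(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        byte[] implicit = encodeImplicitly("Hello World!");
        byte[] explicit = encodeExplicitly("Hello World!");
        System.out.println("Same bytes: " + Arrays.equals(implicit, explicit));
    }
}
```

Because the printed charset can differ from machine to machine, code that exchanges bytes with other systems is usually safer with an explicit charset.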

Then let’s encode a string using a named charset:

@Test
public void whenGetBytesWithNamedCharset_thenOK()
  throws UnsupportedEncodingException {
    String inputString = "Hello World!";
    String charsetName = "IBM01140";

    // Pass the charset by name; the lookup happens at runtime
    byte[] byteArray = inputString.getBytes(charsetName);

    assertArrayEquals(
      new byte[] { -56, -123, -109, -109, -106, 64, -26,
        -106, -103, -109, -124, 90 },
      byteArray);
}

This method throws an UnsupportedEncodingException if the named charset isn’t supported.
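To illustrate the failure case, here’s a minimal sketch that wraps the checked exception in an Optional. The class name, the method name, and the deliberately bogus charset name "X-NO-SUCH-CHARSET" are all assumptions made for this example:

```java
import java.io.UnsupportedEncodingException;
import java.util.Optional;

// Hypothetical helper, not part of the original article's code.
class NamedCharsetDemo {

    // Returns the encoded bytes, or an empty Optional when the JVM
    // doesn't recognize the charset name.
    static Optional<byte[]> encode(String input, String charsetName) {
        try {
            return Optional.of(input.getBytes(charsetName));
        } catch (UnsupportedEncodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("Hello", "UTF-8").isPresent());
        System.out.println(encode("Hello", "X-NO-SUCH-CHARSET").isPresent());
    }
}
```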

The behavior of the above two versions is unspecified if the input contains characters that aren’t supported by the charset. In contrast, the third version always uses the charset’s default replacement byte array to encode unsupported input.

Next, let’s call the third version of the getBytes() method, and pass an instance of Charset:

@Test
public void whenGetBytesWithCharset_thenOK() {
    String inputString = "Hello ਸੰਸਾਰ!";
    Charset charset = Charset.forName("ASCII");

    byte[] byteArray = inputString.getBytes(charset);

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 63, 63, 63,
        63, 63, 33 },
      byteArray);
}

Here we’re using the factory method Charset.forName to get an instance of the Charset. This method throws an IllegalCharsetNameException if the name of the requested charset is invalid, and an UnsupportedCharsetException if the charset isn’t supported in the current JVM. Both are runtime exceptions.
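The two failure modes of Charset.forName can be told apart as in the sketch below. The class name, method name, and sample charset names are hypothetical; only the Charset API calls come from the JDK:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical demo class, not part of the original article's code.
class CharsetLookupDemo {

    // Describes what happens when we look up the given charset name.
    static String describeLookup(String name) {
        try {
            return "found " + Charset.forName(name).name();
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that no charset name may contain
            return "illegal name";
        } catch (UnsupportedCharsetException e) {
            // The name is well-formed, but this JVM has no such charset
            return "unsupported";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLookup("UTF-8"));
        System.out.println(describeLookup("not a charset!"));
        System.out.println(describeLookup("X-NO-SUCH-CHARSET"));
    }
}
```

When we only need a yes-or-no answer for a well-formed name, Charset.isSupported(name) avoids the UnsupportedCharsetException path.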

However, some charsets are guaranteed to be available on every Java platform. The StandardCharsets class defines constants for these charsets.
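The constants are US_ASCII, ISO_8859_1, UTF_8, UTF_16, UTF_16BE, and UTF_16LE. As a rough illustration of how much the choice matters, the sketch below encodes a single accented character (U+00E9) with three of them. The class and method names are our own:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original article's code.
class StandardCharsetsDemo {

    // Latin small letter e with acute (U+00E9), written as an escape so the
    // example doesn't depend on the encoding of the source file itself.
    static final String E_ACUTE = "\u00e9";

    static byte[] encode(Charset charset) {
        return E_ACUTE.getBytes(charset);
    }

    public static void main(String[] args) {
        // One byte in ISO-8859-1
        System.out.println(Arrays.toString(encode(StandardCharsets.ISO_8859_1)));
        // Two bytes in UTF-8
        System.out.println(Arrays.toString(encode(StandardCharsets.UTF_8)));
        // Not representable in US-ASCII, so the replacement byte 63 is used
        System.out.println(Arrays.toString(encode(StandardCharsets.US_ASCII)));
    }
}
```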

Finally, let’s encode using one of the standard charsets:

@Test
public void whenGetBytesWithStandardCharset_thenOK() {
    String inputString = "Hello World!";
    Charset charset = StandardCharsets.UTF_16;

    byte[] byteArray = inputString.getBytes(charset);

    assertArrayEquals(
      new byte[] { -2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0,
        111, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33 },
      byteArray);
}
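The first two bytes in the expected array, -2 and -1, are the byte-order mark (0xFE 0xFF) that the UTF_16 charset writes when encoding. The sketch below, with hypothetical class and method names, compares it with the UTF_16BE and UTF_16LE variants, which fix the byte order and write no mark:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original article's code.
class Utf16ByteOrderDemo {

    // UTF_16 encodes big-endian and prepends a byte-order mark.
    static byte[] withByteOrderMark(String input) {
        return input.getBytes(StandardCharsets.UTF_16);
    }

    // UTF_16BE encodes big-endian without a byte-order mark.
    static byte[] bigEndian(String input) {
        return input.getBytes(StandardCharsets.UTF_16BE);
    }

    // UTF_16LE encodes little-endian without a byte-order mark.
    static byte[] littleEndian(String input) {
        return input.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withByteOrderMark("Hi"))); // [-2, -1, 0, 72, 0, 105]
        System.out.println(Arrays.toString(bigEndian("Hi")));         // [0, 72, 0, 105]
        System.out.println(Arrays.toString(littleEndian("Hi")));      // [72, 0, 105, 0]
    }
}
```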

Thus, we have completed the review of the various getBytes versions. Next, let’s look into the method provided by Charset itself.

2.2. Using Charset.encode()

The Charset class provides encode(), a convenient method that encodes Unicode characters into bytes. This method always replaces invalid input and unmappable characters using the charset’s default replacement byte array.

Let’s use the encode method to convert a String into a byte array:

@Test
public void whenEncodeWithCharset_thenOK() {
    String inputString = "Hello ਸੰਸਾਰ!";
    Charset charset = StandardCharsets.US_ASCII;

    byte[] byteArray = charset.encode(inputString).array();

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 63, 63, 63, 63, 63, 33 },
      byteArray);
}

As we can see above, unsupported characters have been replaced with the charset’s default replacement byte, 63, which is the ASCII code for the question mark.
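One caveat with array(): it returns the buffer’s entire backing array, and the ByteBuffer API doesn’t promise that this array is exactly as long as the encoded content. For some charsets and inputs, it may be longer, leaving unused bytes at the end. A more defensive sketch, using a hypothetical helper of our own, copies only the bytes between the buffer’s position and limit:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the original article's code.
class EncodeToExactBytes {

    // Encodes the input and copies out exactly the bytes that were written,
    // ignoring any spare capacity in the buffer's backing array.
    static byte[] encode(String input, Charset charset) {
        ByteBuffer buffer = charset.encode(input);
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Hello World!", StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(bytes));
    }
}
```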

The approaches we have used so far use the CharsetEncoder class internally to perform encoding. Let’s examine this class in the next section.

2.3. CharsetEncoder

CharsetEncoder transforms Unicode characters into a sequence of bytes for a given charset. Moreover, it provides fine-grained control over the encoding process.

Let’s use this class to convert a String into a byte array:

@Test
public void whenUsingCharsetEncoder_thenOK()
  throws CharacterCodingException {
    String inputString = "Hello ਸੰਸਾਰ!";
    CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
    encoder.onMalformedInput(CodingErrorAction.IGNORE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .replaceWith(new byte[] { 0 });

    byte[] byteArray = encoder.encode(CharBuffer.wrap(inputString))
                          .array();

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 0, 0, 0, 0, 0, 33 },
      byteArray);
}

Here we’re creating an instance of CharsetEncoder by calling the newEncoder method on a Charset object.

Then we’re specifying actions for error conditions by calling the onMalformedInput() and onUnmappableCharacter() methods. We can specify the following actions:

  • IGNORE – drop the erroneous input
  • REPLACE – replace the erroneous input
  • REPORT – report the error by returning a CoderResult object or throwing a CharacterCodingException

Furthermore, we’re using the replaceWith() method to specify the replacement byte array.
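For comparison, here’s a minimal sketch of the REPORT action on the encoding side. With REPORT, encode() throws a CharacterCodingException instead of substituting bytes, which we can use to validate input. The class and method names are assumptions made for this example:

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original article's code.
class StrictAsciiEncoder {

    // Returns true only if every character of the input can be encoded as US-ASCII.
    static boolean isPureAscii(String input) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown for malformed or unmappable input when the action is REPORT
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("Hello World!"));
        System.out.println(isPureAscii("Hello \u0a38!"));
    }
}
```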

Thus, we have completed the review of various approaches to convert a String to a byte array. Next, let’s look at the reverse operation.

3. Converting a Byte Array to String

We refer to the process of converting a byte array to a String as decoding. Similar to encoding, this process requires a Charset.

However, we can’t just use any charset for decoding a byte array. In particular, we should use the charset that encoded the String into the byte array.
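To see why the charset has to match, consider the sketch below, which encodes a string with one charset and decodes it with another. The class and method names are hypothetical. Decoding UTF-8 bytes as ISO-8859-1 doesn’t fail; it silently produces different characters:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article's code.
class CharsetMismatchDemo {

    // Encodes with one charset and decodes the resulting bytes with another.
    static String roundTrip(String input, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = input.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // The last letter is U+00E9, written as an escape to stay independent
        // of the source file encoding.
        String original = "caf\u00e9";

        String matching = roundTrip(original,
          StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = roundTrip(original,
          StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(matching));   // true
        System.out.println(original.equals(mismatched)); // false
        System.out.println(mismatched.length());         // 5
    }
}
```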

We can also convert a byte array to a String in many ways. Let’s examine each of them in detail.

3.1. Using the String Constructor

The String class has a few constructors which take a byte array as input. They’re all similar to the getBytes method, but work in reverse.

So let’s convert a byte array to String using the platform’s default charset:

@Test
public void whenStringConstructorWithDefaultCharset_thenOK() {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, 87, 111, 114,
      108, 100, 33 };

    String string = new String(byteArray);
    
    assertNotNull(string);
}

Note that we don’t assert anything here about the contents of the decoded string. This is because it may decode to something different, depending on the platform’s default charset.

For this reason, we should generally avoid this method.

Then let’s use a named charset for decoding:

@Test
public void whenStringConstructorWithNamedCharset_thenOK()
  throws UnsupportedEncodingException {
    String charsetName = "IBM01140";
    byte[] byteArray = { -56, -123, -109, -109, -106, 64, -26, -106,
      -103, -109, -124, 90 };

    String string = new String(byteArray, charsetName);
        
    assertEquals("Hello World!", string);
}

This constructor throws an UnsupportedEncodingException if the named charset isn’t available on the JVM.

Next, let’s use a Charset object to do decoding:

@Test
public void whenStringConstructorWithCharSet_thenOK() {
    Charset charset = Charset.forName("UTF-8");
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, 87, 111, 114,
      108, 100, 33 };

    String string = new String(byteArray, charset);

    assertEquals("Hello World!", string);
}

Finally, let’s use a standard Charset for the same:

@Test
public void whenStringConstructorWithStandardCharSet_thenOK() {
    Charset charset = StandardCharsets.UTF_16;
        
    byte[] byteArray = { -2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0,
      111, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33 };

    String string = new String(byteArray, charset);

    assertEquals("Hello World!", string);
}
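The String class also offers constructors that decode only part of an array, taking an offset and a length in addition to the charset. As a brief sketch, with class and method names of our own, we can decode just the second word of the byte array used earlier:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article's code.
class SliceDecodingDemo {

    // Decodes `length` bytes starting at `offset`, interpreting them as US-ASCII.
    static String decodeSlice(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bytes = { 72, 101, 108, 108, 111, 32, 87, 111, 114,
          108, 100, 33 };

        // Skip "Hello " (6 bytes) and decode the next 5 bytes
        System.out.println(decodeSlice(bytes, 6, 5)); // World
    }
}
```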

So far, we have converted a byte array into a String using the constructor, and now we’ll look into the other approaches.

3.2. Using Charset.decode()

The Charset class provides the decode() method, which converts a ByteBuffer to a CharBuffer. We can then call toString() on the result to get a String:

@Test
public void whenDecodeWithCharset_thenOK() {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, -10, 111,
      114, 108, -63, 33 };
    Charset charset = StandardCharsets.US_ASCII;
    String string = charset.decode(ByteBuffer.wrap(byteArray))
                      .toString();

    assertEquals("Hello �orl�!", string);
}

Here, the invalid input is replaced with the default replacement character for the charset.
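For decoders, this default replacement is the Unicode replacement character, U+FFFD. The sketch below, using hypothetical class and method names, checks both the decoded output and the value reported by the decoder’s replacement() method:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article's code.
class ReplacementCharDemo {

    // Decodes the bytes as US-ASCII, replacing anything that isn't valid ASCII.
    static String decodeAscii(byte[] bytes) {
        return StandardCharsets.US_ASCII.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // 72 and 105 are valid ASCII; -10 (0xF6) is not
        String decoded = decodeAscii(new byte[] { 72, 105, -10 });
        System.out.println(decoded.charAt(2) == '\uFFFD'); // true

        String replacement = StandardCharsets.US_ASCII.newDecoder().replacement();
        System.out.println("\uFFFD".equals(replacement)); // true
    }
}
```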

3.3. CharsetDecoder

Note that all of the previous approaches for decoding internally use the CharsetDecoder class. We can use this class directly for fine-grained control over the decoding process:

@Test
public void whenUsingCharsetDecoder_thenOK()
  throws CharacterCodingException {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, -10, 111, 114,
      108, -63, 33 };
    CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder();

    decoder.onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .replaceWith("?");

    String string = decoder.decode(ByteBuffer.wrap(byteArray))
                      .toString();

    assertEquals("Hello ?orl?!", string);
}

Here we’re replacing invalid inputs and unsupported characters with “?”.

If we want to be informed in case of invalid inputs, we can change the decoder:

decoder.onMalformedInput(CodingErrorAction.REPORT)
  .onUnmappableCharacter(CodingErrorAction.REPORT);
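With REPORT configured, decode() throws a CharacterCodingException when it meets invalid input. A minimal sketch of handling that, with a hypothetical class and method of our own, might look like this:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of the original article's code.
class StrictAsciiDecoder {

    // Returns the decoded text, or an empty Optional if any byte isn't valid US-ASCII.
    static Optional<String> decode(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
        } catch (CharacterCodingException e) {
            // Reported instead of being silently replaced
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] { 72, 101, 108, 108, 111 }).isPresent());
        System.out.println(decode(new byte[] { 72, 101, -10 }).isPresent());
    }
}
```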

4. Conclusion

In this article, we investigated multiple ways to convert a String to a byte array, and vice versa. We should choose the appropriate method based on the input data, as well as the level of control required for invalid inputs.

As usual, the full source code can be found over on GitHub.