1. Overview

In this tutorial, we’ll review how to convert binary input streams to text data in Scala. At a low level, Scala relies on the standard Java classes from the java.io package, but it also offers utilities that make working with them easier.

For instance, the class Source, inside the scala.io package, provides a smooth conversion from java.io.InputStream instances to string data, so we’ll also explore its features.

2. Creation of Source Instances

The Source companion object offers convenient factory methods that create instances backed by different InputStream implementations. Let’s explore some of the most typical uses.

To illustrate, we can read remote data via the method fromURL:

lazy val sourceFromUrl: Source = Source.fromURL("https://google.com")

In this particular example, it fetches the Google home page content, but, in practice, it could be used to query JSON from some API for further usage.

To read data from our application classpath, we can use fromResource:

lazy val sourceFromClassPath: Source = Source.fromResource("com.baeldung.scala.io/file_in_classpath.txt")

The typical use case of this method is reading test data from files placed into the resources folder.

Another Source factory method fromFile allows us to read content from a file:

lazy val sourceFromFile: Source = Source.fromFile("./some_text_file")
lazy val sourceFromFileWithCustomEncoder: Source = Source.fromFile("./some_text_file", enc = "Cp1252")

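As we can see in the code example, fromFile has several overloaded versions. The first one relies on the implicit default codec, which follows the JVM’s default charset (UTF-8 on most modern setups, but not guaranteed everywhere), while the second one decodes the bytes with the encoding we provide.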
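
To see why the encoding argument matters, here is a minimal sketch that reads the same bytes twice with different charsets. The temporary file, the sample text, and the readWith helper are assumptions made up for this illustration, and ISO-8859-1 is used instead of Cp1252 only because every JVM is required to support it:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

// Hypothetical sample file: "café" stored as UTF-8, where 'é' takes two bytes (0xC3 0xA9)
val sampleFile: Path = {
  val path = Files.createTempFile("utf8_sample", ".txt")
  path.toFile.deleteOnExit()
  Files.write(path, "caf\u00e9".getBytes(StandardCharsets.UTF_8))
}

// Illustrative helper (not part of scala.io): reads the whole file with the given charset name
def readWith(enc: String): String = {
  val source = Source.fromFile(sampleFile.toFile, enc)
  try source.mkString
  finally source.close()
}

val decodedAsUtf8: String = readWith("UTF-8")        // the original four characters
val decodedAsLatin1: String = readWith("ISO-8859-1") // each byte becomes its own character
```

With the matching charset we get the original text back, while the mismatched one silently produces five characters instead of four. So, when we know the encoding of the data, it is usually safer to pass it explicitly.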

Besides the ones listed in the examples above, Source also contains several other useful factory methods, which we can find in the class source code.
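
One of those methods is fromInputStream, which wraps an arbitrary java.io.InputStream directly. As a minimal sketch, we can use an in-memory ByteArrayInputStream as a stand-in for a socket or an HTTP response body (the sample text and the value names are assumptions for this illustration):

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import scala.io.Source

// In-memory stream standing in for a real binary source (illustrative only)
val bytes: Array[Byte] = "first line\nsecond line".getBytes(StandardCharsets.UTF_8)
val inputStream: InputStream = new ByteArrayInputStream(bytes)

// Name the encoding explicitly instead of relying on the implicit default codec
val sourceFromStream: Source = Source.fromInputStream(inputStream, "UTF-8")

val text: String =
  try sourceFromStream.mkString
  finally sourceFromStream.close() // closing the Source also closes the wrapped stream
```

Here, the text value contains both lines exactly as they were encoded into the byte array.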

3. Processing String Data in Source

When we obtain a Source instance, we have several options. First, we can eagerly convert the whole underlying stream data to a plain String with the mkString method:

val oneLineSource = Source.fromResource("com.baeldung.scala.io/one_line_string.txt")
try {
  oneLineSource.mkString shouldEqual "One line string"
} finally {
  oneLineSource.close()
}

Secondly, if we want more fine-grained processing, we can use the method getLines. This method returns an Iterator[String] instance, so we can process all the lines sequentially:

// Every line in the test file starts with the 'String' prefix
val fourLinesSource = Source.fromResource("com.baeldung.scala.io/four_lines_string.txt")
try {
  fourLinesSource.getLines().foreach(line => assert(line.startsWith("String")))
} finally {
  fourLinesSource.close()
}

It’s also important to highlight the try/finally block in both examples above. To prevent file or socket descriptor leaks, we should release the underlying resources by calling the close() method once we’ve finished processing the source data.
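
If we’re on Scala 2.13 or later, scala.util.Using offers a more compact alternative to a hand-written try/finally. The sketch below assumes that version and uses Source.fromString as a stand-in for a file- or network-backed source:

```scala
import scala.io.Source
import scala.util.{Try, Using}

// Using(...) closes the source afterwards and captures any failure in a Try
val firstLine: Try[String] =
  Using(Source.fromString("alpha\nbeta\ngamma")) { source =>
    source.getLines().next()
  }

// Using.resource also closes the source, but rethrows exceptions instead of wrapping them
val lineCount: Int =
  Using.resource(Source.fromString("alpha\nbeta\ngamma")) { source =>
    source.getLines().size
  }
```

Both variants work because Source implements java.io.Closeable. Which one we pick mostly depends on whether we prefer to handle errors as values or as exceptions.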

4. Dealing with Large or Infinite InputStreams

Unfortunately, when working with different data sources, not all data we encounter is small enough to fit into memory. So, the eager approach will not work in such cases.

Fortunately, the Source methods that return iterators are lazy: they read data only on demand, so we can use them to work with large or even infinite streams. Moreover, the class Source itself is an implementation of the Iterator[Char] trait. Thus, we can use its methods for partial processing of infinite streams:

val infiniteSource = Source.fromIterable {
  new Iterable[Char] {
    override def iterator: Iterator[Char] = Iterator.continually('A')
  }
}
infiniteSource.slice(100000, 100010).mkString shouldEqual "AAAAAAAAAA"

Though the above example is purely synthetic, it illustrates an approach that also applies to real-world sources, such as genomic data files, that are too large to load into memory at once.
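
The same idea carries over to line-based processing. As a rough sketch, the hypothetical endlessSequenceLines helper below builds a source that repeats the line "ACGT" forever, and we consume only as many lines as we need while keeping a single running total in memory:

```scala
import scala.io.Source

// Hypothetical endless source: the line "ACGT" repeated forever (illustrative only)
def endlessSequenceLines(): Source = Source.fromIterable {
  new Iterable[Char] {
    override def iterator: Iterator[Char] =
      Iterator.continually("ACGT\n").flatMap(_.iterator)
  }
}

// getLines() is lazy, so take(3) stops reading after the third line
val firstThree: List[String] = endlessSequenceLines().getLines().take(3).toList

// foldLeft keeps only the running total, never the lines themselves
val basesInFirstThousandLines: Long =
  endlessSequenceLines()
    .getLines()
    .take(1000)
    .foldLeft(0L)((total, line) => total + line.length)
```

These sources aren’t backed by a file or socket, so there is nothing to close here. With a real InputStream, we would still wrap the processing in try/finally as shown earlier.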

5. Conclusion

In this tutorial, we explored how some of the features of the scala.io.Source class help to make conversions from InputStream data sources to text data easier in Scala.

As usual, the source code with all the examples is available over on GitHub.

