1. Overview
In this tutorial, we’re going to take a deep look into file IO operations in Scala. We’ll look into reading from files and writing to files.
We’ll also look into some bad practices to avoid when performing any sort of file IO, as well as some errors that can occur if we don’t handle these operations properly.
2. Writing to a File
Scala’s standard library doesn’t provide its own API for writing to files, so we rely on Java classes for this. Because Scala is interoperable with Java, this isn’t a problem, but there are some slight differences when calling Java code from Scala that can cause issues if we don’t handle them properly.
Let’s take a look at some of the various ways of writing to a file in Scala using Java classes.
2.1. FileWriter
The FileWriter is one of the cleanest ways to write to a file. It has a neat and concise syntax that allows us to write various data types, from characters to strings, into a file.
Here’s an example of using a FileWriter to write a simple message to a file:
import java.io.{File, FileWriter}
val fileWriter = new FileWriter(new File("/tmp/hello.txt"))
fileWriter.write("hello there")
fileWriter.close()
It’s crucial to close the FileWriter after we finish writing, as failing to do so can yield unpredictable results. In some cases, we may see only part of what was written, while in other cases, we may see nothing.
Closing the FileWriter ensures that all the contents that were to be written are properly flushed and correctly written to the file.
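As a minimal sketch of this advice, we can put the close call in a finally block so that it runs even if the write throws. The SafeFileWriterSketch object and its writeMessage helper are hypothetical names used only for illustration:

```scala
import java.io.{File, FileWriter}

object SafeFileWriterSketch {
  // Hypothetical helper: writes `message` to `path` and always closes the writer.
  def writeMessage(path: String, message: String): Unit = {
    val fileWriter = new FileWriter(new File(path))
    try {
      fileWriter.write(message)
    } finally {
      // close() flushes any buffered characters and releases the file handle
      fileWriter.close()
    }
  }
}
```

Calling SafeFileWriterSketch.writeMessage("/tmp/hello.txt", "hello there") should leave the full message in the file, because the writer is closed before the method returns.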
2.2. PrintWriter
The PrintWriter is another clean way to write to a file. It has a similarly concise syntax, with the added advantage of methods like printf and format, which the FileWriter class doesn’t provide, to help us write formatted text.
Let’s see how to use a PrintWriter to write formatted text:
import java.io.{File, PrintWriter}
val writer = new PrintWriter(new File("data.txt"))
val s = "big"
val numberOfLines = 3000000
// printf expects Java objects, so we box the Int explicitly
writer.printf("This is a %s program with %d lines of code", s, Int.box(numberOfLines))
writer.close()
Alternatively, Scala provides the f"" string interpolator, which lets us format strings without calling printf.
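Here’s a small sketch of how the interpolator could replace printf in the example above. The InterpolatorSketch object and its method names are assumptions made for illustration:

```scala
import java.io.{File, PrintWriter}

object InterpolatorSketch {
  // The f interpolator checks the format specifiers against the argument types at compile time.
  def describe(size: String, numberOfLines: Int): String =
    f"This is a $size%s program with $numberOfLines%d lines of code"

  // Hypothetical helper: writes the formatted text to `path`.
  def writeDescription(path: String): Unit = {
    val writer = new PrintWriter(new File(path))
    try writer.write(describe("big", 3000000))
    finally writer.close()
  }
}
```

Because the string is formatted before it reaches the writer, this works equally well with a plain FileWriter.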
2.3. DataOutputStream
The DataOutputStream class helps us to write Java primitive data types efficiently.
Let’s imagine a scenario where we want to write a very long list of double values. A pretty naive approach is to use a regular FileWriter or a PrintWriter, which technically isn’t a bad idea.
Let’s use PrintWriter to write 10,000 double values to a file:
import java.io.{File, FileOutputStream, PrintWriter}
import scala.util.Random
val printWriter = new PrintWriter(new FileOutputStream(new File("data.txt")))
val random = Random
for (_ <- 1 to 10000) {
  printWriter.write(random.nextDouble().toString)
}
printWriter.close()
If we check the size of this file, we see that it’s consistently greater than 150KB. Now, let’s write the same number of double values using a DataOutputStream:
import java.io.{DataOutputStream, File, FileOutputStream}
import scala.util.Random
val random = Random
val dataOutputStream = new DataOutputStream(new FileOutputStream(new File("data.txt")))
for (_ <- 1 to 10000) {
  dataOutputStream.writeDouble(random.nextDouble())
}
dataOutputStream.close()
If we check the size of this file, we see that it’s 80KB. It’s much smaller than the file produced by the PrintWriter class because the DataOutputStream class doesn’t store the data as a string, as the Writer classes do. Instead, it stores the double values as raw binary data.
A double takes 8 bytes, which explains why our file is exactly 80,000 bytes (80KB) for our 10,000 double values. The PrintWriter class converts each double into a String, which uses more bytes.
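To check this comparison ourselves, we could write the same values both ways and compare the resulting file lengths. This is only a sketch: the FileSizeSketch object and its two helpers are hypothetical, and the text size will vary with the values written:

```scala
import java.io.{DataOutputStream, File, FileOutputStream, PrintWriter}

object FileSizeSketch {
  // Writes each value as text and returns the resulting file size in bytes.
  def textSize(values: Seq[Double], file: File): Long = {
    val writer = new PrintWriter(file)
    try values.foreach(v => writer.write(v.toString))
    finally writer.close()
    file.length()
  }

  // Writes each value as 8 raw bytes and returns the resulting file size in bytes.
  def binarySize(values: Seq[Double], file: File): Long = {
    val out = new DataOutputStream(new FileOutputStream(file))
    try values.foreach(v => out.writeDouble(v))
    finally out.close()
    file.length()
  }
}
```

For 10,000 values, binarySize should always report 80,000 bytes, while textSize depends on how many digits each value needs.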
Here’s an example that gets the number of bytes used to store the mathematical value of PI as a double versus a String:
val pi = Math.PI
val dataOutputStream = new DataOutputStream(new FileOutputStream(new File("data.txt")))
dataOutputStream.writeDouble(pi)
val piLengthDouble = dataOutputStream.size() // evaluates to 8
dataOutputStream.flush() // push any buffered bytes to the file
dataOutputStream.close()
val piLengthString = pi.toString.getBytes.length // evaluates to 17
2.4. Handling Exceptions When Writing to a File
We’ve taken a non-exhaustive look at the various ways we can write data to a file, and we saw some areas where we could write data more efficiently. In this section, we’ll take a look at handling exceptions when using Java classes in Scala.
Let’s take a look at the constructor of a FileWriter in Java 8:
public FileWriter(File file) throws IOException {
    super(new FileOutputStream(file));
}
We can clearly see that this constructor can throw an IOException. Because IOException is a checked exception, Java forces us to either handle it or declare it.
But that isn’t the case in Scala. Scala has no checked exceptions, so the compiler doesn’t force us to handle any exception at all. This means that using any of the Java classes to write files can easily lead to an exception being thrown at runtime without us having handled it.
Let’s simulate this by creating a read-only file and attempting to write to that file from our Scala code. We can do that in Linux easily with touch data.txt && chmod 444 data.txt. With this command, we’ve created a new file named “data.txt” and have made the file read-only.
Let’s now try to write the text “Hello World!” to that file:
val fileWriter = new FileWriter(new File("data.txt"))
fileWriter.write("Hello World!")
fileWriter.close()
When we run this code, we get an exception:
Exception in thread "main" java.io.FileNotFoundException: data.txt (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
This isn’t the desired behavior. If we’d done this in Java, the compiler would’ve forced us to deal with the exception, but since Scala lifts this requirement, we can easily write code that blows up unexpectedly.
A better way to deal with this is to wrap the write operation in a try/catch block or in Scala’s Try.
Here’s an example of using the Try data structure to handle any exception that may be thrown in the process:
import scala.util.Try
Try {
  val fileWriter = new FileWriter(new File("data.txt"))
  try {
    fileWriter.write("Hello World!")
  } finally {
    fileWriter.close() // close the writer even if the write fails
  }
}.toEither match {
  case Left(ex) =>
    // handle exception: ex
  case Right(_) =>
    // write operation was successful
}
With this approach, any non-fatal exception thrown at runtime ends up in the Left branch, where we can handle it.
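If we’re on Scala 2.13 or later (an assumption, since the article doesn’t state a version), scala.util.Using offers a similar guarantee with less ceremony: it closes the resource for us and returns a Try. The UsingWriteSketch object and its safeWrite helper are hypothetical names for this sketch:

```scala
import java.io.{File, FileWriter}
import scala.util.{Try, Using}

object UsingWriteSketch {
  // Using.apply closes the writer whether or not the body throws,
  // and captures any non-fatal exception (including one from the constructor) in the Try.
  def safeWrite(path: String, text: String): Try[Unit] =
    Using(new FileWriter(new File(path))) { writer =>
      writer.write(text)
    }
}
```

The returned Try can be matched on directly, or converted with toEither as in the example above.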
2.5. Efficiently Writing to a File
Let’s take the same example where we wrote ten thousand double values to a file. Without buffering, each write can turn into a separate system call and disk write, which is inefficient. If we were doing a single write, we wouldn’t be concerned.
But if we’re going to be doing a lot of writes, we need a way to reduce the number of system calls. One way to do this is to use a BufferedWriter.
A BufferedWriter is more efficient if:
- there are multiple writes between flush or close
- writes are small compared to the buffer size
Here’s an example of using a BufferedWriter to write 10,000 double values to a file:
import java.io.{BufferedWriter, File, PrintWriter}
import scala.util.Random
val random = Random
val bufferedPrintWriter = new BufferedWriter(new PrintWriter(new File("data.txt")))
for (_ <- 1 to 10000) {
  bufferedPrintWriter.write(random.nextDouble().toString)
}
bufferedPrintWriter.close()
In this example, we wrap our PrintWriter in a BufferedWriter. The BufferedWriter has a default buffer size of 8,192 characters.
What this means is that whenever we try to write to the file, the text is first written to the buffer, and only when the buffer is full does it write all its data to disk, thus reducing the number of system writes performed.
To put this in perspective, let’s assume that we want to write the value of PI as a string 10,000 times. We’ve seen that PI as a string uses 17 bytes, one per character. Without any buffering, this could result in up to 10,000 system calls.
But if we use a BufferedWriter, the buffer has to fill up before a system call is made, so the number of system calls drops to roughly (17 * 10,000) / 8192, which is approximately 21.
Although this is a contrived example, the principle remains that a BufferedWriter is more efficient when handling multiple writes.
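The estimate is a ceiling division of the total characters written by the buffer capacity. As a rough sketch (the estimatedFlushes helper is hypothetical, and it ignores any extra buffering done by the classes underneath the BufferedWriter), we could compute it like this:

```scala
object BufferMathSketch {
  // Upper bound on buffer flushes: total characters divided by buffer capacity, rounded up.
  def estimatedFlushes(charsPerWrite: Int, writes: Int, bufferSize: Int = 8192): Int =
    math.ceil(charsPerWrite.toDouble * writes / bufferSize).toInt
}
```

For 10,000 writes of a 17-character string with the default buffer size, BufferMathSketch.estimatedFlushes(17, 10000) evaluates to 21.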
In the case of the DataOutputStream, instead of a BufferedWriter, we can wrap the FileOutputStream inside a BufferedOutputStream.
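A minimal sketch of that wrapping might look like the following. The BufferedBinarySketch object and its writeDoubles helper are hypothetical names:

```scala
import java.io.{BufferedOutputStream, DataOutputStream, File, FileOutputStream}

object BufferedBinarySketch {
  // The BufferedOutputStream collects the 8-byte writes and passes them on in larger chunks.
  def writeDoubles(values: Seq[Double], file: File): Unit = {
    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)))
    try values.foreach(v => out.writeDouble(v))
    finally out.close() // flushes the buffer, then closes the underlying streams
  }
}
```

The file contents are the same as without the buffer. Only the number of writes that reach the operating system changes.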
3. Reading From a File
Unlike writing to a file, Scala provides a native way to handle reading from a file. Scala provides the scala.io.Source.fromFile method for reading from a file, although we can still use reading methods provided by the Java API like FileReader and some other alternatives.
Let’s read some text using the Source.fromFile method:
val fileName = "data.txt"
scala.io.Source.fromFile(fileName).getLines().foreach { line =>
  // do something with line
}
The Source.fromFile method returns a BufferedSource, and its getLines method returns an Iterator that treats any of \r\n, \r, or \n as a line separator, so each element in the sequence is a line from the file.
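To see the separator handling in practice, we could read back a file that mixes all three line endings. This is only a sketch, and the LineSeparatorSketch object and its readLines helper are hypothetical:

```scala
import java.io.File
import scala.io.Source

object LineSeparatorSketch {
  // Collects every line of the file into a List, closing the source afterwards.
  def readLines(file: File): List[String] = {
    val source = Source.fromFile(file)
    try source.getLines().toList
    finally source.close()
  }
}
```

For a file containing the text "a\r\nb\rc\nd", this helper should return the four lines a, b, c, and d, with the separators stripped.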
In this example, we’re using the higher-order function foreach to loop through each of the lines in the text.
Now, let’s see the same read written as a for-comprehension, which the compiler translates into a foreach call:
val fileName = "data.txt"
for (line <- scala.io.Source.fromFile(fileName).getLines()) {
  // do something with line
}
One very important thing to note is that we aren’t closing this BufferedSource. If we keep the program running a bit longer by adding a Thread.sleep call and then run the Linux command lsof | grep "data.txt", we can observe that our “data.txt” file is left open. Failing to close the BufferedSource can easily lead to resource leaks, such as running out of file descriptors in a long-running program.
Here’s how we can close the BufferedSource immediately after use:
val fileName = "data.txt"
val bufferedSource = scala.io.Source.fromFile(fileName)
for (line <- bufferedSource.getLines()) {
  // do something with line
}
bufferedSource.close()
In this example, we close the BufferedSource immediately after using it, and thus, avoid leaking resources.
3.1. Handling Exceptions When Reading From a File
Similar to writing files, Scala doesn’t treat exceptions involved in opening or reading a file, such as IOException, as checked exceptions. This means the compiler doesn’t warn us about any exception that may be thrown.
To guard against this, similar to writing files, we should wrap our read operation in a try/catch block or in a Try.
Let’s see this in action, using the Try data structure to safely read a file:
import scala.util.Try
val fileName = "data.txt"
Try {
  val bufferedSource = scala.io.Source.fromFile(fileName)
  try {
    for (line <- bufferedSource.getLines()) {
      // do something with line
    }
  } finally {
    bufferedSource.close() // runs whether or not reading succeeded
  }
}.toEither match {
  case Left(error) =>
    // handle error
  case Right(_) =>
    // the file was read and closed successfully
}
In this example, we handle exceptions and still close the file, even when the failure happens partway through reading.
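On Scala 2.13 or later (an assumption about the version in use), scala.util.Using can express the same idea and hand back the lines as a Try. The UsingReadSketch object and its safeReadLines helper are hypothetical names for this sketch:

```scala
import scala.io.Source
import scala.util.{Try, Using}

object UsingReadSketch {
  // Using.apply closes the source whether or not reading throws.
  // toList forces the lazy iterator before the source is closed.
  def safeReadLines(fileName: String): Try[List[String]] =
    Using(Source.fromFile(fileName)) { source =>
      source.getLines().toList
    }
}
```

Materializing the lines inside the block matters here: returning the lazy Iterator itself would let it escape after the source has already been closed.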
3.2. Efficiently Reading From a File
It’s important to note that the Iterator returned by the getLines method on the BufferedSource is lazy: it doesn’t evaluate the whole underlying stream up front. In our case, not all the text is immediately read into memory.
We may think that each line we pull from the Iterator performs a read system call, but as we saw with BufferedWriter, an internal buffer has to fill up before a write happens at the operating system level. In the same way, not every line we read results in a system call.
Since we’re using a BufferedSource, an internal buffer is used to read a chunk of data at a time (usually 2KB), and each line is served from that buffer instead.
Only when that buffer is empty does Scala make a system call to read more data.
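If the default buffer doesn’t suit our workload, Source.fromFile has an overload that takes a java.io.File and a buffer size. The following is a sketch only: the BufferSizeSketch object and its readWithBuffer helper are hypothetical, and whether a larger buffer helps depends on the file and the platform:

```scala
import java.io.File
import scala.io.Source

object BufferSizeSketch {
  // Reads all lines using an explicit buffer size instead of Source.DefaultBufSize.
  def readWithBuffer(file: File, bufferSize: Int = Source.DefaultBufSize): List[String] = {
    val source = Source.fromFile(file, bufferSize)
    try source.getLines().toList
    finally source.close()
  }
}
```

The lines returned are the same regardless of the buffer size. Only the amount of data fetched per underlying read changes.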
In some cases, we may want to read the whole string at once. This shouldn’t be a problem if we’re dealing with small text.
Let’s look at an example that reads the whole content of a file into memory at once:
val fileName = "data.txt"
val bufferedSource = scala.io.Source.fromFile(fileName)
val text = bufferedSource.getLines().mkString
bufferedSource.close()
In this example, by calling mkString on the Iterator, we force our code to read the whole contents of the file into memory at once. Note that getLines strips the line separators, so the resulting string joins the lines without them.
If we’re dealing with very large files, it’s not advisable to read the whole file into memory at once. Instead, we should use the Iterator to stream the text and process it one line at a time.
Reading a huge file into memory at once greatly increases our chance of running into an OutOfMemoryError.
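As an illustration of the streaming approach, we could fold over the Iterator so that only the current line and a running total are kept in memory. The StreamingReadSketch object and its totalCharacters helper are hypothetical names:

```scala
import scala.io.Source

object StreamingReadSketch {
  // Sums the length of every line without ever holding the whole file in memory.
  def totalCharacters(fileName: String): Long = {
    val source = Source.fromFile(fileName)
    try source.getLines().foldLeft(0L)((total, line) => total + line.length)
    finally source.close()
  }
}
```

The same pattern applies to any per-line processing whose result is much smaller than the file itself, such as counting, filtering, or aggregating.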
3.3. Reading From Files Using the Java API
It’s worth noting that it’s also possible to read from files using the Java API.
Here’s an example of using the Java FileReader to read a file:
import java.io.{BufferedReader, FileReader}
val fileReader = new BufferedReader(new FileReader(fileName))
def handleRead(line: String): Unit = {
  if (line != null) { // null marks the end of the file
    // handle line that was read
    handleRead(fileReader.readLine())
  }
}
handleRead(fileReader.readLine())
fileReader.close()
In this example, we defined a tail-recursive function that keeps reading lines from the file until it reaches the very end, which readLine signals by returning null. Checking for null before handling each line also keeps an empty file from being processed as a null line.
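As an alternative sketch, we could wrap readLine in an Iterator so that the null check lives in one place. The ReaderIteratorSketch object and its readAllLines helper are hypothetical names, and the helper collects all lines into a List, so it’s only suitable for files that fit in memory:

```scala
import java.io.{BufferedReader, FileReader}

object ReaderIteratorSketch {
  // Keeps calling readLine until it returns null, which marks the end of the file.
  def readAllLines(fileName: String): List[String] = {
    val reader = new BufferedReader(new FileReader(fileName))
    try Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
    finally reader.close()
  }
}
```

For an empty file, the first readLine call returns null, so the helper returns an empty List.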
4. Conclusion
In this article, we’ve seen how to read from and write to files in Scala, as well as how to handle exceptions and avoid leaking resources by closing them immediately after use.
Code snippets and examples can be found over on GitHub.