Extracting a Tar File in Java | Baeldung中文网

1. Overview

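In this tutorial, we'll explore several ways to extract tar archives using Java libraries. The tar format started out as a Unix utility for bundling files into a single, uncompressed archive. Today, however, tar archives are very commonly compressed with gzip. We'll discuss how compressed and uncompressed tar archives affect our code.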

2. Creating a Base Class for Our Implementations

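To avoid duplicating code, we'll start with an abstract base class that all our implementations extend. It defines an abstract method named untar(), which performs the extraction: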

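import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public abstract class TarExtractor {

    private final InputStream tarStream;
    private final boolean gzip;
    private final Path destination;

    // constructors, plus the getters used by the subclasses:
    // getTarStream(), isGzip() and getDestination()
    // ...

    public abstract void untar() throws IOException;
}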

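Now, let's define a couple of constructors for the base class. The main constructor takes an InputStream for a tar archive, whether or not its content is compressed, along with the location where the files will be extracted: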

protected TarExtractor(InputStream in, boolean gzip, Path destination) throws IOException {
    this.tarStream = in;
    this.gzip = gzip;
    this.destination = destination;

    Files.createDirectories(destination);
}

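Most importantly, we call Files.createDirectories() to create the base directory structure for the extracted files, so we don't have to create the destination folder ourselves. This method also creates any missing parent folders and doesn't fail if the folder already exists. To keep things simple, a boolean flag tells us whether the archive uses gzip, so we don't need code that detects the actual file type from its content.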
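For comparison, detecting gzip from the content doesn't take much code. Every gzip stream begins with the magic bytes 0x1f 0x8b (RFC 1952), so peeking at the first two bytes of a stream that supports mark/reset is usually enough. Below is a minimal JDK-only sketch. GzipSniffer and its methods are hypothetical names that are not part of the article's code, and the check is a heuristic rather than full validation:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: guesses from the first bytes whether a stream is gzip-compressed.
final class GzipSniffer {

    // gzip magic number (RFC 1952)
    private static final int MAGIC_1 = 0x1f;
    private static final int MAGIC_2 = 0x8b;

    private GzipSniffer() {
    }

    // Peeks at the first two bytes, then rewinds; the stream must support mark/reset.
    static boolean looksLikeGzip(InputStream in) throws IOException {
        in.mark(2);
        try {
            return in.read() == MAGIC_1 && in.read() == MAGIC_2;
        } finally {
            in.reset();
        }
    }

    // Wraps the raw stream in a GZIPInputStream only when the magic bytes match.
    static InputStream maybeDecompress(InputStream raw) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(raw);
        return looksLikeGzip(buffered) ? new GZIPInputStream(buffered) : buffered;
    }
}
```

With such a helper, the boolean parameter could be dropped and the constructor could call maybeDecompress() instead.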

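The second constructor accepts a Path to a tar archive and decides from the file name whether the archive is compressed. Note that this only works if the file name is accurate: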

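protected TarExtractor(Path tarFile, Path destination) throws IOException {
    // Path.endsWith() compares whole path elements, so check the file name as a String
    this(Files.newInputStream(tarFile), tarFile.getFileName().toString().endsWith("gz"), destination);
}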
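Path.endsWith(String) compares whole path elements, not characters, so Paths.get("test.tar.gz").endsWith("gz") is false. A suffix check has to go through the file name as a String. Here is a small JDK-only sketch. hasGzipName() is a hypothetical helper, and treating both .gz and .tgz as gzipped is our own assumption:

```java
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper for name-based gzip detection.
final class GzipNameCheck {

    private GzipNameCheck() {
    }

    // Compares the characters of the last path element, unlike Path.endsWith(String),
    // which only matches a whole element named exactly like the argument.
    static boolean hasGzipName(Path file) {
        Path name = file.getFileName();
        if (name == null) {
            return false; // e.g. a root path such as "/"
        }
        String lower = name.toString().toLowerCase(Locale.ROOT);
        return lower.endsWith(".gz") || lower.endsWith(".tgz");
    }
}
```

For example, hasGzipName(Paths.get("untar", "test.tar.gz")) returns true, while endsWith("gz") on the same path returns false.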

Finally, to simplify testing, we'll create a helper interface with a method that loads a tar archive from the resources folder:

public interface Resources {
    
    static InputStream tarGzFile() {
        return Resources.class.getResourceAsStream("/untar/test.tar.gz");
    }
}

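This can be any gzip-compressed tar archive. We wrap it in a method so that each call returns a fresh stream. That avoids "Stream closed" errors when the archive is read more than once.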
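The extractors close the stream they read, so a single shared stream can only be used once. In OpenJDK, reading a closed BufferedInputStream fails with an IOException whose message is "Stream closed". Below is a JDK-only sketch of that failure and of the one-fresh-stream-per-call workaround. FreshStreams, consume() and freshSupplier() are hypothetical, and an in-memory byte array stands in for the test.tar.gz resource:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical demo of why Resources.tarGzFile() hands out a new stream on every call.
final class FreshStreams {

    private static final byte[] DATA = {1, 2, 3};

    private FreshStreams() {
    }

    // Stand-in for untar(): reads everything, then closes the stream,
    // like the try-with-resources blocks in the extractors.
    static int consume(InputStream in) throws IOException {
        try (InputStream closing = in) {
            return closing.readAllBytes().length;
        }
    }

    // Each get() returns a brand-new stream, so earlier consumers can't close it.
    static Supplier<InputStream> freshSupplier() {
        return () -> new BufferedInputStream(new ByteArrayInputStream(DATA));
    }
}
```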

3. Extracting With Apache Commons Compress

For our first implementation, we'll use the commons-compress library from Apache Commons:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.23.0</version>
</dependency>

The solution creates a TarArchiveInputStream that reads from our archive stream. If the archive uses gzip, we first wrap the stream in a GzipCompressorInputStream:

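import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public class TarExtractorCommonsCompress extends TarExtractor {

    protected TarExtractorCommonsCompress(InputStream in, boolean gzip, Path destination) throws IOException {
        super(in, gzip, destination);
    }

    @Override
    public void untar() throws IOException {
        try (BufferedInputStream inputStream = new BufferedInputStream(getTarStream());
          TarArchiveInputStream tar = new TarArchiveInputStream(
          isGzip() ? new GzipCompressorInputStream(inputStream) : inputStream)) {
            ArchiveEntry entry;
            while ((entry = tar.getNextEntry()) != null) {
                Path extractTo = getDestination().resolve(entry.getName());
                if (entry.isDirectory()) {
                    Files.createDirectories(extractTo);
                } else {
                    // the parent may be missing if the archive has no directory entries
                    Files.createDirectories(extractTo.getParent());
                    // copies only the current entry's bytes; overwrites files from earlier runs
                    Files.copy(tar, extractTo, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}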

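First, we iterate over the TarArchiveInputStream. Each call to getNextEntry() returns the next ArchiveEntry, or null once there are no entries left. If the entry is a directory, we create it under the destination folder, so writing files into it later won't fail. Otherwise, we use Files.copy() to copy the entry's content to the target location. Passing tar itself to Files.copy() works because, after getNextEntry(), the stream returns only the current entry's bytes and then signals end of stream.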
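That per-entry behavior follows from the tar layout. Each entry is a 512-byte header (name at offset 0, size as octal text at offset 124, type flag at offset 156), followed by the entry's data zero-padded to a multiple of 512 bytes, and the archive ends with zero-filled blocks. To make the iteration concrete, here is a deliberately minimal JDK-only sketch. MiniTarReader is hypothetical and ignores long names, PAX headers, links and checksums, so it doesn't replace the libraries covered in this article:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, illustration-only ustar reader: regular files only; no long names,
// PAX headers, links, base-256 sizes or checksum validation.
final class MiniTarReader {

    private static final int BLOCK = 512;

    private MiniTarReader() {
    }

    // Returns entry name -> content for every regular file in an uncompressed tar stream.
    static Map<String, byte[]> readFiles(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        Map<String, byte[]> files = new LinkedHashMap<>();
        byte[] header = new byte[BLOCK];
        while (true) {
            data.readFully(header);
            if (isZeroBlock(header)) {
                break; // a zero-filled block marks the end of the archive
            }
            String name = field(header, 0, 100);
            long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal digits
            byte type = header[156];
            byte[] content = new byte[Math.toIntExact(size)];
            data.readFully(content);
            // entry data is zero-padded up to the next 512-byte boundary
            data.readFully(new byte[(int) ((BLOCK - size % BLOCK) % BLOCK)]);
            if (type == '0' || type == 0) { // '0' or NUL means regular file
                files.put(name, content);
            }
        }
        return files;
    }

    // Reads a NUL-terminated ASCII header field.
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    private static boolean isZeroBlock(byte[] block) {
        for (byte b : block) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }
}
```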

Let's test it by extracting the archive's contents into an arbitrary folder:

@Test
public void givenTarGzFile_whenUntar_thenExtractedToDestination() throws IOException {
    Path destination = Paths.get("/tmp/commons-compress-gz");

    new TarExtractorCommonsCompress(Resources.tarGzFile(), true, destination).untar();

    try (Stream<Path> files = Files.list(destination)) {
        assertTrue(files.findFirst().isPresent());
    }
}

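If our archive doesn't use gzip, we only need to pass false when creating the TarExtractorCommonsCompress object. Also, note that GzipCompressorInputStream handles only gzip. For other compression formats, commons-compress offers similar streams, such as BZip2CompressorInputStream.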

4. Extracting With Apache Ant

With Apache ant, we can get closer to a pure Java implementation, because for gzipped archives we can use GZIPInputStream from java.util.zip:

<dependency>
    <groupId>org.apache.ant</groupId>
    <artifactId>ant</artifactId>
    <version>1.10.13</version>
</dependency>
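GZIPInputStream ships with the JDK, so the gzip layer needs no extra dependency, and only the tar parsing comes from Ant. In a .tar.gz file, gzip compresses the whole tar stream. Decompression therefore happens first, and the tar reader consumes the result. Below is a JDK-only round-trip sketch of that layer. JdkGzip and its methods are hypothetical, and open() mirrors the conditional wrapping used in untar():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper showing the JDK's own gzip support.
final class JdkGzip {

    private JdkGzip() {
    }

    // Compresses bytes with GZIPOutputStream (e.g. to build test data).
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(plain);
        }
        return bytes.toByteArray();
    }

    // Decompresses only when the archive is gzipped; GZIPInputStream reads the header eagerly.
    static InputStream open(InputStream raw, boolean gzip) throws IOException {
        return gzip ? new GZIPInputStream(raw) : raw;
    }
}
```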

Our implementation is very similar:

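import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

import org.apache.tools.tar.TarEntry;
import org.apache.tools.tar.TarInputStream;

public class TarExtractorAnt extends TarExtractor {

    // standard delegate constructor

    @Override
    public void untar() throws IOException {
        try (TarInputStream tar = new TarInputStream(new BufferedInputStream(
          isGzip() ? new GZIPInputStream(getTarStream()) : getTarStream()))) {
            TarEntry entry;
            while ((entry = tar.getNextEntry()) != null) {
                Path extractTo = getDestination().resolve(entry.getName());
                if (entry.isDirectory()) {
                    Files.createDirectories(extractTo);
                } else {
                    // the parent may be missing if the archive has no directory entries
                    Files.createDirectories(extractTo.getParent());
                    Files.copy(tar, extractTo, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}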

The logic is the same, but we use Apache Ant's TarInputStream and TarEntry instead of TarArchiveInputStream and ArchiveEntry. We can test it the same way as the previous solution:

@Test
public void givenTarGzFile_whenUntar_thenExtractedToDestination() throws IOException {
    Path destination = Paths.get("/tmp/ant-gz");

    new TarExtractorAnt(Resources.tarGzFile(), true, destination).untar();

    try (Stream<Path> files = Files.list(destination)) {
        assertTrue(files.findFirst().isPresent());
    }
}

5. Extracting With Apache VFS

For our last example, we'll use Apache commons-vfs2, which gives access to many file system schemes through a single API. Tar archives are one of them:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-vfs2</artifactId>
    <version>2.9.0</version>
</dependency>

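However, since we're reading from an input stream, we first need to save the stream to a temporary file so we can build a URI for it later: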

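public class TarExtractorVfs extends TarExtractor {

    // standard delegate constructor

    @Override
    public void untar() throws IOException {
        Path tmpTar = Files.createTempFile("temp", isGzip() ? ".tar.gz" : ".tar");
        try (InputStream in = getTarStream()) {
            // createTempFile() has already created the file, so the copy must replace it
            Files.copy(in, tmpTar, StandardCopyOption.REPLACE_EXISTING);

            // ...
        } finally {
            // remove the temporary copy even if extraction fails
            Files.delete(tmpTar);
        }
    }
}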
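Files.createTempFile() creates an empty file right away. A plain Files.copy(InputStream, Path) into that file therefore fails with FileAlreadyExistsException unless StandardCopyOption.REPLACE_EXISTING is passed. Deleting the file in a finally block also keeps it from leaking when extraction throws. Below is a JDK-only sketch of both points. TempCopy, withTempCopy() and PathAction are hypothetical names, and the VFS extraction would go inside the action:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a stream into a temporary file, runs an action, then cleans up.
final class TempCopy {

    // An action on the temporary file that may throw IOException (hypothetical interface).
    @FunctionalInterface
    interface PathAction {
        void accept(Path path) throws IOException;
    }

    private TempCopy() {
    }

    static void withTempCopy(InputStream in, String suffix, PathAction action) throws IOException {
        // createTempFile() creates an empty file, so the copy must be allowed to replace it
        Path tmp = Files.createTempFile("temp", suffix);
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            action.accept(tmp);
        } finally {
            // runs even if the copy or the action throws
            Files.deleteIfExists(tmp);
        }
    }

    // Shows the failure mode: without REPLACE_EXISTING the copy is rejected.
    static boolean copyWithoutReplaceFails(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("temp", ".tar");
        try {
            Files.copy(in, tmp);
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```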

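Once extraction finishes, we delete the temporary file. Next, we get a FileSystemManager instance and resolve the file's URI to a FileObject, which we then use to iterate over the archive's contents: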

FileSystemManager fsManager = VFS.getManager();
// Path.toUri() builds a valid file: URI on any OS ("file:///tmp/..." on Unix)
String uri = (isGzip() ? "tgz:" : "tar:") + tmpTar.toUri();
FileObject tar = fsManager.resolveFile(uri);
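Formatting the raw path into "file://%s" gives a usable URI on Unix, where absolute paths start with a slash (file:///tmp/...). It doesn't on Windows, where the path contains a drive letter and backslashes. Path.toUri() produces an absolute, percent-escaped file: URI on any platform. Below is a JDK-only sketch that builds the layered URI this way. VfsUris and archiveUri() are hypothetical, and how VFS treats percent-escaped characters such as spaces is an assumption worth checking against VFS itself:

```java
import java.nio.file.Path;

// Hypothetical helper that builds a layered VFS URI for a local tar archive.
final class VfsUris {

    private VfsUris() {
    }

    // "<scheme>:<file URI of the archive>", with "tgz" for gzipped archives and "tar" otherwise.
    static String archiveUri(Path archive, boolean gzip) {
        String scheme = gzip ? "tgz" : "tar";
        // toUri() returns an absolute file: URI with special characters percent-escaped
        return scheme + ":" + archive.toAbsolutePath().toUri();
    }
}
```

On Unix, archiveUri(Paths.get("/tmp/temp1.tar.gz"), true) yields tgz:file:///tmp/temp1.tar.gz, the same string the formatted version produces there.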

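When calling resolveFile(), we build the URI differently for gzipped archives, prefixing it with "tgz" (tar + gzip) instead of "tar". Finally, we iterate over the archive's contents and extract the files one by one: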

for (FileObject entry : tar) {
    Path extractTo = Paths.get(
      getDestination().toString(), entry.getName().getPath());

    if (entry.isReadable() && entry.getType() == FileType.FILE) {
        Files.createDirectories(extractTo.getParent());

        try (FileContent content = entry.getContent();
          InputStream stream = content.getInputStream()) {
            // overwrite files left over from a previous run
            Files.copy(stream, extractTo, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}

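Because entries may arrive in any order, we only extract entries that are files, and we call createDirectories() on each file's parent first. That way, we never try to write a file before its directory exists. Also, each entry path starts with a slash, so we don't use Path.resolve() as in the previous implementations: resolving an absolute path returns that path unchanged, which would place the file outside our destination folder. Let's test it: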

@Test
public void givenTarGzFile_whenUntar_thenExtractedToDestination() throws IOException {
    Path destination = Paths.get("/tmp/vfs-gz");

    new TarExtractorVfs(Resources.tarGzFile(), true, destination).untar();

    try (Stream<Path> files = Files.list(destination)) {
        assertTrue(files.findFirst().isPresent());
    }
}
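The difference between Path.resolve() and Paths.get(first, more...) for entry paths that start with a slash can be checked with the JDK alone. resolve() returns an absolute argument unchanged, while Paths.get() joins the strings and collapses the doubled separator. Here is a small sketch with hypothetical helper names. The expected values in its checks assume a Unix-style file system:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helpers contrasting two ways of placing a VFS entry path under a destination.
final class EntryPaths {

    private EntryPaths() {
    }

    // resolve() returns entryPath itself when it is absolute ("/dir/file.txt"),
    // so the file would land outside the destination folder.
    static Path viaResolve(Path destination, String entryPath) {
        return destination.resolve(entryPath);
    }

    // Paths.get() joins the strings with the separator, so the leading slash
    // simply ends up as part of a path under the destination.
    static Path viaJoin(Path destination, String entryPath) {
        return Paths.get(destination.toString(), entryPath);
    }
}
```

On Unix, viaResolve(Paths.get("/tmp/vfs-gz"), "/dir/file.txt") gives /dir/file.txt, while viaJoin() gives /tmp/vfs-gz/dir/file.txt.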

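This solution is only worth it if we already use VFS in our project, since it requires more code as well as a temporary copy of the archive.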

6. Conclusion

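In this article, we learned how to extract tar archives using different libraries. Our implementations extend a common base class, which reduces the amount of code and makes them easier to use.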

As always, the source code is available on GitHub.
