Zip Download Fails to Unzip in Frontend — Fix

August 8, 2021 · 4 min read · 1899 words

Downloading a zip file through the frontend succeeded, but unzipping it failed with “inappropriate file type or format”. The same file downloaded through Postman unzipped fine. With no backend changes available, the fix had to come from the frontend. What follows is a weekend's investigation.

Error

The error message is the one quoted above.

Code

Relevant backend and frontend snippets:

Backend

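const fs = require('fs');

// router is assumed to be an Express router.
router.get('/download-binary', function (req, res) {
    const currentPath = process.cwd();
    // 'binary' (latin1) yields one character per byte, so file.length is the byte count.
    const file = fs.readFileSync(`${currentPath}/static/file/test.zip`, 'binary');
    res.setHeader('Content-Length', file.length);
    res.setHeader('Content-Type', 'application/zip');
    // end() writes the body and finishes the response; write() alone leaves it unfinished.
    res.end(file, 'binary');
});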

Frontend


    // Version that reproduces the problem: responseType is never set.
    function downloadFileClick(addResponseType = false) {
        const xhr = new XMLHttpRequest();
        xhr.open('get', '/test/download-binary');
        xhr.setRequestHeader("Content-Type", "application/json;charset=UTF-8");
        xhr.onload = () => {
            if (xhr.status === 200) {
                // With the default responseType, xhr.response is a decoded string.
                saveFile(xhr.response);
            }
        };
        xhr.send();
    }

    // Wrap the content in a Blob and trigger a download through a temporary link.
    function saveFile(content) {
        const a = document.createElement("a");
        const file = new Blob([content], {type: 'application/zip'});
        a.href = URL.createObjectURL(file);
        a.download = 'test.zip';
        a.click();
    }

Solution

At first I suspected the Blob construction and tried different types and encodings, with no luck. MDN then reminded me that XHR supports responseType, so I added:

xhr.responseType = 'blob';

Unzipping then worked reliably. Conclusion: when the backend returns binary data, set xhr.responseType = 'blob'. A new‑tab navigation download doesn’t hit this because the browser handles it.

The problem is solved, but why does this work? Here is the full analysis.

Analysis

xhr.responseType

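MDN describes responseType as follows:

The XMLHttpRequest.responseType property is an enumerated value that returns the type of the response data. It also lets us set the type of the returned data manually. If it is set to an empty string, the default "text" type is used.

Setting a breakpoint shows how the xhr.responseType setting affects the response body:

Default (text)
blob

The type of the response clearly differs between the two.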
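The difference matters because in the default text mode the browser has already decoded the response bytes into a string (UTF-8 unless the response declares another charset), and `new Blob([string])` then re-encodes that string as UTF-8. For arbitrary binary data that round trip is lossy. The sketch below imitates it with `TextDecoder` and `Blob` in Node 18+; the sample bytes and the helper names `asTextResponse` and `asBlobResponse` are made up for illustration and are not taken from the demo.

```javascript
// Six sample bytes: the zip signature "PK\x03\x04" followed by two bytes
// (0xFF, 0x80) that are not valid UTF-8 on their own.
const original = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0xff, 0x80]);

// Roughly what xhr.response holds when responseType is '' or 'text':
// the bytes decoded into a string.
function asTextResponse(bytes) {
    return new TextDecoder('utf-8').decode(bytes);
}

// Roughly what xhr.response holds when responseType is 'blob':
// the bytes wrapped untouched.
function asBlobResponse(bytes) {
    return new Blob([bytes], { type: 'application/zip' });
}

const text = asTextResponse(original);
const fromText = new Blob([text], { type: 'application/zip' });
const fromBlob = asBlobResponse(original);

console.log(original.length); // 6
console.log(fromBlob.size);   // 6, the bytes are unchanged
console.log(fromText.size);   // 10, each invalid byte became U+FFFD (3 bytes)
```

Once the bytes have been through the text decoder, no choice of Blob type can restore them, which fits the observation that tweaking the Blob construction did not help.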

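So it is fair to say that the Content-Type response header sent by the backend does not decide the type of xhr.response; responseType still has to be set by hand. Does that make the Content-Type response header useless? No.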

Response Headers[Content-Type]

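Try changing the response Content-Type and compare.

Previewing the response in Chrome:

Whether the body is binary or text, Chrome's preview renders it as text, so the content looks identical at a glance.
In JS, however, it does make a difference: if the backend returns the content as text, the downloaded file still fails to unzip.
Content-Length differs between the text response and the binary response: here it is 350 for text and 318 for binary (Blob).

The conclusion: the frontend XHR must set responseType explicitly, and the backend must return the body as binary with the matching Content-Type. Neither can be omitted.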
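The gap between 350 and 318 is consistent with a byte string being written out as UTF-8 text. The text variant of the backend is not shown here, so this is an assumption: if the file is read into a string with `'binary'` (Node's alias for `'latin1'`) and that string is then written without an encoding, Node encodes it as UTF-8, and every byte at or above 0x80 takes two bytes on the wire. A small sketch with made-up sample bytes:

```javascript
// Nine sample bytes; three of them (0x9c, 0xe5, 0xff) are >= 0x80.
const raw = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x9c, 0xe5, 0xff]);

// fs.readFileSync(path, 'binary') yields a string with one char per byte.
const byteString = raw.toString('binary');

// Size on the wire when the string is written as binary vs as UTF-8 text.
const binarySize = Buffer.byteLength(byteString, 'binary');
const textSize = Buffer.byteLength(byteString, 'utf8');

console.log(binarySize); // 9
console.log(textSize);   // 12, each of the three high bytes doubled

// Only the binary write reproduces the original bytes.
console.log(Buffer.from(byteString, 'binary').equals(raw)); // true
console.log(Buffer.from(byteString, 'utf8').equals(raw));   // false
```

Under that assumption, a 32-byte difference would mean that 32 of the 318 bytes in the zip are 0x80 or above.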

Postman

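Tested with Postman: with the backend confirmed to return binary data, the frontend code without responseType produces a file that fails to unzip, while the file downloaded through Postman unzips fine.

The reason is easy to explain. A browser async request needs responseType set on the XHR so that the returned data is parsed in the right format. Postman, whether its internals use browser async requests or some other implementation, parses the returned binary data correctly, so it has no problem.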

DataURL

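Besides the approach above there is another one: DataURL. This was my workaround early on, before I had found the root cause of the unzip failure.

Backend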

router.get('/download-base64', function (req, res) {
    const currentPath = process.cwd();
    const content = fs.readFileSync(`${currentPath}/static/file/test.zip`, 'base64');
    res.json({content});
});

Frontend

    function downloadFileBase64Click() {
        const xhr = new XMLHttpRequest();
        xhr.open('get', '/test/download-base64');
        xhr.setRequestHeader("Content-Type", "application/json");
        xhr.onload = () => {
            if (xhr.status === 200) {
                saveBase64File(JSON.parse(xhr.response).content);
            }
        };
        xhr.send();
    }

    function saveBase64File(content) {
        const a = document.createElement("a");
        a.href = 'data:application/zip;base64,' + content;
        a.download = 'test.zip';
        a.click();
    }

That solves the problem as well.
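A data URL is not the only way to consume the base64 payload. As a hedged alternative (not what the demo does), the string can be decoded back into bytes and wrapped in a Blob, which can then be saved through an object URL just like the binary download. `base64ToBlob` is a hypothetical helper introduced here; `atob` and `Blob` exist in browsers and as globals in Node 18+, and `Buffer` only stands in for the server side.

```javascript
// Decode a base64 string into a Blob holding the original bytes.
function base64ToBlob(base64, type = 'application/zip') {
    const byteString = atob(base64); // one char per byte
    const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
    return new Blob([bytes], { type });
}

// Round trip with sample bytes standing in for a zip file.
const sample = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0xff, 0x80]);
const encoded = Buffer.from(sample).toString('base64'); // server side
const blob = base64ToBlob(encoded);                      // client side

console.log(blob.size); // 6
console.log(blob.type); // application/zip
```

The bytes survive because base64 text contains only ASCII characters, so no text decoding step can alter it.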

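Note that with base64 the content returned by the backend has a length of 438, larger than the 318 of the binary response.

From what I read: Base64-encoded data is typically 4/3 the size of the original data.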
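The 4/3 ratio can be checked against these numbers. Base64 turns every 3 input bytes into 4 output characters, so n bytes become 4 * ceil(n / 3) characters. Assuming the zip is exactly 318 bytes (the binary Content-Length) and that `res.json({content})` adds only the `{"content":"..."}` wrapper, the sketch below arrives at 438. The zero-filled buffer is a stand-in for the real file, and `base64Length` is a helper name introduced here.

```javascript
// Length of the base64 encoding of n bytes (including '=' padding).
function base64Length(n) {
    return 4 * Math.ceil(n / 3);
}

// Stand-in for the 318-byte zip; the content does not affect the length.
const fakeZip = Buffer.alloc(318);
const content = fakeZip.toString('base64');
const body = JSON.stringify({ content });

console.log(base64Length(318));       // 424
console.log(content.length);          // 424
console.log(Buffer.byteLength(body)); // 438 = 424 + 14 for {"content":""}
```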

Binary vs DataURL

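With the cause understood and two solutions available, which one is better? First, the trade-offs:

Base64 encoding makes the data larger
For files, Base64 encoding is generally used to reduce the number of HTTP requests

Personally I think that for an async download the binary stream should be preferred; there is no need to pay the backend encoding cost or the larger payload.

DataURL fits cases such as images loaded by a page where you would rather not spend an HTTP request, or simple image resources that need to be generated dynamically.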

Further notes

utf8 vs utf-8

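While writing the Node backend for the download, I noticed that the encoding argument of fs.readFileSync accepts two similar values, utf8 and utf-8, which are easy to confuse. After some reading: either one works; strictly speaking, utf8 is an alias of utf-8.

Why? Some frameworks do not allow a hyphen in encoding constants, which is where the utf8 spelling comes from. Node, being a cross-platform runtime, accepts both for compatibility.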
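A quick check in Node shows the two spellings behaving identically. This is a minimal sketch; the sample string is arbitrary.

```javascript
// Both spellings are accepted encoding names in Node.
console.log(Buffer.isEncoding('utf8'));  // true
console.log(Buffer.isEncoding('utf-8')); // true

// And they produce the same bytes for the same string.
const a = Buffer.from('zip 文件', 'utf8');
const b = Buffer.from('zip 文件', 'utf-8');
console.log(a.equals(b)); // true
console.log(a.length);    // 10: 4 ASCII bytes + 2 characters * 3 bytes
```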

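Capturing packets with Wireshark

Chrome cannot show the request data in hex or binary form; what the eye sees is only the text rendering, so the two responses look alike. The content does differ, though, and to see exactly how, a packet capture tool such as Wireshark can show the actual HTTP response message.

Comparing the hex data in the screenshots carefully shows that the HTTP response packets for the text and binary variants differ. The text pane on the right still looks the same, which matches what Chrome shows.
Note the length value in the red box: it is not the content length of 350 mentioned earlier. Wireshark's packet list shows the length of the Ethernet frame, which includes the HTTP response body whose size Chrome reports as content-length.
Although the responses are described as text or binary, everything on the wire is ultimately binary 0s and 1s. It is merely displayed as hex here, and Wireshark can switch to a binary display as well.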
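A rough hex view of a response body can also be produced in Node, without Wireshark. The helpers below are a sketch with made-up names (`hexView`, `bitView`) and sample bytes; they print the same bytes as hex and as bits, which is the distinction the last point makes.

```javascript
// Render bytes as space-separated two-digit hex, like a hex pane.
function hexView(bytes) {
    return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');
}

// Render the same bytes as 8-bit binary groups.
function bitView(bytes) {
    return Array.from(bytes, (b) => b.toString(2).padStart(8, '0')).join(' ');
}

// The first four bytes of a zip file: "PK" followed by 0x03 0x04.
const header = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

console.log(hexView(header)); // 50 4b 03 04
console.log(bitView(header)); // 01010000 01001011 00000011 00000100
```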

Closing notes

The demo code mentioned in this post can be found here

That wraps up the problem, and the conclusion is clear. Mark.