Backing up Android's /data/media (i.e. internal storage) using adb and TWRP

Posted on Sun 30 July 2017 in Android

Recently I had to clone my Nexus 4 to another Nexus 4 device since my Nexus 4's front glass broke :-(.

At first, the task seemed quite straight-forward. Install the latest version of TWRP's recovery on the phone, backup all partitions using TWRP, transfer them to the other device and restore them.

But thing are never as easy as them seem, are they?

Problems with the straight-forward approach

It turns out that creating a backup of the user data partition using TWRP will NOT include /data/media (i.e. your internal storage). That means if you save photos or data on the internal storage (some apps will save data there as well), they will be not included in a TWRP backup.

This is explained in TWRP's FAQ and TWRP's owner also explained why they exclude /data/media from the backup in a heatedly debated GitHub issue on (not) excluding /data/media from TWRP's backups.

There is even an old open issue on GitHub that asks for an implementation of an option to backup /data/media using TWRP, but there has been no updates since Apr 15, 2015 (as of writing this blog post).

Simple command-line solution using adb and tar

Browsing the internet for a solution, I found a nice thread on XDA-developers explaining how to backup data from an Android device booted into TWRP to a computer.

The trick is to use adb exec-out command , an undocumented adb command that works like adb shell but without creating a pseudoterminal (pty), which would otherwise mangle binary data.

To backup /data/media (i.e. internal storage), first reboot your phone into TWRP recovery.

Then execute the following:

adb exec-out 'tar --create --exclude=data/media/0/TWRP \
  data/media/0 2>/backup-errors.txt | \
  gzip' | \
dd of=<backup-name>-$(date +%Y%m%d).tar.gz && \
adb shell cat /backup-errors.txt

replacing <backup-name> with your desired backup name, e.g. tadej-nexus4-sdcard_backup.

Which compression algorithm to use?

A natural question that comes to mind is, which compression algorithm would perform best in an overall sense, i.e. with respect to compression ratio and the time it takes to create and transfer the archive.

Since we are just piping the (compressed) tar archive from the phone to the computer, we can easily swap between creating the compressed archive on the phone or on the computer. Compressing the archive on the computer would not help reducing backup time if the bandwidth is the limit, but it may help reducing the backup size. On the other hand, if compression on the phone is CPU-bound, then performing the compression on the computer will help reduce the backup time.

The variant of the command that compresses files on the computer is:

adb exec-out 'tar --create --exclude=data/media/0/TWRP \
  data/media/0 2>/backup-errors.txt' | \
<compression-command> | \
dd of=<backup-name>-$(date +%Y%m%d).tar.<compression-extension> && \
adb shell cat /backup-errors.txt

where <compression-command> is one of gzip, bzip2 and xz (optionally, with extra arguments) and <compression-extension> is the chosen compression tool's extension, i.e. gz, bz2 or xz.

The results of backing up my phone's internal storage with different compression commands and different execution places are the following (read more details about the data being backed up in the next section ):

Compression command Compression execution Backup size Size compared to best Backup time Time compared to best
without N/A 3558 MiB +4.8% 878 s +61%
gzip on the phone 3428 MiB +1.0% 549 s +1%
gzip --best on the phone 3428 MiB +1.0% 545 s N/A
bzip2 on the phone 3411 MiB +0.5% 4236 s +677%
gzip on the computer 3425 MiB +0.9% 982 s +80%
bzip2 on the computer 3411 MiB +0.5% 824 s +51%
xz on the computer 3402 MiB +0.2% 1566 s +187%
xz --threads=0 on the computer 3404 MiB +0.3% 811 s +48%
xz --best --threads=0 on the computer 3395 MiB N/A 895 s +64%

The overall winner is clearly using gzip and performing compression on the phone. The time needed to perform the backup is the smallest by a large margin while the backup size is only 1% larger compared to the best compression achieved with xz --best.

Note

Benchmarks were conducted on my Nexus 4 running TWRP 3.1.1-0 (twrp-3.1.1-0-mako.img) connected to my HP ZBook 15 G2 workstation running Fedora 25 Workstation.

TWRP 3.1.1-0 is based on OmniROM, a community developed Android derivative, 7.1, which in turn is based on AOSP 7.1.2.

TWRP doesn't use the gzip command provided by BusyBox. Rather, it uses the one provided by pigz, a parallel implementation of gzip.

TWRP's bzip2 and xz commands are provided by Busybox. Note that OmniROM still uses Busybox, while AOSP switched to toybox at the end of 2014.

The xz command provided by this version of BusyBox, v1.22.1 bionic, doesn't support compression. Note that one can pass the -J argument when creating an archive with the tar command but it will silently create an uncompressed tar archive.

Running xz --best --threads=0 requires lots of RAM. More precisely, it requires 674 MiB per thread. For example, on my quad-core CPU with hyperthreading enabled, it requires 5.3 GiB of RAM.

What kind of data was being compressed?

To be fair and transparent when comparing different compression algorithms, it is necessary to also tell something about the data being compressed.

The data being compressed is what I accumulated in my phone's user data partition (i.e. internal storage) over the course of 4 years. Of course, I also deleted data regularly, otherwise I would run of free space.

This is what makes up the 3558 MiBs of my phone's internal storage:

My phone's internal storage usage by file type

The largest portion is used by JPEG images (mostly pictures taken with the phone), followed by MP4 videos (also mostly videos taken with the phone), followed by MPEG and Ogg audio files (my music collection). The rest only occupies 10 % of the phone's internal storage. In my opinion, this can serve as a good real-life benchmark of backing up a phone's internal stoarge.

This also explains why there were so little differences between different compression algorithms in the resulting backup size (tar archive without compression was only 4.8 % larger that the smallest tar archive compressed with xz --best). JPEG, MP4, MPEG audio, i.e. MP3, and Ogg Vorbis are all file formats that use special-purpose compression techniques to reduce the size of files so they cannot be further compressed to any significant extent.

Note

To analyse my phone's internal storage usage by file type I used lemonsqueeze's disk_usage_by_file_type Bash script.

To draw the pie chart, I used matplotlib (source).