Bitcoin
Backups
History

Published on Sun, Jan 15, 2023 by Will Clark

State of the art for Bitcoin wallet backups

A historical inquiry into wallet backups, from the very first Bitcoin release to the latest, most featureful wallets.

Midjourney-generated image of Alice managing her backups in the land of sardine wizards

When making a Bitcoin wallet backup we want to have the optimal combination of the following properties to suit our particular needs and risk profile:

Redundancy
How many distinct full copies of the backup exist.

Security
The difficulty for a third party to acquire a full copy of the backup.

Complexity
… of the system used to create, secure and restore the backup.

Robustness
The ability of the backup to survive adverse political, social, temporal and environmental conditions.

Cross-compatibility
Whether the backup can be used with different wallets.

Whilst none of these attributes are diametrically opposed, it is often the case that improving one will require compromising another.

Determinism vs non-determinism

We will describe certain operations as deterministic or non-deterministic, defined as follows:

Deterministic
MUST produce the same output each time from the same input.

Non-deterministic
MAY produce a different output each time from the same input.

Private keys can be generated using either deterministic or non-deterministic methods, as we’ll discuss later.

Addresses are derived deterministically from the public key(s), locking scripts or a combination of both.
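The distinction can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and a real wallet must additionally check that the 32 bytes form a valid secp256k1 private key.

```python
import hashlib
import secrets


def deterministic_key(seed: bytes) -> bytes:
    """Deterministic: the same seed always yields the same 32 bytes."""
    return hashlib.sha256(seed).digest()


def non_deterministic_key() -> bytes:
    """Non-deterministic: fresh randomness on every call, so the result
    can only be recovered from a backup of the key itself."""
    return secrets.token_bytes(32)
```

A key from the first function can be regenerated from its input at any time, whereas a key from the second exists only where it has been stored.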

Determinism and Bitcoin scripts

Although address generation itself is deterministic, from a backup perspective it should still be considered essential for the backup to also contain the scripts and derivation paths used by the wallet. Without these, a wallet restored from the backup will not know that coins sent to an address might actually belong to it, nor how it can spend those coins!

There are some “shortcuts” available here in the form of “script templates”:

  • “Standard” address types (e.g. P2PK, P2PKH, P2WPKH) are essentially templates of certain scripts. Wallets can use these templates to brute-force derivation of common address types from a given set of keys.

It is important to note that these shortcuts rely on implied information for restoration, and do not use a backup which explicitly contains all of the necessary data. This gives backups of this type lower robustness. If knowledge of this implied information were ever lost, backups relying on it would be useless or require large, possibly infeasible, amounts of brute-forcing.

See the section on Output Descriptors for more detail on how a backup can explicitly contain the scripts one or more keys are being used in.

A history of Bitcoin backups

We are going to follow the journey of Alice who has been using Bitcoin since its earliest days, and who has been trying to keep good backups of her wallet along the way.

Ideally Alice would have redundant, secure, robust and low-complexity backups which, in the event of her being unavailable, could also be accessed safely and easily by her chosen beneficiaries.

Early Bitcoin software

Backups of the earliest versions of the Bitcoin software (the Satoshi client, today called “Bitcoin Core”) were a rudimentary affair. As best practice, all Alice could do was periodically back up the wallet.dat file, which was created and actively used by the software wallet. While this system benefited from a low level of complexity, it was generally not secure, robust, nor particularly redundant unless Alice manually made many copies.

When sending or receiving a transaction the wallet would generate new keys on the fly. This meant that every time Alice made a transaction or generated a new receive address she would have to make a new backup of her updated wallet.dat file, containing the new keys which were not present in previous backups.

In the event that Alice had been diligently making multiple redundant copies of her wallet.dat file, this means that she would need to update all her copies after every transaction too. To say that this was both error-prone and user-unfriendly would be an understatement.

One advantage of backing up this wallet.dat file is that transaction and wallet metadata is also backed up. This includes “labels” assigned to addresses, so that Alice can remember where each coin (or more specifically UTXO) came from or was being sent to. More on this point later.

Key Pools

Later a “Key Pool” was added to the wallet. This buffered 100 keys (by default) for future transactions and meant that technically backups were no longer needed after every transaction.

The wallet would generate 100 random, non-deterministic private keys which it would save in the wallet.dat file. When Alice made a transaction (either sending or receiving) the wallet would select a key from the key pool and use it to derive a new change or receive address for her. Immediately following this operation another key would be generated to keep the pool topped up to the target number of 100 unused addresses. Note that this newly-generated key would not be in Alice’s current backup, but the other remaining 99 would be.
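The effect of the key pool on a backup can be modelled with a short sketch. This is a simplified model rather than Bitcoin Core's actual implementation: the class and function names are hypothetical, and keys are represented as plain random bytes.

```python
import secrets


class KeyPoolWallet:
    """Toy wallet that keeps a buffer of pre-generated random keys."""

    def __init__(self, pool_size: int = 100):
        self.unused = [secrets.token_bytes(32) for _ in range(pool_size)]
        self.used = []

    def next_key(self) -> bytes:
        """Hand out the oldest unused key, then top the pool back up."""
        key = self.unused.pop(0)
        self.used.append(key)
        self.unused.append(secrets.token_bytes(32))  # not in older backups
        return key

    def backup(self) -> set:
        """Snapshot of every key the wallet currently knows about."""
        return set(self.used) | set(self.unused)


def keys_missing_from(backup: set, wallet: KeyPoolWallet) -> int:
    """Count keys the wallet has used which the backup does not contain."""
    return sum(1 for key in wallet.used if key not in backup)
```

In this model a backup taken before any transactions covers the next 100 keys handed out; the 101st key is the first one that the old backup cannot recover.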

With the introduction of the Key Pool Alice’s backups were likely to be slightly less error-prone; however, she now needed to keep track of the status of her keypool. Backup complexity was essentially unchanged, as Alice had to be sure that she updated all copies of her backup in lockstep, otherwise she risked compromising her redundancy.

This still meant that she needed to make regular backups of her wallet, just optionally at slightly longer intervals. There was guidance at the time stating that you only needed to back up “every 30 transactions or so”. Alice was worried she might lose count of how many transactions she’d made since her last backup and so opted to continue backing up after every transaction.

Paper wallets

In mid 2011 Paper Wallets became a hot topic, providing an easy way for users like Alice to have something tangible, with very low complexity, that could be backed up redundantly with ease. Security of paper wallets was very low – anyone obtaining a copy could immediately access and sweep the funds unless the wallet also used a strong BIP38 passphrase.

Paper Wallets generally stored a single private key in Wallet Import Format (WIF) alongside its corresponding P2PKH address, often represented using a QR code for easy scanning using a phone camera or webcam. Once securely generated and printed, Alice could easily send bitcoin to her address at will.
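As a rough sketch of what is printed on such a wallet, the WIF string is the private key wrapped in Base58Check: a 0x80 version byte for mainnet, the 32-byte key, an optional 0x01 flag marking a compressed public key, and a four-byte double-SHA256 checksum. The function names below are illustrative and not taken from any particular wallet.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(payload: bytes) -> str:
    """Append a 4-byte double-SHA256 checksum and encode in Base58."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def to_wif(private_key: bytes, compressed: bool = False) -> str:
    """Encode a 32-byte mainnet private key in Wallet Import Format."""
    if len(private_key) != 32:
        raise ValueError("private key must be exactly 32 bytes")
    payload = b"\x80" + private_key + (b"\x01" if compressed else b"")
    return base58check(payload)
```

Anyone who can read this string holds the key itself, which is why an unencrypted paper wallet offers so little security.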

At spending time she would have to import the private key into her wallet software which could derive the corresponding address and scan the blockchain to find matching UTXOs. Using these UTXOs as the transaction inputs, the software wallet could then construct and sign a transaction and broadcast it to the P2P network or mine a block which included it.

The list of shortcomings of paper wallets, however, is long.

From a protocol standpoint the biggest issue is that a Paper Wallet realistically only permits backup of a single key, encouraging address reuse which is bad for privacy. In addition, although the bitcoin might be secure whilst it is stored offline (on paper), at spending time you still need to load your private key onto a computer to generate the transaction, leaving you vulnerable to any attacks possible against that computer.

Deterministic Wallets

Also referred to as: non-hierarchical deterministic wallets

In 2011 other Bitcoin wallet implementations started to use forms of deterministic key derivation for their wallets. Under these schemes large numbers of new keys can be generated deterministically from a single randomly-generated “seed”. This provides two main benefits:

  1. A single object can be backed up (the seed)
  2. New addresses can be derived and used for every transaction without invalidating previous backups

The effect is that Alice can increase the redundancy of her backups, whilst preserving best practice of using a new address for every transaction.
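The core idea can be sketched as hashing the seed together with a counter. This is a deliberately simplified illustration, not the derivation scheme of any specific wallet from the period, and `derive_key` is a hypothetical name.

```python
import hashlib


def derive_key(seed: bytes, index: int) -> bytes:
    """Derive the index-th key from the seed; the same inputs always
    give the same output, so only the seed needs to be backed up."""
    return hashlib.sha256(seed + index.to_bytes(4, "big")).digest()
```

Restoring from the seed and counting upwards from index 0 regenerates every key the wallet ever handed out, which is why new addresses no longer invalidate an existing backup.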

Backing up a single seed also started to open up new possibilities for more robust backup storage mediums, as the seed backup did not need to be on a digital medium, although these were not generally explored until BIP39 was introduced later on.

BIP32 and HD wallets

HD
Hierarchical Deterministic

BIP32
Hierarchical Deterministic Wallets

BIP32 from early 2012 sought to improve upon the early deterministic designs and allow derivation of many chains of child keys from a single seed using a pre-determined hierarchy and derivation path syntax:

HD wallet chain derivation

This permitted a single backup to correspond to multiple independent wallets, and in theory even multiple coins, so long as the user remembered and backed up the derivation paths used.
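A minimal sketch of the private-key side of BIP32 is shown below, restricted to the master key and hardened children because those need only HMAC-SHA512 and modular addition. It omits the validity checks the BIP requires, and non-hardened derivation (which needs elliptic-curve arithmetic) is left out entirely. The function names are illustrative.

```python
import hashlib
import hmac

# Order of the secp256k1 group.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED = 0x80000000


def master_key(seed: bytes):
    """Return (private key as int, chain code) for the root node m."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big"), digest[32:]


def hardened_child(key: int, chain_code: bytes, index: int):
    """Return (private key, chain code) for hardened child index'."""
    data = (
        b"\x00"
        + key.to_bytes(32, "big")
        + (index + HARDENED).to_bytes(4, "big")
    )
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    child_key = (int.from_bytes(digest[:32], "big") + key) % N
    return child_key, digest[32:]
```

Calling `hardened_child` repeatedly walks down one branch of the tree, so a path such as m/0'/1' is simply two successive calls starting from `master_key(seed)`.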

Whilst this did not notably improve the backup situation compared with Non-Hierarchical Deterministic Wallets (both essentially required backup of a single seed), it did provide one of the crucial building blocks for future wallet designs.

Note that one of the creators of BIP32, gmaxwell, later commented that:

BIP32 is not at all a replacement for backups; at most it protects key material. It does nothing to preserve metadata which can be absolutely critical…

and

I somewhat regret inventing the public derivation thats most commonly used now– it’s overused, and in ways that harm security (often by people who do not understand the implications).

Backing up meta-data can be important, perhaps critical, and not covered by the BIP32 scheme.

P2SH

Mid 2012 also saw the introduction of the P2SH address type. When using a P2SH address the receiver would hash the (redeem) script with which they wanted to encumber the bitcoin, and use this hash to generate an address. They would then be required to reveal the redeem script as part of the scriptSig only when they wanted to spend the bitcoin in the future. BIP16 provides more information on the motivation.

From a backup standpoint this meant that wallets receiving to P2SH addresses had something new they must back up alongside their private key(s): the unlocking script (commonly called redeemScript) itself.

Unlike master and child private and public keys (and therefore P2PK or P2PKH addresses), this data could not be derived deterministically from a seed.

We can see in the image below from transactionfee.info that P2SH address usage did not really take off until 2014.

p2sh usage graph

Alice did not use any P2SH addresses herself as writing Bitcoin scripts was too difficult for her with the limited tools available and she was not a programmer.

Hardware wallets, BIP39 and BIP44

BIP39
Mnemonic code for generating deterministic keys

BIP44
Multi-Account Hierarchy for Deterministic Wallets

2013 saw the first release of one of the first “hardware wallets”, the Trezor. The design goals here were simple: provide users with the ability to store their private keys on a secure, offline and airgapped device which could sign valid bitcoin transactions using keys stored on it.

The Trezor used an HD wallet design and to complement this the Trezor team (SatoshiLabs) designed BIP39, which designated how a random seed could be converted to a seed phrase, or mnemonic, in a bi-directional way. This meant that users of Deterministic or HD wallets no longer had to back up a long and unfriendly alphanumeric (or hex) string representing their seed, but could instead simply back up 12 or 24 (English) words.

A seed in hex:

3d1d3da431ce5f2f26a4fd3b481c285edac280cebdbc3fbbf7739c2ccbb73f99acc816687daa86ef51a7fb95c19f170da5aafbf49a9e4177abc4cad13a3c7ca2

A seed phrase:

check day then tiger collect join hotel hawk absorb ginger wash track crowd hero scale

You can generate your own BIP39 seed phrases to see more examples of a mnemonic, seed and root key, but do not use one generated from this site with any real money.
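For reference, BIP39 stretches a mnemonic into the 64-byte seed using PBKDF2-HMAC-SHA512 with 2048 iterations and a salt of "mnemonic" followed by an optional passphrase. A minimal sketch follows; it does not validate the words against the wordlist or verify the mnemonic's checksum, and the function name is illustrative.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte BIP39 seed from a mnemonic sentence."""
    words = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", words, salt, 2048)
```

This step is one-way: the words can always reproduce the seed, but the seed cannot be turned back into the words, so it is the phrase that should be backed up.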

BIP39 massively widened the range of possible seed backup solutions, and users did begin to get pretty creative with storage mediums. Alice could now increase the robustness of her backup.

There was, however, still an issue for hardware wallets: whilst they might in theory have been able to generate the private key needed to sign a transaction from their seed, they might not know which BIP32 derivation path they should be looking at to do so.

BIP44 sought to address this by designating meanings to different parts of the BIP32 derivation descriptor. Following its implementation a hardware wallet could know which BIP32 derivation path (e.g. of the form m/44'/0'/0'/0/0) it should be following and deriving keys and addresses for.
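To make the levels concrete, here is a small sketch which splits a path into BIP32 child indices and labels them with the five BIP44 levels (purpose, coin type, account, change, address index). The helper names are hypothetical; hardened levels are marked with ' or h and have the top bit of the index set.

```python
HARDENED = 0x80000000
BIP44_LEVELS = ["purpose", "coin_type", "account", "change", "address_index"]


def parse_path(path: str) -> list:
    """Convert a path such as m/44'/0'/0'/0/0 into BIP32 child indices."""
    parts = path.split("/")
    if parts[0] != "m":
        raise ValueError("path must start with 'm'")
    indices = []
    for part in parts[1:]:
        hardened = part[-1:] in ("'", "h")
        number = int(part[:-1] if hardened else part)
        indices.append(number + HARDENED if hardened else number)
    return indices


def describe_bip44(path: str) -> dict:
    """Label each level of a five-level BIP44 path, dropping the hardened bit."""
    indices = parse_path(path)
    if len(indices) != len(BIP44_LEVELS):
        raise ValueError("a BIP44 path has exactly five levels")
    return {name: index & 0x7FFFFFFF for name, index in zip(BIP44_LEVELS, indices)}
```

For the example path above, `describe_bip44` reports purpose 44, coin type 0 (Bitcoin), account 0, the external (receive) chain and the first address.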

The impact of hardware wallets, BIP39 and BIP44 for Alice was that she could now securely generate a seed phrase, which could be backed up trivially in a redundant manner. She could generate receive addresses using an online (hot) wallet (via an xPub) and securely sign the transactions she wanted to make offline using her hardware wallet.

Seed phrase backup solutions

Following BIP39, many interesting seed storage designs were tested.

SegWit

BIP141
Segregated Witness (Consensus layer)

BIP49
Derivation scheme for P2WPKH-nested-in-P2SH based accounts

BIP84
Derivation scheme for P2WPKH based accounts

Before SegWit the following address types needed to be considered by the wallet or signing device, and therefore for backup:

  • P2PK
  • P2PKH
  • P2SH

The SegWit upgrade in 2017 saw the introduction of new standard address scripts, namely Pay-To-Witness-PubKey-Hash (P2WPKH) and Pay-To-Witness-Script-Hash (P2WSH), both of which used a new bech32 address encoding.

This in turn meant that new derivation specifications were needed so that wallets could determine how to derive and spend from addresses of these types from their seed. For P2WPKH this was codified in BIP84.

In addition to this, there was support to “wrap” the new segwit address types into an older (and more compatible) P2SH format, e.g. P2SH-P2WPKH, which would mean that old, un-upgraded wallets would still be able to send to a wallet that wanted to work with SegWit outputs and the benefits they bring. BIP49 specified derivation of P2SH-wrapped P2WPKH addresses.

SegWit did not alter the backup landscape much on its own, but it did begin to make complex scripts cheaper in transaction fee terms, as well as coinciding with a multitude of new Bitcoin applications and services using multisig, threshold multisig and even more complicated scripts.

As with P2SH, if Alice wanted to use a custom script as part of a P2WSH address she would need to back up the script separately, although this time as part of the “witness”.
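The reason the script itself must be backed up is visible in how a P2WSH output is built: only the SHA256 hash of the witness script appears in the scriptPubKey. A minimal sketch, with an illustrative function name:

```python
import hashlib


def p2wsh_script_pubkey(witness_script: bytes) -> bytes:
    """Build a P2WSH scriptPubKey: witness version 0 followed by a
    32-byte push of SHA256(witness script)."""
    witness_program = hashlib.sha256(witness_script).digest()
    return bytes([0x00, 0x20]) + witness_program


# Example witness script: <33-byte public key> OP_CHECKSIG
example_pubkey = bytes.fromhex(
    "038eeab436808eb353b6af6e77f643bbb30fe7c571ed953c882bf3406875f08c82"
)
example_script = bytes([0x21]) + example_pubkey + bytes([0xAC])
example_output = p2wsh_script_pubkey(example_script)
```

Because a hash cannot be reversed, the 34 bytes stored on-chain are of no help in recovering the script; without a copy of it the output cannot be spent, even with the right keys.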

It also highlighted how the backup situation was becoming steadily more complex: with each iteration of improvements to Bitcoin came additional information required for backup, something which would need to be addressed.

Taproot, Scripts and Backups

With the introduction of Taproot, which further enabled and incentivised the usage of more complex custom scripts, it seems like a good idea to tabulate what needs backing up…

Bitcoin Core developer Gloria Zhao created a useful chart showing the various address types and their satisfaction requirements, which is shown below with script elements coloured red which require additional backup.

| Output Type | scriptPubKey | scriptSig | redeem script | witness |
|---|---|---|---|---|
| P2PK | <pubkey> OP_CHECKSIG | <signature> | | |
| P2PKH | OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG | <signature> <public key> | | |
| P2SH | OP_HASH160 <scriptHash> OP_EQUAL | <data pushes> <redeem script> | (arbitrary) | |
| P2WPKH | 0 <pubKeyHash> | | | <signature> <public key> |
| P2WSH | 0 <witnessScriptHash> | | | <data pushes> <witness script> |
| P2SH-P2WPKH | OP_HASH160 <redeemScriptHash> OP_EQUAL | <redeem script> | 0 <pubKeyHash> | <signature> <public key> |
| P2SH-P2WSH | OP_HASH160 <redeemScriptHash> OP_EQUAL | <redeem script> | 0 <scriptHash> | <data pushes> <witness script> |
| P2TR (key path) | 1 <public key> | | | <signature> |
| P2TR (script path) | 1 <public key> | | | <data pushes> <script> <control block> |

CAUTION: The table does not cover derivation paths which may also need to be backed up if using BIP32-style HD wallet!

Output descriptors

In late 2018 Bitcoin Core v0.17 included early support for Output Script Descriptors.

The idea was to have a programmatic description of one output, or a set of outputs, which would be generated, often from a wallet seed or xpub. We can look at some examples from the BIP to see what a descriptor looks like:

pkh(02c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5)

Describes a P2PKH output with the specified public key.

wpkh(02f9308a019258c31049344f85f89d5229b531c845836f99b08601f113bce036f9)

Describes a P2WPKH (SegWit) output with the specified public key.

pkh([d34db33f/44'/0'/0']xpub6ERApfZLCkDtcHTcx...BGRjaDMzQLcgJvLJuZZvRcEL/1/*)

Describes a set of P2PKH outputs, but additionally specifies that the specified xpub is a child of a master with fingerprint d34db33f, and derived using path 44'/0'/0'.
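The structure of such a key expression can be pulled apart with simple string handling. The sketch below is not a full descriptor parser, and `xpubEXAMPLE` in the usage line is a placeholder standing in for a real extended public key.

```python
def parse_key_expression(expression: str) -> dict:
    """Split '[fingerprint/origin path]key/derivation' into its parts."""
    fingerprint, origin_path = None, []
    if expression.startswith("["):
        origin, expression = expression[1:].split("]", 1)
        fingerprint, *origin_path = origin.split("/")
    key, *derivation = expression.split("/")
    return {
        "fingerprint": fingerprint,   # identifies the master key
        "origin_path": origin_path,   # path from the master to this xpub
        "key": key,                   # the xpub (or plain public key)
        "derivation": derivation,     # further steps; '*' means ranged
    }


parsed = parse_key_expression("[d34db33f/44'/0'/0']xpubEXAMPLE/1/*")
```

Everything a signing device needs in order to find the right key (which master, which path, which children) is carried in the expression itself rather than being implied by convention.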

The effects of this are that Alice can now back up her wallet seed, derivation path and all information required for a wallet to reproduce the scripts used in a single, cross-compatible object.

There are some drawbacks to this approach, however: whilst a descriptor is more complete than a seed, it cannot be backed up as easily as 12 or 24 seed words.

Legacy vs descriptor backups for single sig

Alice is considering using a simple single signature setup to secure her coins and wants to evaluate what her backup would look like for this setup with descriptors and legacy seeds.

Seed backup requirements

(If using a standardised derivation path, e.g. BIP49, BIP84)

A seed phrase:

check day then tiger collect join hotel hawk absorb ginger wash track crowd hero scale

Descriptor backup requirements

A single sig descriptor for a series of taproot addresses:

tr(tprv8ZgxMBicQKsPeXo5tpYTymqeW6MVjobp7mBAe/86'/1'/0'/1/*)#en5pdj5a

Single sig conclusion:

In the single sig case, where Alice is using a standardised derivation path, backing up a single seed phrase appears to win out as the easiest approach, as writing down a short list of English words beats writing down a descriptor which includes a private key.

However, this relies on the implication that standardised derivation paths would be used, starting from address 0. If this was not the case, Alice’s wallet may not be able to locate and spend funds based only on knowledge of the seed. To combat this risk a website has been created to help users locate their funds in the event their wallet was using non-standard or unusual derivation paths.

This is not the case for the taproot descriptor, which explicitly states the derivation paths to be used.

Legacy vs descriptor backups for a custom spending policy

Alice has decided that she might like her bitcoin to have better redundancy in the event of her death, but does not want to simply share her private keys or seed words with others. She has heard that it’s now possible to compile custom policies using miniscript herself, so she sets out to design a policy with the following properties:

  1. Will allow her to spend funds on her own at any time

And in the event Alice is not able to sign, the others could spend the funds in 3 ways:

  1. Immediately with her parents and lawyer collaborating
  2. Parents can wait until a timeout has elapsed
  3. Lawyer can wait until a timeout has elapsed

The policy could be represented as below (image courtesy of the Bitcoin devkit playground):

alice miniscript policy

For P2WSH addresses this policy in text format would be:

or(pk(Alice),thresh(2,pk(Parents),pk(Lawyer),older(4209492)))
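Read as plain logic, the policy says: Alice alone, or any two of (parents' signature, lawyer's signature, timeout elapsed). A minimal sketch of that logic, with hypothetical names and no cryptography involved:

```python
def can_spend(signers: set, timeout_elapsed: bool) -> bool:
    """Evaluate or(pk(Alice), thresh(2, pk(Parents), pk(Lawyer), older(...)))."""
    if "Alice" in signers:
        return True
    conditions_met = sum([
        "Parents" in signers,
        "Lawyer" in signers,
        timeout_elapsed,
    ])
    return conditions_met >= 2
```

For example, `can_spend({"Parents"}, timeout_elapsed=False)` is false, whereas the parents and lawyer together, or either of them alone once the timeout has passed, satisfy the threshold.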

If Alice, her parents and her lawyer were using the following mnemonic seeds and xpubs:

Alice seed phrase:

check day then tiger collect join hotel hawk absorb ginger wash track crowd hero scale

xpub:

[148def1d/0/1h/2]xpub6DNwBKQytFFCcav7YKTxGBVxPBkd1MyhsLEd2MDtn8CLWJsLAUZGtAntKCxQu45xF2Yrj46YzvHRLMAa1LPVhWcJJKryNfWPX8Ubbm4HBAg/18/*

Parents xpub:

[fb68bb01/0/42]xpub69x7KEERUvDHWALWoQ9rFbK5mmLzCXZqDsnAy3gZH6M29PPW5DsKch6gEWLNcivzGdwU9NiqxijNCCysoVRzmUkLehiuiBGEfvPXK87GFTA/*

seed phrase:

enlist civil inherit olympic supply income mad lock pencil transfer sure celery raven zebra then

Lawyer xpub:

[35ea9c73/0/42h]xpub6BFYWjYe8shCWGzn2rL8HTYEfQxN8knqrWkAKng1QN7vFJUJF6ueiacWAzvgCsJsM5myLXVXmtS6pB13kzkoAPcMJT7iY4DyMwFyLP4xt6j/789/987/*

Let’s take a look at how Alice would back up her funds using this relatively simple policy she created herself.

Legacy backup requirements

First she would back up her seed phrase as before:

check day then tiger collect join hotel hawk absorb ginger wash track crowd hero scale

In addition to this she would need to back up the script created by her wallet for each new address generated, each one of which would look something like this:

# Alice pubkey
038eeab436808eb353b6af6e77f643bbb30fe7c571ed953c882bf3406875f08c82
OP_CHECKSIG
OP_NOTIF
    # Parent pubkey
    024b9dd6d5293644a386fac67627a2c47c7d937b9bc120e85fc9844f22cc267219
    OP_CHECKSIG
    OP_SWAP
    # Lawyer pubkey
    02c318db4f038fd658bdb43b5ad26da910b31602e3783627f7e28a6bfb1da33e0b
    OP_CHECKSIG
    OP_ADD
    OP_SWAP
    OP_DUP
    OP_IF
        543b40
        OP_CSV
        OP_VERIFY
    OP_ENDIF
    OP_ADD
    OP_PUSHNUM_2
    OP_EQUALVERIFY
OP_ENDIF
OP_PUSHNUM_1
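The 543b40 pushed before OP_CSV is the same number as the older(4209492) in the policy, written as a little-endian script number. Under the BIP68 encoding of relative timelocks, bit 22 selects time-based rather than block-based locks and the low 16 bits hold the value, counted in units of 512 seconds. The sketch below decodes it; the function names are illustrative.

```python
TYPE_FLAG = 1 << 22   # set: time-based lock, clear: block-based lock
VALUE_MASK = 0xFFFF   # the low 16 bits carry the lock value
GRANULARITY = 512     # seconds per unit for time-based locks


def decode_older(n: int):
    """Return ('seconds', s) or ('blocks', b) for a relative timelock."""
    value = n & VALUE_MASK
    if n & TYPE_FLAG:
        return ("seconds", value * GRANULARITY)
    return ("blocks", value)


def script_number_hex(n: int) -> str:
    """Minimal little-endian script-number encoding of a positive integer."""
    raw = n.to_bytes((n.bit_length() + 7) // 8, "little")
    if raw and raw[-1] & 0x80:
        raw += b"\x00"  # extra byte so the value is not read as negative
    return raw.hex()
```

For 4209492 this gives a time-based lock of 15188 units of 512 seconds, roughly 90 days, and `script_number_hex(4209492)` yields the 543b40 seen in the script.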

Alice would also need to make sure she could communicate to her heirs and lawyer which wallets are able to import and export raw scripts like this.

Even more importantly perhaps, whilst any Bitcoin node can validate any legal scripts they see in a transaction, most wallets do not know how to satisfy/spend from arbitrary custom scripts like this, so Alice might need to write the wallet software herself, or as part of her backup communicate which wallet(s) can be used to spend her funds.

Descriptor backup requirements

The policy Alice decided on could be translated into the following ranged descriptor:

wsh(
    t:or_c(
        pk([148def1d/0/1h/2]xpub6DNwBKQytFFCcav7YKTxGBVxPBkd1MyhsLEd2MDtn8CLWJsLAUZGtAntKCxQu45xF2Yrj46YzvHRLMAa1LPVhWcJJKryNfWPX8Ubbm4HBAg/18/*),
        v:thresh(2,
            pk([fb68bb01/0/42]xpub69x7KEERUvDHWALWoQ9rFbK5mmLzCXZqDsnAy3gZH6M29PPW5DsKch6gEWLNcivzGdwU9NiqxijNCCysoVRzmUkLehiuiBGEfvPXK87GFTA/*),
            s:pk([35ea9c73/0/42h]xpub6BFYWjYe8shCWGzn2rL8HTYEfQxN8knqrWkAKng1QN7vFJUJF6ueiacWAzvgCsJsM5myLXVXmtS6pB13kzkoAPcMJT7iY4DyMwFyLP4xt6j/789/987/*),
            snl:older(4209492)
        )
    )
)#66wwtznl

In addition to this, Alice would still need to back up her seed phrase/mnemonic as before:

check day then tiger collect join hotel hawk absorb ginger wash track crowd hero scale

Whilst the descriptor looks a little scary and is still quite a substantial object to back up, it has the following advantages over “legacy” backup format:

  1. Can derive multiple addresses without requiring additional backup steps to eliminate address re-use.
  2. Can be imported and used by any wallet which uses descriptors.

TIP: Like the raw Bitcoin script above, the descriptor does not contain any security-critical information and can therefore be backed up “publicly”, e.g. in cloud storage or in a password manager, losing only privacy if discovered.

Importing both the mnemonic seed and descriptor into a descriptor wallet will give the wallet all the information it needs in order to be able to form the required witness script/witness and spend the funds.

Custom spending policy conclusion

In this setup we can clearly see that exporting and backing up raw Bitcoin scripts is a more brittle process than using a descriptor.

In addition to this, Bitcoin wallets that support descriptors will be able to parse, understand, import and, critically, know how to spend funds sent to any Bitcoin address created using any valid descriptor.

This means Alice has peace of mind that in the future her heirs would be able to easily import two descriptors into any wallet and recover her funds, in the event of her being unable to access them.

Final thoughts

Although today many Bitcoin users are content with backing up a single seed phrase, using more advanced locking policies like Alice’s above (or even just a basic multisig without relying on implicit sorted keys) has the potential to give the user much greater security overall by easily permitting timeouts, threshold signature schemes and additional unlocking paths.

Schemes like this are best backed up as a combination of a seed phrase and a descriptor.

If you do choose to switch to using more advanced policies then descriptors have a large advantage over attempting to back up seed phrases and raw Bitcoin scripts. Communicating these policies to heirs and lawyers for usage in the event you are unable to do so yourself is much more failure-resistant.