Build Your Own Blockchain: A Python Tutorial

Download the full Jupyter/iPython notebook from Github here

Build Your Own Blockchain – The Basics

This tutorial will walk you through the basics of how to build a blockchain from scratch. Focusing on the details of a concrete example will provide a deeper understanding of the strengths and limitations of blockchains. For a higher-level overview, I’d recommend this excellent article from BitsOnBlocks.

Transactions, Validation, and updating system state

At its core, a blockchain is a distributed database with a set of rules for verifying new additions to the database. We’ll start off by tracking the accounts of two imaginary people: Alice and Bob, who will trade virtual money with each other.

We’ll need to create a transaction pool of incoming transactions, validate those transactions, and make them into a block.

We’ll be using a hash function to create a ‘fingerprint’ for each of our transactions- this hash function links each of our blocks to each other. To make this easier to use, we’ll define a helper function to wrap the python hash function that we’re using.

In [1]:
import hashlib, json, sys

def hashMe(msg=""):
    # For convenience, this is a helper function that wraps our hashing algorithm
    if type(msg)!=str:
        msg = json.dumps(msg,sort_keys=True)  # If we don't sort keys, we can't guarantee repeatability!
    if sys.version_info.major == 2:
        return unicode(hashlib.sha256(msg).hexdigest(),'utf-8')
        return hashlib.sha256(str(msg).encode('utf-8')).hexdigest()

Next, we want to create a function to generate exchanges between Alice and Bob. We’ll indicate withdrawals with negative numbers, and deposits with positive numbers. We’ll construct our transactions to always be between the two users of our system, and make sure that the deposit is the same magnitude as the withdrawal- i.e. that we’re neither creating nor destroying money.

In [2]:
import random

def makeTransaction(maxValue=3):
    # This will create valid transactions in the range of (1,maxValue)
    sign      = int(random.getrandbits(1))*2 - 1   # This will randomly choose -1 or 1
    amount    = random.randint(1,maxValue)
    alicePays = sign * amount
    bobPays   = -1 * alicePays
    # By construction, this will always return transactions that respect the conservation of tokens.
    # However, note that we have not done anything to check whether these overdraft an account
    return {u'Alice':alicePays,u'Bob':bobPays}

Now let’s create a large set of transactions, then chunk them into blocks.

In [3]:
txnBuffer = [makeTransaction() for i in range(30)]

Next step: making our very own blocks! We’ll take the first k transactions from the transaction buffer, and turn them into a block. Before we do that, we need to define a method for checking the valididty of the transactions we’ve pulled into the block.

For bitcoin, the validation function checks that the input values are valid unspent transaction outputs (UTXOs), that the outputs of the transaction are no greater than the input, and that the keys used for the signatures are valid. In Ethereum, the validation function checks that the smart contracts were faithfully executed and respect gas limits.

No worries, though- we don’t have to build a system that complicated. We’ll define our own, very simple set of rules which make sense for a basic token system:

  • The sum of deposits and withdrawals must be 0 (tokens are neither created nor destroyed)
  • A user’s account must have sufficient funds to cover any withdrawals

If either of these conditions are violated, we’ll reject the transaction.

In [4]:
def updateState(txn, state):
    # Inputs: txn, state: dictionaries keyed with account names, holding numeric values for transfer amount (txn) or account balance (state)
    # Returns: Updated state, with additional users added to state if necessary
    # NOTE: This does not not validate the transaction- just updates the state!
    # If the transaction is valid, then update the state
    state = state.copy() # As dictionaries are mutable, let's avoid any confusion by creating a working copy of the data.
    for key in txn:
        if key in state.keys():
            state[key] += txn[key]
            state[key] = txn[key]
    return state
In [5]:
def isValidTxn(txn,state):
    # Assume that the transaction is a dictionary keyed by account names

    # Check that the sum of the deposits and withdrawals is 0
    if sum(txn.values()) is not 0:
        return False
    # Check that the transaction does not cause an overdraft
    for key in txn.keys():
        if key in state.keys(): 
            acctBalance = state[key]
            acctBalance = 0
        if (acctBalance + txn[key]) < 0:
            return False
    return True

Here are a set of sample transactions, some of which are fraudulent- but we can now check their validity!

In [6]:
state = {u'Alice':5,u'Bob':5}

print(isValidTxn({u'Alice': -3, u'Bob': 3},state))  # Basic transaction- this works great!
print(isValidTxn({u'Alice': -4, u'Bob': 3},state))  # But we can't create or destroy tokens!
print(isValidTxn({u'Alice': -6, u'Bob': 6},state))  # We also can't overdraft our account.
print(isValidTxn({u'Alice': -4, u'Bob': 2,'Lisa':2},state)) # Creating new users is valid
print(isValidTxn({u'Alice': -4, u'Bob': 3,'Lisa':2},state)) # But the same rules still apply!

Each block contains a batch of transactions, a reference to the hash of the previous block (if block number is greater than 1), and a hash of its contents and the header

Building the Blockchain: From Transactions to Blocks

We’re ready to start making our blockchain! Right now, there’s nothing on the blockchain, but we can get things started by defining the ‘genesis block’ (the first block in the system). Because the genesis block isn’t linked to any prior block, it gets treated a bit differently, and we can arbitrarily set the system state. In our case, we’ll create accounts for our two users (Alice and Bob) and give them 50 coins each.

In [7]:
state = {u'Alice':50, u'Bob':50}  # Define the initial state
genesisBlockTxns = [state]
genesisBlockContents = {u'blockNumber':0,u'parentHash':None,u'txnCount':1,u'txns':genesisBlockTxns}
genesisHash = hashMe( genesisBlockContents )
genesisBlock = {u'hash':genesisHash,u'contents':genesisBlockContents}
genesisBlockStr = json.dumps(genesisBlock, sort_keys=True)

Great! This becomes the first element from which everything else will be linked.

In [8]:
chain = [genesisBlock]

For each block, we want to collect a set of transactions, create a header, hash it, and add it to the chain

In [9]:
def makeBlock(txns,chain):
    parentBlock = chain[-1]
    parentHash  = parentBlock[u'hash']
    blockNumber = parentBlock[u'contents'][u'blockNumber'] + 1
    txnCount    = len(txns)
    blockContents = {u'blockNumber':blockNumber,u'parentHash':parentHash,
    blockHash = hashMe( blockContents )
    block = {u'hash':blockHash,u'contents':blockContents}
    return block

Let’s use this to process our transaction buffer into a set of blocks:

In [10]:
blockSizeLimit = 5  # Arbitrary number of transactions per block- 
               #  this is chosen by the block miner, and can vary between blocks!

while len(txnBuffer) > 0:
    bufferStartSize = len(txnBuffer)
    ## Gather a set of valid transactions for inclusion
    txnList = []
    while (len(txnBuffer) > 0) & (len(txnList) < blockSizeLimit):
        newTxn = txnBuffer.pop()
        validTxn = isValidTxn(newTxn,state) # This will return False if txn is invalid
        if validTxn:           # If we got a valid state, not 'False'
            state = updateState(newTxn,state)
            print("ignored transaction")
            continue  # This was an invalid transaction; ignore it and move on
    ## Make a block
    myBlock = makeBlock(txnList,chain)
In [11]:
{'contents': {'blockNumber': 0,
  'parentHash': None,
  'txnCount': 1,
  'txns': [{'Alice': 50, 'Bob': 50}]},
 'hash': '7c88a4312054f89a2b73b04989cd9b9e1ae437e1048f89fbb4e18a08479de507'}
In [12]:
{'contents': {'blockNumber': 1,
  'parentHash': '7c88a4312054f89a2b73b04989cd9b9e1ae437e1048f89fbb4e18a08479de507',
  'txnCount': 5,
  'txns': [{'Alice': 3, 'Bob': -3},
   {'Alice': -1, 'Bob': 1},
   {'Alice': 3, 'Bob': -3},
   {'Alice': -2, 'Bob': 2},
   {'Alice': 3, 'Bob': -3}]},
 'hash': '7a91fc8206c5351293fd11200b33b7192e87fad6545504068a51aba868bc6f72'}

As expected, the genesis block includes an invalid transaction which initiates account balances (creating tokens out of thin air). The hash of the parent block is referenced in the child block, which contains a set of new transactions which affect system state. We can now see the state of the system, updated to include the transactions:

In [13]:
{'Alice': 72, 'Bob': 28}

Checking Chain Validity

Now that we know how to create new blocks and link them together into a chain, let’s define functions to check that new blocks are valid- and that the whole chain is valid.

On a blockchain network, this becomes important in two ways:

  • When we initially set up our node, we will download the full blockchain history. After downloading the chain, we would need to run through the blockchain to compute the state of the system. To protect against somebody inserting invalid transactions in the initial chain, we need to check the validity of the entire chain in this initial download.
  • Once our node is synced with the network (has an up-to-date copy of the blockchain and a representation of system state) it will need to check the validity of new blocks that are broadcast to the network.

We will need three functions to facilitate in this:

  • checkBlockHash: A simple helper function that makes sure that the block contents match the hash
  • checkBlockValidity: Checks the validity of a block, given its parent and the current system state. We want this to return the updated state if the block is valid, and raise an error otherwise.
  • checkChain: Check the validity of the entire chain, and compute the system state beginning at the genesis block. This will return the system state if the chain is valid, and raise an error otherwise.
In [14]:
def checkBlockHash(block):
    # Raise an exception if the hash does not match the block contents
    expectedHash = hashMe( block['contents'] )
    if block['hash']!=expectedHash:
        raise Exception('Hash does not match contents of block %s'%
In [15]:
def checkBlockValidity(block,parent,state):    
    # We want to check the following conditions:
    # - Each of the transactions are valid updates to the system state
    # - Block hash is valid for the block contents
    # - Block number increments the parent block number by 1
    # - Accurately references the parent block's hash
    parentNumber = parent['contents']['blockNumber']
    parentHash   = parent['hash']
    blockNumber  = block['contents']['blockNumber']
    # Check transaction validity; throw an error if an invalid transaction was found.
    for txn in block['contents']['txns']:
        if isValidTxn(txn,state):
            state = updateState(txn,state)
            raise Exception('Invalid transaction in block %s: %s'%(blockNumber,txn))

    checkBlockHash(block) # Check hash integrity; raises error if inaccurate

    if blockNumber!=(parentNumber+1):
        raise Exception('Hash does not match contents of block %s'%blockNumber)

    if block['contents']['parentHash'] != parentHash:
        raise Exception('Parent hash not accurate at block %s'%blockNumber)
    return state
In [16]:
def checkChain(chain):
    # Work through the chain from the genesis block (which gets special treatment), 
    #  checking that all transactions are internally valid,
    #    that the transactions do not cause an overdraft,
    #    and that the blocks are linked by their hashes.
    # This returns the state as a dictionary of accounts and balances,
    #   or returns False if an error was detected

    ## Data input processing: Make sure that our chain is a list of dicts
    if type(chain)==str:
            chain = json.loads(chain)
            assert( type(chain)==list)
        except:  # This is a catch-all, admittedly crude
            return False
    elif type(chain)!=list:
        return False
    state = {}
    ## Prime the pump by checking the genesis block
    # We want to check the following conditions:
    # - Each of the transactions are valid updates to the system state
    # - Block hash is valid for the block contents

    for txn in chain[0]['contents']['txns']:
        state = updateState(txn,state)
    parent = chain[0]
    ## Checking subsequent blocks: These additionally need to check
    #    - the reference to the parent block's hash
    #    - the validity of the block number
    for block in chain[1:]:
        state = checkBlockValidity(block,parent,state)
        parent = block
    return state

We can now check the validity of the state:

In [17]:
{'Alice': 72, 'Bob': 28}

And even if we are loading the chain from a text file, e.g. from backup or loading it for the first time, we can check the integrity of the chain and create the current state:

In [18]:
chainAsText = json.dumps(chain,sort_keys=True)
{'Alice': 72, 'Bob': 28}

Putting it together: The final Blockchain Architecture

In an actual blockchain network, new nodes would download a copy of the blockchain and verify it (as we just did above), then announce their presence on the peer-to-peer network and start listening for transactions. Bundling transactions into a block, they then pass their proposed block on to other nodes.

We’ve seen how to verify a copy of the blockchain, and how to bundle transactions into a block. If we recieve a block from somewhere else, verifying it and adding it to our blockchain is easy.

Let’s say that the following code runs on Node A, which mines the block:

In [19]:
import copy
nodeBchain = copy.copy(chain)
nodeBtxns  = [makeTransaction() for i in range(5)]
newBlock   = makeBlock(nodeBtxns,nodeBchain)

Now assume that the newBlock is transmitted to our node, and we want to check it and update our state if it is a valid block:

In [20]:
print("Blockchain on Node A is currently %s blocks long"%len(chain))

    print("New Block Received; checking validity...")
    state = checkBlockValidity(newBlock,chain[-1],state) # Update the state- this will throw an error if the block is invalid!
    print("Invalid block; ignoring and waiting for the next block...")

print("Blockchain on Node A is now %s blocks long"%len(chain))
Blockchain on Node A is currently 7 blocks long
New Block Received; checking validity...
Blockchain on Node A is now 8 blocks long

Conclusions and Extensions

We’ve created all the basic architecture for a blockchain, from a set of state transition rules to a method for creating blocks, to mechanisms for checking the validity of transactions, blocks, and the full chain. We can derive the system state from a downloaded copy of the blockchain, validate new blocks that we recieve from the network, and create our own blocks.

The system state that we’ve created is effectively a distributed ledger or database- the core of many blockchains. We could extend this to include special transaction types or full smart contracts.

We haven’t explored the network architecture, the proof-of-work or proof-of-state validation step, and the consensus mechanism which provides blockchains with security against attack. We also haven’t discussed public key cryptography, privacy, and verification steps. More on that in the future!


  1. Very good and simple explanation of the priciple of block chains.
    I followed the instructions step by step and have got now an idea how it might work in reality.
    Thanks for posting this tutorial.
    Perhaps someone has a continuation of this tutorial how to deal with conflicts when multiple miners create blocks at roughly the same time?

  2. In 16 when you try to run other code for txn in chain [0] [‘contents’] [‘txns’] : using python 3 it gives a keyError :’contents’.how can this be solved?

    1. Thanks for checking this in Python3! I just posted an update (also on github) which should address Python 3 compatibility issues, tested on Python 3.5.2

  3. Thank you very much for the outstanding article! I’m just starting with the blockchain and not familiar with the basics – so a question regaring

    [6], transaction validation:

    How to handle wallet balance checks during transactions in the real world, with millions/billions wallets? Can’t really just keep them all in a single “state” variable and cannot iterate whole blockchain for all transactions to track couple wallets. I guess it’s about proof of stake concept – as I can imagine it’s a hash of wallet ID + existing balance sum + hashed private key used to generate wallet ID (otherwise everyone would be able to spend coins in the wallet), but not sure.

    Could you elaborate, please?

    1. With millions/billions of wallets you wouldn’t want to keep the state in-memory, but storing the system state in a local database accessible by your blockchain node is entirely feasible- this is how big blockchain networks (ethereum, bitcoin) work! In bitcoin, the ‘state’ is the set of unsent transaction outputs (UTXOs); when a node initially downloads the blockchain it works from the genesis node forwards to build up a database of current UTXOs which it can reference. In Ethereum, the ‘state’ is the state of the Ethereum Virtual Machine; similarly after downloading a blockchain the node will store the EVM state locally.

      1. Thank you for the explanation! After reading all the math calculations about difficulty of blocks, probability of finding one, deflation pace, profitability of mining etc. I’ve expected blockchain code to resemble AI calculations – but so far all concepts are extremely simple yet ingenious.

        One more question – if I may: how does a wallet prove its authenticity to the node to register a transaction?

        1. Note that we don’t check the authenticity of the wallet per se, but rather the authenticity of the transaction– in our example, checking that the sender actually has funds to cover the transaction. We check this by building up the current state of the blockchain, starting at the genesis block, and saving the blockchain state in local memory or a database (as mentioned in the previous answer). As anyone can create a new wallet on public blockchains by just creating a public/private key pair, there isn’t a need for a wallet to prove its authenticity per se – but in order for a person to submit a valid transaction that is mined into the blockchain, the nodes processing the transaction must see that the wallet’s current state is able to meet all the transaction rules (e.g. have sufficient funds to cover the transaction).
          In permissioned blockchains, there is a ‘whitelist’ of recognized wallets that are allowed to create transactions, which is maintained as part of the blockchain state- but most fully public blockchains (bitcoin, ethereum, litecoin, zcash) don’t have permissioned wallets.

          1. Thank you once again! But I meant actual validation procedure of public/private wallet key – how does it happen in case of blockchain? There are explanations about Bob’s and Alice’s keys in Bitcoin documents but puppet spectacles without code don’t work on me due to lack of imagination and probably experience as well =( Blockchain itself doesn’t keep public keys (for Bitcoin at least) – what may prevent third party from re-generating keys and sending fake transaction data to the node? It seems re-generated pair of keys may work as well as the original. I understand it’s possible to validate public key if the wallet previously generated at least one transaction and included its public key in the data – but what about wallets which received coins but didn’t send anything and didn’t broadcast their public key yet?

            Wallet validation is like 1/4 of cryptocurrency concept and so far I didn’t see a single code snippet with this procedure applied to a blockchain. Could be great if you add the code to your article.

          2. Ah, ok- now I understand what you meant! This system relies on public/private key cryptography, where the private key is kept a secret, but the public key can be broadcast to the network- in most blockchains, the public key actually is the wallet address! This means that as long as a wallet receives or generates a transaction, its public key is available on the blockchain (as the ‘from’ or ‘to’ field of the transaction)- so there isn’t any issue about wallets receiving coins but not having a visible public key.

            Public/private key cryptography allows the user to ‘sign’ a transaction with their private key, creating a signature which uniquely encodes the contents of the transaction. Anyone with the public key can then verify that the signature is valid for that specific transaction- without being able to know anything about the private key! Any changes to the transaction contents would require a different signature, so having the public key and the signature allows a node to verify that the transaction was created by the account owner, and has not been tampered with. There are some good video explanations of public/private key cryptography; I have an unfinished tutorial on this which I might try to finish up to better explain the concept!

            Note that most blockchain protocols allow somebody to send to an invalid or unclaimed address- these tokens are then ‘lost’ as nobody has the private key matching that wallet address! This is used in some creative ways, e.g. by ‘burning’ bitcoins to buy tokens in a newly created blockchain (this is how Ethereum was first started, and is a good way of creating value in proof-of-work or proof-of-stake systems).

            Public/private key cryptography algorithms can be extremely strong- unlike cracking a password, it would take years for all of the world’s computing power working together to crack a single public/private key pair! So there isn’t any issue about ‘re-generating’ a public/private key pair; this can be viewed as effectively impossible as long as the public/private key pair was seeded randomly!

            If you’re curious about more, I’d encourage you to look around for resources on public/private key cryptography and blockchains- there’s a lot of fascinating topics in this realm.

          3. Public cryptography key as a wallet ID makes everything clear. Yet another simple yet great concept. Thank you for the replies and the article!

    1. Hi Hafsa! There are a number of approaches for this; the simplest is to hash the contents of the document (and an address to the document) into the message contents (rather than having the message have a target and amount). This proves existence of a specific version of the document at a given location. More complicated versions which work like IPFS are outside of the scope of this tutorial, but there are other resources for understanding that project!

  4. Hi, thank you very much for the tutorial, indeed some clever yet simple ideas.

    in the makeBlock function definition you populate the variable tnxCount with the number of transactions and immediately after when you define the content of the block you recalculate the length.

    txnCount = len(txns)
    blockContents = {u’blockNumber’:blockNumber,u’parentHash’:parentHash,

    Is there a specific reason for this or could we just use the txnCount variable in the blockContents dictionary?

    I would also been interested in understanding the mechanism according to which transactions get allocated to nodes for block creation. I know bitcoin uses the longest chain rule to solve concurrent blocks creation but I cannot figure out how this would work in code. When a transaction is generated how is it decided which node should process it and add it to a block? Are transaction assigned to multiple nodes at the same time? How the transactions buffer can be kept synchronized across the nodes?

    1. Good questions- reusing txnCount would improve efficiency, and is a good suggestion.

      The consensus mechanism is an emergent property of the network dynamics, and so is harder to model in a single-node representation like this one. Roughly, the network is a peer-to-peer network in which nodes pass messages and block onwards to all their listed peers. This allows a node to identify the longest chain which it receives from the rest of the network; as well as develop a queue of messages to bundle into blocks. The transactions will typically be ordered by the miner fees (or gas price in Ethereum) for batching into blocks. Once a block is formed, it is passed to all of the node’s peers, who validate it and remove any of its transactions from their own queues.

      1. Thank you for the clarification. I think I underestimated the speed information can propagate across peer networks. I had stuck in my mind the idea that it was not possible that a reasonably big amount of peers could sync transactions in time to avoid a massive duplication of jobs, but apparently the speed is high enough to allow processed transactions to be removed from working buffers fast enough to avoid continuous conflicts.

        I’m interested on the topic because I have a scenario where blockchain could come handy due to the lack of central arbitrations.

        The scenario I have in mind is similar to a found rising: one node need to create a request for found, lets say 100. All the other nodes can then decide how much to contribute to the request. Can be all 100 from only one node or smaller amounts from multiple nodes. Once the total for the transaction is 0 (request and supply net each other) then the transaction is ready to be added to the block. I think it should be rather straightforward to implement, the only complexity I see is in linking requests and supply with some sort of ID in order to decide when the request is solved. I guess I can solve potential conflicts of oversupply when creating the block, by discarding the final redundant transaction based on timestamp or by reducing the value in case the supplied amount is too high.

        Do you see a fit for the blockchain model?

        1. This is something that you could definitely code up in a smart contract in Solidity and deploy on the Ethereum network to test out. In your example, I don’t see why you necessarily need to wait for all the funds to be submitted before creating a block- e.g. if 50 units are committed in Block 1, 30 units are committed in Block 2, and 20 units are committed in Block 3, you still would be able to accomplish your goal (and then use the smart contract to ‘close’ the crowd sale and not accept additional units). Only if there are some nonlinear interactions between submissions would you need to worry about transaction ordering- but this again could be captured with a queue within a smart contract to accomplish the same goal.

  5. Thanks for the clarification. I have an idea of Preserving privacy in vehicular ad hoc networks using Blockchain, however, I am stack which blockchain can provide me with the best results, does my idea have any sense/? I think Blockchain can prevent attacker in vehicular communication and allow a good multicast communication among nodes. Do you have any idea of how I can do it and help me?? Thank you!!

    1. Thanks for reaching out- I’d suggest reading some of the work from the Initiative for Cryptocurrencies and Contracts (IC3), which should give you an understanding of privacy, throughput, and device security questions.

    1. The chain can be extended just as we’ve done here with our initial blocks- we haven’t modeled the network dynamics of how messages get passed between nodes and added to the network, but you could imagine having a queue of incoming messages, and forming a new block whenever there are sufficient messages.

  6. Hi, first of all thanks fot this great article, just one question. In case of we have several pear-to-pear nodes, there is a chance of chain inconsistancy. I mean than on different nodes transaction with number N could be different. Question is, how to properly merge chains copies from different nodes? Is there some master Node should be implemented to be a single source of true for chain consistency? Thanks.

    1. Hi Alex! This topic is commonly referred to as ‘sharding’ of the chain (separate from sharding of storage) and is an active area of research. There isn’t a simple way of presenting this in an tutorial which is focused on general concepts, but I’d encourage you to do your own research into approaching this.

  7. Good day, Eric! About the privacy – since governments have started to be very active with their attempts to control crypto-currencies. I’ve seen an explanation of wallet addresses hiding mechanic in Monero but somehow I cannot comprehend it – what and how exactly they hide data? It look like system is hiding transaction sum in a “temporary wallet address” which work like an invoice (a hash contains full/real wallet address + demanded transaction sum) – but aren’t normal transactions in other cryptocurrencies contain exactly that data? Or “hiding” mechanic is based solely on white noise effect when nobody (except for running nodes) knows which “invoices” were actually paid and which didn’t + lack of necessity to publish wallet addresses in web?

    Also recently there was an issue of Cardano (in October) which then gained $32B value in under 3 months. Besides light wallets mechanic (it’s actually great but not unique feature afaik) – what exactly is so useful in this blockchain that resulted in explosive value growth? To compare: Qtum has achieved “only” $4B value in 10 months.

Leave a Reply

Your email address will not be published. Required fields are marked *