# Pastebin JjfwEylh
Summary of merkle grinding.

The header format is described at https://en.bitcoin.it/wiki/Block_hashing_algorithm

version(4bytes) prevBlock (32bytes) merkleRoot (32bytes) time (4bytes)
bits (4bytes) nonce (4bytes) = 80 bytes.
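
a minimal sketch of packing those six fields with Python's struct module. serialize_header and the field values are made up for illustration; the integers are written little-endian and the two 32-byte hashes are assumed to already be in internal byte order.

```python
import struct

def serialize_header(version, prev_block, merkle_root, ntime, bits, nonce):
    # hypothetical helper: "<" = little-endian with no alignment padding,
    # I = 4-byte unsigned int, 32s = 32 raw bytes
    assert len(prev_block) == 32 and len(merkle_root) == 32
    return struct.pack("<I32s32sIII", version, prev_block, merkle_root,
                       ntime, bits, nonce)

# placeholder values, not a real block
header = serialize_header(2, bytes(32), bytes(32), 1400000000, 0x1d00ffff, 0)
print(len(header))  # 80
```

the nonce lands in the last 4 bytes (offset 76) and the last 4 bytes of merkleRoot sit at offsets 64..67, i.e. at the start of the second 64-byte chunk.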

sha256 works on 64-byte chunks, so the 80-byte header plus padding (128 bytes in total) is processed in two chunks.

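padding: a single 0x80 byte is appended to the data, then zero bytes up
to 8 bytes short of a 64-byte boundary, then the message length in bits
as a 64-bit big-endian value (hiCount, 4 bytes, then loCount, 4 bytes).

there is an inner hash and an outer hash.  the inner hash comes first;
the data actually hashed is

inner hashed data =
version(4bytes) prevBlock (32bytes) merkleRoot (32bytes) time (4bytes)
bits (4bytes) nonce (4bytes) 0x80 (1byte) <39bytes of 0> hiCount (4bytes)
loCount (4byte value 640) = 128 bytes

hiCount is always 0.  loCount is 640 because the length is counted in
bits (80 bytes * 8).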
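
the padding rule can be sketched as below, following the standard SHA-256 layout (0x80, then zeros, then a 64-bit big-endian bit count). sha256_pad is a hypothetical helper name; hashlib does this internally and never exposes it.

```python
import struct

def sha256_pad(message: bytes) -> bytes:
    # hypothetical helper: returns the message with SHA-256 padding appended
    bit_len = len(message) * 8
    padded = message + b"\x80"
    # zero-fill until 8 bytes short of a multiple of 64
    padded += b"\x00" * ((56 - len(padded)) % 64)
    # 64-bit big-endian length in bits (high word first, always 0 here)
    return padded + struct.pack(">Q", bit_len)

padded_header = sha256_pad(bytes(80))   # 80-byte header -> two chunks
padded_digest = sha256_pad(bytes(32))   # 32-byte inner digest -> one chunk
print(len(padded_header), len(padded_digest))  # 128 64
```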

IV is the fixed sha256 initial state (eight 32-bit magic constants).

stateA = transform call A( IV, version || prevBlock[0-31] || merkleRoot[0-27] )

inner digest = transform call B( stateA, merkleRoot[28-31] || time ||
bits || nonce || 0x80 || <39bytes of 0> || <4bytes 0> || loCount (4 byte
value 640) )

outer hashed data = <inner digest> || 0x80 || <23bytes of 0> ||
<4bytes 0> || loCount (4 byte value 256)

outer = transform call C( IV, <inner digest> || 0x80 || <23bytes of 0>
|| <4bytes 0> || loCount (4 byte value 256) )

if outer, read as a 256-bit little-endian number, is at or below the
target (ie it has enough leading zero bits), proof of work is found.
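
a rough sketch of the three calls using only hashlib. hashlib does not expose the raw state, so copy() on a hash object that has absorbed the first 64 bytes stands in for the saved stateA, and hashlib adds the padding itself. scan_nonces, the all-zero header and easy_target are made up for illustration; a real target is far smaller.

```python
import hashlib
import struct

def scan_nonces(header76: bytes, target: int, max_nonce: int):
    # hypothetical helper: header76 is the header without the nonce
    assert len(header76) == 76
    # call A: absorb the first 64-byte chunk once
    state_a = hashlib.sha256(header76[:64])
    for nonce in range(max_nonce):
        inner = state_a.copy()                                   # reuse saved state
        inner.update(header76[64:] + struct.pack("<I", nonce))   # call B
        outer = hashlib.sha256(inner.digest()).digest()          # call C
        if int.from_bytes(outer, "little") <= target:
            return nonce
    return None

header76 = bytes(76)        # placeholder header fields
easy_target = 1 << 244      # assumption: roughly 1 hash in 4096 passes
found = scan_nonces(header76, easy_target, 1 << 18)
```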


stateA is precomputed; transform call A is only redone when extraNonce
changes, which changes merkleRoot.

so most of the work is repeating call B with a changed nonce (and maybe
some low-order bits of time) and then doing transform call C.


now transform itself is in two parts.

W array = transform_part1( data )
state = transform_part2( state, W )

part1 (the message schedule) does 13 operations (rightrotate, rightshift,
xor, 32-bit unsigned add) per step, 48 times.  importantly, transform_part1
does not depend on state, so in call B it doesn't depend on the first block.

part2 (the compression rounds) does 23 operations (rightrotate, xor, and,
32-bit unsigned add) per round, 64 times.  it costs more than part1.
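
the two-part split can be sketched in pure Python as below and checked against hashlib. this is an illustration, not tuned mining code: the constants are derived from the first 64 primes (fractional parts of square and cube roots) rather than typed in, the helper names first_primes and icbrt are made up, and the header is placeholder bytes.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def icbrt(n):
    # exact integer cube root: float guess, then integer correction
    r = round(n ** (1 / 3))
    while r ** 3 > n:
        r -= 1
    while (r + 1) ** 3 <= n:
        r += 1
    return r

PRIMES = first_primes(64)
IV = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def transform_part1(block):
    # message schedule: uses only the 64 data bytes, never the state
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def transform_part2(state, w):
    # 64 compression rounds: mixes the schedule W into the state
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + s1 + ch + K[i] + w[i]) & MASK
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

header = bytes(range(80))                                 # placeholder header
padded = header + b"\x80" + bytes(39) + struct.pack(">Q", 640)
state_a = transform_part2(IV, transform_part1(padded[:64]))   # call A
w_b = transform_part1(padded[64:])                             # no state involved
inner = struct.pack(">8I", *transform_part2(state_a, w_b))     # call B
print(inner == hashlib.sha256(header).digest())
```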

now if we precompute multiple merkleRoots that have the same last
4bytes, then transform_part1 in transform call B can be computed once
and reused across all of them, like this:

expensive precompute eg FPGA
(mrA,mrB,mrC,mrD) = precompute_merkle_collision()
such that mrA[28..31]==mrB[28..31]==mrC[28..31]==mrD[28..31]

cheap precompute

stateA1= transform call A( IV, version || prevBlock || mrA[0-27] )
stateB1= transform call A( IV, version || prevBlock || mrB[0-27] )
stateC1= transform call A( IV, version || prevBlock || mrC[0-27] )
stateD1= transform call A( IV, version || prevBlock || mrD[0-27] )

then repeat in loop changing 4 byte nonce, and some low bits of time maybe.

inner W = transform_part1( mrA[28-31] || time || bits || nonce || 0x80 ||
<39bytes of 0> || <4bytes 0> || loCount (4 byte value 640) )

inner digest A1=transform_part2( stateA1, inner W )
inner digest B1=transform_part2( stateB1, inner W )
inner digest C1=transform_part2( stateC1, inner W )
inner digest D1=transform_part2( stateD1, inner W )

outerA = transform call C( IV, <inner digest A1> || 0x80 || <23bytes of 0>
|| <4bytes 0> || loCount (4 byte value 256) )
outerB = transform call C( IV, <inner digest B1> || 0x80 || <23bytes of 0>
|| <4bytes 0> || loCount (4 byte value 256) )
outerC = transform call C( IV, <inner digest C1> || 0x80 || <23bytes of 0>
|| <4bytes 0> || loCount (4 byte value 256) )
outerD = transform call C( IV, <inner digest D1> || 0x80 || <23bytes of 0>
|| <4bytes 0> || loCount (4 byte value 256) )
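
the whole loop can be sketched in pure Python as below, with one inner W shared by four midstates. everything here is an assumption for illustration: the four "merkle roots" are fabricated byte strings that merely share their last 4 bytes (the expensive collision search is not shown), the header fields are placeholders, and the names u32, grind, first_primes and icbrt are made up.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def icbrt(n):
    # exact integer cube root: float guess, then integer correction
    r = round(n ** (1 / 3))
    while r ** 3 > n:
        r -= 1
    while (r + 1) ** 3 <= n:
        r += 1
    return r

PRIMES = first_primes(64)
IV = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def transform_part1(block):
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def transform_part2(state, w):
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + s1 + ch + K[i] + w[i]) & MASK
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def u32(x):
    return struct.pack("<I", x)

# placeholder header fields, not a real block
version, prev_block, ntime, bits = 2, bytes(32), 1400000000, 0x1d00ffff
tail = bytes.fromhex("deadbeef")   # the shared last 4 bytes
# stand-ins for mrA..mrD: fabricated, not real merkle roots
roots = [hashlib.sha256(bytes([i])).digest()[:28] + tail for i in range(4)]

# cheap precompute: one call A per root
midstates = [transform_part2(IV, transform_part1(u32(version) + prev_block + mr[:28]))
             for mr in roots]

def grind(nonce):
    block_b = (tail + u32(ntime) + u32(bits) + u32(nonce)
               + b"\x80" + bytes(39) + struct.pack(">Q", 640))
    inner_w = transform_part1(block_b)          # computed once, shared by all roots
    outers = []
    for state in midstates:
        inner = struct.pack(">8I", *transform_part2(state, inner_w))
        block_c = inner + b"\x80" + bytes(23) + struct.pack(">Q", 256)
        outers.append(struct.pack(">8I", *transform_part2(IV, transform_part1(block_c))))
    return outers

outers = grind(12345)
```

as a back-of-envelope check using the op counts quoted earlier (13*48 = 624 for part1, 23*64 = 1472 for part2): without sharing, each candidate costs 2*(624+1472) = 4192 operations; sharing inner W across 4 roots gives 624/4 + 1472 + 624 + 1472 = 3724, roughly 11% fewer. this counts raw operations only and ignores everything else about real hardware.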