Sunday, January 19, 2025

bitcoin core development – What's the data format layout for txindex LevelDB values?

I understand the keys: t + 32-byte hash.
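For reference, a minimal sketch of that key layout as two helpers. The names are hypothetical, and the sketch assumes the 32 hash bytes are stored in Bitcoin Core's internal (reversed) byte order relative to the usual hex display, which is what the bytes(reversed(key[1:])) line in the script relies on:

```python
TXINDEX_PREFIX = b"t"  # key prefix for transaction entries, as described above


def txid_to_key(txid_hex: str) -> bytes:
    """Hypothetical helper: txid hex as shown by RPC/explorers -> LevelDB key.

    Assumes the 32 hash bytes are stored reversed relative to that display.
    """
    raw = bytes.fromhex(txid_hex)
    if len(raw) != 32:
        raise ValueError("a txid is 32 bytes")
    return TXINDEX_PREFIX + raw[::-1]


def key_to_txid(key: bytes) -> str:
    """Hypothetical helper: LevelDB key -> txid hex as usually displayed."""
    if len(key) != 33 or not key.startswith(TXINDEX_PREFIX):
        raise ValueError("not a txindex transaction key")
    return key[1:][::-1].hex()
```

With plyvel, a single lookup would then be ldb.get(txid_to_key(txid)), which returns the raw value bytes, or None if the key is absent.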

My problem is the values. I understand from sources such as "What are the keys used in the blockchain levelDB (ie what are the key:value pairs)?" that the values should encode three fields: .dat file number, block offset, and tx offset within the block.

However, I've noticed that the values have different sizes, between 5 and 10 bytes, across the first thousand entries, so I'm not sure how to decode them into these three fields. Are these fields simply 3 varint values?
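One hypothesis worth testing (an assumption, not something the linked answer states): the three fields are written with Bitcoin Core's own VARINT encoding from serialize.h, the one used when serializing FlatFilePos and CDiskTxPos, rather than with the CompactSize format described on the Bitcoin wiki. A minimal sketch of a decoder under that assumption; read_core_varint and decode_txindex_value are hypothetical names:

```python
# Assumption under test: each txindex value is three Bitcoin Core
# serialize.h VARINTs (nFile, nPos, nTxOffset), not three CompactSizes.

def read_core_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one serialize.h-style VARINT starting at data[pos].

    Returns (value, position just past the VARINT). Each byte carries
    7 bits, most significant group first. A set high bit (0x80) means
    another byte follows, and 1 is added before the next shift, so every
    value has exactly one encoding.
    """
    n = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated VARINT")
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1  # continuation byte: bias by one, keep reading
        else:
            return n, pos


def decode_txindex_value(value: bytes) -> tuple[int, int, int]:
    """Hypothetical helper: split a txindex value into
    (file number, block position in that file, tx offset)."""
    n_file, pos = read_core_varint(value)
    n_pos, pos = read_core_varint(value, pos)
    n_tx_offset, pos = read_core_varint(value, pos)
    if pos != len(value):
        # Leftover bytes would mean the three-VARINT assumption is wrong.
        raise ValueError(f"{len(value) - pos} unexpected trailing bytes")
    return n_file, n_pos, n_tx_offset
```

Applied to the value printed just before the traceback further down, decode_txindex_value(bytes.fromhex('42ffc0438b9435')) returns (66, 2105539, 199349) and consumes all seven bytes. Threading the position through each call, instead of re-slicing, also avoids mixing up relative and absolute offsets.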

Here is my Plyvel code that prints out the lengths, using plyvel==1.5.1 and Bitcoin Core v26.0.0 on Ubuntu 23.10:

#!/usr/bin/env python3

import struct

import plyvel

def decode_varint(data):
    """
    https://github.com/alecalve/python-bitcoin-blockchain-parser/blob/c06f420995b345c9a193c8be6e0916eb70335863/blockchain_parser/utils.py#L41
    """
    assert(len(data) > 0)
    size = int(data[0])
    assert(size <= 255)

    if size < 253:
        return size, 1

    if size == 253:
        format_ = '<H'
    elif size == 254:
        format_ = '<I'
    elif size == 255:
        format_ = '<Q'
    else:
        # Should never be reached
        assert 0, "unknown format_ for size : %s" % size

    size = struct.calcsize(format_)
    return struct.unpack(format_, data[1:size+1])[0], size + 1

ldb = plyvel.DB('/home/ciro/snap/bitcoin-core/common/.bitcoin/indexes/txindex/', compression=None)
i = 0
for key, value in ldb:
    if key[0:1] == b't':
        txid = bytes(reversed(key[1:])).hex()
        print(i)
        print(txid)
        print(len(value))
        print(value.hex(' '))
        # value = bytes(reversed(value))  # tried as well; fails even earlier
        file, off = decode_varint(value)
        start = off  # decode_varint's offset is relative to the slice passed in
        blk_off, off = decode_varint(value[off:])
        tx_off, off = decode_varint(value[start + off:])
        print((txid, file, blk_off, tx_off))
        print()
        i += 1

but it eventually blows up at:

131344
ec4de461b0dd1350b7596f95c0d7576aa825214d9af0e8c54de567ab0ce70800
7
42 ff c0 43 8b 94 35
Traceback (most recent call last):
  File "/home/ciro/bak/git/bitcoin-strings-with-txids/./tmp.py", line 39, in <module>
    blk_off, off = decode_varint(value[off:])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ciro/bak/git/bitcoin-strings-with-txids/./tmp.py", line 29, in decode_varint
    return struct.unpack(format_, data[1:size+1])[0], size + 1
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: unpack requires a buffer of 8 bytes

So I wonder whether I guessed the format wrong, or whether it's just a bug in my code.

According to https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer I would decode:

42 ff c0 43 8b 94 35

manually as:

  • 42
  • ff: expect 8 bytes next
    • c0 43 8b 94 35: only 5 bytes left, blowup
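For comparison, the same seven bytes hand-decoded with the VARINT rule from Bitcoin Core's serialize.h (an assumption to check, not an established answer): each byte contributes its low 7 bits, and when the high bit 0x80 is set another byte follows and the running value is incremented by 1 before the next shift by 7 bits (multiplication by 128). The field names nFile, nPos and nTxOffset are the ones used by Bitcoin Core's FlatFilePos and CDiskTxPos:

$$
\begin{aligned}
\text{nFile:}\quad & \mathtt{42} \to 66 \\
\text{nPos:}\quad & \mathtt{ff} \to 127 + 1 = 128,\ \ \mathtt{c0} \to 128 \cdot 128 + 64 + 1 = 16449,\ \ \mathtt{43} \to 16449 \cdot 128 + 67 = 2105539 \\
\text{nTxOffset:}\quad & \mathtt{8b} \to 11 + 1 = 12,\ \ \mathtt{94} \to 12 \cdot 128 + 20 + 1 = 1557,\ \ \mathtt{35} \to 1557 \cdot 128 + 53 = 199349
\end{aligned}
$$

Under that reading the value splits into exactly three fields with no bytes left over, which at least fits the "three fields" description; it is a consistency check on one entry, not proof.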

I also tried reversing the value:

value = bytes(reversed(value))

but then it blows up very early, so that is definitely wrong.

I also tried ignoring the error to see whether there are others, but there were hundreds of them, so something is definitely wrong with my method.
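Rather than skipping past exceptions one at a time, a small harness can count outcomes over every t entry, which makes candidate decoders easy to compare side by side. A sketch; tally_decode_failures is a hypothetical helper, and decode stands for whichever candidate function is being tested:

```python
from collections import Counter
from typing import Callable, Iterable


def tally_decode_failures(
    items: Iterable[tuple[bytes, bytes]],
    decode: Callable[[bytes], object],
    prefix: bytes = b"t",
) -> Counter:
    """Run `decode` on every value whose key starts with `prefix` and count
    outcomes: "ok" for values that decoded, else the exception type name."""
    counts: Counter = Counter()
    for key, value in items:
        if not key.startswith(prefix):
            continue  # skip non-transaction entries
        try:
            decode(value)
        except Exception as exc:  # tally every failure instead of stopping at the first
            counts[type(exc).__name__] += 1
        else:
            counts["ok"] += 1
    return counts
```

With plyvel it could be fed from a prefix iterator, e.g. tally_decode_failures(ldb.iterator(prefix=b't'), candidate), where candidate is a hypothetical decoder that raises on malformed input; a correct format guess should leave only "ok" in the result.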

