[Bug] Make writing data directory JSON files atomic (Book.json, Config.json etc.) #189

@andrewnguyen22

Describe the bug

The node was operating normally, then suddenly failed with "unable to unmarshal peer book".

Image

This occurred on a non-validator full node on the canary network.

The peer book file as it was written to disk:

{
  "book": [
    {
      "address": {
        "publicKey": "xyz",
        "netAddress": "x.x.x.x",
        "peerMeta": {
          "networkID": 0,
          "chainID": 2,
          "signature": "tmhrEfFFrc+POilNER0Nltj08AmzNJjNz21cRITQfUjUG+EA0lD/M4i+u/mlvrbbAfDs4yrIcUhzxVbhs/EevW0NTbsgHe1+/bPFGJnE6M3SyL8RwgRn3k5tAJD/aJrl"
        }
      },
      "consecutiveFailedDial": 0
    },
    {
      "address": {
        "publicKey": "xyz",
        "netAddress": "x.x.x.x",
        "peerMeta": {
          "networkID": 0,
          "chainID": 2,
          "signature": "ioD44LwehMfdAYhyYI1XhmfCCW2KHlv4k5l1U4quBl92ldwfFPE69VLj46tOviuDCcfrG+TwLmv1B7RFaKrdll8GojxDrcTBPKXLT5RO252wiQfMENzALZrNPRMhKvt0"
        }
      },
      "consecutiveFailedDial": 0
    },
    {
      "address": {
        "publicKey": "xyz",
        "netAddress": "x.x.x.x",
        "peerMeta": {
          "networkID": 0,
          "chainID": 2,
          "signature": "sgNTWjFimG+idC7wei4XF61C3veRPMs5g/J80b+GQPZTAEKRwx2yPM96Qt/qiriiBtNpkQIsumKWAUBo7ShwZtMiO8P5REdWWbj3pQXPN9UsDlWYNg7/RTfEnDtPV7VP"
        }
      },
      "consecutiveFailedDial": 0
    },
    {
      "address": {
        "publicKey": "xyz",
        "netAddress": "x.x.x.x",
        "peerMeta": {
          "networkID": 0,
          "chainID": 2,
          "signature": "mCk6Gmm7TjxYAkJC/VaHwm3udVA4B7oEKHSjyBilFS0h2kOEuLkNEZcijUlxXKx4Dob2MzTEcyIt59xTiPqS/i29WsiMNW9KMkF3EGBAYzacu28U6Hsh7fkh5Eolyi/v"
        }
      },
      "consecutiveFailedDial": 0
    },
    {
      "address": {
        "publicKey": "xyz",
        "netAddress": "x.x.x.x",
        "peerMeta": {
          "networkID": 0,
          "chainID": 1,
          "signature": "j+tSBNZZKPdSoDwkjP/7rjpbPKroFflmxBaTU/JJCER7Gv/J1haHZw+TGyw8wKTKGf5pOF4LgfXl+GmR6Z1ekh6SF+/VYdsQTgjaL1QyPQq2nhZ3IzYB+1fkj6m/s6HB"
        }
      },
      "consecutiveFailedDial": 0
    },
    {
      "address": {
        "publicKey": "xyz",
        "netAddress": "x.x.x.x",
        "peerMeta": {
          "networkID": 0,
          "chainID": 2,
          "signature": "iD+6jUrjv7QeLo/Eme9EnmFAIEFUFBq4mVEcdo0HFd5ZwN9RCZlvg+QWP9NFIS2DBjFYjzzfPk3ADDxMf8gy94pJ/bOU80eRdoEp+47jjIjiGc3NLKRba+TCuZw9uiFc"
        }
      },
      "consecutiveFailedDial": 0
    },
    {
      "address": {
        "publicKey": "xyz",
        "netAddress": "x.x.x.x",
        "peerMeta": {
          "networkID": 0,
          "chainID": 1,
          "sig

The JSON was clearly cut off at this point because a flush was terminated partway through. It is not known why the flush was interrupted.

Steps to reproduce

Intermittent

Expected behavior

Flushing the peer book to disk and unmarshalling it should both be safe, even when a write is interrupted.

Screenshots

See above

Environment

  • OS: Linux + docker
  • Software Version: alpha-0.0.1

Additional context

The write was likely interrupted by a crash, but the software does not gracefully handle an incomplete peer book.

At a minimum, consider fixing this part, which treats any read or unmarshal error as fatal:

	// read the json file
	bz, err := os.ReadFile(pb.path)
	if err != nil {
		l.Fatalf("unable to read peer book: %s", err.Error())
	}
	// load the bytes into the peer book object
	if err = json.Unmarshal(bz, pb); err != nil {
		l.Fatalf("unable to unmarshal peer book: %s", err.Error())
	}

Required Tags

  • Priority: Low
  • Module: Multi-Module
