Grant Update April 2019

Technical Updates | May 10, 2019

Outputs and Outcomes

Our focus is to:

  • Improve the performance and reliability of the Protocol
  • Provide ability to monitor the exchange rate and make adjustments as needed
  • Maintain the Anchor Master, securing the Factom blockchain in the long term

Grants: Protocol Grant 012 | Oracle Grant 010 | Anchor Grant 011

Status and Achievements

April was a vexing month for Factom core development. We came very close to a full deployment of 6.2.2, but it was ultimately not to be. Despite that, much progress was made in April, and many bugs were found and squashed.

Two Sponsor meetings were held in April, on April 10 and April 24. The April 10 Sponsor meeting was recorded and is published here. Sponsors in attendance this month were Dominic Luxford, Nolan Bauer, and Nikola Nikolov (Factomatic).

April started with the 6.2.2-rc3 release candidate being deployed onto the testnet. The testnet crew stress tested it several times and it passed that scrutiny. This was a very promising development, as it showed that the extensive changes introduced in the 6.2.2 release (over 6.2.0) could withstand a real-world environment.

The 6.2.2 release contained two major fixes that 6.1.1 had been held back for. The first fixed a high CPU utilization bug that occurred when items were in the holding queue. The second fixed a significant bug where the replay filter was populated improperly during the first hour after boot. The bug showed up when servers that were brain-swapped into a Federated role within an hour of boot were faulted out. In contrast, Federated servers that were promoted from Audit servers through faulting operated properly. The fix in 6.2.2 allowed the normal upgrade process to proceed.
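
To illustrate why an unpopulated replay filter matters, here is a minimal Go sketch. It is not factomd's implementation: the `ReplayFilter` type, its methods, and the use of Unix seconds are assumptions made for this example. The point is that a node which skips the preload step at boot would accept messages that the rest of the network already rejects.

```go
package main

import "fmt"

// ReplayFilter is a hypothetical, simplified replay filter. It remembers
// message hashes whose timestamps fall inside a window around "now".
type ReplayFilter struct {
	window int64            // window size in seconds
	seen   map[string]int64 // message hash -> message timestamp (Unix seconds)
}

func NewReplayFilter(windowSeconds int64) *ReplayFilter {
	return &ReplayFilter{window: windowSeconds, seen: make(map[string]int64)}
}

// Preload records hashes that are already in recent blocks. A node that
// skips this step at boot would accept replays until the window has passed.
func (f *ReplayFilter) Preload(hashes []string, ts int64) {
	for _, h := range hashes {
		f.seen[h] = ts
	}
}

// Accept reports whether a message is inside the window and not yet seen,
// and records it when it is accepted.
func (f *ReplayFilter) Accept(hash string, ts, now int64) bool {
	if ts < now-f.window || ts > now+f.window {
		return false // outside the time window
	}
	if _, dup := f.seen[hash]; dup {
		return false // replay
	}
	f.seen[hash] = ts
	return true
}

func main() {
	const now = int64(1556668800) // arbitrary example time
	f := NewReplayFilter(3600)
	f.Preload([]string{"entry-a"}, now-600)

	fmt.Println(f.Accept("entry-a", now, now))      // false: preloaded
	fmt.Println(f.Accept("entry-b", now, now))      // true: new
	fmt.Println(f.Accept("entry-b", now, now))      // false: replay
	fmt.Println(f.Accept("entry-c", now-7200, now)) // false: too old
}
```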

WhoSoup from Factomize also contributed a useful fix to the 6.2.2 release that helps with the election sync messages.  Adam S Levy from Canonical Ledgers also contributed to the 6.2.2 release.

The 6.2.2 version was eased out onto the mainnet. It started with a handful of Audit servers, which showed no problems. A couple of Federated servers then upgraded on the mainnet and progressed just fine. After 6 of the 25 ANOs had upgraded to 6.2.2, the load on the network increased. This was similar to a load increase earlier in the week, when 5 of the 25 ANOs had updated. Even after the load subsided, the network prioritized consistency over liveness and would not continue forward. The 6 ANOs downgraded to the prior version and liveness was restored. A report is available here.

Debugging of the cause continued through April and produced a breakthrough: the 6.2.2 code could send out two acknowledgements for the same process list height. This violates some of the assumptions the system makes and causes a consistency failure. The result was similar to the February network pause, where the network protected itself against the fault. The difference is that a single Federated server could create the error on its own, rather than it taking multiple machines as it did in February. There were a few possible ways to fix this bug; one solution was coded and placed in the release candidate.
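
The invariant that was violated can be shown with a small Go sketch. This is not the fix that went into the release candidate; the `AckIssuer` type and its per-VM bookkeeping are assumptions made purely for illustration. It refuses to issue a second acknowledgement for a process list height that has already been acknowledged.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDuplicateAck is returned when a height has already been acknowledged.
var ErrDuplicateAck = errors.New("ack already issued for this process list height")

// AckIssuer is a hypothetical guard that tracks, per VM index, the next
// process list height that may be acknowledged.
type AckIssuer struct {
	next map[int]uint32
}

func NewAckIssuer() *AckIssuer {
	return &AckIssuer{next: make(map[int]uint32)}
}

// Issue records an ack for (vm, height). It fails if that height was
// already acked or if the request would skip ahead and leave a gap.
func (a *AckIssuer) Issue(vm int, height uint32) error {
	want := a.next[vm]
	switch {
	case height < want:
		return ErrDuplicateAck
	case height > want:
		return fmt.Errorf("gap: expected height %d, got %d", want, height)
	}
	a.next[vm] = want + 1
	return nil
}

func main() {
	a := NewAckIssuer()
	fmt.Println(a.Issue(0, 0)) // <nil>
	fmt.Println(a.Issue(0, 0)) // ack already issued for this process list height
	fmt.Println(a.Issue(0, 1)) // <nil>
}
```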

The Bond (as in printer paper) release candidate was started in April to collect the fixes that were developed from 6.2.2.  The Parchment release has more extensive fixes for factomd planned, but getting stability with fewer changes in Bond is taking priority.  The Parchment release will have more scalability improvements, but also requires more testing time.

The Bond release will also contain a fix that was developed by developers outside Factom, Inc. The bug was replicated by Alex from Factoshi and was debugged and fixed by Sander Postma from BIF.

The entry syncing code was also rewritten for the Bond release. It had trouble keeping up with the consensus code and could cause the system to malfunction under high load.

Another issue that the Bond release will fix is the End of Minute handling. The internal messages that a Federated server uses to decide when to finish its part of the blockchain and hand it off to the next server were getting lost. Creating a dedicated pathway for these messages should increase stability.
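
One way to picture a dedicated pathway is a separate Go channel that is always drained first. The sketch below is an assumption about the general idea, not the code in the Bond release; `Msg`, `Router`, `Offer`, and `Next` are hypothetical names. A full general queue drops new messages, but End of Minute messages travel on their own channel and cannot be crowded out.

```go
package main

import "fmt"

// Msg is a hypothetical message type used only for this sketch.
type Msg struct {
	Kind string
	ID   int
}

// Router keeps End of Minute messages on their own channel so that a full
// general queue cannot crowd them out.
type Router struct {
	eom     chan Msg
	general chan Msg
}

func NewRouter(size int) *Router {
	return &Router{eom: make(chan Msg, size), general: make(chan Msg, size)}
}

// Offer enqueues m without blocking and reports whether it was kept.
func (r *Router) Offer(m Msg) bool {
	q := r.general
	if m.Kind == "EOM" {
		q = r.eom
	}
	select {
	case q <- m:
		return true
	default:
		return false // queue full, message dropped
	}
}

// Next returns the next message, always preferring a waiting EOM.
func (r *Router) Next() (Msg, bool) {
	select {
	case m := <-r.eom:
		return m, true
	default:
	}
	select {
	case m := <-r.general:
		return m, true
	default:
		return Msg{}, false
	}
}

func main() {
	r := NewRouter(2)
	r.Offer(Msg{Kind: "Entry", ID: 1})
	r.Offer(Msg{Kind: "Entry", ID: 2})
	fmt.Println(r.Offer(Msg{Kind: "Entry", ID: 3})) // false: general queue is full
	fmt.Println(r.Offer(Msg{Kind: "EOM", ID: 4}))   // true: dedicated channel has room

	m, _ := r.Next()
	fmt.Println(m.Kind, m.ID) // EOM 4
}
```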

The Parchment release is planned to bring an even larger improvement. A promising capacity increase comes from redefining how pending Entries, Commits, and similar items are handled internally. Currently the holding queue collects large numbers of items and gets backed up when there is a lot of traffic. Preliminary research shows that replacing the queue with a different type of data structure should yield a large performance increase and boost the capacity of the network. This change is also one of the precursors to sharding, since the older data structure did not allow for the type of cooperation between processes that sharding will require.
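
The difference between scanning a queue and looking up by dependency can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the Parchment design: `HoldingMap` and its methods are hypothetical, and messages are represented as plain strings. Held messages are indexed by the hash of the item they are waiting for, so the arrival of that item releases exactly its dependents without walking everything else in holding.

```go
package main

import "fmt"

// HoldingMap is a hypothetical dependency-driven holding structure.
// Messages are stored under the hash of the item they depend on.
type HoldingMap struct {
	waiting map[string][]string // dependency hash -> held messages
}

func NewHoldingMap() *HoldingMap {
	return &HoldingMap{waiting: make(map[string][]string)}
}

// Hold parks msg until the dependency identified by dep arrives.
func (h *HoldingMap) Hold(dep, msg string) {
	h.waiting[dep] = append(h.waiting[dep], msg)
}

// Release removes and returns every message that was waiting on dep.
// The cost depends on the number of dependents, not on total holding size.
func (h *HoldingMap) Release(dep string) []string {
	msgs := h.waiting[dep]
	delete(h.waiting, dep)
	return msgs
}

// Len returns the total number of held messages.
func (h *HoldingMap) Len() int {
	n := 0
	for _, msgs := range h.waiting {
		n += len(msgs)
	}
	return n
}

func main() {
	h := NewHoldingMap()
	h.Hold("commit-1", "reveal-1a")
	h.Hold("commit-1", "reveal-1b")
	h.Hold("commit-2", "reveal-2")

	fmt.Println(h.Release("commit-1")) // [reveal-1a reveal-1b]
	fmt.Println(h.Len())               // 1
}
```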

The Parchment release will also contain an update to the API, which was implemented by Tom from Layertech.  It was done at the request of the OpenNode team (Bedrock and Defacto).

April was exciting: many latent bugs were found, and their fixes are shaping up to provide an exceptional release in May.

Completed

New capabilities that were released

  • Community Contribution – remove duplicate filenames of fourSegments26.txt – remove messy output from go mod or go build in any project that has any dependency within the factomd repo, directly or indirectly
  • Community Contribution – election sync message could cause confusion.

Maintenance\QA\Debugging released

  • Replay filter of entries is not populated on reboot – Allow a federated server to reject replayed messages which are also rejected by the rest of the network within the first hour of booting
  • Factomd thrashes when things are in holding queue – Don’t needlessly burn CPU when there is an item in the holding queue

HR

  • New Hire Ted Gilman – General Counsel.

Awareness

In Progress

New capabilities

  • Refactor sim testing – This ticket rolls up past work into the latest revision. This enables new types of tests to be written, including adding a node during simulation and filtering out specific messages during testing. Additionally, this will run simulation tests that previously did not execute on CircleCI.
  • Enable ability to retrieve receipts at directory block level – New API that gives much clearer visibility to anchor transactions.
  • Split receipts API into receipts and anchors – With ethereum anchoring enabled, we need a way for developers to ask factomd which anchors covered a given transaction.
  • Improve simulation by allowing fnodes to be added during tests – Allow us to do new kinds of testing where network topology of sim nodes can change dynamically during testing.
  • Downloads dbstates in batches when syncing from the network – Increases efficiency when syncing from the network
  • Check for repeat ABlock signatures – Allow the database integrity checker to detect an edge case found in testing where there are multiple signatures with the same pubkey.  Not expected in production.
  • Add more simulation testing scenarios – Adding more extensive testing for other brain-swap scenarios will help catch more types of backward-incompatible changes from making it into the codebase.
  • Community Contribution – add if booted from disk to diagnostics API – allow outside programs to know when the 1st pass has been fully processed from disk.
  • Cleanup Failing sim tests – Clean up details of several unit tests and modify some of the tested functions to be more testable.
  • Create New HoldingList datastructure – Existing holding queue structure is not as efficient as it could be. This new struct will enable more efficient processing of messages whenever their requisite payments or commit/reveal pair messages become available.
  • Refactor Messages to use New Dependent holding – As a preparation for more aggressive refactoring – messages should be refactored to use new Dependent holding queue. Eventually this will replace the older method of holding entirely.
  • Null pointer exception is possible checking payments for commits – Don’t panic when timing issues create a null process list
  • Community Contribution – CrossBoot replay garbage collection never ends – let Garbage collector use less resources with cross boot replay filter
  • Inefficiency with calling time.Now() multiple times – use less CPU when handling p2p peers
  • Community Contribution – Legibility Improvements Part 1 – Factomd code should be more legible when possible.
  • Community Contribution – add config option to set factom-walletd config file path – Allow factom-walletd to use a config file loaded from a configurable path.
  • Preemptively save entries to database as they are processed rather than when block is done – Increase performance under high entry load by spreading out expensive database operations over the entire block building period.
  • Update Database Integrity Checker – Useful for testing, for development, and as a sanity check for database changes.
  • Make holding management be dependency driven instead of scan driven – Replace holding queue with a holding map to increase performance as well as to prepare for sharding
  • Refactor Directory Block building – The directory block has no information independent of the components collected while building the next block.  So instead of trying to build it as we go, we should build it in dbstatemanager.fixuplinks where we finally have everything we need to create a directory block. This has a side effect of avoiding any caching of hashes that could damage or blow up our state. The expected results:  cleaner code, necessary for breaking up processing of messages, performance of +20% or more (because currently we build very expensive hashes over and over).
  • Dependent holding should be periodically cleaned – This is an additional enhancement to the new style of message holding. Adding a method to periodically expire messages will ensure that stale data gets removed so the system will not get backlogged.
  • Rework Entry Syncing with channels and go routines – Entry Syncing has some performance issues.  This rework ensures we only handle entries once as we sync (maybe a few requests, but no thrashing when an entry has been handled).
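
As an illustration of the last item above, here is a minimal Go sketch of syncing entries with channels and goroutines so that each entry is handled only once. It is not the rework itself; `SyncEntries`, its parameters, and the string-based entries are assumptions made for this example. Duplicate hashes are filtered before they reach the workers, and a single collector gathers the results.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs an entry hash with the data fetched for it.
type result struct{ hash, data string }

// SyncEntries fetches each distinct hash in missing exactly once using a
// pool of worker goroutines. It returns the entries and the fetch count.
// The fetch function is a hypothetical stand-in for a network request.
func SyncEntries(missing []string, workers int, fetch func(string) string) (map[string]string, int) {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for h := range jobs {
				results <- result{hash: h, data: fetch(h)}
			}
		}()
	}

	// Producer: queue each hash once, then close results when workers finish.
	go func() {
		queued := make(map[string]bool)
		for _, h := range missing {
			if !queued[h] {
				queued[h] = true
				jobs <- h
			}
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	// Collector: the only goroutine that touches the entries map.
	entries := make(map[string]string)
	fetched := 0
	for r := range results {
		entries[r.hash] = r.data
		fetched++
	}
	return entries, fetched
}

func main() {
	missing := []string{"a", "b", "a", "c", "b"}
	entries, fetched := SyncEntries(missing, 3, func(h string) string {
		return "data:" + h
	})
	fmt.Println(len(entries), fetched) // 3 3
}
```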

Maintenance\QA\Debugging

  • Add DBSigs for top DB State to Database – Keeps signatures from the highest block in the database so that we can provide the block to other nodes
  • Slow sync with minutes for 10 minutes on boot – Quickly catch up with the federated servers and don’t peg the CPU while doing it.
  • The simulation tests other than TestSetupANetwork are not run on CircleCI – Expand the number of tests run on CircleCI to help with automated QA
  • Raw Object not found on transaction acknowledgement – Fix intermittent timing errors when using the raw API in factomd
  • When loading from a save state if the current dblock had FCT/EC transactions the balances were not loaded prior to evaluating the transactions. – Fix timing issue when evaluating FCT and EC balances.
  • Factomd Authority JSON marshalling should be usable – correctly export server identity coinbase address and efficiency fields in JSON unmarshaling.
  • Clean up logging of messages. Force filename to lowercase – Improves logging so we can better diagnose problems. Increases the resolution of timing and standardizes output files.
  • Make the version and git version set-able from goland – allow goland debugger to show git commit hash and golang version in factomd and the control panel.
  • Inefficient DBState loading while syncing from the network – Better handle downloading the first pass of getting the blockchain
  • Logging improvements from old revisions – Add milliseconds to times for better analysis. Generally better log behavior. No crash on nil state logging.
  • Wallet database reinitializes – automatically clear the cache of factom-walletd when a new local test blockchain is detected.
  • Sort auth set only on block boundaries – Bug fix related to elections to allow them to resolve properly in some elections.
  • Local EOM or DBSig could be follower executed instead of leader executed in rare cases and needed to be added to holding – keep internal EOM messages around even in an odd edge condition.
  • Extra DBSTATE messages were generated when restoring from a savestate – Don’t create extraneous DBstates when loading from a savestate file.
  • Testing API to filter inbound and outbound messages – Add a message filter and an API to set it so that QA can simulate nodes disappearing from the network.
  • Fast track chain commit/reveals – Prioritize Chain creation over Entry creation to help prevent backed up queues
  • Refine some of the unit test code – Add unit tests for local commits and local wallet simulations
  • Fix a pokemon instance involving MessageBase – Catch a new style of Pokemon bug found on the testnet with MessageBase
  • Ack holding – Fixes bug that could cause useless elections immediately after boot.
  • Logging saves only partial hashes, but in some cases the whole hash is required. Add a separate log of all unique full hashes – To help debug unusual messages on the network, print out the full entry hash so the problem can be tracked down.
  • Factomd not reading from CustomSeedsURL – In a custom isolated test network, allow followers to find peers with the CustomSeedsURL
  • Gossip Improvements – Reduce the redundant network traffic caused by Gossip protocol
  • Community Contribution – 2nd pass flagging issue on boot – Remove confusion when starting factomd if the 2nd pass has actually completed when looking at the control panel.
  • Investigate 6.2.2 issues on mainnet – Need to unblock release of 6.2.2 – either we need to fully understand the misbehavior or diagnose and correct a bug when running alongside 6.2.0
  • Remove code that tosses incoming messages based on holding – Retain messages instead of deleting them shortly before needing them.
  • Missing Message Requests are tossed if the ask has an all 0 ID – Don’t ignore peers who added the identity of all zeros to their config file.
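
The testing message filter listed above could look roughly like the following Go sketch. The `MessageFilter` type and its pattern-based matching are assumptions for illustration and do not describe the actual factomd API. Dropping every message that matches a pattern lets QA simulate a node disappearing from the network.

```go
package main

import (
	"fmt"
	"regexp"
)

// MessageFilter is a hypothetical test-only filter that drops messages
// whose logged form matches a regular expression.
type MessageFilter struct {
	re *regexp.Regexp
}

// NewMessageFilter compiles pattern. An empty pattern disables filtering.
func NewMessageFilter(pattern string) (*MessageFilter, error) {
	if pattern == "" {
		return &MessageFilter{}, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return &MessageFilter{re: re}, nil
}

// Drop reports whether msg matches the filter and should be discarded.
func (f *MessageFilter) Drop(msg string) bool {
	return f.re != nil && f.re.MatchString(msg)
}

func main() {
	// Simulate the hypothetical node "fnode03" going silent.
	f, err := NewMessageFilter(`fnode03`)
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Drop("EOM minute 4 from fnode03")) // true
	fmt.Println(f.Drop("Ack from fnode01"))          // false
}
```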

 

Future Plans

Protocol Grant

  • Ongoing Performance Improvements
  • Ongoing Maintenance corrections
  • Continue supporting the protocol and issues that arise

Oracle Grant

  • Ongoing Maintenance corrections

Anchor Grant

  • Ongoing Maintenance corrections

Awareness

POSTED: May 10, 2019 BY Kevin Casper IN Technical Updates
ABOUT THE AUTHOR

Factom Inc's community manager, carrying a load of marketing operations experience from the enterprise tech and video gaming industries.