Grant Update September 2019

Status and Achievements

September came tantalizingly close to a huge success. The Xuan version of factomd was released to master after passing testing on the testnet. Deployment started on the mainnet, but some workarounds intended to maintain full capabilities ended up not working, and the release was pulled back in September. Later, a pause on the network was successfully resolved after some quick diagnosis and bug fixing. The next phase of development also began for factomd.

There were two sponsor meetings in September.  One of them was recorded and published: https://youtu.be/sZepNTwkal0  The sponsors in attendance in September were Nolan Bauer, David Kuiper, and, both of Factomatic, Valentin Ganev and Nikola Nikolov.

September encompassed a lot of development and testing. The last consensus bug, which prevented different versions of federated servers from agreeing every 5 blocks on the testnet, was fixed. This was enough to make Xuan ready for the Community Testnet.

The first release candidate of Xuan was deployed across the testnet. Federated servers that had upgraded showed an odd behavior: they were ending their part of the block almost a minute ahead of the non-upgraded ones. After an election on the testnet, all the federated servers got back in sync, and none of the servers were ahead of the others. The load test on the testnet also passed, with the blockchain able to accommodate more Entries than the load test program was able to write. Here are the logs from the load test on testnet: https://chockablock.luciap.ca/loadtest-history

The kickoff meeting for the refactor was also held in September. The design document that describes how the various pieces fit together and how messages flow is located here: https://drive.google.com/file/d/19mNQJlV9ehgbDQphlOQh6hmKF5XOweT8/view?usp=sharing

https://imgur.com/OU37225

https://imgur.com/oMqgoaU

Attending the kickoff were Matt York, Paul Snow, Veena, Clay, Steven, Brian, and Justin.

Also in September, Factom, Inc provided more testing and feedback for the Livefeed API. Factom, Inc is hoping to utilize the Livefeed API, which Sphereon is developing.

Unfortunately, on September 30th the mainnet experienced a pause, and restarting it was not straightforward. The network was stopped for about 11 hours. The federated servers were not able to communicate effectively with each other: not enough of them were directly connected to each other, and the backhaul network nodes were a block behind, so they would not propagate messages. The federated servers would have needed to advance by two blocks to trigger the followers around the network to download a block, which would have allowed the followers to help with network communication instead of harming it. To get the federated servers back to building blocks, a custom version of factomd was created and put on the backhaul nodes. This version relaxed some of the network self-protections and helped the federated servers get back to building the blockchain.

This month’s Github and Jira reports are also available:

Github Report:  https://drive.google.com/file/d/1MGeAMM9aG7cgt7OeVx1lPvMDXxxrIdK4/view?usp=sharing

Jira Report: https://drive.google.com/file/d/12CWWlbVNiNP0I32gVesq2_03BehGXpia/view?usp=sharing

September 9th also was the last day of the Factom, Inc Protocol Development Grant.  It spanned June 9 to Sep 9. There were two main measurable criteria that the grant provided for.  The first was Maintenance of Mainnet. The second was Refactor Code Initial Development. Factom, Inc was successful in satisfying the requirements of this grant.  

https://factomize.com/forums/threads/factom-inc-015-protocol-development.1950/

The maintenance of the mainnet came in two flavors. The most obvious demonstration of this deliverable was the results. The Factom network operates in a fail-safe manner: when there are consensus failures, it actively prevents blockchain forks by pausing and no longer building new blocks. At this point in the blockchain’s development, human intervention is required to resolve the fault. Factom, Inc responded multiple times during the grant period to these faults with multiple engineers. Here is one example of the response and analysis: https://drive.google.com/open?id=1gHtb2AvdrPU8j8dmMFAaiS8eTxmyqMKU
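
The pause-instead-of-fork behavior can be illustrated with a small Go sketch. This is not factomd's consensus code: the simple-majority rule and the `Node`, `Advance`, and `Resume` names are assumptions made purely for illustration. It only shows the shape of a fail-safe design, where disagreement stops block building until someone intervenes.

```go
package main

import "fmt"

// majorityHash returns the block hash reported by more than half of the
// servers, if there is one. The simple-majority rule is an assumption made
// for this sketch, not a description of factomd's consensus.
func majorityHash(hashes []string) (string, bool) {
	counts := make(map[string]int)
	for _, h := range hashes {
		counts[h]++
		if counts[h]*2 > len(hashes) {
			return h, true
		}
	}
	return "", false
}

// Node is a hypothetical, heavily simplified block builder.
type Node struct {
	Height int
	Paused bool
}

// Advance builds the next block only when the servers agree on its hash.
// On disagreement the node pauses instead of risking a fork.
func (n *Node) Advance(hashes []string) bool {
	if n.Paused {
		return false
	}
	if _, ok := majorityHash(hashes); !ok {
		n.Paused = true
		return false
	}
	n.Height++
	return true
}

// Resume models the human intervention needed to clear a pause.
func (n *Node) Resume() { n.Paused = false }

func main() {
	n := &Node{}
	fmt.Println(n.Advance([]string{"aa", "aa", "aa"})) // agreement: block built
	fmt.Println(n.Advance([]string{"aa", "bb", "cc"})) // no majority: node pauses
	fmt.Println(n.Advance([]string{"aa", "aa", "aa"})) // still paused, nothing built
	n.Resume()
	fmt.Println(n.Advance([]string{"aa", "aa", "aa"}), n.Height)
}
```

The key design choice in the sketch is that a paused node refuses to build even when later input looks healthy, so a fault cannot be silently skipped over.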

More importantly, the Bond release was successfully deployed on mainnet, which represented several months of development.  This brought more stability and fixed several security holes in factomd.

During the grant period, the first release candidate of Xuan was tagged and made ready for deployment on the testnet.  This comparison shows the magnitude of development that occurred during the grant period.

https://github.com/FactomProject/factomd/compare/v6.3.2...v6.4.2-rc1

The other measurable deliverable was incremental code structure updates.  This was also referred to as precursors to sharding. One of the biggest examples of this is Dependent Holding.  This was released during the grant period.   

https://github.com/FactomProject/factomd/pull/756

Looking at the Threading Rewrite document we can see that Dependent Holding plays a big part in the message flow through the system.  It was implemented in a way that is usable in the current generation of code, as well as the refactored generation of code.
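
As a rough illustration of the idea behind Dependent Holding, the Go sketch below parks a message until the message it depends on has been processed, then releases it. This is the pattern named in FD-1163, where Entry Reveals are processed only after their Commits. The `Message` and `DependentHolding` types and the `Submit` method are hypothetical names for this sketch and do not reflect the actual factomd implementation in the pull request above.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for a network message; factomd's real
// message types are richer than this.
type Message struct {
	ID        string // identifier of this message
	DependsOn string // ID that must be processed first ("" if none)
}

// DependentHolding parks messages until their dependency has been processed.
// The type and its methods are illustrative, not factomd's API.
type DependentHolding struct {
	processed map[string]bool      // IDs that have already been processed
	waiting   map[string][]Message // dependency ID -> messages parked on it
}

func NewDependentHolding() *DependentHolding {
	return &DependentHolding{
		processed: make(map[string]bool),
		waiting:   make(map[string][]Message),
	}
}

// Submit processes m if its dependency is satisfied, otherwise parks it.
// It returns the IDs processed as a result, in processing order.
func (h *DependentHolding) Submit(m Message) []string {
	if m.DependsOn != "" && !h.processed[m.DependsOn] {
		h.waiting[m.DependsOn] = append(h.waiting[m.DependsOn], m)
		return nil
	}
	var done []string
	queue := []Message{m}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		h.processed[cur.ID] = true
		done = append(done, cur.ID)
		// Release everything that was waiting on the message just processed.
		queue = append(queue, h.waiting[cur.ID]...)
		delete(h.waiting, cur.ID)
	}
	return done
}

func main() {
	h := NewDependentHolding()
	// The reveal arrives first and is parked because its commit is unknown.
	fmt.Println(h.Submit(Message{ID: "reveal-1", DependsOn: "commit-1"}))
	// The commit arrives, is processed, and releases the parked reveal.
	fmt.Println(h.Submit(Message{ID: "commit-1"}))
}
```

Indexing parked messages by the thing they are waiting for means an arriving dependency releases its dependents with a single lookup, instead of a scan over everything in holding.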

While it was not a measurable part of the grant proposal, the grant spirit included Developer Support for non-Factom, Inc developers.  That effort ramped up during this grant period with weekly developer standups that helped coordinate core development. Another example of this is support of the Livefeed API.  This support has helped further decentralize the protocol.  

Factom, Inc has successfully completed the June 9 – Sep 9 2019 grant, and is well on the way to satisfying the Sep 9 – Dec 9 grant.

Completed 

New capabilities and maintenance issues that were released

  • FD-1105
    • Make FollowerExecuteACK also look for messages in Dependent Holding
    • Fix a bug that hindered the performance improvements of Dependent Holding
  • FD-450
    • Community Contribution – Removing Hoisie Web
    • Remove dependency on an old un-maintained library and use a more modern one instead for the API network handlers
  • FD-745
    • Test double application of factoid transactions when booted in minute 9
    • Test for regressions of a prior dangerous fault where factoid transactions are double applied, calculating the wrong balance
  • FD-827
    • fast track chain commit/reveals
    • Prioritize Chain creation over Entry creation to help prevent backed up queues under load
  • FD-932
    • All tests should be run against Factomd Releases
    • For each release we need a way to automatically test using all Simulation Tests. Currently this is a somewhat manual effort – automating this makes it easier to track history and expose visibility into the release process.
  • FD-1038
    • Community Contribution – Fix loading of custom cert paths
    • Actually use the TLS cert that is specified in the config file.
  • FD-1041
    • Review holding map in order of arrival
    • Increase network capacity by decreasing randomness when lots of traffic is arriving over the p2p network
  • FD-1089
    • Extend SetupSim
    • Helpful for creating databases for DevTestNet; also removes the “special/annoying”-ness of FNode0 in tests.
  • FD-1104
    • Put missing message responses into their own thread
    • Increase reliability and prepare for refactoring by fully isolating missing messages and their responses from state access. Rely only on channels for communication
  • FD-1113
    • Election Testing Failed
    • Sequential elections within the same block should elect the correct unresponsive vm
  • FD-1114
    • Community Contribution – Factable solutions code cleanup
    • Code cleaning and commenting
  • FD-1126
    • Add a way to alert developers when Nightly CI build fails
    • Adding a visible alert to the automation pipeline will help maintain high standards of quality during the development life-cycle
  • FD-1129
    • Correctly handle Rejected Messages
    • Fixes a potential bug when recovering from an election caused by a possible attack vector or bugs.
  • FD-1136
    • Fix timing on EOM for Minute 10
    • Increase stability under load when a timing bug can cause factomd to get confused and degrade performance.
  • FD-1137
    • test and allow golang 1.13
    • let developers build using the latest version of golang
  • FD-1138
    • New SimTest to test elections in every consecutive minute.
    • Create a test which can be run repeatably that runs many elections
  • FD-1146
    • Community Contribution – Factable solutions code changes
    • Comment and simplify code to make it more legible to future developers
  • FD-1154
    • Increase Entry Sync retry rate
    • Reduce the amount of time it takes a node to catch up with the second pass blockchain download
  • FD-1155
    • Followers take many blocks to start following minutes after boot
    • Improved followers’ and audit servers’ ability to stay in real time with the blockchain. This helps audit servers be able to take over, and it allows followers to know the latest transactions on the blockchain.
  • FD-1162
    • Repair Dependent Holding to use highest block instead of leader height
    • Allow the blockchain to be downloaded more reliably in a stochastic or hostile network environment
  • FD-1163
    • Process Entry Reveals only after their corresponding Commits
    • Fixes a serious bug causing network pauses manifesting during high traffic when network messages come in out of order
  • FD-1171
    • Update savestate to 13 so that fixes for FD-1091 (blocks divisible by 1000) are captured
    • Protects against the risk of nodes being stopped on a block divisible by 1000 and becoming corrupted.
  • FD-1172
    • Adjust message propagation filter to be set/reset by heartbeats
    • This allows followers to resume propagating network messages when an hour-long pause occurs. This will help recover from a future pause.
  • FD-1174
    • Avoid a race condition on boot when RestoreFactomdState is set
    • Avoid corrupting the local state which would cause balances to be incorrectly calculated
  • FD-1176
    • Pokemon bug detected in Validate()
    • Protect against a newly detected form of the Pokemon bug to prevent random panics
  • FD-1177
    • investigate inconsistent responses from debug API
    • Debug API should return data from behind a port forward
  • FD-1180
    • Never Hold non-ACK’d messages
    • Avoid nodes getting clogged up with missing message responses if they can’t be used
  • FD-1181
    • Missing Message Processing is losing requests
    • Avoid deadlocks when following along with the blockchain when local messages are missing
  • FD-1182
    • With 2 elections in a minute, correctly compute the appropriate VM
    • Fixes a serious bug where, after one election, audit servers are incapable of replacing a second federated server
  • FD-1183
    • Add more logging around process list debugging + scripts for analysis
    • Allows core developers more insight over the system internals
  • FD-1186
    • Update EntrySyncing to get rid of unnecessary complexity
    • Slightly simplify the 2nd pass downloading code to increase reliability
  • FD-1187
    • Remove Remnants of ELECTION_NO_SORT activation.
    • Remove unused code to make it more legible
  • FD-1190
    • Drop messages if outbound messages queue is full outside of wait period
    • Protect against a possible deadlock scenario under certain situations – where (for example) we are booting while the network is under load.
  • FD-1192
    • Avoid race condition with DBsigs and setting up for Factoid transactions
    • Avoid panic when booting node
  • FD-1199
    • Expire stale messages from Holding
    • factomd could accumulate messages in holding and slow down over time, eventually crashing by running out of memory
  • FD-1201
    • Xuan dbstates sync off-by-one bug
    • Stop downloading blocks that a node already has locally
  • FD-1203
    • Maintain proper coinbase behavior
    • Xuan needs to maintain the same behavior as the previous release for coinbase transactions, to stay backward compatible with older nodes every 25 blocks
  • FD-1208
    • Allow DBsigs 20 minutes before boot
    • Allow Leader nodes to recognize other leaders when they boot asynchronously.
  • FD-898
    • Update circle build environment to golang 1.12
    • Build with the latest circleCI/golang environments to maintain forward compatibility
  • FD-924
    • Community Contribution – Docker dev environment misconfigured
    • Allow simulated development nodes to communicate in an isolated docker based environment.

Awareness

In Progress

  • FD-677
    • Gossip Improvements
    • Reduce the redundant network traffic caused by Gossip protocol
  • FD-808
    • Updates to enterprise-wallet
    • Include the proposed changes to enterprise-wallet from Factomize
  • FD-984
    • Create a separate thread to handle all missing data requests.
  • FD-1033
    • Log peer behavior. Fix double p2p close.
  • FD-1101
    • Create framework to execute code on minute edges to consolidate state change code.
    • Simplifying the code to allow better code readability and reliability for EOM processing to prevent race conditions
  • FD-1122
    • Move Entry Credit balances to own thread
    • Move EC balances to their own thread for future vm threading
  • FD-1123
    • Distribute commits across VMs
  • FD-1131
    • DevNet Developer toolbox
    • Having a standard kit for interacting w/ DevNet will make it easier for all developers to make reproducible test scenarios.
  • FD-1132
    • Add an RPC api for interacting w/ Nodes on Devnet
    • As a developer I’d like to be able to write test scenarios against DevNet in the same manner as Local Simulation tests.
  • FD-1152
    • POC Jenkins builds against DevNet
    • Adding more automation around Integration testing in this POC will help us finish designing the rest of our CI/CD pipeline for Factomd
  • FD-1165
    • Update docs for debug API
    • Update the API docs so other users can make use of the new features.
  • FD-1166
    • Align grant height to grant payout boundary when reading grant descriptors.
  • FD-1175
    • Build factomd docker images with golang 1.13
    • Build images used by ANOs using the latest golang version
  • FD-1188
    • Community Contribution – Add documentation around directory and admin blocks
    • Increased understanding when newcomers review source code
  • FD-1189
    • Add sim test where to-be-elected audit is behind at moment of election
    • Extending sim tests around election/consensus code allows for better regression testing and checking for unexpected behavior
  • FD-1194
    • Update build environments to golang 1.13
    • Release software using the latest version of golang
  • FD-1195
    • Add logging for every recovery call
    • extending logging to capture this previously silent behavior aids in debugging
  • FD-1198
    • MMR catchup delay
  • FD-1204
    • extend Logging to track sizes of Maps and Channels
  • FD-1206
    • Faster Catchup by MMR
    • Faster loading of the first block after boot built by MMR.
  • FD-1209
    • Add commit data to ack in wsapi
  • FD-1210
    • Change priority to have higher priority than acks.
  • FD-1215
    • Update API version to more closely reflect semver
    • Give factomd API users info on what services are offered by a particular version of factomd
  • FD-1221
    • Community Contribution – include some factomd changes from Factable Solutions
    • Add clarity around the code base for future developers
  • FD-1037
    • Community Contribution – More tests around Audit Server Brainswaps
    • Better debugging of problems around brain swapping a follower to an Audit server.
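
Several of the in-progress items move state onto its own thread, for example FD-1122 for Entry Credit balances. A common Go pattern for that is a single goroutine that owns the data and is reached only through a channel. The sketch below shows that pattern with hypothetical names (`BalanceKeeper`, `Adjust`, `Balance`). It is an illustration of the general technique, not the design chosen in factomd.

```go
package main

import (
	"fmt"
	"sync"
)

// balanceRequest asks the owning goroutine to apply a delta to one address
// and report the resulting balance.
type balanceRequest struct {
	address string
	delta   int64
	reply   chan int64
}

// BalanceKeeper is a hypothetical owner of Entry Credit balances. Only its
// goroutine touches the map, so no mutex is needed.
type BalanceKeeper struct {
	requests chan balanceRequest
}

func NewBalanceKeeper() *BalanceKeeper {
	k := &BalanceKeeper{requests: make(chan balanceRequest)}
	go k.run()
	return k
}

// run is the only code that reads or writes the balances map.
func (k *BalanceKeeper) run() {
	balances := make(map[string]int64)
	for req := range k.requests {
		balances[req.address] += req.delta
		req.reply <- balances[req.address]
	}
}

// Adjust applies delta to the address and returns the new balance.
func (k *BalanceKeeper) Adjust(address string, delta int64) int64 {
	reply := make(chan int64, 1)
	k.requests <- balanceRequest{address: address, delta: delta, reply: reply}
	return <-reply
}

// Balance returns the current balance by applying a zero delta.
func (k *BalanceKeeper) Balance(address string) int64 { return k.Adjust(address, 0) }

// Close stops the owning goroutine; the keeper must not be used afterwards.
func (k *BalanceKeeper) Close() { close(k.requests) }

func main() {
	k := NewBalanceKeeper()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			k.Adjust("EC-example", 1) // concurrent callers, one owner
		}()
	}
	wg.Wait()
	fmt.Println(k.Balance("EC-example"))
	k.Close()
}
```

Because every update passes through one goroutine, concurrent callers cannot interleave a read and a write on the same balance, which is the property that makes this pattern a natural precursor to per-VM threading.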

 

Future Plans

Protocol Grant

  • Ongoing Performance Improvements
  • Ongoing Maintenance corrections
  • Continue supporting the protocol and issues that arise

Oracle Grant

  • Ongoing Maintenance corrections

Anchor Grant

  • Ongoing Maintenance corrections

 

Awareness