STEEMSQL - Update 12 - About locks and performances

SteemSQL is a public MS-SQL database with all the blockchain data in it.

Previous posts:

SteemSQL success and wide adoption

SteemSQL has been widely adopted by a lot of users for analytical processing and even for running bots. Indeed, it is often easier to perform SQL queries to obtain information on the content of the blockchain than to parse every block and perform yourself the aggregation of all these data .

SteemSQL use a “Database Injector” process that read the blockchain in real time and split the transactions made on the blockchain into differents tables. Today, an average of 9 transactions per second are generated with some peak at 184 transactions per second. Each transaction may also have an effect on the non-transactional Accounts andComments tables, which must also be updated.

To ensure the integrity of the database, the Database Injector needs to perform a lock on the updated tables each time it has to insert or update data.

Gimme a lock!

While the database injector is working, users of SteemSQL issue queries to the database to perform their analysis.

Therefore, the SQL server will also try to performs a lock on the table involved in the user’s query in order to ensure that the returned data is consistent.

As you can see, as both process runs simultaneously, we can enter in a race for acquiring a lock on the tables, either by the Database Injector to add data to the database or by the query issuer to read data from it.

The database injector is a cool process, because it often ask for locks, but only for a very short time as its update are made very quickly.

On the other side, it often happens that the execution time for queries issued by users requires several seconds, or even minutes to complete. I saw some requests that took more than half an hour to finish, because they were poorly written or not restrictive enough.

While these long queries are running, the Database Injector can not insert new data into the database. This is not only annoying in terms of performance because the database start to be out of sync with the blockchain, but it is also a source of spam to me because I receive a notification whenever there is a problem of data injection in the database.

Please, use the NOLOCK hint!

Members of the #steemSQL channel on steemit.chat started to see my repeated requests to them to use the SQL (NOLOCK) optimization hint.

The (NOLOCK) hint tells the SQL server that dirty reads are allowed. When a user issue a query with the NOLOCK hint, no shared locks are issued to prevent the Database Injector to add data to the database. Even if the last one issue an exclusive locks, it will not block the user’s query from reading the locked data.
As I explained to SteemSQL users, the usage of the NOLOCK hint not only stop them interrupting the work of the Database Injector, but it also improves their queries performance.

How users responded to my request

The Good

Some quickly understood their advantage in terms of performance (and mine in terms of tranquility) and started to do systematic use of the NOLOCK. They carefully added the hint to their queries and everybody went happy.

The Bad

Unfortunately, some did not hear my call or did not care enough. They continued to issue queries without the hint, or forgot to add it to all table in their queries (or subqueries).

Others, having no knowledge of how to write SQL queries, have used analysis tools like PowerBI to perform the task, and even if aware of the NOLOCK issue, weren’t able to tell their tool to use the hint.

The Ugly

Finally, some who I knew they saw my request for the NOLOCK use, simply ignored it. Lazy ones! I have to admit that I had to ban a few of them to preserve SteemSQL smooth work.

The need for strike back

More and more, I had to struggle with people issuing queries that were crashing SteemSQL. Those who are members of the #steemSQL channel on steemit.chat know what I am talking about.

To give you an idea of the problem, here is a graph that shows the number of lock requests per minutes over the last 30 days:

As you can see, the server has to deal with an average of 767K lock requests/min, sometimes rising up to 4075K lock requests/min.

I also received hundreds notification of deadlock situation per day. I started to get tired of having to continuously clean my mailbox. It was time to put an end to this anarchy.

But how to solve this issue without disturbing existing process (and it looks they are quite a few)?

James to the rescue

(Un)fortunately, I am not 007 and don’t own a licence to kill. Therefore, the only option to me was to use “A VIEW”!

In SQL, a view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.

Therefore, as from now, all tables have been moved behind SQL views that mimics the tables and use the NOLOCK hint. The underlying tables are no more accessible to users, but only to the Database Injector. As it is the only process to update the database, we do not have to fear data corruption.

For example, the table Comments has been renamed TComments and a view has been created with the following definition:

CREATE VIEW Comments AS SELECT * FROM TComments (NOLOCK)



This ensure that each query issued against to the table through the view will be using the NOLOCK hint.

Users have nothing to change in their existing process as this is fully transparent to them or to any application they use to query the database.

Full speed ahead

Despite the newly implemented structure will now hide how the tables are designed and linked together, this will allow SteemSQL to continue working at full speed and manage the future growth of the Steemit platform.

I will put some effort in creating and publishing documentation and tutorials for SteemSQL.

Moreover, I will now be able to release some very awaited new features that I was holding on to prevent putting the server on its knees.

Thank you to all those who regularly send me comments of encouragement and share their appreciation of my work with SteemSQL.

Thanks for reading.


Help SteemSQL stay public and running 24/7/365.

All payout from this post will be allocated to SteemSQL infrastructure.

Thanks for your support!

footer created with steemitboard - click any award to see my board of honor

Support me and my work as a witness by voting for me here!

H2
H3
H4
3 columns
2 columns
1 column
28 Comments
Ecency