In financial relationships where obligations are fulfilled with a time lag (chairs in the morning, money in the evening), the standard procedure is to assess the financial health of your partners. It is one thing when this task comes up once every six months for a dozen counterparties. But what if there are hundreds of counterparties and they need additional monitoring, say, once a quarter? One possible solution is a scoring model based on a neural network (or artificial intelligence, if you prefer), and its use is what this article is about.
To ground the discussion in real data that everyone can relate to, I suggest we pick a partner we all have in common: a credit institution, that is, a Bank. This lets us work with up-to-date public bank reporting and removes any difficulty in choosing reference points. Someone may point out that banks already have credit ratings from professional rating agencies, and they will be right. But at the time of writing (September 2018), the Central Bank of the Russian Federation publishes reports on 511 banks, while Expert RA has current ratings for 156 banks (almost 31% of all banks in the country) and ACRA for 73 banks (about 14%). In other words, there is plenty left to look at.
So, to prepare a scoring model based on a neural network, we need a standardized data array: one that contains both the values being studied and the outcome of that study, the final creditworthiness assessment. The neural network must learn to process and interpret the input data in line with a given (or accepted) evaluation system.
As input data for assessing a legal entity, you can use the standardized statutory reporting forms, namely form 1 (balance sheet) and form 2 (statement of financial results). But since we have chosen banks as the object of study, we will use the turnover statement on accounting accounts published by the Central Bank of the Russian Federation (form 101), specifically the outgoing (closing) account balances (totals, in thousands of rubles). This form, unlike the others, is published monthly and is detailed enough for our example.
When preparing the data array for training, you can use the results of an earlier assessment as the target value. We will rely on the Expert RA rating agency. Its reputation helps avoid arguments about the quality of the assessment, and the scale of its activity yields enough data to train the neural network. As noted earlier, 156 banks hold an Expert RA rating versus 73 for the younger ACRA, and the larger the training set, the more stable the scoring model.
The Expert RA national rating scale has 21 levels. To make further evaluation easier, I propose converting the letter ratings into a linear numeric scale: the highest level, ruAAA, equals 1 and the lowest, ruD, equals 0 (zero). With 21 levels there are 20 intervals, so one rating level corresponds to a step of 0.05.
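The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the article: the list of level labels is my assumption about the national scale (the text only states that there are 21 levels, from ruAAA down to ruD), and the function name `rating_to_score` is hypothetical.

```python
# Expert RA national scale, best to worst. The labels are an assumption,
# not taken from the article, which only states that there are 21 levels.
RATING_LEVELS = [
    "ruAAA",
    "ruAA+", "ruAA", "ruAA-",
    "ruA+", "ruA", "ruA-",
    "ruBBB+", "ruBBB", "ruBBB-",
    "ruBB+", "ruBB", "ruBB-",
    "ruB+", "ruB", "ruB-",
    "ruCCC", "ruCC", "ruC",
    "ruRD", "ruD",
]


def rating_to_score(label: str) -> float:
    """Map a rating label to a value in [0, 1] with equal steps."""
    steps = len(RATING_LEVELS) - 1          # 21 levels -> 20 steps of 0.05
    position = RATING_LEVELS.index(label)   # 0 for the best rating
    return round(1 - position / steps, 2)   # ruAAA -> 1.0, ruD -> 0.0
```

Rounding to two decimals keeps the values on the 0.05 grid and avoids floating-point noise when they are later compared with each other.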
Before collecting the training data, we need to understand how the chosen rating system performs its assessment and what that assessment includes. To do this, we turn to Expert RA's latest methodology for assigning credit ratings to banks. At the time of writing, the version dated 10.04.2018 was in use. We are interested in Chapter II, “Weights of the significance of indicators” (p. 7), which lists the weight coefficients of the different assessment sections used when assigning a rating. This matters because it lets us determine what share of the overall rating system is covered by the public data available to us. Since our example uses only form 101 and leaves out the “tails”, we can estimate that share at about 75%. In other words, we will have most of the necessary data, but not all of it. Accordingly, when training the neural network, we will aim for a proportional error of 25%.
An array of data for neural network training
This process is simple to understand but difficult to implement. We need to assemble a structured data array in which the rows are banks and the columns are the outgoing balances for each available accounting account plus the digitized Expert RA rating. The form 101 report must be dated one month before the rating was assigned (so to speak, an objective picture at the time of the rating agency's analysis). In other words, we have to bring together information from different reporting files published by the Central Bank of the Russian Federation.
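To make the shape of this array concrete, here is a rough sketch of the assembly step. It assumes the form 101 balances have already been parsed into a dictionary keyed by license number and report date; that input structure, the `acc_` column prefix, the `rep_date` field (a stand-in for the report code) and both function names are my own illustrative choices. The rule "first day of the month before the rating month" is only one possible reading of "one month before the date of rating assignment".

```python
from datetime import date


def report_date_for(rate_date: date) -> date:
    """First day of the month preceding the rating month (assumed convention)."""
    if rate_date.month == 1:
        return date(rate_date.year - 1, 12, 1)
    return date(rate_date.year, rate_date.month - 1, 1)


def build_rows(ratings, balances):
    """Join ratings with the matching form 101 balances, one row per bank.

    ratings:  list of dicts with reg_num, b_name, rate_date, score_RA
    balances: dict {(reg_num, report_date): {account_number: closing_balance}}
    """
    rows = []
    for rating in ratings:
        rep_date = report_date_for(rating["rate_date"])
        accounts = balances.get((rating["reg_num"], rep_date))
        if accounts is None:
            continue  # no report for this bank and date: skip the bank
        row = {
            "rep_date": rep_date,
            "rate_date": rating["rate_date"],
            "reg_num": rating["reg_num"],
            "b_name": rating["b_name"],
            "score_RA": rating["score_RA"],
        }
        for account, balance in accounts.items():
            row[f"acc_{account}"] = balance  # one column per account number
        rows.append(row)
    return rows
```

Banks without a report for the required date are dropped rather than filled with zeros, so that a missing report is not mistaken for empty accounts.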
The form 101 data from the Central Bank of the Russian Federation for September 2018 contains 1,084 unique account numbers. To reduce the data array, we will work only with the frequently used ones. To do this, we have to build the reports across all accounts for each rating assignment date, and only then exclude the accounts that are rarely used. As a result, I ended up with 299 frequently used accounting accounts.
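The filtering step might look roughly like the sketch below. The article does not say what counts as "frequently used", so the `min_share` threshold (an account must be non-zero for at least that share of banks) is purely my assumption, as are the function names and the `acc_` column prefix.

```python
from collections import Counter


def frequent_accounts(rows, min_share=0.5):
    """Return account columns that are non-zero in at least min_share of rows."""
    counts = Counter()
    for row in rows:
        for key, value in row.items():
            if key.startswith("acc_") and value:  # present and non-zero
                counts[key] += 1
    threshold = min_share * len(rows)
    return sorted(key for key, n in counts.items() if n >= threshold)


def keep_accounts(rows, accounts):
    """Drop rarely used account columns and fill missing balances with 0.0."""
    result = []
    for row in rows:
        slim = {k: v for k, v in row.items() if not k.startswith("acc_")}
        for account in accounts:
            slim[account] = row.get(account, 0.0)
        result.append(slim)
    return result
```

After this step every row has the same set of account columns, which is what a neural network with a fixed number of inputs requires.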
After all the preparatory operations, we get the data array shown (in condensed form) in the image below.
Explanation of columns:
- rep_code – code of the form 101 report in the format of the Central Bank of the Russian Federation. The account data for each Bank is taken from this report;
- rate_date – date of assigning the rating by Expert RA;
- reg_num – the Bank's license number;
- b_name – the name of the Bank;
- score_RA – digitized rating of Expert RA.
Columns labeled “acc”: the numeric part is the accounting account number on form 101. The values in these columns are the outgoing balances of that account for the reporting period given by rep_code.
We pass the resulting data array to the prepared neural network for training. During training, you need to watch the resulting error and adjust the network's configuration until the result is acceptable.
After several experiments, we obtained a neural network with a rating error of about 24%, which matches the expected error and is quite sufficient for the case at hand. In other words, after configuring and training the neural network, we have a full-fledged scoring model. The results of scoring the banks in the training data array are presented below.
For an objective review of the scoring results, a column showing the deviation in rating levels has been added. The point is that by linearly digitizing the rating scale from 0 to 1 with a step of 0.05, we ran into a scale problem: with a constant step, the lower the scoring value, the higher the percentage error. For example, PJSC Bank ALEXANDROVSKY, with a scoring value of 0.37 and a rating value of 0.3, has a relative error of 24%, but the absolute deviation is only 0.07 (slightly more than one rating step), so the deviation in rating level is only 1 step. The average error for this sample is 2 rating levels, which is 10% of the 21-level rating scale.
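The three error measures discussed here can be computed as in the sketch below. The function name and return layout are hypothetical. Note that with the rounded figures quoted in the text (0.37 against 0.3) the relative error comes out at roughly 23%; the 24% in the article presumably reflects the unrounded scoring value.

```python
RATING_STEP = 0.05  # one level on the 21-level scale mapped to [0, 1]


def rating_deviation(score: float, rating: float, step: float = RATING_STEP):
    """Return (absolute deviation, relative error, deviation in rating levels)."""
    abs_dev = abs(score - rating)
    # Relative error is undefined for the lowest rating (0), so report infinity.
    rel_err = abs_dev / rating if rating else float("inf")
    levels = round(abs_dev / step)  # nearest whole number of rating steps
    return abs_dev, rel_err, levels


# Example from the text: scoring value 0.37 against a rating value of 0.3.
abs_dev, rel_err, levels = rating_deviation(0.37, 0.3)

# The same one-step miss weighs more at the bottom of the scale:
low = rating_deviation(0.35, 0.3)[1]   # one step off near the bottom
high = rating_deviation(0.95, 0.9)[1]  # one step off near the top
```

Comparing `low` and `high` shows why the percentage error alone is misleading on this scale and why the deviation in rating levels is the more informative column.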
Now let's turn to the reliability of the scoring values obtained. Here the best approach is to evaluate the correlation between the scoring value and the officially assigned rating. For this purpose, the banks “AVERS” and “MKB” were selected from our sample. The results of scoring them over the period from 01.05.2017 to 01.09.2018 are presented below.
These graphs show a good correlation between the scoring value and the officially assigned rating, which allows us to conclude that the resulting scoring model is reliable and suitable for operational monitoring. Now we can move on to scoring all 511 banks for which September 2018 reports are available on the website of the Central Bank of the Russian Federation.
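The article judges the correlation visually from the graphs. If you want a number instead, one common choice is the Pearson coefficient, sketched below. Using Pearson here is my assumption, not the author's stated method, and the two short series at the end are made-up placeholders, not the data for “AVERS” or “MKB”.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # E.g. the official rating did not change during the period.
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder monthly values (NOT real data): model score vs. digitized rating.
monthly_score = [0.52, 0.55, 0.58, 0.61, 0.63]
monthly_rating = [0.50, 0.50, 0.55, 0.60, 0.60]
r = pearson(monthly_score, monthly_rating)
```

Keep in mind that an official rating often stays unchanged for many months; over such a window the coefficient is undefined, which is why the function raises an error instead of returning a number.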
When looking at the resulting Bank scoring values, keep in mind that the purpose of this case is to demonstrate the practical use of a neural network for scoring standardized financial statements within an acceptable margin of error, which is why we worked with a simplified and limited set of public data. We did not take into account the quality of collateral and other data that matter for a full credit analysis, because they are confidential: only professional rating agencies have access to them, under a signed agreement and the relevant law. In other words, the results should be treated with a degree of scepticism that strictly matches the size of the error obtained.
P.S.: in this case I deliberately did not go into the technical weeds of building a neural network and do not plan to do so in the future because, firstly, it is very tedious, and secondly, there are enough specialized resources for that. If you are interested in the technical side of this issue, Habr is there to help.