For the release of Snow License Manager (SLM) 8.2.2, one of Snow’s development teams was dedicated to reducing the time it takes to calculate compliance. Unfortunately, some Snow customers, particularly those with large numbers of seats (>40,000), have been experiencing lengthy compliance calculations, taking up to 20 minutes and in some cases several hours, depending on the number of assigned licenses.
So, we set ourselves a challenge. We took a staging environment with over 500,000 units which, because of both its size and its hardware performance, was taking over 12 hours to calculate compliance. Our target: compliance should take no more than four hours to complete, and ideally less than two. By setting a high improvement factor, we were sure that our investigation would reveal the root cause of the problem.
The team started by determining which steps of the calculation were taking the most time, in the hope that we would uncover obvious bottlenecks that could be optimized. Unfortunately, we discovered that each step was taking roughly the same amount of time, so fine-tuning specific steps was not a feasible approach. We needed another idea.
The compliance service has always worked by dividing the tasks it performs into batches. The number of batches depends on the batchSize value set in the configuration file and on the number of seats. Because batches are processed serially, one after another, the runtime for compliance calculation rises with the size of the organization and the number of licenses.
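As a rough illustration of why serial runtime grows with size, the sketch below assumes that the batch count is simply the number of units divided by batchSize, rounded up, and that every batch takes the same fixed time. Both assumptions, and the names batch_count and serial_runtime_minutes, are hypothetical and are not taken from the SLM implementation.

```python
import math


def batch_count(units: int, batch_size: int) -> int:
    """Batches needed to cover all units (assumption: ceiling division)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(units / batch_size)


def serial_runtime_minutes(units: int, batch_size: int, minutes_per_batch: float) -> float:
    """Total runtime when batches run one after another.

    Assumes a constant, hypothetical time per batch.
    """
    return batch_count(units, batch_size) * minutes_per_batch


# With serial processing, ten times the units means ten times the batches.
print(batch_count(50_000, 5_000))
print(batch_count(500_000, 5_000))
```

Under these assumptions, the runtime is directly proportional to the number of batches, which is why larger environments take so much longer.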
So, the obvious solution is to run compliance batches in parallel. The more batches there are, the greater the potential for reducing runtime. This is perfect in theory, but in reality the complexity of making this change, and its potential impact, were huge. Compliance calculation is a key component of Snow License Manager that is relied upon by thousands of our customers to determine their software licensing position. Failure was not an option. Testing, which we carried out on internal environments and with some customers, was both lengthy and rigorous.
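The general idea of moving from serial to parallel batch processing can be sketched as follows. This is a minimal illustration only, not the SLM compliance engine: process_batch is a hypothetical placeholder for the real per-batch work, and split_into_batches, run_serial and run_parallel are names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, batch_size):
    """Cut the work list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_batch(batch):
    """Hypothetical stand-in for the real per-batch compliance work."""
    return len(batch)


def run_serial(items, batch_size):
    """Process batches one after another."""
    return [process_batch(b) for b in split_into_batches(items, batch_size)]


def run_parallel(items, batch_size, max_parallel_batches):
    """Process up to max_parallel_batches batches at the same time."""
    batches = split_into_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_parallel_batches) as pool:
        # pool.map returns results in the same order as the input batches.
        return list(pool.map(process_batch, batches))


items = list(range(23))
# Both strategies must produce the same results; only the elapsed time differs.
assert run_serial(items, 5) == run_parallel(items, 5, 4)
```

A thread pool keeps the sketch short. For CPU-bound work in Python, a process pool would be the closer analogue, and the real service may coordinate its batches quite differently.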
Our work paid off. We not only met the target we set ourselves, but our tests are showing a 60% reduction in the time to calculate compliance.
A by-product of running batches in parallel is a dramatic increase in the memory consumption of the service. For maximum performance, the batch size (the number of applications to fetch per batch) should be set to 5,000, and the number of parallel batches should be greater than 10. Making additional cores available to the service is a good way to complement parallel processing.
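The link between parallelism and memory can be seen with some simple arithmetic. The sketch below assumes that each running batch holds all of its fetched items in memory at once, so the number of items held at the same time is bounded by the batch size multiplied by the number of parallel batches. The function name is hypothetical, and actual memory use depends on the per-item footprint, which is not covered here.

```python
def max_items_in_flight(batch_size: int, parallel_batches: int) -> int:
    """Upper bound on items held in memory at once.

    Assumption: every running batch keeps its full set of items loaded.
    """
    return batch_size * parallel_batches


# Serial processing: only one batch is loaded at a time.
print(max_items_in_flight(5_000, 1))
# Twelve parallel batches: up to twelve times as many items at once.
print(max_items_in_flight(5_000, 12))
```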
For details, customers can refer to Technical Description: Compliance Calculation Engine – Update revision SLM 8.2.