forumNordic

Global Visibility for Nordic Innovations

SME Default Dynamics – Why Methods, Sectors, and “Soft” Data Now Matter More Than Ratios

Small and medium‑sized enterprises (SMEs) power employment, innovation, and local resilience, yet they are structurally more exposed to shocks and credit frictions. Hamid Cheraghali’s dissertation, SME Default Dynamics: Methodological Advances and the Value of Non‑Financial Information (UiS, 2025), is a timely intervention: it shows that how we build models, which firms we apply them to, and what information we feed them fundamentally change the signal we extract about distress. 

For academics, lenders, and policymakers, and for the wider business community, the message is clear: it’s time to move beyond single‑model, ratio‑only approaches and embrace methodological rigor, sectoral nuance, and non‑financial data.

A field that matured and then diversified

Historically, default prediction evolved from univariate heuristics to multivariate discriminant analysis (the Altman Z‑score) and logistic regression (Logit), bringing probabilistic interpretation and tractable economics (Altman, 1968; Ohlson, 1980). That lineage still matters: logit remains widely used because it is transparent and auditable. But Cheraghali’s work documents a decisive shift: modern ensemble methods, especially Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost), and other machine‑learning (ML) techniques consistently deliver higher out‑of‑sample accuracy in SME settings, if we evaluate them under realistic deployment conditions (strict hold‑outs and out‑of‑time tests).

In plain terms: ML captures non‑linearities and interactions that classic ratios and linear models miss; AUC (Area Under the ROC Curve), a threshold‑free measure of discrimination, improves meaningfully when these methods are properly tuned and validated (Cheraghali & Molnár, 2024b).
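To make that comparison concrete, here is a minimal sketch in Python: it fits a logistic regression and a gradient-boosted tree ensemble (scikit-learn's HistGradientBoostingClassifier as a LightGBM-style stand-in) on synthetic, imbalanced data and compares hold-out AUC. Everything here – the data, the model settings – is illustrative, not the dissertation's actual setup.

```python
# Minimal sketch: hold-out AUC for a linear model vs a gradient-boosted
# ensemble (HistGradientBoostingClassifier as a LightGBM-style stand-in).
# Synthetic, imbalanced data mimics the rarity of SME defaults.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

for name, model in [("logit", LogisticRegression(max_iter=1000)),
                    ("boosted trees", HistGradientBoostingClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    p = model.predict_proba(X_te)[:, 1]   # predicted default probabilities
    print(f"{name}: hold-out AUC = {roc_auc_score(y_te, p):.3f}")
```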

Critical takeaway

Performance claims that rely on in‑sample results or ad‑hoc validation inflate confidence and mislead credit decisions. Cheraghali’s survey of 145 studies found that more than one‑quarter validated in‑sample only, and over one‑third skipped feature selection altogether – conditions ripe for overfitting and poor generalization (Cheraghali & Molnár, 2023).

Methods really matter, but so does methodological hygiene

Cheraghali’s large‑scale benchmarking (6,118 model configurations on U.S. SME data) demonstrates three practical points:

  1. Model choice: LightGBM and XGBoost typically top the AUC rankings; support vector machines (SVM) can be competitive; logit is respectable only when paired with thoughtful feature selection (e.g., LASSO or disciplined stepwise) and class rebalancing.
  2. Class imbalance: Default is rare in most SME panels. ML ensembles often perform best without rebalancing (their objectives can accommodate skewed classes), whereas k‑nearest neighbors (k‑NN), quadratic discriminant analysis (QDA), and classification and regression trees (CART) benefit from undersampling toward balance. Blind oversampling can degrade generalization.
  3. Validation design: Hold‑out (entity‑level) and out‑of‑time splits are non‑negotiable for operational scoring; mixing training and test information – e.g., selecting features on the full sample – quietly leaks signal and inflates AUC (a leakage‑safe setup is sketched after this list).
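To make point 3 concrete: in the sketch below, feature selection (a LASSO‑penalized logit) lives inside a scikit‑learn Pipeline, so it is fitted on the training window only, and the model is scored strictly on later years. The file name, the year and default columns, and the 2018/2019 cutoff are hypothetical placeholders, not the dissertation's data.

```python
# Leakage-safe out-of-time validation: feature selection lives inside the
# Pipeline, so it is fitted on the training window only. File name, column
# names, and the 2018/2019 cutoff are hypothetical.
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("sme_panel.csv")
train, test = df[df["year"] <= 2018], df[df["year"] >= 2019]  # out-of-time split
features = [c for c in df.columns if c not in ("year", "default")]

pipe = Pipeline([
    ("scale", StandardScaler()),
    # LASSO-style selection, refitted only when the pipeline sees training data
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
pipe.fit(train[features], train["default"])
p = pipe.predict_proba(test[features])[:, 1]
print(f"out-of-time AUC = {roc_auc_score(test['default'], p):.3f}")
```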

For financial institutions, the implication is operational as much as statistical: align your method stack and resampling strategy with the data’s class distribution, then lock in out‑of‑time validation. For supervisors, the implication is governance: mandate disclosure of validation design (split logic, time windows), class imbalance handling, and performance metrics beyond accuracy at a single cutoff (AUC, Type I/II errors).

Sectoral heterogeneity isn’t a detail – it’s a design principle

One of the dissertation’s most consequential findings is that financial SMEs and non‑financial SMEs exhibit different predictive structures, and models don’t transfer symmetrically. Non‑financial firms are most sensitive to revenue and efficiency metrics (e.g., Sales/Total Assets, Retained Earnings/Total Assets), whereas financial SMEs tilt toward profitability and balance‑sheet composition (e.g., Net Income/Total Assets, Working Capital/Total Assets). When you apply a model trained on non‑financial SMEs to financial SMEs, performance collapses toward AUC ≈ 0.50 (i.e., a coin flip). The reverse transfer is better, but still suboptimal.
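A cross‑sector transfer test of this kind can be scripted in a few lines. The sketch below is illustrative only – the sector labels, column names, and file are assumptions, not the dissertation's data.

```python
# Illustrative cross-sector transfer test: fit on one sector's firms, score
# the other. Sector labels, columns, and the file are assumptions.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("sme_panel.csv")
features = [c for c in df.columns if c not in ("year", "default", "sector")]

def transfer_auc(train_df, test_df):
    """Train on one sector, report discrimination (AUC) on the other."""
    model = HistGradientBoostingClassifier(random_state=0)
    model.fit(train_df[features], train_df["default"])
    p = model.predict_proba(test_df[features])[:, 1]
    return roc_auc_score(test_df["default"], p)

non_fin = df[df["sector"] != "financial"]
fin = df[df["sector"] == "financial"]
print(f"non-financial -> financial: AUC = {transfer_auc(non_fin, fin):.3f}")
print(f"financial -> non-financial: AUC = {transfer_auc(fin, non_fin):.3f}")
```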

Policy and practice implication

Housing “all SMEs” in a single rating architecture is not just suboptimal; it can be misleading and pro‑cyclical. Sector‑specific early‑warning systems, stress‑tested across interest coverage ratio (ICR) thresholds, should be the default. Credit programs and guarantees should embed sector‑calibrated models or require portfolio segmentation prior to capital allocation.

The underused asset – contextual information

Perhaps the most headline‑worthy result is the predictive value of non‑financial data – not soft anecdotes, but structured variables on management and governance. CEO education, tenure, and age are associated with lower default probability; foreign CEO status correlates positively with risk; firm age, board structure, and ownership concentration also matter.

Importantly, for young and small firms, non‑financial‑only models can outperform financial‑only models in out‑of‑time tests (Cheraghali, Paper 4). Why? New firms lack deep financial histories, so their ratios are noisier and less informative. Upper‑echelons theory and agency theory aren’t just academic references; they point to persistent managerial fingerprints in risk‑taking and resilience. In ML terms, these features add orthogonal signal that raises AUC and stabilizes rankings.
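One way to see that effect in practice is to benchmark financial‑only, non‑financial‑only, and combined feature sets on young firms under an out‑of‑time split. The column names, file, and age cutoff below are hypothetical stand‑ins for the kinds of variables the dissertation uses.

```python
# Sketch: financial-only vs non-financial-only vs combined feature sets for
# young firms, under an out-of-time split. All column names, the file, and
# the age cutoff are hypothetical stand-ins.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("sme_panel.csv")
young = df[df["firm_age"] <= 3]                       # early-stage firms only
train = young[young["year"] <= 2018]
test = young[young["year"] >= 2019]

feature_sets = {
    "financial only": ["sales_to_assets", "retained_earnings_to_assets",
                       "working_capital_to_assets"],
    "non-financial only": ["ceo_education_years", "ceo_tenure", "ceo_age",
                           "board_size", "ownership_concentration"],
}
feature_sets["combined"] = (feature_sets["financial only"]
                            + feature_sets["non-financial only"])

for name, cols in feature_sets.items():
    model = HistGradientBoostingClassifier(random_state=0)
    model.fit(train[cols], train["default"])
    auc = roc_auc_score(test["default"], model.predict_proba(test[cols])[:, 1])
    print(f"{name}: out-of-time AUC = {auc:.3f}")
```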

Practical guidance:

  • For micro and early‑stage SMEs, collect governance and CEO attributes at onboarding; treat them as primary features, not optional add‑ons.

  • In mature portfolios, use non‑financial data to complement ratios, improve rank‑ordering, and reduce Type II errors (missed defaults) without hiking false positives.

Interpretation vs accuracy

A common concern—especially in regulated banking—is interpretability. Cheraghali acknowledges this, but points to modern tools like SHAP (Shapley Additive Explanations) that decompose complex model predictions into feature contributions at the instance level. 

Combined with rigorous validation, these tools can make LightGBM/XGBoost explainable enough for audit and challenger models—without sacrificing out‑of‑time performance. The trade‑off is no longer binary; the governance challenge is to develop model risk frameworks that accept documented ML with transparent post‑hoc explanations.
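For illustration, an instance‑level SHAP decomposition of a LightGBM scorer takes only a few lines. This is a sketch assuming the lightgbm and shap packages are installed; the data here is synthetic.

```python
# Sketch: instance-level SHAP decomposition for a LightGBM default model.
# Requires the lightgbm and shap packages; the data here is synthetic.
import lightgbm as lgb
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = lgb.LGBMClassifier(random_state=0).fit(X_tr, y_tr)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)   # per-instance feature contributions

# Depending on the shap version, binary-class output is one array or a
# per-class list; either way, one row = one firm's additive explanation,
# which (with the base value) sums to the model's raw score.
row = shap_values[1][0] if isinstance(shap_values, list) else shap_values[0]
print(row)
```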

Where to raise the bar

  • Academia: Standardize reporting: class imbalance, feature selection method, validation design (hold‑out vs cross‑validation vs out‑of‑time), and AUC plus error trade‑offs. Benchmark against shared datasets (e.g., Compustat) to enable replication and meta‑learning.
  • Practitioners: Build sector‑specific scorecards; integrate non‑financial data, especially for young firms; measure costs of misclassification (Type I vs Type II) explicitly and set cutoffs accordingly (a cutoff‑selection sketch follows this list).
  • Policymakers: Facilitate access to high‑quality administrative and registry data (financial + governance + leadership demographics); encourage disclosure of model design choices; support national early‑warning systems that can be segmented by sector, size, and firm age.
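On the practitioners’ point about misclassification costs, cutoff selection can be made explicit with a few lines of Python. The 10:1 cost ratio and the toy scores below are purely illustrative assumptions.

```python
# Sketch: pick a scoring cutoff from explicit misclassification costs instead
# of a default 0.5 threshold. The 10:1 cost ratio and toy scores are
# illustrative assumptions.
import numpy as np

def best_cutoff(y_true, p_default, cost_fn=10.0, cost_fp=1.0):
    """Scan thresholds and return the one minimizing expected cost.
    cost_fn: cost of a missed default (Type II error);
    cost_fp: cost of flagging a healthy firm (Type I error)."""
    thresholds = np.linspace(0.01, 0.99, 99)
    costs = []
    for t in thresholds:
        flagged = p_default >= t
        misses = np.sum((y_true == 1) & ~flagged)       # missed defaults
        false_alarms = np.sum((y_true == 0) & flagged)  # healthy firms flagged
        costs.append(cost_fn * misses + cost_fp * false_alarms)
    return thresholds[int(np.argmin(costs))]

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.05, size=10_000)                  # ~5% default rate
p = np.clip(rng.normal(0.10 + 0.45 * y, 0.15), 0, 1)    # toy model scores
print(f"cost-optimal cutoff: {best_cutoff(y, p):.2f}")  # well below 0.5
```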

The next step‑change in SME risk assessment won’t come from yet another ratio, but from better methods, smarter segmentation, and richer information about the people and structures that run firms. If lenders and regulators embrace that trio – validated ML, sector‑specific models, and non‑financial features – credit allocation can become both more accurate and fairer across the SME spectrum.

Sources

Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609. https://doi.org/10.1111/j.1540-6261.1968.tb00843.x

Cheraghali, H. (2025). SME Default Dynamics: Methodological Advances and the Value of Non‑Financial Information (PhD thesis). University of Stavanger School of Business and Law.

Cheraghali, H., & Molnár, P. (2023). SME default prediction: A systematic methodology-focused review. Journal of Small Business Management. https://doi.org/10.1080/00472778.2023.2277426

Cheraghali, H., & Molnár, P. (2024b). SME default prediction: A systematic methods evaluation. Journal of Small Business Management. https://doi.org/10.1080/00472778.2024.2390500

Ohlson, J. A. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), 109–131. https://doi.org/10.2307/2490395

Additional theoretical references cited within the dissertation include Hambrick & Mason (1984), Jensen & Meckling (1976), and Yermack (1996) on managerial and governance influences; and SHAP explainability methods referenced in the context of ML interpretability.

Source: SME Default Dynamics: Methodological Advances and the Value of Non-Financial Information – Norwegian Research Information Repository

© 2024 forumNordic. All rights reserved. Reproduction or distribution of this material is prohibited without prior written permission. For permissions: contact (at) forumnordic.com