Knowledge Nugget

How can I optimize an ML model's architecture after training is done without negatively impacting the quality of the model's output?
person Author: Process Fellows
An ML model is typically considered a black box, so this question may sound impossible. Let's look at the kinds of techniques that can be considered and used:
  • Pruning (weight thinning)
    • Idea: Removal of unimportant neurons or connections in the network (some weights have very little influence on the final result).
    • Various techniques (e.g. magnitude-based pruning) can be used to eliminate weights or entire neurons.
    • Test: Retraining or evaluation is used to check whether the prediction performance is maintained.
  • Quantization (reduction of the precision of the parameters)
    • Idea: Reducing the precision of the model parameters (e.g. instead of 32-bit float values for weights, reduce to 16-bit or even 8-bit).
    • Modern frameworks and hardware (e.g. TensorFlow Lite or dedicated AI accelerators) support such quantized models efficiently.
    • Test: The accuracy of the quantized model is measured in comparison to the original model.
  • Knowledge distillation (knowledge transfer from large to small)
    • Idea: Knowledge from a large, already trained model (“teacher”) is transferred to a smaller model (“student”).
    • The smaller model is trained to imitate the outputs (e.g. the soft probability distributions) of the larger model.
    • As a result, the smaller model can often make similar predictions, even if it has fewer parameters.
    • Test: Evaluation of the predictions of the “student” model compared to the “teacher” model.
  Tip: If TensorFlow or PyTorch is used, there are libraries such as the TensorFlow Model Optimization Toolkit or Torch-Pruning that automate many of these techniques.
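The pruning idea above can be sketched in a few lines with PyTorch's built-in pruning utilities. The tiny network below is a hypothetical stand-in for a real trained model; magnitude-based (L1) pruning zeroes out the half of each layer's weights with the smallest absolute value:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical small network as a stand-in for a real trained model
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Magnitude-based pruning: zero out the 50% of weights with the
# smallest absolute value in each Linear layer
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

# Check the resulting sparsity: about half of the weights are now exactly zero
linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
total = sum(m.weight.numel() for m in linears)
zeros = sum((m.weight == 0).sum().item() for m in linears)
print(f"sparsity: {zeros / total:.2f}")
```

In practice this is followed by the "test" step from the list: re-evaluating (and often fine-tuning) the pruned model to confirm that prediction performance is maintained.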
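The quantization idea can be illustrated without any framework support. The following is a minimal sketch of symmetric per-tensor int8 quantization; the helper names and the random weight matrix are illustrative assumptions, and real toolchains (TensorFlow Lite, PyTorch quantization) do this per layer with calibration:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    scale = np.abs(w).max() / 127.0  # map the range [-max, max] onto [-127, 127]
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for a weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Storage drops from 4 bytes to 1 byte per weight; the rounding error
# per weight is bounded by half the quantization step (scale / 2)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

The accuracy test from the list then compares the model's predictions with `w_hat` against those with the original `w`.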
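The distillation step can be sketched as follows in PyTorch. The teacher and student architectures, the temperature, and the random training inputs are all illustrative assumptions; the core idea is training the student to match the teacher's softened output distribution via a KL-divergence loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical models: a larger "teacher" and a smaller "student"
teacher = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 4))
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

T = 2.0  # temperature: softens the teacher's output distribution
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(256, 16)  # stand-in for (possibly unlabeled) training inputs

losses = []
for _ in range(100):
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=1)
    log_probs = F.log_softmax(student(x) / T, dim=1)
    # KL divergence between student and teacher output distributions,
    # scaled by T^2 as is conventional for distillation losses
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"distillation loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In a real setup the distillation loss is usually combined with the ordinary task loss on labeled data, and the final "test" step compares the student's predictions against the teacher's on a held-out set.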
Mapped to these items:
  • Automotive SPICE 4.0
    • MLE.3.BP3 Create and optimize ML model.