Knowledge Nugget

Examples of the impact of data requirements on ML architecture
Author: Process Fellows
  • Amount of data: A linear regression model requires significantly less training data than a deep neural network, because it has far fewer parameters to fit.
  • Data quality: Faulty or noisy data (e.g. sensor data) requires a pre-processing step in the architecture.
  • Data structure: Together with the problem to be solved, the structure of the data influences the choice of model:
    • Tabular data → Decision trees or gradient boosting (e.g. XGBoost)
    • Images → Convolutional Neural Networks (CNNs)
    • Text → Transformer models (e.g. BERT, GPT)
    • Time series → Recurrent Neural Networks (RNNs), such as Long Short-Term Memory networks (LSTMs) or Gated Recurrent Units (GRUs, similar to LSTMs but with lower computational cost)
  • Data availability:
    • Real-time applications (e.g. autonomous driving): Low latency is required, so neural networks must be optimized for fast inference.
    • Batch processing (e.g. a recommendation system such as Netflix's): More time is available, so more complex models and deeper architectures are possible.
  • Feature dimensionality: Models for high-dimensional data require special techniques for dimensionality reduction, e.g.:
    • Image data (millions of pixels per image) → CNNs with feature extraction
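
The mapping in the list above can be written down as a small lookup, which may help when documenting architecture decisions. This is only a minimal sketch: the names `CANDIDATES`, `DataRequirements` and `suggest_architectures` are hypothetical, and the table merely restates the examples from the list. It is not a complete or authoritative selection rule.

```python
from dataclasses import dataclass

# Hypothetical lookup table restating the examples from the list above.
# It is an illustration, not an exhaustive or official rule set.
CANDIDATES = {
    "tabular": ["decision tree", "gradient boosting (e.g. XGBoost)"],
    "image": ["CNN"],
    "text": ["Transformer (e.g. BERT, GPT)"],
    "time_series": ["RNN", "LSTM", "GRU"],
}


@dataclass(frozen=True)
class DataRequirements:
    """Assumed, simplified description of the data requirements."""
    kind: str               # one of the keys in CANDIDATES
    realtime: bool = False  # True for real-time use, False for batch processing


def suggest_architectures(req: DataRequirements) -> tuple[list[str], str]:
    """Return candidate model families plus a note on the latency constraint."""
    if req.kind not in CANDIDATES:
        raise ValueError(f"unknown data kind: {req.kind!r}")
    families = list(CANDIDATES[req.kind])
    if req.realtime:
        note = "low latency required: prefer optimized, compact networks"
    else:
        note = "batch processing: more complex, deeper architectures are possible"
    return families, note


if __name__ == "__main__":
    families, note = suggest_architectures(DataRequirements("time_series", realtime=True))
    print(families, "|", note)
```

A table like this is only a starting point for MLE.2.BP1; the amount of data, its quality and its dimensionality still need to be weighed for each candidate.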
Mapped to these items:
  • Automotive SPICE 4.0
    • MLE.2 Machine Learning Architecture
    • MLE.2.BP1 Develop ML architecture.