Examples of the impact of data requirements on ML architecture

Question

Accepted Answer

Amount of data: A linear regression model typically needs far less training data than a deep neural network, because it has far fewer parameters to fit.
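A rough way to see why the data requirement differs is to count trainable parameters. The sketch below assumes 100 input features and a hypothetical fully connected network with two hidden layers of 256 units; both sizes are illustrative and do not come from the answer above.

```python
def linear_regression_params(n_features):
    """One weight per feature plus one bias term."""
    return n_features + 1


def mlp_params(layer_sizes):
    """Weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


n_features = 100  # assumed input dimensionality
linear = linear_regression_params(n_features)
deep = mlp_params([n_features, 256, 256, 1])  # hypothetical architecture

print(linear)  # 101
print(deep)    # 91905
```

Parameter count is only a rough proxy, but a model with more parameters generally needs more examples to be estimated reliably.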
Data quality: Faulty or noisy data (e.g. sensor data) requires pre-processing, such as outlier removal or smoothing, before it is used for training.
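As one possible pre-processing step for noisy sensor data, a sliding median filter suppresses isolated spikes. This is a minimal sketch; the function name `median_filter`, the window size, and the readings are assumptions made for illustration.

```python
from statistics import median


def median_filter(readings, window=3):
    """Replace each reading with the median of its neighborhood.

    Windows are truncated at the edges of the sequence.
    """
    half = window // 2
    smoothed = []
    for i in range(len(readings)):
        lo = max(0, i - half)
        hi = min(len(readings), i + half + 1)
        smoothed.append(median(readings[lo:hi]))
    return smoothed


# Hypothetical temperature readings with one faulty spike (999.0).
raw = [20.1, 20.3, 999.0, 20.2, 20.4]
clean = median_filter(raw)
print(clean)  # the 999.0 spike no longer appears
```

A median is used here rather than a mean because a single extreme value barely affects it; other filters may suit other kinds of noise better.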
Data structure: The structure of the data, together with the problem to be solved, also influences the choice of model:
- Tabular data → Decision trees or gradient boosting (e.g. XGBoost)
- Images → Convolutional Neural Networks (CNNs)
- Text → Transformer models (e.g. BERT, GPT)
- Time series → Recurrent neural networks (RNNs), in particular Long Short-Term Memory networks (LSTMs) or Gated Recurrent Units (GRUs, similar to LSTMs but computationally cheaper)
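The remark that GRUs are cheaper than LSTMs can be made concrete by counting parameters. The sketch below assumes the textbook formulation, in which an LSTM layer has four gate-like blocks and a GRU layer has three, each with an input weight matrix, a recurrent weight matrix, and one bias vector. The sizes are hypothetical, and some implementations keep extra bias vectors, which changes the totals but not the 3:4 ratio.

```python
def recurrent_layer_params(input_size, hidden_size, n_blocks):
    """Parameters of a gated recurrent layer with n_blocks gate-like blocks.

    Each block has input weights (hidden x input), recurrent weights
    (hidden x hidden), and one bias vector (hidden).
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return n_blocks * per_block


input_size, hidden_size = 32, 64  # assumed sizes for illustration

lstm = recurrent_layer_params(input_size, hidden_size, n_blocks=4)
gru = recurrent_layer_params(input_size, hidden_size, n_blocks=3)

print(lstm)        # 24832
print(gru)         # 18624
print(gru / lstm)  # 0.75
```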
Data availability:
- Real-time applications (e.g. autonomous driving): low latency is required, which favors small, optimized neural networks.
- Batch processing (e.g. a recommendation system such as Netflix's): more time is available per prediction, so more complex models and deeper architectures are possible.
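The real-time versus batch trade-off can be sketched as a selection under a latency budget: pick the most accurate model that is still fast enough. The `Candidate` class, the model names, and every number below are made up for illustration and are not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    latency_ms: float  # time per prediction
    accuracy: float    # validation accuracy


def pick_model(candidates, budget_ms):
    """Return the most accurate candidate within the budget, or None."""
    feasible = [c for c in candidates if c.latency_ms <= budget_ms]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Made-up numbers for illustration only.
candidates = [
    Candidate("small_cnn", latency_ms=8.0, accuracy=0.91),
    Candidate("deep_cnn", latency_ms=120.0, accuracy=0.96),
]

realtime = pick_model(candidates, budget_ms=20.0)    # tight per-frame budget
batch = pick_model(candidates, budget_ms=60_000.0)   # generous offline budget

print(realtime.name)  # small_cnn
print(batch.name)     # deep_cnn
```

With the tight budget only the smaller model qualifies; with the generous budget the deeper, more accurate model wins.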
Feature dimensionality: High-dimensional data calls for techniques that reduce the number of dimensions, e.g.
- Image data (millions of pixels per image) → CNNs with feature extraction.
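Pooling is one common way a CNN shrinks its input between convolutional layers. The sketch below applies 2x2 max pooling to a hypothetical 4x4 grayscale patch, reducing 16 values to 4; the function name and the pixel values are assumptions for illustration.

```python
def max_pool_2x2(image):
    """Downsample a 2D grid by taking the maximum of each 2x2 block.

    Assumes the height and width are even.
    """
    pooled = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            block = (
                image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1],
            )
            row.append(max(block))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 grayscale patch.
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(image)
print(pooled)  # [[4, 2], [2, 8]]
```

Each pooling step of this kind quarters the number of values, which is how a network can work its way down from millions of pixels to a compact feature representation.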