Spice4Cars

Specify an ML test approach suitable to provide evidence for compliance of the trained ML model and the deployed ML model with the ML requirements. The ML test approach includes

ML test scenarios with distribution of data characteristics (e.g., gender, weather conditions, street conditions within the ODD) defined by ML requirements,
the distribution and frequency of each ML test scenario inside the ML test data set,
the expected test result per test datum,
the pass/fail criteria of the testing,
the entry and exit criteria of the testing,
the approach for data set creation and modification, and
the required testing infrastructure and environment setup.
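
As an illustration of how the distribution and frequency of ML test scenarios inside the ML test data set could be checked against the ML test approach, the following is a minimal sketch. The function names, the scenario labels, the target shares and the tolerance are all hypothetical and not part of the process description.

```python
from collections import Counter


def scenario_shares(scenario_labels):
    """Return the relative frequency of each ML test scenario in the data set."""
    counts = Counter(scenario_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scenario: n / total for scenario, n in counts.items()}


def distribution_deviations(scenario_labels, target_shares, tolerance):
    """Return scenarios whose actual share differs from the target by more than tolerance."""
    actual = scenario_shares(scenario_labels)
    deviations = {}
    for scenario, target in target_shares.items():
        diff = abs(actual.get(scenario, 0.0) - target)
        if diff > tolerance:
            deviations[scenario] = round(diff, 3)
    return deviations


# Hypothetical example: one scenario label per test datum
labels = ["sunny"] * 70 + ["rainy"] * 25 + ["foggy"] * 5
# Hypothetical target distribution taken from the ML test approach
target = {"sunny": 0.6, "rainy": 0.3, "foggy": 0.1}

print(distribution_deviations(labels, target, tolerance=0.06))
```

In this example only the scenario "sunny" is reported, because its share in the data set (0.7) is further than the assumed tolerance from its target share (0.6).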

Note 1: An expected test result per test datum might require the labeling of test data to support the comparison of the output of the ML model with the expected output.
Note 2: Test datum is the smallest amount of ML data which is processed by the ML model into only one output. E.g., one image in photo processing or an audio sequence in voice recognition.
Note 3: Data characteristic is one property of the ML data that may have different expressions in the ODD. E.g., weather condition may contain expressions like sunny, foggy or rainy.
Note 4: An ML test scenario is a combination of expressions of all defined data characteristics, e.g., weather conditions = sunny, street conditions = gravel road.
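
Notes 3 and 4 can be sketched in code: each data characteristic has a set of expressions, and an ML test scenario is one combination of expressions across all characteristics. The characteristic names and expressions below are illustrative assumptions, not a prescribed catalogue.

```python
from itertools import product

# Hypothetical data characteristics and their expressions within the ODD
data_characteristics = {
    "weather_conditions": ["sunny", "foggy", "rainy"],
    "street_conditions": ["asphalt", "gravel road"],
}


def enumerate_scenarios(characteristics):
    """Build every combination of expressions of all defined data characteristics."""
    names = list(characteristics)
    expression_lists = [characteristics[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*expression_lists)]


scenarios = enumerate_scenarios(data_characteristics)
print(len(scenarios))
for scenario in scenarios:
    print(scenario)
```

The number of scenarios grows multiplicatively with each added characteristic, which is one reason the ML test approach defines which scenarios are relevant and how often each should occur.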

Create the ML test data set needed for testing of the trained ML model and testing of the deployed ML model from the ML data collection provided by SUP.11 considering the ML test approach. The ML test data set shall not be used for training.
Note 5: The ML test data set for the trained ML model might differ from the test data set of the deployed ML model.
Note 6: Additional data sets might be used for special purposes like assurance of safety, fairness, robustness.
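
One possible way to support the rule that the ML test data set shall not be used for training is an automated overlap check between the two data sets. The sketch below compares SHA-256 fingerprints of the raw data; the function names and sample data are hypothetical, and the check only finds byte-identical duplicates, not near-duplicates such as re-encoded or cropped images.

```python
import hashlib


def fingerprint(datum: bytes) -> str:
    """Return a content hash that identifies a byte-identical datum."""
    return hashlib.sha256(datum).hexdigest()


def find_leaked_test_data(training_data, test_data):
    """Return the indices of test data that also occur in the training data."""
    training_fingerprints = {fingerprint(d) for d in training_data}
    return [
        index
        for index, datum in enumerate(test_data)
        if fingerprint(datum) in training_fingerprints
    ]


# Hypothetical example: raw bytes stand in for image or audio files
training = [b"image-001", b"image-002", b"image-003"]
test = [b"image-100", b"image-002", b"image-101"]

leaked = find_leaked_test_data(training, test)
print(leaked)  # indices of test data that must be removed or replaced
```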

Linked Knowledge Nuggets:
"What's the difference between "corner cases" and "unexpected cases"?"

"Which test data sets to be used for ML testing?"

Author: Process Fellows
The ML test data set is used for the final testing of the trained ML model and the deployed ML model. The ML test data set must not be used for training! This means that no significant changes/optimizations may be made based on the ML test data set. This is because with every optimization, information about the data set quickly finds its way into the model, leading to overfitting to the data set used.
If the test fails and optimization of the ML model is required, it must be ensured that the ML test data set remains reliable in order to guarantee compliance with ML requirements. Therefore, a change to the ML test data set may be necessary.

Regression tests:

Aim: to ensure that the deployed model (i.e. transferred to the target hardware) delivers the same results as the original model (on the training platform).
Causes for deviations: numerical differences caused by hardware-specific implementations (e.g. floating-point arithmetic, quantization).
How is it done?
- The same test data set is used for testing the trained model and the deployed model.
- Outputs are compared: any deviations are analyzed to determine whether they are within an acceptable tolerance. Example: if the prediction error is below 1%, the deployed model is considered “stable” (the acceptance criteria are to be defined as part of MLE.1).
- Remark: In safety-critical applications (e.g. autonomous driving, medical technology), even small differences may be unacceptable.
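
The comparison described above can be sketched as follows, assuming both models produce one numeric output per test datum. The function name, the sample outputs and the absolute tolerance of 0.02 are hypothetical; the actual acceptance criteria come from the ML requirements.

```python
import math


def compare_outputs(trained_outputs, deployed_outputs, abs_tol):
    """Return (index, trained, deployed) for every test datum outside the tolerance."""
    if len(trained_outputs) != len(deployed_outputs):
        raise ValueError("Both models must be run on the same test data set")
    deviations = []
    for index, (ref, dep) in enumerate(zip(trained_outputs, deployed_outputs)):
        # Only an absolute tolerance is used here; a relative one may fit other outputs better
        if not math.isclose(ref, dep, rel_tol=0.0, abs_tol=abs_tol):
            deviations.append((index, ref, dep))
    return deviations


# Hypothetical outputs (e.g. confidence scores) for the same four test data
trained = [0.91, 0.12, 0.55, 0.78]
deployed = [0.90, 0.12, 0.50, 0.78]

failures = compare_outputs(trained, deployed, abs_tol=0.02)
print(failures)
```

Here the third test datum is reported because its outputs differ by 0.05. Each reported deviation would then be analyzed, for example for quantization effects on the target hardware.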

Additional test data:

Because the neural network now runs in a real environment, HW-specific aspects need to be tested, e.g.
- Performance tests: speed, memory consumption, latency times
- Robustness tests: behavior in case of heat, voltage fluctuations, memory errors
- Edge cases / malfunctions: e.g. how does the model behave with incorrect or noisy inputs on the target hardware?
How is it done? Test data sets can be extended by:
- Inputs with higher noise or extreme values
- Live data from the target hardware (e.g. sensor data instead of static test images)
- Performance tests under load
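
As a rough sketch of the first extension (inputs with higher noise or extreme values), the following generates noisy and extreme variants of a test datum. It assumes inputs are lists of values normalized to the range 0.0 to 1.0; the function names, the noise level and the sample values are hypothetical.

```python
import random


def add_gaussian_noise(values, sigma, low=0.0, high=1.0, seed=None):
    """Add zero-mean Gaussian noise to each value and clip to the valid input range."""
    rng = random.Random(seed)  # a fixed seed keeps the extended test data reproducible
    return [min(high, max(low, v + rng.gauss(0.0, sigma))) for v in values]


def extreme_value_inputs(length, low=0.0, high=1.0):
    """Return two inputs that sit entirely at the lower and upper range limits."""
    return [[low] * length, [high] * length]


# Hypothetical test datum, e.g. five normalized pixel values
pixel_values = [0.0, 0.25, 0.5, 0.75, 1.0]

noisy = add_gaussian_noise(pixel_values, sigma=0.1, seed=42)
extremes = extreme_value_inputs(len(pixel_values))
print(noisy)
print(extremes)
```

Expected test results for such derived data still have to be defined in the ML test approach, since a noisy input may or may not be expected to yield the same output as the original.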

Test the trained ML model according to the ML test approach using the created ML test data set. Record and evaluate the ML test results.
Note 7: Evaluation of test logs might include pattern analysis of failed test data to support, e.g., trustworthiness.
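
A simple form of the pattern analysis mentioned in Note 7 is to group failed test data by the expression of one data characteristic. The record layout (keys "scenario" and "passed") and the sample results below are assumptions made for illustration only.

```python
from collections import Counter


def failure_rate_by_expression(results, characteristic):
    """Return the share of failed test data per expression of one data characteristic."""
    totals = Counter()
    failures = Counter()
    for result in results:
        expression = result["scenario"][characteristic]
        totals[expression] += 1
        if not result["passed"]:
            failures[expression] += 1
    return {expression: failures[expression] / totals[expression] for expression in totals}


# Hypothetical test log: one record per test datum
results = [
    {"scenario": {"weather": "sunny"}, "passed": True},
    {"scenario": {"weather": "sunny"}, "passed": True},
    {"scenario": {"weather": "foggy"}, "passed": False},
    {"scenario": {"weather": "foggy"}, "passed": True},
    {"scenario": {"weather": "rainy"}, "passed": False},
]

rates = failure_rate_by_expression(results, "weather")
print(rates)
```

A failure rate that is concentrated in one expression, such as "rainy" in this example, points to a systematic weakness rather than random errors.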

Derive the deployed ML model from the trained ML model according to the ML architecture. The deployed ML model shall be used for testing and delivery to software integration.
Note 8: The deployed ML model will be integrated into the target and may differ from the trained ML model, which often requires powerful hardware and uses interpretative languages.

Test the deployed ML model according to the ML test approach using the created ML test data set. Record and evaluate the ML test results.

Ensure consistency and establish bidirectional traceability between the ML test approach and the ML requirements, and between the ML test data set and the ML data requirements. Also establish bidirectional traceability between the ML test approach and the ML test results.
Note 9: Bidirectional traceability supports consistency, and facilitates impact analyses of change requests, and verification coverage demonstration. Traceability alone, e.g., the existence of links, does not necessarily mean that the information is consistent.
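
A link check between ML requirements and ML test scenarios could look like the sketch below. The identifiers (MLR-x, TS-x) and the link structure are hypothetical. In line with Note 9, such a check only shows that links exist in both directions; it does not show that the linked content is consistent.

```python
def untraced_items(requirement_ids, scenario_links):
    """Find requirements without a test scenario and scenarios without a valid requirement."""
    known_requirements = set(requirement_ids)
    linked_requirements = set()
    for linked in scenario_links.values():
        linked_requirements.update(linked)
    return {
        # Requirements that no test scenario refers to
        "requirements_without_scenario": sorted(known_requirements - linked_requirements),
        # Test scenarios that refer to no known requirement
        "scenarios_without_requirement": sorted(
            scenario
            for scenario, linked in scenario_links.items()
            if not known_requirements.intersection(linked)
        ),
        # Links that point to requirement IDs which do not exist
        "unknown_requirements": sorted(linked_requirements - known_requirements),
    }


# Hypothetical identifiers and links
requirements = ["MLR-1", "MLR-2", "MLR-3"]
links = {"TS-1": ["MLR-1"], "TS-2": ["MLR-1", "MLR-2"], "TS-3": []}

report = untraced_items(requirements, links)
print(report)
```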

Linked Knowledge Nuggets:
"Consistency vs. Traceability – What’s the Difference?"

"The role of traceability in risk control"

"The true benefit of traceability"

Summarize the ML test results of the ML model. Inform all affected parties about the agreed results and the deployed ML model.

Evidence of interpersonal communication.

Identifies:

Scope of information
Need for feedback, for example an expected confirmation within one week
Meta data, for example time when communication was done or how information was distributed.

Includes:

Personal information
Work-flows, for example within tools

Examples and References:

E-mails and other forms of memos
Verbal statements
Meeting minutes, for example in standups
Electronic media, for example webcasts, blog posts, intranet forums
Chat protocols
Wiki pages
Photo protocol

Used by these processes:

ACQ.4 Supplier Monitoring
HWE.1 Hardware Requirements Analysis
HWE.2 Hardware Design
HWE.3 Verification against Hardware Design
HWE.4 Verification against Hardware Requirements
MAN.3 Project Management
MLE.1 Machine Learning Requirements Analysis
MLE.2 Machine Learning Architecture
MLE.3 Machine Learning Training
MLE.4 Machine Learning Model Testing
PIM.3 Process Improvement
REU.2 Reuse of Products
SUP.1 Quality Assurance
SUP.11 Machine Learning Data Management
SWE.1 Software Requirements Analysis
SWE.2 Software Architectural Design
SWE.3 Software Detailed Design and Unit Construction
SWE.4 Software Unit Verification
SWE.5 Software Component Verification and Integration Verification
SWE.6 Software Verification
SYS.1 Requirements Elicitation
SYS.2 System Requirements Analysis
SYS.3 System Architectural Design
SYS.4 System Integration and Integration Verification
SYS.5 System Verification
VAL.1 Validation

Used by these process attributes:

PA2.1 Process performance management process attribute

Evidence of information to be semantically coherent along relevant artifacts, ensuring completeness, purpose maturity of processes, and products throughout their lifecycle.

Identifies:

Traceability information, for example hyperlinks, repository location or editorial references.
Naming conventions
Relevant artifacts
Revision and revision history information
Change documentation and analysis information

Includes:

Meta-information, for example database identifiers, notes in Git commits, comments

Examples and References:

Evidence of Definition of Done (DoD) adherence.

Used by these processes:

HWE.1 Hardware Requirements Analysis
HWE.2 Hardware Design
HWE.3 Verification against Hardware Design
HWE.4 Verification against Hardware Requirements
MAN.3 Project Management
MLE.1 Machine Learning Requirements Analysis
MLE.2 Machine Learning Architecture
MLE.3 Machine Learning Training
MLE.4 Machine Learning Model Testing
SUP.8 Configuration Management
SUP.10 Change Request Management
SWE.1 Software Requirements Analysis
SWE.2 Software Architectural Design
SWE.3 Software Detailed Design and Unit Construction
SWE.4 Software Unit Verification
SWE.5 Software Component Verification and Integration Verification
SWE.6 Software Verification
SYS.2 System Requirements Analysis
SYS.3 System Architectural Design
SYS.4 System Integration and Integration Verification
SYS.5 System Verification
VAL.1 Validation

Deployed ML model

Identifies:

The source code derived from the trained ML model that shall be executed in the target system. (System: a collection of interacting components organized to accomplish a specific function or set of functions within a specific environment.)

Examples and references:

It may differ from the trained ML model, which often requires powerful hardware and uses interpretative languages. (Hardware: assembled and interconnected electrical or electronic hardware components or parts which perform analog or digital functions or operations.)
The deployed ML model is usually written in programming languages like C/C++.

Used by these processes:

MLE.4 Machine Learning Model Testing

Selection of data for machine learning training and validation, or test of a machine learning model.

Identifies:

Patterns
Relationships
Features

Includes:

Annotations and labels
ML Training and Validation Data Set
ML Test Data Set

Used by these processes:

MLE.3 Machine Learning Training
MLE.4 Machine Learning Model Testing

Approach describing criteria and activities to test ML models.

Identifies:

ML test scenarios
Distribution of data characteristics, for example gender of persons or weather conditions
Related ML requirements
Pass/fail criteria
Entry and exit criteria
Environment setup and configuration

Includes:

References, for example with test data set

Used by these processes:

MLE.4 Machine Learning Model Testing

Results of ML test activities

Identifies:

Test data and logs
Test data with correct results
Test data with incorrect results
Test data not executed, and a rationale

Includes:

Information about the test execution (date, participants, model version etc.)
Abstraction or summary of ML test results
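
The items listed above can be captured in a small record structure, shown here as a sketch. The class name, field names and status values are hypothetical; the only rule taken from the text is that test data not executed need a rationale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatumResult:
    datum_id: str
    status: str  # "correct", "incorrect" or "not_executed"
    rationale: Optional[str] = None  # required when status is "not_executed"


def summarize(results, model_version):
    """Count results per status and compute the share of correct results among executed data."""
    counts = {"correct": 0, "incorrect": 0, "not_executed": 0}
    for result in results:
        if result.status not in counts:
            raise ValueError(f"Unknown status: {result.status}")
        if result.status == "not_executed" and not result.rationale:
            raise ValueError(f"Missing rationale for {result.datum_id}")
        counts[result.status] += 1
    executed = counts["correct"] + counts["incorrect"]
    accuracy = counts["correct"] / executed if executed else None
    return {"model_version": model_version, **counts, "accuracy_on_executed": accuracy}


# Hypothetical test results for five test data
results = [
    DatumResult("d1", "correct"),
    DatumResult("d2", "correct"),
    DatumResult("d3", "incorrect"),
    DatumResult("d4", "correct"),
    DatumResult("d5", "not_executed", rationale="Sensor recording corrupted"),
]

summary = summarize(results, model_version="1.0.0")
print(summary)
```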

Used by these processes:

MLE.4 Machine Learning Model Testing

MLE.4 Machine Learning Model Testing