09.12.2024 15:00

ASAM ODS enabling AI/ML-solutions

ASAM International Conference, December 4-5, 2024, Munich, Germany

In most situations, standardization is seen as a blocker to innovation and to integration with the latest technology trends. In this article, however, we explain that the ASAM ODS standard, designed for long-term and future-proof data management, already provides the building blocks needed to prepare and store test data in a form suitable for machine learning. We also show how the data can be accessed with standard analytics and machine learning tools, and how AI agents can support this work.


Process of Data Analysis

To identify the needed and existing building blocks, we first look at the overall process of data analysis. Starting from the problem definition, the process defines the steps needed to gain data-driven insights and improvements:

  1. Collecting data from different sources
  2. Formatting and storing the data
  3. Providing access to the data
  4. Using AI Tools
  5. Generate AI Models
  6. Predict Results, Improve Workflows, …




The individual steps can be split into two sections: Data Management Requirements and Machine Learning (ML) Applications.


Let’s start with the data management requirements.


Data Management Requirements for Machine Learning

The three major requirements cover the following areas of data management:

  • Data Ingestion – Collecting data from different sources
  • Data Storage and Data Format
  • Data Access


The art of data management is to integrate these areas with the existing IT landscape, store the data efficiently, maintain the data context and finally provide the data in a way that lets existing and future tools connect to it. Let’s look at these three requirements individually.


Data Ingestion

Data ingestion is the process of connecting the data silos of your organization to your data management solution. With ASAM ODS as one possible solution, there are various ways to ingest or import data into an ASAM ODS server. To keep existing toolchains intact, an import mechanism is needed that leaves the data (files) in their original format and location and provides the metadata needed to put the data into the right context in the ASAM ODS server.


The ASAM ODS External Data API, or ASAM ODS ExD Plugins for short, fulfills these requirements. Implemented as gRPC services with a simple API for retrieving bulk data and metadata, the plugins are an ideal fit. In short, with ASAM ODS ExD Plugins you can:

  • Access “any” kind of data
  • Avoid data conversion and data duplication
  • Keep existing toolchains alive
  • Use “any” programming language
  • Keep your R&D investment small

You can find examples on how to develop and test ASAM ODS ExD Plugins here
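
To make the idea concrete, the following is a minimal sketch of what such a plugin does conceptually: it leaves a file where it is and answers two kinds of requests, one for metadata and one for bulk data. This is plain Python and not the gRPC interface defined by the standard; the class name, the method names and the CSV demo file are hypothetical and only serve the illustration.

```python
import csv
import tempfile
from pathlib import Path


class CsvFileReader:
    """Hypothetical stand-in for an ExD-style plugin: reads a file in place."""

    def __init__(self, path):
        self.path = Path(path)  # original file, never copied or converted

    def get_structure(self):
        """Return metadata only: file name, channel names and row count."""
        with self.path.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader)
            row_count = sum(1 for _ in reader)
        return {"file": self.path.name, "channels": header, "rows": row_count}

    def get_values(self, channel, start=0, limit=None):
        """Return bulk data of one channel, read on demand from the file."""
        with self.path.open(newline="") as handle:
            values = [float(row[channel]) for row in csv.DictReader(handle)]
        stop = None if limit is None else start + limit
        return values[start:stop]


# Small demo file standing in for an existing measurement file
demo_dir = tempfile.mkdtemp()
demo_file = Path(demo_dir) / "run_001.csv"
demo_file.write_text("time,speed\n0.0,10.0\n0.1,12.5\n0.2,15.0\n")

reader = CsvFileReader(demo_file)
structure = reader.get_structure()                    # metadata for the server
speed = reader.get_values("speed", start=1, limit=2)  # bulk data on request
```

Separating the metadata request from the bulk data request is what allows a server to index the file once and fetch values only when a client actually asks for them.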


Data Storage and Data Format

Finding the right way to store and format data can take a lot of time and resources when starting from scratch. Instead of inventing your own approach, we suggest looking at the ASAM ODS standard first. ASAM ODS defines the physical data storage and the corresponding data access in a generic way, and lets you describe your specific data needs with the help of a so-called base data model.

The base model introduces additional semantics that define the meaning and interpretation of the data. Furthermore, by adding a data context and catalogs for recurring content, you have already defined a data ontology.
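
As a small illustration of how a base model carries meaning, the sketch below maps project-specific element names to ASAM ODS base elements. The base element names (AoTest, AoMeasurement, AoMeasurementQuantity, AoUnit) come from the ODS base model; the application element names and the helper function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationElement:
    name: str          # project-specific name chosen by the organization
    base_element: str  # ASAM ODS base element that fixes the meaning


# Hypothetical application model of a test bench
APPLICATION_MODEL = [
    ApplicationElement("EngineTestCampaign", "AoTest"),
    ApplicationElement("TestRun", "AoMeasurement"),
    ApplicationElement("Channel", "AoMeasurementQuantity"),
    ApplicationElement("Unit", "AoUnit"),
]


def elements_of(base_element):
    """Find application elements by meaning rather than by local name."""
    return [e.name for e in APPLICATION_MODEL if e.base_element == base_element]


measurement_elements = elements_of("AoMeasurement")
```

Because a generic tool asks for "whatever derives from AoMeasurement" instead of a local name such as "TestRun", it can work across differently named models.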


Data Access

Data Access, or Data Connectivity, is needed to provide the data stored in the ASAM ODS server to the individual clients. Among other options, ASAM ODS already offers data access via an HTTP-API. However, the existing API is very specific to ASAM tools and doesn’t integrate seamlessly with Python analysis and ML libraries and tools.


Let’s see in the next chapter how this issue is being addressed…


Connect Machine Learning Tools

To work out the requirements of ML tools and applications, let’s look at the job profile and skills of a typical data scientist.

The list of tools can be summarized as:

  • Python or R and SQL
  • Machine Learning Algorithms
  • Tableau and Power BI
  • Spark and Hadoop


Python and Machine Learning Algorithms

Python is the “lingua franca” of data scientists, and most machine learning libraries are available in Python. To make use of those libraries, the data needs to be provided as DataFrames. This is where the open-source ODSBox fits in. As a small wrapper on top of the ASAM ODS API, it converts typical ODS data to a DataFrame. Once ODSBox is installed via pip, you’re all set to use the most powerful Python machine learning libraries.
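
A minimal sketch of this workflow could look as follows. It assumes ODSBox has been installed with `pip install odsbox` and that an ASAM ODS server is reachable. The URL and credentials are placeholders, both function names are hypothetical, and the query keys reflect our understanding of the dictionary-based query syntax used by ODSBox, so they should be checked against the ODSBox documentation.

```python
def build_measurement_query(name_pattern, row_limit=100):
    """Build a dictionary-based query for measurements matching a name pattern."""
    return {
        "AoMeasurement": {"name": {"$like": name_pattern}},
        "$attributes": {"id": 1, "name": 1},
        "$options": {"$rowlimit": row_limit},
    }


def fetch_dataframe(url, user, password, query):
    """Run the query against an ODS server and return the result as a DataFrame.

    The import is kept local so that the query builder above works even
    when ODSBox is not installed.
    """
    from odsbox.con_i import ConI  # assumption: ODSBox connection class

    with ConI(url=url, auth=(user, password)) as con_i:
        return con_i.query_data(query)


query = build_measurement_query("Profile_*", row_limit=10)
# Placeholder server and credentials; uncomment with a real server:
# df = fetch_dataframe("http://localhost:8087/api", "sa", "sa", query)
```

The returned DataFrame can then be passed to the usual Python analysis and machine learning libraries.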


Power BI

The solution for importing ASAM ODS data into Microsoft Power BI follows the same concept. Using the Python data import that Power BI already provides, the DataFrames delivered by ODSBox can be loaded directly into Microsoft Power BI.


Spark and Hadoop

To make your ASAM ODS data available in the Apache Hadoop ecosystem, a different approach is needed. Analysis tools used in the Hadoop ecosystem work best with data formatted as Apache Avro or Apache Parquet. The solution in this case comes directly from the ASAM ODS component stack.


The ASAM Big ODS associated standard defines how data stored in an ASAM ODS server can be exported as Apache Avro or Apache Parquet files with the help of an XML definition file. This export can be automated using Spark. Beyond exporting the data to the desired Hadoop data formats, Spark SQL and DataFrames are additional building blocks for big data analytics.
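
Once such an export exists, querying it with Spark SQL could look like the following minimal sketch. It assumes PySpark is installed and that the path points to Parquet files produced by the export. The path, the view name and both function names are placeholders, and no column names are assumed because the export layout depends on the XML definition file.

```python
def build_count_query(view_name):
    """Return a Spark SQL statement that counts the rows of a view."""
    return f"SELECT COUNT(*) AS row_count FROM {view_name}"


def count_exported_rows(parquet_path, view_name="ods_export"):
    """Load exported Parquet files and count their rows with Spark SQL."""
    # Local import: PySpark is only needed when the function is called.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ods-parquet-demo").getOrCreate()
    df = spark.read.parquet(parquet_path)  # DataFrame over the exported files
    df.createOrReplaceTempView(view_name)  # make it addressable from SQL
    return spark.sql(build_count_query(view_name)).first()["row_count"]


count_query = build_count_query("ods_export")
# rows = count_exported_rows("/data/ods_export")  # placeholder path
```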


Get Support by AI Agents

Now that we’ve shown that ASAM ODS data can be made available for machine learning, we can go one step further and use AI agents such as Google Gemini or Microsoft Copilot to support us in our data analytics tasks. Jupyter Notebooks, like the ones available in the Data Management Learning Path, can be loaded into Google Colab, GitHub Codespaces or Microsoft Fabric. There you can use the AI agents of those platforms to help you create more sophisticated plots or generate data queries for your analysis tasks.

Conclusion and Executive Summary

In conclusion, the data management technology components of ASAM ODS support the general requirements for machine learning. In combination with open-source software such as ASAM ODSBox, data scientists can use typical Python machine learning tools and analysis libraries. Furthermore, ASAM ODSBox bridges the gap to Microsoft Copilot, Google Gemini and other AI agents, so that solutions can be created faster and more efficiently, even by non-data scientists.


This blog post summarizes the presentation at the ASAM International Conference on December 4-5, 2024 in Munich, Germany.
You can download the original presentation here
