Case Study

Automating Dachser document processing flows using serverless AI services

About Dachser

In today’s fast-paced logistics industry, efficient document processing plays a vital role in ensuring smooth operations and minimizing bottlenecks. Dachser, a leading global logistics provider, provides integrated transportation services in Europe and on an intercontinental level. Comprehensive contract logistics services and industry-specific solutions round out the company’s range. A seamless shipping network—both in Europe and overseas—and fully integrated IT systems ensure intelligent logistics solutions worldwide.

The Challenge

One of the major challenges Dachser faces is accurately classifying multipage documents written in various languages. Because the company provides logistics solutions across borders, it has to process numerous document types that share no unified structure, such as invoices, bills of lading, and certificates of origin. These documents arrive in unstructured formats such as PDF, so the automated solution must classify them by content and context, regardless of the number of pages or the language used.

Furthermore, Dachser frequently receives collections of documents bundled together in a single PDF file. The challenge lies in extracting and separating these individual documents for further processing. The proposed solution must be capable of intelligently identifying and isolating different documents within a single PDF file, considering variations in document structure, formatting, and language.

Since this task required expertise in natural language processing, data engineering, and DevOps, Dachser sought support from PCG to ensure a state-of-the-art implementation inside the AWS ecosystem.

The Solution

To address these challenges, Dachser required a scalable solution that leverages optical character recognition (OCR) and natural language processing (NLP). The solution had to process a large volume of unstructured documents efficiently, classify them accurately based on their content and language, and intelligently separate collections of documents within a single PDF file.

The initial step was to extract text from a set of documents provided by Dachser using Amazon Textract. To overcome the multi-language challenge, all extracted text was translated into English with Amazon Translate. This ensured a standardized language for the subsequent analysis and classification tasks.
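The extraction and translation step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a Textract response dictionary whose `Blocks` carry `BlockType`, `Text`, and (for multipage results) `Page`, and a boto3-style Translate client passed in by the caller. The helper names `lines_by_page` and `translate_pages` are hypothetical.

```python
from collections import defaultdict


def lines_by_page(textract_response):
    """Group the text of LINE blocks by page number."""
    pages = defaultdict(list)
    for block in textract_response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # "Page" appears in multipage results; assume page 1 otherwise.
            pages[block.get("Page", 1)].append(block["Text"])
    return {page: "\n".join(lines) for page, lines in sorted(pages.items())}


def translate_pages(pages, translate_client, target="en"):
    """Translate each page's text, letting the service detect the source language."""
    translated = {}
    for page, text in pages.items():
        if not text.strip():
            # Skip the service call for empty pages.
            translated[page] = ""
            continue
        result = translate_client.translate_text(
            Text=text,
            SourceLanguageCode="auto",  # assumption: source language is unknown
            TargetLanguageCode=target,
        )
        translated[page] = result["TranslatedText"]
    return translated
```

With boto3 installed, `translate_client` would typically be `boto3.client("translate")`. Translating page by page keeps each request small; very long pages may still need to be split further to stay within the service's request size limit.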

Using the translated text, a training set was created to facilitate the training of a custom classifier model. The training set consisted of labeled examples, where each document was associated with a specific document type.
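A training set of this kind could be assembled as in the sketch below. It assumes the two-column CSV layout (label first, document text second, no header row) that Amazon Comprehend accepts for multi-class custom classification. The function name and the example labels are hypothetical, not taken from the project.

```python
import csv
import io


def build_training_csv(examples):
    """Return CSV text with one 'label,document' row per labeled example."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, text in examples:
        # Collapse whitespace so each document stays on a single line.
        flat = " ".join(text.split())
        if flat:  # skip documents with no extractable text
            writer.writerow([label, flat])
    return buffer.getvalue()


# Hypothetical labeled examples (translated text paired with a document type).
examples = [
    ("INVOICE", "Invoice no. 42\nTotal: 100 EUR"),
    ("BILL_OF_LADING", "Shipper, consignee and port of loading"),
]
training_csv = build_training_csv(examples)
```

The resulting text would then be uploaded to storage that the training job can read, for example an S3 bucket.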

This training set was then submitted to Amazon Comprehend, which trained a custom classification model on the labeled data. The trained model was rigorously evaluated, with metrics such as accuracy and F1 score computed to gauge its ability to classify documents correctly. In the final step, an inference pipeline was implemented that classifies an input document using the trained model’s predictions.
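For reference, the two metrics named above can be computed from true and predicted labels as in the sketch below. The case study does not say how F1 was averaged across document types; macro averaging (the unweighted mean of per-class F1) is an assumption here, and the labels in the usage example are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted type matches the true type."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up held-out labels for illustration only.
y_true = ["invoice", "invoice", "bill_of_lading", "certificate"]
y_pred = ["invoice", "bill_of_lading", "bill_of_lading", "certificate"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Accuracy alone can hide poor performance on rare document types, which is one reason to report a per-class measure such as F1 alongside it.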

The diagram below shows the complete architecture, which was designed with scalability and industry standards in mind.

[Figure: architecture diagram of the complete solution]

Results and Benefits

The designed solution could classify up to 1000 documents concurrently with an accuracy of 95%. To reduce complexity, development time, and operational overhead, the entire solution was built from serverless services provided by AWS. This approach, together with the documentation, resulted in an easily maintainable product and left room for further improvement even without a dedicated team of machine learning experts.

Summary

Dachser, a global logistics provider, partnered with PCG and tested a scalable solution within the AWS ecosystem to streamline its document management processes. The solution used OCR and NLP technologies to accurately classify multipage documents in various languages and extract individual documents from bundled PDF files. By leveraging Amazon Textract for text extraction and Amazon Comprehend for training a custom classification model, Dachser, with PCG’s support, was able to build a state-of-the-art intelligent document processing solution in just a few weeks.

About PCG

Public Cloud Group (PCG) supports companies in their digital transformation through the use of public cloud solutions.

PCG’s product portfolio is designed to accompany organisations of all sizes on their cloud journey, and its highly qualified staff are people that clients and partners like to work with. This positions PCG as a reliable and trustworthy partner for the hyperscalers, with repeatedly validated competence and credibility.

We have the highest partnership status with the three relevant hyperscalers: Amazon Web Services (AWS), Google, and Microsoft. As experienced providers, we advise our customers independently on cloud implementation, application development, and managed services.

