GitHub - DataCater/datacater: The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.

Welcome to the real-time, cloud-native data pipeline platform based on Apache Kafka® and Kubernetes® that enables data and developer teams to unlock the full value of their data faster.

DataCater is a simple yet powerful approach to building modern, real-time data pipelines. According to user reports, data and dev teams save 40% of the time they spend crafting data pipelines and go from zero to production in a matter of minutes.

Users can choose from an extensive repository of filter functions, apply transformations, or code their own transforms in Python® to build their streaming data pipelines.

You can find each component in this repository. See the File Structure section for orientation.

Use Cases

DataCater excels at

Making real-time ETL pipelines accessible to data and developer teams
Supporting Python-based transforms for ETL and streaming use cases
Applying cloud-native principles to data development
Supporting a declarative pipeline definition, which enables DataOps and Continuous Delivery
Enabling the interactive development of ETL pipelines with minimal time to production

DataCater is not built for

EL or ELT pipelines with post-load transforms
Analytics use cases that make use of aggregations or multiple joins
Traditional batch processing

File Structure

├── .github            - Workflows for GitHub
├── filters            - Pre-defined filters
├── gradle             - Build configuration based on Gradle (https://spring.io/guides/gs/gradle/)
├── helm-charts        - Source code for public Helm Charts
│   ├── ct.yaml        - Chart Testing Configuration File (https://github.com/helm/chart-testing)
│   └── datacater      - The official DataCater Helm Chart
├── k8s-manifests      - Kubernetes (K8s) resources
├── licenses           - Overview of the licenses of our dependencies
├── pipeline           - Reference implementation of a pipeline
├── platform-api       - The main application for DataCater's API
├── python-runner      - Our runner for Python-based filters and transforms
├── serde              - Our (de)serializers
├── transforms         - Pre-defined transforms
├── ui                 - A ReactJS application built on top of DataCater's API
├── CONTRIBUTING.md    - Describes how you can contribute to the project
├── gradle.properties  - Build properties
├── gradlew            - Build Wrapper Script (https://docs.gradle.org/current/userguide/gradle_wrapper.html)
├── README.md          - The file you are reading
└── settings.gradle    - Build tool properties

Requirements

Make sure you have the following readily available before you proceed with installing DataCater:

To start using DataCater

For the time being, we provide the following approach to start using DataCater in your infrastructure:

Via kubectl

WARNING: Installation uses the default namespace!

The installation via kubectl uses the default namespace. If you wish to use a custom namespace, we recommend installing DataCater via the Helm chart or creating the namespace up front, as described in the FAQ section.

kubectl apply -f k8s-manifests/minikube-with-postgres-ns-default.yaml

Wait until all services are running

kubectl get all --all-namespaces

Port-forward to service datacater-ui

kubectl port-forward svc/datacater-ui 8080:80

Open localhost:8080 in your browser. The default login credentials are admin:admin.

Uninstalling DataCater

If you ever want to remove DataCater or want to start over again, e.g. during development, we recommend the following steps depending on the installation routine you've chosen:

WARNING: We recommend backing up your data before proceeding.

Via kubectl

kubectl delete -f k8s-manifests/minikube-with-postgres-ns-default.yaml

FAQ

How do I install DataCater into a dedicated namespace?

Create the namespace

kubectl create namespace datacater

Apply manifests with namespace option

kubectl apply --namespace=datacater -f <url>

How can I integrate DataCater with external data systems, like MySQL?

The open-core version of DataCater supports only Apache Kafka topics as sources and sinks for pipelines. If you need to integrate your pipelines with external data systems, please consider our Enterprise version, which offers connectors based on Kafka Connect. We can offer a trial to you.

How can I extend the list of transforms and filters?

You can introduce new transforms and filters by adding a folder to the directory transforms or filters. The new folder must contain a spec.yml and a transform.py or filter.py.

DataCater automatically loads all transforms and filters from these directories at startup time.
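As a rough illustration of what such a folder might contain, here is a minimal sketch of a custom transform and a custom filter. The folder names (`mask-email`, `non-empty`), the function names, and the `(field, record)` signatures are assumptions made for this example, not confirmed details of DataCater's API; the accompanying `spec.yml` is not shown because its schema is not described here. Please check the documentation for the exact contract.

```python
# Hypothetical contents of transforms/mask-email/transform.py.
# ASSUMPTION: the function name and the (field, record) signature are
# illustrative only and may differ from what DataCater expects.
def transform(field, record: dict):
    """Mask the local part of an email address, keeping the domain."""
    # Pass through anything that does not look like an email address.
    if not isinstance(field, str) or "@" not in field:
        return field
    local, _, domain = field.partition("@")
    # Keep the first character of the local part and mask the rest.
    return local[:1] + "***@" + domain


# Hypothetical contents of filters/non-empty/filter.py.
# ASSUMPTION: same caveat as above. In a real setup this function would
# live in its own file, so shadowing the built-in `filter` stays local.
def filter(field, record: dict) -> bool:
    """Keep only records whose field is neither None nor an empty string."""
    return field is not None and field != ""
```

Both functions are plain Python without external dependencies, so they can be unit-tested in isolation before the folder is added to the repository.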

Please see our documentation for further information.

How can I contribute code changes?

Please have a look at our guide for contributors.

How can I submit feature requests?

Please open an issue in our GitHub repository. We will have a look at it to see whether it fits our product roadmap.

Do you offer a trial for the enterprise version?

Yes, please reach out to support@datacater.io to discuss options for a PoC project.

What are the features in Open Core vs. Enterprise version?

| Feature | Open Core | Enterprise |
| --- | --- | --- |
| API | ✅ | |
| Interactive pipeline designer | ✅ | |
| Pre-defined transforms | ✅ | |
| Custom Python transforms | ✅ | |
| Pre-defined filters | ✅ | |
| Custom Python filters | ✅ | |
| Declarative pipeline definitions | ✅ | |
| User authentication | ✅ | |
| CLI (coming soon) | ✅ | |
| Collaboration and projects | | ✅ |
| Plug & play connectors | | ✅ |
| Data masking | | ✅ |
| SAML/SSO | | ✅ |
| RBAC | | ✅ |
| Audit log | | ✅ |
| Health notifications | | ✅ |

Support

We provide support and help in our Community Slack.

License

DataCater is source-available and licensed under the BSL 1.1, converting to the open-source Apache 2.0 license 4 years after the release.
