Data

← Back to All Tags

Data

GPT vs Traditional NLP Models

The field of Natural Language Processing (NLP) has seen remarkable advancements in recent years, and the emergence of the Generative Pre-trained Transformer (GPT) has revolutionized the way NLP models operate. GPT is a cutting-edge language model that employs deep learning to generate human-like text. Unlike conventional NLP models, which required extensive training on specific tasks, GPT is pre-trained on vast amounts of data and can be fine-tuned for various NLP tasks

GPT vs Traditional NLP Models

Data

Ask ChatGPT about 20 important concepts of Apache Spark

I asked ChatGPT to explain 20 important concepts of Apache Spark. Let's see what it has to say!

Ask ChatGPT about 20 important concepts of Apache Spark

Rust

Rust Data Engineering: Processing Dataframes with Polars

If you're interested in data engineering with Rust, you might want to check out Polars, a Rust DataFrame library with Pandas-like API.

Rust Data Engineering: Processing Dataframes with Polars

Data

Data Engineering Tools written in Rust

This blog post will provide an overview of the data engineering tools available in Rust, their advantages and benefits, as well as a discussion on why Rust is a great choice for data engineering.

Data Engineering Tools written in Rust

Data

Why ClickHouse Should Be the Go-To Choice for Your Next Data Platform?

Recently, I was working on building a new Logs dashboard at Fossil to serve our internal team for log retrieval, and I found ClickHouse to be a very interesting and fast engine for this purpose. In this post, I'll share my experience with using ClickHouse as the foundation of a light-weight data platform and how it compares to another popular choice, Athena. We'll also explore how ClickHouse can be integrated with other tools such as Kafka to create a robust and efficient data pipeline.

Why ClickHouse Should Be the Go-To Choice for Your Next Data Platform?

Data

Airflow Dataset (Data-aware scheduling)

Airflow since 2.4, in addition to scheduling DAGs based upon time, they can also be scheduled based upon a task updating a dataset. This will change the way you schedule DAGs.

Airflow Dataset (Data-aware scheduling)

Data

Good reasons to use ClickHouse

More than 200+ companies are using ClickHouse today. With many features support, it's equally powerful for both Analytics and Big Data service backend.

Good reasons to use ClickHouse

Data

Postgres Full Text Search

Postgres has built-in functions to handle Full Text Search queries. This is like a "search engine" within Postgres.

Postgres Full Text Search