Cirrus Minor - The nexus of technology, business & people

Star Schema vs. Flat Wide Tables

When designing data warehouses for large-scale data analysis, two main structures emerge: star schema and flat wide tables. While both can store data, they serve different purposes and have unique advantages and considerations. Let's get into it!
Star Schema: Structure: A star schema resembles a star with a central fact table containing transactional data…
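
To make the contrast concrete, here is a minimal sketch in plain Python. It is not taken from the linked post: the table names, columns, and values (dim_product, dim_store, fact_sales) are hypothetical, and dictionaries stand in for warehouse tables. It shows a fact table that holds only keys and measures, and the flat wide table you get by joining the dimension attributes into every row.

```python
# Hypothetical star schema: one central fact table plus two dimension tables.
# All names and values below are illustrative assumptions.
dim_product = {
    1: {"product_name": "Widget", "category": "Hardware"},
    2: {"product_name": "Gadget", "category": "Software"},
}
dim_store = {
    10: {"store_city": "Austin"},
    20: {"store_city": "Boston"},
}
# The fact table stores foreign keys and numeric measures only.
fact_sales = [
    {"product_id": 1, "store_id": 10, "amount": 25.0},
    {"product_id": 2, "store_id": 10, "amount": 40.0},
    {"product_id": 1, "store_id": 20, "amount": 15.0},
]


def flatten(facts, products, stores):
    """Denormalize the star schema into one flat wide table (a list of rows)."""
    return [
        {**row, **products[row["product_id"]], **stores[row["store_id"]]}
        for row in facts
    ]


def total_by(rows, column):
    """Sum the 'amount' measure grouped by the given column."""
    totals = {}
    for row in rows:
        totals[row[column]] = totals.get(row[column], 0.0) + row["amount"]
    return totals


# In the flat table every row repeats the product and store attributes,
# so grouping needs no join at query time.
flat_sales = flatten(fact_sales, dim_product, dim_store)
sales_by_category = total_by(flat_sales, "category")
```

The trade-off this sketch hints at: the star schema stores each product and store attribute once, while the flat table duplicates them per row in exchange for join-free reads.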
How to scale databases

submitted by /u/milanm08
Salesforce Unveils Zero Copy Partner Network, Offering New Open Data Lake Access via Apache Iceberg - Datanami

SAN FRANCISCO, April 25, 2024 — Salesforce today announced the Salesforce Zero Copy Partner Network, a global ecosystem of technology and solution providers building secure, bidirectional zero copy integrations with Salesforce Data Cloud so that data can be actioned across the Salesforce Einstein 1 Platform. Salesforce also unveiled innovations including new zero copy support for open data lakes and lakehouses utilizing the...
MemoryDB: Speed, Durability, and Composition.

Blocks are fun. Earlier this week, my colleagues Yacine Taleb, Kevin McGehee, Nan Yan, Shawn Wang, Stefan Mueller, and Allen Samuels published Amazon MemoryDB: A fast and durable memory-first cloud database. I’m excited about this paper, both because it’s a very cool system, and because it gives us an opportunity to talk about the power of composition...
Radius is Now a Cloud Native Computing Foundation (CNCF) Sandbox Project

The Microsoft Azure Incubations Team recently announced the approval of Radius as a Cloud Native Computing Foundation (CNCF) sandbox project. Radius is a cloud-native, cloud-agnostic application platform that the CNCF has recognized as having the potential to contribute to the cloud-native ecosystem. Radius empowers developers and platform engineers to collaborate easily in delivering and managing cloud-native applications that align with corporate...
Improving Vald Search Performance through Parameter Tuning

In our initial validation for Hybrid Search, we ran a standalone search quality test using Vald and discovered surprisingly low search quality. With low search quality, it is not easy to pinpoint how much Hybrid Search impacts the overall search results. As a team, we decided first to optimize Vald’s parameters and improve its search quality to gain a clearer...

Software architecture workshop (slides)

Published by Arnon Rotem-Gal-Oz on November 29, 2023

The title says it all – these are slides from a session I was working on to explain the basics of software architecture based on…

Continue reading

pandas on spark apply_batch/transform_batch broken? (tl;dr; No – but it isn’t well documented)

Published by Arnon Rotem-Gal-Oz on October 16, 2022

Using PySpark’s pandas integration via apply_batch and transform_batch is very powerful, but the lack of documentation can cause hard-to-trace bugs – hopefully my experience (below)…

Continue reading

Replacing Docker Desktop with hyperkit + minikube

Published by Arnon Rotem-Gal-Oz on September 2, 2021

Edit June 2023: Added a section on Colima. macOS is a Unix but it isn’t a Linux, so, unfortunately, if/when we need to use linux-y…

Continue reading

Intro to Apache Spark (slides)

Published by Arnon Rotem-Gal-Oz on December 16, 2020

I gave a general overview of Apache Spark to our R&D teams. You can find the slides below

Where is Apache Spark heading?

Published by Arnon Rotem-Gal-Oz on December 4, 2020

I watched (the COVID-19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks presented the following two images…

Continue reading