Cirrus Minor - The nexus of technology, business & people

MemoryDB: Speed, Durability, and Composition.

Blocks are fun. Earlier this week, my colleagues Yacine Taleb, Kevin McGehee, Nan Yan, Shawn Wang, Stefan Mueller, and Allen Samuels published Amazon MemoryDB: A fast and durable memory-first cloud database. I’m excited about this paper, both because it’s a very cool system, and because it gives us an opportunity to talk about the power of composition...
Radius is Now a Cloud Native Computing Foundation (CNCF) Sandbox Project

The Microsoft Azure Incubations Team recently announced the approval of Radius as a Cloud Native Computing Foundation (CNCF) sandbox project. Radius is a cloud-native, cloud-agnostic application platform that the CNCF has recognized as having the potential to contribute to the cloud-native ecosystem. Radius empowers developers and platform engineers to collaborate easily in delivering and managing cloud-native applications that align with corporate...
Improving Vald Search Performance through Parameter Tuning

In our initial validation for Hybrid Search, we ran a standalone search quality test using Vald and discovered surprisingly low search quality. With low search quality, it is not easy to pinpoint how much Hybrid Search impacts the overall search results. As a team, we decided first to optimize Vald’s parameters and improve its search quality to gain a clearer...
Insider Threat Protection: How DDR Can Help

In 2023, Tesla suffered a massive data breach that affected 75,000 employees whose data, including names, phone numbers, and Social Security numbers, was leaked. According to the media outlet to which the data was leaked, even billionaire CEO Elon Musk’s Social Security number was included in the more than 100 gigabytes of leaked data. Investigations identified two former employees as responsible for...
Event time skew in stream processing

As a data engineer, you're certainly familiar with data skew. Yes, that bad phenomenon where one task takes considerably more input than the others, often causing unexpected latency or failures. It turns out stream processing also has its own skew, though it is related more to time.
Gap Analysis in Software Testing

Software architecture workshop (slides)

Published by Arnon Rotem-Gal-Oz on November 29, 2023

The title says it all – These are slides from a session I was working on to explain the basics of software architecture based on…

pandas on spark apply_batch/transform_batch broken? (tl;dr; No – but it isn’t well documented)

Published by Arnon Rotem-Gal-Oz on October 16, 2022

Using PySpark’s pandas integration via apply_batch and transform_batch is very powerful, but the lack of documentation can cause hard-to-trace bugs – hopefully my experience (below)…

Replacing Docker Desktop with hyperkit + minikube

Published by Arnon Rotem-Gal-Oz on September 2, 2021

Edit June 2023: Added a section on Colima. macOS is a Unix but it isn’t a Linux, so, unfortunately, if/when we need to use linux-y…

Intro to Apache Spark (slides)

Published by Arnon Rotem-Gal-Oz on December 16, 2020

I gave a general overview of Apache Spark to our R&D teams. You can find the slides below.

Where is Apache Spark heading?

Published by Arnon Rotem-Gal-Oz on December 4, 2020

I watched (the COVID-19-era version of “attended”) the latest Spark Summit, and in one of the keynotes, Reynold Xin from Databricks presented the following two images…
