Hello and welcome to a milestone edition of the AWS open source newsletter, #150. Over two hundred thousand words later, thousands of contributors, hundreds of new open source projects, I hope this newsletter brings as much joy for readers as it does for me to put this together. Thank you all for your amazing support so far. What do we have in store for you this week? Yup, more great new projects and content. This week we have projects such as "ZeusCloud" an open source solution for risk management, "balcony" a nice tool for CLI wizards, "emr-trino-autoscale" for those of you running Trino, "amazon-kinesis-data-analytics-blueprints" a nice selection of ready made patterns for Apache Flink, and more! Also featured are some of your favourite open source projects, including Apache Hudi, Apache Iceberg, Kubernetes, FFmpeg, Apache Airflow, Protobuf, MySQL, MariaDB, Amazon Corretto, Delta Lake, and more.
Before diving into the newsletter, take a moment to read this post from Lukonde Mwila and Chris Short for those of you running Kubernetes workloads.
ACTION: Read this if you use Kubernetes
I have shared this before, but now my colleagues Lukonde Mwila and Chris Short have put together this post, Changes to the Kubernetes Container Image Registry, to raise awareness of some critical changes to the Kubernetes image registries that are happening very soon and that you need to review and act on. The post provides some tips on what you need to do, so do yourself a favour and make sure you read this before skipping on.
Please please please take 1 minute to complete this short survey and get some exclusive content as a thank you.
The articles and projects shared in this newsletter are only possible thanks to the many contributors in open source. I would like to shout out and thank those folks who really do power open source and enable us all to learn and build on top of what they have created.
So thank you to the following open source heroes: Ivica Kolenkaš, Franck Pachot, Guillaume Marchand, Daisy Riley, Oğuzhan Yılmaz, Veena Vasudevan, Soumil Shah, Sashi Varanasi, Lakshmi Peri, Noritaka Sekiyama, Scott Long, Sean Ma, Surendar Munimohan, Chelluru V N S S Vidyadhar, José Gardiazabal and Donato Azevedo
The great thing about open source projects is that you can review the source code. If you like the look of these projects, make sure that you take a look at the code, and if it is useful to you, get in touch with the maintainer to provide feedback, suggestions or even submit a contribution.
ZeusCloud is a new (to me at least) tool to help you discover, prioritise, and remediate your risks in the cloud. What does it do? Well according to the README, it: builds an asset inventory of your AWS accounts, continuously monitors your environments for misconfigurations and attack paths, lets you customise security and compliance controls to fit your needs, provides contextual information to help you prioritise and remediate security findings, and helps you to meet compliance standards such as PCI DSS, CIS, and more!
Check the repo for demo gifs of what this looks like.
balcony is an open source Python based CLI tool from Oğuzhan Yılmaz for CLI wizards and fans, that simplifies the process of enumerating AWS resources. Use it to effortlessly automate the management of the AWS resources in your account. I installed it and was up and running in minutes, using the excellent documentation to get me going. The README has plenty of examples of how you can use this tool, as well as a super nice documentation site complete with sample recipes.
cronrange.info is not AWS specific, but I love what Ivica Kolenkaš has put together here - the essential guide to understanding cron patterns. Run your cron settings through his script, and it will helpfully output samples so you can make sure the schedule is what you expect. The docs contain plenty of examples, and it is great for testing things out before you put your crontab together.
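To give a feel for what checking a cron pattern involves, here is a minimal sketch in Python. It is not the code behind cronrange.info; the function names `expand_field` and `expand_cron` are made up for illustration, and it assumes a standard five-field crontab expression with numeric values only (no names such as MON or JAN, and Sunday as 0 rather than 7).

```python
from typing import Dict, List

# (name, lowest value, highest value) for each field of a five-field crontab line.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one cron field such as '*/15', '1-5' or '0,30' into sorted values."""
    values = set()
    for part in field.split(","):
        # An optional step follows a slash, e.g. '*/15' or '10-40/10'.
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # 'N/step' runs from N to the maximum; a bare 'N' is just N.
            end = high if step_text else start
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside the range {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand_cron(expression: str) -> Dict[str, List[int]]:
    """Expand every field of a five-field cron expression."""
    parts = expression.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected five space-separated fields")
    return {
        name: expand_field(part, low, high)
        for part, (name, low, high) in zip(parts, FIELDS)
    }


if __name__ == "__main__":
    for name, values in expand_cron("*/15 9-17 * * 1-5").items():
        print(name, values)
```

Printing the expanded values for each field is a quick way to spot a schedule that fires more (or less) often than you intended before it goes anywhere near a real crontab.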
visualize-iam-access-analyzer-policy-validation-findings provides code that helps you build a useful dashboard in Quicksight to visualise the policy validation findings from AWS Identity and Access Management (IAM) Access Analyzer. The repo shares a supporting blog post to help you get started deploying this project.
emr-trino-autoscale This project provides custom autoscaling for Amazon EMR on EC2 clusters running Trino. The utility can be installed on external EC2 instances or on the Amazon EMR master of the cluster you want to control. The package supports Instance Groups and Instance Fleets clusters with On Demand / SPOT instances out of the box. Check out the doc for features this project provides, such as scaling based on cluster CPU utilisation, metrics collection using Trino JMX REST API, scaling hints using Trino required_workers SESSION parameter, and more!
aws-glue-connector-google-spreadsheet This repository provides a tutorial on how to use AWS Glue for Python Shell to access your data in Google Spreadsheet and import it to your Amazon S3 bucket.
amazon-kinesis-data-analytics-blueprints are a curated collection of Apache Flink applications. Each blueprint will walk you through how to solve a practical problem related to stream processing using Apache Flink. These blueprints can be leveraged to create more complex applications to solve your business challenges in Apache Flink.
aws-cdk-localgov-drupal-fargate-efs-auroraserverlessv2 provides a nice way to automate the deployment of Drupal using AWS CDK. The stack will deploy the infrastructure needed as well as deploying Drupal using Amazon ECS and Fargate.
sqs-fargate-eventbridge-pipe provides a sample of how to invoke a Fargate Task from SQS using EventBridge Pipes. To make it super simple, rhlarora84 has packaged this up as a CDK app. Once set up, you can kick off your ECS Task simply by sending an SQS message.
I have featured previous Kubernetes posts from Franck Pachot, and this time he is back with another instalment of his Kubernetes on AWS series. In this post, he dives deep into a powerful but less known feature of the Linux Kernel: Pressure Stall Information (PSI) and how you can use this to understand capacity and resource patterns in your workloads. [hands on]
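If you want a quick look at PSI outside of that post, the kernel exposes it as small text files under /proc/pressure (cpu, memory and io) on Linux systems where PSI is enabled. The sketch below is my own illustration rather than anything taken from Franck's post: `parse_psi` and `read_pressure` are hypothetical helper names, and the sample values are invented.

```python
from pathlib import Path
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {'some': {...}, 'full': {...}}.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    avg10/avg60/avg300 are percentages of time stalled over the last
    10, 60 and 300 seconds; total is cumulative stall time in microseconds.
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        kind, *pairs = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def read_pressure(resource: str = "cpu") -> Dict[str, Dict[str, float]]:
    """Read /proc/pressure/<resource>; needs a Linux kernel with PSI enabled."""
    return parse_psi(Path(f"/proc/pressure/{resource}").read_text())


# Invented sample so the parser can be tried on any machine.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)

if __name__ == "__main__":
    print(parse_psi(SAMPLE)["some"]["avg10"])
```

The "some" line tracks time in which at least one task was stalled on the resource, while "full" tracks time in which all non-idle tasks were stalled at once, so a rising "full" value is usually the more worrying signal.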
I put pen to digital paper this week and wrote about a very cool open source project called Data on EKS (or DoEKS) which provides reference patterns for the installation of many common open source data tools. In this short post, Self managed Apache Airflow with Data on EKS, I share my experiences of getting this up and running (including the messy bits when it goes horribly wrong!). [hands on]
FFmpeg is an open source and widely used utility for handling video. I have used it myself many times to create time lapse videos capturing frames from my Raspberry Pi. In the post, Create a Managed FFmpeg Workflow for Your Media Jobs Using AWS Batch, Guillaume Marchand and Daisy Riley show you how to integrate FFmpeg with AWS Services to build a more easily managed FFmpeg. They have created an open source solution that deploys FFmpeg packaged in a container and managed by AWS Batch, so that you can then execute an FFmpeg command as a job through a REST API. Very nice! [hands on]
Other posts and quick reads
- Optimize costs by moving your self-managed open source software (OSS) Redis workloads to Amazon ElastiCache looks at why you should consider moving your self-managed open source software (OSS) Redis workloads to managed Amazon ElastiCache for Redis
- How to build smart applications using Protocol Buffers with AWS IoT Core shows you how working with Protobuf on AWS IoT Core is as simple as writing a SQL statement [hands on]
- Improve query performance using Optimized Reads on Amazon RDS for MySQL and Amazon RDS for MariaDB dives deep into how MySQL and MariaDB engines process complex queries and how the Optimized Reads feature can improve the performance of complex queries [hands on]
- Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor shows you how to process Apache Hudi datasets using the AWS Glue Studio visual editor
- Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation is a hands on, deep dive look at fine-grained access control policies in Amazon Athena queries for data stored in Apache Iceberg table format [hands on]
Amazon Corretto 20
You can now download Corretto 20 from the Amazon Corretto downloads page. This latest version supports the most recent OpenJDK feature release and is available on Linux, Windows, and macOS.
Highlights of OpenJDK 20 include a second preview of Record Patterns, which make it easier to work with record-based objects. You can use record patterns and type patterns together to create more powerful data navigation. Virtual threads are also in their second preview, and will make it easier to write multi-threaded applications. OpenJDK 20 also introduces a new incubation feature for scoped values, which allows you to share data between threads. OpenJDK 20 includes a second preview of the Foreign Function & Memory API, which makes it easier to integrate with native code. There are updates to the Pattern Matching for switch statements preview feature and the incubating Vector API. Structured Concurrency, which makes tasks distributed over multiple threads appear as a single unit of work, is now in its second incubation release.
Amazon Relational Database Service (RDS) Proxy is a fully managed and highly available database proxy for Aurora and RDS databases. RDS Proxy helps improve application scalability, resiliency, and security. RDS Proxy now supports PostgreSQL major version 15. PostgreSQL 15 includes new features such as the SQL standard "MERGE" command for conditional SQL queries, performance improvements for both in-memory and disk-based sorting, support for two-phase commit, and row/column filtering for logical replication. The PostgreSQL 15 release also adds support for server-side compression with Gzip, LZ4, or Zstandard (zstd) using pg_basebackup. For more details about this release, refer to the PostgreSQL community announcement. You can also enforce SCRAM (Salted Challenge Response Authentication Mechanism) password-based authentication for your proxy.
.NET on AWS
Dive deep of Apache Iceberg on AWS
In this video, Veena Vasudevan looks at some of the common challenges of using traditional file formats on premises and how leveraging Apache Iceberg on AWS helps you overcome these challenges. You will also learn about the comprehensive and advanced features of Apache Iceberg with elaborate demos that showcase the unique capabilities of Apache Iceberg on AWS.
I discovered a new series of short videos from Soumil Shah that looks at how you can leverage Apache Hudi on AWS to build an open source data lake. There are lots of topics covered, and this is perfect for the technical viewer.
Check out the full playlist here. I will leave you with the opening session.
Build on Open Source
For those unfamiliar with this show, Build on Open Source is where we go over this newsletter and then invite special guests to dive deep into their open source project. Expect plenty of code, demos and hopefully laughs. We have put together a playlist so that you can easily access all (eight) of the episodes of the Build on Open Source show. Build on Open Source playlist
If you are planning any events in 2023, either virtual, in person, or hybrid, get in touch as I would love to share details of your event with readers.
Build on Open Source
March 31st, twitch.tv/aws
The fourth episode of Build on Open Source features special guests Devansh Bawari and Saurav Pathak from Bagisto, a leading open source e-commerce technology. See you there on twitch.tv/aws, Friday 31st March at 9am GMT, 10am CET.
April 13th-15th, Singapore
FOSSASIA Summit 2023 returns as an in-person and online event, taking place from Thursday 13th April to Saturday 15th April at the Lifelong Learning Institute in Singapore.
If you are interested in attending in person, or virtually, find out more about the event at the FOSSASIA Summit 2023 page.
AWS Community Nordics
April 20th, Helsinki
The AWS Community Day Nordics is a free full day event for AWS users to come together to network, learn from each other and get inspired. The event is organised by the community - for the community. The CFP is currently open, so if you are in the area and want to talk then here is your chance. Check out the full event details and save your space here, AWS Community Nordics registration page
Reducing the costs of your openCypher applications
May 8th, 4pm UK - online
openCypher is an open-source project for creating graph applications. Neptune supports the openCypher graph query language, and in this webinar you will learn more about the cost benefits of moving openCypher workloads to Neptune serverless. With Neptune serverless, customers can see up to 90% cost savings compared to provisioning for peak capacity. A demo of Neptune in action will be included in this session.
Head over to the YouTube holding page, Reducing the costs of your openCypher applications
Every other Thursday, next one 16th February
The Cortex community call happens every two weeks on Thursday, alternating at 1200 UTC and 1700 UTC. You can check out the GitHub project for more details by going to the Community Meetings section. The community calls keep a rolling doc of previous meetings, so you can catch up on the previous discussions. Check the Cortex Community Meetings Notes for more info.
Every other Tuesday, 3pm GMT
This regular meet-up is for anyone interested in OpenSearch & Open Distro. All skill levels are welcome and they cover and welcome talks on topics including: search, logging, log analytics, and data visualisation.
Sign up to the next session, OpenSearch Community Meeting