Welcome to regular and new readers alike, to the AWS open source newsletter episode #118.
This week we feature more new open source projects, such as "seed-farmer", an orchestration tool modelled after GitOps deployments, "aws-proton-plugins-for-backstage", Backstage plugins for interacting with AWS Proton, "dcv-gnome-shell-extension", a GNOME Shell extension that provides functionality required by NICE DCV, "simpleiot-arduino", an Arduino library to integrate with the SimpleIOT framework, "event-driven-weather-forecasts", an event-driven weather forecasting demo, and many more.
We also have blog posts, tutorials and videos on topics that include Backstage, OpenSearch, Qiskit, Unified ID, Grafana, PostGraphile, CloudQuery, Apache Iceberg, Apache Airflow, AWS SAM, Babelfish for PostgreSQL and many more. Finally, make sure you check out the events section for the latest open source events, as there are some great ones coming up over the next week.
At Amazon we work backwards from our customers, and one of the ways we do that is collecting data to help us know what is important and what we should focus on.
To that end, please could you complete this simple, anonymous survey. The first 25 respondents will get a $25 AWS credit code.
10th Annual Open Source Jobs Report
If you missed this last week, the Linux Foundation published the tenth iteration of its Open Source Jobs Report for organisations, hiring managers, and, of course, open source professionals. It is well worth reading, and you can grab the report via this link, The 10th Annual Open Source Jobs Report: Critical Skills, Hiring Trends, and Education. It always provides some interesting insights. One to pique your interest is that 86% of hiring managers say hiring open source talent is a priority for 2022. Read the report to find out even more interesting stats.
The articles and projects shared in this newsletter are only possible thanks to the many contributors in open source. I would like to shout out and thank those folks who really do power open source and enable us all to learn and build on top of what they have created.
So thank you to the following open source heroes: Syeda Marium Faheem, Niall Thomson, Siddharth Kothari, Kishore Dhamodaran, Mohit Mehta, Giovanni Matteo Fumarola, Jared Keating, Jack Ye, Romaric Philogène, Jordan Sullivan, Justin Leto, Sai Parthasaradhi, Rakesh Raghav, Veeranjaneyulu Grandhi, Prathap Thoguru, Yevgeny Pats, Arvind Raghu, Adam Solomon, J.D. Bean, Kanishk Prasad, Anuj Gupta, James Eastham, Álvaro Hernández and Bart Farrell.
The great thing about open source projects is that you can review the source code. If you like the look of these projects, make sure that you take a look at the code, and if it is useful to you, get in touch with the maintainer to provide feedback, suggestions or even submit a contribution.
seed-farmer is an open source orchestration tool, modelled after GitOps deployments, that works with AWS CodeSeeder (see #97). It has a Python-based command line interface (CLI). It uses manifests and deployspecs to drive modular code deployments (modules), keeping track of changes and applying them as needed or detected.
aws-proton-plugins-for-backstage This repository contains a set of Backstage plugins for interacting with AWS Proton. The plugins provide an entity card to display the status of an AWS Proton service, and a scaffolder action to create an AWS Proton service. See the blog post below for more details on getting started with this project.
dcv-gnome-shell-extension is a GNOME Shell extension to provide functionalities required by NICE DCV. NICE DCV is a high-performance remote display protocol that provides customers with a secure way to deliver remote desktops and application streaming from any cloud or data center to any device, over varying network conditions.
config-daily-report this repo contains a solution that you can use to generate a daily CSV report at a specific time. The report will include new or changed resources, with a link to the AWS Config UI. The reporter is triggered by a CloudWatch event, which invokes a Lambda function. The Lambda function uses SES to send an email.
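To give a feel for the CSV step described above, here is a minimal sketch in Python. It is not taken from the project: the column names and the shape of each change record are assumptions made for illustration, and the CloudWatch, Lambda and SES wiring is left out entirely.

```python
import csv
import io

# Hypothetical column names; the real report layout is defined in the project repo.
FIELDS = ["resource_id", "resource_type", "change_type", "console_link"]


def build_report(changes):
    """Render a list of change records (dicts) as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for change in changes:
        # Missing keys become empty cells rather than raising an error.
        writer.writerow({name: change.get(name, "") for name in FIELDS})
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample record, including a placeholder link.
    sample = [
        {
            "resource_id": "sg-123",
            "resource_type": "AWS::EC2::SecurityGroup",
            "change_type": "UPDATE",
            "console_link": "https://example.com/sg-123",
        }
    ]
    print(build_report(sample))
```

In a real deployment the resulting text would be attached to the email that the Lambda function sends, but check the repo for how the project actually does this.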
simpleiot-arduino is an Arduino library that provides an easy way to connect and send/receive data to the cloud via the SimpleIOT framework. SimpleIOT abstracts out IoT device connectivity and hides the underlying details so you can focus on your application's unique features.
event-driven-weather-forecasts this demo shows fully automated, cloud-native, event-driven weather forecasting. It uses the Unified Forecast System (UFS), a community-based, coupled, comprehensive Earth modelling system.
aws-nitro-enclaves-bidding-service is a Proof of Concept (POC) bidding service application that demonstrates the use of AWS Nitro Enclaves to perform computation on multiple sensitive datasets. It uses Nitro Enclaves with AWS Key Management Service (KMS) to create an isolated compute environment, allow the environment to process encrypted datasets from multiple parties, and return an output. The POC application is centred around the scenario of real estate bidding, where a bidding service takes in encrypted bids from two buyers and determines the highest bid on each property without disclosing the bid amounts to each buyer.
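The core comparison in that bidding scenario can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's code: the function name and data shapes are hypothetical, and the decryption of bids with KMS inside the enclave is omitted, so the bids here are plain numbers.

```python
def highest_bids(bids_by_buyer):
    """Return {property_id: winning_buyer}, never exposing the bid amounts.

    bids_by_buyer maps a buyer name to a dict of {property_id: amount}.
    On a tie, the buyer seen first keeps the property.
    """
    best = {}  # property_id -> (amount, buyer); stays private to this function
    for buyer, bids in bids_by_buyer.items():
        for property_id, amount in bids.items():
            if property_id not in best or amount > best[property_id][0]:
                best[property_id] = (amount, buyer)
    # Only the winner's name leaves the function, not the winning amount.
    return {property_id: buyer for property_id, (_, buyer) in best.items()}


if __name__ == "__main__":
    # Made-up bids from two buyers on two properties.
    bids = {
        "buyer_a": {"property_1": 100, "property_2": 300},
        "buyer_b": {"property_1": 150, "property_2": 250},
    }
    print(highest_bids(bids))
```

The point of running this kind of logic inside an enclave is that the amounts are only ever visible within the isolated environment, and only the result is returned.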
route53resolver-dns-firewall-automation-bring-your-own-lambda this project provides a serverless mechanism to update your AWS DNS Firewall lists.
amazon-redshift-streaming-workshop In this workshop, you will build a streaming analytics application using Amazon Redshift streaming ingestion. You will create a near-real time logistics dashboard using Amazon Managed Grafana to provide augmented intelligence and situational awareness for the logistics operations team. It connects to a Redshift cluster which uses Redshift streaming to analyse data from a Kinesis data stream.
Backstage is an open-source project, originally created at Spotify and now a CNCF incubating project, that provides a framework for building developer portals. AWS Proton is a managed service for platform engineers that helps you define, vend, and maintain infrastructure templates for self-service deployments. In this post, Provisioning infrastructure using the AWS Proton open-source Backstage plugin, Niall Thomson shares how AWS is contributing to the Backstage open source community with our AWS Proton plugin for Backstage. Check out the code repos above if you want to dive deeper, and follow along with the tutorial. [hands on]
In the post/tutorial, Tutorial: Build Search UI with OpenSearch and ReactiveSearch, Siddharth Kothari walks you through how to build search UIs powered by OpenSearch and ReactiveSearch.
Apache Iceberg is an open-source table format for data stored in data lakes. In the post, Build an Apache Iceberg data lake using Amazon Athena, Amazon EMR, and AWS Glue, Kishore Dhamodaran, Mohit Mehta, Giovanni Matteo Fumarola, Jared Keating, and Jack Ye come together to share how to use Amazon EMR Spark to create an Iceberg table, load sample books review data, and use Athena to query, perform schema evolution, row-level update and delete, and time travel, all coordinated through the AWS Glue Data Catalog. [hands on]
Qovery Engine is an open-source abstraction layer library that makes it easy to deploy applications on AWS (and other cloud providers) in just a few minutes. The Qovery Engine is written in Rust and takes advantage of Terraform, Helm, Kubectl, and Docker to manage resources. If you have not checked this project out, I would encourage you to do so, as it is pretty incredible what the team are doing. CEO and co-founder Romaric Philogène is a prolific blogger/vlogger, and last week he shared a super interesting post, The Top 10 AWS Architecture Built with Qovery in 2022. This week's must-read post.
Qiskit [kiss-kit] is an open-source SDK for working with quantum computers at the level of pulses, circuits, and application modules. In the post, Introducing the Qiskit provider for Amazon Braket, Jordan Sullivan shares an open sourced solution (a Qiskit provider for Amazon Braket) that allows users to take their existing algorithms written in Qiskit and run them directly on Amazon Braket. [hands on]
We have a number of posts this week covering PostgreSQL. First up we have Optimized bulk loading in Amazon RDS for PostgreSQL, where Justin Leto presents several bulk data loading strategies optimised for Amazon Relational Database Service (Amazon RDS) for PostgreSQL.
Following that, Sai Parthasaradhi, Rakesh Raghav, and Veeranjaneyulu Grandhi walk you through how to validate database schema objects migrated from Db2 LUW to Amazon RDS for PostgreSQL or Aurora PostgreSQL in their post, Validate database objects after migrating from IBM Db2 LUW to Amazon Aurora PostgreSQL or Amazon RDS for PostgreSQL. [hands on]
And finally, in the post Modernize database stored procedures to use Amazon Aurora PostgreSQL federated queries, pg_cron, and AWS Lambda, Prathap Thoguru and Kishore Dhamodaran demonstrate how to modernise your stored procedures using Aurora PostgreSQL extensions such as postgres_fdw, pg_cron, and aws_lambda. [hands on]
PostGraphile & CloudQuery
Yevgeny Pats shares how to set up CloudQuery (an open-source cloud asset inventory tool powered by SQL) to build your cloud asset inventory in PostgreSQL and build a GraphQL API query layer with PostGraphile (an open source project that enables you to provide a lightning-fast GraphQL API backed primarily by your PostgreSQL database) on top of it, in his post/tutorial How to expose CloudQuery with PostGraphile.
Unified ID 2.0
Unified ID 2.0 (UID2) is a deterministic identifier based on PII (for example, email or phone number) with user transparency and privacy controls. In the post, Introducing Unified ID 2.0 Private Operator Services on AWS Using Nitro Enclaves, Arvind Raghu, Adam Solomon, J.D. Bean, Kanishk Prasad, and Anuj Gupta collaborate to provide a brief overview of UID2 functionality and then show you how you can deploy this on AWS.
Other posts you might like from the past week
- Create cross-account, custom Amazon Managed Grafana dashboards for Amazon Redshift is a step-by-step tutorial on using the Amazon Redshift data source plugin to visualize metrics from your Amazon Redshift clusters hosted in different AWS accounts. [hands on]
- Build and simulate a Mini Pupper robot in the cloud without managing any infrastructure uses a simple Mini Pupper sample application to demonstrate how to use AWS RoboMaker for robotics application development, testing, and simulation. [hands on]
- Troubleshooting Amazon EKS API servers with Prometheus walks you through some nice dashboards that can help you when you run into operational issues with your Kubernetes clusters.
Amazon Aurora PostgreSQL-Compatible Edition now supports PostgreSQL major version 14 (14.3). PostgreSQL 14 includes performance improvements for parallel queries, heavily-concurrent workloads, partitioned tables, logical replication, and vacuuming. PostgreSQL 14 also improves functionality with new capabilities. For example, you can cancel long-running queries if a client disconnects and you can close idle sessions if they time out. Range types now support multi-ranges, allowing representation of non-contiguous data ranges, and stored procedures can now return data via OUT parameters. This release includes new features for Babelfish for Aurora PostgreSQL version 2.1.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka versions 3.1.1 and 3.2.0 for new and existing clusters. Apache Kafka 3.1.1 and Apache Kafka 3.2.0 include several bug fixes and new features that improve performance. Some of the key features include enhancements to metrics and the use of topic IDs. MSK will continue to use and manage ZooKeeper for quorum management in this release for stability.
A great summary and links to the videos from the Apache Airflow Summit from Jarek Potiuk, in his post Airflow Summit 2022 — The Best Of. Make sure you go through those, as there are some great sessions.
Babelfish for PostgreSQL
AWS Data Hero Álvaro Hernández and Bart Farrell show how you can use Postgres Babelfish (a SQL Server-compatible Postgres flavour) and TimescaleDB (a time-series extension for Postgres) to deliver open source time-series native capabilities on top of the SQL Server protocol, available to SQL Server users.
AWS SAM and .NET
James Eastham shares how to get started using AWS SAM to build serverless applications using .NET 6.
Apache Hudi: EMR on EKS
From the Apache Hudi community call, Syeda Marium Faheem shares the success story of how Bazaar Technologies leveraged Apache Hudi to build robust, cost-efficient data pipelines using EMR on EKS.
Build and Operate Containers Apps with AWS Copilot
June 25th 6pm Singapore time
In this session of the Angular PH and AWS Siklab Monthly Meetup, learn how to Build and Operate Containers Apps with AWS Copilot with speaker Donnie Prakoso, Senior Developer Advocate at AWS.
You still have time to register for this event, as it will be live cast. Register via this link.
Observability: Open Source Solutions
June 28th, 10:00am - 2:15pm PDT
The AWS Monitoring and Observability Team invites you to participate in a hands-on session with Amazon Managed Service for Prometheus, Amazon Managed Grafana and AWS Distro for OpenTelemetry. During this session you will use these services to create workspaces, ingest and query metrics, logs and trace data, and view them in a dashboard you will set up. The afternoon will close with a demo of what you have done, highlighting the value in MTTD, MTTI, MTTR and application performance.
This event is designed for site reliability engineers, operations engineers, systems engineers, and DevOps engineers looking to implement AWS Observability using open-source services to visualize their data with native or 3rd party tools. Familiarity with monitoring concepts such as logs, metrics, traces, alarms, and dashboards is recommended, but not required.
Register via this page.
Distributed ML training using PyTorch on High-Performance Computing (HPC) clusters
June 29th 7:00am
By using the PyTorch Fully Sharded Data Parallel (FSDP) library for distributed training with powerful Amazon EC2 instances and AWS ParallelCluster, you can easily implement distributed training architectures to accelerate training for large ML models. Attend this hands-on workshop to learn about best practices when deploying distributed training architectures on AWS using EC2 and the PyTorch FSDP library.
Attend this webinar to learn how to get started with PyTorch on AWS, learn how to create a distributed training architecture using AWS ParallelCluster, and learn about the PyTorch Fully Sharded Data Parallel (FSDP) library for distributed training.
Get more info and sign up on the Building Deep learning applications using PyTorch on AWS registration page.
German CDK Happy Hour
June 29th 5pm CET
One for any German-speaking open source readers, the CDK Happy Hour is an hour of sharing about the open source AWS CDK project.
Find out more and sign up by checking out the meetup registration page.
Deploy NextJS to AWS Amplify
June 30th, 7:00 pm Indonesia time
From our German community to our Indonesian community, join AWS Community Builder Rogers Dwiputra Setiady as he shares how to create a website using Next.js and deploy via CI/CD to AWS Amplify.
Find out more and sign up by clicking on the Eventbrite link.
AWS Tech Conference
30th June, 9am
Join this online conference that is being organised by the AWS User Group Ukraine. Among the talks you will find Darko who will be sharing three open source tools that will make your life working with AWS easier. Find out more about the other sessions and register, by heading over to the AWS User Group Ukraine home page. The event is free to attend, but you can also buy a number of different sponsorship tickets which will help fund the Ukrainian charity fund.
July 13-14, Madison, Wisconsin, USA
The Bioinformatics Open Source Conference (BOSC) has been held annually since 2000, and this year AWS is proud to be a platinum sponsor for this event. BOSC covers all aspects of open source bioinformatics software and open science, including (but not limited to) these topics: Open Science and Reproducible Research, Open Biomedical Data, Citizen/Participatory Science, Standards and Interoperability, Data Science Workflows, Open Approaches to Translational Bioinformatics, Developer Tools and Libraries, Inclusion, and Outreach and Training. This is a hybrid event (in person/virtual) and you can find out more by checking out the event page, BOSC 2022.
Every other Tuesday, 3pm GMT
This regular meet-up is for anyone interested in OpenSearch & Open Distro. All skill levels are welcome and they cover and welcome talks on topics including: search, logging, log analytics, and data visualisation.
Sign up to the next session, OpenSearch Community Meeting
Sept 21st, 2022 Seattle
Come to the first annual OpenSearchCon!
This day-long conference will be packed with presenters who build and innovate with OpenSearch. It doesn’t matter if you’re just getting started on your OpenSearch journey, running giant clusters, or contributing tons of code; the event is for everyone. Join us to celebrate the progress and look into the future of the project. Admission is free, and registration will be open in the next few weeks. All you will need to do is sign up, and get to Seattle!
Check out the full details, including signing up and location, at the meetup page here.