<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Serverless Folks Blogs]]></title><description><![CDATA[Serverless Folks Blogs]]></description><link>https://blogs.serverlessfolks.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1740051290900/1b798210-95f8-45f1-8301-518a6a79b50a.png</url><title>Serverless Folks Blogs</title><link>https://blogs.serverlessfolks.com</link></image><generator>RSS for Node</generator><lastBuildDate>Mon, 13 Apr 2026 05:41:32 GMT</lastBuildDate><atom:link href="https://blogs.serverlessfolks.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Amazon Bedrock Agentcore &  System Design]]></title><description><![CDATA[The recent general availability of Amazon Bedrock Agentcore marks a significant milestone in the evolution of AI-powered applications on AWS. 
While Bedrock has already established itself as a leading platform for building and scaling generative AI so...]]></description><link>https://blogs.serverlessfolks.com/amazon-bedrock-agentcore-and-system-design</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/amazon-bedrock-agentcore-and-system-design</guid><category><![CDATA[distributed systems]]></category><category><![CDATA[bedrock agentcore]]></category><category><![CDATA[architecture]]></category><category><![CDATA[System Design]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Sat, 22 Nov 2025 10:18:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763806450217/9c4f3ca4-9881-479d-94ac-1ffce5d9a274.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The recent <strong>general availability of Amazon Bedrock Agentcore</strong> marks a significant milestone in the evolution of AI-powered applications on AWS. While Bedrock has already established itself as a leading platform for building and scaling generative AI solutions, Agentcore pushes the ecosystem further by offering a more unified runtime for orchestrating agents, managing memory, integrating tools, and enabling complex workflows. This GA release doesn’t just expand AWS’s AI portfolio—it accelerates how quickly teams can design, deploy, and iterate on intelligent, autonomous systems.</p>
<p>Over the past months, a wave of technical content has emerged—deep dives into the Agentcore runtime, gateway integrations, memory persistence models, tool management, and more. These resources are incredibly valuable for understanding how to <em>use</em> the platform. But as the adoption curve steepens, it’s equally important to step back and look at the bigger picture: we are heading toward a world where <strong>distributed, specialized AI systems</strong> become the norm.</p>
<p>Agentcore makes it easier to build these components, but it also increases the risk of teams assembling complex systems without a solid architectural foundation. Without thoughtful <strong>system design, standardization, and distributed-systems thinking</strong>, organizations may unintentionally create fragile, siloed, or overly coupled AI ecosystems. The challenge—and opportunity—now is to balance speed of development with robust design principles that ensure these emerging agent-based architectures remain scalable, observable, secure, and maintainable.</p>
<p>This article explores that broader context: what Agentcore’s GA really means for distributed system design, why standardization matters now more than ever, and how teams can avoid the architectural pitfalls that come with rapid innovation.</p>
<h1 id="heading-distributed-design">Distributed Design</h1>
<p>Modern enterprise products are, at their core, <strong>distributed systems</strong>—whether teams consciously design them that way or simply evolve into them over time. As organizations scale, their applications naturally break into interconnected services, workflows, and data domains that must communicate reliably. In this environment, <strong>boundaries matter</strong>: clear ownership, well-defined responsibilities, and stable interfaces determine whether a system grows sustainably or accumulates silent complexities that surface only during failures.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763773970229/6b9397f0-618e-43aa-9937-28550b580a39.png" alt class="image--center mx-auto" /></p>
<p>Distributed systems introduce challenges that are fundamentally different from writing business logic. Engineers may thoroughly understand the domain, the models, and the rules they implement—yet still find themselves frustrated when the real difficulty emerges not from the logic itself, but from the interactions between components. Latency, partial failures, race conditions, version drift, inconsistent state, and unclear integration contracts can turn even well-designed services into unpredictable systems. This is often what makes distributed systems feel maddening: you can master the business logic and still get blindsided by emergent behavior rooted in architectural decisions.</p>
<p>These pitfalls are not signs of poor engineering but symptoms of systems where boundaries are blurred, responsibilities overlap, and integrations evolve informally rather than intentionally. Without deliberate system-design thinking—defining boundaries, standardizing communication patterns, enforcing contracts, and anticipating failure modes—distributed architectures tend to drift. Over time, they accumulate complexity that becomes harder to reason about, harder to extend, and harder to trust.</p>
<p>In today’s enterprise landscape, understanding distributed-systems principles is no longer optional. It is the foundation that ensures the entire product ecosystem remains resilient, observable, and adaptable as it grows.</p>
<h1 id="heading-core-concepts-of-distributed-systems"><strong>Core Concepts of Distributed Systems</strong></h1>
<p>Distributed systems succeed or fail not because of any single component, but because of how those components interact, evolve, and respond to real-world conditions. Mastering a few foundational concepts helps teams design systems that stay reliable as they grow, adapt to change, and avoid the hidden complexity that often emerges at scale. The following principles offer a baseline for building resilient, predictable, and maintainable distributed architectures.</p>
<p><strong>Communication:</strong> Communication is the unavoidable glue of distributed systems, and every interaction carries risk: latency, timeouts, and partial failures. Choosing clear patterns—sync, async, events—helps manage uncertainty and keep components predictable.</p>
<p><strong>Boundaries:</strong> Boundaries create separation of concerns, ensuring each service has a clear purpose. Well-defined boundaries reduce coupling and prevent hidden dependencies that make systems fragile and hard to evolve.</p>
<p><strong>Specialization and Ownership:</strong> As systems grow, components must specialize to solve focused problems. Strong ownership ensures each one is maintained, monitored, and improved consistently, preventing drift and conflicting assumptions.</p>
<p><strong>Contracts:</strong> Contracts define how components interact—what they expect, provide, and guarantee. Strong, explicit contracts reduce integration surprises and make changes safer in evolving architectures.</p>
<p><strong>Scalability:</strong> Scalability ensures systems can handle growth smoothly. Components should scale independently and avoid shared bottlenecks to prevent performance drops during spikes or expansion.</p>
<h1 id="heading-unintelligent-distributed-system">Unintelligent Distributed System</h1>
<p>Traditional distributed systems are powerful but inherently <strong>unintelligent</strong>: every service is specialized, yet entirely dependent on human-designed logic, rules, and constraints. Their behavior reflects the limitations of the engineers who built them—their time, cognitive bandwidth, domain knowledge, and ability to anticipate edge cases. Even well-architected systems can only operate within the boundaries explicitly encoded into them. They cannot adapt, reason, or make decisions beyond their predefined paths. This creates a landscape where services are efficient at what they were built for but rigid to change, reliant on manual intervention, and limited by the expertise available at design time.</p>
<h1 id="heading-doctor-vs-pharmacist">Doctor vs Pharmacist</h1>
<p>A helpful way to understand knowledge limitations in decision-making is to compare a doctor and a pharmacist. A pharmacist has deep, specialized expertise in medications—their composition, interactions, and safe usage. But when asked to diagnose a condition, their view is constrained; they lack the broader medical context required to interpret symptoms, evaluate risks, or consider alternative explanations. A doctor, on the other hand, combines a wide range of medical knowledge with diagnostic reasoning, allowing them to form a comprehensive picture before making a treatment decision. Both professionals are highly skilled, but the scope of their knowledge determines the quality and safety of their decisions. In complex systems, just like in healthcare, <strong>narrow expertise without broader understanding increases risk</strong>, while wider knowledge enables deeper analysis and more reliable outcomes.</p>
<h1 id="heading-amazon-bedrock-agentcore">Amazon Bedrock Agentcore</h1>
<p>Amazon Bedrock Agentcore introduces a standardized foundation for building intelligent, agent-driven systems within an enterprise ecosystem. The <strong>Agentcore runtime</strong> provides a unified execution environment where agents can reason, call tools, manage memory, and orchestrate multi-step workflows without developers having to reimplement these patterns for every use case. It abstracts away the complexity of handling context, chaining operations, and maintaining state, enabling teams to focus on the logic that defines their domain rather than the mechanics of agent interaction.</p>
<p>Complementing the runtime, the <strong>Agentcore gateway</strong> streamlines communication between agents, services, and external systems. It acts as a controlled interface layer—managing routing, authentication, rate limits, and standardized behavior across all agent interactions. This consistency reduces integration overhead and ensures that agents can be safely exposed within larger architectures.</p>
<p>Together, the runtime and gateway dramatically accelerate product development. Instead of building custom orchestration layers, memory managers, or tool-integration pipelines, teams can assemble specialized AI-powered services much faster. The result is a shift from manual glue code and bespoke patterns to a repeatable platform where intelligent, domain-specific components can be created, tested, and deployed with greater speed and reliability.</p>
<h1 id="heading-e-commerce-system">E-Commerce System</h1>
<p>To illustrate how Agentcore can accelerate intelligent system design, consider a typical e-commerce ecosystem. At its core are <strong>specialized services</strong>:</p>
<ul>
<li><p><strong>Product Service</strong> – manages catalog information, attributes, and inventory.</p>
</li>
<li><p><strong>Order Service</strong> – handles shopping carts, checkout, and payment processing.</p>
</li>
<li><p><strong>Shipping Service</strong> – coordinates delivery options, tracking, and logistics.</p>
</li>
<li><p><strong>Listing Page</strong> – serves as the customer-facing interface for product search and ordering.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763770507073/d9f10489-7f82-4717-b1f0-5ad7e98d015d.png" alt class="image--center mx-auto" /></p>
<p>In a traditional distributed setup, these services communicate via well-defined APIs. The Listing Page queries the Product Service to display items, calls the Order Service to process purchases, and interacts with the Shipping Service to show delivery estimates. Each service is specialized and efficient within its domain, but interactions are <strong>linear and rigid</strong>, and the customer experience is constrained by the explicit logic programmed into each component.</p>
<p>Agentcore introduces the potential for <strong>agentic orchestration</strong> across these services. Agents can dynamically coordinate tasks, enrich queries, and reason over multiple services simultaneously. For example, a conversational search agent could interact with the Product, Order, and Shipping services to provide contextual recommendations, personalized bundling, or delivery-aware product suggestions—all in a single, seamless interaction. This approach transforms rigid service chains into <strong>adaptive, intelligent workflows</strong>, enabling faster innovation and more engaging user experiences.</p>
<h1 id="heading-adopting-an-agentic-approach-with-agentcore">Adopting an Agentic Approach with Agentcore</h1>
<p>Agentcore enables an <strong>agentic layer</strong> over existing services using its <strong>runtime</strong> and <strong>gateway</strong> components. Here’s how the e-commerce system can leverage them:</p>
<ul>
<li><p><strong>Conversation Search Chat UI Component</strong> – Acts as the frontend interface for customers. It communicates with the <strong>Listing Agent</strong> via the <strong>Agentcore gateway</strong>, ensuring safe, standardized interaction with the agent layer without exposing backend complexity.</p>
</li>
<li><p><strong>Listing Agent (Runtime)</strong> – Maintains context and memory for the conversation, coordinates between agents, and manages multi-step workflows. It handles complex tasks like interpreting user queries, maintaining session state, and orchestrating data retrieval across the system.</p>
</li>
<li><p><strong>Ordering Agent (Runtime)</strong> – Interacts with both the Product and Shipping services to generate accurate, context-aware order recommendations. The runtime allows it to reason over inventory, pricing, and delivery constraints dynamically, providing qualified results to the Listing Agent.</p>
</li>
<li><p><strong>Product Gateway</strong> – Serves as a bridge between the Product Service and agents. Using OpenAPI-spec integration, the gateway ensures that requests from agents are routed correctly and consistently, without agents needing to manage low-level API details.</p>
</li>
<li><p><strong>Shipping Gateway</strong> – Functions similarly for the Shipping Service. It provides standardized access for agents, enabling safe queries and updates regarding delivery options, tracking, and logistics.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763770967596/1d0d8557-5a08-4b34-a6e2-4b24b16491ee.png" alt class="image--center mx-auto" /></p>
<p>By combining <strong>Agentcore runtime</strong> for backend orchestration (Listing and Ordering Agents) with <strong>Agentcore gateways</strong> for safe, standardized access to existing APIs (Product and Shipping), the system gains a <strong>layer of intelligence on top of existing services</strong>. This setup enables conversational search, adaptive ordering workflows, and dynamic coordination between services—all without modifying the underlying APIs or services.</p>
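<p>To make the flow concrete, the agentic layer described above can be sketched in plain Python. This is a hypothetical model: the gateways and services are stubbed in memory, and names such as <code>listing_agent</code> and <code>product_gateway</code> are illustrative, not Agentcore APIs.</p>

```python
# Hypothetical sketch of the agentic layer: gateways front the existing
# services, agents reason over them. All names and data are illustrative.

PRODUCTS = {"sku-1": {"name": "Desk Lamp", "price": 25.0, "in_stock": True}}

def product_gateway(action, **params):
    # Routes agent requests to the Product Service per its (OpenAPI-like) spec.
    if action == "get_product":
        return PRODUCTS.get(params["sku"])
    raise ValueError(f"Unknown action: {action}")

def shipping_gateway(action, **params):
    # Routes agent requests to the Shipping Service.
    if action == "estimate":
        return {"days": 2 if params["express"] else 5}
    raise ValueError(f"Unknown action: {action}")

def ordering_agent(sku, express):
    # Reasons over inventory and delivery constraints via both gateways.
    product = product_gateway("get_product", sku=sku)
    if not product or not product["in_stock"]:
        return None
    eta = shipping_gateway("estimate", express=express)
    return {"product": product["name"], "total": product["price"], "eta_days": eta["days"]}

def listing_agent(user_query):
    # Holds conversation context and delegates to the Ordering Agent.
    express = "fast" in user_query
    return ordering_agent("sku-1", express=express)

print(listing_agent("I need a desk lamp, fast delivery"))
# -> {'product': 'Desk Lamp', 'total': 25.0, 'eta_days': 2}
```

<p>The point of the sketch is the shape of the responsibilities: the Listing Agent owns conversation context, the Ordering Agent reasons across services, and only the gateways touch service APIs.</p>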
<h1 id="heading-traditional-service-integrations">Traditional Service Integrations</h1>
<p>Traditional service integrations rely heavily on the foundational concepts of <strong>distributed systems</strong>:</p>
<ol>
<li><p><strong>Communication</strong> – Teams coordinate API contracts, call patterns, and failure handling.</p>
</li>
<li><p><strong>Boundaries</strong> – Clear separation of responsibilities between services and teams prevents accidental coupling.</p>
</li>
<li><p><strong>Specialization and Ownership</strong> – Each team owns a service domain, ensuring consistent updates and reliability.</p>
</li>
<li><p><strong>Contracts</strong> – Deterministic API specifications define exactly how services interact.</p>
</li>
<li><p><strong>Scalability</strong> – Human planning ensures that services can grow sustainably, avoiding bottlenecks or cascading failures.</p>
</li>
</ol>
<p>In these systems, <strong>half of the safety and stability comes from human oversight</strong>: collaboration, code reviews, discussions on edge cases, and shared understanding of trade-offs. Engineers act as the implicit safety net, catching inconsistencies, reasoning about system-wide effects, and maintaining trust between services.</p>
<p>Allowing agents to fully satisfy these integrations without careful supervision introduces significant risk. Unlike humans, agents may <strong>interpret, combine, or explore service interactions in unexpected ways</strong>, bypassing implicit constraints and assumptions. This can result in invalid requests, unpredictable workflows, or even cascading failures—putting the reliability and safety of the system at stake. Clear human-defined boundaries, specifications, and guardrails remain essential when introducing agentic layers into established service ecosystems.</p>
<h1 id="heading-agentic-contract-pitfalls">Agentic Contract Pitfalls</h1>
<p>In traditional distributed systems, <strong>contracts are deterministic</strong>: each service exposes a set of well-defined operations, with clear input and output formats. Consumers of these services rely on the documented capabilities, and trust comes from a shared, stable understanding of what each service can provide. Exploration and communication are tightly bound to this deterministic model, which minimizes surprises and ensures predictable behavior.</p>
<p>In an <strong>agentic system</strong>, the dynamics change. Agents—often powered by foundation models (LLMs)—attempt to reason, analyze, and construct the “best match” for a given task. This introduces uncertainty: even if the underlying services remain stable, the agent’s interpretation, reasoning, or creative combination of operations can produce <strong>unintended behaviors</strong>, which may propagate through the system. Without careful constraints, the agent itself becomes a potential <strong>single point of failure</strong>, capable of generating invalid requests, misusing service capabilities, or creating inconsistent workflows.</p>
<p>To avoid these pitfalls, <strong>well-defined service specifications are critical</strong>. Each service should provide:</p>
<ul>
<li><p><strong>Clean summarization and description</strong> – a concise overview of what the service does and the intended use cases.</p>
</li>
<li><p><strong>Examples</strong> – illustrative input-output pairs showing correct usage.</p>
</li>
<li><p><strong>Data formats and types</strong> – clear definitions of fields, expected types, ranges, and constraints.</p>
</li>
<li><p><strong>Possible values and enumerations</strong> – to restrict agent actions to valid options.</p>
</li>
</ul>
<p>By providing <strong>precise, structured, and machine-readable specifications</strong>, teams can guide agents to safely explore and reason over services, reducing the risk of unintentional behavior. Agentic contracts require <strong>both human clarity and machine interpretability</strong> to ensure that agents enhance workflows without introducing instability, while preserving the trust that traditional deterministic contracts provided.</p>
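<p>As an illustration, such a specification and its guardrail can be sketched as follows. This is a minimal, hypothetical example; the field names and the validator are not taken from any real Agentcore schema:</p>

```python
# Hypothetical machine-readable tool spec containing the elements listed above:
# description, examples, typed fields, and enumerations of valid values.
SHIPPING_SPEC = {
    "description": "Estimate delivery time for an order.",
    "examples": [{"input": {"method": "express", "weight_kg": 1.2},
                  "output": {"days": 2}}],
    "fields": {
        "method": {"type": str, "enum": ["standard", "express"]},
        "weight_kg": {"type": float, "min": 0.0, "max": 30.0},
    },
}

def validate_request(spec, request):
    """Guardrail: reject agent-generated requests that fall outside the contract."""
    for name, rules in spec["fields"].items():
        if name not in request:
            return False, f"missing field: {name}"
        value = request[name]
        if not isinstance(value, rules["type"]):
            return False, f"bad type for {name}"
        if "enum" in rules and value not in rules["enum"]:
            return False, f"invalid value for {name}: {value}"
        if "min" in rules and not (rules["min"] <= value <= rules["max"]):
            return False, f"{name} out of range"
    return True, "ok"

print(validate_request(SHIPPING_SPEC, {"method": "express", "weight_kg": 1.2}))
# -> (True, 'ok')
print(validate_request(SHIPPING_SPEC, {"method": "teleport", "weight_kg": 1.2}))
# -> (False, 'invalid value for method: teleport')
```

<p>Validating every agent-constructed request against the published spec keeps the agent's exploration inside the deterministic contract the service actually offers.</p>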
<h1 id="heading-agentic-communication-pitfalls">Agentic Communication Pitfalls</h1>
<p>In traditional distributed systems, communication is <strong>deterministic and predictable</strong>: services exchange well-defined requests and responses, and errors or retries are explicitly handled according to agreed protocols. Agents, however, introduce a fundamentally different mode of communication.</p>
<p>Agents can decide <strong>on their own how to handle situations</strong>, which may lead to unexpected behaviors. For example:</p>
<ul>
<li><p><strong>Ignoring contract priorities</strong> – an agent may remove search criteria or skip fields without understanding their importance, violating implicit assumptions in the service contract.</p>
</li>
<li><p><strong>Infinite retries</strong> – in response to failures, an agent may loop endlessly, potentially overwhelming services or consuming unnecessary resources.</p>
</li>
<li><p><strong>Unintended actions</strong> – agents may execute operations that deviate from the original user request, misinterpreting intent or optimizing incorrectly.</p>
</li>
</ul>
<p>These behaviors are non-deterministic and can propagate across the system, creating <strong>unpredictable workflows and system instability</strong>. Unlike human operators, agents lack an innate sense of context, risk, and priority unless explicitly constrained. Proper <strong>guardrails, monitoring, and well-defined service specifications</strong> are essential to prevent agentic communication from becoming a source of errors or cascading failures.</p>
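<p>The retry pitfall in particular is straightforward to guard against in code. Below is a minimal sketch, assuming a simple callable wrapper rather than any specific agent framework, that bounds agent-initiated retries with capped attempts and exponential backoff:</p>

```python
import time

def call_with_guardrails(operation, max_attempts=3, base_delay=0.1):
    """Bound retries: capped attempts with exponential backoff, instead of
    letting an agent loop endlessly against a failing downstream service."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up: surface the failure rather than retry forever
            time.sleep(base_delay * (2 ** attempt))

# Usage: a flaky downstream call that succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(call_with_guardrails(flaky))  # succeeds after two bounded retries
```
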
<h1 id="heading-agents-are-not-doctors">Agents Are Not Doctors</h1>
<p>Agents, even when powered by advanced foundation models, <strong>cannot become truly specialized in a domain without a supporting agentic ecosystem</strong>. Unlike human experts, their reasoning depends entirely on the data and context they are provided. GenAI models come pre-trained on vast amounts of information—both accurate and misleading, relevant and irrelevant—which creates a <strong>noise-to-signal challenge</strong>. This information overload makes it difficult for agents to consistently focus on what truly matters, increasing the risk of misinterpretation, inconsistent decisions, or unsafe actions.</p>
<p>What is often missing is <strong>clean, structured, and meaningful data</strong>: well-defined decision rules, product and service specifications, clear contracts, and curated examples. Without these foundations, agents may produce plausible-sounding but incorrect results, overlook priorities, or misapply rules. In short, <strong>agents cannot become “doctors”</strong>: they lack the domain expertise and curated knowledge that human specialists acquire over years. Specialization requires not just intelligence, but a carefully constructed ecosystem that constrains, guides, and informs agent behavior—turning raw potential into reliable, context-aware expertise.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>The adoption of an <strong>agentic approach</strong> is becoming increasingly evident, as enterprises look to leverage generative AI and intelligent agents to build more adaptive, responsive, and specialized systems. However, success depends on more than just deploying agents: it requires <strong>clean and optimized data, detailed product specifications, and precise service contracts</strong>. Teams must also focus on <strong>reusability</strong> and <strong>well-defined boundaries</strong> to maintain clarity, reduce coupling, and ensure predictable behavior. Rushing adoption without these foundations risks <strong>cascading, non-deterministic communication</strong>, which can compromise reliability and system integrity. Thoughtful design, structured data, and disciplined system practices remain essential to unlock the full potential of agentic architectures while maintaining control and trust.</p>
]]></content:encoded></item><item><title><![CDATA[Egress Rate Controlling in Distributed Systems (Part 2)]]></title><description><![CDATA[In the precedent part of this series, some scenarios of the producer’s impact on downstream service were explored. This part will focus on some patterns and practices in asynchronous design to reduce the impact on downstream service.
This part focuses ...]]></description><link>https://blogs.serverlessfolks.com/egress-rate-controlling-in-distributed-systems-part-2</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/egress-rate-controlling-in-distributed-systems-part-2</guid><category><![CDATA[rate-controlling]]></category><category><![CDATA[ratelimit]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[partnership]]></category><category><![CDATA[System Design]]></category><category><![CDATA[backpressure]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Fri, 28 Mar 2025 21:34:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743174518435/f89de716-0577-44a4-9927-fafba7813584.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the <a target="_blank" href="https://serverlessfolks.com/egress-rate-controlling-in-distributed-systems-part-1">previous part</a> of this series, we explored scenarios where a producer’s rate impacts downstream services. This part focuses on asynchronous design and the patterns and practices that reduce that impact.</p>
<h2 id="heading-real-scenarios">Real Scenarios</h2>
<p>Before starting patterns and practices, let’s look at a real scenario where the producer rate could impact downstream services.</p>
<h3 id="heading-migrating-provisioned-ddb-to-ondemand">Migrating Provisioned DDB to OnDemand</h3>
<p>Recently, AWS reduced the cost of DynamoDB On-Demand by 50%. This announcement led to many migration decisions. When we moved one of our critical, core services to On-Demand mode, we noticed higher processing latency and a higher error rate in one of the downstream services. While the overall functionality of the service (an old serverless system) was not impacted, the delays introduced by the higher error rate had a real business impact.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742600961036/f8298455-9e95-4a0b-87d4-aa1492965018.png" alt class="image--center mx-auto" /></p>
<p>The above diagram demonstrates a typical design in many companies, one that works perfectly at some scale. Still, it can become dangerous when the processing rate exceeds the predicted rate. In this design, migrating the DynamoDB table to On-Demand introduced significant OpenSearch query latencies.</p>
<p>Metrics gathered during the investigation showed a higher rate of successful write requests than in provisioned mode before the migration. In provisioned mode, requests were throttled but succeeded after retries; the exponential backoff strategy applied to errors spread the instantaneous peak over a wider time window by delaying it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742599559697/2125fbcc-3d80-41ad-9bec-8c45a0cab29d.png" alt class="image--center mx-auto" /></p>
<p>Fortunately (or unfortunately), the throttling and retries experienced for UpdateB lowered the processing rate at any given time, helping the downstream service keep up or giving it a chance to recover. Migrating to On-Demand gave the producer more capacity at initial migration (4 partitions, 12,000 RCU, 4,000 WCU).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742600040680/38b351e4-9bf1-4858-a0f7-7081ddc8aaca.png" alt class="image--center mx-auto" /></p>
<p>Too many changes at once increased the load on the downstream service that uses OpenSearch, degrading its performance: the growing number of queries in the OpenSearch internal buffer increased latency. These latencies in turn led to higher error rates for the Lambda triggers. And while not an issue in itself, using bisect-on-failure with a large batch size does increase the load on the downstream side.</p>
<h3 id="heading-design-problems">Design Problems</h3>
<ul>
<li><p>Distributing many changes at the same time</p>
</li>
<li><p>Propagating raw database changes instead of concrete events representing the final, viable state at a given time (processing time)</p>
</li>
<li><p>Introducing coupling between two specific services, DynamoDB Streams and OpenSearch</p>
</li>
<li><p>Propagating Errors</p>
</li>
<li><p>A big batch size combined with bisect-on-failure</p>
</li>
</ul>
<h2 id="heading-high-level-async-design">High-Level Async Design</h2>
<p><strong><em>In a Time-Balanced scenario</em></strong>, the <strong>Count</strong> of instantaneous occurrences and the <strong>Speed</strong> are reduced. <strong>Concurrency</strong> can consequently be improved, but it needs to be controlled at whichever layer makes that possible.</p>
<p><strong><em>In an Instantaneous scenario</em>,</strong> the Count, speed, and concurrency are increased, which leads to more capacity needed on the downstream side.</p>
<p>While at some scale the <strong>Time-Balanced</strong> and <strong>Instantaneous</strong> scenarios produce functionally similar results, at a larger scale more challenges appear and must be addressed intentionally. The following high-level diagram illustrates the challenge.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742647711633/32599ff9-657c-44a9-878e-fb193fa3abc5.png" alt class="image--center mx-auto" /></p>
<p>In <a target="_blank" href="https://serverlessfolks.com/egress-rate-controlling-in-distributed-systems-part-1">Part 1</a> of this series on rate controlling, four attributes were explored: <strong>Time Window</strong>, <strong>Concurrency</strong>, <strong>Processing Speed</strong>, and <strong>Count</strong>; another important attribute, <strong>State</strong>, was also in focus.</p>
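<p>A small simulation makes the difference between the two scenarios concrete: the same message count delivered instantaneously versus leveled at a fixed rate per time window (the numbers are purely illustrative):</p>

```python
def level_load(total_messages, rate_per_window):
    """Spread an instantaneous burst across fixed time windows (load leveling)."""
    windows = []
    remaining = total_messages
    while remaining > 0:
        batch = min(rate_per_window, remaining)
        windows.append(batch)
        remaining -= batch
    return windows

# Instantaneous: downstream absorbs the whole peak in a single window.
print(level_load(100, 100))  # -> [100]
# Time-balanced: the same Count, spread over 5 windows of 20.
print(level_load(100, 20))   # -> [20, 20, 20, 20, 20]
```

<p>The total Count is identical in both cases; what changes is the instantaneous rate the downstream service has to absorb.</p>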
<h2 id="heading-producer-amp-consumer-integrations">Producer &amp; Consumer Integrations</h2>
<p>While AWS provides many services that enable and simplify integrations in an event-driven design, each service is purpose-built and has its own pros and cons. The following diagram showcases some available integrations. (The Lambda functions presented in this diagram simply represent a compute resource; they could be Fargate containers or any other service with computing/processing capability.)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742661871907/9ac38bbf-bc73-4d4f-b87d-6592624aef75.png" alt class="image--center mx-auto" /></p>
<p>The “Invocation Models and Service Integrations” topic is deeply explained in Chapter 3 of <a target="_blank" href="https://serverlessfolks.com/mastering-serverless-computing-book">Mastering Serverless Computing with AWS Lambda</a></p>
<p>In the above diagram, three categories of integration are discussed:</p>
<ul>
<li><p>Streams</p>
</li>
<li><p>Pub/Sub</p>
</li>
<li><p>Queues</p>
</li>
</ul>
<h3 id="heading-streams">Streams</h3>
<p>Streams such as DynamoDB Streams or Kinesis are ideal for near-real-time data streaming, even in high-throughput scenarios with massive volumes of data. They guarantee ordering, provide the means to meet the highest write-throughput requirements, and support handling large amounts of data.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742688117203/05d4a53c-efb2-475f-87fa-8f41eeae31f8.png" alt class="image--center mx-auto" /></p>
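<p>Ordering in streams comes from partitioning: records that share a partition key are routed to the same shard, so per-key order is preserved while shards scale out. A minimal sketch, using MD5 as an illustrative hash rather than the exact algorithm of any specific service:</p>

```python
import hashlib

def assign_shard(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard deterministically: the same key always
    lands on the same shard, which is what preserves per-key ordering."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# All changes for the same order land on the same shard, so their order is kept.
records = [("order-1", "created"), ("order-2", "created"), ("order-1", "updated")]
shards = [assign_shard(key, 4) for key, _ in records]
assert shards[0] == shards[2]  # same key -> same shard
```
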
<h3 id="heading-pubsub">Pub/Sub</h3>
<p>Services providing pub/sub, such as EventBridge or SNS, can distribute any single message to one or many subscribers in near real time, and offer high throughput that allows managing a high number of distinct messages. By design they operate in a fully asynchronous way, which means they don’t track the consumer’s processing status.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742688185065/302ec7eb-80cf-47d1-911e-ab4de1ed1372.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-queues">Queues</h3>
<p>Queue systems such as SQS guarantee delivery and provide mechanisms for resiliency, load leveling, and decoupling. They offer simple means of preventing message loss, such as keeping messages in the queue until expiration and redrive policies.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742688418676/1e52d0af-9e5a-4a75-8400-8ac67465af0e.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-general-recommendations">General Recommendations</h2>
<p>This section provides general guidelines to better control the rate and load on downstream systems. It does not focus on design practices, but rather on adopting some level of safeguards.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742739645244/3d2c423e-a4ea-432e-94f3-9f68a6b9e63d.png" alt class="image--center mx-auto" /></p>
<p><strong>Batching</strong> reduces the effect of statelessness by presenting related changes side by side, giving the code more context for its decisions. <strong>Reducing the concurrency</strong> can be achieved in a different way for each category, but in the end a strict concurrency configuration, such as <strong>MaxConcurrency</strong> with SQS, <strong>ReservedConcurrency</strong> with SNS/EventBridge, or <strong>Partitioning</strong> with streams, reduces the execution rate and consequently the risk of overloading the downstream. <strong>Synthesizing</strong> messages makes it possible to produce the final state at a given time by looking at a time-ordered batch of related records.</p>
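<p>To make the concurrency safeguard concrete, the following TypeScript sketch (a hypothetical in-process helper, not an AWS API) caps how many downstream calls run at once, mirroring what <code>maxConcurrency</code> or reserved concurrency achieve at the platform level.</p>

```typescript
// Minimal in-process concurrency limiter: at most `limit` tasks run at once.
// Illustrative helper only; AWS enforces the real limits at the
// event-source-mapping / function level.
class ConcurrencyLimiter {
  private active = 0;
  private waiting: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    while (this.active >= this.limit) {
      // Park the caller until a slot frees up, then re-check.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.waiting.shift()?.(); // release one waiter per freed slot
    }
  }
}
```

<p>A limiter like this only protects a single process; the point of the platform-level settings above is that they bound concurrency across all executions.</p>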
<h3 id="heading-source-code">Source Code</h3>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-egress-ratelimiter">https://github.com/XaaXaaX/aws-egress-ratelimiter</a></div>
<p> </p>
<h3 id="heading-sqs-integration">SQS integration</h3>
<p>Describing how Amazon SQS works internally is out of scope, but at a high level the polling process of AWS Lambda with SQS can be illustrated by the following diagram.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742767803706/6cad5138-be99-4ddd-b248-b34e964e3ac4.png" alt class="image--center mx-auto" /></p>
<p>AWS Lambda asks for messages and SQS returns available messages from a sample of its nodes; this is the SQS default behavior (short polling, with a wait time of zero). By configuring a wait time longer than zero, long polling is applied and SQS makes a best effort to gather messages from all nodes.</p>
<p><strong>Batching:</strong> specifying a <code>batchSize</code> improves the chance of proper batching, allowing more messages to be grouped and giving Lambda more knowledge about state. While this state is still distributed and not 100% localized, it is useful and improves processing quality to some degree. Using the <code>maxConcurrency</code> option alongside batching yields a more localized state by increasing the chance that related messages end up in the same batch. This can be configured using CDK as below.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> queue = <span class="hljs-keyword">new</span> Queue(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Queue'</span>, {
   receiveMessageWaitTime: Duration.seconds(<span class="hljs-number">20</span>),
   deadLetterQueue: {
     maxReceiveCount: <span class="hljs-number">3</span>,
     queue: <span class="hljs-keyword">new</span> Queue(<span class="hljs-built_in">this</span>, <span class="hljs-string">'DeadLetterQueue'</span>)
   }
});

lambdaFunction.addEventSource(<span class="hljs-keyword">new</span> SqsEventSource(queue, {
   batchSize: <span class="hljs-number">10</span>,
   maxBatchingWindow: Duration.seconds(<span class="hljs-number">60</span>),
   maxConcurrency: <span class="hljs-number">2</span>,
}));
</code></pre>
<p>Lambda Event Source Mapping provides another configuration option, <code>maxBatchingWindow</code>, which allows the Lambda service to defer invocation for up to 5 minutes while asking SQS for more and more messages. The function is invoked when one of the following conditions is met: the <code>payload size</code> reaches 6MB, the <code>Max Batching Window</code> reaches its configured value, or the <code>Batch Size</code> reaches its maximum value.</p>
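<p>These three invocation conditions can be sketched as a simple predicate (illustrative names; the real evaluation happens inside the Lambda poller):</p>

```typescript
// Sketch of the flush conditions the Event Source Mapping applies before
// invoking the function. Names are illustrative, not an AWS API.
interface BatchState {
  totalBytes: number;     // accumulated payload size of buffered records
  recordCount: number;    // records gathered so far
  elapsedSeconds: number; // time since the first record was buffered
}

const MAX_PAYLOAD_BYTES = 6 * 1024 * 1024; // 6 MB synchronous payload limit

function shouldInvoke(
  batch: BatchState,
  batchSize: number,
  maxBatchingWindowSeconds: number,
): boolean {
  return (
    batch.totalBytes >= MAX_PAYLOAD_BYTES ||
    batch.recordCount >= batchSize ||
    batch.elapsedSeconds >= maxBatchingWindowSeconds
  );
}
```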
<p>The Lambda now has more context, can apply more control, and can synthesize messages. The following snippet shows how this can be achieved in TypeScript.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> records = (event.Records <span class="hljs-keyword">as</span> SQSRecord[])
  .sort(<span class="hljs-function">(<span class="hljs-params">a, b</span>) =&gt;</span> { <span class="hljs-keyword">return</span> a.attributes.ApproximateFirstReceiveTimestamp.localeCompare(b.attributes.ApproximateFirstReceiveTimestamp); })
  .map(<span class="hljs-function">(<span class="hljs-params">record</span>) =&gt;</span> { <span class="hljs-keyword">return</span> { ...record, body: <span class="hljs-built_in">JSON</span>.parse(record.body) }});

<span class="hljs-keyword">const</span> grouped = <span class="hljs-built_in">Object</span>.entries(<span class="hljs-built_in">Object</span>.groupBy(records, <span class="hljs-function">(<span class="hljs-params">currentValue: recordType</span>) =&gt;</span> currentValue.body.subject));

<span class="hljs-keyword">const</span> results: SQSRecord[] = [];

grouped.forEach(<span class="hljs-function">(<span class="hljs-params">entry</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> entryValues = entry?.[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> <span class="hljs-built_in">any</span>[];
    <span class="hljs-keyword">let</span> firstEvent = entryValues?.splice(<span class="hljs-number">0</span>,<span class="hljs-number">1</span>)?.[<span class="hljs-number">0</span>];
    <span class="hljs-keyword">const</span> initialType = firstEvent.body.type;

    entryValues.forEach(<span class="hljs-function">(<span class="hljs-params">current: <span class="hljs-built_in">any</span></span>) =&gt;</span> {
        <span class="hljs-keyword">const</span> currentData = current.body.data;
        <span class="hljs-keyword">const</span> currentType = current.body.type;

        <span class="hljs-keyword">if</span>(initialType == <span class="hljs-string">'user.signedup'</span>) {
          <span class="hljs-keyword">if</span>(currentType == <span class="hljs-string">'user.profile_updated'</span>) {
            firstEvent.body.data.profile = {
              ...firstEvent.body.data.profile, 
              age: currentData.age,
              nickname : currentData.nickname,
              prefered_channels: currentData.prefered_channels
            };
          }
          <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span>(currentType == <span class="hljs-string">'user.unsubscribed'</span>) {
              firstEvent = <span class="hljs-literal">null</span>;
          }
        }
    });
    <span class="hljs-keyword">if</span> (firstEvent) results.push(firstEvent); <span class="hljs-comment">// drop entities synthesized away (e.g. unsubscribed)</span>
});
</code></pre>
<p>Sending two messages related to the same subject will group them, and synthesizing them avoids distributing or processing them unnecessarily. The following JSON example represents the payloads of two distinct events sent to SQS.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"id"</span>: <span class="hljs-string">"dmPvPSlwFZOTOjefpu8m2"</span>,
  <span class="hljs-attr">"time"</span>: <span class="hljs-string">"2025-03-23T17:13:32+00:00"</span>,
  <span class="hljs-attr">"source"</span>: <span class="hljs-string">"user.management"</span>,
  <span class="hljs-attr">"subject"</span>: <span class="hljs-string">"omid_unique_user_id"</span>,
  <span class="hljs-attr">"data"</span>: {
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"omid"</span>,
    <span class="hljs-attr">"profile"</span>: {
       <span class="hljs-attr">"age"</span>: <span class="hljs-number">40</span>
    }
  },
  <span class="hljs-attr">"type"</span>: <span class="hljs-string">"user.signedup"</span>
},
{
  <span class="hljs-attr">"id"</span>: <span class="hljs-string">"TAF5aEQeAh7GX1q8Cvam3"</span>,
  <span class="hljs-attr">"time"</span>: <span class="hljs-string">"2025-03-23T17:14:32+00:00"</span>,
  <span class="hljs-attr">"source"</span>: <span class="hljs-string">"user.management"</span>,
  <span class="hljs-attr">"subject"</span>: <span class="hljs-string">"omid_unique_user_id"</span>,
  <span class="hljs-attr">"data"</span>: {
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"omid"</span>,
    <span class="hljs-attr">"age"</span>: <span class="hljs-number">43</span>,
    <span class="hljs-attr">"nickname"</span>: <span class="hljs-string">"xaaxaax"</span>,
    <span class="hljs-attr">"prefered_channels"</span>: [ <span class="hljs-string">"EMAIL"</span> ]
  },
  <span class="hljs-attr">"type"</span>: <span class="hljs-string">"user.profile_updated"</span>
}
</code></pre>
<p>The example will distribute a final, single <code>user.signedup</code> event representing a combination of both events.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"id"</span>: <span class="hljs-string">"dmPvPSlwFZOTOjefpu8m2"</span>,
  <span class="hljs-attr">"time"</span>: <span class="hljs-string">"2025-03-23T17:13:32+00:00"</span>,
  <span class="hljs-attr">"source"</span>: <span class="hljs-string">"user.management"</span>,
  <span class="hljs-attr">"subject"</span>: <span class="hljs-string">"omid_unique_user_id"</span>,
  <span class="hljs-attr">"data"</span>: {
     <span class="hljs-attr">"name"</span>: <span class="hljs-string">"omid"</span>,
     <span class="hljs-attr">"profile"</span>: {
        <span class="hljs-attr">"age"</span>: <span class="hljs-number">43</span>,
        <span class="hljs-attr">"nickname"</span>: <span class="hljs-string">"xaaxaax"</span>,
        <span class="hljs-attr">"prefered_channels"</span>: [ <span class="hljs-string">"EMAIL"</span> ]
     }
  },
  <span class="hljs-attr">"type"</span>: <span class="hljs-string">"user.signedup"</span>
}
</code></pre>
<p>While the above example provides some means to better control the rate and reduce the risk of overloading downstream systems, this approach gives no guarantee of control over single entities, even if in the above example the goal was achieved. This means the two messages above may still arrive in two distinct invocations, each invocation receiving its respective batch but with random entities.</p>
<p><strong>Partitioning:</strong> While batching can be useful, at some point two related messages can be presented to two distinct invocations (execution environments) even when they are close in time. Partitioning groups related entities and guarantees that they reach the same invocation (e.g. <code>user.signedup</code> and <code>user.profile_updated</code> events related to the same user). This can be configured using CDK as below.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> queue = <span class="hljs-keyword">new</span> Queue(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Queue'</span>, {
   receiveMessageWaitTime: Duration.seconds(<span class="hljs-number">20</span>),
   fifo: <span class="hljs-literal">true</span>,
   deadLetterQueue: {
     maxReceiveCount: <span class="hljs-number">3</span>,
     queue: <span class="hljs-keyword">new</span> Queue(<span class="hljs-built_in">this</span>, <span class="hljs-string">'DeadLetterQueue'</span>, { fifo: <span class="hljs-literal">true</span> })
   }
});

lambdaFunction.addEventSource(<span class="hljs-keyword">new</span> SqsEventSource(queue, {
   batchSize: <span class="hljs-number">10</span>,
}));
</code></pre>
<p>To send the message payloads presented in the batching section, the following SQS message requests must be sent. The presence of <code>MessageGroupId</code> guarantees that at a given time all messages for the same user (the MessageGroupId) will be received strictly together.</p>
<pre><code class="lang-json"> {
    <span class="hljs-attr">"Id"</span>: <span class="hljs-string">"Test-0001-2015-09-16T140731Z"</span>,
    <span class="hljs-attr">"MessageGroupId"</span>: <span class="hljs-string">"omid_user_id"</span>,
    <span class="hljs-attr">"MessageDeduplicationId"</span>: <span class="hljs-string">"dmPvPSlwFZOTOjefpu8m2"</span>,
    <span class="hljs-attr">"MessageBody"</span>: <span class="hljs-string">"{\r\n  \"id\":\"dmPvPSlwFZOTOjefpu8m2\",\r\n  \"subject\":\"omid_user_id\",\r\n     \"time\":\"2025-03-23T17:13:32+00:00\",\r\n      \"data\":{\r\n         \"name\":\"omid\",\r\n         \"profile\":{\r\n            \"age\":34,\r\n            \"nickname\":\"omid\"\r\n         }\r\n      },\r\n      \"type\":\"user.signedup\"\r\n   }"</span>
 },
 {
    <span class="hljs-attr">"Id"</span>: <span class="hljs-string">"Test-0002-2015-09-16T140930Z"</span>,
    <span class="hljs-attr">"MessageGroupId"</span>: <span class="hljs-string">"omid_user_id"</span>,
    <span class="hljs-attr">"MessageDeduplicationId"</span>: <span class="hljs-string">"2dokCjxmQvUk2brjLuus1"</span>,
    <span class="hljs-attr">"MessageBody"</span>: <span class="hljs-string">"{\r\n  \"id\":\"2dokCjxmQvUk2brjLuus1\",\r\n  \"subject\":\"omid_user_id\",\r\n    \"time\":\"2025-03-23T17:14:32+00:00\",\r\n      \"data\":{\r\n         \"name\":\"omid\",\r\n         \"age\":43,\r\n         \"nickname\":\"xaaxaax\",\r\n     \"prefered_channels\": [ \"EMAIL\" ]},\r\n      \"type\":\"user.profile_updated\"\r\n   }"</span>
  }
</code></pre>
<p>This example applies a mix of partitioning and batching, giving Lambda a better chance of a more consistent state, which is managed by SQS internally.</p>
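<p>Producing such requests can be sketched as a pure mapping from domain events to batch entries, deriving <code>MessageGroupId</code> from the event subject and <code>MessageDeduplicationId</code> from the event id (the actual send via the AWS SDK is left out):</p>

```typescript
// Map domain events to SQS SendMessageBatch entries for a FIFO queue.
// Deriving MessageGroupId from the subject keeps all events of the same
// user in the same group; the event id serves as the deduplication id.
interface DomainEvent {
  id: string;
  subject: string;
  type: string;
  data: Record<string, unknown>;
}

function toBatchEntries(events: DomainEvent[]) {
  return events.map((event, index) => ({
    Id: `msg-${index}`,
    MessageGroupId: event.subject,
    MessageDeduplicationId: event.id,
    MessageBody: JSON.stringify(event),
  }));
}
```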
<h3 id="heading-dynamodb-kinesis-stream-integration">DynamoDb / Kinesis Stream Integration</h3>
<p>Before deep diving into stream integrations, this section elaborates on how the integration works. The following diagram illustrates how items, item collections, and stream records are organized.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742948200739/38437106-c31b-4ab6-a2c5-5aa42a48f0b2.png" alt class="image--center mx-auto" /></p>
<p>AWS Lambda asks for the list of shards (a shard corresponds to a partition in DynamoDB). Lambda then handles each shard in isolation, starting by getting the shard iterator (the start pointer in streaming terms); given the iterator, the stream returns the items in its response. Since each item collection (the items related to the same partition key) is always located in a single shard, Lambda already benefits from having all related data together: by default a single Lambda execution is dedicated to each shard at a given time, which means that for a single partition key there is at most one running invocation.</p>
<pre><code class="lang-mermaid">sequenceDiagram
    participant A as Event Source Mapping
    participant B as Stream
    participant C as Shard
    A-&gt;&gt;B: Ask for list of Shards
    par Shard 1
        A-&gt;&gt;C: ask for Shard Iterator
        A-&gt;&gt;C: Get Records in a Shard using Iterator
        C-&gt;&gt;A: Respond back batch of messages
    and Shard 2
        A-&gt;&gt;C: ask for Shard Iterator
        A-&gt;&gt;C: Get Records in a Shard using Iterator
        C-&gt;&gt;A: Respond back batch of messages
    end
</code></pre>
<p>Using AWS CDK, the integration can be refined to give more flexibility to make concrete decisions. The following snippet shows how the event source mapping can be configured.</p>
<pre><code class="lang-typescript">lambdaFunction.addEventSource(<span class="hljs-keyword">new</span> DynamoEventSource(table, {
      startingPosition: StartingPosition.LATEST,
      batchSize: <span class="hljs-number">10</span>,
      maxBatchingWindow: Duration.seconds(<span class="hljs-number">60</span>),
      bisectBatchOnError: <span class="hljs-literal">true</span>,
      retryAttempts: <span class="hljs-number">3</span>,
      maxRecordAge: Duration.minutes(<span class="hljs-number">15</span>),
      parallelizationFactor: <span class="hljs-number">1</span>,
}));
</code></pre>
<p>A batch of 10 records and a 60-second batching window are configured, allowing the reception of more related records and consequently reducing the load on downstream systems.</p>
<p>Sending a new item to the DynamoDB table and then updating it:</p>
<pre><code class="lang-json"><span class="hljs-comment">// New Item</span>
{
    <span class="hljs-attr">"PutRequest"</span>: {
        <span class="hljs-attr">"Item"</span>: {
            <span class="hljs-attr">"pk"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"omid"</span>},
            <span class="hljs-attr">"sk"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"t0000"</span>},
            <span class="hljs-attr">"nickname"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"Omid"</span>},
            <span class="hljs-attr">"age"</span>: {<span class="hljs-attr">"N"</span>: <span class="hljs-string">"30"</span>},
            <span class="hljs-attr">"prefered_channels"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"EMAIL"</span>},
            <span class="hljs-attr">"status"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"ACTIVE"</span>},
        }
    }
}

<span class="hljs-comment">// Update Item</span>
{
    <span class="hljs-attr">"PutRequest"</span>: {
        <span class="hljs-attr">"Item"</span>: {
            <span class="hljs-attr">"pk"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"omid"</span>},
            <span class="hljs-attr">"sk"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"t0000"</span>},
            <span class="hljs-attr">"nickname"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"XaaXaaX"</span>},
            <span class="hljs-attr">"age"</span>: {<span class="hljs-attr">"N"</span>: <span class="hljs-string">"43"</span>},
            <span class="hljs-attr">"prefered_channels"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"EMAIL"</span>},
            <span class="hljs-attr">"status"</span>: {<span class="hljs-attr">"S"</span>: <span class="hljs-string">"MODIFIED"</span>},
        }
    }
}
</code></pre>
<p>Looking at the handler code, the grouping is done per partition key, and the verification is based on the eventName being <code>INSERT</code>, <code>MODIFY</code>, or <code>REMOVE</code>.</p>
<pre><code class="lang-typescript">
  <span class="hljs-keyword">const</span> grouped = <span class="hljs-built_in">Object</span>.entries(
    <span class="hljs-built_in">Object</span>.groupBy(records, <span class="hljs-function">(<span class="hljs-params">currentValue: DynamoDBRecord</span>) =&gt;</span> currentValue.dynamodb?.Keys?.pk.S || <span class="hljs-string">''</span>)
  );

  <span class="hljs-keyword">const</span> results: DynamoDbInternalRecord[] = [];

  grouped.forEach(<span class="hljs-function">(<span class="hljs-params">entry</span>) =&gt;</span> {
    <span class="hljs-keyword">let</span> key = entry?.[<span class="hljs-number">0</span>];
    <span class="hljs-keyword">let</span> entryValues = entry?.[<span class="hljs-number">1</span>] <span class="hljs-keyword">as</span> DynamoDBRecord[];
    <span class="hljs-keyword">let</span> firstEvent = entryValues?.splice(<span class="hljs-number">0</span>,<span class="hljs-number">1</span>)?.[<span class="hljs-number">0</span>];
    <span class="hljs-keyword">const</span> initialType = firstEvent.eventName;

    <span class="hljs-keyword">let</span> data: Record&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">any</span>&gt; = {};
    <span class="hljs-keyword">if</span>(initialType == <span class="hljs-string">'INSERT'</span> || initialType == <span class="hljs-string">'MODIFY'</span>)
      data = unmarshall(firstEvent.dynamodb?.NewImage <span class="hljs-keyword">as</span> DynamoDbInternalRecord);
    <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span>(initialType == <span class="hljs-string">'REMOVE'</span>)
      data = unmarshall(firstEvent.dynamodb?.OldImage <span class="hljs-keyword">as</span> DynamoDbInternalRecord);

    entryValues.forEach(<span class="hljs-function">(<span class="hljs-params">current: <span class="hljs-built_in">any</span></span>) =&gt;</span> {
      <span class="hljs-keyword">const</span> currentType = current.eventName;
      <span class="hljs-keyword">let</span> currentData: Record&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">any</span>&gt; = {};
      <span class="hljs-keyword">if</span>(currentType == <span class="hljs-string">'INSERT'</span> || currentType == <span class="hljs-string">'MODIFY'</span>)
        currentData = unmarshall(current.dynamodb?.NewImage <span class="hljs-keyword">as</span> DynamoDbInternalRecord);
      <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span>(currentType == <span class="hljs-string">'REMOVE'</span>)
        currentData = unmarshall(current.dynamodb?.OldImage <span class="hljs-keyword">as</span> DynamoDbInternalRecord);

      <span class="hljs-keyword">if</span>(initialType == <span class="hljs-string">'INSERT'</span>) {
        <span class="hljs-keyword">if</span>(currentType == <span class="hljs-string">'MODIFY'</span>) {
          data.age = currentData.age;
          data.nickname = currentData.nickname;
          data.prefered_channels = currentData.prefered_channels;
        }
        <span class="hljs-keyword">else</span> data = {};

      }
    });
    results.push(data);
  });
</code></pre>
<p>The printed result for the above records will be as below:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"pk"</span>: <span class="hljs-string">"omid"</span>,
    <span class="hljs-attr">"sk"</span>: <span class="hljs-string">"t0000"</span>,
    <span class="hljs-attr">"nickname"</span>: <span class="hljs-string">"XaaXaaX"</span>,
    <span class="hljs-attr">"age"</span>: <span class="hljs-number">43</span>,
    <span class="hljs-attr">"prefered_channels"</span>: <span class="hljs-string">"EMAIL"</span>,
    <span class="hljs-attr">"status"</span>: <span class="hljs-string">"ACTIVE"</span>
}
</code></pre>
<h3 id="heading-eventbridge-sns-integration">EventBridge / Sns Integration</h3>
<p>SNS and EventBridge allow a fan-out design with multiple consumers (subscribers). Even though the event contract for SNS contains an array of records (<code>Records[]</code>), the Lambda always receives a single message in that array (ref in FAQ: <a target="_blank" href="https://aws.amazon.com/sns/faqs/">here</a>). There is no possibility of refined batch processing or custom batching configuration, which leads to a higher level of concurrent Lambda executions and a load that cannot be controlled. The distribution is an asynchronous call, meaning the channel is not aware of the processing results on the consumer side.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743077617095/7302beed-3fab-4655-94dc-b409d4581795.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p><strong>Reserved Concurrency</strong>: In cases of direct integration with SNS or EventBridge, the solution to limit downstream impact is to apply reserved concurrency.</p>
</li>
<li><p><strong>OnFailure Destination</strong>: Integrate a DLQ. The default behavior for asynchronous invocation is two retries (three total attempts), with fixed intervals between retries (unchangeable):</p>
<ul>
<li><p>1st Retry: 1 minute</p>
</li>
<li><p>2nd: 2 minutes</p>
</li>
</ul>
</li>
<li><p><strong>Load Leveling</strong>: Use an SQS queue as an intermediate layer to facilitate load leveling.</p>
</li>
</ul>
<pre><code class="lang-typescript">    <span class="hljs-keyword">const</span> lambdaFunction = <span class="hljs-keyword">new</span> NodejsFunction(<span class="hljs-built_in">this</span>, <span class="hljs-string">'LambdaZipFunction'</span>, {
      reservedConcurrentExecutions: <span class="hljs-number">1</span>,
      ...
    });

    <span class="hljs-keyword">const</span> topic = <span class="hljs-keyword">new</span> Topic(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Topic'</span>);

    lambdaFunction.addEventSource(<span class="hljs-keyword">new</span> SnsEventSource(topic, {
      deadLetterQueue: <span class="hljs-keyword">new</span> Queue(<span class="hljs-built_in">this</span>, <span class="hljs-string">'DeadLetterQueue'</span>),
    }));
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Design decisions are often driven by scalability and high throughput. While this is an ideal approach, it can become tricky and impact the business process when boundaries are not well defined and communications are not well thought out (which is often the case). Keeping track of a service’s surroundings while making such decisions is a must-have and will save time and effort in the long term. Crossing out-of-control boundaries (external systems, partners, SaaS, etc.) is not the most common case, but when it exists it costs companies effort, downtime, and disruption.</p>
<p>Asynchronous designs give more power and control at communication boundaries, but they come with process-oriented trade-offs: concurrency is reduced as a safeguard, and so is speed, for the sake of more in-flight state consistency.</p>
<p>This part of the series explored typical async services and some integration details to control the load on downstream services. The next part will focus on patterns and practices related to rate control in synchronous scenarios, where the final product is highly coupled to an external service or downstream system with limitations and constraints.</p>
]]></content:encoded></item><item><title><![CDATA[Egress Rate Controlling in Distributed Systems (Part 1)]]></title><description><![CDATA[Once upon a time, companies thought they could do everything in-house—like being the chef, the waiter, and the dishwasher at a restaurant. But soon, they realized their secret recipe for success was drowning in unimportant tasks like setting up email...]]></description><link>https://blogs.serverlessfolks.com/egress-rate-controlling-in-distributed-systems-part-1</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/egress-rate-controlling-in-distributed-systems-part-1</guid><category><![CDATA[distributed system]]></category><category><![CDATA[rate-controlling]]></category><category><![CDATA[ratelimit]]></category><category><![CDATA[System Design]]></category><category><![CDATA[backpressure]]></category><category><![CDATA[partnership]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Sun, 16 Mar 2025 16:58:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742144132288/e7a440a4-5163-4c9a-abdc-986ed497e709.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Once upon a time, companies thought they could do everything in-house—like being the chef, the waiter, and the dishwasher at a restaurant. But soon, they realized their secret recipe for success was drowning in unimportant tasks like setting up email servers, building payment systems, and reinventing the wheel (which, spoiler alert, no one wants to do). So, they started outsourcing. They realized buying a pre-made pizza dough (API) is way more efficient than grinding the flour, kneading it, and praying it rises correctly every time. With APIs and SaaS, companies can focus on what they do best—serving their core business, while someone else worries about the sauce.</p>
<p>While focusing on core business is essential, external services often present challenges and highlight several considerations. Those services introduce quotas, limits, or constraints that vary based on Business plan, Architecture, or Infrastructure. Whatever the reason, any consumer must plan enough guardrails to isolate the internal process from external services, and those considerations differ based on the layer in the internal system that lives with that coupling.</p>
<h1 id="heading-coupling-level">Coupling Level</h1>
<p>It is not possible to decouple from all external dependencies; sometimes a business needs to integrate an external service to gain complex expertise such as payment, financing, or insurance. At the same time, there is no way to persist all data for systems such as payment, and even for others such as insurance and financing, there is no way to predict all end-user interactions. The coupling can be established in two ways: Direct or Indirect.</p>
<h2 id="heading-direct">Direct</h2>
<p>Direct here does not mean request/response; it is more about how the end-user product is coupled to the external system. Coming back to the insurance example, the scenario can be described as:</p>
<ul>
<li><p>A Product is available for a user.</p>
</li>
<li><p>The Product is eligible for N years’ insurance.</p>
</li>
<li><p>The User can modify the insurance default details.</p>
</li>
<li><p>The insurance price is computed per modified details.</p>
</li>
<li><p>The user makes the decision.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138109854/a953b737-44a5-4686-a642-13aa95762150.png" alt class="image--center mx-auto" /></p>
<p>When a user changes the insurance duration, this results in a price change and the system needs to interact with the insurance partner service, which brings the risk of introducing a new point of failure into the system.</p>
<h2 id="heading-indirect">Indirect</h2>
<p>The above scenario mentions default values for any single product. While these defaults are essential for the end user, they are often close to static for a while. However, those values don’t relate to the product itself but to the product category and its characteristics. As an example, a combined refrigerator (fridge/freezer) with an E-level energy class can have a higher insurance price than an A-class one (just an assumption); maybe that price can be lower for well-manufactured, popular brands like SIEMENS, AEG, and BOSCH.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138204234/9455c000-0278-4172-b884-7276bff920cc.png" alt class="image--center mx-auto" /></p>
<p>The above diagram illustrates a backend service interacting with the partner for product creation/modification. This reduces the process’s coupling to the partner on product detail page hits by persisting the defaults related to each product category, projected onto all available products.</p>
<h1 id="heading-coupling-layer">Coupling Layer</h1>
<p>Focusing again on the article’s topic: defining a rate-controlling strategy depends on which layer interacts with the external system, and that position drives the corresponding design and implementation patterns. Each layer offers distinct possibilities to solve the problem.</p>
<h2 id="heading-client-side">Client Side</h2>
<p>When a client-side application, such as a Web or Mobile application, interacts with an external system directly or via a gateway proxy, any external system behavior will be replicated in the client app. This can be any latency, throttling, or internal failure problems. In case of rate issues, the client app will receive a 429 HTTP status code, indicating “Too Many Requests”. The client can retry the call and hope for a successful response. The problem with the client side is that there is no visibility about the overall rate of errors at a given time, and each client app runs with a local state.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742139109958/ea10d0e2-75e2-4ce6-8aed-ebc1a6c44273.png" alt class="image--center mx-auto" /></p>
<p>Having millions of users means many users can visit a single product details page at the same time, so they all need to interact with the external insurance partner directly or via a gateway proxy. The important takeaway is that no single user knows about the overall load on the system or how many retries are in flight to the external system.</p>
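<p>A typical client-side safeguard is retrying 429 responses with exponential backoff and jitter. The sketch below (generic and framework-agnostic, not tied to any particular HTTP client) illustrates the idea; note that it still operates on purely local knowledge.</p>

```typescript
// Retry a request on HTTP 429 with exponential backoff and full jitter.
// Each client decides in isolation: it has no view of the global error rate.
async function fetchWithBackoff(
  doRequest: () => Promise<{ status: number }>,
  maxRetries = 3,
  baseDelayMs = 100,
): Promise<{ status: number }> {
  for (let attempt = 0; ; attempt++) {
    const response = await doRequest();
    if (response.status !== 429 || attempt >= maxRetries) return response;
    const delayMs = Math.random() * baseDelayMs * 2 ** attempt; // full jitter
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

<p>Jitter spreads the retries of many independent clients over time, which softens, but does not solve, the thundering-herd effect described above.</p>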
<h2 id="heading-edge-side">Edge Side</h2>
<p>If the layer interacting with the external system is an edge layer, edge-level state can help keep track of the rate; this allows steering the client apps better and also avoids external system calls by applying some level of caching. But this layer adds a challenge: how can zonally distributed edge locations answer a regional rate challenge? While the rate limit is dedicated to a single regional communication boundary, many edge locations can send requests. However, this problem is many times less severe than the client-side one.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138401667/7794271e-d3c9-4e25-b3d7-60c060deb039.png" alt class="image--center mx-auto" /></p>
<p>The previous paragraph mentioned that edge locations still suffer from the same distributed-state problems. While this is true, in the real world, insurance offers and pricing differ by geography, which makes this an unrealistic challenge except for some rare use cases.</p>
<p>Another option when using content delivery networks (CDN) is the “<strong>Request Collapsing</strong>” feature: serving the same origin response to many in-flight requests, so the first request reaches the origin while identical requests arriving at the same time are paused until the first response arrives. This reduces the origin load and, consequently, the external service load. (The topic of CDN request collapsing is discussed in depth in Chapter 9 of <a target="_blank" href="https://serverlessfolks.com/mastering-serverless-computing-book">Mastering Serverless Computing with AWS Lambda</a>.)</p>
<h2 id="heading-server-side">Server Side</h2>
<p>The server layer is where all the autonomy and power reside. The server-side layer has access to computing power and is closer to the external system, so it is the best place for keeping track of the rate limit's actual state and monitoring the consumed threshold. Using the server layer reduces the challenge of statelessness. However, this may be a more difficult layer for applying retries and exponential backoff to throttled requests, as retries increase the amount of suspended processing resources. The server layer allows synchronous or asynchronous interactions with the external systems. The choice of async or sync varies based on different constraints and on the coupling level, which can be direct or indirect.</p>
<p>With direct coupling, the interaction will be synchronous, as the end-user product waits for the response; this means the interaction with the external service must complete before sending back the response.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138529667/be52f9fd-0de7-4c29-9deb-0ae91503b158.png" alt class="image--center mx-auto" /></p>
<p>All these assumptions can hold when a single container serves the backend service. However, it will suffer from the <strong>Concurrency</strong> challenge, explored later in this article, and from distributed state if the service is designed for high availability and scalability.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138584589/14b786d5-7e3a-4618-938c-0e2cdf863f7d.png" alt class="image--center mx-auto" /></p>
<p>In the case of indirect coupling, an asynchronous backend process is a more flexible solution for interacting with an external system. Often, the async processes are built on top of managed services such as SQS, EventBridge, etc., and react to changes via messaging. This can be an ideal design, but the scaling capacity of managed services becomes a point of reflection when interacting with external systems. It is a particular concern if the internal system is built on top of Function as a Service, where the risk of high concurrency is very real.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138637323/8b5dba1c-437f-46bf-8e0c-703ed573e1fd.png" alt class="image--center mx-auto" /></p>
<p>Although the invocations are driven by pollers, each execution environment can receive a batch of records, giving more control to the consumer at the container level. However, the default batching behavior depends on many factors such as speed, time window, and number of changes.</p>
<h1 id="heading-processing-main-attributes">Processing Main Attributes</h1>
<p>A variety of attributes can play a role in the rate of communication toward external systems, such as:</p>
<ul>
<li><p>Time Window</p>
</li>
<li><p>Concurrency</p>
</li>
<li><p>Speed</p>
</li>
<li><p>Count</p>
</li>
</ul>
<h2 id="heading-time-window">Time Window</h2>
<p>A time window represents the duration during which a constraint applies. It is often a per-second metric, but some services have per-minute or per-hour constraints. Whenever using a service with time-window constraints, the base challenge will be: “How do we adapt the internal rate in a dedicated time window based on external service constraints?”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138762377/0812d740-4eb5-4736-8d3b-db7ef893a37c.png" alt class="image--center mx-auto" /></p>
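<p>To make the time-window constraint concrete, here is a minimal, assumed fixed-window counter: it allows at most <code>limit</code> calls per window and rejects the rest. The clock is injected so the behavior is testable; real implementations often prefer sliding windows or token buckets to avoid bursts at window boundaries.</p>

```typescript
// Minimal sketch of a fixed-window rate limiter: at most `limit` calls are
// allowed per `windowMs` window; the clock is injectable for testing.
class FixedWindowLimiter {
  private windowStart = 0;
  private count = 0;

  constructor(
    private limit: number,
    private windowMs: number,
    private now: () => number = Date.now
  ) {}

  tryAcquire(): boolean {
    const t = this.now();
    if (t - this.windowStart >= this.windowMs) {
      // A new window begins: reset the counter.
      this.windowStart = t;
      this.count = 0;
    }
    if (this.count >= this.limit) return false; // over the limit: throttle
    this.count += 1;
    return true;
  }
}
```

<p>The hard part the article describes is not this local logic but sharing <code>count</code> consistently across many execution environments.</p>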
<h2 id="heading-concurrency">Concurrency</h2>
<p>Concurrency is the ability to run many tasks simultaneously to increase quality attributes such as throughput and scalability. A higher concurrency level puts more pressure on downstream systems, crossing rate constraints and thresholds. Controlling concurrency is a real challenge in distributed systems, and achieving that control requires decreasing processing speed or throughput. Another challenge in highly concurrent processes is state consistency: the state can only be shared (not controlled) consistently through context switching, a process so complex and costly that, even when achievable, it enables sharing states rather than actually controlling the treatment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138831023/d7fd4fe0-6d60-4ed8-ba2b-73f6359b4124.png" alt class="image--center mx-auto" /></p>
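<p>Within a single process, a common way to tame concurrency is a counting semaphore. The sketch below caps the number of simultaneous downstream calls; it is an illustrative, assumed implementation, and it only bounds one execution environment, so it does not solve the distributed-state problem described above.</p>

```typescript
// Minimal sketch of a counting semaphore capping concurrent downstream calls.
// This bounds concurrency within one process only; it is not a shared state.
class Semaphore {
  private waiters: (() => void)[] = [];

  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits -= 1;
      return;
    }
    // No permit available: park until a release hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit directly to a waiter
    else this.permits += 1;
  }
}

async function withPermit<T>(sem: Semaphore, task: () => Promise<T>): Promise<T> {
  await sem.acquire();
  try {
    return await task();
  } finally {
    sem.release();
  }
}
```

<p>Capping permits directly trades throughput for downstream pressure, which is exactly the speed-versus-control trade-off discussed in this section.</p>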
<h2 id="heading-speed">Speed</h2>
<p>Speed represents how fast a system treats the demands. A higher speed means a higher level of statelessness, and maintaining state means reducing treatment speed. The system speed must be aligned with the overall business value; there is no interest in driving fast while putting everyone in danger.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138868875/bb627cbc-8cbc-4f8d-898e-dc011ac8034c.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-count">Count</h2>
<p>The number of demands significantly impacts downstream services. Responding to a high number of demands improves responsiveness, but how downstream services can handle that rate brings more design-related discussions. Dealing with external systems is trickier because an external system has many more customers than our internal system. Sometimes a subscription change, and sometimes a move to another vendor, can be the solution; aligning two roadmaps is not achievable. Controlling the number of demands depends on the coupling layer.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742138900300/b08832cc-8c00-41e1-a7a0-68a9b63be83c.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-trustable-state">Trustable State</h1>
<p>Rate controlling in distributed systems relies on consistent, performant, and highly available persistent storage. <strong><em>Why are those quality attributes important?</em></strong></p>
<p>Achieving a consistently shared state depends on many factors when dealing with distributed storage, and it solves the most important challenge of controlling the state.</p>
<p><strong>Availability:</strong> is important because unavailability of one node can lead to a stale state, so it matters to fetch from the most consistent node, or to fetch and cache the state whenever a node is replaced. Yet, with all efforts, a real-time trusted and available state is not attainable if we talk about nano- or milliseconds.</p>
<p><strong>Performance:</strong> leads to lower latency, and lower latency means less overhead on storage; high latency, however, will result in shared-state synchronization problems.</p>
<p><strong>Consistency:</strong> the state must be consistent across different nodes in near real time; any update to the persisted state must be visible to all subsequent reads.</p>
<p>DynamoDB, S3, and ElastiCache are some available options on AWS. Many other interesting options, such as Momento, can help achieve a distributed shared state.</p>
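<p>The essential storage primitive behind all of this is an atomic, conditional increment: add to the counter only if it is still below the limit, in a single operation. The sketch below illustrates the semantics against an in-memory store; with DynamoDB, the same check-and-increment is typically expressed as an <code>UpdateItem</code> with a <code>ConditionExpression</code>, where a failed condition means the limit was reached. The interface and names here are assumptions for illustration.</p>

```typescript
// Minimal sketch of the conditional-increment semantics a shared rate counter
// needs. The in-memory store is illustrative; a real implementation would use
// an atomic storage operation (e.g. DynamoDB UpdateItem + ConditionExpression).
interface RateState {
  // Increments the counter for `key` and returns true only if it was below `limit`.
  incrementIfBelow(key: string, limit: number): Promise<boolean>;
}

class InMemoryRateState implements RateState {
  private counters = new Map<string, number>();

  async incrementIfBelow(key: string, limit: number): Promise<boolean> {
    const current = this.counters.get(key) ?? 0;
    if (current >= limit) return false; // condition failed: limit reached
    this.counters.set(key, current + 1);
    return true;
  }
}
```

<p>The key design point is that the check and the increment happen as one operation on the store, so two concurrent consumers cannot both observe “below the limit” and both proceed.</p>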
<h1 id="heading-conclusion">Conclusion</h1>
<p>Designing distributed systems and business processes at scale leads to more inter-service communication needs. While in the best cases the value is evident, there will be worst cases that lead to frustration because of the pressure a service puts on its downstream services.</p>
<p>Some services add guardrails such as rate limiting to safeguard their infrastructure health, but this impacts the consumers and, depending on the coupling presented earlier, can impact the final products and users.</p>
<p>In this part of “<strong>Rate Controlling in Distributed Systems</strong>”, some concepts related to rate controlling, such as <strong>Coupling Level</strong>, <strong>Coupling Layer</strong>, <strong>Processing Main Attributes</strong>, and <strong>Trustable State</strong>, are explored to give an overview of how the rate is measured.</p>
<p>The next part of the series will explore some patterns and technical details to reduce the risk and control the rate in both sync and async designs.</p>
]]></content:encoded></item><item><title><![CDATA[Automating Dependencies Upgrade]]></title><description><![CDATA[One of the biggest challenges in software development is keeping track of the dependencies’ version releases and following the most up-to-date versions as soon as possible. While this is critical toward reducing technical debt in the long term, it ca...]]></description><link>https://blogs.serverlessfolks.com/automating-dependencies-upgrade</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/automating-dependencies-upgrade</guid><category><![CDATA[dependencies]]></category><category><![CDATA[upgrade]]></category><category><![CDATA[dependabot]]></category><category><![CDATA[technical-debt]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Wed, 19 Feb 2025 19:50:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1740948880398/524dad89-8cda-49bf-b5f4-3b415396b9dc.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One of the biggest challenges in software development is keeping track of the dependencies’ version releases and following the most up-to-date versions as soon as possible. While this is critical toward reducing technical debt in the long term, it can raise some challenging moments and a lot of effort to check, upgrade, merge, test, and release. Those dependencies can have different levels of criticality and can impact the build or run phase of software.</p>
<p>Some typical dependencies are:</p>
<ul>
<li><p>Libraries and Frameworks</p>
</li>
<li><p>Container Images</p>
</li>
<li><p>Runtimes</p>
</li>
</ul>
<h2 id="heading-importance-and-necessity">Importance and necessity</h2>
<p>It seems important, on paper, to keep everything up to date all the time, but in reality, the importance depends on the following axes:</p>
<ul>
<li><p>Criticality: <code>How Critical is that update?</code></p>
</li>
<li><p>Life Cycle: <code>How frequently are new versions released?</code></p>
</li>
<li><p>Effort: <code>How many projects shall be changed?</code></p>
</li>
</ul>
<p>Let’s explore a bit more. Some dependencies are used in many projects, such as <code>@types/node</code>, but it is used in the build phase, and its release rate is low enough. It is rarely released as a major version upgrade and follows the Node runtime’s versions, but minor and patch versions are released regularly. Another example is <code>@aws-sdk/client-dynamodb</code>: while the rate of releases is lower, it impacts the runtime, and once a change is definitively required, the whole organization will put lots of effort into updating. Another important note is how critical the updates rolled out for a package are. Do they respond to a security issue or not?</p>
<h2 id="heading-enterprise-level-ecosystem">Enterprise Level Ecosystem</h2>
<p>An ecosystem represents the overall system leading to any level of revenue. That system runs on top of many components that collaborate to achieve the final asset, and all those components communicate via defined interfaces called contracts. This article is not about the contracts themselves, but about the released assets settled on the two sides of any single line.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739913226086/d468cfb4-3192-4a03-a4d0-fac5d31c91af.png" alt class="image--center mx-auto" /></p>
<p>At first glance, those boxes are some autonomous services communicating together, but what a service means to each engineer counts a lot. Some consider a service a dynamic component with a database, residing on a separate isolated tier with some business logic; others consider that a service can have different levels of granularity and flexibility. There are many options. It might be only some complex business logic without any database, protocol, or memory, such as a library consumed at runtime to help extract some metadata from a web URL path, or it can be a container running as a gateway proxy to some partner services.</p>
<p>Looking at a single box and imagining the potential dependencies it may have can be a fun part of the game. The following figure tries to explode a single box into as many pieces as possible.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739921436582/05fef1a0-9f34-4526-a52b-242173649be0.png" alt class="image--center mx-auto" /></p>
<p>The above list is not an exhaustive list of all dependencies, but the ones I have suffered with in my experience. Let’s explore the challenges each one brings.</p>
<p><strong>Build time:</strong></p>
<ul>
<li><p><strong>Execution Container</strong>: These are the application Dockerfiles built from base images such as <code>public.ecr.aws/docker/library/node:22.13.1-slim</code>. The runtime base images evolve less frequently than base images such as <code>public.ecr.aws/awsguru/aws-lambda-adapter:0.9.0</code>, which change more often.</p>
</li>
<li><p><strong>CDK</strong>: The CDK libraries are used at development time and evolve frequently, as they encapsulate various services and infrastructure. They also release new versions whenever a security issue is reported.</p>
</li>
<li><p><strong>Delivery Dependencies</strong>: These evolve less frequently (such as GitHub Actions) but can be hard to track and keep up to date, as they are often declarative and out of sight.</p>
</li>
<li><p><strong>Compiler &amp; Bundler:</strong> In reality, bundlers and compilers are often configured once and keep working for years, but the rate of bundler releases is high enough that it can never be tracked closely enough to define the real upgrade necessity.</p>
</li>
<li><p><strong>Test Runners and Linters:</strong> These are packages/libraries with a normal change rate; they never impact the production environment. However, it is nice to keep them updated.</p>
</li>
<li><p><strong>L3 constructs:</strong> These are custom CDK constructs developed internally, and they might have a higher or lower release rate based on the problem they solve. The tricky part is when they rely on the official CDK libraries, because not keeping them updated causes dependency version conflicts.</p>
</li>
</ul>
<p><strong>Run time:</strong></p>
<ul>
<li><p><strong>Execution Runtime:</strong> New runtimes are not released as often as other dependencies, but EOL dates must be tracked, and the necessary actions and upgrades must be carried out.</p>
</li>
<li><p><strong>Libraries</strong>: Keeping them up to date in the JS ecosystem is a must. The JS ecosystem is big and often driven by the community, which means more people with different knowledge and cultures can contribute, and hidden issues can be introduced in terms of functionality or security. The interesting point, though, is that the resolution of those issues is often fast enough to cover the introduced issue.</p>
</li>
<li><p><strong>SDKs</strong>: They provide simple-to-use interfaces that abstract the real complexity. These are similar to libraries but are often maintained in a structured way and under enterprise ownership. The rate of releases is usually low.</p>
</li>
<li><p><strong>Frameworks</strong>: It is hard to call frameworks runtime dependencies, as they are transpiled to JS; however, they can impact the runtime environment.</p>
</li>
<li><p><strong>Database Engines:</strong> This category is the hardest part in my experience, as the upgrades often need full regression testing. Some challenging engines are PostgreSQL, MySQL, and OpenSearch. The rate of major releases is often low, such as once or twice a year; minors and patches are released more frequently. There are options to auto-upgrade patch or minor versions when using managed services, but some instability can be experienced depending on the engine and other factors.</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">In this article, we explore the automation possibilities for the above list, except the Database Engines and Execution Runtime, which need to be upgraded with control and planning.</div>
</div>

<h2 id="heading-github-dependabot">GitHub Dependabot</h2>
<p>One of the best tools to simplify the process and keep dependencies up to date is <code>Dependabot</code>. It is a free tool maintained by GitHub. It covers a variety of package managers and bundlers: for dependencies, it covers the most known ones such as <code>npm</code>, <code>pnpm</code>, and <code>yarn</code>, but also <code>docker</code>.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">The abovementioned dependencies will be explored in this article.</div>
</div>

<p>The configuration gives enough flexibility to manage a variety of scenarios such as single or mono-repo setups, private or public dependencies, development or production dependencies, etc.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">UPDATE: <a target="_self" href="https://www.linkedin.com/in/andmoredev/">Andres Moreno</a> raised an important point related to Dependabot's access to environment variables and secrets: the only secret Dependabot has access to is GITHUB_TOKEN. For all other needs, variables and secrets must be configured separately for Dependabot, as it has no access to repository secrets or environment variables.</div>
</div>

<h2 id="heading-npm-dependencies">Npm Dependencies</h2>
<p>To manage npm dependencies, Dependabot looks at the repository, detects the package manager by the presence of <code>package-lock.json</code>, and checks each package's version against the registry to find whether a newer version is available. The following example shows a simple Dependabot config.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">updates:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">package-ecosystem:</span> <span class="hljs-string">'npm'</span>
    <span class="hljs-attr">directory:</span> <span class="hljs-string">'/'</span>
    <span class="hljs-attr">schedule:</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">'weekly'</span>
    <span class="hljs-attr">commit-message:</span>
      <span class="hljs-attr">prefix:</span> <span class="hljs-string">"chore"</span>
      <span class="hljs-attr">include:</span> <span class="hljs-string">"scope"</span>
</code></pre>
<h2 id="heading-mono-repo-dependencies">Mono Repo dependencies</h2>
<p>Dependabot will use the pnpm process if it finds <code>pnpm-lock.yaml</code>; the <code>package-ecosystem</code> for <code>pnpm</code> is the same as for <code>npm</code>: both must have the <code>npm</code> value. By default, using the above config, it will run in all modules and raise pull requests for upgrading packages. But if any grouping is configured, it is applied correctly to the root-level package.json, while all modules still get single PRs per dependency, which is not ideal.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">updates:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">package-ecosystem:</span> <span class="hljs-string">'npm'</span>
    <span class="hljs-attr">directories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'packages/**'</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'/'</span>
    <span class="hljs-attr">schedule:</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">'weekly'</span>
    <span class="hljs-attr">commit-message:</span>
      <span class="hljs-attr">prefix:</span> <span class="hljs-string">"chore"</span>
      <span class="hljs-attr">include:</span> <span class="hljs-string">"scope"</span>
</code></pre>
<h2 id="heading-grouping-dependencies">Grouping Dependencies</h2>
<p>By default, Dependabot creates a pull request per package, which can become cumbersome and hard to manage, with conflicts to resolve, etc. Grouping allows raising a single pull request for a configured set of dependencies, as shown below.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">updates:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">package-ecosystem:</span> <span class="hljs-string">'npm'</span>
    <span class="hljs-attr">directories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'packages/**'</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'/'</span>
    <span class="hljs-attr">schedule:</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">'weekly'</span>
    <span class="hljs-attr">groups:</span>
      <span class="hljs-attr">dependencies:</span>
        <span class="hljs-attr">update-types:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">'patch'</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">'minor'</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">'major'</span>
    <span class="hljs-attr">commit-message:</span>
      <span class="hljs-attr">prefix:</span> <span class="hljs-string">"chore"</span>
      <span class="hljs-attr">include:</span> <span class="hljs-string">"scope"</span>
</code></pre>
<h2 id="heading-dev-and-prod-dependencies">Dev and Prod Dependencies</h2>
<p>To separate the build and run dependencies, you can give those dependencies different groupings and have a separate PR per group; this makes it easier to take action and test based on each group's PR.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">updates:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">package-ecosystem:</span> <span class="hljs-string">'npm'</span>
    <span class="hljs-attr">directories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'packages/**'</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'/'</span>
    <span class="hljs-attr">schedule:</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">'weekly'</span>
    <span class="hljs-attr">groups:</span>
      <span class="hljs-attr">prod-dependencies:</span>
        <span class="hljs-attr">dependency-type:</span> <span class="hljs-string">'production'</span>
      <span class="hljs-attr">dev-dependencies:</span>
        <span class="hljs-attr">dependency-type:</span> <span class="hljs-string">'development'</span>
    <span class="hljs-attr">commit-message:</span>
      <span class="hljs-attr">prefix:</span> <span class="hljs-string">"chore"</span>
      <span class="hljs-attr">include:</span> <span class="hljs-string">"scope"</span>
</code></pre>
<p>The above config groups development and production dependencies in different processes and raises separate PRs for each group.</p>
<h2 id="heading-isolating-major-minor-and-patch-versions">Isolating Major, Minor and Patch versions</h2>
<p>It is a good practice to separate the production dependencies (those impacting runtime) from development ones; however, in each category, the major versions are the most feared, and separating them from the minor and patch version updates will save time, energy, and mental health.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">updates:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">package-ecosystem:</span> <span class="hljs-string">'npm'</span>
    <span class="hljs-attr">directories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'packages/**'</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'/'</span>
    <span class="hljs-attr">schedule:</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">'weekly'</span>
    <span class="hljs-attr">groups:</span>
      <span class="hljs-attr">prod-dependencies:</span>
        <span class="hljs-attr">dependency-type:</span> <span class="hljs-string">'production'</span>
        <span class="hljs-attr">update-types:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"minor"</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"patch"</span>
      <span class="hljs-attr">dev-dependencies:</span>
        <span class="hljs-attr">dependency-type:</span> <span class="hljs-string">'development'</span>
        <span class="hljs-attr">update-types:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"minor"</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"patch"</span>
    <span class="hljs-attr">ignore:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dependency-name:</span> <span class="hljs-string">"*"</span>
        <span class="hljs-attr">update-types:</span> [ <span class="hljs-string">"version-update:semver-major"</span> ]
    <span class="hljs-attr">commit-message:</span>
      <span class="hljs-attr">prefix:</span> <span class="hljs-string">"chore"</span>
      <span class="hljs-attr">include:</span> <span class="hljs-string">"scope"</span>

  <span class="hljs-bullet">-</span> <span class="hljs-attr">package-ecosystem:</span> <span class="hljs-string">"npm"</span>
    <span class="hljs-attr">directories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'packages/**'</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'/'</span>
    <span class="hljs-attr">target-branch:</span> <span class="hljs-string">"prod"</span>
    <span class="hljs-attr">schedule:</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">"monthly"</span>
    <span class="hljs-attr">groups:</span>
      <span class="hljs-attr">prod-dependencies:</span>
        <span class="hljs-attr">dependency-type:</span> <span class="hljs-string">"production"</span>
        <span class="hljs-attr">update-types:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"major"</span>
      <span class="hljs-attr">dev-dependencies:</span>
        <span class="hljs-attr">dependency-type:</span> <span class="hljs-string">"development"</span>
        <span class="hljs-attr">update-types:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"major"</span>
    <span class="hljs-attr">ignore:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dependency-name:</span> <span class="hljs-string">"*"</span>
        <span class="hljs-attr">update-types:</span> [ 
          <span class="hljs-string">"version-update:semver-minor"</span>,
          <span class="hljs-string">"version-update:semver-patch"</span>
        ]
    <span class="hljs-attr">commit-message:</span>
      <span class="hljs-attr">prefix:</span> <span class="hljs-string">"chore"</span>
      <span class="hljs-attr">include:</span> <span class="hljs-string">"scope"</span>
</code></pre>
<p>The first update block of the above config ignores all semver-major updates and only looks for minor and patch updates, while the second block handles major updates separately, on a monthly schedule, against the <code>prod</code> branch.</p>
<h2 id="heading-docker-dependencies">Docker Dependencies</h2>
<p>Dependabot upgrades Dockerfiles by looking at the stages (<code>FROM</code> instructions) and applies updates as configured. In the following example, the major versions are excluded imperatively, as the example relies on the Node.js base image; however, a major-only config (excluding minor and patch) can be added if the base image's major releases are desired.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">updates:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">package-ecosystem:</span> <span class="hljs-string">"docker"</span>
    <span class="hljs-attr">directories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'packages/**'</span>
    <span class="hljs-attr">schedule:</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">"weekly"</span>
    <span class="hljs-attr">ignore:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dependency-name:</span> <span class="hljs-string">"*"</span>
        <span class="hljs-attr">update-types:</span> [ <span class="hljs-string">"version-update:semver-major"</span> ]
    <span class="hljs-attr">commit-message:</span>
      <span class="hljs-attr">prefix:</span> <span class="hljs-string">"chore"</span>
      <span class="hljs-attr">include:</span> <span class="hljs-string">"scope"</span>
</code></pre>
<h2 id="heading-private-and-public-registries">Private and Public Registries</h2>
<p>By default, public registries such as <code>https://registry.npmjs.org/</code> are checked, but private registries can be considered by adding them in the Dependabot <code>registries</code> section. When using <code>pnpm</code>, adding a <code>.npmrc</code> file helps eliminate Dependabot's confusion when it looks at both public and private dependencies and applies the fallback pattern if a package is not found.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">registries:</span>
  <span class="hljs-attr">npm-github:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">npm-registry</span>
    <span class="hljs-attr">url:</span> <span class="hljs-string">https://npm.pkg.github.com</span>
    <span class="hljs-attr">token:</span> <span class="hljs-string">${{secrets.MY_GITHUB_TOKEN}}</span>

<span class="hljs-attr">updates:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">package-ecosystem:</span> <span class="hljs-string">'npm'</span>
    <span class="hljs-attr">directories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'packages/**'</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'/'</span>
    <span class="hljs-attr">schedule:</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">'weekly'</span>
    <span class="hljs-attr">registries:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">npm-github</span>
    <span class="hljs-attr">ignore:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dependency-name:</span> <span class="hljs-string">"*"</span>
        <span class="hljs-attr">update-types:</span> [ <span class="hljs-string">"version-update:semver-major"</span> ]
    <span class="hljs-attr">groups:</span>
      <span class="hljs-attr">prod-dependencies:</span>
        <span class="hljs-attr">dependency-type:</span> <span class="hljs-string">'production'</span>
        <span class="hljs-attr">update-types:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"minor"</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"patch"</span>
      <span class="hljs-attr">dev-dependencies:</span>
        <span class="hljs-attr">dependency-type:</span> <span class="hljs-string">'development'</span>
        <span class="hljs-attr">update-types:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"minor"</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"patch"</span>
    <span class="hljs-attr">commit-message:</span>
      <span class="hljs-attr">prefix:</span> <span class="hljs-string">"chore"</span>
      <span class="hljs-attr">include:</span> <span class="hljs-string">"scope"</span>
</code></pre>
<p>The <code>.npmrc</code> file should look like the following example:</p>
<pre><code class="lang-yaml"><span class="hljs-string">registry=https://registry.npmjs.org</span>
<span class="hljs-string">@xaaxaax:registry=https://npm.pkg.github.com</span>
</code></pre>
<p>This directs Dependabot to resolve scoped packages (starting with <code>@xaaxaax</code>) from GitHub Packages and everything else from the public registry.</p>
<h2 id="heading-other-docker-image-registries">Other Docker image registries</h2>
<p>In scenarios where base images come from other registries, such as <code>public.ecr.aws</code>, those can be configured as a registry as well. Dependabot requires a username and password for Docker registries, but some public registries don't need credentials; providing placeholder values for these two parameters works around the requirement.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">registries:</span>
  <span class="hljs-attr">ecr-publichub:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">"docker-registry"</span>
    <span class="hljs-attr">url:</span> <span class="hljs-string">"https://public.ecr.aws"</span>
    <span class="hljs-attr">username:</span> <span class="hljs-string">"fakeit"</span>
    <span class="hljs-attr">password:</span> <span class="hljs-string">"fakeit"</span>

<span class="hljs-attr">updates:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">package-ecosystem:</span> <span class="hljs-string">"docker"</span>
    <span class="hljs-attr">directories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'packages/**'</span>
    <span class="hljs-attr">schedule:</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">"weekly"</span>
    <span class="hljs-attr">registries:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">ecr-publichub</span>
    <span class="hljs-attr">ignore:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dependency-name:</span> <span class="hljs-string">"*"</span>
        <span class="hljs-attr">update-types:</span> [ <span class="hljs-string">"version-update:semver-major"</span> ]
    <span class="hljs-attr">commit-message:</span>
      <span class="hljs-attr">prefix:</span> <span class="hljs-string">"chore"</span>
      <span class="hljs-attr">include:</span> <span class="hljs-string">"scope"</span>
</code></pre>
<h2 id="heading-automating-pull-request-actions">Automating Pull Request Actions</h2>
<p>If the project has enough safeguards to validate the viability of dependency upgrades, the last parts of the process can be automated as well, such as approving and merging pull requests. This can be achieved with a GitHub workflow: the process inspects the pull request metadata, validates a few details, then approves and merges the PR.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">Dependabot</span> <span class="hljs-string">Automated</span> <span class="hljs-string">Pull</span> <span class="hljs-string">Requests</span>
<span class="hljs-attr">on:</span> 
  <span class="hljs-attr">pull_request:</span>
    <span class="hljs-attr">types:</span> [<span class="hljs-string">opened</span>, <span class="hljs-string">reopened</span>]

<span class="hljs-attr">permissions:</span>
  <span class="hljs-attr">pull-requests:</span> <span class="hljs-string">write</span>
  <span class="hljs-attr">contents:</span> <span class="hljs-string">write</span>
  <span class="hljs-attr">issues:</span> <span class="hljs-string">write</span>
  <span class="hljs-attr">repository-projects:</span> <span class="hljs-string">write</span>

<span class="hljs-attr">env:</span>
  <span class="hljs-attr">DEPS_SCOPE:</span> <span class="hljs-string">'production'</span>
  <span class="hljs-attr">MAJOR_UPDATE:</span> <span class="hljs-string">'false'</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">dependabot:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">if:</span> <span class="hljs-string">github.event.pull_request.user.login</span> <span class="hljs-string">==</span> <span class="hljs-string">'dependabot[bot]'</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span> <span class="hljs-string">code</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">fetch-depth:</span> <span class="hljs-number">0</span>
          <span class="hljs-attr">token:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.GITHUB_TOKEN</span> <span class="hljs-string">}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Install</span> <span class="hljs-string">pnpm</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">pnpm/action-setup@v4</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">version:</span> <span class="hljs-string">latest</span>
          <span class="hljs-attr">run_install:</span> <span class="hljs-literal">false</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Install</span> <span class="hljs-string">dependencies</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">pnpm</span> <span class="hljs-string">install</span> <span class="hljs-string">--no-frozen-lockfile</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Build</span> 
        <span class="hljs-attr">run:</span> <span class="hljs-string">pnpm</span> <span class="hljs-string">run</span> <span class="hljs-string">build:all</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Synth</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">pnpm</span> <span class="hljs-string">run</span> <span class="hljs-string">synth:all</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Test</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">pnpm</span> <span class="hljs-string">run</span> <span class="hljs-string">test:all</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Dependabot</span> <span class="hljs-string">metadata</span>
        <span class="hljs-attr">id:</span> <span class="hljs-string">metadata</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">dependabot/fetch-metadata@v2</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">github-token:</span> <span class="hljs-string">"$<span class="hljs-template-variable">{{ secrets.GITHUB_TOKEN }}</span>"</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Define</span> <span class="hljs-string">Dependencies</span> <span class="hljs-string">scope</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          echo "DEPS_SCOPE=${{ steps.metadata.outputs.dependency-type == 'direct:development' &amp;&amp; 'development' || 'production' }}" &gt;&gt; $GITHUB_ENV
          echo "MAJOR_UPDATE=${{ steps.metadata.outputs.update-type == 'version-update:semver-major' &amp;&amp; 'true' || 'false' }}" &gt;&gt; $GITHUB_ENV
</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Create</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.DEPS_SCOPE</span> <span class="hljs-string">}}</span> <span class="hljs-string">label</span>
        <span class="hljs-attr">continue-on-error:</span> <span class="hljs-literal">true</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">gh</span> <span class="hljs-string">label</span> <span class="hljs-string">create</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.DEPS_SCOPE</span> <span class="hljs-string">}}</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">GH_TOKEN:</span> <span class="hljs-string">${{secrets.GITHUB_TOKEN}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Add</span> <span class="hljs-string">a</span> <span class="hljs-string">label</span> <span class="hljs-string">for</span> <span class="hljs-string">all</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.DEPS_SCOPE</span> <span class="hljs-string">}}</span> <span class="hljs-string">dependencies</span>
        <span class="hljs-attr">continue-on-error:</span> <span class="hljs-literal">true</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">gh</span> <span class="hljs-string">pr</span> <span class="hljs-string">edit</span> <span class="hljs-string">"$PR_URL"</span> <span class="hljs-string">--add-label</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.DEPS_SCOPE</span> <span class="hljs-string">}}</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">PR_URL:</span> <span class="hljs-string">${{github.event.pull_request.html_url}}</span>
          <span class="hljs-attr">GH_TOKEN:</span> <span class="hljs-string">${{secrets.GITHUB_TOKEN}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Approve</span> <span class="hljs-string">a</span> <span class="hljs-string">PR</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">gh</span> <span class="hljs-string">pr</span> <span class="hljs-string">review</span> <span class="hljs-string">--approve</span> <span class="hljs-string">"$PR_URL"</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">PR_URL:</span> <span class="hljs-string">${{github.event.pull_request.html_url}}</span>
          <span class="hljs-attr">GH_TOKEN:</span> <span class="hljs-string">${{secrets.GITHUB_TOKEN}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Enable</span> <span class="hljs-string">auto-merge</span> <span class="hljs-string">for</span> <span class="hljs-string">Dependabot</span> <span class="hljs-string">PRs</span>
        <span class="hljs-attr">if:</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.MAJOR_UPDATE</span> <span class="hljs-type">!=</span> <span class="hljs-string">'true'</span> <span class="hljs-string">}}</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">gh</span> <span class="hljs-string">pr</span> <span class="hljs-string">merge</span> <span class="hljs-string">--auto</span> <span class="hljs-string">--rebase</span> <span class="hljs-string">--delete-branch</span> <span class="hljs-string">"$PR_URL"</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">PR_URL:</span> <span class="hljs-string">${{github.event.pull_request.html_url}}</span>
          <span class="hljs-attr">GH_TOKEN:</span> <span class="hljs-string">${{secrets.GITHUB_TOKEN}}</span>
</code></pre>
<p>The above workflow is triggered each time a pull request is opened or reopened by the <code>dependabot[bot]</code> user, which is Dependabot's default user. The pull requests are built and tested, and when the change is not a major version bump, they are automatically merged.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>While software development benefits from internally shared or community-driven dependencies, they can become an obstacle when an upgrade is required. Whether a dependency upgrade is considered necessary depends on many factors.</p>
<p>While those factors matter, weighing them often makes things more complex than simply upgrading dependencies frequently. Dependency upgrades just need to be categorized and automated as much as possible. An upgrade may or may not force software changes; in either case, small incremental changes are easier to handle than revisiting the whole implementation, so keeping track of versions and upgrading regularly keeps both effort and stress to a minimum.</p>
]]></content:encoded></item><item><title><![CDATA[Sidecar Pattern In Serverless Design]]></title><description><![CDATA[Observability becomes significantly more challenging when transitioning to distributed systems, particularly in Serverless architectures. While serverless design is beneficial for decomposition and scalability, its granular nature imposes challenges ...]]></description><link>https://blogs.serverlessfolks.com/sidecar-pattern-in-serverless-design</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/sidecar-pattern-in-serverless-design</guid><category><![CDATA[lambda-web-adapter]]></category><category><![CDATA[lambda custom image]]></category><category><![CDATA[sidecar-container]]></category><category><![CDATA[Lambda Extension]]></category><category><![CDATA[serverless]]></category><category><![CDATA[aws-fargate]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Sat, 01 Feb 2025 17:45:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1738431768757/c7aa50e7-675c-4337-9b9a-4d4de6febd64.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Observability becomes significantly more challenging when transitioning to distributed systems, particularly in Serverless architectures. While serverless design is beneficial for decomposition and scalability, its granular nature imposes challenges for observability. Therefore, it is important always to find ways to instrument the software without tightly coupling the instrumentation to the dedicated environment for core software processing.</p>
<p>This article explores two serverless computing services, AWS Lambda and AWS Fargate, and offers straightforward methods to centralize logging for serverless applications:</p>
<ul>
<li><p>Cloudwatch Subscription Filters</p>
</li>
<li><p>AWS Lambda Zip package with Extensions</p>
</li>
<li><p>AWS Lambda Custom Image with Extensions</p>
</li>
<li><p>AWS Lambda Web Adapter Image with Extensions</p>
</li>
<li><p>AWS Fargate with sidecar</p>
</li>
</ul>
<h2 id="heading-about-provided-source-code">About Provided Source Code</h2>
<p>The complete examples can be found in the <a target="_blank" href="https://github.com/XaaXaaX/aws-serverless-sidecar-logs-aggregation">GitHub repository</a>.</p>
<p>The source code is designed as a mono-repo using <code>NX</code> and <code>pnpm</code>. The <code>core</code> package is a private lib that shares some central helpers with other modules. The <code>observability-core</code> module is the prerequisite for all other modules and provides:</p>
<ul>
<li><p><strong>Lambda Extension Layer</strong></p>
</li>
<li><p><strong>ECR Repository</strong></p>
</li>
<li><p><strong>Base Container Image with Extension</strong></p>
</li>
<li><p><strong>Kinesis data Stream</strong></p>
</li>
<li><p><strong>IAM managed policy.</strong></p>
</li>
</ul>
<p>The dependencies are configured via the <code>nx.json</code> file in the root of the repository for the <code>cdk</code> and <code>build</code> targets. This forces the prerequisite module to be built and deployed before the other modules.</p>
<pre><code class="lang-json"><span class="hljs-string">"targetDefaults"</span>: {
    <span class="hljs-attr">"cdk"</span>: {
      <span class="hljs-attr">"dependsOn"</span>: [
        {
          <span class="hljs-attr">"projects"</span>: <span class="hljs-string">"@xaaxaax/observability-core"</span>,
          <span class="hljs-attr">"target"</span>: <span class="hljs-string">"cdk"</span>,
          <span class="hljs-attr">"params"</span>: <span class="hljs-string">"forward"</span>,
          <span class="hljs-attr">"required"</span>: [ <span class="hljs-string">"projects"</span>, <span class="hljs-string">"target"</span> ]
        }
      ]
    },
    <span class="hljs-attr">"build"</span>: {
      <span class="hljs-attr">"dependsOn"</span>: [
        {
          <span class="hljs-attr">"projects"</span>: <span class="hljs-string">"@xaaxaax/observability-core"</span>,
          <span class="hljs-attr">"target"</span>: <span class="hljs-string">"build"</span>,
          <span class="hljs-attr">"params"</span>: <span class="hljs-string">"forward"</span>,
          <span class="hljs-attr">"required"</span>: [ <span class="hljs-string">"projects"</span>, <span class="hljs-string">"target"</span> ]
        }
      ]
    }
  },
</code></pre>
<p>The targets are defined in the scripts section of the root <code>package.json</code> file, as below:</p>
<pre><code class="lang-json">{
  ...
  <span class="hljs-attr">"scripts"</span>: {
     <span class="hljs-attr">"nx:build:all"</span>: <span class="hljs-string">"nx run-many --target=build --output-style static --skip-nx-cache"</span>,
     <span class="hljs-attr">"nx:cdk:all"</span>: <span class="hljs-string">"nx run-many --target=cdk --output-style static --skip-nx-cache --require-approval never"</span>,
  },
  ...
}
</code></pre>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-serverless-sidecar-logs-aggregation">https://github.com/XaaXaaX/aws-serverless-sidecar-logs-aggregation</a></div>
<p> </p>
<p>For simplicity, the created functions are configured with a function URL and can be triggered easily. The only caveat is that a function gets invoked twice when triggered from a browser, since the browser issues an extra request for <code>favicon.ico</code>.</p>
<h2 id="heading-cloudwatch-subscription-filters">Cloudwatch Subscription Filters</h2>
<p>AWS services like Lambda and Fargate have native integration with CloudWatch, but for critical workloads, the cost of log ingestion can become prohibitive. Depending on usage needs, CloudWatch logs can be utilized selectively, with different approaches available.</p>
<p>When using CloudWatch, Subscription Filters offer a way to forward logs to various destinations, including OpenSearch, Kinesis Data Streams, Amazon Data Firehose, or AWS Lambda.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737066078177/f576e8a3-e6ee-45c2-a462-68f7a3002e58.png" alt class="image--center mx-auto" /></p>
<p>In this section, CloudWatch Subscription Filters are used to stream logs to an Amazon Kinesis Data Stream for further processing and analysis.</p>
<p>The following code snippet showcases how to implement this log forwarding mechanism using AWS CDK:</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">const</span> logGroup = <span class="hljs-keyword">new</span> LogGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">'LogGroup'</span>, {
      logGroupName: <span class="hljs-string">`/aws/lambda/<span class="hljs-subst">${lambdaWithCloudwatch.functionName}</span>`</span>,
      retention: RetentionDays.ONE_DAY,
      removalPolicy: RemovalPolicy.DESTROY,
    });

    <span class="hljs-keyword">const</span> logsDeliveryRole = <span class="hljs-keyword">new</span> Role(<span class="hljs-built_in">this</span>, <span class="hljs-string">`LogsDeliveryRole`</span>, { 
      assumedBy: <span class="hljs-keyword">new</span> ServicePrincipal(<span class="hljs-string">'logs.amazonaws.com'</span>),
    });

    logGroup.addSubscriptionFilter(<span class="hljs-string">'SubscriptionFilter'</span>, {
      destination: <span class="hljs-keyword">new</span> KinesisDestination(LogStream,{
        role: logsDeliveryRole
      }),
      filterPattern: {
        logPatternString: <span class="hljs-string">' '</span>, <span class="hljs-comment">// this configure all logs to be filtered</span>
      }
    })

    LogStream.grantWrite(logsDeliveryRole);
</code></pre>
<p>Invoking the Lambda function results in log records being sent to Kinesis via CloudWatch subscription filters, as shown in the following figure.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737109967515/a317fe9e-1a14-47b3-b1dd-2915c9ae5630.png" alt class="image--center mx-auto" /></p>
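<p>As a side note for consumers of this stream: CloudWatch Logs gzip-compresses the payload it delivers through a subscription filter, so a Kinesis consumer must base64-decode and gunzip each record before parsing the JSON. A minimal sketch (the log group, stream, and message values are illustrative, not taken from the repository):</p>

```typescript
import { gunzipSync, gzipSync } from 'node:zlib';

// Shape of the payload CloudWatch Logs delivers through a subscription filter.
interface CloudWatchLogsPayload {
  messageType: string; // 'DATA_MESSAGE' for log data, 'CONTROL_MESSAGE' for probes
  logGroup: string;
  logStream: string;
  logEvents: { id: string; timestamp: number; message: string }[];
}

// Kinesis hands the consumer base64 data; the content itself is gzip-compressed JSON.
export const decodeSubscriptionRecord = (base64Data: string): CloudWatchLogsPayload =>
  JSON.parse(gunzipSync(Buffer.from(base64Data, 'base64')).toString('utf8'));

// Local round-trip demonstration with a synthetic payload (no AWS call involved).
const sample: CloudWatchLogsPayload = {
  messageType: 'DATA_MESSAGE',
  logGroup: '/aws/lambda/example-fn',
  logStream: '2025/01/30/[$LATEST]0123456789',
  logEvents: [{ id: '1', timestamp: 1738195200000, message: 'Log Message HERE' }],
};
const encoded = gzipSync(Buffer.from(JSON.stringify(sample))).toString('base64');
console.log(decodeSubscriptionRecord(encoded).logEvents[0].message); // prints "Log Message HERE"
```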
<h2 id="heading-lambda-telemetry-api">Lambda Telemetry Api</h2>
<p>Lambda offers a Telemetry API, which is an excellent choice for capturing function log records without relying on CloudWatch Logs. The logs received through the Telemetry API follow a straightforward format, as shown below.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"time"</span>: <span class="hljs-string">"2025-01-30T00:00:00.000Z"</span>, 
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"function"</span>, <span class="hljs-comment">// function, extension, platform</span>
    <span class="hljs-attr">"record"</span>: {
       <span class="hljs-attr">"timestamp"</span>: <span class="hljs-string">"2025-01-30T00:00:09.429Z"</span>,
       <span class="hljs-attr">"level"</span>: <span class="hljs-string">"INFO"</span>,
       <span class="hljs-attr">"requestId"</span>: <span class="hljs-string">"79b4f56e-95b1-4643-9700-2807f4e68189"</span>,
       <span class="hljs-attr">"message"</span>: <span class="hljs-string">"Log Message HERE"</span>
    }
}
</code></pre>
<p>If the Lambda <code>LogFormat</code> is TEXT, the received record looks like the following snippet.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"time"</span>: <span class="hljs-string">"2025-01-30T00:00:00.000Z"</span>, 
    <span class="hljs-comment">// function, extension, platform</span>
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"function"</span>, 
    <span class="hljs-comment">//  Timestamp \t RequestId \t Type \t Message</span>
    <span class="hljs-attr">"record"</span>: <span class="hljs-string">"2025-01-30T00:00:09.429Z 79b4f56e-95b1-4643-9700-2807f4e68189 [INFO] Log Message HERE"</span> 
}
</code></pre>
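<p>Since the TEXT format is a single tab-separated string rather than a structured object, an extension that wants uniform records has to parse it back into fields. A small sketch of such a parser, assuming the <code>Timestamp \t RequestId \t Type \t Message</code> layout annotated above:</p>

```typescript
// Parsed form of a Telemetry API 'function' record emitted with LogFormat TEXT.
interface ParsedTextRecord {
  timestamp: string;
  requestId: string;
  level: string;
  message: string;
}

// Split on tabs; anything after the third tab belongs to the free-form message.
export const parseTextRecord = (record: string): ParsedTextRecord => {
  const [timestamp, requestId, level, ...rest] = record.split('\t');
  return { timestamp, requestId, level, message: rest.join('\t') };
};

const sample =
  '2025-01-30T00:00:09.429Z\t79b4f56e-95b1-4643-9700-2807f4e68189\t[INFO]\tLog Message HERE';
console.log(parseTextRecord(sample).message); // prints "Log Message HERE"
```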
<p><img src="https://github.com/aws-samples/aws-lambda-extensions/blob/main/nodejs-example-telemetry-api-extension/sample-extension-seq-diagram.png?raw=true" alt="sample-extension-seq-diagram.png" /></p>
<h2 id="heading-lambda-extensions">Lambda Extensions</h2>
<p>For high-throughput applications, relying on CloudWatch Logs can lead to substantial costs. One way to mitigate this is to deny CloudWatch Logs permissions in the Lambda execution role. This prevents the Lambda service from sending logs to CloudWatch, and consequently prevents the use of subscription filters.</p>
<p>However, Lambda provides a Telemetry API that captures all logs, even when CloudWatch logging is disabled. By using the Extension API, you can subscribe to the Telemetry API and register for specific log categories, such as platform, function, or extension logs.</p>
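<p>For orientation, a Telemetry API subscription is an HTTP request the extension sends to the runtime API endpoint, carrying the log types it wants and the in-sandbox address where batches should be delivered. The sketch below only builds that request body; the schema version and field names follow the public Telemetry API documentation, while the buffering values and listener port are illustrative defaults, not the repository's exact configuration:</p>

```typescript
// Body of a Telemetry API subscription request. The extension subscribes after
// registering through the Extensions API and runs an HTTP listener in the sandbox.
interface TelemetrySubscription {
  schemaVersion: string;
  types: ('platform' | 'function' | 'extension')[];
  buffering: { maxItems: number; maxBytes: number; timeoutMs: number };
  destination: { protocol: 'HTTP'; URI: string };
}

export const buildSubscription = (listenerPort: number): TelemetrySubscription => ({
  schemaVersion: '2022-12-13',
  types: ['platform', 'function'], // register for platform and function log categories
  buffering: { maxItems: 1000, maxBytes: 262144, timeoutMs: 100 },
  // Lambda POSTs log batches to this in-sandbox listener address.
  destination: { protocol: 'HTTP', URI: `http://sandbox.localdomain:${listenerPort}` },
});

console.log(buildSubscription(4243).destination.URI);
```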
<p>The source code repository provides the extension module <a target="_blank" href="https://github.com/XaaXaaX/aws-serverless-sidecar-logs-aggregation/tree/main/packages/observability-core">here</a>; it is used in the Lambda Zip package, Custom Image, and Web Adapter image sections.</p>
<p>The execution of extensions, whether as a standard ZIP package or a custom image, is managed by the Lambda runtime. The Lambda service scans the <code>/opt/extensions</code> directory and automatically executes any extensions found in that location.</p>
<p>For ZIP package deployments, this attachment occurs during the Lambda initialization phase, where the extensions path is constructed by aggregating all attached layers. However, for custom images, this structure must be manually set up during the container image build process.</p>
<p>This project generates two final assets from the same extension source, along with other previously mentioned resources. The extension itself is built using <code>esbuild</code> and bundled as JavaScript, with a post-build script handling the folder structure setup.</p>
<p>The provided final assets are:</p>
<ul>
<li><p>A Lambda Layer</p>
</li>
<li><p>A ECR Base Container Image</p>
</li>
</ul>
<h3 id="heading-lambda-layer">Lambda Layer</h3>
<p>For the layer, the build process does all the necessary steps. The only remaining step is to create the layer with CDK. The following snippet demonstrates how to create a layer using CDK.</p>
<pre><code class="lang-typescript">   <span class="hljs-keyword">const</span> extension = <span class="hljs-keyword">new</span> LayerVersion(<span class="hljs-built_in">this</span>, <span class="hljs-string">'kinesis-telemetry-api-extension'</span>, {
      layerVersionName: <span class="hljs-string">`<span class="hljs-subst">${props?.extensionName}</span>`</span>,
      code: Code.fromAsset(resolve(process.cwd(), <span class="hljs-string">`build`</span>)),
      compatibleArchitectures: [
        Architecture.X86_64,
        Architecture.ARM_64
      ],
      compatibleRuntimes: [
        Runtime.NODEJS_20_X,
        Runtime.NODEJS_22_X,
      ],
      description: props?.extensionName
    });

    <span class="hljs-comment">// Exporting the Layer Arn to parameter store</span>
    <span class="hljs-keyword">new</span> StringParameter(<span class="hljs-built_in">this</span>, <span class="hljs-string">`LambdaExtensionArnParam`</span>, {
      parameterName: <span class="hljs-string">`/<span class="hljs-subst">${props.contextVariables.stage}</span>/<span class="hljs-subst">${props.contextVariables.context}</span>/telemetry/kinesis/extension/arn`</span>,
      stringValue: extension.layerVersionArn,
    });
</code></pre>
<p>The <code>LayerVersion</code> resource points to the build directory generated by the build script. The underlying build folder structure is as below.</p>
<pre><code class="lang-markdown"><span class="hljs-bullet">-</span> build
<span class="hljs-bullet">  -</span> extensions
<span class="hljs-bullet">    -</span> kinesis-telemetry-extension
<span class="hljs-bullet">  -</span> kinesis-telemetry-extension
<span class="hljs-bullet">    -</span> index.js
</code></pre>
<p>The <code>kinesis-telemetry-extension</code> file under the <code>extensions</code> folder is an executable that serves as the entry point for the Lambda service to detect and run the extension. The executable's file name must match the name of the directory that holds the extension code, as in this example.</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>
<span class="hljs-built_in">set</span> -euo pipefail

OWN_FILENAME=<span class="hljs-string">"<span class="hljs-subst">$(basename $0)</span>"</span>
LAMBDA_EXTENSION_NAME=<span class="hljs-string">"<span class="hljs-variable">$OWN_FILENAME</span>"</span>

<span class="hljs-built_in">echo</span> <span class="hljs-string">"[extension:bash] launching <span class="hljs-variable">${LAMBDA_EXTENSION_NAME}</span>"</span>
<span class="hljs-built_in">exec</span> <span class="hljs-string">"/opt/<span class="hljs-variable">${LAMBDA_EXTENSION_NAME}</span>/index.js"</span>
</code></pre>
<h3 id="heading-base-container-image">Base Container Image</h3>
<p>The base custom image with the extension included is created using a <code>Dockerfile</code>. The <code>Dockerfile</code> simply takes the built asset (the <code>build</code> folder) and copies its contents into the <code>/opt/</code> directory of the resulting image.</p>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> node:<span class="hljs-number">22.13</span>.<span class="hljs-number">1</span>-slim

<span class="hljs-keyword">COPY</span><span class="bash"> build /opt/</span>

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /opt/extensions</span>
</code></pre>
<p>The image is built and pushed to the ECR repository created via <code>cdk</code>, which must exist before the image can be pushed. This is done using a <code>post</code> script.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"@xaaxaax/observability-core"</span>,
  ...
  <span class="hljs-attr">"scripts"</span>: {
    <span class="hljs-attr">"build:docker"</span>: <span class="hljs-string">"docker buildx build --platform linux/arm64 --no-cache -t $ECR_REPOSITORY:latest ."</span>,
    <span class="hljs-attr">"postbuild:docker"</span>: <span class="hljs-string">"pnpm run build:docker:login &amp;&amp; pnpm run build:docker:tag &amp;&amp; pnpm run build:docker:push"</span>,
    <span class="hljs-attr">"build:docker:login"</span>: <span class="hljs-string">"aws ecr get-login-password --region $REGION --profile admin@dev | docker login --username AWS --password-stdin $ECR_URI"</span>,
    <span class="hljs-attr">"build:docker:tag"</span>: <span class="hljs-string">"docker tag $ECR_REPOSITORY:latest $ECR_URI/$ECR_REPOSITORY:latest"</span>,
    <span class="hljs-attr">"build:docker:push"</span>: <span class="hljs-string">"docker push $ECR_URI/$ECR_REPOSITORY:latest"</span>,
    <span class="hljs-attr">"cdk"</span>: <span class="hljs-string">"cdk --profile admin@dev --app 'tsx ./cdk/bin/app.ts' -c env=dev"</span>,
    <span class="hljs-attr">"postcdk"</span>: <span class="hljs-string">"cross-env REGION=eu-west-1 ECR_URI=904233108557.dkr.ecr.eu-west-1.amazonaws.com ECR_REPOSITORY=lambda-telemetry-image pnpm run build:docker"</span>
  },
  ...
}
</code></pre>
<h2 id="heading-lambda-zip-package-with-extensions">Lambda Zip package with Extensions</h2>
<p>Dealing with a Zip Lambda package is the simplest option: attach the layer to the Lambda function and grant the associated role the permissions required by the infrastructure the extension interacts with, which is the Kinesis data stream in the provided example.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738410360089/4c803939-273f-4e02-a704-3a15c43e33ba.png" alt class="image--center mx-auto" /></p>
<p>The following CDK code shows how the extension layer is attached to the function and which permissions are required.</p>
<pre><code class="lang-typescript">
    <span class="hljs-keyword">const</span> extensionArn = StringParameter.fromStringParameterName(
      <span class="hljs-built_in">this</span>, <span class="hljs-string">'extensionId'</span>, <span class="hljs-string">`/<span class="hljs-subst">${props.contextVariables.stage}</span>/logs-collector-lambda-extension/telemetry/kinesis/extension/arn`</span>).stringValue;

    <span class="hljs-keyword">const</span> managedPolicyArn = StringParameter.fromStringParameterName(
      <span class="hljs-built_in">this</span>, <span class="hljs-string">'policyName'</span>, <span class="hljs-string">`/<span class="hljs-subst">${props.contextVariables.stage}</span>/logs-collector-lambda-extension/telemetry/kinesis/runtime/policy/arn`</span>).stringValue;

    <span class="hljs-keyword">const</span> functionRole = <span class="hljs-keyword">new</span> Role(<span class="hljs-built_in">this</span>, <span class="hljs-string">'LambdaFunctionRole'</span>, {
      assumedBy: <span class="hljs-keyword">new</span> ServicePrincipal(<span class="hljs-string">'lambda.amazonaws.com'</span>),
      managedPolicies: [
        ManagedPolicy.fromAwsManagedPolicyName(<span class="hljs-string">'service-role/AWSLambdaBasicExecutionRole'</span>),
        ManagedPolicy.fromManagedPolicyArn(<span class="hljs-built_in">this</span>, <span class="hljs-string">'managed-policy'</span>, managedPolicyArn)
      ]
    });

    <span class="hljs-keyword">const</span> lambdaFunction = <span class="hljs-keyword">new</span> NodejsFunction(<span class="hljs-built_in">this</span>, <span class="hljs-string">'LambdaZipFunction'</span>, {
      entry: resolve(process.cwd(), <span class="hljs-string">'src/handler.ts'</span>),
      ...
      bundling: {
        ...
      },
      layers: [ 
        LayerVersion.fromLayerVersionArn(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ExtensionArn'</span>, extensionArn) 
      ],
    });
</code></pre>
<h2 id="heading-lambda-custom-image-with-extensions">Lambda Custom Image With Extensions</h2>
<p>The custom image example builds a container-based function from a <code>Dockerfile</code> that uses a provided base image with the extension included.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738410560235/56580f69-4217-4ce4-bb66-7b1fdbff4cfc.png" alt class="image--center mx-auto" /></p>
<p>The <code>Dockerfile</code> is based on both the extension image and the Node.js 22 Lambda image provided by AWS. An interesting property of the AWS-provided image is that it can be run locally and invoked, for example, with a curl command.</p>
<pre><code class="lang-dockerfile">
<span class="hljs-keyword">FROM</span> <span class="hljs-number">904233108557</span>.dkr.ecr.eu-west-<span class="hljs-number">1</span>.amazonaws.com/lambda-telemetry-image:latest AS extensions
<span class="hljs-keyword">FROM</span> public.ecr.aws/lambda/nodejs:<span class="hljs-number">22</span>

<span class="hljs-keyword">WORKDIR</span><span class="bash"> <span class="hljs-variable">${LAMBDA_TASK_ROOT}</span></span>

<span class="hljs-keyword">COPY</span><span class="bash"> dist/* ./</span>
<span class="hljs-keyword">COPY</span><span class="bash"> --from=extensions ./opt/ /opt/</span>

<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"index.handler"</span>]</span>
</code></pre>
<p>The built asset is copied to the <code>/var/task</code> path, which is exposed through the <code>LAMBDA_TASK_ROOT</code> environment variable, and the <code>CMD</code> instruction points to the handler exported from the <code>index.js</code> file.</p>
<p>In the example the base image URI is hardcoded in the <code>Dockerfile</code>, but it can be parametrized by reading it from Parameter Store and passing it as a Docker build ARG.</p>
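<p>As a rough sketch of that parametrization, the build command can inject the URI as a build ARG. The <code>dockerBuildCommand</code> helper below is hypothetical, and the <code>Dockerfile</code> would then declare <code>ARG BASE_IMAGE</code> before its first <code>FROM</code> and use <code>FROM ${BASE_IMAGE} AS extensions</code>.</p>

```typescript
// Hypothetical helper composing the `docker build` command so the base-image
// URI is injected as a build ARG instead of being hardcoded in the Dockerfile.
function dockerBuildCommand(baseImageUri: string, tag: string): string {
  return [
    'docker build',
    `--build-arg BASE_IMAGE=${baseImageUri}`,
    `-t ${tag}`,
    '.',
  ].join(' ');
}
```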
<h2 id="heading-lambda-web-adapter-image-with-extensions">Lambda Web Adapter Image With Extensions</h2>
<p>While the Lambda Web Adapter (LWA) provides a custom runtime, it changes how the <code>Dockerfile</code> must be written. As per the LWA documentation and examples, the base image is <code>public.ecr.aws/docker/library/node:22.9.0-slim</code> rather than <code>public.ecr.aws/lambda/nodejs:22</code>, which means the Lambda API interface can no longer be used for local invocation, e.g. using curl.</p>
<p>The example <code>Dockerfile</code> uses multiple stages:</p>
<pre><code class="lang-dockerfile">
<span class="hljs-keyword">FROM</span> <span class="hljs-number">904233108557</span>.dkr.ecr.eu-west-<span class="hljs-number">1</span>.amazonaws.com/lambda-telemetry-image:latest AS extensions
<span class="hljs-keyword">FROM</span> public.ecr.aws/awsguru/aws-lambda-adapter:<span class="hljs-number">0.9</span>.<span class="hljs-number">0</span>-aarch64 AS webadapter
<span class="hljs-keyword">FROM</span> public.ecr.aws/docker/library/node:<span class="hljs-number">22.9</span>.<span class="hljs-number">0</span>-slim

<span class="hljs-keyword">WORKDIR</span><span class="bash"> <span class="hljs-variable">${LAMBDA_TASK_ROOT}</span></span>

<span class="hljs-keyword">COPY</span><span class="bash"> dist/* ./</span>
<span class="hljs-keyword">COPY</span><span class="bash"> --from=extensions ./opt/ /opt/</span>
<span class="hljs-keyword">COPY</span><span class="bash"> --from=webadapter /lambda-adapter /opt/extensions/lambda-adapter</span>

<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"node"</span>, <span class="hljs-string">"index.js"</span>]</span>
</code></pre>
<p>The image uses the extension base image alongside the Lambda Web Adapter base image, copying the contents of the <code>/opt</code> folder from each. It also copies the built function code, here an HTTP server listening on LWA's default port, 8080.</p>
<p>A particular behavior of LWA in our example is the way function logs reach the extension. Only function logs are affected by this unfortunate behavior: they are not formatted as valid JSON objects but arrive as plain-text events, even though the function's log format is configured as JSON. The official format described above therefore does not work, and <code>element.record.message</code> resolves to an undefined value. The following shows how the record is received: a string representation of a JavaScript object surrounded by double quotes.</p>
<pre><code class="lang-json">{
   <span class="hljs-attr">"time"</span>:<span class="hljs-string">"2025-01-29T21:24:33.665Z"</span>,
   <span class="hljs-attr">"type"</span>:<span class="hljs-string">"function"</span>,
   <span class="hljs-attr">"record"</span>:<span class="hljs-string">"{ name: 'omid' }"</span>
}
</code></pre>
<p>To resolve the problem, the extension is adapted to fall back to <code>element.record</code> when <code>element.record.message</code> is undefined. But even that change is not sufficient, as the received record is a double-quoted JS object rather than valid JSON. The log data must therefore be emitted from the function using <code>JSON.stringify()</code>.</p>
<pre><code class="lang-typescript"><span class="hljs-built_in">console</span>.log(<span class="hljs-built_in">JSON</span>.stringify( logObject ));
</code></pre>
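<p>The fallback described above can be sketched as follows. The <code>TelemetryEvent</code> shape and the <code>extractLogPayload</code> helper are illustrative, not the extension's actual API:</p>

```typescript
// Illustrative shape: the telemetry event's `record` may be a parsed object
// (regular runtime with JSON log format) or a plain string (LWA behavior).
type TelemetryEvent = {
  time: string;
  type: string;
  record: { message?: string } | string;
};

// Best-effort payload extraction: prefer `record.message`, fall back to the
// whole `record` object, and pass string records through untouched.
function extractLogPayload(event: TelemetryEvent): string {
  if (typeof event.record === 'string') return event.record;
  if (typeof event.record.message === 'string') return event.record.message;
  return JSON.stringify(event.record);
}
```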
<h2 id="heading-fargate-with-firelens-sidecar">Fargate with Firelens sidecar</h2>
<p>Fargate, as a serverless solution for running containers on demand, supports both short-lived and long-running tasks. Regardless of the use case, enabling containers to communicate and complement each other’s capabilities is essential for building scalable and efficient architectures.</p>
<p>In line with the examples in this article, this section demonstrates how to forward container logs to a central Kinesis Data Stream. To achieve this, the Fargate task can include a sidecar container responsible for collecting logs and forwarding them to the data stream.</p>
<p>The application container is built from the <code>Dockerfile</code> below:</p>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> --platform=linux/arm64 public.ecr.aws/docker/library/node:<span class="hljs-number">22</span>-slim

<span class="hljs-keyword">COPY</span><span class="bash"> dist/* ./</span>

<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"node"</span>, <span class="hljs-string">"index.js"</span>]</span>
</code></pre>
<p>As mentioned above, there is a second <code>Dockerfile</code> for the log forwarder container, based on the Fluent Bit image provided by AWS.</p>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> amazon/aws-for-fluent-bit:latest

<span class="hljs-keyword">ADD</span><span class="bash"> container.conf /container.conf</span>
<span class="hljs-keyword">ADD</span><span class="bash"> parsers.conf /parsers.conf</span>
</code></pre>
<p>As shown in the <code>Dockerfile</code>, there are two configuration files: one for parsing and one for container-specific configuration such as filtering. The contents of both files are shown below.</p>
<pre><code class="lang-dockerfile"># parsers.conf file
[PARSER]
    Name    log_json
    Format  json

# container.conf file
[SERVICE]
    Parsers_File    parsers.conf

[FILTER]
    Name            parser
    Match           *
    Key_Name        log
    Parser          log_json

[FILTER]
    Name            grep
    Match           *
    Regex           app_name fargate-example-app
</code></pre>
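<p>To illustrate what these two filters do, the following TypeScript sketch emulates the chain: the <code>parser</code> filter replaces the raw <code>log</code> string with its parsed JSON fields, and the <code>grep</code> filter keeps only records whose <code>app_name</code> matches. The <code>applyFilters</code> helper is purely illustrative, not Fluent Bit's API:</p>

```typescript
// Illustrative shape of a record as FireLens hands it to Fluent Bit: the raw
// container stdout line sits under the `log` key.
type FirelensRecord = { log: string; [key: string]: unknown };

// Emulates the parser + grep filter chain from container.conf/parsers.conf.
function applyFilters(records: FirelensRecord[]): Record<string, unknown>[] {
  return records
    .map((record) => {
      try {
        // Mirrors the json parser: merge the parsed fields into the record.
        return { ...record, ...JSON.parse(record.log) };
      } catch {
        return record; // Non-JSON lines pass through unparsed.
      }
    })
    .filter((record) => /fargate-example-app/.test(String(record.app_name ?? '')));
}
```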
<p>Let's see how these resources are deployed. AWS CDK provides L2 constructs that simplify the infrastructure-as-code steps. The example uses the <code>FargateTaskDefinition</code> and <code>FargateService</code> constructs.</p>
<pre><code class="lang-typescript">   <span class="hljs-keyword">const</span> jobDefinition = <span class="hljs-keyword">new</span> FargateTaskDefinition(<span class="hljs-built_in">this</span>, <span class="hljs-string">'JobDefinition'</span>, {
      cpu: <span class="hljs-number">256</span>,
      memoryLimitMiB: <span class="hljs-number">512</span>,
      runtimePlatform: {
        cpuArchitecture: CpuArchitecture.ARM64,
        operatingSystemFamily: OperatingSystemFamily.LINUX,
      },
      taskRole: jobTaskRole,
      executionRole: jobTaskExecutionRole,
    });
</code></pre>
<p>After creating the base task definition, the app container and the FireLens log router are added as below.</p>
<pre><code class="lang-typescript">
    jobDefinition.addContainer(<span class="hljs-string">'Container'</span>, {
      image: ContainerImage.fromAsset(join(process.cwd())),
      logging: LogDrivers.firelens({
        options: {
          Name: <span class="hljs-string">'kinesis_streams'</span>,
          region,
          stream: props.streamName,
        },
      }),
    });

    jobDefinition.addFirelensLogRouter(<span class="hljs-string">'LoggingContainer'</span>, {
      image: ContainerImage.fromAsset(join(process.cwd(), <span class="hljs-string">'fluent-bit'</span>)),
      logging: LogDrivers.awsLogs({
        streamPrefix: <span class="hljs-string">'logging'</span>,
        logGroup: <span class="hljs-keyword">new</span> LogGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">'FireLensLogGroup'</span>, {
          logGroupName: <span class="hljs-string">`/ecs/<span class="hljs-subst">${props.contextVariables.context}</span>`</span>,
          retention: RetentionDays.ONE_DAY,
          removalPolicy: RemovalPolicy.DESTROY,
        }),
      }),
      environment: { FLB_LOG_LEVEL: <span class="hljs-string">'info'</span> },
      firelensConfig: {
        <span class="hljs-keyword">type</span>: FirelensLogRouterType.FLUENTBIT,
        options: {
          configFileType: FirelensConfigFileType.FILE,
          configFileValue: <span class="hljs-string">'/container.conf'</span>,
        },
      },
    });
</code></pre>
<p>A service is created to encapsulate the task, which consists of the two side-by-side containers. This is simple and straightforward:</p>
<pre><code class="lang-typescript">   <span class="hljs-keyword">const</span> service = <span class="hljs-keyword">new</span> FargateService(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Service'</span>, {
      cluster,
      capacityProviderStrategies: capacityStrategy,
      desiredCount: <span class="hljs-number">1</span>,
      platformVersion: FargatePlatformVersion.VERSION1_4,
      propagateTags: PropagatedTagSource.TASK_DEFINITION,
      taskDefinition: jobDefinition,
      assignPublicIp: <span class="hljs-literal">true</span>,
      vpcSubnets: { subnets: vpc.publicSubnets },
      securityGroups: [ taskSecurityGroup ],
    });
</code></pre>
<p>For simplicity, the example assigns a public IP address to the task and places the service in public subnets. This is required because <code>FargatePlatformVersion.VERSION1_4</code> runs under the managed <code>awsvpc</code> networking mode, and a public IP is the simplest way to let Fargate pull images from ECR. This is not recommended for production.</p>
<p>The task role must have permission for the Kinesis <code>PutRecords</code> action. Here, the observability-core stack provides a managed policy that can be attached to the role.</p>
<pre><code class="lang-typescript">  <span class="hljs-keyword">const</span> managedPolicyArn = StringParameter.fromStringParameterName(
      <span class="hljs-built_in">this</span>, 
      <span class="hljs-string">'ObservabilityManagedPolicy'</span>, <span class="hljs-string">`/<span class="hljs-subst">${props.contextVariables.stage}</span>/logs-collector-observability-core/telemetry/kinesis/runtime/policy/arn`</span>).stringValue;

  <span class="hljs-keyword">const</span> jobTaskRole = <span class="hljs-keyword">new</span> Role(<span class="hljs-built_in">this</span>, <span class="hljs-string">'JobTaskRole'</span>, {
      assumedBy: <span class="hljs-keyword">new</span> ServicePrincipal(<span class="hljs-string">'ecs-tasks.amazonaws.com'</span>),
      managedPolicies: [
        ManagedPolicy.fromManagedPolicyArn(<span class="hljs-built_in">this</span>, <span class="hljs-string">'TaskRoleManagedPolicy'</span>, managedPolicyArn),
      ],
  });

  <span class="hljs-keyword">const</span> jobTaskExecutionRole = <span class="hljs-keyword">new</span> Role(<span class="hljs-built_in">this</span>, <span class="hljs-string">'JobTaskExecutionRole'</span>, {
      assumedBy: <span class="hljs-keyword">new</span> ServicePrincipal(<span class="hljs-string">'ecs-tasks.amazonaws.com'</span>),
      managedPolicies: [
        ManagedPolicy.fromAwsManagedPolicyName(<span class="hljs-string">'service-role/AmazonECSTaskExecutionRolePolicy'</span>),
      ],
  });
</code></pre>
<p>After deploying, the IP attached to the created ENI can be used over the <code>http</code> protocol. The logs are sent to the Kinesis Data Stream, as shown in the following screenshot.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738337109885/4a83c63a-612a-4a9c-8b49-160b87cb66c5.png" alt class="image--center mx-auto" /></p>
<p>FireLens has the same problem as LWA mentioned before: the log metadata object must be stringified. If a JS object is logged directly, the same behavior appears as with the Web Adapter.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>While serverless offers a wide range of managed services that scale with demand, it is important not to forget the shared responsibility model, which requires engineering teams to stay engaged on their side. The use of processor capacity and memory remains under the engineering teams' ownership. This is not far from traditional software principles, but it is sometimes forgotten amid the fascinating nature of managed services.</p>
<p>Using Lambda extensions, multiple containers, or background processes is a way to isolate secondary processing and achieve more trustworthy software running as the foreground process.</p>
<p>This article focused on log aggregation to show how to decouple critical processing from non-critical processing via isolation, and provided examples showcasing the implementation in different scenarios.</p>
]]></content:encoded></item><item><title><![CDATA[Personal Use of AWS Organizations Using CDK]]></title><description><![CDATA[Over the years of using AWS, I’ve invested a lot of effort into managing costs and maintaining a tidy account. However, I began creating new accounts and closing them once they were no longer needed. This approach has led to several challenges, inclu...]]></description><link>https://blogs.serverlessfolks.com/personal-use-of-aws-organizations-using-cdk</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/personal-use-of-aws-organizations-using-cdk</guid><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Fri, 29 Nov 2024 01:56:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1732845358049/9b788ee0-6a8a-4506-a2d7-c2e034ff347b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the years of using AWS, I’ve invested a lot of effort into managing costs and maintaining a tidy account. However, I began creating new accounts and closing them once they were no longer needed. This approach has led to several challenges, including:</p>
<ul>
<li><p>Complications in billing management</p>
</li>
<li><p>Inability to share credits</p>
</li>
<li><p>Challenges with quota management</p>
</li>
<li><p>Orphaned resources</p>
</li>
<li><p>A $1 charge for each account created</p>
</li>
<li><p>The need to create temporary emails and manage contact information</p>
</li>
</ul>
<h2 id="heading-aws-organization">AWS Organization</h2>
<p>AWS Organizations is a service from AWS designed to streamline the centralized management of accounts. It offers features such as account provisioning, centralized billing, access management, policy management, and enforcement of standards.</p>
<h2 id="heading-organizational-unit">Organizational Unit</h2>
<p>An Organizational Unit (OU) is a grouping of accounts that allows for the distribution of accounts based on specific contexts or needs. For example, you might create one OU for workload accounts (development, testing, production) and another for security or networking purposes.</p>
<p>This separation enables additional automation for the member accounts, such as bootstrapping or deploying infrastructure tailored to specific contexts.</p>
<h2 id="heading-aws-cdk-and-challenges">AWS CDK and challenges</h2>
<p>Onboarding a new organization using CDK initially appeared to be as straightforward as creating any other piece of infrastructure. However, as I began implementation, I encountered several gaps caused by missing features in CloudFormation and the dedicated service API:</p>
<ul>
<li><p>The AWS Organizations service requires trusted access, which is only available through the Organizations Service API.</p>
</li>
<li><p>Activating SSO with IAM Identity Center can only be done through a manual process in the console.</p>
</li>
<li><p>Creating accounts requires unique email addresses, which makes it tedious to set up multiple Gmail or other mailbox accounts.</p>
</li>
<li><p>While setting up SES with Route 53 was relatively easy, the documentation was confusing and misleading.</p>
</li>
</ul>
<h2 id="heading-source-code">Source Code</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-cdk-organization-setup">https://github.com/XaaXaaX/aws-cdk-organization-setup</a></div>
<h2 id="heading-phase-1-configuration">Phase 1: Configuration</h2>
<p>Since setting up an organization involves more details than a typical application, it's important to consider the relevant configuration sections. The following <code>Config</code> type outlines the structure of the stack configuration.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Config = {
  contextVariables: ContextVariables;
  dns: DNSConfig;
  org: OrgConfig;
  sso: SSOConfig;
}
</code></pre>
<h3 id="heading-dns-config">DNS Config</h3>
<p>DNSConfig should account for either importing an external HostedZone or creating a new one. By separating the types, we can enhance type safety and maintain better control within the stack.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">type</span> ExternalZone = { isExternal: <span class="hljs-literal">true</span>; hostedZoneId: <span class="hljs-built_in">string</span>; }
<span class="hljs-keyword">type</span> InternalZone = { isExternal: <span class="hljs-literal">false</span>; domainName: <span class="hljs-built_in">string</span>; }
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> DNSConfig = (ExternalZone | InternalZone) &amp; {
  mailExchangeDomainName: <span class="hljs-built_in">string</span>;
};
</code></pre>
<h3 id="heading-org-config">Org Config</h3>
<p>OrgConfig includes configuration attributes related to account creation and bootstrapping, as well as the activation of trusted services and parameter sharing.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> OrgConfig = {
  members: {
    bootstrap?: <span class="hljs-built_in">boolean</span>;
    accounts: {
      accountName: <span class="hljs-built_in">string</span>;
    }[];
  };
  trustedAWSServices?: <span class="hljs-built_in">string</span>[];
  crossAccountParametersSharing?: <span class="hljs-built_in">boolean</span>;
};
</code></pre>
<h3 id="heading-sso-config">SSO Config</h3>
<p>SSOConfig features two distinct types: Ready and NotReady. This distinction allows for the creation of SSO and identity store groups, along with permission sets, after implementing the Click-Ops solution in the management account.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> SSONotReady = { isReadyToDeploy: <span class="hljs-literal">false</span>; }
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> SSOReady = { 
  isReadyToDeploy: <span class="hljs-literal">true</span>; 
  ssoInstanceArn: <span class="hljs-built_in">string</span>;
  identityStoreId: <span class="hljs-built_in">string</span>; 
}
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> SSOConfig = SSONotReady | SSOReady;
</code></pre>
<h2 id="heading-hostedzone-and-emailing">HostedZone and Emailing</h2>
<p>The management account can either own an existing HostedZone or create a new one for a domain name, and also set up the necessary components to receive emails. This is crucial for having unique email addresses for each account while still allowing them to be forwarded to a personal account.</p>
<p>The process for importing or creating the hosted zone is outlined below:</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">let</span> hostedZone: IHostedZone;
    <span class="hljs-keyword">if</span>(dnsConfig.isExternal)
      hostedZone = HostedZone.fromHostedZoneId(<span class="hljs-built_in">this</span>, <span class="hljs-string">'HostedZone'</span>, dnsConfig.hostedZoneId);
    <span class="hljs-keyword">else</span> {
      hostedZone = <span class="hljs-keyword">new</span> HostedZone(<span class="hljs-built_in">this</span>, <span class="hljs-string">'HostedZone'</span>, { zoneName: dnsConfig.domainName, comment: <span class="hljs-string">'Managed by CDK'</span> });

      <span class="hljs-keyword">new</span> StringParameter(<span class="hljs-built_in">this</span>, <span class="hljs-string">'HostedZoneId'</span>, {
        parameterName: <span class="hljs-string">`/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.ENV}</span>/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.CONTEXT}</span>/<span class="hljs-subst">${dnsConfig.domainName}</span>/hostedzone/id`</span>,
        stringValue: hostedZone.hostedZoneId,
      })
    }
</code></pre>
<p>To enable email reception through the HostedZone, an MXRecord must be added. Note that the mailExchangeDomainName configuration can be either the same as the domain name (e.g., <code>example.com</code>) or a subdomain (e.g., <code>mail.example.com</code>).</p>
<pre><code class="lang-typescript">  <span class="hljs-keyword">new</span> MxRecord(<span class="hljs-built_in">this</span>, <span class="hljs-string">'MXRecord'</span>, {
      zone: hostedZone,
      values: [{
        hostName: <span class="hljs-string">`inbound-smtp.<span class="hljs-subst">${<span class="hljs-built_in">this</span>.REGION}</span>.amazonaws.com`</span>,
        priority: <span class="hljs-number">10</span>,
      }],
      recordName: dnsConfig.mailExchangeDomainName,
      deleteExisting: <span class="hljs-literal">true</span>,
    });
</code></pre>
<p>To receive emails using SES, you need to extend its capabilities through ReceiptRuleSet actions, as SES does not natively provide a mailbox. To integrate SES, you must create an EmailIdentity of the Domain type, accomplished by providing the HostedZone.</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">new</span> EmailIdentity(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Identity'</span>, {
      identity: Identity.publicHostedZone(HOSTED_ZONE),
    });
</code></pre>
<p>The ReceiptRuleSet allows you to establish rules for receiving emails and define actions to be taken. The actions specified in a rule are executed in sequence. In this example, each incoming email is stored in an S3 bucket, and a Lambda function is triggered to process the stored content and forward it to other email servers.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> receiptRuleSet = <span class="hljs-keyword">new</span> ReceiptRuleSet(<span class="hljs-built_in">this</span>, <span class="hljs-string">'MailReceivedRuleSet'</span>, {
        dropSpam: <span class="hljs-literal">false</span>,
        rules: [{
          tlsPolicy: TlsPolicy.REQUIRE,
          scanEnabled: <span class="hljs-literal">false</span>,
          enabled: <span class="hljs-literal">true</span>,
          recipients: [ HOSTED_ZONE.zoneName ],
          actions: [
            <span class="hljs-keyword">new</span> S3({ bucket: deliveryBucket }),
            <span class="hljs-keyword">new</span> Lambda({ <span class="hljs-function"><span class="hljs-keyword">function</span>: <span class="hljs-title">receivedMailLambda</span> }),
          ],
        }],
    })</span>;
</code></pre>
<p>Deploying the stack outlined above will create all the necessary resources for a functional domain name with email reception. However, if you attempt to send an email from your address, you will receive a postmaster response indicating that the email sending has failed.</p>
<blockquote>
<p><a target="_blank" href="http://inbound-smtp.eu-west-1.amazonaws.com"><strong>inbound-smtp.eu-west-1.amazonaws.com</strong></a> <strong>has generated this error :<br />Requested action not taken: mailbox unavailable</strong></p>
</blockquote>
<p>SES permits only one active ruleset at a time, so when you create a new ruleset, it will not be activated automatically.</p>
<p>To activate the ruleset, you can use AWS CDK custom resources with SDK calls, as shown below.</p>
<pre><code class="lang-typescript">
    <span class="hljs-keyword">const</span> rulesetActivationSDKCall: AwsSdkCall = {
        service: <span class="hljs-string">'SES'</span>,
        action: <span class="hljs-string">'setActiveReceiptRuleSet'</span>,
        physicalResourceId: PhysicalResourceId.of(<span class="hljs-string">'SesCustomResource'</span>),
    };

    <span class="hljs-keyword">const</span> setActiveReceiptRuleSetSdkCall: AwsSdkCall = {
      ...rulesetActivationSDKCall,
      parameters: { RuleSetName: receiptRuleSet.receiptRuleSetName }
    };
    <span class="hljs-keyword">const</span> deleteReceiptRuleSetSdkCall: AwsSdkCall = rulesetActivationSDKCall;

    <span class="hljs-keyword">new</span> AwsCustomResource(<span class="hljs-built_in">this</span>, <span class="hljs-string">"setActiveReceiptRuleSetCustomResource"</span>, {
      onCreate: setActiveReceiptRuleSetSdkCall,
      onUpdate: setActiveReceiptRuleSetSdkCall,
      onDelete: deleteReceiptRuleSetSdkCall,
      logRetention: RetentionDays.ONE_WEEK,
      policy: AwsCustomResourcePolicy.fromStatements([
        <span class="hljs-keyword">new</span> PolicyStatement({
          sid: <span class="hljs-string">'SesCustomResourceSetActiveReceiptRuleSet'</span>,
          effect: Effect.ALLOW,
          actions: [
            <span class="hljs-string">'ses:SetActiveReceiptRuleSet'</span>,
            <span class="hljs-string">'ses:DeleteReceiptRuleSet'</span>,
          ],
          resources: [<span class="hljs-string">'*'</span>]
        }),
      ]),
    });
</code></pre>
<p>This solution activates the created ruleset, and SES reception now works as expected.</p>
<p>The provided example only triggers a Lambda function that logs the SES event, but you can implement your own email forwarding if needed.</p>
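<p>A forwarding implementation would typically map each recipient address to a personal inbox. The following is a minimal sketch assuming the standard SES receipt event shape; <code>FORWARD_TO</code> and <code>forwardingTargets</code> are hypothetical names, not part of the example repository:</p>

```typescript
// Simplified shape of the SES receipt event delivered to the Lambda function.
type SesReceiptEvent = {
  Records: { ses: { mail: { messageId: string; destination: string[] } } }[];
};

// Hypothetical setting: the personal address all organization mail goes to.
const FORWARD_TO = 'me@personal-mail.example';

// Build one forwarding instruction per incoming recipient address.
function forwardingTargets(event: SesReceiptEvent): { from: string; to: string }[] {
  const targets: { from: string; to: string }[] = [];
  for (const record of event.Records) {
    for (const recipient of record.ses.mail.destination) {
      targets.push({ from: recipient, to: FORWARD_TO });
    }
  }
  return targets;
}
```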
<h2 id="heading-organization-and-accounts">Organization and Accounts</h2>
<p>The CDK for AWS Organizations only offers L1 constructs, but I find this approach simple and straightforward. In my opinion, adding L2 constructs might be unnecessary over-engineering, so I’m fine with this CDK decision for now.</p>
<p>To create an Organization and an OU, the only parameter required is the <code>FeatureSet</code> of the Organization, which can be either <code>CONSOLIDATED_BILLING</code> or <code>ALL</code>.</p>
<pre><code class="lang-typescript">
    <span class="hljs-keyword">const</span> orga = <span class="hljs-keyword">new</span> CfnOrganization(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Organization'</span>, { featureSet: <span class="hljs-string">'ALL'</span> });

    <span class="hljs-keyword">const</span> orgUnit = <span class="hljs-keyword">new</span> CfnOrganizationalUnit(<span class="hljs-built_in">this</span>, <span class="hljs-string">'OrganizationUnit'</span>, {
      name: <span class="hljs-string">`workloads<span class="hljs-subst">${tempSuffix}</span>`</span>,
      parentId: orga.attrRootId
    });

    orgUnit.addDependency(orga);
</code></pre>
<p>The following snippets illustrate how to create accounts. In this example repository, the account list is provided through configuration, meaning the accounts parameter will be an array of objects in the format <code>{ accountName: string }</code>. The created account IDs will be stored in the parameter store, although this may not be necessary for a personal organization setup.</p>
<pre><code class="lang-typescript">ACCOUNTS.forEach(<span class="hljs-function">(<span class="hljs-params">account: { accountName: <span class="hljs-built_in">string</span> }</span>) =&gt;</span> {
      <span class="hljs-keyword">const</span> awsAccount = <span class="hljs-keyword">new</span> CfnAccount(<span class="hljs-built_in">this</span>, <span class="hljs-string">`<span class="hljs-subst">${account.accountName}</span>Account`</span>, {
        accountName: <span class="hljs-string">`<span class="hljs-subst">${account.accountName}</span><span class="hljs-subst">${tempSuffix}</span>`</span>,
        email: <span class="hljs-string">`<span class="hljs-subst">${account.accountName}</span><span class="hljs-subst">${tempSuffix}</span>@<span class="hljs-subst">${DOMAIN_NAME}</span>`</span>,
        parentIds: [orgUnit.attrId],
      });

      <span class="hljs-keyword">const</span> param = <span class="hljs-keyword">new</span> StringParameter(<span class="hljs-built_in">this</span>, <span class="hljs-string">`<span class="hljs-subst">${account.accountName}</span>AccountIdParam`</span>, {
        stringValue: awsAccount.attrAccountId,
        description: <span class="hljs-string">`Account ID for <span class="hljs-subst">${awsAccount.accountName}</span>`</span>,
        parameterName: <span class="hljs-string">`/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.ENV}</span>/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.CONTEXT}</span>/<span class="hljs-subst">${awsAccount.accountName}</span>/account/id`</span>,
      })
    });
</code></pre>
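<p>The email convention used above can be summarized by a small helper: each account derives a unique address from its name and the organization's domain, so every account's mail lands in the single SES-backed mailbox. The <code>accountEmail</code> helper below is illustrative; <code>tempSuffix</code> mirrors the variable used in the stack:</p>

```typescript
// Illustrative helper reproducing the email convention from the CfnAccount
// snippet above: `${accountName}${tempSuffix}@${domainName}`.
function accountEmail(accountName: string, domainName: string, tempSuffix = ''): string {
  return `${accountName}${tempSuffix}@${domainName}`;
}
```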
<p>To set up SSO using IAM Identity Center, it's essential to enable AWS Organization trusted access. Unfortunately, there’s no way to activate this feature using CDK or CloudFormation, so we will once again rely on a Custom Resource. In this example, all services required for trusted access are specified through configuration (e.g., <code>sso.amazonaws.com</code> and <code>servicequota.amazonaws.com</code>).</p>
<pre><code class="lang-typescript">
    trustedServices.forEach(<span class="hljs-function">(<span class="hljs-params">service: <span class="hljs-built_in">string</span></span>) =&gt;</span> {

      <span class="hljs-keyword">const</span> identifier = service.replace(<span class="hljs-string">'.'</span>, <span class="hljs-string">''</span>);
      <span class="hljs-keyword">const</span> enable: AwsSdkCall = {
        service: <span class="hljs-string">'organizations'</span>,
        action: <span class="hljs-string">'enableAWSServiceAccess'</span>,
        physicalResourceId: PhysicalResourceId.of(<span class="hljs-string">`OrgCustomResource<span class="hljs-subst">${identifier}</span>`</span>),
        parameters: { ServicePrincipal: service },
      };

      <span class="hljs-keyword">const</span> disable: AwsSdkCall = {
        service: <span class="hljs-string">'organizations'</span>,
        action: <span class="hljs-string">'disableAWSServiceAccess'</span>,
        physicalResourceId: PhysicalResourceId.of(<span class="hljs-string">`OrgCustomResource<span class="hljs-subst">${identifier}</span>`</span>),
        parameters: { ServicePrincipal: service },
      };

      <span class="hljs-keyword">new</span> AwsCustomResource(<span class="hljs-built_in">this</span>, <span class="hljs-string">`AWSServiceAccessActivation<span class="hljs-subst">${identifier}</span>CustomResource`</span>, {
        onCreate: enable,
        onUpdate: enable,
        onDelete: disable,
        logRetention: RetentionDays.ONE_WEEK,
        policy: AwsCustomResourcePolicy.fromStatements([
          <span class="hljs-keyword">new</span> PolicyStatement({
            sid: <span class="hljs-string">'OrgCustomResourceSetOrgAWSServiceActivation'</span>,
            effect: Effect.ALLOW,
            actions: [
              <span class="hljs-string">'organizations:enableAWSServiceAccess'</span>,
              <span class="hljs-string">'organizations:disableAWSServiceAccess'</span>,
            ],
            resources: [<span class="hljs-string">'*'</span>]
          }),
        ]),
      });
    })
</code></pre>
<h2 id="heading-sso-setup">SSO Setup</h2>
<p>The example configuration starts in a not-ready state by setting <code>isReadyToDeploy = false</code>, which prevents the CDK deployment from generating the SSO configuration while the SSO instance does not yet exist. As mentioned earlier, creating the instance cannot be automated through API calls: the only available call, <code>CreateInstance</code>, works solely for standalone accounts, not for organization management accounts.</p>
<p>Before flipping the flag to true, go to the AWS Console in the management account, navigate to IAM Identity Center, and click the Enable button. After activation, retrieve the <code>SsoInstanceArn</code> and <code>IdentityStoreId</code> and set them in the stack configuration file along with <code>isReadyToDeploy=true</code>.</p>
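<p>This two-phase flow can be sketched as a small guard in the stack. The <code>SsoConfig</code> shape and <code>resolveSsoConfig</code> helper below are illustrative names, not code from the original stack:</p>

```typescript
// Hypothetical config shape for the two-phase SSO setup described above.
type SsoConfig = {
  isReadyToDeploy: boolean;
  ssoInstanceArn?: string;
  identityStoreId?: string;
};

// Returns the validated identifiers once the SSO instance has been enabled
// manually in the console, or null while isReadyToDeploy is still false.
const resolveSsoConfig = (
  config: SsoConfig,
): { ssoInstanceArn: string; identityStoreId: string } | null => {
  if (!config.isReadyToDeploy) return null; // skip generating SSO resources
  if (!config.ssoInstanceArn || !config.identityStoreId) {
    throw new Error('isReadyToDeploy=true requires ssoInstanceArn and identityStoreId');
  }
  return {
    ssoInstanceArn: config.ssoInstanceArn,
    identityStoreId: config.identityStoreId,
  };
};
```

<p>The stack can then create groups and permission sets only when this helper returns a non-null result.</p>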
<p>Once these steps are completed, the CDK deploy will proceed to deploy the stack along with all associated groups and permission sets.</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">const</span> group = <span class="hljs-keyword">new</span> CfnGroup(<span class="hljs-built_in">this</span>, id, {
      displayName: <span class="hljs-string">`<span class="hljs-subst">${id}</span>`</span>,
      description: <span class="hljs-string">`<span class="hljs-subst">${id}</span> Group`</span>,
      identityStoreId,
    });

    <span class="hljs-keyword">const</span> permissionSet = <span class="hljs-keyword">new</span> CfnPermissionSet(<span class="hljs-built_in">this</span>, <span class="hljs-string">`<span class="hljs-subst">${id}</span>PermissionSet`</span>, {
      name: <span class="hljs-string">`<span class="hljs-subst">${id}</span>@<span class="hljs-subst">${ENV}</span>`</span>,
      description: <span class="hljs-string">`<span class="hljs-subst">${id}</span>@<span class="hljs-subst">${ENV}</span>`</span>,
      instanceArn: ssoInstanceArn,
      managedPolicies: managedPolicies,
      inlinePolicy: <span class="hljs-literal">undefined</span>,
      sessionDuration: Duration.hours(<span class="hljs-number">12</span>).toIsoString(),
    });

    accounts.forEach(<span class="hljs-function">(<span class="hljs-params">account</span>) =&gt;</span> {
      <span class="hljs-comment">// include the account in the construct id so it stays unique inside the loop</span>
      <span class="hljs-keyword">new</span> CfnAssignment(<span class="hljs-built_in">this</span>, <span class="hljs-string">`<span class="hljs-subst">${id}</span><span class="hljs-subst">${account}</span>Assignment`</span>, {
        instanceArn: ssoInstanceArn,
        permissionSetArn: permissionSet.attrPermissionSetArn,
        principalId: group.attrGroupId,
        principalType: <span class="hljs-string">'GROUP'</span>,
        targetId: account,
        targetType: <span class="hljs-string">'AWS_ACCOUNT'</span>,
      });
    })
</code></pre>
<p>In the CDK snippet above, a group is created in the Identity Store and a permission set is created on the SSO instance; the group is then assigned to each of the accounts created earlier. This code forms the Group construct, which is used as shown below.</p>
<pre><code class="lang-typescript">
    <span class="hljs-comment">// Org Accounts</span>
    <span class="hljs-keyword">const</span> developmentAccount = StringParameter.fromStringParameterName(<span class="hljs-built_in">this</span>, <span class="hljs-string">'AccountSecurity'</span>, <span class="hljs-string">`/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.ENV}</span>/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.CONTEXT}</span>/security_b/account/id`</span>).stringValue; 

    <span class="hljs-comment">//Managed Policies</span>
    <span class="hljs-keyword">const</span> adminManagedPolicy = ManagedPolicy.fromAwsManagedPolicyName(<span class="hljs-string">'AdministratorAccess'</span>);
    <span class="hljs-keyword">const</span> poweredUserManagedPolicy = ManagedPolicy.fromAwsManagedPolicyName(<span class="hljs-string">'PowerUserAccess'</span>);
    <span class="hljs-keyword">const</span> readonlyManagedPolicy = ManagedPolicy.fromAwsManagedPolicyName(<span class="hljs-string">'ReadOnlyAccess'</span>);

    <span class="hljs-keyword">new</span> Group(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Admin'</span>, { 
      contextVariables: <span class="hljs-built_in">this</span>.CONTEXT_VARIABLES,
      ssoInstanceArn: SSO_INSTANCE_ARN,
      identityStoreId: IDENTITY_STORE_ID,
      managedPolicies: [ adminManagedPolicy.managedPolicyArn ],
      accounts: [ developmentAccount ]
    });

    <span class="hljs-keyword">new</span> Group(<span class="hljs-built_in">this</span>, <span class="hljs-string">'PowerUser'</span>, { 
      contextVariables: <span class="hljs-built_in">this</span>.CONTEXT_VARIABLES,
      ssoInstanceArn: SSO_INSTANCE_ARN,
      identityStoreId: IDENTITY_STORE_ID,
      managedPolicies: [ poweredUserManagedPolicy.managedPolicyArn ],
      accounts: [ developmentAccount ]
    });

    <span class="hljs-keyword">new</span> Group(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Developer'</span>, { 
      contextVariables: <span class="hljs-built_in">this</span>.CONTEXT_VARIABLES,
      ssoInstanceArn: SSO_INSTANCE_ARN,
      identityStoreId: IDENTITY_STORE_ID,
      managedPolicies: [ readonlyManagedPolicy.managedPolicyArn ],
      accounts: [ developmentAccount ]
    });
</code></pre>
<p>You can now create a user in IAM Identity Center and assign it to one or more groups. Next, navigate to the AWS Access Portal (accessible via the link from the Identity Center dashboard: <code>https://d-123456788.awsapps.com/start</code>), which will prompt you for login credentials.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730747405458/fc2b7f7f-ff23-40af-a435-bbe03cdcb59c.png" alt class="image--center mx-auto" /></p>
<p>Once logged in, the application page will allow you to select the appropriate group role and access it with the corresponding permissions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730747592329/090feea6-645d-4b19-b3ac-95c787c2fbf9.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-account-boostrap">Account Bootstrap</h2>
<p>So far, this has been an effective way to automate account creation and apply varying levels of security. However, an empty account still requires several repetitive setup tasks before it can be considered usable. The example prepares the member accounts by setting up GitHub OIDC and the CDK bootstrap.</p>
<p>The bootstrap can be deactivated through configuration, and this will be verified in the organization stack as shown below.</p>
<pre><code class="lang-typescript">   <span class="hljs-keyword">if</span>( BOOTSTRAP ) {
      <span class="hljs-keyword">new</span> Bootstrap(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Bootstrap'</span>, { 
        contextVariables: props.contextVariables,
        regions: [ <span class="hljs-built_in">this</span>.REGION ],
        organizationUnits: [ orgUnit ],
        types: {
          [BootstrapTypes.CDK]: { FileAssetsBucketKmsKeyId: <span class="hljs-string">'AWS_MANAGED_KEY'</span> },
          [BootstrapTypes.GitHub]: { Owner: gitHubConfig.owner, Repo: <span class="hljs-string">'*'</span> }  
        }, 
      })
    }
</code></pre>
<p>The bootstrap construct creates a set of CloudFormation StackSets that allow the management account to bootstrap the member accounts; they are triggered whenever accounts are created or updated. Unfortunately, the CDK does not offer a straightforward way to use CDK stacks as StackSet templates. After researching online, I found most solutions overly complicated, so I opted to stick with the readily available YAML templates found across the web (even though I probably won’t look at or modify them). I'm fine with this approach.</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">export</span> <span class="hljs-built_in">enum</span> BootstrapTypes {
        GitHub = <span class="hljs-string">'oidc-github.yml'</span>,
        CDK = <span class="hljs-string">'cdk-bootstrap-template.yml'</span>,
     }

    <span class="hljs-keyword">const</span> { contextVariables: { stage: ENV, context: CONTEXT }, types: TYPES } = props;
    <span class="hljs-keyword">const</span> tags =  Stack.of(<span class="hljs-built_in">this</span>).tags.renderTags();

    <span class="hljs-built_in">Object</span>.keys(TYPES).forEach(<span class="hljs-function">(<span class="hljs-params">value: <span class="hljs-built_in">string</span></span>) =&gt;</span> {
      <span class="hljs-keyword">const</span> typeIdentifier = value.replace(<span class="hljs-string">'.yml'</span>, <span class="hljs-string">''</span>).replace(<span class="hljs-regexp">/[^a-zA-Z]/g</span>, <span class="hljs-string">''</span>);
      <span class="hljs-keyword">const</span> cfnParams = <span class="hljs-built_in">Object</span>.entries(TYPES[value <span class="hljs-keyword">as</span> unknown <span class="hljs-keyword">as</span> BootstrapTypes])
        .map(<span class="hljs-function">(<span class="hljs-params">[key, value]</span>) =&gt;</span> (
          { parameterKey: key, parameterValue: value } <span class="hljs-keyword">as</span> CfnStackSet.ParameterProperty
        )); 

      <span class="hljs-keyword">new</span> CfnStackSet(<span class="hljs-built_in">this</span>, <span class="hljs-string">`BootstrapStackSet<span class="hljs-subst">${typeIdentifier}</span>`</span>, {
        permissionModel: <span class="hljs-string">"SERVICE_MANAGED"</span>,
        stackSetName: <span class="hljs-string">`<span class="hljs-subst">${CONTEXT}</span>-bootstrap-<span class="hljs-subst">${typeIdentifier}</span>-<span class="hljs-subst">${ENV}</span>`</span>,
        description: <span class="hljs-string">`Account bootstrap StackSet <span class="hljs-subst">${typeIdentifier}</span>`</span>,
        autoDeployment: { enabled: <span class="hljs-literal">true</span>, retainStacksOnAccountRemoval: <span class="hljs-literal">false</span> },
        capabilities: [<span class="hljs-string">"CAPABILITY_NAMED_IAM"</span>],
        templateBody: readFileSync(join(process.cwd(), <span class="hljs-string">`/cdk/lib/orga/bootstrap/<span class="hljs-subst">${value}</span>`</span>), <span class="hljs-string">'utf8'</span>),
        parameters: cfnParams,
        tags,
        operationPreferences: { failureToleranceCount: <span class="hljs-number">1</span>, maxConcurrentCount: <span class="hljs-number">1</span> },
        stackInstancesGroup: [{
          regions: props.regions,
          deploymentTargets: {
            organizationalUnitIds: props.organizationUnits.map(<span class="hljs-function">(<span class="hljs-params">ou: { attrId: <span class="hljs-built_in">string</span> }</span>) =&gt;</span> ou.attrId), 
          },
        }],
      });
    });
</code></pre>
<h2 id="heading-conslusion">Conclusion</h2>
<p>For a long time, I had been trying to set up an organization, but since I couldn't find a working piece of code, I decided to dive into it myself. It was an exciting experience, tackling different challenges and solving them along the way. Kudos to AWS CDK for its flexibility!</p>
<p>Having an organization is a great way to experiment and easily tear things down afterward. When closing accounts, you may still incur charges during the 90 days after they have been removed from the organization; only after this 90-day period are the accounts permanently deleted. During this time, you can still recover an account, access it with limited permissions, and perform certain actions.</p>
]]></content:encoded></item><item><title><![CDATA[Conventional Use of AWS CDK]]></title><description><![CDATA[Infrastructure as code, a principal rule of agility and reliability, helps deliver configurable software by combining all different pieces into a single asset, such as Software code, Configuration, and infrastructure. However, this approach introduce...]]></description><link>https://blogs.serverlessfolks.com/conventional-use-of-aws-cdk</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/conventional-use-of-aws-cdk</guid><category><![CDATA[aws-cdk]]></category><category><![CDATA[abstractions]]></category><category><![CDATA[compliance ]]></category><category><![CDATA[Governance]]></category><category><![CDATA[enablement]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Mon, 04 Nov 2024 00:21:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/AqR7JhePqmQ/upload/d17126dddf9fc3a158a5dd99f37a4fd6.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Infrastructure as code, a core principle of agility and reliability, helps deliver configurable software by combining its different pieces (software code, configuration, and infrastructure) into a single asset. However, this approach introduces some complexity and can become a bottleneck for one of the principal goals of IaC: agility.</p>
<p>AWS CDK, as an abstraction layer, offers more flexibility by expressing infrastructure in code and overcoming the rigidity of structured CloudFormation templating. However, this autonomy and flexibility can become an obstacle in the long term. It often becomes frustrating when applying conventions at the enterprise level, as teams must change every stack to respect a given standard or convention. Likewise, changes or deprecations in AWS services require extra effort and modifications across all stacks, which is practically impossible or very time-consuming.</p>
<h2 id="heading-configuration">Configuration</h2>
<p>When dealing with IaC, configurable software practices must be adopted so stacks stay centralized and easy to change. There are many ways to handle configuration, such as configuration files, Parameter Store, or CloudFormation outputs. The right choice depends on the nature of the values, their lifecycle, and their dependencies.</p>
<h3 id="heading-configuration-file">Configuration File</h3>
<p>A configuration file is ideal for internal and local configuration elements. The following snippet shows an example configuration file in TypeScript, including naming and static config values.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { ContextVariables, EnvVariable } <span class="hljs-keyword">from</span> <span class="hljs-string">"@type"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Config = {
  contextVariables: ContextVariables;
  org: { accounts: { accountName: <span class="hljs-built_in">string</span>;}[];};
}

<span class="hljs-keyword">const</span> defaultConfig: Config = {
  contextVariables: {
    context: <span class="hljs-string">`my-application`</span>,
    stage: <span class="hljs-string">'dev'</span>, 
    owner: <span class="hljs-string">'operations'</span>,
    usage: <span class="hljs-string">'EPHEMERAL'</span>,
  }
}

<span class="hljs-keyword">const</span> getFinalConfig = (config: Partial&lt;Config&gt;): <span class="hljs-function"><span class="hljs-params">Config</span> =&gt;</span> {
  <span class="hljs-keyword">return</span> {
    ...defaultConfig,
    ...config,
    <span class="hljs-comment">// merge contextVariables last so a partial override cannot clobber the defaults</span>
    contextVariables: {
      ...defaultConfig.contextVariables,
      ...config.contextVariables,
    },
  }
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> getConfig = (stage: EnvVariable): <span class="hljs-function"><span class="hljs-params">Config</span> =&gt;</span> {
  <span class="hljs-keyword">switch</span> (stage) {
    <span class="hljs-keyword">case</span> <span class="hljs-string">'test'</span>:
      <span class="hljs-keyword">return</span> getFinalConfig({ contextVariables: { 
        ...defaultConfig.contextVariables, 
        stage: <span class="hljs-string">'test'</span>, usage: <span class="hljs-string">'PRODUCTION'</span> } 
      });
    <span class="hljs-keyword">case</span> <span class="hljs-string">'prod'</span>:
      <span class="hljs-keyword">return</span> getFinalConfig({ contextVariables: { 
        ...defaultConfig.contextVariables,
        stage: <span class="hljs-string">'prod'</span>, usage: <span class="hljs-string">'PRODUCTION'</span> 
      }});
    <span class="hljs-keyword">case</span> <span class="hljs-string">'dev'</span>:
      <span class="hljs-keyword">return</span> getFinalConfig({ contextVariables: { 
        ...defaultConfig.contextVariables, 
        stage: <span class="hljs-string">'dev'</span>, usage: <span class="hljs-string">'EPHEMERAL'</span> 
      }});
    <span class="hljs-keyword">case</span> <span class="hljs-string">'sandbox'</span>:
      <span class="hljs-keyword">return</span> getFinalConfig({ contextVariables: {
         ...defaultConfig.contextVariables,
         stage: <span class="hljs-string">'sandbox'</span>, usage: <span class="hljs-string">'POC'</span> 
      }});
    <span class="hljs-keyword">default</span>:
      <span class="hljs-keyword">return</span> getFinalConfig({});
  }
};
</code></pre>
<p>The config can then be fetched easily in the CDK app entry point, as shown below.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> app = <span class="hljs-keyword">new</span> cdk.App();
<span class="hljs-keyword">const</span> environment = getEnv(app);
<span class="hljs-keyword">const</span> config = getConfig(environment);
</code></pre>
<h3 id="heading-parameter-store">Parameter Store</h3>
<p>Parameter Store is advantageous for external configuration values, such as an Application Load Balancer listener ARN or a VPC ID. It is good practice to use parameters for central configuration or for unpredictable cross-stack values.</p>
<p>Parameter Store as a solution can lead to complexities if:</p>
<ul>
<li><p>Parameter names lack a naming convention.</p>
</li>
<li><p>Consumer stacks scatter parameter fetches across many different places.</p>
</li>
<li><p>Parameters are used excessively.</p>
</li>
</ul>
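<p>A small shared helper can remove the first of these issues by making every stack build parameter names the same way. This is a sketch: the function name and exact convention are illustrative, following the <code>/&lt;stage&gt;/&lt;context&gt;/...</code> shape used elsewhere in this post.</p>

```typescript
// Illustrative helper enforcing a /<stage>/<context>/... naming convention.
const buildParameterName = (
  stage: string,
  context: string,
  ...segments: string[]
): string => {
  const parts = [stage, context, ...segments]
    .map((s) => s.trim().replace(/^\/+|\/+$/g, '')) // strip stray slashes
    .filter((s) => s.length > 0);
  return `/${parts.join('/')}`;
};

// e.g. buildParameterName('dev', 'my-app', 'alb', 'httpsListenerArn')
// yields '/dev/my-app/alb/httpsListenerArn'
```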
<p>In the following snippet, parameters are fetched in the parent stack and passed down to the nested stacks, even though not all nested stacks need every parameter.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> AwsGatewayUsingLoadbalancerStack <span class="hljs-keyword">extends</span> cdk.Stack {
  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">scope: Construct, id: <span class="hljs-built_in">string</span>, props?: cdk.StackProps</span>) {
    <span class="hljs-built_in">super</span>(scope, id, props);

    <span class="hljs-keyword">const</span> albSecurityGroupId = StringParameter.fromStringParameterName(<span class="hljs-built_in">this</span>, <span class="hljs-string">'https-listener-securitygroupe-id'</span>, <span class="hljs-string">'/alb/securityGroupId'</span>).stringValue;
    <span class="hljs-keyword">const</span> albHttpsListenerArn = StringParameter.fromStringParameterName(<span class="hljs-built_in">this</span>, <span class="hljs-string">'https-listener-arn'</span>,  <span class="hljs-string">'/alb/httpsListenerArn'</span>).stringValue;

    <span class="hljs-keyword">const</span> securityGroup = SecurityGroup.fromSecurityGroupId(<span class="hljs-built_in">this</span>, <span class="hljs-string">'alb-security-group'</span>, albSecurityGroupId);
    <span class="hljs-keyword">const</span> albListener = ApplicationListener.fromApplicationListenerAttributes(<span class="hljs-built_in">this</span>, <span class="hljs-string">'alb-listener'</span>, {
      listenerArn: albHttpsListenerArn,
      securityGroup: securityGroup,
    });

    <span class="hljs-keyword">const</span> lambda = <span class="hljs-keyword">new</span> NodejsFunction(<span class="hljs-built_in">this</span>, <span class="hljs-string">'example-lambda'</span>, {
      entry: join(process.cwd(), <span class="hljs-string">'/src/example.ts'</span>),
      handler: <span class="hljs-string">'handler'</span>,
      ...LambdaConfiguration
    });

    <span class="hljs-keyword">const</span> target = <span class="hljs-keyword">new</span> ApplicationTargetGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">'example-target-group'</span>, { targets: [ <span class="hljs-keyword">new</span> LambdaTarget(lambda) ] });

    albListener.addTargetGroups(<span class="hljs-string">'dosomething-target'</span>, {
      priority: <span class="hljs-number">1</span>,
      conditions: [
        ListenerCondition.hostHeaders([<span class="hljs-string">'myservice.example.com'</span>]),
        ListenerCondition.pathPatterns([<span class="hljs-string">'/v1/dosomthing'</span>]),
        ListenerCondition.httpRequestMethods([
          HttpMethod.POST,
          HttpMethod.OPTIONS,
        ]),
      ],
      targetGroups: [target],
    });
  }
}
</code></pre>
<p>The example fetches the parameters early and passes them through where needed. The advantage of this approach is that the parameter lookups sit side by side in a single place, simplifying future changes and evolutions. Another approach is to fetch each parameter once and share it: if the same parameter is needed multiple times, the API call to Parameter Store is made in a single place, and only once.</p>
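<p>The fetch-once-and-share idea can be sketched as a small cache around the lookup. Here <code>fetchFn</code> is a stand-in for something like <code>StringParameter.fromStringParameterName(...).stringValue</code>, so the sketch stays framework-agnostic:</p>

```typescript
// Sketch of a fetch-once parameter cache; fetchFn stands in for the CDK lookup.
const createParameterCache = (fetchFn: (name: string) => string) => {
  const cache = new Map<string, string>();
  return (name: string): string => {
    const hit = cache.get(name);
    if (hit !== undefined) return hit;
    const value = fetchFn(name); // only invoked on the first request per name
    cache.set(name, value);
    return value;
  };
};
```

<p>Every consumer then calls the returned function, and repeated requests for the same parameter never trigger a second lookup.</p>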
<h2 id="heading-validation">Validation</h2>
<p>As part of infrastructure as code, companies often apply conventions that simplify governance, such as allowed stages (dev, test, staging, prod, sandbox, etc.) or tagging (stage, context, project, application, or domain).</p>
<p>Discovering noncompliant resources after the fact is one way to enforce this, and it is a valid way to analyze and find them, but the problems discovered this way often take a long time to fix and cause a lot of frustration.</p>
<p>A better approach is to act as an enabler: make present and future work easier by simplifying how teams apply regulations and conventions.</p>
<p>With CDK, validators run early, during synthesis. The following example validates the allowed stage configurations.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> WorkloadEnvValidator <span class="hljs-keyword">implements</span> IValidation {
  <span class="hljs-keyword">constructor</span>(<span class="hljs-params"><span class="hljs-keyword">private</span> <span class="hljs-keyword">readonly</span> variables: ContextVariables</span>) {}

  <span class="hljs-keyword">public</span> validate(): <span class="hljs-built_in">string</span>[] {
    <span class="hljs-keyword">const</span> errors: <span class="hljs-built_in">string</span>[] = [];
    <span class="hljs-keyword">if</span>(!(isEnvValid(<span class="hljs-built_in">this</span>.variables.stage))) {
      errors.push(<span class="hljs-string">`Provided Stage value is not a valid environment. Must be one of: <span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify(AvailableEnvs)}</span>.`</span>);
    }

    <span class="hljs-keyword">if</span>(
      ![<span class="hljs-string">'dev'</span>, <span class="hljs-string">'sandbox'</span>].includes(<span class="hljs-built_in">this</span>.variables.stage) &amp;&amp;
      <span class="hljs-built_in">this</span>.variables.usage !== <span class="hljs-string">'PRODUCTION'</span>
    ){
      errors.push(<span class="hljs-string">`Provided Stage value is not eligible to run ephemeral or experimental stacks.`</span>);
    }

    <span class="hljs-keyword">return</span> errors;
  }
}


<span class="hljs-comment">// isEnvValid allows dev, test, prod, and sandbox</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> isEnvValid = (env: EnvVariable): env is EnvVariable =&gt;
  IsDevelopmentEnv(env) ||
  IsTestingEnv(env) ||
  IsProductionEnv(env) ||
  IsSandboxEnv(env);
</code></pre>
<p>Running the synth using an unknown stage name will result in the following messages.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730675821105/7050071f-9a57-4ee7-bd5a-e629cd35b1dc.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-aspects">Aspects</h2>
<p>We can use Aspects to apply conventions and compliance-related tasks in an automated way, such as tagging resources or enforcing conventional naming. The following example shows a simple way of applying a parameter naming convention.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> ApplyParameterStoreNamingPolicyAspect <span class="hljs-keyword">implements</span> IAspect {
  <span class="hljs-keyword">constructor</span>(<span class="hljs-params"><span class="hljs-keyword">private</span> <span class="hljs-keyword">readonly</span> variables: ContextVariables</span>) { }
  <span class="hljs-keyword">public</span> visit(node: IConstruct): <span class="hljs-built_in">void</span> {
    <span class="hljs-keyword">if</span> (node <span class="hljs-keyword">instanceof</span> CfnParameter) {
      <span class="hljs-keyword">const</span> inspector = <span class="hljs-keyword">new</span> TreeInspector();
      node.inspect(inspector);

      <span class="hljs-keyword">const</span> name = inspector.attributes[<span class="hljs-string">'aws:cdk:cloudformation:props'</span>][<span class="hljs-string">'name'</span>].toString();
      <span class="hljs-keyword">if</span>(name.startsWith(<span class="hljs-string">`/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.variables.stage}</span>/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.variables.context}</span>/`</span>)) <span class="hljs-keyword">return</span>;
      <span class="hljs-keyword">const</span> cleanedName = name
        .replace(<span class="hljs-string">`<span class="hljs-subst">${<span class="hljs-built_in">this</span>.variables.stage}</span>`</span>, <span class="hljs-string">''</span>)
        .replace(<span class="hljs-string">`<span class="hljs-subst">${<span class="hljs-built_in">this</span>.variables.context}</span>`</span>, <span class="hljs-string">''</span>)
        .replace(<span class="hljs-string">'//'</span>, <span class="hljs-string">''</span>);
      <span class="hljs-comment">// apply the managed fix using the cleaned name so stage/context are not duplicated</span>
      node.addPropertyOverride(<span class="hljs-string">'Name'</span>, <span class="hljs-string">`/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.variables.stage}</span>/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.variables.context}</span><span class="hljs-subst">${cleanedName}</span>`</span> );
      Annotations.of(node).addWarningV2(<span class="hljs-string">`<span class="hljs-subst">${name}</span>`</span>, <span class="hljs-string">`Parameter Name should start with /<span class="hljs-subst">${<span class="hljs-built_in">this</span>.variables.stage}</span>/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.variables.context}</span>. A managed fix is applied by renaming the parameter, but this can have consequences per usage; please apply the correct naming convention.`</span> );
    }
  }
}
</code></pre>
<p>This sample Aspect inspects parameter resources, transforms the parameter name, and shows a warning in the terminal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730677096175/9d876469-8533-4528-843e-e9aa93339d67.png" alt class="image--center mx-auto" /></p>
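<p>The renaming rule inside the Aspect can also be factored into a pure function, which keeps it unit-testable without running a synth. This refactor is a suggestion, not the post's original code:</p>

```typescript
// Pure version of the aspect's rule: names must start with /<stage>/<context>/.
const enforceParameterPrefix = (
  name: string,
  stage: string,
  context: string,
): string => {
  const prefix = `/${stage}/${context}/`;
  if (name.startsWith(prefix)) return name; // already compliant
  const cleaned = name
    .split('/')
    .filter((part) => part.length > 0 && part !== stage && part !== context)
    .join('/');
  return `${prefix}${cleaned}`;
};
```

<p>The Aspect's <code>visit</code> method would then only need to call this function and emit the warning when the returned name differs from the original.</p>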
<h2 id="heading-centralization">Centralization</h2>
<p>There are cases where providing a framework or extensions is useful, letting teams adopt them when necessary. Compliance, however, consists of many small pieces, and asking teams to apply each one explicitly, case by case, is error-prone.</p>
<p>The best way to achieve the goal while keeping the rate of change and effort minimal is to provide abstractions that let teams comply with minimal effort. This can be achieved with constructs, but creating constructs and applying them everywhere, across all services, is not the best choice.</p>
<p>However, an abstraction at the Stack level is simple enough to propagate at the enterprise level. The example below shows how this can be achieved.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { ArnFormat, Aspects, Duration, NestedStack, NestedStackProps, Stack, StackProps, Tag } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-cdk-lib"</span>;
<span class="hljs-keyword">import</span> { Construct } <span class="hljs-keyword">from</span> <span class="hljs-string">"constructs"</span>;
<span class="hljs-keyword">import</span> { ApplyDestroyPolicyAspect, ApplyParameterStoreNamingPolicyAspect, ApplyTagsAspect } <span class="hljs-keyword">from</span> <span class="hljs-string">"./aspects"</span>;
<span class="hljs-keyword">import</span> { ContextVariablesValidator, WorkloadEnvValidator } <span class="hljs-keyword">from</span> <span class="hljs-string">"./validators"</span>;
<span class="hljs-keyword">import</span> { ContextVariables } <span class="hljs-keyword">from</span> <span class="hljs-string">"../types/Context"</span>;
<span class="hljs-keyword">import</span> { Rule, Schedule } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-cdk-lib/aws-events"</span>;
<span class="hljs-keyword">import</span> { LambdaFunction } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-cdk-lib/aws-events-targets"</span>;
<span class="hljs-keyword">import</span> { NodejsFunction } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-cdk-lib/aws-lambda-nodejs"</span>;
<span class="hljs-keyword">import</span> { Runtime } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-cdk-lib/aws-lambda"</span>;
<span class="hljs-keyword">import</span> { join } <span class="hljs-keyword">from</span> <span class="hljs-string">"path"</span>;
<span class="hljs-keyword">import</span> { PolicyDocument, PolicyStatement, Role, ServicePrincipal } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-cdk-lib/aws-iam"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> EnforcedStackProps = StackProps &amp; NestedStackProps &amp; {
  contextVariables: ContextVariables;
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> EnforcedNestedStack <span class="hljs-keyword">extends</span> NestedStack {
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> REGION: <span class="hljs-built_in">string</span>;
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> ACCOUNT_ID: <span class="hljs-built_in">string</span>;
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> ENV: <span class="hljs-built_in">string</span>;
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> CONTEXT: <span class="hljs-built_in">string</span>;
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> CONTEXT_VARIABLES: ContextVariables;

  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">scope: Construct, id: <span class="hljs-built_in">string</span>, props: EnforcedStackProps</span>) {
    <span class="hljs-built_in">super</span>(scope, id, props);

    <span class="hljs-keyword">const</span> { account: ACCOUNT_ID, region: REGION } = Stack.of(<span class="hljs-built_in">this</span>);
    <span class="hljs-built_in">this</span>.REGION = REGION;
    <span class="hljs-built_in">this</span>.ACCOUNT_ID = ACCOUNT_ID;

    <span class="hljs-keyword">const</span> { contextVariables: variables } = props;
    <span class="hljs-built_in">this</span>.ENV = variables.stage;
    <span class="hljs-built_in">this</span>.CONTEXT = variables.context;
    <span class="hljs-built_in">this</span>.CONTEXT_VARIABLES = variables
  }
}
<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> EnforcedStack <span class="hljs-keyword">extends</span> Stack { 
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> REGION: <span class="hljs-built_in">string</span>;
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> ACCOUNT_ID: <span class="hljs-built_in">string</span>;
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> ENV: <span class="hljs-built_in">string</span>;
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> CONTEXT: <span class="hljs-built_in">string</span>;
  <span class="hljs-keyword">protected</span> <span class="hljs-keyword">readonly</span> CONTEXT_VARIABLES: ContextVariables;

  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">scope: Construct, id: <span class="hljs-built_in">string</span>, props: EnforcedStackProps</span>) {
    <span class="hljs-built_in">super</span>(scope, id, props);

    <span class="hljs-keyword">const</span> { account: ACCOUNT_ID, region: REGION } = Stack.of(<span class="hljs-built_in">this</span>);
    <span class="hljs-built_in">this</span>.REGION = REGION;
    <span class="hljs-built_in">this</span>.ACCOUNT_ID = ACCOUNT_ID;

    <span class="hljs-keyword">const</span> { contextVariables } = props;
    <span class="hljs-built_in">this</span>.ENV = contextVariables.stage;
    <span class="hljs-built_in">this</span>.CONTEXT = contextVariables.context;
    <span class="hljs-built_in">this</span>.CONTEXT_VARIABLES = contextVariables

    <span class="hljs-built_in">this</span>.node.addValidation(<span class="hljs-keyword">new</span> ContextVariablesValidator(contextVariables));
    <span class="hljs-built_in">this</span>.node.addValidation(<span class="hljs-keyword">new</span> WorkloadEnvValidator(contextVariables));
    Aspects.of(<span class="hljs-built_in">this</span>).add(<span class="hljs-keyword">new</span> AwsSolutionsChecks());

    <span class="hljs-keyword">if</span>( contextVariables.usage !== <span class="hljs-string">'PRODUCTION'</span> ) 
      Aspects.of(<span class="hljs-built_in">this</span>).add(<span class="hljs-keyword">new</span> ApplyDestroyPolicyAspect());

    <span class="hljs-keyword">if</span>( contextVariables.usage === <span class="hljs-string">'EPHEMERAL'</span> ) {

      <span class="hljs-keyword">const</span> now = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(<span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().getTime() + <span class="hljs-number">24</span> * <span class="hljs-number">60</span> * <span class="hljs-number">60</span> * <span class="hljs-number">1000</span>);

      <span class="hljs-keyword">const</span> deleteFunction = <span class="hljs-keyword">new</span> NodejsFunction(<span class="hljs-built_in">this</span>, <span class="hljs-string">'DeleteFunction'</span>, {
        handler: <span class="hljs-string">'index.handler'</span>,
        runtime: Runtime.NODEJS_20_X,
        timeout: Duration.minutes(<span class="hljs-number">15</span>),
        entry: join(process.cwd(), <span class="hljs-string">'core/custom-resources'</span>, <span class="hljs-string">'remove-stack.ts'</span>),
        environment: {
          STACK_NAME: <span class="hljs-built_in">this</span>.stackName,
        },
        role: <span class="hljs-keyword">new</span> Role(<span class="hljs-built_in">this</span>, <span class="hljs-string">'DeleteFunctionRole'</span>, {
          assumedBy: <span class="hljs-keyword">new</span> ServicePrincipal(<span class="hljs-string">'lambda.amazonaws.com'</span>),
          inlinePolicies: {
            <span class="hljs-string">'CloudFormationPolicy'</span>: <span class="hljs-keyword">new</span> PolicyDocument({
              statements: [
                <span class="hljs-keyword">new</span> PolicyStatement({
                  actions: [<span class="hljs-string">'cloudformation:DeleteStack'</span>],
                  resources: [
                    Stack.of(<span class="hljs-built_in">this</span>).formatArn({
                      service: <span class="hljs-string">'cloudformation'</span>,
                      resource: <span class="hljs-string">'stack'</span>,
                      resourceName: <span class="hljs-string">`<span class="hljs-subst">${<span class="hljs-built_in">this</span>.stackName}</span>/*`</span>,
                      arnFormat: ArnFormat.SLASH_RESOURCE_NAME
                    }),
                  ]
                })
              ]
            })
          }
        })
      });

      <span class="hljs-keyword">new</span> Rule(<span class="hljs-built_in">this</span>, <span class="hljs-string">'EphemeralRule'</span>, {
        schedule: Schedule.cron({
          minute: now.getMinutes().toString(),
          hour: now.getHours().toString(),
          day: now.getDate().toString(),
          month: (now.getMonth() + <span class="hljs-number">1</span>).toString(), <span class="hljs-comment">// JS months are 0-11; cron months are 1-12</span>
          year:  now.getFullYear().toString(),
        }),
        targets: [ <span class="hljs-keyword">new</span> LambdaFunction(deleteFunction)],
      });
    }

    Aspects.of(<span class="hljs-built_in">this</span>).add(<span class="hljs-keyword">new</span> ApplyTagsAspect({
      context: contextVariables.context,
      stage: contextVariables.stage,
      owner: contextVariables.owner,
      usage: contextVariables.usage,
    }));

    Aspects.of(<span class="hljs-built_in">this</span>).add(<span class="hljs-keyword">new</span> ApplyParameterStoreNamingPolicyAspect(contextVariables));

  }
}
</code></pre>
<p>The example applies all aspects and validations in a central place. As a specific detail, it also uses a one-time EventBridge schedule to remove ephemeral stacks after 24 hours.</p>
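<p>The one-time schedule is built from a JavaScript <code>Date</code>, which has a classic pitfall: <code>getMonth()</code> is 0-indexed while EventBridge cron months are 1-12. A small helper (hypothetical, simply mirroring the <code>Schedule.cron</code> option names) makes this explicit:</p>

```typescript
// Build EventBridge cron fields for a one-time schedule from a JS Date.
// Caveat: JS Date months are 0-11, while EventBridge cron months are 1-12.
export const oneTimeCron = (d: Date) => ({
  minute: d.getMinutes().toString(),
  hour: d.getHours().toString(),
  day: d.getDate().toString(),
  month: (d.getMonth() + 1).toString(), // shift to cron's 1-indexed months
  year: d.getFullYear().toString(),
});

// Example: 31 Jan 2025, 09:30 local time
const fields = oneTimeCron(new Date(2025, 0, 31, 9, 30));
console.log(fields); // month is "1", not "0"
```

<p>Note that EventBridge evaluates cron expressions in UTC, so in practice the UTC getters (<code>getUTCMinutes</code>, <code>getUTCHours</code>, and so on) would be the safer choice.</p>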
<h2 id="heading-conclusion">Conclusion</h2>
<p>Adopting IaC was a great evolution in how companies deal with infrastructure. But because the speed of cloud provisioning leads to a far greater number of resources, it becomes important to enable teams to apply good practices, and to help enterprises enforce them without becoming road-blockers or reducing the velocity of development teams.</p>
<p>AWS CDK is not only an object-oriented way of writing IaC; it also provides many design patterns under the hood that can be extended to meet requirements such as compliance and security.</p>
]]></content:encoded></item><item><title><![CDATA[Lambda Code Execution Freeze/Thaw]]></title><description><![CDATA[AWS Lambda, a serverless computing service, enables code execution on demand while providing an isolated environment for each individual request. This design inherently ensures reliability, as any failure in one request does not affect others.
When w...]]></description><link>https://blogs.serverlessfolks.com/lambda-code-execution-freezethaw</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/lambda-code-execution-freezethaw</guid><category><![CDATA[aws lambda]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Fri, 18 Oct 2024 15:59:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/O8dDy7BRgBA/upload/98645108e2cf9e65429ab2b9f9b572bf.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS Lambda, a serverless computing service, enables code execution on demand while providing an isolated environment for each individual request. This design inherently ensures reliability, as any failure in one request does not affect others.</p>
<p>When working with Lambda, it's important to explore the intricacies of its execution environment and how it is managed. In this article, I will share my insights and observations, though some aspects may be open to debate.</p>
<h2 id="heading-execution-environment">Execution Environment</h2>
<p>An execution environment is an isolated container (Micro-VM) that is launched on demand to handle incoming requests. Each environment processes one request at a time but can handle subsequent requests once the previous one is complete. If a new request arrives before the ongoing one finishes, an additional execution environment is created to manage it. The diagram below illustrates the lifecycle of an execution environment.</p>
<p>An execution environment goes through three phases: Initialization, Invocation, and Shutdown.</p>
<p><strong>Init phase</strong></p>
<p>The Init phase downloads the code package, starts the configured extensions, and initializes the runtime; these steps happen sequentially.</p>
<p><strong>Invocation Phase</strong></p>
<p>The Runtime, extensions, and function code will be invoked during the Invocation phase.</p>
<p><strong>Shutdown Phase</strong></p>
<p>The shutdown phase will shut down the runtime and send the shutdown signal to all extensions letting them clean up and finish the remaining work.</p>
<h3 id="heading-reuse-of-environment">Reuse of Environment</h3>
<p>An execution environment is reused for subsequent requests; Lambda only enters the shutdown phase if the environment does not receive any request for a period of time. The window between the end of an invocation and the start of the shutdown phase is the idle duration, during which the Lambda service allows that environment to be reused.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728249092729/23a7160d-a87c-46ae-bb28-96844b7af722.png" alt class="image--center mx-auto" /></p>
<p>AWS Lambda has a lot of interesting technical details. Lambda does not leave the execution environment running while there is no demand; instead, the environment is frozen and is thawed when a new request arrives.</p>
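<p>A simple way to observe this reuse is an init-scoped counter: any variable declared outside the handler survives warm invocations of the same environment. A minimal sketch (not from the article's code):</p>

```typescript
// Runs once per execution environment (cold start), not per request.
let invocationCount = 0;

export const handler = async (): Promise<{ invocationCount: number }> => {
  // Increments across warm invocations of the same environment;
  // resets to 0 only when a new environment is initialized.
  invocationCount += 1;
  return { invocationCount };
};
```

<p>Two quick consecutive requests typically land on the same environment and return 1 then 2, while a request after a long idle period starts again at 1 in a fresh environment.</p>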
<p>In the case of unintentional interruptions, Lambda re-runs initialization as part of the next invocation, but this appears to be a lighter init.</p>
<p><img src="https://cdn-images-1.medium.com/max/2400/1*Y3K1xvUWQGlUgtXe5h7Ucw.png" alt /></p>
<h3 id="heading-code-execution">Code Execution</h3>
<p>Let’s first see how the code is executed for a request. When the first request is received, Lambda initializes the environment by running the top-level code. How the init phase behaves depends on the programming language used and how the code is written. Typically, with Node.js, what happens during initialization is similar to running the following command:</p>
<pre><code class="lang-bash">&gt; node index.js
</code></pre>
<p>The Lambda service then executes the function handler; when execution finishes, the execution environment is frozen. Thawing the execution environment is the tricky part. When Lambda freezes the execution environment, all background processes are frozen as well, and they resume when the environment is thawed. But what if the execution environment receives no more requests? It is shut down, and everything in it is lost.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728253614244/86ec6c43-50b4-4a07-a5c8-a24ac521b33f.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-freeze">Freeze</h3>
<p>When the execution environment is frozen, under the hood the container enters a hibernation-like state in which all resources are put to sleep, much like a PC going into hibernate mode: the address space of all running processes is saved, allowing the same state to be reconstructed later.</p>
<h3 id="heading-thaw">Thaw</h3>
<p>The thawing stage is part of the runtime invoke when a container is reused during the invocation phase: the Lambda service invokes the runtime, but in theory this must happen after the frozen background processes have been reconstructed.</p>
<blockquote>
<p><em>I could not find a response to what happens when the execution environment wakes up, but this must be when the container gets awake and not the runtime.</em></p>
</blockquote>
<h2 id="heading-try-it-out">Try it out</h2>
<p>The following example creates a Lambda-based API with two endpoints, one for awaited and one for non-awaited tasks, to observe how Lambda behaves in real time.</p>
<p>Both functions have the same code, except that the first awaits its tasks and the second does not.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { LambdaFunctionURLEvent, LambdaFunctionURLResult } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-lambda"</span>;

<span class="hljs-keyword">const</span> delay = <span class="hljs-keyword">async</span> (ms: <span class="hljs-built_in">number</span>) =&gt; {
  <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Promise</span>(<span class="hljs-function">(<span class="hljs-params">resolve</span>) =&gt;</span> {
    <span class="hljs-built_in">setTimeout</span>(resolve, ms);
  });
}

<span class="hljs-keyword">const</span> Task = <span class="hljs-keyword">async</span> (req: <span class="hljs-built_in">string</span>, name: <span class="hljs-built_in">string</span>, sleep: <span class="hljs-built_in">number</span>) =&gt; {
  <span class="hljs-keyword">await</span> delay(sleep);
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`<span class="hljs-subst">${name}</span> : `</span>, req );
  <span class="hljs-keyword">return</span> { name: <span class="hljs-string">`<span class="hljs-subst">${name}</span>`</span> };
}
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> handler = <span class="hljs-keyword">async</span> (_event: LambdaFunctionURLEvent): <span class="hljs-built_in">Promise</span>&lt;LambdaFunctionURLResult&gt; =&gt; {

  <span class="hljs-keyword">const</span> resultA = <span class="hljs-keyword">await</span> Task(_event.requestContext.requestId, <span class="hljs-string">"TaskA"</span>, <span class="hljs-number">1000</span>);
  <span class="hljs-keyword">const</span> resultB = <span class="hljs-keyword">await</span> Task(_event.requestContext.requestId, <span class="hljs-string">"TaskB"</span>, <span class="hljs-number">2000</span>);
  <span class="hljs-keyword">const</span> resultC = <span class="hljs-keyword">await</span> Task(_event.requestContext.requestId, <span class="hljs-string">"TaskC"</span>, <span class="hljs-number">3000</span>);

  <span class="hljs-keyword">const</span> result = {
    resultA,
    resultB,
    resultC,
  };
  <span class="hljs-keyword">return</span> {
    statusCode: <span class="hljs-number">200</span>,
    body: <span class="hljs-built_in">JSON</span>.stringify(result, <span class="hljs-literal">null</span>, <span class="hljs-number">2</span>),
    headers: {
      <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
    },
  };
}
</code></pre>
<p>Running the awaited function gives a response time of 6.XX seconds, accumulating the 1000, 2000, and 3000 millisecond delays.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728256687523/6167772d-f761-4373-9075-34d3ac25d44e.png" alt class="image--center mx-auto" /></p>
<p>The non-awaited function has the same code but calls TaskB and TaskC without waiting.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> resultA = <span class="hljs-keyword">await</span> Task(_event.requestContext.requestId, <span class="hljs-string">"TaskA"</span>, <span class="hljs-number">1000</span>);
<span class="hljs-keyword">const</span> resultB = Task(_event.requestContext.requestId, <span class="hljs-string">"TaskB"</span>, <span class="hljs-number">2000</span>);
<span class="hljs-keyword">const</span> resultC = Task(_event.requestContext.requestId, <span class="hljs-string">"TaskC"</span>, <span class="hljs-number">3000</span>);
</code></pre>
<p>Running the first request gives the following logs, showing that only TaskA completed and produced a log entry.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728257588182/50fcc355-51fc-4677-ac24-5e58a47df2a4.png" alt class="image--center mx-auto" /></p>
<p>Running a second request produces the logs below: the remaining tasks from the previous execution are executed as part of the subsequent invocation.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728257631655/a0d581b2-17a4-4c15-9c1a-60d3e4975a13.png" alt class="image--center mx-auto" /></p>
<p>The interesting part is how long they took to log. TaskB and TaskC were executed at the same time and finished almost instantly. The following image shows the response time as 1105 ms, which is normal for the new invocation's TaskA.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728257961474/e90634c6-7008-43eb-bbc1-9eb1dc47ff3b.png" alt class="image--center mx-auto" /></p>
<p>Looking back at the non-awaited code, TaskB and TaskC should take around 5000 ms together, but that is not the case. Looking at the billed duration, it corresponds to the TaskA execution only.</p>
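<p>The same effect can be reproduced locally with plain promises, no AWS required (a minimal sketch): when the handler returns, the un-awaited tasks are still pending, exactly like the frozen TaskB and TaskC.</p>

```typescript
const delay = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

const task = async (log: string[], name: string, sleep: number) => {
  await delay(sleep);
  log.push(name);
};

const handler = async (): Promise<string[]> => {
  const log: string[] = [];
  await task(log, "TaskA", 50);
  task(log, "TaskB", 100); // not awaited: still pending when the handler returns
  task(log, "TaskC", 150); // not awaited
  return [...log]; // snapshot of completed tasks at return time
};

handler().then((result) => console.log(result)); // [ 'TaskA' ]
```

<p>TaskB and TaskC eventually resolve in the background, but their results never reach the response, which mirrors what the billed duration shows above.</p>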
<h2 id="heading-real-scenario">Real scenario</h2>
<p>To build more trust in the hypothesis (I did not trust it yet) and validate that there is no tricky side effect, we are going to send a message to an SQS queue to prove that the idea behind these observations is real.</p>
<p>By running the first request there will be nothing fancy except a new message in the queue for TaskA.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728260004420/8b155aa5-6ad8-4d5e-b610-091f9505b541.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728259940785/166311d9-8d06-4596-9ed0-dc43c1dda9a7.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728259986044/b230ea1e-ed5b-4b8b-ad9b-a482ef864cd3.png" alt class="image--center mx-auto" /></p>
<p>Now, let’s run another request and see what happens. Here, TaskB and TaskC were included in the execution, and the new messages are present in the queue. The top message in the following screenshot is from the first invocation, and the three others are TaskB/TaskC of the first invocation plus TaskA of the new invocation.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728260177483/34062cce-95bc-4d37-a662-544d49bdd44b.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728260323017/332ceb45-8b81-4b36-aafc-a696cb3e2588.png" alt class="image--center mx-auto" /></p>
<p>In the examples above, the ingested logs were written after each task finished its processing. I was curious to see whether I could observe how those function calls actually happen, so I kept it simple by adding logs for the start and end of each task.</p>
<p>Here is how the Task method looks:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { SQSClient, SendMessageCommand } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-sqs"</span>;

<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> SQSClient({});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> Task = <span class="hljs-keyword">async</span> (req: <span class="hljs-built_in">string</span>, name: <span class="hljs-built_in">string</span>, sleep: <span class="hljs-built_in">number</span>) =&gt; {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Starting <span class="hljs-subst">${name}</span> : `</span>, req );
  <span class="hljs-keyword">await</span> delay(sleep);
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Waking up <span class="hljs-subst">${name}</span> : `</span>, req );
  <span class="hljs-keyword">await</span> client.send(<span class="hljs-keyword">new</span> SendMessageCommand({
    QueueUrl: process.env.QUEUE_URL,
    MessageBody: <span class="hljs-built_in">JSON</span>.stringify({ req, name }),
  }));
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`End <span class="hljs-subst">${name}</span> : `</span>, req );
  <span class="hljs-keyword">return</span> { name: <span class="hljs-string">`<span class="hljs-subst">${name}</span>`</span> };
}
</code></pre>
<p>The Task method now logs at three points: Starting, Waking up, and End. In the first set of logs below, the frozen tasks execute before the current execution; in the second, TaskB wakes up at the same time as the current invocation starts (CloudWatch orders logs by time by default). Based on these observations, frozen tasks apparently are not executed at the function handler invoke phase but at the runtime invoke.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728379585821/9c3e1a6f-cff3-4ae8-a264-c5739094e55d.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728296214216/b93e9bcb-a4f6-4d36-9888-626ff7abc474.png" alt class="image--center mx-auto" /></p>
<p>These observations suggest that frozen processes run as soon as the execution environment is thawed, which looks like a kind of decoupling (this is only a hypothesis). To test whether it is truly decoupled, introducing failures into TaskB and TaskC will illustrate the internal state better.</p>
<h2 id="heading-simulating-failures">Simulating Failures</h2>
<p>The Task method is changed to fail explicitly per task name, as in the following code snippet.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> Task = <span class="hljs-keyword">async</span> (
    req: <span class="hljs-built_in">string</span>,
    name: <span class="hljs-built_in">string</span>,
    sleep: <span class="hljs-built_in">number</span>,
    extendedProcess?: <span class="hljs-built_in">Function</span>
) =&gt; {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Starting <span class="hljs-subst">${name}</span> : `</span>, req );
  <span class="hljs-keyword">await</span> delay(sleep);
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Waking up <span class="hljs-subst">${name}</span> : `</span>, req );
  <span class="hljs-keyword">if</span>( extendedProcess ){ extendedProcess(); }
  <span class="hljs-keyword">await</span> client.send(<span class="hljs-keyword">new</span> SendMessageCommand({
    QueueUrl: process.env.QUEUE_URL,
    MessageBody: <span class="hljs-built_in">JSON</span>.stringify({ req, name }),
  }));
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`End <span class="hljs-subst">${name}</span> : `</span>, req );
  <span class="hljs-keyword">return</span> { name: <span class="hljs-string">`<span class="hljs-subst">${name}</span>`</span> };
}
</code></pre>
<p>The <strong><em>extendedProcess</em></strong> callback is passed to the TaskB call as below:</p>
<pre><code class="lang-typescript">
<span class="hljs-keyword">const</span> extendedProcess = <span class="hljs-function">(<span class="hljs-params">name: <span class="hljs-built_in">string</span>, req: <span class="hljs-built_in">string</span></span>) =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Failing <span class="hljs-subst">${name}</span> : `</span>, req );
  <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'TaskB Failed'</span>);
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> handler = <span class="hljs-keyword">async</span> (_event: LambdaFunctionURLEvent): <span class="hljs-built_in">Promise</span>&lt;LambdaFunctionURLResult&gt; =&gt; {
  ...
  <span class="hljs-keyword">const</span> resultB = Task(
     _event.requestContext.requestId,
     <span class="hljs-string">"TaskB"</span>,
     <span class="hljs-number">2000</span>,
     () =&gt; extendedProcess(<span class="hljs-string">"TaskB"</span>, _event.requestContext.requestId)); <span class="hljs-comment">// pass a closure so the failure happens inside Task, not at call time</span>
  ...
}
</code></pre>
<p>The first request behaves as before, but during the second execution, the previously frozen TaskB interrupts the current execution by throwing an unhandled promise rejection (Runtime.UnhandledPromiseRejection). This shows that freezing and thawing incomplete tasks can be dangerous and breaks the AWS Lambda design principle of isolated, event-based processing; it reveals a level of coupling that can become harmful without careful implementation.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728297362913/76bbd901-28fe-41ab-9c35-a93535baacef.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729249389180/4037234c-17a0-4f85-bdb3-55decbaf8e19.png" alt class="image--center mx-auto" /></p>
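<p>One way to avoid this coupling (a sketch of my own, not code from the experiment) is to track every background promise and drain the list before the handler returns, so nothing is left pending when the environment is frozen:</p>

```typescript
// Background promises started during this invocation.
const pending: Promise<unknown>[] = [];

// Fire-and-forget, but keep a handled reference so a failure cannot
// surface later as an unhandled rejection in another invocation.
const fireAndTrack = (p: Promise<unknown>): void => {
  pending.push(p.catch((e) => console.error("background task failed:", e)));
};

export const safeHandler = async (): Promise<{ statusCode: number }> => {
  fireAndTrack(Promise.reject(new Error("TaskB Failed"))); // simulated failure
  await Promise.allSettled(pending); // drain before returning
  pending.length = 0;
  return { statusCode: 200 };
};
```

<p>With this pattern the failing task is contained: the handler still returns 200, and no pending work leaks into the next invocation.</p>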
<h2 id="heading-tracking-state">Tracking state</h2>
<p>This is just an experiment, not a way to implement production-ready solutions.</p>
<p>This experiment pushed me to think about how a Lambda can behave like a stateful container and act based on that state. To experiment with state tracking, one use case is a task that keeps some state in the execution environment, which can be done with a variable outside the handler. This time, though, I want to try not only keeping state but also deferring actions, and see if that is possible.</p>
<p>How the idea behaves can be demonstrated by the following sequence diagram</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729251872332/e7b9779d-4d28-4c13-9e14-29f935c1364a.png" alt class="image--center mx-auto" /></p>
<p>The idea is that the un-awaited task checks the execution environment state and modifies it. What was achieved in this test:</p>
<ul>
<li><p>Accumulating state in a dictionary outside the handler</p>
</li>
<li><p>Pushing the accumulated items to an SQS queue when the count reaches 10 items</p>
</li>
</ul>
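<p>The accumulate-and-flush idea above can be sketched without SQS by injecting the flush action as a callback (the names here are hypothetical, not the exact code I ran; in the real experiment the callback sent an SQS message):</p>

```typescript
type FlushFn = (items: string[]) => void;

// Lives outside the handler: survives warm invocations of the same
// execution environment, but is lost on shutdown (or after a timeout).
const buffer: string[] = [];

export const accumulate = (item: string, flush: FlushFn, threshold = 10): void => {
  buffer.push(item);
  if (buffer.length >= threshold) {
    flush(buffer.splice(0, buffer.length)); // drain the buffer in one step
  }
};

const flushed: string[][] = [];
for (let i = 1; i <= 10; i++) {
  accumulate(`item-${i}`, (items) => flushed.push(items));
}
console.log(flushed.length, buffer.length); // 1 0
```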
<p>But what about undesired situations? While playing with this, I occasionally hit some sort of timeout, and when deep diving I discovered the dictionary state was empty after a timeout, as shown in the following CloudWatch Logs screenshot.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729264510452/15e93d74-5d08-40f0-aef5-5a3e58e8c6bf.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>While it is fun to try and fail, this article was just a game around how the Lambda execution environment behaves. The behavior resembles hibernation: when the machine wakes up again, the processes come back and resume. Have you ever hibernated a PC while copying a huge folder from one drive to another? This is the same behavior.</p>
<p>The final note: a controlled, imperative programming model is often far more efficient than relying on this behavior. For this particular case, a better approach is to use a Lambda layer to push the logs. Still, this was fun, and I thought it was worth sharing with the community.</p>
<p>Enjoy reading</p>
]]></content:encoded></item><item><title><![CDATA[You Are Not Saved By IaC]]></title><description><![CDATA[Technology exists to simplify human challenges, and as tech professionals, we must also leverage it to solve our own problems. One common area we deal with daily is Infrastructure as Code (IaC), raising frequent questions such as which tool is better...]]></description><link>https://blogs.serverlessfolks.com/you-are-not-saved-by-iac</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/you-are-not-saved-by-iac</guid><category><![CDATA[Infrastructure as code]]></category><category><![CDATA[#IaC]]></category><category><![CDATA[dependencies]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Sat, 28 Sep 2024 12:01:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/m_HRfLhgABo/upload/467a2001e895c3643dfe281231a5c759.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Technology exists to simplify human challenges, and as tech professionals, we must also leverage it to solve our own problems. One common area we deal with daily is Infrastructure as Code (IaC), raising frequent questions such as which tool is better—AWS CDK, CloudFormation, Serverless Framework, or Terraform?</p>
<p>However, we often overlook the foundational principles of IaC, like recovery, fast deployment, resiliency, and minimizing time-to-market (TTM). For instance, if you decide to implement a multi-regional failover, can it be deployed effortlessly? How quickly can you recover if your production environment, region, or accounts go down?</p>
<h3 id="heading-common-roadblocks">Common Roadblocks</h3>
<p>To navigate the complexities of IaC, vigilance and discipline are essential. Let’s explore some common roadblocks and how to address them effectively:</p>
<ul>
<li><p><strong>Historical manual interventions</strong></p>
</li>
<li><p><strong>Lack of configurable and parameterized code</strong></p>
</li>
<li><p><strong>Lost secrets that cannot be restored</strong></p>
</li>
<li><p><strong>Hard dependencies between stacks</strong></p>
</li>
<li><p><strong>Circular dependencies</strong></p>
</li>
</ul>
<h3 id="heading-human-actions">Human Actions</h3>
<p>Human intervention is a regular part of our jobs—quick fixes to temporary issues often lead to future automation. Unfortunately, these "notes for later" sometimes get forgotten, turning into major pain points down the road. Identifying these recurring manual actions is key to saving time, effort, and frustration in the future.</p>
<p><strong>Recommendations:</strong></p>
<ul>
<li><p>Use tags to identify automated resources.</p>
</li>
<li><p>Regularly explore untagged resources to detect those not yet automated.</p>
</li>
<li><p>Review generated IaC templates to find missing tags.</p>
</li>
<li><p>Foster a tech-driven culture within your team.</p>
</li>
</ul>
<h3 id="heading-configuration-shortcomings">Configuration Shortcomings</h3>
<p>One frequent issue in IaC is hardcoding variables in Stacks or Nested Stacks, which can complicate configuration management. Whether it's a queue name, topic name, or HTTP endpoint, manually searching through different IaC tools like CloudFormation or AWS CDK can slow you down. Centralizing all dependencies simplifies future changes—like shifting from "<a target="_blank" href="http://me.mycompany.com">me.mycompany.com</a>" to "<a target="_blank" href="http://me.mycompany.org">me.mycompany.org</a>"—by allowing you to quickly locate and update configurations.</p>
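<p>A minimal way to centralize such values is a single typed configuration map keyed by stage, so a domain change is a one-line edit instead of a repository-wide search (the names and domains below are hypothetical):</p>

```typescript
type Stage = "dev" | "prod";

// One place to change when "mycompany.com" becomes "mycompany.org".
const CONFIG: Record<Stage, { apiDomain: string; queueName: string }> = {
  dev: { apiDomain: "me.dev.mycompany.com", queueName: "orders-dev" },
  prod: { apiDomain: "me.mycompany.com", queueName: "orders-prod" },
};

export const configFor = (stage: Stage) => CONFIG[stage];

console.log(configFor("prod").apiDomain); // me.mycompany.com
```

<p>Stacks then read from this map instead of hardcoding names, and the compiler catches any reference to a stage or key that does not exist.</p>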
<h3 id="heading-losing-secrets">Losing Secrets</h3>
<p>Managing secrets securely is crucial. Storing sensitive information such as API keys or credentials in a secrets manager helps, but what happens if the account holding them is lost? What about your partners' lost secrets, or the secrets you shared with them? The solution is to maintain a backup of all necessary secrets outside the software environment, ideally in a dedicated vault; this is as much an organizational best practice as a technical one.</p>
<h3 id="heading-managing-dependencies">Managing Dependencies</h3>
<p>Dependencies in IaC can be categorized into three levels:</p>
<ol>
<li><p><strong>Light Dependencies:</strong> Passed as environment variables (e.g., to a Lambda function); these won’t break your deployment but could affect testing and runtime.</p>
</li>
<li><p><strong>Soft Dependencies:</strong> Tied to infrastructure services but manageable—like subscribing to an SNS topic, though permission issues may arise from unautomated historical actions.</p>
</li>
<li><p><strong>Hard Dependencies:</strong> These will prevent deployment if not properly handled. For example, an EventBridge rule may require an EventBus that isn’t yet deployed. The key here is identifying priority stacks and documenting these relationships, often using dependency graphs or architecture diagrams.</p>
</li>
</ol>
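The "priority stacks" idea can be made concrete with a dependency graph and a topological sort that yields a safe deployment order. The sketch below is illustrative (the stack names are invented), mirroring the EventBridge example: the rule stack must come after the bus stack.

```typescript
// Map each stack to the stacks it hard-depends on.
type Graph = Record<string, string[]>;

// Depth-first topological sort: returns a deploy order in which every
// stack appears after all of its dependencies. Throws on a cycle.
function deployOrder(graph: Graph): string[] {
  const order: string[] = [];
  const visiting = new Set<string>();
  const done = new Set<string>();

  const visit = (stack: string): void => {
    if (done.has(stack)) return;
    if (visiting.has(stack)) throw new Error(`Circular dependency at ${stack}`);
    visiting.add(stack);
    for (const dep of graph[stack] ?? []) visit(dep);
    visiting.delete(stack);
    done.add(stack);
    order.push(stack);
  };

  for (const stack of Object.keys(graph)) visit(stack);
  return order;
}
```

Feeding this the documented relationships (for example `{ RuleStack: ['EventBusStack'], EventBusStack: [] }`) gives a deployment sequence your pipeline can follow, and it fails loudly if a cycle has crept in.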
<h3 id="heading-circular-dependencies">Circular Dependencies</h3>
<p>Over time, as requirements evolve, stacks can develop circular dependencies. Imagine planning a production release only to find that it fails due to a circular dependency between two stacks. For instance, Stack A may require a CloudFront distribution that needs an upstream domain name for CORS, but the record set is managed in another stack—leading to a deadlock.</p>
<p>To avoid such issues, actively manage and mitigate circular dependencies. Divide stacks if needed or apply predictable naming conventions. For example, using "<a target="_blank" href="http://products.mycompany.com">products.mycompany.com</a>" instead of introducing direct dependencies between stacks can eliminate such problems.</p>
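That naming-convention fix can be sketched as follows. Instead of one stack exporting the record set to the other (which creates the cycle), both stacks derive the well-known name independently; the helper and domain below are illustrative:

```typescript
// Both stacks derive the public name from the same convention instead
// of one stack exporting it to the other, removing the cross-stack edge.
function publicDomain(service: string, rootDomain = 'mycompany.com'): string {
  return `${service}.${rootDomain}`;
}

// Stack A configures CORS against the conventional name...
const corsAllowedOrigin = `https://${publicDomain('products')}`;

// ...while the DNS stack creates the record set for the same name,
// with no reference between the two stacks.
const recordName = publicDomain('products');
```

Both stacks now deploy in any order; the only shared artifact is the convention itself, which is far cheaper to govern than a bidirectional stack reference.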
<hr />
<p>By proactively addressing these common challenges, we can build more resilient, efficient, and scalable infrastructure, reducing downtime and increasing the speed of recovery when issues arise.</p>
]]></content:encoded></item><item><title><![CDATA[Enterprise Level Micro-frontend on AWS]]></title><description><![CDATA[Micro-frontends, a center of interest for some years now, have evolved rapidly and have become a great choice for achieving independent, autonomous, and well-defined teams and services that improve the velocity and agility of applications.
This artic...]]></description><link>https://blogs.serverlessfolks.com/enterprise-level-micro-frontend-on-aws</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/enterprise-level-micro-frontend-on-aws</guid><category><![CDATA[microfrontends]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[Server side rendering]]></category><category><![CDATA[boundedcontext]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Wed, 28 Aug 2024 23:48:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1724888733598/6d911527-1ff1-48b7-b9d9-ee631865c06c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Micro-frontends, a center of interest for some years now, have evolved rapidly and have become a great choice for achieving independent, autonomous, and well-defined teams and services that improve the velocity and agility of applications.</p>
<p>This article delves into different concepts and details regarding adopting micro-frontends, giving more clarity and vision about tradeoffs and the under-the-hood parts. To learn more about the impacts and goals of a MicroFrontend design, I highly recommend the Building Micro-Frontends book by Luca Mezzalira.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.oreilly.com/library/view/building-micro-frontends/9781492082989/">https://www.oreilly.com/library/view/building-micro-frontends/9781492082989/</a></div>
<p> </p>
<p>The concept at the root of the proposal serves these goals, but when it comes to design and implementation there are many discussions and details to take into account. Choosing the right approach depends on the tradeoffs and the priorities the business follows.</p>
<p>Micro-frontends are not far from the microservice design approach: they pursue the same goals fundamentally, but they also bring all, or at least part, of the complexities and challenges that microservices introduce.</p>
<h2 id="heading-the-primer">The Primer</h2>
<p>At a high level, a micro-frontend architecture aims for independent, autonomous collaboration between services that together produce a final result asset within a distributed system.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719739061388/e6b97b80-d1c5-4c7e-ab4a-bba1775eb92f.png" alt class="image--center mx-auto" /></p>
<p>In the above schema, the three services are deployed, maintained, and scaled independently, and are composed together into a final result asset.</p>
<p>This is a simple, small, and straightforward example, but at higher scale we need to think about:</p>
<ul>
<li><p>Templating</p>
</li>
<li><p>Discovery</p>
</li>
<li><p>Routing</p>
</li>
<li><p>Composition</p>
</li>
</ul>
<h2 id="heading-boundary-definition-challenges">Boundary Definition Challenges</h2>
<p>One principal challenge when designing micro-frontends is how to evaluate the boundaries of each MFE. Having many small, distributed micro-frontends seems to bring more flexibility and reusability, since an MFE can be composed in as many places as the client app needs. Small, well-defined contexts do yield self-managed, autonomous MFEs, but they also add complexity to the Discovery &amp; Routing part and increase communication chattiness, as downstream services get called multiple times even though the result assets belong to the same context. On the other side, overly large MFEs ease some of the challenges small MFEs bring, but they offer less flexibility and fewer reuse opportunities, and they introduce rendering complexity and performance degradation on the client side.</p>
<p>Sizing MFEs logically, based on their context, is a key to success; it puts more logic inside each MFE but avoids adding complexity to other parts of the system.</p>
<p>Some factors to consider while defining a micro-frontend context:</p>
<ul>
<li><p>Performance</p>
</li>
<li><p>Lifecycle</p>
</li>
<li><p>Co-existence</p>
</li>
<li><p>Reusability</p>
</li>
<li><p>Ownership</p>
</li>
<li><p>Clarity &amp; Vision</p>
</li>
<li><p>Team Topology possibilities</p>
</li>
</ul>
<h3 id="heading-horizontal-vs-vertical-splitting">Horizontal vs Vertical Splitting</h3>
<p>Splitting refers to composing the MFEs along a horizontal or a vertical axis.</p>
<p>In horizontal splitting, the composition introduces coupling, such as dedicating a web app page to a single micro-frontend; the components inside that page then have little opportunity to be reused in other contexts and pages. An example would be a widget that renders the active products for a given customer, which could be shown on several pages such as the customer detail page or the customers admin page. Horizontal splitting leads to duplicated components; this seems fine at first but becomes tricky as the number of reusable components grows, or when standardization and governance become a concern, for instance when applying design systems or complex business logic.</p>
<p>In vertical splitting, the chance of reusability increases and a component can be reused in any other context and result asset, but it adds a layer of duplicated communication and data fetching, and at some points introduces needless decomposition.</p>
<p>When it comes to splitting, a better approach is to pragmatically define the right boundary as well as the right composition layer. Teams often settle on applying MFE composition at a single layer, be it the web server, the edge, or the shell. A better approach seems to be defining the right decomposition on the server side, keeping flexibility for future reuse, while applying the right composition in the shell (kernel).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723500947848/f474a9aa-7024-42d4-a597-4d5d4874e3e0.png" alt class="image--center mx-auto" /></p>
<p>In the above diagram, having product catalog and insights at the same layer (server side) helps reduce unnecessary downstream service calls but adds challenges such as cache configuration. Putting two components in the same MFE should therefore only be considered when those components follow the same lifecycle.</p>
<p>It is important to think along different decision axes and adopt the right approach based on requirements and tradeoffs.</p>
<h2 id="heading-rendering">Rendering</h2>
<p>The choice of rendering approach depends on functional requirements and technical possibilities. My journey began with front-end development using VBScript, where code ran on the client side. However, limitations such as database connectivity and security concerns led to a shift towards server-side development with <a target="_blank" href="http://ASP.NET">ASP.NET</a>, where interactions and rendering were handled on the server. As JavaScript technologies advanced, particularly with the introduction of Ajax, the drawbacks of server-side latency became apparent. This prompted a shift back to client-side rendering, allowing for partial or full application rendering on the client side, with DOM manipulation and updates occurring after API responses were received. This approach was well-suited to the era of limited server resources and networking constraints.</p>
<p>However, as server capabilities improved, particularly with the advent of cloud computing and rapid scaling, it became advantageous to offload processing back to the server. Today, server-side rendering (SSR) is a crucial front-end design decision, thanks to its ability to respond in milliseconds and scale efficiently. However, it's not always the optimal choice and should be considered within the context of the overall architecture.</p>
<p>Rendering choices are critical and involve several factors, including:</p>
<ul>
<li><p>User Experience</p>
</li>
<li><p>Performance</p>
</li>
<li><p>Cost</p>
</li>
<li><p>Search Engine Optimization (SEO)</p>
</li>
</ul>
<p>While other considerations like Separation of Concerns (SoC) and polyglot ecosystems are important, the ones listed above are the primary focus here. In the context of Micro-Frontends, various rendering strategies can be employed, such as Client-Side Rendering (CSR), Edge-Side Rendering (ESR), Server-Side Rendering (SSR), and Server-Side Generation (SSG), which is a variation of SSR.</p>
<h3 id="heading-client-side-rendering">Client Side Rendering</h3>
<p>CSR renders HTML content dynamically in the browser using JavaScript. With CSR, a preliminary payload containing the JavaScript and CSS is received from the server; executing that JavaScript on the client then renders the content by calling the required APIs or backend services. Until those scripts execute, the content neither appears nor is interactive.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719785447322/969e064c-d59d-492b-ac61-54ecc205344a.png" alt class="image--center mx-auto" /></p>
<p>To see how CSR and SSR behave, this video by <strong>Scott Hanselman</strong> is a nice one.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=7GRKUaQ8Spk">https://www.youtube.com/watch?v=7GRKUaQ8Spk</a></div>
<p> </p>
<h3 id="heading-server-side-rendering">Server Side Rendering</h3>
<p>SSR is all about rendering the HTML content on the server side before returning it to the client (browser). The server request can happen at page load or behind any action able to trigger a request. The server response contains the content along with all the scripts necessary to make it interactive on the client side.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719785706147/b4e68a39-8938-4f66-90a8-8da4e5b03ec1.png" alt class="image--center mx-auto" /></p>
<p>SSR can be applied in a variety of ways, and listening to experts like <strong>Luca Mezzalira</strong> helps capture the essentials; this video is a rich and detailed one.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=QD2BvPfNc6c">https://www.youtube.com/watch?v=QD2BvPfNc6c</a></div>
<p> </p>
<h3 id="heading-server-side-generation">Server Side Generation</h3>
<p>SSG is the technique of generating HTML assets on the server side and letting the client use the pre-rendered assets. It is close to serving static assets, except that in SSG the asset content is HTML. SSG works well when the number of required assets is limited, but it can be tricky when dealing with millions of items. When using SSG, it is important to consider how long, and until when, each asset is really required, and to estimate the resulting costs.</p>
<h3 id="heading-edge-side-rendering">Edge Side Rendering</h3>
<p>ESR helps tackle some SSR challenges such as latency and performance. It responds to client requests at the edge, benefiting from global points of presence close to the user. Aside from the latency improvements gained by removing the full load on the origin, ESR also offloads processing from the client, whether a browser, a mobile app, or another device, achieving a more device-friendly approach.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719785557282/ee8e10b8-ee74-4313-82f1-4754b1dac812.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-orchestration">Orchestration</h2>
<p>Micro-frontend orchestration is the pattern used to compose different micro-frontends into a final result that represents a page or a meaningful, representable part of a UI. Orchestration can happen at different levels, client side, server side, or edge side, and an intelligent mix of them can also be applied.</p>
<p>This article fundamentally focuses on Server Side Rendering and demonstrates some challenges the teams face while designing SSR.</p>
<h2 id="heading-kernel-shell">Kernel / Shell</h2>
<p>As discussed earlier, while Micro Frontends (MFE) offer significant benefits to the overall system, they can also introduce cross-domain challenges and coordination issues among teams as they work to integrate distributed components into a cohesive solution. To address these complexities, a shell (or kernel) adds a layer of abstraction between the client and the distributed services, streamlining communication and simplifying the composition process, ultimately leading to a more flexible design.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719740636863/ce32f22c-a521-45b8-b768-5fc8f1a11a0b.png" alt class="image--center mx-auto" /></p>
<p>Implementing a shell offers the advantage of simplifying the previously mentioned requirements. It facilitates easier discovery of services and their versions, resolves template details, routes requests to the appropriate service based on the resolved template, and ultimately handles the composition.</p>
<p>A shell (or kernel) is particularly valuable when managing various types of rendering—whether client-side, edge-side, server-side, or a combination of these approaches.</p>
<p>A shell can respond to the following requirements if applied:</p>
<ul>
<li><p>Service Discovery</p>
</li>
<li><p>Template Discovery and rendering</p>
</li>
<li><p>Service deployment strategy ( Canary, Rolling, All-At-Once )</p>
</li>
<li><p>Error handling and fallbacks</p>
</li>
<li><p>Instrumenting for Observability</p>
</li>
</ul>
<h3 id="heading-templating">Templating</h3>
<p>A template serves as a predefined framework that outlines the structure of the result set required by a client application, whether it's a browser or a mobile app. Consequently, the templating module must identify the appropriate template based on the request information received from the client. For instance, a template represents a page that includes various components such as scripts, meta tags, CSS references, and more.</p>
<p>At first glance templating may not seem a very interesting approach, but it helps resolve a lot of frontend cross-cutting concerns, such as shared scripts and design systems, without changing and deploying all MFEs or the client app.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719741957986/7def62ea-4ea7-40f3-b133-90192ff2c921.png" alt class="image--center mx-auto" /></p>
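A template resolved from request information might be modeled as follows; the shape, fields, and paths are illustrative assumptions rather than the article's actual implementation:

```typescript
// Illustrative template model: a page skeleton with its cross-cutting
// assets (scripts, CSS) and the slots the MFEs fill.
interface PageTemplate {
  id: string;
  scripts: string[];
  styles: string[];
  slots: string[]; // MFE identifiers composed into this page
}

const templates: PageTemplate[] = [
  {
    id: 'home',
    scripts: ['/shared/design-system.js'],
    styles: ['/shared/base.css'],
    slots: ['bookmarks-list', 'product-catalog'],
  },
  {
    id: 'pdp',
    scripts: ['/shared/design-system.js'],
    styles: ['/shared/base.css'],
    slots: ['product-details'],
  },
];

// Resolve the template from request information (here, just the path).
function resolveTemplate(path: string): PageTemplate | undefined {
  if (path.startsWith('/products/')) return templates.find((t) => t.id === 'pdp');
  return templates.find((t) => t.id === 'home');
}
```

Because shared scripts and styles live on the template, a design-system upgrade is a template change, not a redeploy of every MFE.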
<h3 id="heading-discovery-amp-routing">Discovery &amp; Routing</h3>
<p>Discovery is the process of locating the appropriate services based on identifiers, versions, and other factors. It provides a more granular approach to integrating services and the fundamental aspects of a distributed design with reduced complexity.</p>
<p>A high-level implementation of discovery would look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719741426668/4fc29c22-cca1-4780-9246-28165ddbf02c.png" alt class="image--center mx-auto" /></p>
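A minimal discovery lookup could be sketched like this, assuming a registry keyed by service identifier and version; the registry contents and endpoints are invented for illustration, and in practice the registry could live in a parameter store or a dedicated discovery service:

```typescript
// Illustrative in-memory registry: service -> version -> endpoint.
const registry: Record<string, Record<string, string>> = {
  bookmarks: {
    '1.0': 'https://d111.cloudfront.net',
    '1.1': 'https://d222.cloudfront.net',
  },
  products: { '1.0': 'https://d333.cloudfront.net' },
};

// Resolve a service endpoint; when no version is pinned, pick the
// highest registered version.
function discover(service: string, version?: string): string {
  const versions = registry[service];
  if (!versions) throw new Error(`Unknown service: ${service}`);
  const keys = Object.keys(versions).sort();
  const v = version ?? keys[keys.length - 1];
  const resolved = versions[v];
  if (!resolved) throw new Error(`Unknown version ${v} for ${service}`);
  return resolved;
}
```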
<p>The routing module is responsible for directing external and public service requests to the appropriate internal or private distributed services. An effective routing module takes into account the deployment strategy and the type of composition, whether server-side, client-side, or edge-side.</p>
<p>The diagram illustrates the routing process when server-side composition is used.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719779322282/7e9cf89d-ba73-401d-99c0-796b7162aae7.png" alt class="image--center mx-auto" /></p>
<p>Routing can be handled either by the Web Server or the Shell. When the Shell manages routing, it performs lightweight mapping based on the incoming request and the associated micro-frontend. The Shell directs traffic to the appropriate service and conducts basic contextual and generic checks. More detailed routing, which requires domain-specific knowledge, is managed within the micro-frontend itself. This includes scenarios such as applying a specific version for an individual customer or a particular list of products.</p>
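The lightweight mapping the shell performs can be sketched as a path-prefix table; the paths and MFE names below are illustrative, and anything domain-specific (per-customer versions, per-product rules) deliberately stays out of it:

```typescript
// Path-prefix routing table owned by the shell; detailed, domain-aware
// routing stays inside each micro-frontend.
const routes: Array<{ prefix: string; mfe: string }> = [
  { prefix: '/api/v1/bookmarks/', mfe: 'bookmarks' },
  { prefix: '/api/v1/products/', mfe: 'products' },
];

// Pick the owning MFE by longest matching prefix; undefined means the
// shell has no mapping and should serve a generic fallback.
function routeRequest(path: string): string | undefined {
  const match = routes
    .filter((r) => path.startsWith(r.prefix))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
  return match?.mfe;
}
```

Keeping the table this dumb is the point: the shell stays business-agnostic, and evolving a route is a table edit rather than a cross-team change.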
<h2 id="heading-shell-challenges">Shell Challenges</h2>
<p>When designing a shell, the most important consideration is to keep it agnostic of business details. Teams often start light and incrementally push complexity into the shell for a single reason: the shell is already there and is the easiest place to change. In the long term this leads to a higher risk of change and to distributed knowledge. Why is this risky? A shell is central and needs to be evolved by a wide range of teams, which becomes tricky in the long run.</p>
<h2 id="heading-what-we-gonna-build">What we gonna build</h2>
<p>The following diagram shows the application we are building for this article: a simple React application whose client interacts with server-side resources via a shell (kernel) that provides discovery &amp; routing, templating, and a light API to help the micro-frontends communicate with each other. Each micro-frontend returns HTML in its response, which is shown in our web app.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723301404980/1e53e944-fab4-4ffa-b239-5a8c0105b1db.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-source-code">Source Code</h3>
<p>The example related to this part of the series is the Part-01 branch in the following GitHub repository.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-microfrontend-ssr">https://github.com/XaaXaaX/aws-microfrontend-ssr</a></div>
<p> </p>
<h3 id="heading-bookmarks">Bookmarks</h3>
<p>The Bookmarks micro-frontend stack is responsible for returning the bookmarks for a given user ID; it validates the presence of the UserId parameter and returns the results for the corresponding user.</p>
<p>In the bookmarks service, the bookmarks list MFE is vertically sliced, but all downstream service calls can be managed there if required.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723509440909/31c14ffb-f219-4894-97b5-1efc2586d40e.png" alt class="image--center mx-auto" /></p>
<p>The bookmarks list MFE returns all bookmarks for a given user ID and returns a 4xx error if the user ID is not provided. It also filters the user's bookmarks by product reference and name when those parameters are supplied.</p>
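The validation and filtering just described could look like the following sketch; the data shape and the in-memory store are illustrative stand-ins, not the actual service code:

```typescript
interface Bookmark {
  userId: string;
  ref: string;
  name: string;
}

// Illustrative in-memory store standing in for the real data source.
const store: Bookmark[] = [
  { userId: 'u1', ref: 'p-100', name: 'Espresso Machine' },
  { userId: 'u1', ref: 'p-200', name: 'Grinder' },
  { userId: 'u2', ref: 'p-100', name: 'Espresso Machine' },
];

// Return 400 when userid is missing; otherwise return the user's
// bookmarks, optionally filtered by product reference and name.
function listBookmarks(query: { userid?: string; ref?: string; name?: string }) {
  if (!query.userid) return { statusCode: 400, body: 'userid is required' };
  let items = store.filter((b) => b.userId === query.userid);
  if (query.ref) items = items.filter((b) => b.ref === query.ref);
  if (query.name) items = items.filter((b) => b.name.includes(query.name!));
  return { statusCode: 200, body: JSON.stringify(items) };
}
```

Note that the filter parameters here mirror the query strings the CloudFront cache policy allow-lists (userid, ref, name), which is what makes per-parameter caching safe.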
<p>Caching is applied per query string parameter, covering userid, ref, and name. The following snippet shows the CDK example for the bookmarks service and its related configuration.</p>
<p>The bookmarks distribution owns the caching requirements of the bookmarks service; this reflects the principle that each service owns and masters its own requirements.</p>
<pre><code class="lang-typescript"><span class="hljs-built_in">this</span>.Distribution = <span class="hljs-keyword">new</span> Distribution(<span class="hljs-built_in">this</span>, <span class="hljs-string">'BookmarkDistribution'</span>, {
    ...
    defaultBehavior: {
        origin: <span class="hljs-keyword">new</span> FunctionUrlOrigin(props.DefaultOriginListBookmarksFunctionUrl, {
            connectionAttempts: <span class="hljs-number">3</span>,
            connectionTimeout: Duration.seconds(<span class="hljs-number">1</span>),
            keepaliveTimeout: Duration.seconds(<span class="hljs-number">5</span>),
        }),
        allowedMethods: AllowedMethods.ALLOW_ALL,
        cachedMethods: AllowedMethods.ALLOW_GET_HEAD_OPTIONS,
        cachePolicy: <span class="hljs-keyword">new</span> CachePolicy(<span class="hljs-built_in">this</span>, <span class="hljs-string">'BookmarksCachePolicy'</span>, {
            queryStringBehavior: CacheQueryStringBehavior.allowList(
                <span class="hljs-string">'userid'</span>,
                <span class="hljs-string">'ref'</span>,
                <span class="hljs-string">'name'</span>
            ),
            defaultTtl: Duration.hours(<span class="hljs-number">1</span>),
            minTtl: Duration.hours(<span class="hljs-number">0</span>),
            maxTtl: Duration.hours(<span class="hljs-number">24</span>),
        }),
        viewerProtocolPolicy: ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
        originRequestPolicy: OriginRequestPolicy.ALL_VIEWER_EXCEPT_HOST_HEADER,
    },
});

<span class="hljs-keyword">const</span> cfCfnDist = <span class="hljs-built_in">this</span>.Distribution.node.defaultChild <span class="hljs-keyword">as</span> CfnDistribution;

<span class="hljs-keyword">const</span> bookmarksOriginAccessControl = <span class="hljs-keyword">new</span> CfnOriginAccessControl(<span class="hljs-built_in">this</span>, <span class="hljs-string">'LambdaUrlOAC'</span>, {
    originAccessControlConfig: {
        name: <span class="hljs-string">`Bookmarks-Lambda-OAC`</span>,
        originAccessControlOriginType: <span class="hljs-string">'lambda'</span>,
        signingBehavior: <span class="hljs-string">'no-override'</span>,
        signingProtocol: <span class="hljs-string">'sigv4'</span>,
    }
});

cfCfnDist.addPropertyOverride(
    <span class="hljs-string">'DistributionConfig.Origins.0.OriginAccessControlId'</span>,
    bookmarksOriginAccessControl.getAtt(<span class="hljs-string">'Id'</span>)
);
</code></pre>
<p>The CloudFront and function URL integration is done using OAC with Signature V4; the function URL grants permissions so that calls are authorized only when they come from CloudFront.</p>
<pre><code class="lang-typescript"><span class="hljs-built_in">this</span>.FunctionUrl.grantInvokeUrl(<span class="hljs-keyword">new</span> ServicePrincipal(<span class="hljs-string">'cloudfront.amazonaws.com'</span>, {
     conditions: {
        ArnLike: {
            <span class="hljs-string">'aws:SourceArn'</span>: <span class="hljs-string">`arn:aws:cloudfront::<span class="hljs-subst">${account}</span>:distribution/XXXXXXXXX`</span>,
        },
        StringEquals: { <span class="hljs-string">'aws:SourceAccount'</span>: account},
      }
}));
</code></pre>
<h3 id="heading-products">Products</h3>
<p>The Product service provides two distinct MFEs, the catalog and the Product Details Page (PDP); they are independent and isolated.</p>
<p>The product service follows the same implementation and application architecture principles as bookmarks: vertically sliced and optimized for potential reuse. The important note here is the possibility of sharing the repository layer. The interest is not programming convenience but communication: it lets us simply have multiple MFEs, compose different components inside one MFE, and reduce the number of network or database calls.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723509755788/124ef2f2-3581-4a59-b381-f0d8860fc8b6.png" alt class="image--center mx-auto" /></p>
<p>The products service follows the same principles as bookmarks in terms of caching and security, and it also uses a Lambda function URL for API invocations.</p>
<h3 id="heading-web-application">Web Application</h3>
<p>The web application consists of two pages, a home page and a product details page. The home page serves as a landing for the bookmarks and catalog micro-frontends, while the PDP hosts only the product details micro-frontend.</p>
<p>The web app home page includes two MFEs, Product Catalog and Bookmarks List, but this is a different case from our earlier interest in reducing network calls. The interest here is more organizational, rooted in bounded contexts.</p>
<ul>
<li><p>Each micro-frontend is owned and run by a different team</p>
</li>
<li><p>Each one has its own context, hence its own database or downstream service calls</p>
</li>
<li><p>There is no domain-context-based relation between them.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723511110802/2825f790-8d03-4892-8a9d-af627a1f3f57.png" alt class="image--center mx-auto" /></p>
<p>In this diagram, the React app calls both MFEs in parallel and uses the received results as HTML assets that live side by side. The website build assets are uploaded to a dedicated bucket that serves as the default behavior of the web app CloudFront distribution. That distribution applies the required caching for static website assets, but for dynamic routes, such as the home page calls to bookmarks and catalog, and the product details page, it applies no caching and only forwards the requests to the downstream MFE services.</p>
<p><strong>Bundling</strong></p>
<p>To bundle the website app, the build is done using the <code>react-scripts build</code> command; the bundle output is deployed to the web app S3 bucket using the CDK BucketDeployment construct.</p>
<pre><code class="lang-typescript">
<span class="hljs-keyword">new</span> BucketDeployment(<span class="hljs-built_in">this</span>, <span class="hljs-string">'BucketDeployment'</span>, {
   sources: [ Source.asset(join(process.cwd(), <span class="hljs-string">'/front-app/website/build'</span>)) ],
   cacheControl: [CacheControl.fromString(<span class="hljs-string">'max-age=1800,must-revalidate'</span>)],
   destinationBucket: frontStack.Bucket,
   distribution: ditribution.Distribution,
   distributionPaths: [<span class="hljs-string">'/*'</span>],
});
</code></pre>
<p>The use of BucketDeployment is for the sake of demonstration; in real projects the deployment process must run under a dedicated CI/CD pipeline with multiple stages, including CI (linting, testing, analysis, etc.) and CD.</p>
<p><strong>Caching</strong></p>
<p>The CloudFront behavior for dynamic content uses a dedicated CachePolicy and OriginRequestPolicy, applying no caching while forwarding all request query string parameters.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> dynamicContentCachePolicy = <span class="hljs-keyword">new</span> CachePolicy(<span class="hljs-built_in">this</span>, <span class="hljs-string">'DynamicContentCachePolicy'</span>, {
    headerBehavior: CacheHeaderBehavior.none(),
    cookieBehavior: CacheCookieBehavior.none(),
    queryStringBehavior: CacheQueryStringBehavior.none(),
    defaultTtl: Duration.seconds(<span class="hljs-number">0</span>),
    minTtl: Duration.seconds(<span class="hljs-number">0</span>),
});

<span class="hljs-keyword">const</span> dynamicContentOriginRequestPolicy = <span class="hljs-keyword">new</span> OriginRequestPolicy(<span class="hljs-built_in">this</span>, <span class="hljs-string">'DynamicContentOriginRequestPolicy'</span>, {
    queryStringBehavior: OriginRequestQueryStringBehavior.all(),
    headerBehavior: OriginRequestHeaderBehavior.none(),
    cookieBehavior: OriginRequestCookieBehavior.none()
});

<span class="hljs-built_in">this</span>.Distribution.addBehavior(<span class="hljs-string">'api/v1/bookmarks/*'</span>, <span class="hljs-keyword">new</span> HttpOrigin(props.BookmarkServiceDomainName), {
    allowedMethods: AllowedMethods.ALLOW_ALL,
    cachePolicy: dynamicContentCachePolicy,
    viewerProtocolPolicy: ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
    originRequestPolicy: dynamicContentOriginRequestPolicy
});
</code></pre>
<p><strong>Rendering</strong></p>
<p>The underlying code for the example website is a simple React app, using the <code>useEffect</code> React hook to fetch the data on page load.</p>
<pre><code class="lang-typescript">useEffect(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">RenderMFEs</span>(<span class="hljs-params"></span>) </span>{
      <span class="hljs-keyword">const</span> promiseProcessSuccess = <span class="hljs-string">'fulfilled'</span>;
      <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.allSettled([
        fetchMfe(Mfes.BOOKMARKS_LIST),
        fetchMfe(Mfes.PRODUCT_CATALOG),
      ]).then(<span class="hljs-function">(<span class="hljs-params">results</span>) =&gt;</span> {
        results.forEach(<span class="hljs-function">(<span class="hljs-params">result</span>) =&gt;</span> {
          <span class="hljs-keyword">if</span> (result.status === <span class="hljs-string">'rejected'</span>) <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'HP error :'</span>, result.reason) });
        <span class="hljs-keyword">if</span> (results[<span class="hljs-number">0</span>].status === promiseProcessSuccess) setBookmarks(results[<span class="hljs-number">0</span>].value);
        <span class="hljs-keyword">if</span> (results[<span class="hljs-number">1</span>].status === promiseProcessSuccess) setCatalog(results[<span class="hljs-number">1</span>].value);

      });
    };

    <span class="hljs-keyword">if</span> (!bookmarks || !catalog) 
      RenderMFEs();

  }, [bookmarks, catalog]);
</code></pre>
<p>The following code demonstrates the <code>fetchMfe()</code> function, which is called in the <code>useEffect</code> React hook.</p>
<pre><code class="lang-typescript">
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> <span class="hljs-built_in">enum</span> Mfes {
  <span class="hljs-string">'PRODUCT_DETAILS'</span>,
  <span class="hljs-string">'BOOKMARKS_LIST'</span>,
  <span class="hljs-string">'PRODUCT_CATALOG'</span>,
}
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> Paths: Record&lt;Mfes, <span class="hljs-built_in">string</span>&gt; = {
  [Mfes.PRODUCT_DETAILS]: <span class="hljs-string">'/api/v1/products/details/'</span>,
  [Mfes.PRODUCT_CATALOG]: <span class="hljs-string">'/api/v1/products/catalog/'</span>,
  [Mfes.BOOKMARKS_LIST]: <span class="hljs-string">'/api/v1/bookmarks/'</span>,
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> fetchMfe = <span class="hljs-keyword">async</span> (MFE: Mfes, host?: <span class="hljs-built_in">string</span>): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">string</span>&gt;  =&gt; {

  <span class="hljs-keyword">const</span> hostDomain = <span class="hljs-string">`<span class="hljs-subst">${<span class="hljs-built_in">window</span>.location.protocol}</span>//<span class="hljs-subst">${<span class="hljs-built_in">window</span>.location.host}</span>/`</span>;

  <span class="hljs-keyword">let</span> urlPath = <span class="hljs-string">''</span>;
  urlPath = urlPath.concat(Paths[MFE]);

  <span class="hljs-keyword">const</span> url = <span class="hljs-keyword">new</span> URL(urlPath, hostDomain);

  <span class="hljs-keyword">const</span> queryParameters = <span class="hljs-keyword">new</span> URLSearchParams(<span class="hljs-built_in">window</span>.location.search);
  <span class="hljs-keyword">if</span> (queryParameters.toString()) url.search = queryParameters.toString();

  <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">`<span class="hljs-subst">${url.href}</span>`</span>);
  <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> res.text();
}
</code></pre>
<p><strong>Redirection</strong></p>
<p>As per the requirements, redirecting certain URLs is important for SEO and for the brand's trustworthiness with the public. This applies when certain links are no longer valuable or have no corresponding result (for example, when a product goes out of stock or a URL is deleted permanently).</p>
<p>The following CDK snippet shows how to use a CloudFront Function to apply a simple redirection for specific paths. CloudFront Functions use a lightweight version of JavaScript. This way we can easily apply the redirection on top of specific URLs under a distribution behavior and let crawlers carry the accumulated score of the old URL over to the new redirected URL (this is how Google indexing works when a permanent redirection is applied).</p>
<p>Thanks to <a target="_blank" href="https://x.com/rooToTheZ">David Behroozi</a> for sharing this great and cost-effective redirection solution. This section uses the mentioned solution for our dedicated purpose.</p>
<pre><code class="lang-typescript">defaultBehavior: {
   allowedMethods: AllowedMethods.ALLOW_GET_HEAD,
   origin: webbucketOrigin,
   cachedMethods: AllowedMethods.ALLOW_GET_HEAD,
   cachePolicy: WebCachePolicy,
   viewerProtocolPolicy: ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
   functionAssociations: [
        {
           function: new Function(scope, `ProductRedirectFunctionViewerResponse`, {
                code: FunctionCode.fromFile({ filePath: `front-app/src/url-redirect.js` }),
                runtime: FunctionRuntime.JS_2_0
           }),
           eventType: FunctionEventType.VIEWER_RESPONSE,
        },
   ]}
</code></pre>
<p>The CloudFront function is triggered at the viewer response stage and applies a 301 permanent redirection.</p>
<p>The solution uses S3 user-defined metadata to register the redirection target link.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// @ts-ignore</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handler</span>(<span class="hljs-params">event</span>) </span>{
  <span class="hljs-built_in">console</span>.log(<span class="hljs-built_in">JSON</span>.stringify(event, <span class="hljs-literal">null</span>, <span class="hljs-number">2</span>));
  <span class="hljs-keyword">const</span> response = event.response,
        headers = response.headers,
        request = event.request;

  <span class="hljs-keyword">const</span> header = <span class="hljs-string">'x-amz-meta-location'</span>;

  <span class="hljs-keyword">if</span> ( 
    <span class="hljs-string">'GET'</span> == request.method &amp;&amp;  
    <span class="hljs-number">200</span> == response.statusCode &amp;&amp; 
    headers[header] &amp;&amp; 
    headers[header].value
  ) {
      headers.location = { value: headers[header].value };
      <span class="hljs-keyword">return</span> {
        statusCode: <span class="hljs-number">301</span>,
        statusDescription: <span class="hljs-string">'Moved Permanently'</span>,
        headers,
      };
  }
  <span class="hljs-keyword">return</span> response;
}
</code></pre>
<p>The example uses the <code>BucketDeployment</code> construct to upload a single file with the corresponding metadata.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">new</span> BucketDeployment(<span class="hljs-built_in">this</span>, <span class="hljs-string">'RedirectionDeployment'</span>, {
   sources: [ Source.asset(join(process.cwd(), <span class="hljs-string">'/front-app/src/redirection-files'</span>)) ],
   metadata: {
      <span class="hljs-string">'location'</span>: <span class="hljs-string">`https://<span class="hljs-subst">${distribution.Distribution.distributionDomainName}</span>/?category=ON_SOLD`</span>,
   },
   destinationBucket: frontStack.Bucket,
   prune: <span class="hljs-literal">false</span>,
});
</code></pre>
<p>In this example, all files in the <code>redirection-files</code> folder will be uploaded to the S3 bucket with a <code>location</code> metadata entry. In real projects, however, it can be any dedicated back office generating files with the corresponding metadata.</p>
<h3 id="heading-shell">Shell</h3>
<p>The shell is responsible for composing a template from different independent micro-frontends and returning the final composed asset to the client app. The shell also applies cross-cutting concerns such as authentication/authorization, graceful degradation, and logging while accessing server-side resources.</p>
<p>One of the most popular responsibilities of a shell is service (MFE) discovery. Service discovery also takes the deployment strategy into account where one applies, such as weighted, canary, or blue/green.</p>
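<p>To make that responsibility concrete, the following sketch (all names hypothetical, not part of the example repository) shows how a shell might pick an MFE origin from a discovery record that carries a weighted deployment strategy:</p>

```typescript
// Hypothetical discovery record for one MFE, as a shell might fetch it
// from a discovery service. Names and shapes are illustrative only.
interface MfeOrigin {
  url: string;
  weight: number; // relative traffic share, e.g. 90/10 for a weighted rollout
}

// Pick an origin given a random draw in [0, 1). Extracted as a pure
// function so the weighting logic is easy to unit test.
export function pickOrigin(origins: MfeOrigin[], draw: number): string {
  const total = origins.reduce((sum, o) => sum + o.weight, 0);
  let threshold = draw * total;
  for (const origin of origins) {
    threshold -= origin.weight;
    if (threshold < 0) return origin.url;
  }
  // Fallback for rounding edge cases.
  return origins[origins.length - 1].url;
}
```

In a real shell the draw would come from `Math.random()` or a sticky per-user hash, so that a given user keeps hitting the same variant during a rollout.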
<p>A shell often imposes a standard, such as the following example, that can be integrated in a template as demonstrated below.</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">MfeTag</span>
   <span class="hljs-attr">src</span>=<span class="hljs-string">'...'</span>
   <span class="hljs-attr">error-handling</span>=<span class="hljs-string">'fallback'</span>
   <span class="hljs-attr">fallback</span>=<span class="hljs-string">'...'</span>
   <span class="hljs-attr">options</span>=<span class="hljs-string">'cors,auth,apikey,tls'</span>
   <span class="hljs-attr">timeout</span>=<span class="hljs-string">1000</span>
   <span class="hljs-attr">passthrough</span>=<span class="hljs-string">'cookies=[...],query=[...],headers=[...]'</span>
   <span class="hljs-attr">strategy</span>=<span class="hljs-string">'canary'</span>
/&gt;</span>
</code></pre>
<p>The above example represents a simple custom HTML tag indicating under which constraints the communication with the MFE shall be done.</p>
<p>In this example the shell will send an HTTP call to the backend, honoring the attributes described below.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Attribute</td><td>Description</td><td>Remarks</td></tr>
</thead>
<tbody>
<tr>
<td>Src</td><td>The path to the MFE</td><td>This is often over HTTPS but can be any other mechanism such as TCP, WSS, direct Lambda Request/Response invocation, StepFunctions sync execution, etc.</td></tr>
<tr>
<td>Error-Handling</td><td>Indicates how the shell handles the overall template behavior in case of failures</td><td>The possible values are: Fail, Fallback, Degradation</td></tr>
<tr>
<td>Fallback</td><td>The fallback URL to call if the src is unavailable</td><td>This is often over HTTPS but can be any other protocol such as TCP</td></tr>
<tr>
<td>Options</td><td>Indicates the standard cross-cutting concerns related to this call</td><td>Possible values are: ApiKey, Authorization, Cors, mTLS, SignatureV4</td></tr>
<tr>
<td>Timeout</td><td>The maximum time the shell waits for a response from the MFE</td><td>This is in milliseconds</td></tr>
<tr>
<td>Passthrough</td><td>How the shell forwards cookies, query string, or headers to the MFE; if not present, all parameters are forwarded</td><td>This can be customised based on enterprise requirements</td></tr>
<tr>
<td>Strategy</td><td>Indicates how the shell must apply the deployment strategy; possible values are canary, bluegreen, weighted</td><td>The extra details per strategy are fetched from the discovery service; for example, for canary the shell fetches the corresponding canary strategy such as 10PercentPer5Minutes. To apply a strategy effectively the shell must be able to track state (stateful)</td></tr>
</tbody>
</table>
</div><p>In the next part of this series we will deep dive into the implementation details and design tradeoffs of deciding whether a shell is a good candidate or not. The next part will focus on how to build a valuable shell, and on when to apply one versus adopting a simplified approach with less cognitive load but with responsibility distributed to the MFE side per service or context.</p>
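<p>As a rough illustration of how a shell could interpret such a tag, the sketch below (hypothetical names, not tied to the example code) turns the tag attributes into a fetch plan with a timeout and a fallback, which is the core of the error-handling contract described in the table:</p>

```typescript
// Hypothetical, simplified attribute set parsed from an MfeTag.
interface MfeTagAttrs {
  src: string;
  fallback?: string;
  errorHandling: 'fail' | 'fallback' | 'degradation';
  timeout: number; // milliseconds
}

// Decide which URLs to try, in order, based on the tag's error handling.
export function buildFetchPlan(attrs: MfeTagAttrs): string[] {
  if (attrs.errorHandling === 'fallback' && attrs.fallback) {
    return [attrs.src, attrs.fallback];
  }
  return [attrs.src];
}

// Fetch with the tag's timeout, walking the plan until one URL succeeds.
export async function fetchWithPlan(attrs: MfeTagAttrs): Promise<string> {
  for (const url of buildFetchPlan(attrs)) {
    try {
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), attrs.timeout);
      const res = await fetch(url, { signal: controller.signal });
      clearTimeout(timer);
      if (res.ok) return res.text();
    } catch {
      // Timeout or network failure: try the next URL in the plan.
    }
  }
  throw new Error(`All origins failed for ${attrs.src}`);
}
```

A 'degradation' handling mode would instead return an empty fragment or a cached placeholder rather than throwing, so the rest of the template still renders.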
<h2 id="heading-running-the-example">Running the Example</h2>
<p>To run the example locally, the backend MFEs must be run on a local server so that the local React website is functional. We use a simple way of running the TypeScript services locally over localhost (port 4242) to achieve locally distributed MFEs and let the website communicate with them over localhost.</p>
<p>The provided local server script is a simplified version of a script I used previously to run the backend Lambda services with minimal overhead and dependencies. Thanks to <a target="_blank" href="https://x.com/wow_sig">Zied Ben Tahar</a> for the guidance and help in making this functional while we worked together at Aviv.</p>
<p>First, let's dig in a bit to see how this works. The script simply loops through an array of entrypoints and invokes the configured function that each one exports.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> handleRoute = <span class="hljs-keyword">async</span> (lambdaEntry: { entryPoint: <span class="hljs-built_in">string</span>, handlerFn?: <span class="hljs-built_in">string</span>, action?: <span class="hljs-function">(<span class="hljs-params">...args: <span class="hljs-built_in">any</span></span>) =&gt;</span> <span class="hljs-built_in">void</span> }, req: Request, res: Response ) =&gt; {
   <span class="hljs-keyword">const</span> <span class="hljs-keyword">module</span> = await import(lambdaEntry.entryPoint);
   const lambdaFunctionHandler = <span class="hljs-keyword">module</span>[lambdaEntry.handlerFn ?? "handler"];
   const result = await lambdaFunctionHandler(
       generateEvent(req), 
       createLambdaContextObjectFromContextPayload(req.body.context)
   );
   return res.send(result.body).end();
}
</code></pre>
<p>The entrypoints are <code>ts</code> or <code>tsx</code> files, imported using dynamic import.</p>
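<p>The <code>generateEvent</code> helper referenced above is not shown in the snippet; a minimal sketch of what such a helper might do, assuming API Gateway proxy-style handlers (field names are an assumption, not the repository's exact code), is mapping the Express request into the event shape the Lambda handler expects:</p>

```typescript
// Minimal sketch: map an Express-like request into an API Gateway
// proxy-style event. Only a few common fields are populated here;
// real proxy events carry many more.
interface LocalRequest {
  method: string;
  path: string;
  headers: Record<string, string>;
  query: Record<string, string>;
  body?: unknown;
}

export function generateEvent(req: LocalRequest) {
  return {
    httpMethod: req.method,
    path: req.path,
    headers: req.headers,
    queryStringParameters: req.query,
    body: req.body ? JSON.stringify(req.body) : null,
  };
}
```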
<p>The script also uses Express to set up a local server, registering some middleware and an Express router that serves all HTTP verbs for each entrypoint.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> app = express();
app.use(cors());
app.use(express.json());
app.use(express.text());
app.use(express.raw());
app.use(express.urlencoded({ extended: <span class="hljs-literal">true</span> }));
<span class="hljs-keyword">const</span> router = express.Router();

lambdasEntrypoints.forEach(<span class="hljs-function">(<span class="hljs-params">lambdaEntry</span>) =&gt;</span> {
  router.all(<span class="hljs-string">`/<span class="hljs-subst">${lambdaEntry.endpoint}</span>/`</span>, <span class="hljs-keyword">async</span> (req: Request, res: Response) =&gt; {
    <span class="hljs-keyword">return</span> handleRoute(lambdaEntry, req, res);
  });
});
</code></pre>
<p>The entrypoint configs are defined as an array and are resolved using the glob package. The paths match each MFE's real origin behavior, to avoid changing the website code just for local testing.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> ConfigSource = { path: <span class="hljs-built_in">string</span>, source: <span class="hljs-built_in">string</span>, handlerFn?: <span class="hljs-built_in">string</span>, action?: <span class="hljs-function">(<span class="hljs-params">...args: <span class="hljs-built_in">any</span></span>) =&gt;</span> <span class="hljs-built_in">void</span> };
<span class="hljs-keyword">const</span> configs: ConfigSource[] = [
    { path: <span class="hljs-string">"api/v1/bookmarks/"</span>, source: <span class="hljs-string">'micro-fronends/bookmarks/src/handlers/list/index.ts'</span> },
    { path: <span class="hljs-string">"api/v1/products/catalog/"</span>, source: <span class="hljs-string">'micro-fronends/products/src/handlers/catalog/index.ts'</span> },
    { path: <span class="hljs-string">"api/v1/products/details/"</span>, source: <span class="hljs-string">'micro-fronends/products/src/handlers/details/index.ts'</span>, action: <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> { <span class="hljs-built_in">console</span>.log(req); res.writeHead(<span class="hljs-number">302</span>, {Location: <span class="hljs-string">`/api/v1/products/catalog/v1/?category=ON_SOLD`</span>}).end();} },
];

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> lambdasEntrypoints = globSync(configs.map(<span class="hljs-function"><span class="hljs-params">src</span> =&gt;</span> src.source ?? <span class="hljs-string">'./'</span>)  , { 
    ignore: [
        <span class="hljs-string">"**/**/node_modules/**"</span>,
        <span class="hljs-string">"**/**/*.test.ts"</span>,
        <span class="hljs-string">"**/**/*.spec.ts"</span>,
    ] }).map(<span class="hljs-function">(<span class="hljs-params">entry: <span class="hljs-built_in">string</span></span>) =&gt;</span> {
        <span class="hljs-keyword">const</span> config = configs.find(<span class="hljs-function"><span class="hljs-params">c</span> =&gt;</span> c.source.includes(entry));
        <span class="hljs-keyword">const</span> entryPoint = join(process.cwd(), entry.split(path.sep).join(path.posix.sep)),
              lambdaName = entry
                .split(path.sep)
                .slice(<span class="hljs-number">-1</span>)[<span class="hljs-number">0</span>]
                .replace(<span class="hljs-regexp">/\.(ts|js)$/</span>, <span class="hljs-string">""</span>),
              endpoint = config?.path,
              handlerFn = config?.handlerFn,
              action = config?.action;

        <span class="hljs-keyword">return</span> { entryPoint, lambdaName, endpoint, handlerFn, action };
    });
</code></pre>
<p>The example's local MFEs can be started using the following command:</p>
<pre><code class="lang-bash">$ npm run <span class="hljs-built_in">local</span>:start
------------ 
[Local λ debugger]: Local lambda invoke debug server is running at http://localhost:4242
[Local λ debugger]: Discovered 3 lambdas entrypoints
  [λ endpoint]: api/v1/bookmarks/
    [exported <span class="hljs-built_in">functions</span>]: handler
  [λ endpoint]: api/v1/products/catalog/
    [exported <span class="hljs-built_in">functions</span>]: handler
  [λ endpoint]: api/v1/products/details/
    [exported <span class="hljs-built_in">functions</span>]: handler
</code></pre>
<p>The website can be run using the following command:</p>
<pre><code class="lang-bash">$ npm run <span class="hljs-built_in">local</span>:website
</code></pre>
<p>The package.json <code>local:website</code> script passes the local URL via an environment variable, and the shared <code>fetchMfe</code> function looks at this variable to decide which host to target: the principal host or the local one.</p>
<pre><code class="lang-json">{
   <span class="hljs-attr">"local:website"</span>: <span class="hljs-string">"REACT_APP_LOCAL_URL=http://localhost:4242 npm run start --prefix front-app/website"</span>
}
</code></pre>
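<p>That host selection can be reduced to a small pure function; the sketch below is an assumption about how the implementation could look, not the exact repository code, showing the idea of preferring the local URL when the environment variable is set:</p>

```typescript
// Sketch: choose the MFE host. If REACT_APP_LOCAL_URL is defined the
// local debug server is used, otherwise the principal (deployed) host.
export function resolveMfeHost(
  principalHost: string,
  localUrl?: string,
): string {
  return localUrl && localUrl.length > 0 ? localUrl : principalHost;
}
```

A call site might look like `resolveMfeHost(window.location.origin, process.env.REACT_APP_LOCAL_URL)`, keeping the environment check in one place.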
<p>Navigating to the <a target="_blank" href="http://localhost:3000/?userid=HJ-HnhYul_sm">http://localhost:3000/?userid=HJ-HnhYul_sm</a> URL will fetch the user bookmarks and also the product catalog page, as below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724715172480/55404569-d483-4e61-97b9-22147941c3a9.png" alt class="image--center mx-auto" /></p>
<p>By adding the product reference as a query parameter, the bookmarks and catalog will show only the corresponding references (for example <a target="_blank" href="http://localhost:3000/?userid=HJ-HnhYul_sm&amp;ref=REF_2">http://localhost:3000/?userid=HJ-HnhYul_sm&amp;ref=REF_2</a>).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724715331262/5379cf50-a8e5-4510-ba60-3f88fe4562f8.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>User auth will be covered in the next article along with the shell implementation. The current system passes all query string parameters to all MFEs without considering the requirements of each MFE separately.</p>
</blockquote>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Applying a well-architected micro-frontend approach is very similar to microservices: it starts with curiosity, enthusiasm, and rigor, but it can become complex and introduce obstacles that work against the approach's primary goals.</p>
<p>Simplicity is a principle of success, but putting the right part in the right place and defining responsibilities and boundaries are the crucial steps toward a long-term decision and design.</p>
<p>In this part of the series, some simple and straightforward parts were explored, with a focus on representing the overall design goals and possibilities when adopting micro-frontends.</p>
<p>The next part will deep dive into the shell implementation and all corresponding modules, such as discovery, templating, and routing, putting all the pieces together to achieve both run-time and build-time advantages.</p>
]]></content:encoded></item><item><title><![CDATA[EventBridge Api Destination]]></title><description><![CDATA[This article walks through the event bridge Api Destinations and delves into some architectural abstractions when using the Api Destinations.
Eventbridge provides a highly available and scalable service bus that covers the Point to point and Fan-Out ...]]></description><link>https://blogs.serverlessfolks.com/eventbridge-api-destination</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/eventbridge-api-destination</guid><category><![CDATA[eventbrdige api destination]]></category><category><![CDATA[Amazon Eventbridge ApiDestination]]></category><category><![CDATA[AWS]]></category><category><![CDATA[ratelimit]]></category><category><![CDATA[throttling]]></category><category><![CDATA[serverless]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Fri, 03 May 2024 21:16:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714770875965/d4005018-dbc4-45f5-8521-8c856f6946b6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article walks through the event bridge Api Destinations and delves into some architectural abstractions when using the Api Destinations.</p>
<p>Eventbridge provides a highly available and scalable service bus that covers the Point to point and Fan-Out communication patterns, It integrates with a very wide range of AWS services and also provides a way of communicating with external systems and many well-known partners in the software industry.</p>
<h2 id="heading-api-destinations">Api Destinations</h2>
<p>EventBridge API destinations allow you to send events to HTTPS endpoints; this means any public-facing, resolvable domain can be used as a destination by the EventBridge API Destination module.</p>
<p>API destinations support all HTTP verbs except TRACE and CONNECT, so you can use GET, PUT, POST, PATCH, OPTIONS, and DELETE.</p>
<p>To use an API destination, a <strong>Connection</strong> must be configured first; the connection is where the authorization mechanism is defined.</p>
<p>The supported auth types:</p>
<ul>
<li><p>OAuth</p>
</li>
<li><p>Api key</p>
</li>
<li><p>Basic ( username/password)</p>
</li>
</ul>
<p>The following representation shows my understanding of API destinations (any feedback will be appreciated).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710975770730/72d3e909-c831-459e-850e-2e4d98e986da.png" alt class="image--center mx-auto" /></p>
<p><strong><em>Rest API façade:</em></strong> the entry point of API destinations, triggered by a rule or pipe for any matched event.</p>
<p><strong><em>Rate Limiter:</em></strong> when you create an API destination and set an <strong>Invocation Rate Limit Per Second</strong> value, the API destination explicitly controls that destination's incoming throughput per second.</p>
<p>Exactly how the rate limit works is a big question, and so far I have found no article or blog representing it correctly, not even the <a target="_blank" href="https://serverlessland.com/serverless/visuals/eventbridge/api-destinations">Serverless Land visuals for invocation rate</a>.</p>
<p><strong><em>Connection:</em></strong> the connection validates the authorization mechanism; it validates and prepares the auth configuration.</p>
<p>I would like to know whether this component is responsible for the OAuth call or not; for the other types, API key and Basic, just adding a header is sufficient, but OAuth requires an extra call, so I am curious about that.</p>
<p><strong><em>Target Invoker:</em></strong> this is a name I coined while decomposing the whole process; to be clear, I mean the module calling the target via HTTP.</p>
<h2 id="heading-overview">Overview</h2>
<p>The Rest API receives the events over HTTP and is used by a rule or pipe; the following diagram shows the rule integration with an API destination.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710801507176/81263c6d-3351-4818-b5e1-15f6fbc44a30.png" alt class="image--center mx-auto" /></p>
<p>The rule sends all matched events by calling the API destination synchronously, and the rule is acknowledged by the success of the target.</p>
<p>EventBridge's default retry policy reattempts to send the event for a 24-hour period with a maximum retry count of 185 attempts. This way EventBridge makes a best effort to deliver the event.</p>
<p>It is possible to add a retry policy to customise the default retry configuration; the retry policy accepts the maximum age of events to keep and the number of attempts in case of errors.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710890384578/1b897316-92fb-4465-887c-6bc6f3c8089f.png" alt class="image--center mx-auto" /></p>
<p>This is a good opportunity to avoid losing events in case the target returns an error or is temporarily unable to receive them.</p>
<p>As part of request validation, the API destination verifies the rate limit; if the throughput exceeds the configured rate limit, the API destination throttles the requests, as shown in the following diagram.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710809225362/ace0cdbb-804b-4081-854d-99eecc0743db.png" alt class="image--center mx-auto" /></p>
<p>EventBridge and Rules are the abstract concepts on top of a queueing system.</p>
<p>The event navigation flow follows as demonstrated below</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710975724135/d966a6fe-9694-425c-a7de-9fbb4510d315.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>The EventBridge event is sent to the rule</p>
</li>
<li><p>The rule sends the request to the API destination via an API call</p>
</li>
<li><p>The API validates the rate limiter status</p>
</li>
<li><p>The connection manages the auth per its configuration</p>
</li>
<li><p>The target invoker calls the external API endpoint over HTTP</p>
</li>
</ul>
<h2 id="heading-source-code">Source Code</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-eventbridge-api-destination">https://github.com/XaaXaaX/aws-eventbridge-api-destination</a></div>
<p> </p>
<h2 id="heading-rate-limiting">Rate Limiting</h2>
<p>This section is a brief recap of my own understanding after some discussions and research.</p>
<p>When you configure an API destination, you can specify a rate limit, which controls the maximum number of events per second that EventBridge will send to the destination endpoint. This rate limit helps prevent overwhelming the destination with a high volume of events.</p>
<ul>
<li><p><strong>Rate limit</strong> is based on <a target="_blank" href="https://en.wikipedia.org/wiki/Token_bucket">token bucket algorithm</a>. The rate limit is represented by the size of a token bucket and the rate at which tokens are replenished. Each event arrival consumes a token.</p>
</li>
<li><p>When the bucket becomes <strong>empty</strong>, the API destination throttles the requests, which involves temporarily <strong>delaying or buffering</strong> some of the events until tokens become available again.</p>
</li>
<li><p>This is not yet clearly documented (no reference found), but per the token bucket algorithm, tokens are added to the bucket at a fixed rate corresponding to the specified rate limit.</p>
</li>
<li><p>EventBridge continuously <strong>monitors</strong> the rate of incoming events for each API destination. It keeps track of the number of events received per second and compares it against the specified rate limit for that destination.</p>
</li>
<li><p>A <strong>Backoff Mechanism</strong> will be applied if the rate of incoming events consistently exceeds the specified rate limit. This means that EventBridge will gradually decrease the rate at which it sends events to the destination in order to alleviate the overload. Once the rate of incoming events decreases and falls below the specified limit, EventBridge will gradually resume sending events at the normal rate.</p>
</li>
<li><p>If EventBridge encounters errors while attempting to deliver events to the destination due to throttling or other issues, it may <strong>retry the delivery</strong> according to its <strong>retry policy</strong>. However, if the errors persist or if the destination consistently fails to handle the events, EventBridge may eventually stop attempting to deliver events to that destination and generate an error or warning notification.</p>
</li>
</ul>
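<p>The token bucket behaviour described above can be sketched in a few lines. This is an illustration of the algorithm itself, with assumed parameters (capacity equal to the rate limit), not AWS's internal implementation:</p>

```typescript
// Illustrative token bucket: capacity and refill rate both equal the
// configured rate limit (events per second). Not AWS's actual code.
export class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private ratePerSecond: number, nowMs: number) {
    this.tokens = ratePerSecond; // the bucket starts full
    this.lastRefill = nowMs;
  }

  // Returns true if the event may be delivered, false if it is throttled.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = (nowMs - this.lastRefill) / 1000;
    // Refill at a fixed rate, capped at the bucket capacity.
    this.tokens = Math.min(
      this.ratePerSecond,
      this.tokens + elapsedSeconds * this.ratePerSecond,
    );
    this.lastRefill = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1; // each event arrival consumes one token
      return true;
    }
    return false;
  }
}
```

With a 1 RPS limit, a second event arriving 100 ms after the first finds the bucket empty and is throttled, while an event arriving a full second later passes, which matches the behaviour observed in the webhook.site experiment below.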
<h2 id="heading-connection">Connection</h2>
<p>A connection can be used as an auth configuration black box; this means you can choose your required auth type, and EventBridge will use a Secrets Manager secret to register those credentials securely.</p>
<p>The EventBridge Basic and API key auth types are simple standards, and the population of credentials and the generation of the HTTP request headers are managed for you.</p>
<p>OAuth is based on the <a target="_blank" href="https://tools.ietf.org/html/rfc6749#section-4.4"><strong>client credentials grant</strong></a>, a standard for obtaining credentials outside the context of a user. When OAuth is configured, EventBridge communicates with the OAuth service, providing the <code>client_id</code> and <code>client_secret</code> to obtain an <code>access_token</code>. When an <code>access_token</code> expires, EventBridge, upon receiving an unauthorised response error (<strong>401</strong> or <strong>407</strong>), asks the OAuth server for a new <code>access_token</code>.</p>
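<p>That refresh behaviour can be pictured as a small token cache that treats 401/407 responses as a signal to fetch a new token. The sketch below is my interpretation of the flow (the token endpoint call is stubbed out as <code>fetchToken</code>), not AWS's internal code:</p>

```typescript
// Sketch of a client-credentials token cache. fetchToken stands in for
// the real call to the OAuth token endpoint with client_id/client_secret.
export class TokenCache {
  private accessToken: string | null = null;

  constructor(private fetchToken: () => string) {}

  // Return the cached token, fetching one if none is cached yet.
  getToken(): string {
    if (this.accessToken === null) {
      this.accessToken = this.fetchToken();
    }
    return this.accessToken;
  }

  // 401 (Unauthorized) and 407 (Proxy Authentication Required) invalidate
  // the cached token so that the next call fetches a fresh one.
  handleResponse(statusCode: number): void {
    if (statusCode === 401 || statusCode === 407) {
      this.accessToken = null;
    }
  }
}
```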
<h2 id="heading-the-practical-usage">The practical usage</h2>
<p>For the sake of demonstration, this section uses two destinations to see how the EventBridge API destination behaves in action.</p>
<p>The example connection is simple and provides an API key auth type that sends the API key under the <code>x-api-key</code> header.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> connection = <span class="hljs-keyword">new</span> Connection(<span class="hljs-built_in">this</span>, Connection.name, {
    authorization: Authorization.apiKey(
        <span class="hljs-string">'x-api-key'</span>, 
         SecretValue.secretsManager(secret.secretArn)
    )
});
</code></pre>
<p>The connection is then used with the API destination.</p>
<p>Please generate a new webhook.site URL by navigating to <a target="_blank" href="https://webhook.site">https://webhook.site</a> and place it in the CDK stack as a replacement for the webhooksiteUrl const variable <a target="_blank" href="https://github.com/XaaXaaX/aws-eventbridge-api-destination/blob/eff1e44d3d2551e6ed85822fffb749d2510d5707/cdk/lib/cdk-stack.ts#L14">here</a> and also <a target="_blank" href="https://github.com/XaaXaaX/aws-eventbridge-api-destination/blob/eff1e44d3d2551e6ed85822fffb749d2510d5707/data/events.json#L11">here</a>.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> apiDestination = <span class="hljs-keyword">new</span> ApiDestination(<span class="hljs-built_in">this</span>, <span class="hljs-string">'api-destination'</span>, {
    httpMethod: HttpMethod.POST,
    endpoint: props.apiUrl!,
    connection: connection,
    rateLimitPerSecond: <span class="hljs-number">1</span>,
});
</code></pre>
<p>The repository provides a fake events JSON file that lets you send the events to the bus as a batch.</p>
<pre><code class="lang-bash">npm run events:send
</code></pre>
<p>The API destination has a rate limit of 1 RPS. This means the API destination receives the events from the rule, and in the case of more than 1 request per second, the requests are throttled at the rule/API destination edge, which prevents overwhelming the target.</p>
<p>Looking at <a target="_blank" href="https://webhook.site/">webhook.site</a> and examining the reception times precisely, as per the following figure, the events are reaching the target.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710898938624/a9a4a676-e5c4-4622-8b56-fba599c2e18a.png" alt class="image--center mx-auto" /></p>
<p>If throttling errors cause the event to reach the DLQ, the message attributes show RETRY_ATTEMPTS, and ERROR_MESSAGE shows the API destination message indicating the reason for failure, including the target response payload.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710899815667/395aae57-6020-4ede-a982-c7c26e09a93b.png" alt class="image--center mx-auto" /></p>
<p>EventBridge controls the rate limit on a '<strong>best effort</strong>' basis and does not guarantee it exactly, because rates are not globally shared across the fleet and are propagated asynchronously. It is important to remember that EventBridge is a distributed service; keeping state consistent without increased latency is hard to achieve, and that is why rate limits are not tracked precisely, especially at low TPS.</p>
]]></content:encoded></item><item><title><![CDATA[Integrate  Bedrock With Alexa skill]]></title><description><![CDATA[Previously I wrote an article about Bedrock ( Suspicious message detection ) and enjoyed it a lot since, while playing with the different models provided, as part of amusement, playing with text was one of the cool parts. The Mistral 7B model sounds ...]]></description><link>https://blogs.serverlessfolks.com/integrate-bedrock-with-alexa-skill</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/integrate-bedrock-with-alexa-skill</guid><category><![CDATA[AI]]></category><category><![CDATA[serverless]]></category><category><![CDATA[bedrock]]></category><category><![CDATA[Alexa]]></category><category><![CDATA[Amazon Bedrock]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Tue, 30 Apr 2024 14:34:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714487587262/1d0a6d9e-5f02-4c8d-816b-4ca8e2fec4ec.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously I wrote an article about Bedrock ( <a target="_blank" href="https://serverlessfolks.com/aws-bedrock-suspicious-message-detection">Suspicious message detection</a> ) and enjoyed it a lot since, while playing with the different models provided, as part of amusement, playing with text was one of the cool parts. The Mistral 7B model sounds great option when dealing with text and chat options. my journey with Mistral started with some reasoning tests, giving some prompts, and at the end asking why you considered these provided results.</p>
<p>The hard part was imagining different prompts, and it was a time-consuming task. To simplify my prompting journey, I decided to try a simpler scenario: testing a conversation. Here is an example.</p>
<blockquote>
<p>You are the secondary person in a conversation, you have a funny and sympathetic character, Omid is here to have a chat about some random topic, you must consider the logic, and facts but at the same time keep the conversation friendly. don't generate the response on his turn and just answer on your turn keeping responses short and moderate.</p>
<p>Omid says: {}</p>
</blockquote>
<p>Here are my last attempts at discovering the model. I gave it up, as it involved a lot of thinking about phrasing and was not a time-optimised discovery period.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710688294494/fded086d-bb33-486c-a579-10ba7f4071ea.gif" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710688344226/18ad43fb-34e3-4f56-af20-eea01f59cecf.gif" alt class="image--center mx-auto" /></p>
<p>Now the conversation needs to end at some point with a goodbye phrase, so let's say Goodbye.</p>
<blockquote>
<p>You: Oh, it was nice talking to you, Omid. Have a great day!</p>
</blockquote>
<p>But the conversation can end with other words too; since we know an AI is behind it, let's say Stop this time.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710688706690/3023fa56-4bde-4046-b122-2dabb1a6a02c.png" alt class="image--center mx-auto" /></p>
<p>I ended up writing a Lambda function that receives the conversation and looks for keywords, to avoid calling the LLM. It was a lot of if/else, but it worked. There were gaps, though: when saying '<strong><em>I prefer to stop the conversation</em></strong>', the code still sent the request and it reached the LLM.</p>
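<p>A minimal sketch of that kind of keyword guard could look like the following (the function names and phrase list are mine, not from the actual implementation):</p>

```typescript
// Hypothetical keyword guard in front of the LLM call; the phrase list and
// function names are illustrative, not from the actual Lambda code.
const STOP_PHRASES = ["stop", "goodbye", "bye", "quit", "that is enough"];

function wantsToStop(utterance: string): boolean {
  const normalized = utterance.toLowerCase();
  return STOP_PHRASES.some((phrase) => normalized.includes(phrase));
}

function routeUtterance(utterance: string): "END_SESSION" | "CALL_LLM" {
  // Substring matching catches "I prefer to stop the conversation", but it
  // still misses sentiment-only endings and causes false positives
  // (e.g. "bye" inside another word), which is why this approach felt fragile.
  return wantsToStop(utterance) ? "END_SESSION" : "CALL_LLM";
}

console.log(routeUtterance("I prefer to stop the conversation")); // END_SESSION
console.log(routeUtterance("can I have a cup of coffee"));        // CALL_LLM
```

<p>Every new phrasing demands another entry in the list, which is exactly the procedural sprawl described above.</p>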
<p>It is frustrating to leave your actual goal and write procedural code around Yes, No, Stop, and End.</p>
<p>Finally, I rediscovered my two Echo devices in the living room, and YESSSSSS. Smart devices and connected objects are spreading through our lives; in every residence I visit, some kind of connected hub is present: Alexa, Siri, etc. That made me think about how we can benefit from their presence to take over simple tasks. Looking at my own routine, behind every busy day there is enough tiredness to affect the way I follow my son.</p>
<blockquote>
<p>I want to have a simple and fun evening with him</p>
</blockquote>
<p>Back in 2021, I had created an Alexa skill to play with the service, but this time the idea was to avoid blueprints and build a custom skill that better fits my son's behaviors.</p>
<h2 id="heading-why-not-only-llm">Why not only LLM?</h2>
<p>Using GenAI is a fun part of the puzzle, but it comes at a cost: the added complexity of prompting. Even leaving aside its voice and device capabilities, Alexa simplifies the conversational flow and the creation of simple predefined intents, and manages them for us. With an LLM, you need to manage when and how to iterate through the conversation, stop it, or defer it yourself.</p>
<p>Refining the prompt could be a simple way to handle this challenge, but again, '<strong><em>does it make sense to call Bedrock?</em></strong>' Below is an example of a prompt that could handle the situation.</p>
<blockquote>
<p>You are the secondary person in a conversation, you have a funny and sympathetic character, Omid is here to have a chat about some random topic, you must consider the logic, and facts but at the same time keep the conversation friendly, End the conversation if he asks to stop such as stop or you feel any frustration in his response such as you gonna make me crazy, don't generate the response on his turn and just answer on your turn keeping responses short and moderate.</p>
</blockquote>
<p>Looking at the default Alexa intents, an intent has nothing special on its own: it is just a wrapper around slots (variables) that can have many predefined words configured. In a custom intent, these words can be things like '<strong><em>run</em></strong>', '<strong><em>execute</em></strong>', or '<strong><em>open</em></strong>'. It is more cost-effective to use intents than to call Bedrock and be billed per token.</p>
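<p>To illustrate why slot word lists are cheaper than LLM calls for simple routing, here is a sketch of resolving an intent from predefined words (Alexa does this natively in the interaction model; the names below are purely illustrative):</p>

```typescript
// Illustrative only: Alexa resolves slot words natively in the interaction
// model; this sketch just shows why a word list is cheaper than a per-token
// billed Bedrock call for simple routing.
const LAUNCH_SLOT_WORDS = ["run", "execute", "open", "start", "launch"];

function resolveIntent(utterance: string): "LaunchIntent" | "FallbackIntent" {
  const words = utterance.toLowerCase().split(/\s+/);
  return words.some((word) => LAUNCH_SLOT_WORDS.includes(word))
    ? "LaunchIntent"
    : "FallbackIntent";
}

console.log(resolveIntent("please open the game")); // LaunchIntent
console.log(resolveIntent("tell me a story"));      // FallbackIntent
```
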
<h2 id="heading-what-we-build">What we build</h2>
<p>Building a simple Alexa skill was fun, but listing all the required intents and example phrases was hard and time-consuming; there are a large number of situations to consider, and it did not seem simple or achievable to me. So we benefit from Alexa only for the standard intents she provides, and for the conversation we push everything to the backend and the LLM.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710609282275/4d95ad8a-d5ac-46d0-9ac7-c03458e81dc4.png" alt class="image--center mx-auto" /></p>
<p><strong>Bedrock:</strong> Amazon Bedrock provides a simplified way of interacting with LLM models, and this time Mistral 7B seemed like a good fit.</p>
<p><strong>Step Functions:</strong> Orchestrating the prompt generation and the interaction with the LLM was really fast using Step Functions.</p>
<p><strong>Lambda:</strong> Lambda was a mandatory step, as an Alexa skill supports two kinds of backend endpoints: HTTPS and a Lambda trigger.</p>
<p><strong>Alexa</strong>: Alexa helps to get into a voice-driven experience with minimum effort.</p>
<h2 id="heading-source-code">Source Code</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-bedrock-with-alexa">https://github.com/XaaXaaX/aws-bedrock-with-alexa</a></div>
<p> </p>
<h2 id="heading-alexa-skill">Alexa Skill</h2>
<p>To start working with Alexa skills, the <a target="_blank" href="https://marketplace.visualstudio.com/items?itemName=ask-toolkit.alexa-skills-kit-toolkit">Alexa ASK Toolkit for VS Code</a> provides everything required to create, download, and interact with a skill.</p>
<p>Next, install the Alexa Skills Kit CLI.</p>
<pre><code class="lang-bash">npm install -g ask-cli
</code></pre>
<p>The walkthrough requires creating an <a target="_blank" href="https://developer.amazon.com/loginwithamazon/console/site/lwa/overview.html">Amazon Security Profile</a> in the Developer Console. For later testing on an Alexa device, use the same email address as the Alexa device account; this makes it possible to interact with the skill from a real device without distributing the skill publicly.</p>
<p>After creating the security profile, configure the allowed redirect URL and allowed origin.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Allowed Redirect Url</td><td>http://127.0.0.1:9090/cb</td></tr>
</thead>
<tbody>
<tr>
<td>Allowed Origin</td><td>http://127.0.0.1:9090</td></tr>
</tbody>
</table>
</div><p>Configuring the local environment is simple and fast using the ASK CLI. When asked, provide the Client ID and Client Secret from the security profile you created (the client confirmation field is where you put the client secret).</p>
<pre><code class="lang-bash">ask configure
</code></pre>
<p>Proceed with the account-linking process when asked whether you prefer hosting the Alexa infrastructure in your own account; for this article, we leave the skill infrastructure hosted on the Alexa side.</p>
<p>After the redirect, a default profile will be created in '&lt;HOME_DIR&gt;/.ask/cli_config'.</p>
<h2 id="heading-deploying-the-back-end">Deploying the back end</h2>
<p>The skill manifest is already present as part of the source code, but before deploying the skill, the backend must be deployed.</p>
<p><em>The Mistral LLM model is not available in all regions; for this article, I used</em> <strong><em>us-west-2</em></strong></p>
<p>To deploy the backend run the following command.</p>
<pre><code class="lang-bash">npm run cdk:app deploy
</code></pre>
<h2 id="heading-deploy-skill">Deploy Skill</h2>
<p>First, we need to copy the SkillFunction Lambda ARN and use it in the '<strong><em>skill.json</em></strong>' file.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"manifest"</span>: {
    <span class="hljs-attr">"apis"</span>: {
      <span class="hljs-attr">"custom"</span>: {
        <span class="hljs-attr">"endpoint"</span>: {
          <span class="hljs-attr">"uri"</span>: <span class="hljs-string">"arn:aws:lambda:us-west-2:11111111111:function:ConversationStack-SkillFunct-skillfunctionB016215E-dYmSe81VVXgY"</span>
        },
        ....
      }
    },
    ....
  }
}
</code></pre>
<p>The Alexa assets are part of the article's source code under the '<strong>src/skill/WANTIT</strong>' path. To deploy the skill, go to that path and run the following command.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> src/skill/WANTIT
ask deploy
</code></pre>
<p>The skill is launched with '<strong><em>Alexa open daddy omid</em></strong>' or '<strong><em>Alexa ask daddy omid, ..........</em></strong>', and the skill sends the remaining part of the instruction to the backend. This lets me receive my son's whole demand in the backend.</p>
<h2 id="heading-deep-dive-in-backend">Deep dive in Backend</h2>
<p>The entry point of our backend is the Lambda function listening to skill requests. The Lambda contains multiple request handlers:</p>
<ul>
<li><p>LaunchRequestHandler (Alexa default)</p>
</li>
<li><p>HelpIntentHandler (Alexa default)</p>
</li>
<li><p>CancelAndStopIntentHandler (Alexa default)</p>
</li>
<li><p>SessionEndedRequestHandler (Alexa default)</p>
</li>
<li><p>YesIntent (Alexa default)</p>
</li>
<li><p>NoIntent (Alexa default)</p>
</li>
<li><p>AskWantItIntentHandler ( Custom )</p>
</li>
</ul>
<p>The following snippet represents the AskWantItIntentHandler implementation</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> AskWantItIntentHandler : RequestHandler = {
  canHandle(handlerInput : HandlerInput) : <span class="hljs-built_in">boolean</span> {
    <span class="hljs-keyword">const</span> request = handlerInput.requestEnvelope.request;  
    <span class="hljs-keyword">return</span> request.type === <span class="hljs-string">'IntentRequest'</span> &amp;&amp; request.intent.name === process.env.SKILL_NAME;
  },
  <span class="hljs-keyword">async</span> handle(handlerInput : HandlerInput) : <span class="hljs-built_in">Promise</span>&lt;Response&gt; {
    <span class="hljs-keyword">const</span> item = { ... };
    <span class="hljs-keyword">const</span> sfnresponse = <span class="hljs-keyword">await</span> sfnClient.send(<span class="hljs-keyword">new</span> StartSyncExecutionCommand({ ... }));

    <span class="hljs-keyword">const</span> output = <span class="hljs-built_in">JSON</span>.parse(sfnresponse.output ?? <span class="hljs-string">'{}'</span>);

    <span class="hljs-keyword">let</span> speechText = output?.Body?.outputs?.[<span class="hljs-number">0</span>]?.text;

    <span class="hljs-keyword">return</span> handlerInput.responseBuilder
      .speak(speechText)
      .reprompt(<span class="hljs-string">'Are you ok with that?'</span>)
      .withSimpleCard(<span class="hljs-string">'You will get it.'</span>, speechText)
      .getResponse();
  },
};
</code></pre>
<p>As Alexa needs a direct response, the Lambda sends a <strong><em>StartSyncExecutionCommand</em></strong> and waits for the state machine response.</p>
<p><em>The</em> <strong><em>StartSyncExecutionCommand</em></strong> <em>is only supported by</em> <strong><em>express</em></strong> <em>workflows, not</em> <strong><em>standard</em></strong> <em>ones.</em></p>
<p>The state machine definition is a simple workflow as illustrated below:</p>
<p>It simply retrieves the prompt from an S3 bucket, formats it by injecting the conversation into it, and finally invokes the Mistral model.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710710403906/ef266378-c840-49f5-8fb9-5f0df1bb1277.png" alt class="image--center mx-auto" /></p>
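<p>The injection step can be sketched as a simple placeholder replacement, assuming the stored template ends with a placeholder like 'Omid says: {}' as shown earlier (the helper below is my illustration, not the actual state machine code):</p>

```typescript
// Sketch of the prompt-injection step performed by the workflow; the real
// template lives in S3 and this helper name is mine.
function buildPrompt(template: string, message: string): string {
  // The stored prompt ends with a placeholder, e.g. "Omid says: {}"
  return template.replace("{}", message);
}

const template = "You are the secondary person in a conversation. Omid says: {}";
console.log(buildPrompt(template, "can I have a cup of coffee"));
// → "You are the secondary person in a conversation. Omid says: can I have a cup of coffee"
```
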
<p>Looking at the state machine executions, the input received will be as below.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"input"</span>: {
    <span class="hljs-attr">"Id"</span>: <span class="hljs-string">"amzn1.echo-api.session.981ddf27-8725-438e-8022-eb6970bc769a"</span>,
    <span class="hljs-attr">"timestamp"</span>: <span class="hljs-string">"2024-03-17T18:11:13.796Z"</span>,
    <span class="hljs-attr">"message"</span>: <span class="hljs-string">"can I have a cup of coffee"</span>
  },
  <span class="hljs-attr">"inputDetails"</span>: {
    <span class="hljs-attr">"truncated"</span>: <span class="hljs-literal">false</span>
  },
  <span class="hljs-attr">"roleArn"</span>: <span class="hljs-string">"arn:aws:iam::11111111111111:role/ConversationStack-Convers-ConversationStateMachineR-cFO5YRWKRD91"</span>
}
</code></pre>
<h2 id="heading-testing">Testing</h2>
<p>At this level, we have two ways of testing: first in the Alexa skill developer console, which is nice during the development phase, and second with a real Alexa device.</p>
<p>You can publish the skill publicly so that anyone can test it from a device; it is possible to use a user ID or device ID in policy conditions on the Lambda role to refuse any unintended use. I am sure no one will distribute a skill publicly in the proof-of-concept phase, though. Alexa has a really interesting capability that lets you use your own device to trigger the skill: the only condition is connecting the device with the same email address as the developer console account. This way, the device easily discovers the skill and lets you interact with it as a real user.</p>
<p><em>Alexa skills are localized, so you need to select a language and locale like English (US). For people like me located in other regions (France, in my case), the skill will show an error indicating that you cannot interact with it. A workaround is changing the Kindle account address on your local Amazon site (e.g.</em> <a target="_blank" href="http://amazon.fr"><em>amazon.fr</em></a> <em>); this creates an account on</em> <a target="_blank" href="http://amazon.com"><em>amazon.com</em></a> <em>and makes it possible to test the skill. You can later move your Kindle account on</em> <a target="_blank" href="http://amazon.com"><em>amazon.com</em></a> <em>back to your local site (</em> <a target="_blank" href="http://amazon.fr"><em>amazon.fr</em></a> <em>).</em></p>
<p>The following video shows the result with a real device.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtube.com/shorts/yn-niMTI1GE?feature=share">https://youtube.com/shorts/yn-niMTI1GE?feature=share</a></div>
<p> </p>
<h2 id="heading-next-steps">Next steps</h2>
<p>As part of this article, due to the lack of clear documentation in the AWS docs, there were difficulties in automating the skill deployment as part of the CDK stack. The Alexa documentation refers to the CloudFormation docs (<a target="_blank" href="https://developer.amazon.com/en-US/docs/alexa/aws-tools/create-and-manage-skills-with-aws-tools.html#aws-cloudformation">here</a>) and CloudFormation refers to the Alexa docs (<a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/alexa-resource-ask-skill.html">here</a>). That is why this article uses the Alexa Skills Kit CLI for deployment.</p>
<p>As part of the next step, I will find a way to deploy the skill via CDK; also, some demands can be based on</p>
<p>To use the CloudFormation AWS::ASK::Skill resource, you need to provide the Client ID, Client Secret, and refresh token. Apart from the required sensitive information, which seems a bit insecure, we need to use the Alexa Skills Kit CLI to retrieve the refresh token; the CLI command uses the allowed origin and allowed redirect (as mentioned above in the Alexa Skill section), which is not well documented.</p>
<p>Improving the prompt to take other situations into account, like the time in the conversation: an apple can be OK at 16:00 but not at 12:00, as it's lunch time.</p>
<p>A last part that I would like to explore is whether Alexa works with response streaming, so I'll give it a try.</p>
]]></content:encoded></item><item><title><![CDATA[Experimenting Multiple triggers for Amazon SQS]]></title><description><![CDATA[AWS lambda is a core component of a wide range of software designs, as an advantage of lambda service, we can focus on its simple and efficient integration with other services in the AWS ecosystem. The Amazon SQS is an old-school member of this ecosy...]]></description><link>https://blogs.serverlessfolks.com/experimenting-multiple-triggers-for-amazon-sqs</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/experimenting-multiple-triggers-for-amazon-sqs</guid><category><![CDATA[eventsourcemapping]]></category><category><![CDATA[lambda]]></category><category><![CDATA[SQS]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Tue, 30 Apr 2024 12:21:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/cDwZ40Lj9eo/upload/d13dcb6fd4240917608e4d87384dd4ea.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS lambda is a core component of a wide range of software designs, as an advantage of lambda service, we can focus on its simple and efficient integration with other services in the AWS ecosystem. The Amazon SQS is an old-school member of this ecosystem that offers a highly scalable messaging queue service and integrates perfectly with AWS lambda.</p>
<h1 id="heading-queue-offering">Queue offering</h1>
<p>The queuing services are used to decouple the software systems by their ability to act as an asynchronous transitional service to send messages from one system to another and let the producers and consumers be decoupled and independent.</p>
<p>In a queueing system, producers write messages to the queue; consumers fetch messages from the queue, process them, and request their removal when no longer needed. By default, queuing services keep a message as long as there is no removal request from a consumer.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713897486701/43260c57-06f3-42da-b181-74c6be3afe69.png" alt class="image--center mx-auto" /></p>
<p>This design allows producers and consumers to be independent and decoupled in terms of processing capacity, and to manage their internal state in an isolated way: handling eventual failures, scaling, and so on.</p>
<h1 id="heading-event-vs-job-consumer">Event vs Job Consumer</h1>
<p>Amazon SQS, like any queuing service, offers basic message retrieval by consumers; this is often done by batch jobs or instances that periodically ask for messages, and any consumer can request a message's removal at the end of processing. Event-based consumption, on the other hand, is more of a serverless concept in which messages are pushed to consumers as they become available. Amazon SQS, despite being a serverless messaging service, does not offer event-based message distribution, unlike Amazon SNS, and works with a job-consumer approach.</p>
<h1 id="heading-source-code">Source Code</h1>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-sqs-lambda-message-filtering">https://github.com/XaaXaaX/aws-sqs-lambda-message-filtering</a></div>
<p> </p>
<h1 id="heading-aws-lambda-sqs">AWS Lambda / SQS</h1>
<p>AWS Lambda integrates with SQS, and this integration gives the impression of smooth event-based consumption, but that impression comes from the excellent way the Lambda service manages the integration. Behind the scenes, the Lambda service asks for a batch of visible messages in the queue and manages the consumption and message lifecycle on its own; of course, the principal management of messages inside the queue remains under SQS ownership (visibility timeout, delays, etc.).</p>
<p>The Lambda poller receives messages from SQS, given the desired maximum number of messages and the maximum wait time for gathering them.</p>
<p>Subsequently, in the Lambda event source mapping, filtering is applied, followed by a batching process that prepares the batch of records according to the function configuration before invoking the function.</p>
<h1 id="heading-filtering">Filtering</h1>
<p>As the following diagram illustrates, filtering is applied on the Lambda service side: if a record matches the configured filter, it is passed to the Lambda function. But when a record does not match the configured filter, the Lambda service not only discards the message from processing but also considers it a message to be deleted, and removes it from SQS.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713900296181/d1f51a78-ea52-4666-b550-888f91c44c50.png" alt class="image--center mx-auto" /></p>
<p>If you are a fan of reading documentation like me, you already knew this, but I had a mental challenge to see how SQS would behave if I used it as a central queue in my system; my reasoning was that in some central part of the system we could offload events to multiple Lambda consumers.</p>
<p>Logically, the idea was that if I applied filtering so that each Lambda listens to a different event, each message would have a single consumer. This would not go against the recommendation, and it would give me central control of my events without a monolith function handling a significant amount of processing or some sort of Lambda-based orchestration. The following diagram demonstrates the exact scenario.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713906896852/86dbc5b8-fc6d-4d89-ba58-9d59cd47f454.png" alt class="image--center mx-auto" /></p>
<p>As part of my tests, I sent 3 different messages with different payloads.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"isProductTransitEvent"</span>: <span class="hljs-string">"true"</span>,
  <span class="hljs-attr">"productId"</span>: <span class="hljs-string">"123456"</span>,
  <span class="hljs-attr">"deliveryId"</span>: <span class="hljs-string">"1"</span>
}
</code></pre>
<pre><code class="lang-json">{
  <span class="hljs-attr">"isProductSynchroEvent"</span>: <span class="hljs-string">"true"</span>,
  <span class="hljs-attr">"productId"</span>: <span class="hljs-string">"123456"</span>,
  <span class="hljs-attr">"lotId"</span>: <span class="hljs-string">"12345"</span>
}
</code></pre>
<pre><code class="lang-json">{
  <span class="hljs-attr">"isProductStockEvent"</span>: <span class="hljs-string">"true"</span>,
  <span class="hljs-attr">"productId"</span>: <span class="hljs-string">"123456"</span>,
  <span class="hljs-attr">"quantity"</span>: <span class="hljs-number">10</span>
}
</code></pre>
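<p>To see how the filters in this scenario behave, here is a much-simplified sketch of body filtering, reduced to exact-value matching (real event source mapping filter patterns support more operators than this):</p>

```typescript
// Much-simplified body filtering: a pattern matches when, for every key,
// the body value is in the allowed list. Real patterns support more operators.
type FilterPattern = { [key: string]: string[] };

function matchesFilter(pattern: FilterPattern, body: { [key: string]: string }): boolean {
  return Object.entries(pattern).every(
    ([key, allowed]) => allowed.includes(body[key]),
  );
}

const transitFilter: FilterPattern = { isProductTransitEvent: ["true"] };
const stockFilter: FilterPattern = { isProductStockEvent: ["true"] };

const message = { isProductTransitEvent: "true", productId: "123456", deliveryId: "1" };
console.log(matchesFilter(transitFilter, message)); // true: passed to the function
console.log(matchesFilter(stockFilter, message));   // false: that poller deletes it anyway
```

<p>The crucial difference from this sketch is that, in the real integration, a non-matching message is not simply ignored: the poller that filtered it out also deletes it from the queue, which is what breaks the multi-trigger scenario.</p>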
<p>Only the first function received the messages; sending more messages with the third payload resulted in messages being received randomly, but not all of them.</p>
<p>Another hypothesis was sending a single payload with different values for the filtered fields; this time, neither the second nor the third function received the messages matching its filter. Sending 20 messages resulted in the same behavior as before.</p>
<p>At a high level, Lambda's integration with SQS using event source mapping can be presented as in the following sequence diagram; discarded events are treated the same as successfully processed messages.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713916197827/1aaa848b-86d5-486a-89f4-bcac4aa8b419.png" alt class="image--center mx-auto" /></p>
<p>The results confirm that the first trigger receives a large part of the events most of the time. But what if records fail? What happens to the discarded events that are part of the same batch of messages passed through the first trigger?</p>
<p>The answer is simple: in case of failure, the discarded messages are deleted just as in the success scenario, while the failed messages are retried. The failed messages become visible again after the visibility timeout and are redelivered, which can potentially lead to losing messages in the retry phase. In the tests I did for this article, I never observed a second retry for 10 messages.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713919970374/265fbc19-9a80-4ef5-9750-046b1fa438fe.png" alt class="image--center mx-auto" /></p>
<p>Hope this can be useful ;)</p>
]]></content:encoded></item><item><title><![CDATA[Impersonation using AWS Congito]]></title><description><![CDATA[Security stands as a foundational element in software development, often taking center stage in architecture decisions and assessments. The approach to security, both in mindset and execution, can differ depending on factors like the intended usage s...]]></description><link>https://blogs.serverlessfolks.com/impersonation-using-aws-congito</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/impersonation-using-aws-congito</guid><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Tue, 30 Apr 2024 10:12:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/fEXeyNYmO2Y/upload/2191aa827a719bbccb64b3d2f6e7b3c9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Security stands as a foundational element in software development, often taking center stage in architecture decisions and assessments. The approach to security, both in mindset and execution, can differ depending on factors like the intended usage scenario and the specific layers of the system, whether being public-facing, private, isolated, or serving as gateways. Additionally, a significant challenge arises in ensuring security while navigating the complexities of data ownership in multi-tenant software serving diverse customers. Prioritizing data isolation emerges as a crucial compromise, essential for the efficacy of the software solution.</p>
<p>However, in straightforward software systems, the notion of security can be divided into two phases: Authentication and Authorization. Authentication facilitates the identification of users seeking to interact with the available services, while Authorization determines granular access permissions, enabling the system to either permit or deny access to the resources owned by the user.</p>
<p>At a high level, this diagram illustrates the functioning of Authentication (AuthN) and Authorization (AuthZ)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714163380264/5c887408-2820-4345-a050-3345c5a9654b.png" alt class="image--center mx-auto" /></p>
<p>This diagram illustrates the public entry points to the internal ecosystem: the Web App, Mobile App, and Gateway. The initial interaction step involves authentication or signing in for users. Users can be real individuals or machines seeking access to services within the internal ecosystem. In the case of mobile or web apps, users typically sign in using account credentials, employing a username and password mechanism to obtain an access token. Gateways utilize credential grant standards such as authorization code, implicit, client credentials, or resource password (defined by <a target="_blank" href="https://datatracker.ietf.org/doc/html/rfc6749#page-8">IETF Authorization grant</a>).</p>
<p>The diagram below illustrates the flow of the Authorization Protocol Flow.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714166494996/dd34ad18-2472-44c6-bf49-6d81a9ea0f54.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-json-web-token-jwt">Json Web Token ( JWT )</h1>
<p>The JWT is often utilized as a security measure to fulfill security needs. However, for the sake of clarity, let's assume that '<strong>JWT is not inherently Secure'.</strong></p>
<p>The JWT consists of three main parts: the header, payload, and signature.</p>
<ul>
<li><p><strong>Header</strong>: Provides information about the token, including the algorithm used.</p>
</li>
<li><p><strong>Payload</strong>: Contains the claims, such as email, expiration time, and user unique identifier (sub).</p>
</li>
<li><p><strong>Signature</strong>: Utilizes a private key owned by the server to ensure the token's integrity and authenticity.</p>
</li>
</ul>
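<p>To make these three parts concrete, here is a small sketch that decodes the header and payload and verifies an HMAC-SHA256 signature. The claims and secret are made up for illustration; real code should also validate claims such as expiration and compare signatures in constant time.</p>

```typescript
import { createHmac } from "node:crypto";

// Decodes a JWT's header and payload, and verifies an HMAC-SHA256 signature.
// The claims and secret below are made up; this is an illustration only.
function base64url(value: string): string {
  return Buffer.from(value).toString("base64url");
}

function signHS256(header: object, payload: object, secret: string): string {
  const head = base64url(JSON.stringify(header));
  const body = base64url(JSON.stringify(payload));
  const signature = createHmac("sha256", secret)
    .update(head + "." + body)
    .digest("base64url");
  return head + "." + body + "." + signature;
}

// Decoding is just base64url: anyone can read the claims without the key.
function decode(token: string): { header: any; payload: any } {
  const [head, body] = token.split(".");
  return {
    header: JSON.parse(Buffer.from(head, "base64url").toString()),
    payload: JSON.parse(Buffer.from(body, "base64url").toString()),
  };
}

// Verification recomputes the HMAC with the server's secret and compares.
function verifyHS256(token: string, secret: string): boolean {
  const [head, body, signature] = token.split(".");
  const expected = createHmac("sha256", secret)
    .update(head + "." + body)
    .digest("base64url");
  return signature === expected;
}

const token = signHS256({ alg: "HS256", typ: "JWT" }, { sub: "user-1" }, "demo-secret");
console.log(decode(token).payload.sub);         // user-1
console.log(verifyHS256(token, "demo-secret")); // true
```

<p>A token whose signature has been stripped or altered fails this comparison, which is exactly the server-side validation that counters the signature-stripping attack.</p>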
<h3 id="heading-jwt-attacks"><strong>JWT Attacks:</strong></h3>
<ul>
<li><p><strong>Signature Stripping:</strong> JWTs are signed and contain a signature as the third part of the token. Attackers attempt to recreate an unsigned token to gain unauthorized access. To counter this, the authorization server must validate the token signature as part of the validation process.</p>
</li>
<li><p><strong>CSRF (Cross-Site Request Forgery):</strong> Attackers obtain a signed-in user token and attempt to submit requests to the server from another site. To mitigate this attack, it is advisable to avoid using persisted tokens like cookies unnecessarily. If session persistence is necessary, using short-lived tokens can help. Additionally, incorporating extra meta information such as a unique header in requests, generated previously by the server, adds an extra layer of security.</p>
</li>
<li><p><strong>XSS (Cross-Site Scripting):</strong> This occurs when injected scripts reside in the browser and attempt to exploit and steal tokens from otherwise legitimate storage, often injected via query strings or text boxes, sending requests using the user's logged-in session cookies or local storage. To mitigate XSS attacks, validating and sanitizing the received data on the server side is essential.</p>
</li>
</ul>
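<p>As a concrete illustration of the signature-stripping defense, a validator can refuse any token whose signature segment is empty or whose header claims the 'none' algorithm, before ever attempting cryptographic verification. A simplified sketch (real validation must also verify the signature itself; the tokens are fabricated):</p>

```typescript
// Reject obviously stripped tokens before cryptographic verification:
// a missing/empty third segment or an "alg: none" header means the
// token carries no signature to verify.
function isStrippedToken(token: string): boolean {
  const parts = token.split('.');
  if (parts.length !== 3 || parts[2].length === 0) return true;
  const header = JSON.parse(Buffer.from(parts[0], 'base64url').toString('utf8'));
  return String(header.alg).toLowerCase() === 'none';
}

// Fabricated tokens for demonstration only.
const b64 = (o: object) => Buffer.from(JSON.stringify(o)).toString('base64url');
const strippedToken = `${b64({ alg: 'none' })}.${b64({ sub: 'user-1' })}.`;
const signedToken = `${b64({ alg: 'RS256' })}.${b64({ sub: 'user-1' })}.fakeSig`;
```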
<p>Learn about JWT and related formats such as JWK and JWE <a target="_blank" href="https://auth0.com/resources/ebooks/jwt-handbook/thankyou">here</a></p>
<h1 id="heading-oauth-and-oidc">OAuth and OIDC</h1>
<p>Historically, OAuth 1.0 was introduced to let third parties access resources on behalf of the resource owner without credential sharing. OAuth 2.0 replaced the first version of the protocol and safeguards server-side resources, either by initiating an access-approval workflow on behalf of the owner or by allowing third-party software to gain access on the owner's behalf. However, a significant limitation of OAuth was that authorization and access control rested solely with the authorization server. This left applications with little fine-grained control, as the access token did not carry enough information for software to validate resource access and ownership.</p>
<p>OIDC, serving as an extended layer, sought to address this deficiency by introducing the ID token. Generated by the server, the ID token provides software with additional user information and metadata, thereby enhancing control and insight into user identities.</p>
<p>This is just a small part of the history; learn more in this ebook by Okta: <a target="_blank" href="https://auth0.com/resources/ebooks/oauth-openid-connect-professional-guide">here</a></p>
<h1 id="heading-amazon-cognito">Amazon Cognito</h1>
<p>Amazon Cognito stands out as an Identity and Access Management service offered by AWS, allowing you to incorporate a managed layer of security into your software. Amazon Cognito User Pools serve as the cornerstone for managing application-level security and adhering to the OAuth standard.</p>
<p>A user pool serves as a repository for application users (e.g., “<a target="_blank" href="mailto:eidivandi@live.com">eidivandi@live.com</a>” as a user), with pricing based on Monthly Active Users (MAU). Within a user pool, one or more App Clients can be established. An app client represents an isolated client integration, not only ensuring application isolation but also managing Authentication (AuthN) and Authorization (AuthZ) flow isolation.</p>
<p>Cognito also introduces the concept of triggers, where specific actions within Cognito can invoke custom Lambda code, enhancing its extensibility. The diagram below outlines the key actions and their corresponding triggers.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714169922390/312212e5-86ef-4ca7-96f0-6c1cb579b256.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-what-we-gonna-build">What we are going to build</h1>
<p>In this article, our focus is on configuring an authorization server and implementing the impersonation feature to enable access to sub-accounts or customer resources for an already logged-in user. We'll walk through the setup process, following the structure outlined in the accompanying diagram.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714469986243/e83c6c80-f689-4660-b1c8-d409ab1eaee5.png" alt class="image--center mx-auto" /></p>
<p>User authentication will be based on email/password credentials to establish a user session, while impersonation will grant access to multiple tenants' resources. This example delves into the password authentication flow and the custom authentication flow, demonstrating how they can work together. It also aims to offer insight into understanding custom authentication flows within Amazon Cognito more effectively.</p>
<h1 id="heading-source-code">Source Code</h1>
<p>The source code for this article can be found on GitHub at the following link.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-cognito-impersonation">https://github.com/XaaXaaX/aws-cognito-impersonation</a></div>
<p> </p>
<h1 id="heading-user-pool-and-app-client">User Pool and App Client</h1>
<p>Establishing a user pool and app client is straightforward, especially with the assistance of AWS CDK. The advantage of leveraging Cognito user pools lies in their simplicity for standard authentication and authorization procedures.</p>
<p>Cognito user pools offer various authentication flows, including Password, SRP, Admin Password, and Custom. In this article, we will use the password flow for the sign-in process and the custom flow for impersonation and accessing resources on behalf of a tenant.</p>
<p>Below is a snippet demonstrating the CDK code to create the User pool, along with both password and custom flow based user pool clients:</p>
<pre><code class="lang-typescript">
    <span class="hljs-built_in">this</span>.userPool = <span class="hljs-keyword">new</span> UserPool(<span class="hljs-built_in">this</span>, <span class="hljs-string">'MultiTenantUserPool'</span>, {
      removalPolicy: RemovalPolicy.DESTROY, 
      accountRecovery: AccountRecovery.NONE,
      email: UserPoolEmail.withCognito(<span class="hljs-string">'eidivandi@live.com'</span>),
      selfSignUpEnabled: <span class="hljs-literal">false</span>,
      signInAliases: { email: <span class="hljs-literal">true</span> },
      autoVerify: { email: <span class="hljs-literal">true</span> },
      lambdaTriggers:{
        defineAuthChallenge: props.triggers.defineAuthChallenge,
        preTokenGeneration: props.triggers.preTokenGeneration,
        verifyAuthChallengeResponse: props.triggers.verifyAuthChallengeResponse,
      },
      customAttributes: {
        tenants: <span class="hljs-keyword">new</span> StringAttribute({ mutable: <span class="hljs-literal">true</span> }),
      }
    });

    <span class="hljs-built_in">this</span>.passwordAuthClient =  <span class="hljs-keyword">new</span> UserPoolClient(<span class="hljs-built_in">this</span>, <span class="hljs-string">'MultiTenantPasswordAuthClient'</span>, {
      userPool: <span class="hljs-built_in">this</span>.userPool,
      generateSecret: <span class="hljs-literal">false</span>,
      idTokenValidity: Duration.minutes(<span class="hljs-number">5</span>),
      accessTokenValidity: Duration.minutes(<span class="hljs-number">5</span>),
      refreshTokenValidity: Duration.days(<span class="hljs-number">1</span>),
      authFlows: { userPassword: <span class="hljs-literal">true</span> }
    });

    <span class="hljs-built_in">this</span>.secureAuthClient = <span class="hljs-keyword">new</span> UserPoolClient(<span class="hljs-built_in">this</span>, <span class="hljs-string">'MultiTenantSecureAuthClient'</span>, {
      userPool: <span class="hljs-built_in">this</span>.userPool,
      generateSecret: <span class="hljs-literal">false</span>,
      accessTokenValidity: Duration.minutes(<span class="hljs-number">5</span>),
      refreshTokenValidity: Duration.hours(<span class="hljs-number">1</span>),
      idTokenValidity: Duration.minutes(<span class="hljs-number">5</span>),
      authFlows: { custom: <span class="hljs-literal">true</span> }
    });
</code></pre>
<p>The UserPool drives the authentication process and is the source of truth for users. The two UserPoolClients act as isolated boundaries for the different required auth flows. Having two separate UserPoolClients is just a preference; a single app client can manage multiple auth flows:</p>
<pre><code class="lang-typescript">authFlows: { 
    userPassword: <span class="hljs-literal">true</span>,
    custom: <span class="hljs-literal">true</span>
}
</code></pre>
<h1 id="heading-sign-up-sign-in">Sign Up / Sign In</h1>
<p>Sign-up is managed by a Lambda function using the AdminCreateUser command. The function creates a user, generating a temporary password and a set of user attributes: some default and reserved ones, plus a custom attribute holding the list of tenants this user is allowed to interact with (a custom attribute is used here for article simplicity; any other type of storage would work). The function also forces email verification; this is done purely for simplicity, to avoid changing the password behind the first sign-in while testing.</p>
<pre><code class="lang-typescript"> <span class="hljs-keyword">const</span> UserAttributes = [
      { Name: <span class="hljs-string">'family_name'</span>, Value: lastName },
      { Name: <span class="hljs-string">'given_name'</span>, Value: firstName },
      { Name: <span class="hljs-string">'email'</span>, Value: email },
      { Name: <span class="hljs-string">'email_verified'</span>, Value: <span class="hljs-string">'true'</span> },
      { Name: <span class="hljs-string">'name'</span>, Value: <span class="hljs-string">`<span class="hljs-subst">${lastName}</span> <span class="hljs-subst">${firstName}</span>`</span> },
      { Name: <span class="hljs-string">'custom:tenants'</span>, Value: tenants?.join(<span class="hljs-string">','</span>) }
    ]

    <span class="hljs-keyword">const</span> adminCreateUserParams = {
      UserPoolId: process.env.COGNITO_USER_POOL_ID,
      Username: email,
      TemporaryPassword: generator.generate({
        length: <span class="hljs-number">10</span>,
        numbers: <span class="hljs-literal">true</span>,
        symbols: <span class="hljs-literal">true</span>,
        strict: <span class="hljs-literal">true</span>,
        exclude: <span class="hljs-string">'&amp;%#?+:/;'</span>,
      }),
      DesiredDeliveryMediums: [<span class="hljs-string">'EMAIL'</span>],
      UserAttributes,
      ClientMetadata: {
        step: <span class="hljs-string">'SignUp_CreateUser'</span>,
      }
    } satisfies AdminCreateUserCommandInput;
</code></pre>
<p>After signing up, an email is sent with the temporary password, which we use to sign in. The sign-in function uses the InitiateAuth command to authenticate and triggers the change-password challenge while keeping the temporary password. (This shortcut is for the demo only; in production the user would change the password, with the NEW_PASSWORD_REQUIRED challenge forcing the change.)</p>
<pre><code class="lang-typescript"> <span class="hljs-keyword">const</span> signInParams: InitiateAuthCommandInput = {
      AuthFlow: AuthFlowType.USER_PASSWORD_AUTH,
      ClientId: process.env.COGNITO_USER_POOL_CLIENT_ID,
      AuthParameters: {
        USERNAME: username,
        PASSWORD: password,
      },
      ClientMetadata: {
        step: <span class="hljs-string">'Signin_InitAuth'</span>
      }
    };

<span class="hljs-keyword">const</span> signinResponse = <span class="hljs-keyword">await</span> client.send(<span class="hljs-keyword">new</span> InitiateAuthCommand(signInParams));
</code></pre>
<p>The InitiateAuth response contains a session that must be used when responding to the challenge, as the following snippet demonstrates.</p>
<pre><code class="lang-typescript"> <span class="hljs-keyword">if</span>( signinResponse.ChallengeName === <span class="hljs-string">'NEW_PASSWORD_REQUIRED'</span> ) {
      <span class="hljs-keyword">const</span> challengeParams: RespondToAuthChallengeCommandInput = {
        ChallengeName: <span class="hljs-string">'NEW_PASSWORD_REQUIRED'</span>,
        ClientId: process.env.COGNITO_USER_POOL_CLIENT_ID,
        ChallengeResponses: {
          USERNAME: username,
          NEW_PASSWORD: password,
        },
        Session: signinResponse.Session,
        ClientMetadata: {
          step: <span class="hljs-string">'Signin_Respond_Challenge'</span>
        }
      };

      <span class="hljs-keyword">const</span> challengeResponse = <span class="hljs-keyword">await</span> client.send(<span class="hljs-keyword">new</span> RespondToAuthChallengeCommand(challengeParams));
      authenticationResult = challengeResponse.AuthenticationResult;
    }
</code></pre>
<p>The challenge result includes the IdToken, AccessToken, and RefreshToken, which are returned as the response.</p>
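<p>As an illustration, the handler can map the SDK's <strong>AuthenticationResult</strong> into the body returned to the caller; the wrapper shape below is our own, not part of the SDK:</p>

```typescript
// Illustrative mapping from Cognito's AuthenticationResult (SDK field
// names) to the response body a sign-in endpoint could return.
interface SignInTokens {
  idToken?: string;
  accessToken?: string;
  refreshToken?: string;
}

function toSignInTokens(
  auth?: { IdToken?: string; AccessToken?: string; RefreshToken?: string },
): SignInTokens {
  return {
    idToken: auth?.IdToken,
    accessToken: auth?.AccessToken,
    refreshToken: auth?.RefreshToken,
  };
}

// Example with placeholder token values.
const tokens = toSignInTokens({ IdToken: 'id', AccessToken: 'at', RefreshToken: 'rt' });
```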
<h1 id="heading-impersonation">Impersonation</h1>
<p>The impersonation step is a custom auth process consisting of two steps: the <strong>InitiateAuth</strong> and <strong>RespondToAuthChallenge</strong> commands. Before looking at how it behaves, it is important to understand how the custom flow works.</p>
<p>The user pool has three Lambda triggers:</p>
<ul>
<li><p>DefineAuthChallenge</p>
</li>
<li><p>VerifyAuthChallenge</p>
</li>
<li><p>PreTokenGeneration</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714250910141/766e0ef1-6f2b-4822-8da0-bf0755ea421a.png" alt class="image--center mx-auto" /></p>
<p>Triggering the challenge can be done as per the AWS documentation <a target="_blank" href="https://docs.aws.amazon.com/cognito/latest/developerguide/user-pool-lambda-verify-auth-challenge-response.html">here</a>. When using a custom auth flow, both <strong>InitiateAuth</strong> and <strong>RespondToAuthChallenge</strong> invoke <strong>DefineAuthChallenge</strong>; <strong>VerifyAuthChallengeResponse</strong> runs between the two invocations, after the challenge verification passes. If the second <strong>DefineAuthChallenge</strong> invocation's response indicates token generation, the <strong>PreTokenGeneration</strong> trigger is invoked.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> initiateAuthCommandInput = <span class="hljs-keyword">new</span> InitiateAuthCommand({
      AuthFlow: AuthFlowType.CUSTOM_AUTH,
      ClientId: clientId,
      AuthParameters: {
        USERNAME: email,
      },
      ClientMetadata: {
        step: <span class="hljs-string">'Impersonation_InitAuth'</span>,
        tenant,
      }
    });
    <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> client.send(initiateAuthCommandInput);

    <span class="hljs-keyword">const</span> challengeCommandInput = <span class="hljs-keyword">new</span> RespondToAuthChallengeCommand({
      ChallengeName: <span class="hljs-string">'CUSTOM_CHALLENGE'</span>,
      ClientId: clientId,
      ChallengeResponses: {
        USERNAME: response.ChallengeParameters?.USERNAME || <span class="hljs-string">''</span>,
        ANSWER: <span class="hljs-string">'impersonation'</span>,
      },
      ClientMetadata: {
        authFlow: <span class="hljs-string">'impersonation'</span>,
        step: <span class="hljs-string">'Impersonation_RespondToChallenge'</span>,
        tenant
      },
      Session: response.Session,
    });
    <span class="hljs-keyword">const</span> challengeResponses = <span class="hljs-keyword">await</span> client.send(challengeCommandInput);
</code></pre>
<h3 id="heading-initiate-auth-defineauthchallenge">Initiate Auth : DefineAuthChallenge</h3>
<p>In the first step, the DefineAuthChallenge trigger receives an event payload containing the user attributes and an empty session array. The session array represents the previous steps of the auth challenge; for this first invocation, triggered by the InitiateAuth command, the array is empty.</p>
<pre><code class="lang-json">{
    ...
    <span class="hljs-attr">"triggerSource"</span>: <span class="hljs-string">"DefineAuthChallenge_Authentication"</span>,
    <span class="hljs-attr">"request"</span>: {
        <span class="hljs-attr">"userAttributes"</span>: {
            <span class="hljs-attr">"sub"</span>: <span class="hljs-string">"e29524e4-f0a1-7018-7650-4bee0ca85f4b"</span>,
            <span class="hljs-attr">"email_verified"</span>: <span class="hljs-string">"true"</span>,
            <span class="hljs-attr">"cognito:user_status"</span>: <span class="hljs-string">"CONFIRMED"</span>,
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Hills Samir"</span>,
            <span class="hljs-attr">"given_name"</span>: <span class="hljs-string">"Samir"</span>,
            <span class="hljs-attr">"custom:tenants"</span>: <span class="hljs-string">"CUS-01,CUS-02"</span>,
            <span class="hljs-attr">"family_name"</span>: <span class="hljs-string">"Hills"</span>,
            <span class="hljs-attr">"email"</span>: <span class="hljs-string">"eidivandi@live.com"</span>
        },
        <span class="hljs-attr">"session"</span>: []
   },
   ...
}
</code></pre>
<p>The Lambda function orchestrates these steps as shown below, changing the response elements.</p>
<pre><code class="lang-typescript">  <span class="hljs-keyword">if</span> ( event.request.session.length === <span class="hljs-number">0</span> ) {
         event.response.issueTokens = <span class="hljs-literal">false</span>;
         event.response.failAuthentication = <span class="hljs-literal">false</span>;
         event.response.challengeName = <span class="hljs-string">'IMPERSONATE_CHALLENGE'</span>;
  } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (...) {
      ....
  } <span class="hljs-keyword">else</span> {
    event.response.issueTokens = <span class="hljs-literal">false</span>;
    event.response.failAuthentication = <span class="hljs-literal">true</span>;
  }
  <span class="hljs-keyword">return</span> event;
</code></pre>
<h3 id="heading-respond-auth-challenge-verifyauthchallenge">Respond Auth Challenge: VerifyAuthChallenge</h3>
<p>The RespondToAuthChallenge command initiates the rest of the flow. As a first step, it verifies the previously initiated challenge by invoking the VerifyAuthChallenge trigger; this is where validation checks whether the <strong>challengeAnswer</strong> corresponds to the expected response. As part of the validation, comparing the tenant parameter against the user's custom tenants attribute determines whether the user is authorized to interact with that tenant's resources.</p>
<pre><code class="lang-typescript"> <span class="hljs-keyword">if</span> (event.request.clientMetadata?.tenant &amp;&amp;
      event.request.userAttributes?.[<span class="hljs-string">'custom:tenants'</span>]?.split(<span class="hljs-string">','</span>)?.includes(event.request.clientMetadata?.tenant)
  ) {
      event.response.answerCorrect = event.request.challengeAnswer === <span class="hljs-string">'impersonation'</span>;
  }
  <span class="hljs-keyword">return</span> event;
</code></pre>
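<p>The decision above boils down to a pure check that can be tested in isolation. The parameter names below mirror the trigger's event fields, but the function itself is illustrative:</p>

```typescript
// Illustrative, testable form of the VerifyAuthChallenge decision:
// accept only when the requested tenant appears in the user's
// custom:tenants attribute AND the answer matches the expected value.
function isAnswerCorrect(
  tenantsAttribute: string | undefined,
  requestedTenant: string | undefined,
  challengeAnswer: string,
): boolean {
  if (!requestedTenant) return false;
  if (!tenantsAttribute?.split(',').includes(requestedTenant)) return false;
  return challengeAnswer === 'impersonation';
}
```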
<h3 id="heading-respond-auth-challenge-defineauthchallenge">Respond Auth Challenge: DefineAuthChallenge</h3>
<p>In the next step, DefineAuthChallenge is invoked a second time, but the event is slightly different, including information about the previously initiated session and some metadata.</p>
<pre><code class="lang-json">{
   ...
   <span class="hljs-attr">"triggerSource"</span>: <span class="hljs-string">"DefineAuthChallenge_Authentication"</span>,
   <span class="hljs-attr">"request"</span>: {
        <span class="hljs-attr">"userAttributes"</span>: {
            <span class="hljs-attr">"sub"</span>: <span class="hljs-string">"e29524e4-f0a1-7018-7650-4bee0ca85f4b"</span>,
            <span class="hljs-attr">"email_verified"</span>: <span class="hljs-string">"true"</span>,
            <span class="hljs-attr">"cognito:user_status"</span>: <span class="hljs-string">"CONFIRMED"</span>,
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Hills Samir"</span>,
            <span class="hljs-attr">"given_name"</span>: <span class="hljs-string">"Samir"</span>,
            <span class="hljs-attr">"custom:tenants"</span>: <span class="hljs-string">"CUS-01,CUS-02"</span>,
            <span class="hljs-attr">"family_name"</span>: <span class="hljs-string">"Hills"</span>,
            <span class="hljs-attr">"email"</span>: <span class="hljs-string">"eidivandi@live.com"</span>
        },
        <span class="hljs-attr">"session"</span>: [
            {
               <span class="hljs-attr">"challengeName"</span>: <span class="hljs-string">"CUSTOM_CHALLENGE"</span>,
               <span class="hljs-attr">"challengeResult"</span>: <span class="hljs-literal">true</span>,
               <span class="hljs-attr">"challengeMetadata"</span>: <span class="hljs-literal">null</span>
            }
        ],
        <span class="hljs-attr">"clientMetadata"</span>: {
            <span class="hljs-attr">"step"</span>: <span class="hljs-string">"Impersonation_RespondToChallenge"</span>,
            <span class="hljs-attr">"authFlow"</span>: <span class="hljs-string">"impersonation"</span>,
            <span class="hljs-attr">"tenant"</span>: <span class="hljs-string">"CUS-01"</span>
        }
   },
   ...
}
</code></pre>
<p>Here, ClientMetadata is the only available option for passing information between triggers within the same challenge; it provides an ephemeral state shared between invocations. The handler code verifies the session length and that the challenge result is truthy.</p>
<pre><code class="lang-typescript">
  <span class="hljs-keyword">if</span> ( event.request.session.length === <span class="hljs-number">0</span> ) {
    ...
  } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (
    event.request.session.length === <span class="hljs-number">1</span> &amp;&amp;
    event.request.session[<span class="hljs-number">0</span>].challengeResult === <span class="hljs-literal">true</span>
  ) {
       event.response.issueTokens = <span class="hljs-literal">true</span>;
       event.response.failAuthentication = <span class="hljs-literal">false</span>;
  } <span class="hljs-keyword">else</span> {
    ...
  }
  <span class="hljs-keyword">return</span> event;
</code></pre>
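<p>Putting the two DefineAuthChallenge snippets together, the whole decision reduces to a function of the session array. A simplified sketch follows (event shape reduced for illustration; the built-in custom flow uses CUSTOM_CHALLENGE as the challenge name, as the session payload above shows):</p>

```typescript
// Simplified DefineAuthChallenge decision over the session array:
// empty session -> present the custom challenge; one verified entry
// -> issue tokens; anything else -> fail authentication.
type SessionEntry = { challengeName: string; challengeResult: boolean };
type Decision = {
  issueTokens: boolean;
  failAuthentication: boolean;
  challengeName?: string;
};

function defineAuthChallenge(session: SessionEntry[]): Decision {
  if (session.length === 0) {
    return { issueTokens: false, failAuthentication: false, challengeName: 'CUSTOM_CHALLENGE' };
  }
  if (session.length === 1 && session[0].challengeResult === true) {
    return { issueTokens: true, failAuthentication: false };
  }
  return { issueTokens: false, failAuthentication: true };
}
```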
<p>In the event response, <strong>issueTokens</strong> and <strong>failAuthentication</strong> indicate whether a token can be generated or whether the authentication should fail.</p>
<h3 id="heading-respond-auth-challenge-pretokengeneration">Respond Auth Challenge: PreTokenGeneration</h3>
<p>The next step is PreTokenGeneration, which customizes the token claims. By default, customization applies only to the ID token; activating advanced security features makes access token customization possible as well (advanced security features incur extra cost).</p>
<pre><code class="lang-typescript">  <span class="hljs-keyword">let</span> tenant;
  <span class="hljs-keyword">if</span>( 
    event.triggerSource == <span class="hljs-string">'TokenGeneration_Authentication'</span> &amp;&amp;
    event.request.clientMetadata?.step == <span class="hljs-string">'Impersonation_RespondToChallenge'</span>
  ) {
    tenant = event.request.clientMetadata?.tenant;
  };

  event.response = {
    claimsOverrideDetails: {
      claimsToAddOrOverride: {
        tenant
      },
      claimsToSuppress: [],
      groupOverrideDetails: {
        groupsToOverride: [],
        iamRolesToOverride: [],
        preferredRole: <span class="hljs-string">""</span>,
      },
    }
  };
  <span class="hljs-keyword">return</span> event;
</code></pre>
<p>The function verifies the trigger source and the client metadata provided by the impersonation handler, then injects the tenant claim into the token.</p>
<h1 id="heading-authorizer">Authorizer</h1>
<p>As the diagram in the '<strong>What we are going to build</strong>' section shows, the solution needs two layers of authorization: the first validates user sessions and authorizes access to protected endpoints, and the second validates and authorizes the impersonation token when accessing downstream services.</p>
<h3 id="heading-user-session">User Session</h3>
<p>The first authorizer is a Cognito authorizer attached to protected endpoints such as the <strong>impersonate</strong> endpoint. This can be achieved with AWS CDK as shown below:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> cognitoPasswordAuthorizer = <span class="hljs-keyword">new</span> HttpUserPoolAuthorizer(<span class="hljs-string">'CognitoPasswordUserPoolAuthorizer'</span>, props.cognito.userPool, {
   userPoolClients: [ props.cognito.passwordAuthClient ]
});

...

api.addRoutes({
    path: <span class="hljs-string">'/user/impersonate'</span>,
    methods: [ HttpMethod.POST ],
    integration: <span class="hljs-keyword">new</span> HttpLambdaIntegration(<span class="hljs-string">'ImpersonassionAuthFunctionIntegration'</span>, impersonassionAuthFunction),
    authorizer: cognitoPasswordAuthorizer
});
</code></pre>
<p>This authorizer validates the sign-in token generated by email/password authentication, preventing unauthorized access to the impersonation endpoint.</p>
<h3 id="heading-impersonation-1">Impersonation</h3>
<p>The impersonation authorizer is a Lambda custom authorizer integrated with API Gateway; it validates that the received authorization token is allowed to access the tenant's resources. For this article's simplicity, we use a single API Gateway shared between the downstream and authorization services, with a Lambda authorizer attached at the route level for the downstream endpoint.</p>
<pre><code class="lang-typescript">
    <span class="hljs-keyword">const</span> customAuthorizer =  <span class="hljs-keyword">new</span> LambdaFunction(<span class="hljs-built_in">this</span>, <span class="hljs-string">'CustomAuthorizer'</span>, {
      entry: resolve(join(__dirname, <span class="hljs-string">'../../src/authorizer/handler.ts'</span>)),
      bundling: {
        banner: <span class="hljs-string">`import { createRequire } from 'module';const require = createRequire(import.meta.url);`</span>,
      },
      environment: {
        COGNITO_USER_POOL_CLIENT_ID: props.cognito.secureAuthClient.userPoolClientId,
        COGNITO_USER_POOL_ID: props.cognito.userPool.userPoolId,
        TABLE_NAME: props.table.tableName
      }
    });

    <span class="hljs-keyword">const</span> cognitoImpersonationAuthorizer = <span class="hljs-keyword">new</span> HttpLambdaAuthorizer(<span class="hljs-string">'CognitoImpersonationAuthorizer'</span>, customAuthorizer , {});
</code></pre>
<p>The authorizer validates the token against the Cognito user pool and app client.</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">const</span> token = event.authorizationToken!.replace(<span class="hljs-string">'Bearer '</span>, <span class="hljs-string">''</span>);
    <span class="hljs-keyword">const</span> decodedToken = decodeAndGetToken(token);
    <span class="hljs-keyword">const</span> user = <span class="hljs-keyword">await</span> verifyToken(token, decodedToken.token_use, process.env.COGNITO_USER_POOL_CLIENT_ID!);
</code></pre>
<p>The <strong>verifyToken</strong> method uses the '<strong>aws-jwt-verify</strong>' library to validate the token against the Cognito user pool client.</p>
<pre><code class="lang-typescript">  <span class="hljs-keyword">const</span> verifierParams = {
    userPoolId: process.env.COGNITO_USER_POOL_ID!,
    tokenUse: use <span class="hljs-keyword">as</span> CognitoVerifyProperties[<span class="hljs-string">'tokenUse'</span>],
    clientId: process.env.COGNITO_USER_POOL_CLIENT_ID!,
  };
  <span class="hljs-keyword">const</span> verifier = CognitoJwtVerifier.create(verifierParams);

  <span class="hljs-keyword">return</span> verifier.verify(token, verifierParams);
</code></pre>
<p>The authorizer returns an IAM policy allowing or denying permission to invoke the API Gateway route:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">return</span> {
      principalId: <span class="hljs-string">'user'</span>,
      policyDocument: {
          Version: <span class="hljs-string">'2012-10-17'</span>,
          Statement: [
              {
                  Action: <span class="hljs-string">'execute-api:Invoke'</span>,
                  Effect: <span class="hljs-string">'Allow'</span>,
                  Resource: event.methodArn,
              },
          ],
      },
  };
</code></pre>
<p>The authorizer simply validates the integrity of the token and that it was issued by the authorization server; deciding whether that token may access a given tenant's resources remains the responsibility of the downstream service.</p>
<p>Let's say <a target="_blank" href="mailto:omid@gmail.com">omid@gmail.com</a> signed in and asked to impersonate tenant CUS-01. The generated token will pass the authorizer even if the resource requested downstream is owned by CUS-02; it is the downstream service's responsibility to validate whether the authorized token actually corresponds to the CUS-02 tenant.</p>
<p>In the following section, we will see this in practice and deploy the sample solution.</p>
<h1 id="heading-running-the-authorization-server">Running the authorization server</h1>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-cognito-impersonation">https://github.com/XaaXaaX/aws-cognito-impersonation</a></div>
<p> </p>
<p>The source code includes a brief README to follow for deploying the solution; a Postman collection and environment are also provided to simplify testing.</p>
<p>To test the solution, start by signing up; Amazon Cognito will send an email containing a temporary password.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/uMwp44P1H_o">https://youtu.be/uMwp44P1H_o</a></div>
<p> </p>
<p>After account creation, an email containing the temporary password will be received.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714344853050/7b48b951-e60c-468a-9676-59b551f19f99.png" alt class="image--center mx-auto" /></p>
<p>Using this password, the sign-in endpoint establishes a session via the password auth flow. Passing the <strong>IdToken</strong> received from the <strong>sign-in</strong> step as an authorization header lets the Cognito authorizer identify the session and validate access to the <strong>impersonate</strong> API route.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/RCkxXoix3fg">https://youtu.be/RCkxXoix3fg</a></div>
<p> </p>
<p>The impersonation token is generated by the custom flow and must be passed along with the call to the downstream service. The custom Lambda authorizer validates the generated token, while the downstream service remains responsible for checking the requested resource against the authorizer context provided.</p>
<p>In this example, the downstream service verifies the requested resource against the authorization token context.</p>
<pre><code class="lang-typescript">  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> eventBody = <span class="hljs-built_in">JSON</span>.parse(event.body || <span class="hljs-string">'{}'</span>);
    <span class="hljs-keyword">if</span>( eventBody.tenant !== event.requestContext.authorizer?.lambda.tenant ) {
      <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Tenant does not match'</span>);
    }
    <span class="hljs-keyword">return</span> ActionResults.Success(eventBody);
  } <span class="hljs-keyword">catch</span> (e) {
    <span class="hljs-built_in">console</span>.error(e);
    <span class="hljs-keyword">return</span> ActionResults.InternalServerError({ message: e.message || e.name });
  }
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Security is part of our daily software development, and it is not always as simple as providing an API key or a JWT token. When it gets complicated, it is important to think about shared responsibility. Understanding security foundations such as AuthN and AuthZ, and the different components involved (client, authorization server, and resource server), helps assign the right responsibilities and scopes to each part of the communication flow.</p>
<p>Applying standards is not always free of effort and needs a higher level of deep understanding so looking at some resources like IETF helps to achieve a deeper vision and perspective.</p>
<p>Here some resources:</p>
<ul>
<li><p>JWT (<strong>rfc7519</strong>) : <a target="_blank" href="https://datatracker.ietf.org/doc/html/rfc7519#page-9">https://datatracker.ietf.org/doc/html/rfc7519#page-9</a></p>
</li>
<li><p>OAuth2 (<strong>rfc6749</strong>) : <a target="_blank" href="https://datatracker.ietf.org/doc/html/rfc6749#page-7">https://datatracker.ietf.org/doc/html/rfc6749#page-7</a></p>
</li>
</ul>
<p>I hope this article is useful.</p>
]]></content:encoded></item><item><title><![CDATA[Automating EventCatalog at Scale]]></title><description><![CDATA[EventCatalog became a preferred cataloging tool for event-driven architecture because of its simplicity and structured way of organizing the specifications at enterprise, but also by the extensibility provided using plugins. It works well in many ent...]]></description><link>https://blogs.serverlessfolks.com/automating-eventcatalog-at-scale</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/automating-eventcatalog-at-scale</guid><category><![CDATA[event-driven-architecture]]></category><category><![CDATA[event-catalog]]></category><category><![CDATA[AsyncApi]]></category><category><![CDATA[Governance]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Mon, 15 Apr 2024 11:31:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/sCKtNbIKOuQ/upload/425b2e4d69c3238d158c9846a4e2107d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>EventCatalog became a preferred cataloging tool for event-driven architecture because of its simplicity, its structured way of organizing specifications at the enterprise level, and the extensibility it provides through plugins. It works well in many enterprises but still has some gaps to close, and the community is actively working on new ideas and enhancements to bring more simplicity at any scale.</p>
<p>I have been using EventCatalog for 3 years now, but this time I tried to use it to make service documentation transparent and automatic, and I spent some time thinking deeply about how to make it as suitable as possible for my own needs.</p>
<p>The way I would like to manage cataloging is as a central engine, as illustrated below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713119599312/a9806236-265e-4e0c-969e-64510a2213c3.png" alt class="image--center mx-auto" /></p>
<p>However, EventCatalog was designed around a central source of truth, and this was not what I was looking for in a cataloging solution: I have always preferred to keep the documentation under the service's source control, in the same repository where the service resides. This is EventCatalog's operational model, and it seems fine, but enterprise ecosystems often face other constraints during adoption.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713119961448/1c8b0e63-7b2d-4926-9393-9bd1dd647e4d.png" alt class="image--center mx-auto" /></p>
<p>The current operational model of EventCatalog is standard enough, but I also needed to take our software development practices into account, so I decided to stop waiting and build something operational on top of EventCatalog, based on our practices.</p>
<h1 id="heading-asyncapi-extension">AsyncApi Extension</h1>
<p>First, I tried to jump in with a PR on the <a target="_blank" href="https://github.com/boyney123/eventcatalog/tree/master/packages/eventcatalog-plugin-generator-asyncapi">EventCatalog AsyncApi Extension</a> GitHub repository, but I had the issue open for a week without finding time to start (as I am often in meetings). The EventCatalog AsyncApi plugin builds on EventCatalog's generator concept: you provide some simple configuration in the default <code>js</code> config file, named <code>eventcatalog.config.js</code>.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> path = <span class="hljs-built_in">require</span>(<span class="hljs-string">'path'</span>);

<span class="hljs-built_in">module</span>.<span class="hljs-built_in">exports</span> = {
  title: <span class="hljs-string">'EventCatalog'</span>,
  tagline: <span class="hljs-string">'Discover, Explore and Document your Event Driven Architectures'</span>,
  organizationName: <span class="hljs-string">'Your Company'</span>,
  projectName: <span class="hljs-string">'Event Catalog'</span>,
  ...
  generators: [
    [
      <span class="hljs-string">'@eventcatalog/plugin-doc-generator-asyncapi'</span>,
      {
        pathToSpec: [
          path.join(__dirname, <span class="hljs-string">'../specs/Order/placement/1.0.0'</span>, <span class="hljs-string">'asyncapi.yaml'</span>),
          path.join(__dirname, <span class="hljs-string">'../specs/Order/Shipment/1.0.0'</span>, <span class="hljs-string">'asyncapi.yaml'</span>)
        ],
        versionEvents: <span class="hljs-literal">false</span>,
        renderNodeGraph: <span class="hljs-literal">true</span>,
        renderMermaidDiagram: <span class="hljs-literal">true</span>,
        domainName: <span class="hljs-string">'Order'</span>
      },
    ],
    [
      <span class="hljs-string">'@eventcatalog/plugin-doc-generator-asyncapi'</span>,
      {
        pathToSpec: [
          path.join(__dirname, <span class="hljs-string">'../specs/Product/stock/1.0.0'</span>, <span class="hljs-string">'asyncapi.yaml'</span>)
        ],
        versionEvents: <span class="hljs-literal">false</span>,
        renderNodeGraph: <span class="hljs-literal">true</span>,
        renderMermaidDiagram: <span class="hljs-literal">true</span>,
        domainName: <span class="hljs-string">'Product'</span>
      },
    ],
  ]
}
</code></pre>
<p>With the generators in place, the <code>npm run generate</code> command reads all specifications and generates the respective Domains, Events, and Services under the <code>domains</code> folder.</p>
<h1 id="heading-complexity">Complexity</h1>
<p>We could ask every service owner team to open a Pull Request whenever a new service is created or a change is needed, but this approach goes against what we learned in the past about software managed by multiple teams: in the best case it works at some scale, in the worst case it becomes frustrating.</p>
<p>Another challenge was how big the EventCatalog config file can become when there are already hundreds of services across around 10 domains, without even counting internal modules communicating in an event-driven way; that would simply be a nightmare to live with.</p>
<h1 id="heading-listing-desired-state">Listing Desired State</h1>
<p>Finally, with all these blocking points, and a list of possibilities that the current state of EventCatalog does not cover, there was no starting point without listing what I needed.</p>
<ul>
<li><p>Specification under the service ownership.</p>
</li>
<li><p>No effort more than having updated specifications in the service repository.</p>
</li>
<li><p>Each service is responsible for formatting and validating specifications.</p>
</li>
<li><p>Specifications must follow the governance conventions.</p>
</li>
<li><p>A catalog to represent all events and service specifications.</p>
</li>
<li><p>The catalog must be in sync with the real service specification.</p>
</li>
<li><p>The catalog integration must be automated and autonomous.</p>
</li>
<li><p>The catalog must be rapidly reproducible.</p>
</li>
</ul>
<h1 id="heading-catalog-design">Catalog Design</h1>
<p>To define the catalog design, the considerations were as follows:</p>
<ul>
<li><p>Static and bundled, for simplicity</p>
</li>
<li><p>Respond to changes based on events</p>
</li>
<li><p>Decoupled and without adding hard dependencies</p>
</li>
<li><p>Autonomous</p>
</li>
</ul>
<p>The final design is represented in the following diagram</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713131664386/eaac09ae-91f5-4fa5-809b-ac4196024926.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-source-code">Source Code</h1>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/eventcatalog-automation">https://github.com/XaaXaaX/eventcatalog-automation</a></div>
<p> </p>
<h1 id="heading-build-configuration">Build Configuration</h1>
<p>It was simple to build the configuration dynamically by just adding a <code>nodejs</code> script as explained below.</p>
<p>The source code is available <a target="_blank" href="https://github.com/XaaXaaX/eventcatalog-automation">here</a> on GitHub</p>
<p>The config generator is a simple script that performs the following steps:</p>
<ul>
<li><p>Get all yaml files in specs folder</p>
</li>
<li><p>Feed the generators array by creating a generator element per specification</p>
</li>
<li><p>Merge the default config and generators</p>
</li>
<li><p>Write the result, rewriting <code>eventcatalog.config.js</code> if it is present</p>
</li>
</ul>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> baseConfig = { ... }
<span class="hljs-keyword">const</span> generatorDefaultConfig = { ... }
<span class="hljs-keyword">const</span> createGenerators = <span class="hljs-function">(<span class="hljs-params">specsFolder = <span class="hljs-string">'specs'</span></span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> schemas = getDirectories(specsFolder)?.
    sort(<span class="hljs-function">(<span class="hljs-params">a, b</span>) =&gt;</span> b.localeCompare(a) ).
    reverse().
    filter(<span class="hljs-function">(<span class="hljs-params">fileName</span>) =&gt;</span> fileName.includes(<span class="hljs-string">'.yaml'</span>));

  <span class="hljs-keyword">if</span> (!schemas) <span class="hljs-keyword">return</span> [];

  <span class="hljs-keyword">let</span> asyncApiGenerators = [];
  schemas.map(<span class="hljs-function">(<span class="hljs-params">schemaName</span>) =&gt;</span> {
    asyncApiGenerators.push([
      <span class="hljs-string">'@eventcatalog/plugin-doc-generator-asyncapi'</span>,
      {
        ...generatorDefaultConfig,
        domainName: schemaName.split(<span class="hljs-string">'/'</span>)[<span class="hljs-number">2</span>],
        pathToSpec: [ path.join(__dirname, <span class="hljs-string">`<span class="hljs-subst">${schemaName}</span>`</span>) ]
      },
    ]);
  });

  <span class="hljs-keyword">return</span> asyncApiGenerators
}

<span class="hljs-keyword">const</span> generators = createGenerators(<span class="hljs-string">'../specs'</span>);

fs.writeFileSync(<span class="hljs-string">'./eventcatalog.config.js'</span>, 
  <span class="hljs-string">`module.exports = <span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify({
    ...baseConfig,
    generators
  }</span>, null, 2)}`</span>, <span class="hljs-string">'utf8'</span>);
</code></pre>
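<p>The script relies on a <code>getDirectories</code> helper, whose implementation is not shown above. A minimal sketch, assuming it should return every file path found under the specs folder recursively so the caller can filter for <code>.yaml</code> specifications, could look like this.</p>

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Minimal sketch of the getDirectories helper the generator script calls:
// it walks a folder recursively and returns every file path found, so the
// caller can keep only the `.yaml` specification files.
export const getDirectories = (dir: string): string[] => {
  if (!fs.existsSync(dir)) return [];
  const results: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    // Recurse into domain/version sub-folders, collect plain files otherwise.
    if (entry.isDirectory()) results.push(...getDirectories(fullPath));
    else results.push(fullPath);
  }
  return results;
};
```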
<p>The script can be run by adding a new entry in <code>package.json</code> that runs <code>node ./config.generator.js</code>. The scripts section of the package.json then looks as below.</p>
<pre><code class="lang-json"> {
  ...
  <span class="hljs-attr">"scripts"</span>: {
    <span class="hljs-attr">"start"</span>: <span class="hljs-string">"eventcatalog start"</span>,
    <span class="hljs-attr">"dev"</span>: <span class="hljs-string">"eventcatalog dev"</span>,
    <span class="hljs-attr">"build"</span>: <span class="hljs-string">"eventcatalog build"</span>,
    <span class="hljs-attr">"pregenerate"</span>: <span class="hljs-string">"node ./config.generator.js"</span>,
    <span class="hljs-attr">"generate"</span>: <span class="hljs-string">"eventcatalog generate"</span>
  },
</code></pre>
<p>By prefixing the script name with <code>pre</code>, npm automatically runs the <code>pregenerate</code> script whenever we run <code>npm run generate</code>; this is handy for extending capabilities while keeping the CI/CD scripts untouched.</p>
<h1 id="heading-specifications">Specifications</h1>
<p>Regarding specifications there were two considerations: first, each service takes ownership of its documentation and specification; second, the catalog must be autonomous. From these facts I ended up with three correlated notes.</p>
<ul>
<li><p>EventCatalog's available options should be used, so the project keeps using a local spec source for generation, avoiding any complicated personalisation in the catalog project repository or in the CI/CD pipeline.</p>
<pre><code class="lang-yaml">  <span class="hljs-attr">version:</span> <span class="hljs-number">0.2</span>
  <span class="hljs-attr">env:</span>
    <span class="hljs-attr">parameter-store:</span>
      <span class="hljs-attr">SPEC_BUCKET_NAME:</span> <span class="hljs-string">/eventcatalog/bucket/specs/name</span>
  <span class="hljs-attr">phases:</span>
    <span class="hljs-attr">install:</span>
      <span class="hljs-attr">commands:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">echo</span> <span class="hljs-string">Installing</span> <span class="hljs-string">dependencies...</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">npm</span> <span class="hljs-string">cache</span> <span class="hljs-string">clean</span> <span class="hljs-string">--force</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">cd</span> <span class="hljs-string">catalog</span> <span class="hljs-string">&amp;&amp;</span> <span class="hljs-string">npm</span> <span class="hljs-string">install</span> <span class="hljs-string">--force</span> <span class="hljs-string">&amp;&amp;</span> <span class="hljs-string">cd</span> <span class="hljs-string">..</span>
    <span class="hljs-attr">pre_build:</span>
      <span class="hljs-attr">commands:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">echo</span> <span class="hljs-string">"Pre build command"</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">rm</span> <span class="hljs-string">-rf</span> <span class="hljs-string">specs</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">mkdir</span> <span class="hljs-string">specs</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">aws</span> <span class="hljs-string">s3</span> <span class="hljs-string">sync</span> <span class="hljs-string">s3://$SPEC_BUCKET_NAME/</span> <span class="hljs-string">specs</span>
    <span class="hljs-attr">build:</span>
      <span class="hljs-attr">commands:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">cd</span> <span class="hljs-string">catalog</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">npm</span> <span class="hljs-string">run</span> <span class="hljs-string">generate</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">npm</span> <span class="hljs-string">run</span> <span class="hljs-string">build</span>
  <span class="hljs-attr">artifacts:</span>
    <span class="hljs-attr">files:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'**/*'</span>
    <span class="hljs-attr">base-directory:</span> <span class="hljs-string">catalog/out</span>
</code></pre>
</li>
<li><p>The local source of the catalog must be updated and kept in sync with all services in a decoupled manner, using a GitHub Action that syncs any repository's specs to the catalog S3 bucket on a change in the specs folder.</p>
<pre><code class="lang-yaml">  <span class="hljs-attr">name:</span> <span class="hljs-string">AsyncApi</span> <span class="hljs-string">Spec</span> <span class="hljs-string">Sync</span>
  <span class="hljs-attr">on:</span>
    <span class="hljs-attr">push:</span>
      <span class="hljs-attr">branches:</span> [ <span class="hljs-string">main</span> ]
      <span class="hljs-attr">paths:</span> 
        <span class="hljs-bullet">-</span> <span class="hljs-string">'specs/**'</span>
  <span class="hljs-attr">env:</span>
    <span class="hljs-attr">AWS_REGION:</span> <span class="hljs-string">eu-west-1</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">sync-spec-to-s3:</span>
      <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
      <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">fetch-depth:</span> <span class="hljs-number">0</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Configure</span> <span class="hljs-string">AWS</span> <span class="hljs-string">Credentials</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">aws-actions/configure-aws-credentials@master</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">aws-access-key-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AWS_ACCESS_KEY_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">aws-secret-access-key:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AWS_SECRET_ACCESS_KEY</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">aws-region:</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.AWS_REGION</span> <span class="hljs-string">}}</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Sync</span> <span class="hljs-string">spec</span> <span class="hljs-string">to</span> <span class="hljs-string">S3</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|</span>
          <span class="hljs-string">aws</span> <span class="hljs-string">s3</span> <span class="hljs-string">sync</span> <span class="hljs-string">./specs</span> <span class="hljs-string">s3://$(aws</span> <span class="hljs-string">ssm</span> <span class="hljs-string">get-parameter</span> <span class="hljs-string">--name</span> <span class="hljs-string">"/eventcatalog/bucket/specs/name"</span> <span class="hljs-string">|</span> <span class="hljs-string">jq</span> <span class="hljs-string">-r</span> <span class="hljs-string">'.Parameter.Value'</span><span class="hljs-string">)/</span>
</code></pre>
</li>
<li><p>Each service registers documentation in a conventional way: a root-level specs folder, a subfolder named after the domain, and, inside the domain, a folder representing the version.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713132222866/c321e8e0-503c-4301-bc0b-ad5664ea3976.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
<h1 id="heading-service-integration">Service integration</h1>
<p>Service integration is simple: copy and paste the workflow into every service, and after any change in the specs folder, its content is synced to the catalog specs S3 bucket.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713133113802/b0893a97-d2db-48bb-9170-1bfbb63e3d76.png" alt class="image--center mx-auto" /></p>
<p>The S3 changes trigger the CodePipeline through the EventBridge default bus; the pipeline then processes the catalog and spec sync. The following diagram shows the actual CodePipeline workflow.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713133538032/cd1ae6f4-1a87-489b-8456-46d41131aec7.png" alt class="image--center mx-auto" /></p>
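<p>As a rough sketch (the exact rule is not shown here), the EventBridge rule matching spec changes on the default bus could use a pattern like the following; the bucket name is an illustrative placeholder, and the bucket must have EventBridge notifications enabled.</p>

```typescript
// Hypothetical EventBridge event pattern for the default bus: match
// object-level S3 events on the specs bucket so the rule can start the
// catalog CodePipeline. The bucket name below is an illustrative placeholder.
const specChangeRulePattern = {
  source: ['aws.s3'],
  'detail-type': ['Object Created'],
  detail: {
    bucket: { name: ['eventcatalog-specs-bucket'] },
  },
};
```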
<h1 id="heading-conclusion">Conclusion</h1>
<p>This approach was one of the fastest and simplest ways to get a result without complexity, but I am still looking to tackle some pain points to make EventCatalog more suitable.</p>
<p>I could have added a wrapper around EventCatalog, but I found that approach to be a wrong direction: it leads to extra complexity for a single point of usage in the whole enterprise and hides much of the intention behind EventCatalog's existence.</p>
<p>The last point for me to tackle is how to automate filling in the Markdown consumers section. As this approach focuses on documentation and specifications, the consumers gap was not a priority on my side, but it is something I actively think about in order to derive system landscapes from EventCatalog.</p>
<p>Enjoy Reading</p>
]]></content:encoded></item><item><title><![CDATA[The Meaningfulness of Events via Standardization ( Part 5 )]]></title><description><![CDATA[This is part 5 of the 'Meaningfulness of Events via Standardization' series. In this part, we cover how we can bring the ease of adoption of a standard and remove the burden of adoption in development teams.
Following the establishment of a standardi...]]></description><link>https://blogs.serverlessfolks.com/the-meaningfulness-of-events-via-standardization-part-5</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/the-meaningfulness-of-events-via-standardization-part-5</guid><category><![CDATA[Governance]]></category><category><![CDATA[event-driven-architecture]]></category><category><![CDATA[event catering]]></category><category><![CDATA[cloudevents]]></category><category><![CDATA[AWS EventBridge]]></category><category><![CDATA[software-catalog]]></category><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Thu, 11 Apr 2024 23:26:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1712877758135/a2e89601-2cc6-48a1-bb22-9d676013def1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is part 5 of the '<strong>Meaningfulness of Events via Standardization'</strong> series. In this part, we cover how we can bring the ease of adoption of a standard and remove the burden of adoption in development teams.</p>
<p>Following the establishment of a standardization process incorporating elements such as Documentation, Versioning, Filtering, and Event Envelopes as detailed in the preceding articles, the next step is to implement these standards. This phase poses the greatest challenge, as effective collaboration and communication among software stakeholders play a crucial role in successful adoption.</p>
<h1 id="heading-governance">Governance</h1>
<p>Governance in IT has different definitions based on company culture, size, and the value it brings. For this article, Governance is defined as a means to identify the system's complexity, observe system behavior, and simplify the decision-making process.</p>
<p>When dealing with software, the source of knowledge is often the development team, given their detailed vision of the implemented business complexity, the software communication process, and the potential risks behind any change. This is fine on its own, but keeping this knowledge in one local place, a single team, and adding dependencies on people can become an obstacle when speed becomes an important pillar of success.</p>
<h1 id="heading-software-vs-people-communication">Software vs People Communication</h1>
<p>Back in 2005-2010, software was often designed using a monolithic approach, with all complexity localized in a single place. That approach suffered from software complexity, availability, and scaling problems, but there was less network and communication burden, whether between software or between people.</p>
<p>Distributed systems are a solution, but in reality, while they resolve the monolith's pain points, they add new problems by distributing software and knowledge across multiple locations, introducing a higher level of communication and alignment challenges.</p>
<p>EDA is a distributed communication pattern applied at communication boundaries that resolves some distributed-system problems through decoupling. But it adds difficulties on top of traditional distributed systems by reversing the dependency direction: any degradation surfaces in the layers downstream of the producer service rather than in the preceding layers, which makes issues harder to identify.</p>
<h1 id="heading-decisions">Decisions</h1>
<p>In a competitive business where competitors move at the fastest possible rate of change to validate ideas and bring unique problem-solving into their products, being able to make faster, cleaner, and more accurate decisions about a problem is important.</p>
<p>To this end, relying on data and having enough information becomes crucial to achieve rapidity and clarity when making decisions. Having detailed data is hard, but it is an approach worth pursuing when the business needs to move at scale.</p>
<h1 id="heading-event-cataloging">Event Cataloging</h1>
<p>One useful approach for observing the internal state of a system is cataloging: we gather information about the overall system communication, such as documentation, schemas, versions, and consumption.</p>
<p>A catalog must offer a result set covering the following details:</p>
<ul>
<li><p>Producers - Who introduces that change?</p>
</li>
<li><p>Consumers - Who are the interested actors behind that change?</p>
</li>
<li><p>Event Models - What are the event models transiting?</p>
</li>
<li><p>Event Versions - What are the active and outdated event versions?</p>
</li>
<li><p>Service Specifications - What does each producer offer to consumers?</p>
</li>
<li><p>System Communication - How do the systems communicate together?</p>
</li>
</ul>
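<p>To make that result set concrete, a hypothetical catalog entry covering these details could be typed as follows; this is an illustration, not an actual EventCatalog model.</p>

```typescript
// Hypothetical shape of a catalog result-set entry covering the details
// listed above; an illustration only, not EventCatalog's actual API.
interface CatalogEntry {
  event: string;                                     // event model name
  versions: { version: string; active: boolean }[];  // active and outdated versions
  producers: string[];                               // who introduces the change
  consumers: string[];                               // interested actors behind the change
  specification: string;                             // what the producer offers consumers
}

const orderPlaced: CatalogEntry = {
  event: 'OrderPlaced',
  versions: [{ version: '1.0.0', active: true }],
  producers: ['order-service'],
  consumers: ['shipment-service', 'billing-service'],
  specification: 'specs/Order/placement/1.0.0/asyncapi.yaml',
};
```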
<h1 id="heading-cloudevents">Cloudevents</h1>
<p>CloudEvents is a specification for the event-driven design approach that helps standardize how the event model should be shaped. CloudEvents brings separation between context and data while keeping them correlated.</p>
<h2 id="heading-event-definition">Event definition</h2>
<p>The Cloudevents specification describes an event as:</p>
<blockquote>
<p><em>An "<strong><strong>event</strong></strong>" is a</em> <strong><em>data record</em></strong> <em>expressing an</em> <strong><em>occurrence</em></strong> <em>and its</em> <strong><em>context</em></strong>.</p>
<p><em>Events are</em> <strong><em>routed</em></strong> <em>from an event</em> <strong><em>producer</em></strong>(the source) to interested event <strong><em>consumers</em></strong>.</p>
<p><em>The</em> <strong><em>routing</em></strong> <em>can be performed based on</em> <strong><em>information</em></strong> <em>contained in the</em> <strong><em>event</em></strong>, but an event will not identify a specific routing destination.</p>
<p><em>Events will contain two types of information: the</em> <strong><em>Event Data</em></strong> <em>representing the</em> <strong><em>Occurrence</em></strong> <em>and</em> <strong><em>Context</em></strong> <em>metadata providing contextual information</em> <strong><em>about the Occurrence</em></strong>.</p>
</blockquote>
<p>The takeaways from the above description are :</p>
<ul>
<li><p>Events express occurrence and context.</p>
</li>
<li><p>Events are routed, so an event transits along many communication hops before reaching the consumer.</p>
</li>
<li><p>The routing is done based on event information, so the event must provide enough information to simplify routing and event transition.</p>
</li>
<li><p>The Context includes some information about the occurrence, so the context represents event-related information.</p>
</li>
</ul>
<h2 id="heading-event-transition">Event Transition</h2>
<p>As event-driven architecture is a distributed communication pattern, event transition is an important point to consider. An event can transit from any starting point and be consumed in one or more places, routed through many network or software hops, such as a broker, an enrichment process, or an aggregation process.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712425441593/d2dd07cb-414a-49b9-b9b2-4edacee73202.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-event-occurrence">Event Occurrence</h2>
<p>An occurrence is a definition of '<strong>What is the change in a process</strong>'. It is important to define the occurrence and the principal events in a context well, but keeping internal state changes separate from external ones helps define them better and guarantees the quality of distributed communication.</p>
<p>Keeping the occurrence's internal state internal helps reduce distributed communication complexity by abstracting the internal process from the external one.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712427192129/1937286f-84eb-43ac-897f-9e4ceeeeeff2.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-event-envelope">Event Envelope</h2>
<p>Looking at a simple envelope, there are two levels of information: a letter inside the envelope, representing the detailed intentional information, and some information presented on the envelope itself, like an identifier, a confidentiality stamp, the sender, and the date.</p>
<p><img src="https://lh7-us.googleusercontent.com/o98OQB4b3WvJCBmvF-nSFr6bFNlhgi5VnyGJGkTkeUIEhpYJWxLKeblIW__68XpNWRDfagZrqvmkb_lkNaAMMnsdDb2H8jkFg084j6ShCmjzQSivnQYt3nrsxH9qO1WUxwk1ptrcXOiYCEbXNLNDEyTKYQ=s2048" alt /></p>
<p>The internal letter becomes interesting once the envelope arrives at its destination and is in hand, but the external information is useful for tracking, distributing, and routing.</p>
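<p>In CloudEvents terms, the envelope maps to the context attributes and the letter to the <code>data</code> field. A minimal event following the spec's required context attributes looks like this; the values themselves are illustrative.</p>

```typescript
// A minimal CloudEvents 1.0 event: the context attributes (id, source,
// specversion, type, ...) are the outside of the envelope, while `data`
// carries the letter, i.e. the occurrence itself. Values are illustrative.
const orderPlacedEvent = {
  specversion: '1.0',                          // CloudEvents version
  id: 'a89b61a2-5644-487a-8a86-144855c5dce8',  // identifier stamp
  source: '/order/placement',                  // sender
  type: 'com.example.order.placed',            // what occurred
  time: '2024-04-11T23:26:43Z',                // date
  datacontenttype: 'application/json',
  data: { orderId: '42', total: 99.9 },        // the letter inside the envelope
};
```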
<h2 id="heading-cloudevents-design-goal">Cloudevents Design Goal</h2>
<p>The CloudEvents specification defines standards and principles around event-driven architecture, but the initiative, like all other standards, exists to tackle real problems.</p>
<blockquote>
<p>The goal of the CloudEvents specification is to define the interoperability of event systems that allow services to produce or consume events.</p>
<p>CloudEvents are typically used in a distributed system to allow for services to be loosely coupled during development, deployed independently, and later can be connected to create new applications.</p>
</blockquote>
<p>The above is from the cloudevents <a target="_blank" href="https://github.com/cloudevents/spec/blob/main/cloudevents/primer.md#design-goals">primer design goal documentation</a>, the primer focuses on the following considerations:</p>
<ul>
<li><p>Protocol and Channel agnostic</p>
</li>
<li><p>Extensibility</p>
</li>
<li><p>On top of standards</p>
</li>
</ul>
<h3 id="heading-channel-protocol-agnostic">Channel / Protocol Agnostic</h3>
<p>The CloudEvents standard relies on the principle of being protocol- and channel-agnostic; the specification acknowledges the presence of different protocols and communication channels in real systems and provides guidelines for approaching them.</p>
<p>The Specification provides guidelines for the following protocols:</p>
<ul>
<li><p>AMQP</p>
</li>
<li><p>MQTT</p>
</li>
<li><p>NATS</p>
</li>
<li><p>Websockets</p>
</li>
<li><p>HTTP</p>
</li>
</ul>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/cloudevents/spec/tree/main/cloudevents/bindings">https://github.com/cloudevents/spec/tree/main/cloudevents/bindings</a></div>
<p> </p>
<p>Also, the specification provides guidelines related to channels to address adopting the standard on top of existing channels. CloudEvents introduces the concept of adapters and provides practices for treating channels as a means of event distribution.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/cloudevents/spec/tree/main/cloudevents/adapters">https://github.com/cloudevents/spec/tree/main/cloudevents/adapters</a></div>
<p> </p>
<h3 id="heading-extensibility">Extensibility</h3>
<p>Being a specification for distributed event-driven architectures, CloudEvents addresses communication and processing problems through extensions, a standard way of extending the core specification.</p>
<p>An extension is a set of context-level attributes that support the adoption of standards or practices to tackle an event-driven design problem. Extensions must use the primary data types defined by CloudEvents.</p>
<p>The CloudEvents supported types are:</p>
<ul>
<li><p>Binary: Sequence of Bytes <a target="_blank" href="https://tools.ietf.org/html/rfc4648">RFC4648.</a></p>
</li>
<li><p>Integer: Signed 32-bit integer in the range -2,147,483,648 to +2,147,483,647 <a target="_blank" href="https://tools.ietf.org/html/rfc7159#section-6">RFC 7159, Section 6</a></p>
</li>
<li><p>String: Sequence of allowable Unicode characters</p>
</li>
<li><p>Boolean: True or False</p>
</li>
<li><p>Timestamp: <a target="_blank" href="https://tools.ietf.org/html/rfc3339">RFC 3339.</a></p>
</li>
<li><p>URI: Absolute URI <a target="_blank" href="https://tools.ietf.org/html/rfc3986#section-4.3">RFC 3986 Section 4.3.</a></p>
</li>
<li><p>URI-Reference: Relative URI <a target="_blank" href="https://tools.ietf.org/html/rfc3986#section-4.1">RFC 3986 Section 4.1</a>.</p>
</li>
</ul>
<p>The CloudEvents available extensions are:</p>
<ul>
<li><p>Distributed Tracing: Builds on the W3C Trace Context standard.</p>
</li>
<li><p>Expirytime: Solves the event-validity problem by declaring when an event should no longer be processed.</p>
</li>
<li><p>Sequence: Solves the event-ordering problem.</p>
</li>
<li><p>Partitioning: Solves the scaling problem by adding the related partition key to events, helping brokers and consumers route and identify events.</p>
</li>
<li><p>Dataref: Solves the problem of large event payloads by carrying the payload's file/storage location inside a smaller event.</p>
</li>
<li><p>Authcontext: Solves the problem of identifying the principal or actor that initiated the occurrence.</p>
</li>
</ul>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/cloudevents/spec/tree/main/cloudevents/extensions">https://github.com/cloudevents/spec/tree/main/cloudevents/extensions</a></div>
<p> </p>
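<p>To make the extension mechanism concrete, the sketch below (all attribute values are illustrative) shows an event carrying the expirytime, sequence, partitioning, and dataref extension attributes at the same context level as the core attributes:</p>

```typescript
// Extension attributes sit alongside the core context attributes,
// never inside `data`. Values here are fabricated for illustration.
const orderPlaced = {
  specversion: '1.0',
  id: 'a1b2c3',
  source: 'ecommerce.orders.service',
  type: 'order.placed',
  time: '2024-01-01T12:55:00.990Z',
  // Expirytime extension: the event is stale after this instant.
  expirytime: '2024-01-01T13:55:00.990Z',
  // Sequence extension: lets consumers restore producer-side ordering.
  sequence: '000000000000042',
  // Partitioning extension: routing hint for partitioned brokers.
  partitionkey: 'PRD_12345643',
  // Dataref extension: points at an externally stored (large) payload.
  dataref: 's3://orders-payloads/a1b2c3.json',
  data: { productId: 'PRD_12345643', quantity: 2 },
};

// A consumer can use expirytime to drop stale events.
const isExpired = (event: { expirytime?: string }, now: Date): boolean =>
  event.expirytime !== undefined &&
  now.getTime() > new Date(event.expirytime).getTime();
```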
<h1 id="heading-putting-cloudevents-on-aws">Putting Cloudevents on AWS</h1>
<p>AWS provides a wide range of infrastructure, including communication channels that distribute events between different pieces of software, such as EventBridge, SQS, SNS, and Kinesis. Choosing the right service for each requirement is an important part of the design.</p>
<h2 id="heading-implementing-examples">Implementing examples</h2>
<p>The provided example represents a distributed event-driven approach for an e-commerce software system.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712434492686/3e770bcc-624b-4ea1-ba0a-e1d1a4e23e25.png" alt class="image--center mx-auto" /></p>
<p>The example source can be found in the following GitHub repository; follow the README instructions to deploy and test the provided examples.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-cloudevents-eda">https://github.com/XaaXaaX/aws-cloudevents-eda</a></div>
<p> </p>
<p>The above design follows a simple basket item approval leading to an order validation and delivery process.</p>
<ul>
<li><p>The basket <strong>item approval command</strong> reaches the order system.</p>
</li>
<li><p>The ordering system distributes an event of type <strong>order.placed</strong></p>
</li>
<li><p>The Shipment system starts preparing the packaging</p>
</li>
<li><p>The product system validates the availability of the product</p>
</li>
<li><p>The order system listens to product availability</p>
<ul>
<li><p>If the product is available, distribute an <strong>order.confirmed</strong> event</p>
</li>
<li><p>If the product is not available distribute an <strong>order.cancelled</strong> event</p>
</li>
</ul>
</li>
<li><p>The Shipment system sends an <strong>order.shipped</strong> event if it received an order.confirmed event</p>
</li>
<li><p>The Notification system listens to <strong>order.shipped</strong> events</p>
</li>
</ul>
<h2 id="heading-order-system">Order system</h2>
<p>The Order System has two modules: the ingestion module is responsible for receiving orders, and the product listener is responsible for reacting to any product state change.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712436713463/5edaa2ee-689c-4aec-88d4-58584decf197.png" alt class="image--center mx-auto" /></p>
<p>The ingestion module receives the <strong>basket.item-approved</strong> event, which respects the CloudEvents standard envelope and arrives over the HTTP protocol through API Gateway.</p>
<p>The integration of API Gateway and SQS using AWS CDK takes care of reading the headers and body and adapting them into a standard event payload. This approach is what the CloudEvents adapter specification describes.</p>
<pre><code class="lang-typescript">  <span class="hljs-keyword">private</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">readonly</span> sqsRequestTemplate = <span class="hljs-string">`Action=SendMessage&amp;MessageBody={
    "data" : $util.urlEncode($input.body),
    #foreach($param in $input.params().header.keySet())
    "$param": "$util.escapeJavaScript($input.params().header.get($param))" #if($foreach.hasNext),#end

    #end
  }`</span>
</code></pre>
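<p>Conceptually, the template merges the HTTP body and headers into a single CloudEvents envelope: the body becomes the <strong>data</strong> attribute and every header becomes a context-level attribute. A plain TypeScript sketch of that adapter logic (illustrative only; the deployed VTL template additionally URL-encodes the body) looks like this:</p>

```typescript
// Mirror of the API Gateway mapping template: the raw HTTP body becomes
// the CloudEvents `data` attribute, and each header is flattened into
// the envelope as a context-level attribute.
const adaptHttpToEnvelope = (
  headers: Record<string, string>,
  body: string,
): Record<string, unknown> => ({
  data: JSON.parse(body),
  ...headers,
});
```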
<p>The order DynamoDB stream Lambda sends multiple versions of the <strong>order.placed</strong> event to an SNS topic.</p>
<h2 id="heading-product-system">Product System</h2>
<p>The product system is a subscriber of the order system, listening to order.placed events through an SQS queue. The queue is configured with RawMessageDelivery at the SNS subscription level, which prevents the message from being wrapped in an SNS envelope. ( AWS <a target="_blank" href="https://docs.aws.amazon.com/sns/latest/dg/sns-large-payload-raw-message-delivery.html">RawMessageDelivery Documentation</a> )</p>
<p>The SNS/SQS subscription adaptation is done in two steps: an IaC adapter using <strong>RawMessageDelivery</strong>, and a software adapter that fetches the event out of the wrapping SQS event model.</p>
<pre><code class="lang-typescript">    ordersTopic.addSubscription(<span class="hljs-keyword">new</span> SqsSubscription(productsQueue, {
      rawMessageDelivery: <span class="hljs-literal">true</span>,
      filterPolicyWithMessageBody: {
        source: FilterOrPolicy.filter(SubscriptionFilter.stringFilter({
          allowlist: [
            <span class="hljs-string">'ecommerce.orders.service'</span>
          ],
        })),
        <span class="hljs-keyword">type</span>: FilterOrPolicy.filter(SubscriptionFilter.stringFilter({
          allowlist: [
            <span class="hljs-string">'order.placed'</span>
          ],
        })),
        dataversion: FilterOrPolicy.filter(SubscriptionFilter.stringFilter({
          allowlist: [
            <span class="hljs-string">'v1.0'</span>
          ],
        })),
      }
    }))
</code></pre>
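<p>The semantics of the message-body filter policy above can be modeled locally as a predicate: the subscription accepts an event only when its source, type, and dataversion all appear in the corresponding allowlists. This is a simplified sketch of SNS's matching behavior, not AWS code:</p>

```typescript
// Local model of an SNS message-body filter policy restricted to
// string allowlists: every policy attribute must match.
type FilterPolicy = Record<string, string[]>;

const matchesPolicy = (
  event: Record<string, unknown>,
  policy: FilterPolicy,
): boolean =>
  Object.entries(policy).every(([attribute, allowlist]) =>
    allowlist.includes(String(event[attribute])),
  );

// The product system's subscription from the snippet above.
const productPolicy: FilterPolicy = {
  source: ['ecommerce.orders.service'],
  type: ['order.placed'],
  dataversion: ['v1.0'],
};
```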
<p>The product system listens to v1.0 of the order.placed event, so if a new version of order.placed is introduced, it will neither impact the product system nor cause duplicated reception.</p>
<p>The following snippet shows the code adapter that retrieves the envelope from the SQS event model.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { EventBridgeEvent, SNSEvent, SQSEvent } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-lambda"</span>;
<span class="hljs-keyword">import</span> { EventModel } <span class="hljs-keyword">from</span> <span class="hljs-string">"../models/cloud-event"</span>;

<span class="hljs-keyword">type</span> EventType = SQSEvent | SNSEvent | EventBridgeEvent&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">any</span>&gt; | EventModel&lt;<span class="hljs-built_in">any</span>, <span class="hljs-built_in">any</span>&gt; | <span class="hljs-built_in">any</span>;
<span class="hljs-keyword">const</span> getEvent = &lt;T,U&gt;(
  event: EventType
  ): EventModel&lt;T,U&gt; | Record&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">any</span>&gt; | <span class="hljs-function"><span class="hljs-params">null</span> =&gt;</span> {
  ...
  <span class="hljs-keyword">if</span>( event.Records[<span class="hljs-number">0</span>].eventSource == <span class="hljs-string">"aws:sqs"</span> )
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">JSON</span>.parse(event.Records[<span class="hljs-number">0</span>].body);
  ...
  <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> DeSerialize = &lt;T,U&gt;(
  event: EventType
  ):EventModel&lt;T,U&gt; | Record&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">any</span>&gt; | <span class="hljs-function"><span class="hljs-params">null</span> =&gt;</span> {
  <span class="hljs-keyword">const</span> evt = getEvent&lt;T,U&gt;(event);
  <span class="hljs-built_in">console</span>.log({
    ...evt, 
    recipient: process.env.SOURCE,
  });
  <span class="hljs-keyword">return</span> evt;
}
</code></pre>
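<p>For an SQS-delivered record with RawMessageDelivery enabled, the adapter simply parses the record body as the CloudEvents envelope. The self-contained sketch below (with a fabricated record shape) isolates that branch of the adapter:</p>

```typescript
// With RawMessageDelivery, the SQS record body IS the CloudEvents
// envelope; the adapter only unwraps the SQS event model around it.
const extractFromSqs = (
  event: { Records: { eventSource: string; body: string }[] },
): Record<string, unknown> | null => {
  const record = event.Records[0];
  if (record.eventSource !== 'aws:sqs') return null; // not an SQS delivery
  return JSON.parse(record.body);
};
```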
<h2 id="heading-shipment-system">Shipment System</h2>
<p>The shipment system listens to all <strong>order.placed</strong>, <strong>order.confirmed</strong>, and <strong>order.cancelled</strong> events, at version v2.0.</p>
<pre><code class="lang-typescript">ordersTopic.addSubscription(<span class="hljs-keyword">new</span> SqsSubscription(productsQueue, {
      rawMessageDelivery: <span class="hljs-literal">true</span>,
      filterPolicyWithMessageBody: {
        source: FilterOrPolicy.filter(SubscriptionFilter.stringFilter({
          allowlist: [
            <span class="hljs-string">'ecommerce.orders.service'</span>
          ],
        })),
        <span class="hljs-keyword">type</span>: FilterOrPolicy.filter(SubscriptionFilter.stringFilter({
          allowlist: [
            <span class="hljs-string">'order.placed'</span>,
            <span class="hljs-string">'order.cancelled'</span>,
            <span class="hljs-string">'order.confirmed'</span>
          ],
        })),
        dataversion: FilterOrPolicy.filter(SubscriptionFilter.stringFilter({
          allowlist: [
            <span class="hljs-string">'v2.0'</span>
          ],
        })),
      }
    }))
</code></pre>
<h2 id="heading-notification-system">Notification System</h2>
<p>The shipment service sends its events to an EventBridge bus, and the Notification system listens to the shipment service through an EventBridge rule.</p>
<p>The rule extracts <strong>$.detail</strong>, the CloudEvents-compliant event, from the EventBridge-wrapped payload.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">new</span> Rule(<span class="hljs-built_in">this</span>, <span class="hljs-string">'rule'</span>, {
        eventBus: shipmentEventBus,
        eventPattern: {
          detail: {
            <span class="hljs-keyword">type</span>: [<span class="hljs-string">'order.shipped'</span>],
            source: [<span class="hljs-string">'ecommerce.shipment.service'</span>],
            dataversion: [<span class="hljs-string">'v1.0'</span>]
          }
        },
        targets: [
          <span class="hljs-keyword">new</span> targets.SqsQueue(notificationQueue, {
            deadLetterQueue: dlq,
            message: RuleTargetInput.fromEventPath(<span class="hljs-string">'$.detail'</span>),
          }),
        ]
    });
</code></pre>
<h2 id="heading-producing-events">Producing Events</h2>
<p>The producers of events use a helper method to generate them. The method generates an event id, idempotency key, correlation id, and sequence id, and accepts the event type, event payload, version, and causation id as parameters.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> InitEvent = &lt;TData, TEventType&gt;(
    source: <span class="hljs-built_in">string</span>,
    eventType: TEventType,
    eventData: TData,
    dataVersion: <span class="hljs-built_in">string</span>,
    dataSchema?: <span class="hljs-built_in">string</span>,
    causationId?: <span class="hljs-built_in">string</span>,
    correlationid?: <span class="hljs-built_in">string</span>
     ): EventModel&lt;TData, TEventType&gt; =&gt; {

    <span class="hljs-keyword">return</span> {
        idempotencykey: uuidV5(<span class="hljs-built_in">JSON</span>.stringify(eventData), <span class="hljs-string">"40781d63-9741-40a6-aa25-c5a35d47abd6"</span>),
        id: nanoid(),
        time: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString(),
        data: eventData,
        <span class="hljs-keyword">type</span>: eventType,
        source,
        dataversion: dataVersion,
        dataschema: dataSchema,
        causationid: causationId,
        correlationid: correlationid ?? nanoid(),
        specversion: <span class="hljs-string">"1.0.2"</span>,
        sequence: ulid(),
    }
  }
</code></pre>
<p>The <strong>idempotency key</strong> helps consumers avoid unintended behavior in case of event duplication.</p>
<p>The <strong>sequence</strong> helps consumers keep track of the ordering of events.</p>
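<p>A minimal consumer-side sketch (hypothetical helper names) of how these two attributes are typically used: the idempotency key de-duplicates deliveries, and the sequence, being a ULID and therefore lexicographically sortable by creation time, restores producer order within a batch:</p>

```typescript
interface ReceivedEvent {
  idempotencykey: string;
  sequence: string; // ULID: lexicographically sortable by creation time
  type: string;
}

// De-duplication: process each event at most once, keyed by its
// idempotency key. (A real consumer would persist seen keys.)
const seen = new Set<string>();
const processOnce = (
  event: ReceivedEvent,
  handler: (e: ReceivedEvent) => void,
): boolean => {
  if (seen.has(event.idempotencykey)) return false; // duplicate delivery
  seen.add(event.idempotencykey);
  handler(event);
  return true;
};

// Ordering: a plain string sort on the ULID sequence restores the
// producer-side order of a batch of events.
const inProducerOrder = (events: ReceivedEvent[]): ReceivedEvent[] =>
  [...events].sort((a, b) => a.sequence.localeCompare(b.sequence));
```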
<h1 id="heading-observing-events">Observing Events</h1>
<p>To observe event production and consumption, we use the CloudWatch service for simplicity; the goal is to show how important it is to observe the CloudEvents context information.</p>
<p>Since events can reach the Lambda functions from different services, using the adapter concept to extract the event payload is the approach proposed by CloudEvents. All Lambda handlers in this example use a custom DeSerialize helper to extract the CloudEvents model from the infrastructure event.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> DeSerialize = &lt;T,U&gt;(
  event: EventType
  ):EventModel&lt;T,U&gt; | Record&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">any</span>&gt; | <span class="hljs-function"><span class="hljs-params">null</span> =&gt;</span> {
  <span class="hljs-keyword">const</span> evt = getEvent&lt;T,U&gt;(event);
  <span class="hljs-built_in">console</span>.log({
    ...evt, 
    recipient: process.env.SOURCE,
  });
  <span class="hljs-keyword">return</span> evt;
}
</code></pre>
<p>The helper function logs the CloudEvents payload to simplify observability and data extraction.</p>
<h2 id="heading-running-the-example">Running the example</h2>
<p>The source code provides a Postman collection named <strong>Cloudevents.postman_collection.json</strong> in the '<strong>assets</strong>' folder. To run it, import the collection into Postman and replace the request URL with the API Gateway URL returned at the end of the orders system deployment, or use the following curl command to send an event.</p>
<pre><code class="lang-bash">curl --location <span class="hljs-string">'https://xxxxxxxx.execute-api.eu-west-1.amazonaws.com/live/sqs'</span> \
--header <span class="hljs-string">'x-api-key: ec1a9e8f-b8fc-4a6d-9069-108775d67af8'</span> \
--header <span class="hljs-string">'causationid: 6e67e1a4-e323-492e-a7ff-a489a54ba63d'</span> \
--header <span class="hljs-string">'source: ecommerce.baskets.service'</span> \
--header <span class="hljs-string">'type: basket.item-approved'</span> \
--header <span class="hljs-string">'id: 872fab6b-4f22-4951-874d-021d68d39154'</span> \
--header <span class="hljs-string">'specversion: 1.0.2'</span> \
--header <span class="hljs-string">'time: 2024-04-06T22:40:33.413Z'</span> \
--header <span class="hljs-string">'dataversion: v1.0'</span> \
--header <span class="hljs-string">'correlationid: 3a02915a-ba3e-4e58-b7c3-642efaa31a1a'</span> \
--header <span class="hljs-string">'Content-Type: application/json'</span> \
--data <span class="hljs-string">'{
    "orderDate": "2024-01-01T12:55:00.990Z",
    "price": 1000,
    "quantity": 2,
    "productId": "PRD_12345643",
    "userId": "a5449147-ab45-4bec-a0be-f00daf5f2871"
}'</span>
</code></pre>
<p>The process behind this request will place an order, but the order will later be cancelled because product availability cannot be confirmed: the product is missing from the product system's DynamoDB table.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712444476282/2823cc02-5c1d-49c1-b430-48e9410a8f79.png" alt class="image--center mx-auto" /></p>
<p>As shown, we can observe the event type and version being consumed and follow the event transition process. To simulate the process for an available product, we can add the following product to the table.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"productId"</span>: <span class="hljs-string">"PRD_12345643"</span>,
  <span class="hljs-attr">"price"</span>: <span class="hljs-number">500</span>,
  <span class="hljs-attr">"stock"</span>: <span class="hljs-number">1</span>,
  <span class="hljs-attr">"status"</span>: <span class="hljs-string">"IN_STOCK"</span>
}
</code></pre>
<p>Sending a new request will result in a full order process, including confirmation and shipment approval.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712445818786/1dcfaf6c-3929-4de6-8712-c3369779f607.png" alt class="image--center mx-auto" /></p>
<p>Extracting some statistics also helps to see active consumption and, for example, to distinguish outdated event versions from active ones.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712444873296/d26f5660-0da2-4a37-83dc-2314a7335156.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-event-discovery">Event Discovery</h1>
<p>Since the example already uses CloudWatch, Lambda extensions can be used to prepare a catalog of events, enabling event discovery, schema extraction, and documentation.</p>
<p>The following design demonstrates how Lambda extensions can be used as a sidecar to feed the event discovery process.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712446046317/5842d55f-e8f6-4d39-a1d3-767a57bd3dc0.png" alt class="image--center mx-auto" /></p>
<p>The extension receives the logs and forwards them to a Kinesis data stream, which triggers a Lambda function that puts the events onto an EventBridge custom bus with schema discovery enabled.</p>
<p>The extension subscribes to the Lambda Telemetry API by first registering with the Extensions API to receive the INVOKE and SHUTDOWN events.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> RUNTIME_EXTENSION_URL = <span class="hljs-string">`http://<span class="hljs-subst">${process.env.AWS_LAMBDA_RUNTIME_API}</span>/2020-01-01/extension`</span>;

<span class="hljs-keyword">await</span> fetch(<span class="hljs-string">`<span class="hljs-subst">${RUNTIME_EXTENSION_URL}</span>/register`</span>, {
   method: <span class="hljs-string">'post'</span>,
   body: <span class="hljs-built_in">JSON</span>.stringify({
       <span class="hljs-string">'events'</span>: [
           <span class="hljs-string">'INVOKE'</span>,
            <span class="hljs-string">'SHUTDOWN'</span>
        ],
   }),
   headers: {
        <span class="hljs-string">'Content-Type'</span>: <span class="hljs-string">'application/json'</span>,
        <span class="hljs-string">'Lambda-Extension-Name'</span>: basename(__dirname),
   }
});
</code></pre>
<p>The extension also needs to subscribe to the Telemetry API and expose an HTTP listener so that the Telemetry API can send logs to the extension.</p>
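<p>A sketch of such a subscription payload is shown below; the shape follows the Lambda Telemetry API, while the port and buffering limits are arbitrary illustrative values:</p>

```typescript
// Illustrative Telemetry API subscription body. The extension must run
// an HTTP listener on the sandbox-local URI given in `destination`.
const LISTENER_PORT = 4243; // arbitrary choice for this sketch

const telemetrySubscription = {
  schemaVersion: '2022-12-13',
  // Subscribe to function logs only; 'platform' and 'extension'
  // telemetry streams are also available.
  types: ['function'],
  // Flush to the listener after 1000 items, 256 KiB, or 100 ms.
  buffering: { maxItems: 1000, maxBytes: 262144, timeoutMs: 100 },
  destination: {
    protocol: 'HTTP',
    URI: `http://sandbox.localdomain:${LISTENER_PORT}`,
  },
};

// The subscription itself would be sent from inside the sandbox, e.g.:
// await fetch(`http://${process.env.AWS_LAMBDA_RUNTIME_API}/2022-07-01/telemetry`, {
//   method: 'PUT',
//   headers: { 'Lambda-Extension-Identifier': extensionId, 'Content-Type': 'application/json' },
//   body: JSON.stringify(telemetrySubscription),
// });
```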
<p>Adding the extension to the Lambda function can be done as shown in the following CDK code.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> orderPlacedFunction = <span class="hljs-keyword">new</span> NodejsFunction(<span class="hljs-built_in">this</span>, <span class="hljs-string">'OrderPlacedFunction'</span>, {
   entry: resolve(join(__dirname, <span class="hljs-string">'../../src/service/ingestion/order-receiver/index.ts'</span>)),
   handler: <span class="hljs-string">'handler'</span>,
   ...LambdaConfiguration,
   role: orderPlacedFunctionRole,
   layers: [
     telemetryExtensionLayerVersion
   ],
   environment: {
     SOURCE: <span class="hljs-string">'ecommerce.orders.service'</span>,
     TABLE_NAME: <span class="hljs-built_in">this</span>.OrdersTable.tableName,
   }
});
</code></pre>
<p>The example attaches the extension to all Lambda functions; this sends every function's logs to the Kinesis data stream and lets the <strong>schema-registerer</strong> forward those logged events to the custom event bus.</p>
<p>The schema-registerer function has simple logic, as shown below:</p>
<pre><code class="lang-typescript">await Promise.all(event.Records.map(async (record) =&gt; {
    // Kinesis record data is base64 encoded; decode it back to the logged CloudEvents envelope.
    const eventData = JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf-8'));
    await client.send(
      new PutEventsCommand({
        Entries: [
          {
            EventBusName: process.env.EVENT_BUS_ARN!,
            Detail: JSON.stringify(eventData),
            Source: eventData.source,
            DetailType: `${eventData.type}.${eventData.dataversion}`
          },
        ],
      }),
    );
}));
</code></pre>
<p>In the above example, the DetailType in the PutEvents call is a concatenation of the event type and version.</p>
<p>After sending a request the event schemas will be available in the schema section of the EventBridge.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712447930068/eef4ba2d-8c9c-4204-9303-b587ba087357.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-cataloging">Cataloging</h1>
<p>Automation and documentation are two principal points of governance, where we capture how the system behaves; completing everything in an automated manner is an important part of that journey.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712852402827/7a127956-8a09-4f4f-b115-4ac2bf852b66.png" alt class="image--center mx-auto" /></p>
<p>In the above solution, the catalog is generated using a GitHub trigger and AWS CodePipeline. The pipeline generates, builds, and deploys <a target="_blank" href="https://www.eventcatalog.dev/">EventCatalog</a> as a static website.</p>
<p>The source of the AsyncAPI specs is an S3 bucket with a domain-based layout, as below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712851667309/06c4ff61-239b-4676-8d34-e79363fd4ac1.png" alt class="image--center mx-auto" /></p>
<p>The pipeline build process synchronizes the S3 bucket to a local folder and generates the domains, services, and events, followed by building the EventCatalog bundle.</p>
<pre><code class="lang-yaml"> <span class="hljs-attr">version:</span> <span class="hljs-number">0.2</span>

<span class="hljs-attr">env:</span>
  <span class="hljs-attr">parameter-store:</span>
    <span class="hljs-attr">SPEC_BUCKET_NAME:</span> <span class="hljs-string">/catalog/bucket/specs/name</span>

<span class="hljs-attr">phases:</span>
  <span class="hljs-attr">install:</span>
    <span class="hljs-attr">commands:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">echo</span> <span class="hljs-string">Installing</span> <span class="hljs-string">dependencies...</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">npm</span> <span class="hljs-string">cache</span> <span class="hljs-string">clean</span> <span class="hljs-string">--force</span>
      - cd catalog &amp;&amp; npm install --force &amp;&amp; cd ..

  <span class="hljs-attr">pre_build:</span>
    <span class="hljs-attr">commands:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">echo</span> <span class="hljs-string">"Pre build command"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">mkdir</span> <span class="hljs-string">specs</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">aws</span> <span class="hljs-string">s3</span> <span class="hljs-string">sync</span> <span class="hljs-string">s3://$SPEC_BUCKET_NAME/</span> <span class="hljs-string">specs</span>
  <span class="hljs-attr">build:</span>
    <span class="hljs-attr">commands:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">cd</span> <span class="hljs-string">catalog</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">npm</span> <span class="hljs-string">run</span> <span class="hljs-string">generate</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">npm</span> <span class="hljs-string">run</span> <span class="hljs-string">build</span>

<span class="hljs-attr">artifacts:</span>
  <span class="hljs-attr">files:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">'**/*'</span>
  <span class="hljs-attr">base-directory:</span> <span class="hljs-string">catalog/out</span>
</code></pre>
<p>The catalog project has two stacks, the pipeline stack and the catalog stack; the catalog stack is represented by the following diagram.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712852268228/93ffdf88-c653-4cfe-942a-6e85c06d0a94.png" alt class="image--center mx-auto" /></p>
<p>The Catalog and Pipeline stacks can be found as part of the source code in '<strong>./src/platform/cdk</strong>'.</p>
<ul>
<li><p>AsyncApi Spec will be uploaded in Specs Bucket</p>
</li>
<li><p>The Event bridge will trigger the code pipeline</p>
</li>
<li><p>The Pipeline will Sync all Specs in s3 and regenerate the EventCatalog</p>
</li>
<li><p>The Static S3 WebSite will get updated by new bundle</p>
</li>
</ul>
<h2 id="heading-catalog-of-thousands-of-services">Catalog of Thousands of Services</h2>
<p>The cataloging section above focused on a simplified way of automating catalog generation, but there is one last question to answer: <strong>how do we manage thousands of specs under the ownership of hundreds of teams?</strong></p>
<p>The answer is that a service owns its software and all corresponding documentation, whether OpenAPI, AsyncAPI, README, etc. With each spec owned by a team and kept as part of the service's source code, we need a way to centralize and automate cataloging by relying on the spec in each service repository.</p>
<p>GitHub Actions are a good candidate for replicating each AsyncAPI spec into the specs bucket.</p>
<p>GitHub also offers reusable workflows that simplify adoption across service teams; for this article's simplicity, a plain workflow is used, as below.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">AsyncApi</span> <span class="hljs-string">Spec</span> <span class="hljs-string">Sync</span>
<span class="hljs-attr">on:</span>
  <span class="hljs-attr">push:</span>
    <span class="hljs-attr">branches:</span> [ <span class="hljs-string">main</span> ]
<span class="hljs-attr">env:</span>
  <span class="hljs-attr">AWS_REGION:</span> <span class="hljs-string">eu-west-1</span>
<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">sync_spec:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">steps:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span>
      <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Configure</span> <span class="hljs-string">AWS</span> <span class="hljs-string">Credentials</span>
      <span class="hljs-attr">uses:</span> <span class="hljs-string">aws-actions/configure-aws-credentials@master</span>
      <span class="hljs-attr">with:</span>
        <span class="hljs-attr">aws-access-key-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AWS_ACCESS_KEY_ID</span> <span class="hljs-string">}}</span>
        <span class="hljs-attr">aws-secret-access-key:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AWS_SECRET_ACCESS_KEY</span> <span class="hljs-string">}}</span>
        <span class="hljs-attr">aws-region:</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.AWS_REGION</span> <span class="hljs-string">}}</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Sync</span> <span class="hljs-string">spec</span> <span class="hljs-string">to</span> <span class="hljs-string">S3</span>
      <span class="hljs-attr">run:</span> <span class="hljs-string">|</span>
        <span class="hljs-string">aws</span> <span class="hljs-string">s3</span> <span class="hljs-string">sync</span> <span class="hljs-string">./spec</span> <span class="hljs-string">s3://$(aws</span> <span class="hljs-string">ssm</span> <span class="hljs-string">get-parameter</span> <span class="hljs-string">--name</span> <span class="hljs-string">"/catalog/bucket/specs/name"</span> <span class="hljs-string">|</span> <span class="hljs-string">jq</span> <span class="hljs-string">-r</span> <span class="hljs-string">'.Parameter.Value'</span><span class="hljs-string">)/</span>
</code></pre>
<p>The sync triggers the regeneration process and updates the website.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712867590589/b6127e10-d7a2-4a06-9cfd-5129763cbcab.png" alt class="image--center mx-auto" /></p>
<p>The EventBridge rule listens to the default event bus and triggers the catalog CodePipeline when change events are pushed to the bus. The rule, written with AWS CDK, looks as below.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">new</span> Rule(<span class="hljs-built_in">this</span>, <span class="hljs-string">'rule'</span>, {
  eventPattern: {
    source: [<span class="hljs-string">'aws.s3'</span>],
    detailType: [
      <span class="hljs-string">'Object Created'</span>,
      <span class="hljs-string">'Object Deleted'</span>
    ],
    resources: [ props.specsBucket.bucketArn ]
  },
  targets: [ <span class="hljs-keyword">new</span> CodePipeline(pipeline) ]
});
</code></pre>
<p>To deliver S3 event notifications to the EventBridge default bus, the corresponding option must be enabled on the S3 bucket.</p>
<pre><code class="lang-typescript"><span class="hljs-built_in">this</span>.specsBucket = <span class="hljs-keyword">new</span> Bucket(<span class="hljs-built_in">this</span>, <span class="hljs-string">'CatalogSpecsBucket'</span>, {
  objectOwnership: ObjectOwnership.BUCKET_OWNER_ENFORCED,
  removalPolicy: RemovalPolicy.DESTROY,
  autoDeleteObjects: <span class="hljs-literal">true</span>,
  eventBridgeEnabled: <span class="hljs-literal">true</span>
});
</code></pre>
<h2 id="heading-event-catalog-config">Event Catalog Config</h2>
<p>The event catalog uses a generators section in its config file to execute the <a target="_blank" href="https://www.eventcatalog.dev/docs/api/plugins/@eventcatalog/plugin-doc-generator-asyncapi">AsyncAPI plugin</a>. The config looks like the following snippet.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> path = <span class="hljs-built_in">require</span>(<span class="hljs-string">'path'</span>);

<span class="hljs-built_in">module</span>.<span class="hljs-built_in">exports</span> = {
  ... ,
  generators: [
    [
      <span class="hljs-string">'@eventcatalog/plugin-doc-generator-asyncapi'</span>,
      {
        pathToSpec: [
          path.join(__dirname, <span class="hljs-string">'../specs/Order/1.0.0'</span>, <span class="hljs-string">'asyncapi.yaml'</span>)
        ],
        versionEvents: <span class="hljs-literal">false</span>,
        renderNodeGraph: <span class="hljs-literal">true</span>,
        renderMermaidDiagram: <span class="hljs-literal">true</span>,
        domainName: <span class="hljs-string">'Orders System'</span>
      },
    ],
    [
      <span class="hljs-string">'@eventcatalog/plugin-doc-generator-asyncapi'</span>,
      {
        pathToSpec: [
          path.join(__dirname, <span class="hljs-string">'../specs/Product/1.0.0'</span>, <span class="hljs-string">'asyncapi.yaml'</span>)
        ],
        versionEvents: <span class="hljs-literal">false</span>,
        renderNodeGraph: <span class="hljs-literal">true</span>,
        renderMermaidDiagram: <span class="hljs-literal">true</span>,
        domainName: <span class="hljs-string">'Product System'</span>
      },
    ],
    [
      <span class="hljs-string">'@eventcatalog/plugin-doc-generator-asyncapi'</span>,
      {
        pathToSpec: [
          path.join(__dirname, <span class="hljs-string">'../specs/Shipment/1.0.0'</span>, <span class="hljs-string">'asyncapi.yaml'</span>)
        ],
        versionEvents: <span class="hljs-literal">false</span>,
        renderNodeGraph: <span class="hljs-literal">true</span>,
        renderMermaidDiagram: <span class="hljs-literal">true</span>,
        domainName: <span class="hljs-string">'Shipment System'</span>
      },
    ],
  ]
}
</code></pre>
<p>For the moment, this article offers no automated way to sync the generators section of the config, so adding a new service requires a bit of manual effort.</p>
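<p>One way to close this gap, outside the scope of the article's solution and purely a sketch, is to derive the generators array from a list of service names and versions instead of hand-editing it. The <code>buildGenerators</code> helper and the "&lt;name&gt; System" domain-name convention below are assumptions for illustration.</p>

```typescript
// Hypothetical helper (not part of the article's repo): build the
// EventCatalog `generators` array from a list of services, so adding a
// new service no longer requires editing the config by hand. The path
// layout specs/<Service>/<version>/asyncapi.yaml follows the article's
// examples; the "<name> System" domain naming is an assumption.
type GeneratorOptions = {
  pathToSpec: string[];
  versionEvents: boolean;
  renderNodeGraph: boolean;
  renderMermaidDiagram: boolean;
  domainName: string;
};
type GeneratorEntry = [string, GeneratorOptions];

function buildGenerators(
  services: { name: string; version: string }[],
  specsRoot: string,
): GeneratorEntry[] {
  return services.map(({ name, version }) => [
    '@eventcatalog/plugin-doc-generator-asyncapi',
    {
      pathToSpec: [`${specsRoot}/${name}/${version}/asyncapi.yaml`],
      versionEvents: false,
      renderNodeGraph: true,
      renderMermaidDiagram: true,
      domainName: `${name} System`,
    },
  ]);
}
```

A small build-time script could scan the specs folder, call this helper, and write the config before running the catalog generation.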
<h2 id="heading-run-the-solution">Run the solution</h2>
<p>The only step required to trigger the catalog generation process is to push a change to the main branch; this triggers the S3 sync from the local folder.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712870473452/7cfef6ce-ff16-4898-8e7d-573bc91e20e6.png" alt class="image--center mx-auto" /></p>
<p>The S3 sync action produces events that match the EventBridge rule and trigger the catalog pipeline.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712870584320/67796fda-bb78-4825-9f3a-4e5ed7ddc7c9.png" alt class="image--center mx-auto" /></p>
<p>Using the CloudFront distribution URL, the event catalog is online and available.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712870639190/5dbb01f4-a591-4c04-b899-b408f3ee8f59.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Distributed systems are genuinely hard, and with EDA the pain points of distributed systems remain, plus an additional level of complexity that is not visible at the start of the adoption journey but becomes a real obstacle as the system evolves and many services communicate through events.</p>
<p>EDA, a great enabler of agility, can become a brake on agility without a minimum of standards. In this article we walked through a simplified journey of putting standards in place, operating them, and observing them. There are certainly missing parts that vary with company size, culture, and existing tooling, but the idea stays the same.</p>
<p>Enjoy reading</p>
]]></content:encoded></item><item><title><![CDATA[Contextualised and Responsibilized Eventing on AWS]]></title><description><![CDATA[The adoption of event-driven architecture became a real challenge of many enterprises since some years and lately raised this adoption bar. The companies achieve the principal pillars of a well architected system by relying on decoupled and asynchron...]]></description><link>https://blogs.serverlessfolks.com/contextualised-eventing-on-aws</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/contextualised-eventing-on-aws</guid><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Wed, 06 Mar 2024 23:06:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714480027875/e4a048be-d0fb-4567-82ef-1627b244eed1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The adoption of event-driven architecture has been a real challenge for many enterprises for several years, and the bar for adoption has recently been raised. Companies achieve the principal pillars of a well-architected system by relying on decoupled, asynchronous service communication in real time to keep the overall system state consistent. This approach addresses important bottlenecks of traditional distributed systems such as reliability, scalability, and performance.</p>
<p>The adoption of microservices was a finer-grained improvement in distributed systems, but while those fine-grained services achieved localized scalability, reliability, and performance, they could not fix the overall system experience. Cascading latencies, cascading failures, and cascading downtimes were still present and hard to resolve. The microservice design approach still relied on request/response communication over an unreliable network, with all its potential drawbacks for the system experience.</p>
<p>The real distributed-system problems sat at the communication boundaries, and putting a huge amount of effort into making every service scalable at a 10x higher rate seemed overkill. Resolving those challenges meant putting all the puzzle pieces together and finding where local problems propagate across the whole system; the answer was the request/response lines.</p>
<p>The simplest part of these distributed-system complexities is achieving the localized improvements; the hardest part is solving the communication problems.</p>
<h2 id="heading-event-driven-challenges">Event Driven Challenges</h2>
<p>Event-driven design is a distributed design and, like traditional distributed and microservice designs, it has complexities to solve. Solving the distributed bottlenecks mentioned above only brought software to a locally optimized state, while the synchronous communication bottlenecks remained an obstacle. Adopting EDA improves those communication bottlenecks but adds some inter-component and localized complexities.</p>
<p>EDA's biggest challenges, when it comes to overall system state and inter-component communication, are state consistency, delayed state propagation, and duplication.</p>
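<p>Duplication in particular pushes every consumer toward idempotency: since delivery is typically at least once, a consumer must recognize events it has already processed. A minimal sketch of that idea follows (in production the "seen" store would be persistent, e.g. a conditional write to a database, not an in-memory set):</p>

```typescript
// Minimal idempotent-consumer sketch: deduplicate at-least-once deliveries
// by remembering processed event ids. The in-memory Set is an assumption
// for illustration; a real service would use a durable store.
interface DomainEvent { id: string; type: string; payload: unknown }

class IdempotentConsumer {
  private seen = new Set<string>();
  constructor(private handler: (e: DomainEvent) => void) {}

  // Returns true when the event was processed, false when it was a duplicate.
  handle(event: DomainEvent): boolean {
    if (this.seen.has(event.id)) return false; // duplicate: skip
    this.seen.add(event.id);
    this.handler(event);
    return true;
  }
}
```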
<h2 id="heading-event-streaming">Event Streaming</h2>
<p>Event streaming is an operational approach relying on a series of events representing small changes in different services. It helps resolve the consistency problem, the most important system-level challenge in EDA. The delay and duplication challenges mentioned earlier stay localized at the service level and must be handled by each context owner.</p>
<p>Event streaming on AWS can be achieved using Kinesis Data Streams as a pull-based solution, with its own pros and cons.</p>
<p>Here is an example figure showing consumers pulling from Kinesis:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1706051356694/90cd2def-58f7-482e-bc49-01281b099bf4.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>The producer writes records to the stream</p>
</li>
<li><p>The Lambda event source mapping pulls the records</p>
</li>
</ul>
<p>Some details :</p>
<ul>
<li><p>The ESM polls the Kinesis data stream once per second</p>
</li>
<li><p>Data Streams guarantees ordering per partition key</p>
</li>
<li><p>Kinesis allows 2 MB/s of read throughput and 5 read transactions per second per shard, shared by all consumers.</p>
</li>
<li><p>These limits are per shard; adding new shards can be a solution</p>
</li>
</ul>
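<p>These per-shard limits translate directly into capacity math: with shared throughput, every consumer reads the full stream, so read demand grows linearly with the consumer count. A rough sizing sketch, assuming uniform traffic and no enhanced fan-out:</p>

```typescript
// Rough sizing sketch for a shared-throughput Kinesis stream: each shard
// accepts 1 MB/s of writes and serves 2 MB/s of reads shared by all
// consumers, so with N consumers the read demand is N times the write rate.
// Uniform partition-key distribution is assumed.
function requiredShards(writeMBps: number, consumers: number): number {
  const readDemandMBps = writeMBps * consumers; // every consumer reads everything
  const byWrite = Math.ceil(writeMBps / 1);     // 1 MB/s write limit per shard
  const byRead = Math.ceil(readDemandMBps / 2); // 2 MB/s read limit per shard
  return Math.max(1, byWrite, byRead);
}
```

For the test below (1 MB/s of writes, 15 consumers), read demand alone forces 8 shards, which shows why naive fan-out over a shared-throughput stream gets expensive quickly.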
<p>Let's add some new consumers and run a test to see what happens. Imagine we scale up to 15 consumers (15 seems enough for this test).</p>
<p>Creating a Kinesis data stream and 15 consumers using CDK:</p>
<pre><code class="lang-typescript">
    <span class="hljs-keyword">const</span> functionRole = <span class="hljs-keyword">new</span> Role(<span class="hljs-built_in">this</span>, <span class="hljs-string">'FunctionRole'</span>, {
      assumedBy: <span class="hljs-keyword">new</span> ServicePrincipal(<span class="hljs-string">'lambda.amazonaws.com'</span>),
      managedPolicies: [
        ManagedPolicy.fromAwsManagedPolicyName(<span class="hljs-string">'service-role/AWSLambdaBasicExecutionRole'</span>)
      ]
    });

    <span class="hljs-keyword">const</span> dataStream = <span class="hljs-keyword">new</span> Stream(<span class="hljs-built_in">this</span>, <span class="hljs-string">'DataStream'</span>, {
      shardCount: <span class="hljs-number">1</span>,
    });

    <span class="hljs-keyword">for</span>(<span class="hljs-keyword">let</span> i = <span class="hljs-number">0</span>; i &lt; <span class="hljs-number">15</span>; i++) {
      <span class="hljs-keyword">const</span> functionName = <span class="hljs-string">`function-<span class="hljs-subst">${i}</span>`</span>;
      <span class="hljs-keyword">const</span> logGroup = <span class="hljs-keyword">new</span> LogGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">`Function-<span class="hljs-subst">${i}</span>-LogGroup`</span>, {
        logGroupName: <span class="hljs-string">`/aws/lambda/<span class="hljs-subst">${functionName}</span>`</span>,
        ....
      });

      <span class="hljs-keyword">const</span> consumerFunction = <span class="hljs-keyword">new</span> NodejsFunction(<span class="hljs-built_in">this</span>, <span class="hljs-string">`Function-<span class="hljs-subst">${i}</span>`</span>, {
        .....
      });
      consumerFunction.addEventSource(<span class="hljs-keyword">new</span> KinesisEventSource(dataStream, {
        startingPosition: StartingPosition.LATEST,
        batchSize: <span class="hljs-number">10</span>,
        bisectBatchOnError: <span class="hljs-literal">true</span>,
        retryAttempts: <span class="hljs-number">3</span>
      }));
    }

    dataStream.grantRead(functionRole);

  }
</code></pre>
<p>Now it is interesting to see how this solution behaves at build time and at runtime.</p>
<p><strong>Build time:</strong></p>
<ul>
<li><p>During deployment you will randomly face event source mapping errors as you hit some small limits, but in the long term this is not a problem</p>
</li>
<li><p>If you have a batch of related records, all consumers implement the same logic to consume them, evaluate the last state, and process those events. Again, this may not be a problem at a given time, but think of introducing a new Pending event state between Created and Validated: every consumer then needs to revise its state-evaluation logic.</p>
</li>
</ul>
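<p>The duplicated "evaluate the last state" logic each consumer carries can be pictured as a small reduction over a batch of related events; introducing the new Pending state means touching the precedence list in every copy. A minimal sketch, with the state names taken from the example above and everything else assumed:</p>

```typescript
// Sketch of the state-evaluation logic each consumer duplicates: pick the
// most advanced state seen in a batch of related events. Adding a new
// "Pending" state means updating this precedence list in every consumer.
const STATE_ORDER = ['Created', 'Pending', 'Validated'] as const;
type OrderState = typeof STATE_ORDER[number];

function latestState(
  batch: { orderId: string; state: OrderState }[],
): OrderState | undefined {
  return batch
    .map((e) => e.state)
    .sort((a, b) => STATE_ORDER.indexOf(a) - STATE_ORDER.indexOf(b))
    .pop(); // the most advanced state, or undefined for an empty batch
}
```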
<p><strong>Run time:</strong></p>
<ul>
<li><p>From a latency point of view, throttling is absolutely possible given the known Kinesis Data Streams limits (5 reads per second per shard, one ESM poll per second per consumer); when throttling happens, consumer polling is delayed.</p>
</li>
<li><p>A single malformed record can act as a poison pill and stop the sequence of records being consumed.</p>
</li>
<li><p>The DLQ does not contain the record data; once the record retention period has passed, the event is completely lost.</p>
</li>
<li><p>If two consumers communicate with each other after initially processing an event from Kinesis, the throttling and latency delays described above can cause problems or add extra communication complexity.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1706706157787/111d21be-2862-4928-b640-370ceedf0537.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
<h3 id="heading-broadcasting">Broadcasting</h3>
<p>Event broadcasting is another way of distributing events in an event-driven system; on AWS this can be achieved using Amazon SNS or an EventBridge bus.</p>
<p>The following figure represents the event broadcasting using SNS</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1706836484797/6d89a9ea-b438-4019-8d56-1986a552b9cc.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>The producer publishes events to the topic</p>
</li>
<li><p>SNS sends a copy of each event to all subscribers</p>
</li>
<li><p>SNS can distribute events to different types of subscribers, such as Lambda, SQS, HTTP/S, SMS, mobile platforms, and email.</p>
</li>
</ul>
<p>Some details :</p>
<ul>
<li><p>SNS standard topics guarantee at-least-once delivery and do not guarantee ordering</p>
</li>
<li><p>SNS FIFO topics guarantee ordering and partitioning of groups of related events.</p>
</li>
<li><p>The SNS quota allows 100 TPS for the subscribe action per account</p>
</li>
<li><p>SNS has a 256KB event size limit</p>
</li>
<li><p>SNS FIFO supports deduplication</p>
</li>
</ul>
<p>Creating the above design using CDK:</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">const</span> functionRole = <span class="hljs-keyword">new</span> Role(<span class="hljs-built_in">this</span>, <span class="hljs-string">'FunctionRole'</span>, {
      assumedBy: <span class="hljs-keyword">new</span> ServicePrincipal(<span class="hljs-string">'lambda.amazonaws.com'</span>),
      managedPolicies: [
        ManagedPolicy.fromAwsManagedPolicyName(<span class="hljs-string">'service-role/AWSLambdaBasicExecutionRole'</span>)
      ]
    });
    <span class="hljs-keyword">const</span> topic = <span class="hljs-keyword">new</span> Topic(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Topic'</span>);

    <span class="hljs-keyword">for</span>(<span class="hljs-keyword">let</span> i = <span class="hljs-number">0</span>; i &lt; <span class="hljs-number">15</span>; i++) {
      <span class="hljs-keyword">const</span> functionName = <span class="hljs-string">`function-<span class="hljs-subst">${i}</span>`</span>;
      <span class="hljs-keyword">const</span> logGroup = <span class="hljs-keyword">new</span> LogGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">`Function-<span class="hljs-subst">${i}</span>-LogGroup`</span>, {
        logGroupName: <span class="hljs-string">`/aws/lambda/<span class="hljs-subst">${functionName}</span>`</span>,
        ...
      });

      <span class="hljs-keyword">const</span> consumerFunction = <span class="hljs-keyword">new</span> NodejsFunction(<span class="hljs-built_in">this</span>, <span class="hljs-string">`Function-<span class="hljs-subst">${i}</span>`</span>, {
        ...
      });
      consumerFunction.addEventSource(<span class="hljs-keyword">new</span> SnsEventSource(topic, {}));
    }
</code></pre>
<p><strong>Build time:</strong></p>
<ul>
<li><p>During deployment you will randomly face subscription errors if the number of consumers hits the limits, but in the long term this is not a problem</p>
</li>
<li><p>As with Kinesis Data Streams earlier, consumers can end up duplicating the event-evaluation logic, but with SNS this is harder on the consumer side because SNS distributes events one by one with no batching support. With a new Pending event state between Created and Validated, all consumers need to revise their state-evaluation logic, but to execute that logic they must first persist the events and then fetch them in batches to evaluate.</p>
</li>
<li><p>SQS can subscribe to the SNS topic, providing a serverless and managed way of offloading, batching, and temporary storage.</p>
</li>
</ul>
<p><strong>Run time:</strong></p>
<ul>
<li><p>The SNS event distribution is fast and near realtime.</p>
</li>
<li><p>The events are distributed based on filter policy if configured</p>
</li>
<li><p>Failures are retried with exponential backoff for up to 23 days if the subscriber is not reachable.</p>
</li>
<li><p>Each subscription can have a DLQ</p>
</li>
<li><p>Distribution latency remains near real time even with a significant number of subscribers.</p>
</li>
</ul>
<p>EventBridge is a powerful, highly scalable, managed service that also enables event broadcasting.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707692340705/adea1d64-3d6d-4ac9-b3cb-fe720ade7ced.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>The producer puts an event onto the bus</p>
</li>
<li><p>The event bus sends a copy of the event to each subscriber via rules</p>
</li>
<li><p>EventBridge supports a wide variety of services as rule targets, such as API Gateway, API destinations, SQS, Kinesis Data Streams, Lambda (async), AppSync, and more.</p>
</li>
</ul>
<p>Some details :</p>
<ul>
<li><p>EventBridge provides at-least-once delivery and does not guarantee ordering</p>
</li>
<li><p>The EventBridge quota allows 18,750 TPS for rule target invocations; beyond that, throttling causes delays in event distribution.</p>
</li>
<li><p>EventBridge has a limit of 100 rules per bus</p>
</li>
<li><p>EventBridge allows up to 10,000 TPS for putting events onto the bus</p>
</li>
</ul>
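<p>A rule's event pattern is essentially a subset match on the event envelope: each pattern key lists allowed values, and the event must satisfy every key. A simplified matcher sketch follows (exact-value arrays only; real EventBridge also supports prefix, numeric, anything-but, and nested matching):</p>

```typescript
// Simplified sketch of EventBridge pattern matching: each pattern key maps
// to an array of allowed values, and the event matches when every key of
// the pattern is present in the event with one of the allowed values.
type Pattern = Record<string, string[]>;
type BusEvent = Record<string, string>;

function matchesPattern(event: BusEvent, pattern: Pattern): boolean {
  return Object.entries(pattern).every(
    ([key, allowed]) => key in event && allowed.includes(event[key]),
  );
}
```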
<p>Creating the above design using CDK:</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">const</span> functionRole = <span class="hljs-keyword">new</span> Role(<span class="hljs-built_in">this</span>, <span class="hljs-string">'FunctionRole'</span>, {
      assumedBy: <span class="hljs-keyword">new</span> ServicePrincipal(<span class="hljs-string">'lambda.amazonaws.com'</span>),
      managedPolicies: [
        ManagedPolicy.fromAwsManagedPolicyName(<span class="hljs-string">'service-role/AWSLambdaBasicExecutionRole'</span>)
      ]
    });
    <span class="hljs-keyword">const</span> eventbus = <span class="hljs-keyword">new</span> EventBus(<span class="hljs-built_in">this</span>, <span class="hljs-string">'EventBus'</span>);

    <span class="hljs-keyword">for</span>(<span class="hljs-keyword">let</span> i = <span class="hljs-number">0</span>; i &lt; <span class="hljs-number">15</span>; i++) {
      <span class="hljs-keyword">const</span> functionName = <span class="hljs-string">`function-<span class="hljs-subst">${i}</span>`</span>;
      <span class="hljs-keyword">const</span> logGroup = <span class="hljs-keyword">new</span> LogGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">`Function-<span class="hljs-subst">${i}</span>-LogGroup`</span>, {
        logGroupName: <span class="hljs-string">`/aws/lambda/<span class="hljs-subst">${functionName}</span>`</span>,
        ...
      });

      <span class="hljs-keyword">const</span> consumerFunction = <span class="hljs-keyword">new</span> NodejsFunction(<span class="hljs-built_in">this</span>, <span class="hljs-string">`Function-<span class="hljs-subst">${i}</span>`</span>, {
        ...
      });

      <span class="hljs-keyword">new</span> Rule(<span class="hljs-built_in">this</span>, <span class="hljs-string">`EventBridgeLambdaInvokeRule-<span class="hljs-subst">${i}</span>`</span>, {
        eventBus: eventbus,
        eventPattern: {
          detailType: [<span class="hljs-string">'my.custom.detailtype'</span>],
        },
        targets: [ <span class="hljs-keyword">new</span> LambdaFunction(consumerFunction) ]
      }); 
    }
</code></pre>
<p><strong>Build time:</strong></p>
<ul>
<li><p>The duplicated-logic problem persists for event evaluation; the complexity is the same as with SNS, since events are distributed one by one with no batching support.</p>
</li>
<li><p>SQS can be a rule target, providing a serverless and managed way of offloading, batching, and temporary storage.</p>
</li>
</ul>
<p><strong>Run time:</strong></p>
<ul>
<li><p>The events are distributed fast</p>
</li>
<li><p>The events are distributed based on filter policy if configured</p>
</li>
<li><p>Failures are retried with exponential backoff, for up to 24 hours and up to 185 attempts, if the target is not reachable.</p>
</li>
<li><p>Each rule target can have a DLQ, and each rule can have up to 5 targets.</p>
</li>
<li><p>The latency and performance of distribution is near realtime.</p>
</li>
</ul>
<h2 id="heading-responsibilities-in-design">Responsibilities in Design</h2>
<p>When designing an event-driven architecture, the requirements differ based on the business needs and the rate of change of the entities the application manages. The relevant questions to ask and brainstorm are roughly as follows:</p>
<ul>
<li><p>The rate of change of a single entity</p>
</li>
<li><p>The variety of event types</p>
</li>
<li><p>The relation or ordered priority between event types</p>
</li>
<li><p>Who are the consumers?</p>
</li>
<li><p>What does a consumer care about?</p>
</li>
<li><p>Is the consumer in the same cell or business domain?</p>
</li>
</ul>
<p>Defining each software component's responsibilities and how it provides value to other participants in a distributed system is an important pillar of any design; it also defines how we operate.</p>
<h2 id="heading-ecommerce-example">Ecommerce Example</h2>
<p>The example provides a simple scenario to demonstrate a distributed system and how responsibilities around events are shared between different applications.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707780105020/0ec229d5-af6f-4d58-91d7-6f4137940354.png" alt class="image--center mx-auto" /></p>
<p>The operational process is as illustrated, from order arrival to product delivery. From an operational perspective, this design clearly works.</p>
<p>Zooming in on the Product and Order services reveals more details.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707781518263/34b39f23-4e24-4a6f-b1c7-29093a60f0c3.png" alt class="image--center mx-auto" /></p>
<p>At this level, the e-commerce website process is:</p>
<ul>
<li><p>The website shows the products listing</p>
</li>
<li><p>An order is sent from the website for the selected products</p>
</li>
<li><p>The Order service notifies the Product service on reception of an order</p>
</li>
<li><p>The Product service sends a reply to confirm the availability of the products</p>
</li>
</ul>
<p>Looking at Order and Product communication.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707782353558/2b4f3f83-0473-4334-acdb-3180d688c591.png" alt class="image--center mx-auto" /></p>
<p>The Order service disseminates order events and listens for the product-availability response.</p>
<ul>
<li><p>On reception of the <strong><em>order.recieved</em></strong> event, the Product service verifies product availability and updates the product stock</p>
</li>
<li><p>The Product service sends the <strong><em>product.available</em></strong> response message to notify the Order service of the availability of the products in the order</p>
</li>
<li><p>The Order service updates the order status to <strong>IN_PROGRESS</strong></p>
</li>
</ul>
<p>The product lifecycle will be:</p>
<ul>
<li><p>The sellers add products via backoffice</p>
</li>
<li><p>The product service sends events to listing service</p>
</li>
<li><p>The listing service updates its local datasource to respond to website listing and search options.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707784557342/6bc99154-544c-4234-9c97-65b5d377908c.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-edge-cases">Edge cases</h3>
<ul>
<li><p>If the user modifies the order by removing an item, the order system needs to notify the product service to avoid the stock being exhausted. This is achievable by sending a new <strong><em>order.modified</em></strong> event, letting the product service return the canceled product count to the stock. A better approach, however, is to send a meaningful event behind a real fact: the order service is responsible for verifying the change in the modified order and sending a corresponding event to the product service, such as <strong><em>order.product_canceled</em></strong>, carrying the product id and the original count.</p>
</li>
<li><p>If the user sends the creation and modification requests within a very short period of time, distributing both events to the product service can be wasteful; worse, imagine the modification arriving at the product service before the creation. Clearly this situation brings a lot of inconsistency to the product service.</p>
</li>
</ul>
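<p>One way to defuse the "modification arrives before creation" race, purely a hypothetical sketch, is a per-order sequence guard on the consumer side: apply an event only when it is the next expected one, and buffer out-of-order arrivals until their predecessors show up.</p>

```typescript
// Sketch of a per-order sequencing guard: events carry a monotonically
// increasing sequence number, and out-of-order arrivals are buffered until
// their predecessors arrive. Field names are illustrative assumptions.
interface OrderEvent { orderId: string; seq: number; type: string }

class OrderedApplier {
  private nextSeq = new Map<string, number>();
  private buffer = new Map<string, OrderEvent[]>();
  applied: OrderEvent[] = [];

  receive(event: OrderEvent): void {
    const expected = this.nextSeq.get(event.orderId) ?? 1;
    if (event.seq !== expected) {
      const pending = this.buffer.get(event.orderId) ?? [];
      pending.push(event);
      this.buffer.set(event.orderId, pending);
      return; // hold until predecessors arrive
    }
    this.applied.push(event);
    this.nextSeq.set(event.orderId, expected + 1);
    // drain any buffered successors that are now in order
    const pending = (this.buffer.get(event.orderId) ?? [])
      .sort((a, b) => a.seq - b.seq);
    this.buffer.set(event.orderId, []);
    for (const e of pending) this.receive(e);
  }
}
```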
<h3 id="heading-responsibilities">Responsibilities</h3>
<ul>
<li><p>The Order service must guarantee healthy order reception, handle change tracking, and notify the Product service with the highest level of trust. This means the Order service must be able to react to a stream of commands and take the best decision internally before distributing events.</p>
</li>
<li><p>The Product service must be able to delay distribution, or merge or abandon an event, depending on its current internal state.</p>
</li>
</ul>
<h3 id="heading-using-right-services">Using right services</h3>
<p>To achieve the trade-offs discussed in the edge cases, and to avoid forcing event consumers to take decisions outside their domain context, choosing the right services and design is important and in some cases vital.</p>
<p>The following diagram shows how the Order service can adopt a design that covers those trade-offs and becomes a self-sufficient microservice.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707785525094/c2826ef5-3f2c-4ff2-8637-4c635c304fc8.png" alt class="image--center mx-auto" /></p>
<p>In this design the Order service treats orders as a stream. This lets it internally use techniques like batching to inspect a batch of related events and decide whether any event state must be recalculated or refined before broadcasting notifications or integration events to other systems.</p>
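<p>That merge-or-abandon decision can be sketched as collapsing a batch of changes per order into at most one meaningful event before broadcasting. The action names and the abandon rule (a create canceled within the same batch nets out to nothing) are illustrative assumptions:</p>

```typescript
// Sketch of collapsing a batch of order changes into one integration event
// per order before broadcasting. A create followed by a cancel in the same
// batch is abandoned entirely; action names are illustrative assumptions.
interface OrderChange { orderId: string; action: 'created' | 'modified' | 'canceled' }

function collapseBatch(changes: OrderChange[]): OrderChange[] {
  // group the batch by order id, preserving arrival order
  const byOrder = new Map<string, OrderChange[]>();
  for (const c of changes) {
    const list = byOrder.get(c.orderId) ?? [];
    list.push(c);
    byOrder.set(c.orderId, list);
  }
  const out: OrderChange[] = [];
  byOrder.forEach((list) => {
    const created = list.some((c) => c.action === 'created');
    const canceled = list.some((c) => c.action === 'canceled');
    if (created && canceled) return; // abandon: net effect is nothing downstream
    out.push(list[list.length - 1]); // refine: emit only the last meaningful change
  });
  return out;
}
```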
<h1 id="heading-source-code">Source Code</h1>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-event-streaming/">https://github.com/XaaXaaX/aws-event-streaming/</a></div>
<p> </p>
<h1 id="heading-summary">Summary</h1>
<p>In this article I tried to show some characteristics of EventBridge, SNS, and Kinesis, which are heavily used for event distribution, and to present their detailed limits so you can better see the trade-offs when choosing the best service, or a combination of them.</p>
<p>Walking through the examples, some representative scenarios were explored to better understand the core problems behind the technical considerations and to help align bounded contexts, software, and their responsibilities at the enterprise level.</p>
<p>Thinking about the responsibilities each service has in a distributed design brings long-term advantages and keeps the software contextualized and well mastered around a single business problem, avoiding the technical debt that comes from poorly defined contexts and responsibilities.</p>
]]></content:encoded></item><item><title><![CDATA[AWS Bedrock - Suspicious message detection]]></title><description><![CDATA[The main role of any IT department is to enhance business capacity and make a real impact when dealing with technologies. As an internet user, I often come across various e-commerce platforms and use them to buy different items, whether it's a T-Shir...]]></description><link>https://blogs.serverlessfolks.com/aws-bedrock-suspicious-message-detection</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/aws-bedrock-suspicious-message-detection</guid><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Wed, 06 Mar 2024 23:06:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714487290540/866f6172-05f3-491e-9018-b76c5bd058cc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The main role of any IT department is to enhance business capacity and make a real impact when dealing with technologies. As an internet user, I often come across various e-commerce platforms and use them to buy different items, whether it's a T-Shirt, fryer, car or even a house. Regardless of the product, I am always a buyer, and the person on the other end of the transaction is the seller.</p>
<p>You are looking for some products on DealOfDay and find one that you are interested in buying. You reach out to the seller either by asking for more details or by expressing your interest in the product. DealOfDay provides a messaging feature to facilitate communication between buyers and sellers. However, some customers have reported receiving suspicious messages from sellers, which could potentially be fraudulent. To prevent such problems, the DealOfDay product team has decided to implement real-time fraud detection to better protect the system and its users.</p>
<h2 id="heading-design">Design</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708560174347/a1eb651c-059c-4a00-a196-8b8fc9ba8ca0.png" alt class="image--center mx-auto" /></p>
<p>The starting point is an Apigateway that connects directly with the conversation Dynamodb table. If you want to learn more about direct integrations and how API gateway interacts with other AWS services like DynamoDb, I previously wrote about those integrations and their behavior. You can find more information about it ( <a target="_blank" href="https://serverlessfolks.hashnode.dev/adaptable-scaling-using-api-gateway-direct-integration-part1">Here</a> ).</p>
<h2 id="heading-the-source-code-here"><strong><em>The source code here</em></strong></h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-bedrock-conversation-fraud-detection">https://github.com/XaaXaaX/aws-bedrock-conversation-fraud-detection</a></div>
<p> </p>
<h2 id="heading-triggering-state-machine">Triggering State Machine</h2>
<p>Using EventBridge Pipes to integrate DynamoDB Streams with Step Functions is as easy as the following snippet:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">new</span> CfnPipe(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ConversationTablePipe'</span>, {
      roleArn: pipeRole.roleArn,
      source: <span class="hljs-built_in">this</span>.conversationTable.Table.tableStreamArn!,
      sourceParameters: {
        dynamoDbStreamParameters: {
          startingPosition: <span class="hljs-string">'LATEST'</span>,
        },
        filterCriteria: {
          filters: [{
            pattern: <span class="hljs-string">`{ 
              "eventName": [ "INSERT" ] 
            }`</span>}]
        }
      },
      target: <span class="hljs-built_in">this</span>.stateMachineStack.stateMachine.stateMachineArn,
      targetParameters: {
        stepFunctionStateMachineParameters: {
          invocationType: <span class="hljs-string">'FIRE_AND_FORGET'</span>,
        },
        inputTemplate: <span class="hljs-string">`
          {
            "Id": &lt;$.dynamodb.NewImage.Id.S&gt;, 
            "message": &lt;$.dynamodb.NewImage.message.S&gt;,
            "timestamp": &lt;$.dynamodb.NewImage.timestamp.S&gt;
          }`</span>
      }
    });
</code></pre>
<p>The pipe reads from DDB streams and invokes the state machine asynchronously.</p>
<h2 id="heading-conversation-history">Conversation History</h2>
<p>The solution receives messages one by one. Handling messages individually is interesting, but since a conversation needs to keep track of a contextual exchange, we need to store all messages and give that context back to Bedrock. This helps prepare a better prompt and leads the LLM to provide a better result.</p>
<p>The DynamoDB table keeps track of all messages in a conversation by using a conversation Id as the partition key, alongside the message and its submission timestamp.</p>
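<p>A minimal CDK sketch of such a table (construct names are assumed; the repository linked above has the exact definition). Streams must be enabled so the pipe can react to inserts:</p>
<pre><code class="lang-typescript">import { RemovalPolicy } from 'aws-cdk-lib';
import { AttributeType, StreamViewType, Table } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';

declare const scope: Construct;

// Hypothetical sketch of the conversation table: conversation Id as partition key,
// message timestamp as sort key, with streams enabled to feed the pipe.
const conversationTable = new Table(scope, 'ConversationTable', {
  partitionKey: { name: 'Id', type: AttributeType.STRING },
  sortKey: { name: 'timestamp', type: AttributeType.STRING },
  stream: StreamViewType.NEW_IMAGE, // INSERT events carry NewImage to the pipe
  removalPolicy: RemovalPolicy.DESTROY,
});
</code></pre>
<p>With the timestamp as sort key, the workflow's <code>Query</code> with <code>ScanIndexForward</code> returns the conversation in chronological order.</p>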
<div class="hn-table">
<table>
<thead>
<tr>
<td>PK</td><td>Timestamp</td><td>Message</td></tr>
</thead>
<tbody>
<tr>
<td>o10TkLBnRjF9sNx65xaRB</td><td>2024-02-22T19:40:03+00:00</td><td>Hello, i m interested and would like to schedule a visit, when is your preferred schedule for next week?</td></tr>
</tbody>
</table>
</div><h2 id="heading-the-workflow">The workflow</h2>
<p>The workflow consists of the following steps:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708635969583/045458d2-4be9-4142-8161-9c4a3c50a770.png" alt class="image--center mx-auto" /></p>
<p>The Workflow can be described as below</p>
<ul>
<li><p>Query DDB Table per ConversationId</p>
</li>
<li><p>Fetch Prompt Markdown from S3 bucket</p>
</li>
<li><p>Transform The prompt using a Pass State</p>
</li>
<li><p>Call Bedrock Model to get results</p>
</li>
</ul>
<p>The following snippet represents the workflow implementation using AWS CDK:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> definition = <span class="hljs-keyword">new</span> CustomState(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Query Conversation'</span>, {
      stateJson: {
        Type: <span class="hljs-string">'Task'</span>,
        Resource: <span class="hljs-string">"arn:aws:states:::aws-sdk:dynamodb:query"</span>,
        Parameters: {
          TableName: props.Table.tableName,
          ScanIndexForward: <span class="hljs-literal">true</span>,
          KeyConditionExpression: <span class="hljs-string">`Id = :id`</span>,
          ExpressionAttributeValues: {
            <span class="hljs-string">":id"</span>: {
              <span class="hljs-string">"S.$"</span>: JsonPath.stringAt(<span class="hljs-string">'$[0].Id'</span>)
            }
          }
        },
        ResultSelector: {
          <span class="hljs-string">'messages.$'</span>: <span class="hljs-string">'$.Items'</span>
        },
        ResultPath: <span class="hljs-string">'$'</span>
      }
    }).next(<span class="hljs-keyword">new</span> CustomState(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Recap Conversation'</span>, {
      ....
      }
    })).next(<span class="hljs-keyword">new</span> CustomState(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Prompt Preparation'</span>, {
      stateJson: {
        Type: <span class="hljs-string">'Task'</span>,
        Resource: <span class="hljs-string">"arn:aws:states:::aws-sdk:s3:getObject"</span>,
        Parameters: {
          Bucket: props.Bucket.bucketName,
          Key: <span class="hljs-string">"prompt.txt"</span>
        },
        ResultSelector: {
          <span class="hljs-string">'body.$'</span>: <span class="hljs-string">'$.Body'</span>
        },
        ResultPath: <span class="hljs-string">'$.prompt'</span>
      }
    })).next(<span class="hljs-keyword">new</span> Pass(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Format Prompt'</span>, {
      parameters: {
        <span class="hljs-string">"output.$"</span>: <span class="hljs-string">"States.Format($.prompt.body, $.messages)"</span>
      }
    })).next(<span class="hljs-keyword">new</span> BedrockInvokeModel(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Invoke Model With Prompt'</span>, {
      contentType: <span class="hljs-string">"application/json"</span>,
      model: {
        modelArn: props.modelArn,
      },
      body: TaskInput.fromObject(
        {
          inputText: JsonPath.stringAt(<span class="hljs-string">'$.output'</span>),
        },
      ),
    }));
</code></pre>
<h2 id="heading-run-the-solution">Run the Solution</h2>
<p>After deploying the solution, we can call the API Gateway endpoint, providing a message, a conversation id, and a message timestamp.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"conversationid"</span>: <span class="hljs-string">"Qt-K5wvjjF4O4m-qSYWtW"</span>,
    <span class="hljs-attr">"timestamp"</span>: <span class="hljs-string">"2024-02-22T19:40:03+00:00"</span>,
    <span class="hljs-attr">"message"</span>: <span class="hljs-string">"Buyer: Hello i am interested in your house, can we fix a visit?"</span>
}
</code></pre>
<p>Sending the request will trigger the Step Functions workflow.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708636376952/4cfb9ff3-18df-464e-a25f-b9fa4bbcb6f5.png" alt class="image--center mx-auto" /></p>
<p>These are the complete workflow steps from start to end, resulting in a suspicion rating of 0 or 1.</p>
<h2 id="heading-the-prompt">The Prompt</h2>
<p>This is a really simple part of this example; defining a prompt for this use case was not too hard. The prompt is a predefined paragraph that includes a variable placeholder.</p>
<pre><code class="lang-yaml">You are a conversation moderator on the DeadOfDay site and you are monitoring the following conversation between these two people: ' {} ', Rate me the suspicious nature of this conversation based on the messages exchanged. Only answer me with 1 if the conversation seems suspicious to you and 0 otherwise.
</code></pre>
<p>You can find the '<strong>{}</strong>' placeholder in the middle of the prompt; this is how we let the ASL replace it with the corresponding conversation.</p>
<pre><code class="lang-yaml">States.Format($.prompt.body, $$.Execution.Input[0].message)
</code></pre>
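<p>The <strong>States.Format</strong> intrinsic simply substitutes each '{}' in the template with the corresponding argument, left to right. A rough TypeScript analogue of that behavior (purely illustrative; Step Functions does this natively):</p>
<pre><code class="lang-typescript">// Illustrative stand-in for the States.Format intrinsic: replaces each '{}'
// placeholder, left to right, with the next argument.
function statesFormat(template: string, ...args: unknown[]): string {
  let result = template;
  for (const arg of args) {
    // String.replace with a string pattern replaces only the first occurrence
    result = result.replace('{}', String(arg));
  }
  return result;
}

const promptTemplate = "You are a conversation moderator monitoring: ' {} ', rate it 0 or 1.";
const conversation = 'Buyer: Hello, can we fix a visit?';
const formatted = statesFormat(promptTemplate, conversation);
</code></pre>
<p>After formatting, the conversation text sits where the placeholder was, ready to be sent as <code>inputText</code>.</p>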
<h2 id="heading-bedrock-results">Bedrock results</h2>
<p>Sending the prompt to Bedrock is simple using the Step Functions service integration; the state machine definition will be as below.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"End"</span>: <span class="hljs-literal">true</span>,
  <span class="hljs-attr">"Type"</span>: <span class="hljs-string">"Task"</span>,
  <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:states:::bedrock:invokeModel"</span>,
  <span class="hljs-attr">"Parameters"</span>: {
    <span class="hljs-attr">"ModelId"</span>: <span class="hljs-string">"arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-lite-v1"</span>,
    <span class="hljs-attr">"ContentType"</span>: <span class="hljs-string">"application/json"</span>,
    <span class="hljs-attr">"Body"</span>: {
      <span class="hljs-attr">"inputText.$"</span>: <span class="hljs-string">"$.output"</span>
    }
  }
}
</code></pre>
<p>Based on our prompt, calling Bedrock will return 0 or 1 to answer <strong>'Is it suspicious?'</strong>: 1 means yes, and 0 means no.</p>
<p>Here is the Bedrock response for the test message shown above.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Body"</span>: {
    <span class="hljs-attr">"inputTextTokenCount"</span>: <span class="hljs-number">77</span>,
    <span class="hljs-attr">"results"</span>: [
      {
        <span class="hljs-attr">"tokenCount"</span>: <span class="hljs-number">2</span>,
        <span class="hljs-attr">"outputText"</span>: <span class="hljs-string">" 0"</span>,
        <span class="hljs-attr">"completionReason"</span>: <span class="hljs-string">"FINISH"</span>
      }
    ]
  },
  <span class="hljs-attr">"ContentType"</span>: <span class="hljs-string">"application/json"</span>
}
</code></pre>
<p>The <strong>outputText</strong> represents the LLM answer.</p>
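<p>Since the model is instructed to answer with a bare 0 or 1, downstream code only needs to trim and compare <strong>outputText</strong>. A small defensive sketch (the response shape follows the Titan payload shown above; the function name is my own):</p>
<pre><code class="lang-typescript">// Minimal shape of the Titan Text response we care about (see payload above).
interface TitanResponse {
  Body: { results: { outputText: string }[] };
}

// Returns true when the model flagged the conversation as suspicious.
// Note the trim(): the model may answer " 0" with a leading space.
function isSuspicious(response: TitanResponse): boolean {
  const text = response.Body.results[0].outputText.trim();
  return text === '1';
}
</code></pre>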
<p>Now, let's continue the conversation.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"conversationid"</span>: <span class="hljs-string">"Qt-K5wvjjF4O4m-qSYWtW"</span>,
    <span class="hljs-attr">"timestamp"</span>: <span class="hljs-string">"2024-02-22T19:54:44+00:00"</span>,
    <span class="hljs-attr">"message"</span>: <span class="hljs-string">"Seller: Yes sure, just you need to send me a 5000 dollar of deposit before the visit"</span>
}
</code></pre>
<p>The above request will result in a suspicious-message detection only if we apply a context, namely the previous conversation messages.</p>
<pre><code class="lang-json">- Buyer: Hello, I am interested in your house, can we fix a visit? 
- Seller: Yes sure, just you need to send me a <span class="hljs-number">5000</span> dollar of deposit before the visit
</code></pre>
<h2 id="heading-the-importance-of-context">The importance of context</h2>
<p>On 7 April 2020, I became a dad. It was a big step, and amid the frustration I learned a lot: never answer with a bare yes or no. That is not helpful when you talk with a brain; a brain needs context, otherwise the result will not match what you hoped for, even if it can still be useful in a second phase.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708633965324/a0615a31-c341-4a41-95dd-9863e98bceb4.jpeg" alt class="image--center mx-auto" /></p>
<p>Sending the second request (the seller's suspicious message) without providing the conversation history results in the following response from <strong>Amazon Titan Text V1 Lite</strong>, with an output text of <strong>'0'</strong>, meaning the message is not suspicious.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Body"</span>: {
    <span class="hljs-attr">"inputTextTokenCount"</span>: <span class="hljs-number">83</span>,
    <span class="hljs-attr">"results"</span>: [
      {
        <span class="hljs-attr">"tokenCount"</span>: <span class="hljs-number">2</span>,
        <span class="hljs-attr">"outputText"</span>: <span class="hljs-string">"0"</span>,
        <span class="hljs-attr">"completionReason"</span>: <span class="hljs-string">"FINISH"</span>
      }
    ]
  },
  <span class="hljs-attr">"ContentType"</span>: <span class="hljs-string">"application/json"</span>
}
</code></pre>
<p>To further prove the need for context, I inverted the conversation message order simply by setting <strong>'ScanIndexForward: false'</strong>, which reverses the query order.</p>
<p>Having the previous conversation as</p>
<ul>
<li><p>Buyer......</p>
</li>
<li><p>Seller......</p>
</li>
</ul>
<p>The inverted conversation will be as</p>
<ul>
<li><p>Seller.....</p>
</li>
<li><p>Buyer ....</p>
</li>
</ul>
<p>Although the conversation was previously flagged as suspicious because of the seller's message, with the order inverted the LLM this time rates the conversation as unsuspicious, even though the seller's message is still there.</p>
<p>Enjoy Reading</p>
]]></content:encoded></item><item><title><![CDATA[AWS Lambda Runtime debate]]></title><description><![CDATA[I enjoy following the AWS Lambda runtime battle since I use AWS Lambda as part of my work. I have experimented with several runtimes including C#, Node.js, Python, and Java, and have learned a lot about their respective pros and cons. It has been an ...]]></description><link>https://blogs.serverlessfolks.com/aws-lambda-runtime-debate</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/aws-lambda-runtime-debate</guid><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Wed, 06 Mar 2024 23:06:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714487350024/5937978c-0d53-4f6e-b882-139d980efb54.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I enjoy following the AWS Lambda runtime battle since I use AWS Lambda as part of my work. I have experimented with several runtimes including C#, Node.js, Python, and Java, and have learned a lot about their respective pros and cons. It has been an interesting journey with many movements between runtimes, but ultimately a great learning experience.</p>
<h1 id="heading-lambda-runtime">Lambda Runtime</h1>
<p>Lambda functions run inside a container image for managed and custom runtimes. In both cases, the runtime prepares an execution environment, and the container provides a runtime interface to interact with it.</p>
<p>AWS Lambda offers the following managed runtimes:</p>
<ul>
<li><p>NodeJs</p>
</li>
<li><p>Python</p>
</li>
<li><p>.Net</p>
</li>
<li><p>Java</p>
</li>
<li><p>Ruby</p>
</li>
</ul>
<p>AWS Lambda provides OS-Only runtimes with a runtime interface integrated to offer an operational runtime. The OS-Only runtime is an excellent choice for ahead-of-time (AOT) compiled languages. On the other hand, custom images are perfect for runtimes not managed by AWS and those that are not AOT.</p>
<h1 id="heading-il-amp-jit">IL &amp; JIT</h1>
<p>Compilers serve the purpose of translating high-level languages into low-level languages for processors to comprehend. This process optimizes the code and enhances performance at runtime. In certain programming languages like C#, the compiler generates an Intermediate Language (IL) to optimize the runtime process, and Just-in-Time (JIT) compilation serves as the bridge between the application and the processor at runtime. Although this method has proved to be effective, it requires a language-specific runtime to be available in the container at runtime. To see this process in depth, <a target="_blank" href="http://sharplab.io">sharplab.io</a> helps visualize the different levels of interpretation, IL and JIT, from high-level language syntax (example <a target="_blank" href="https://sharplab.io/#v2:C4LghgzgtgNAJiA1AHwAICYCMBYAUKgZgAIMiBhIgbzyNpONQBYiBZACgEoqa6BfPXkA">here</a>).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708045031842/eb786c05-2a89-4bfa-8724-d5439ee57c81.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-aot">AOT</h1>
<p>AOT, which stands for Ahead of Time, is a compilation process that minimizes interpretation at runtime. During the build phase, AOT compiles the software code into a language the processor can understand directly and creates a self-contained package. This package allows the software to run in any container without requiring a language-specific runtime to be available.</p>
<p>One of the great advantages of AOT is that it provides a way to compile to a low-level language or, for static languages, to machine code. During compilation, AOT applies the optimizations that were traditionally done during JITing. However, as a disadvantage, the optimization cannot be 100% complete (though most of it is), because some optimizations can only be evaluated at runtime. In those situations, runtime JITing shows its advantages.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708045195650/b66aa940-daa8-496f-88e1-e8da3710cfda.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-experience-of-il-jit-and-aot">Experience of IL, JIT and AOT</h1>
<p>In my experience, when working with CRUD applications, AOT (Ahead-of-Time) compilation was the best option because there were no complexities requiring JIT (Just-in-Time) compilation at runtime. However, in scenarios with long-running, CPU-intensive calculations over large batches of data, where effective parallelization was needed, the .NET 6 runtime performed better. For transactional processing with small batches of data (20 items), AOT compilation performed better. I believe it would be worth revisiting and re-preparing the examples I worked on years ago and writing a dedicated article on this topic.</p>
<h1 id="heading-rust-amp-go-functions">Rust &amp; Go functions</h1>
<p>I have only written a single function in Go throughout my entire experience, and I have never written a line of Rust either. However, I find it fascinating to follow the community and hear about these languages. Recently, I learned from <a target="_blank" href="https://www.linkedin.com/in/benjamenpyle/"><strong>Benjamen Pyle</strong></a> that Go and Rust are AOT-compiled and provide self-contained packages. This news made my day, and I discussed it with my friends; everyone was excited about it, and we decided to give it a try.</p>
<p>As part of an experiment, I followed a blog post by Benjamen Pyle.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.binaryheap.com/serverless-rust-developer-experience/">https://www.binaryheap.com/serverless-rust-developer-experience/</a></div>
<p> </p>
<p>For my first lines of code in Rust, I was bound to make errors. I started by changing a simple handler in Rust, adding an array and doing some manipulation on it. Surprisingly, I found that the compiler stopped me from producing runtime race conditions or bugs in production. This is a great DX; it felt like a good linter, but in reality it is not just a linter, it is a real thing I cannot give a name. I am still discovering Rust and what is under the hood, so I will let you discover this DX and share your observations.</p>
<h1 id="heading-node-js">Node Js</h1>
<p>Since I moved to AWS, I have mostly used TypeScript (Node.js), and I am happy about this jump from C# (it was due to the lack of performance of the .NET Core 1 runtime in 2018). Node.js performs well in most scenarios and can rival Python; the new versions are more performant, and in some benchmarks I found Node.js close to Go (Go's performance is great and enough for me).</p>
<p>The Node.js runtime is managed by AWS, and the migrations from 10 to 12, 14, 16, 18, and now 20 have always gone smoothly for me. In most cases Node.js has a two-digit-millisecond execution duration, and the cold start has been acceptable in every business I was involved in (the last time I observed the cold start rate, it was ~0.06% in the production account).</p>
<h1 id="heading-back-to-2018">Back to 2018</h1>
<p>Back in 2018 our company moved from C# to Node.js for serverless solutions, even though there were reasons to prefer staying with C#:</p>
<ul>
<li><p>500 C# developers and the move was not fast</p>
</li>
<li><p>Existing ecosystem: there was a wide range of internal C# libraries that helped teams easily apply best practices.</p>
</li>
<li><p>Reuse of existing code: for a while we had pushed hexagonal design in most of our complex software, so the reuse of business logic was easily achievable.</p>
</li>
<li><p>The risk of degradation while rewriting code was high and as we were adopting a fast delivery approach and a rapid migration it was preferable to use C#.</p>
</li>
</ul>
<h1 id="heading-living-with-nodejs">Living with NodeJs</h1>
<p>There were no particular complexities or problems apart from no longer reusing the same logic in a new language. With Node.js we had a well-performing event-driven platform with optimised performance and fewer headaches than before. The first benchmarking results were surprising: for Node.js, 300–500 ms cold start and 100–200 ms execution time, versus an average of 200 ms for the C#-based system running on the existing container-based platform. The gain was not evident, but after going into production our observations were:</p>
<ul>
<li><p>The cold start happened occasionally</p>
</li>
<li><p>AWS delivered managed runtime updates and support for Node.js faster than for .NET.</p>
</li>
<li><p>Typescript is great as we came from an OOP and static language world.</p>
</li>
<li><p>Project layout was simpler to evaluate and change in the JS ecosystem; we used tree-shaking everywhere, and everything went well.</p>
</li>
<li><p>It was simpler to hire AWS serverless Nodejs profiles than C#</p>
</li>
</ul>
<h1 id="heading-keep-going-with-nodejs-while-suffering">Keep going with NodeJs while suffering</h1>
<p>We continued with Node.js, and will continue, for all the above-mentioned reasons. But there are painful moments while using Node.js:</p>
<ul>
<li><p>Changing thousands of functions is not simple with every deprecation</p>
</li>
<li><p>Establishing the Build &amp; Run mindset was simple, but the mindset of upgrading to the latest runtime is hard to instill in all staff.</p>
</li>
<li><p>Every upgrade needs a whole regression test phase, because code failures show up at runtime while everything goes well locally.</p>
</li>
<li><p>While the Node.js V8 engine does some sort of compilation, there is no way to have a 100% assured package, unlike .NET as a compiled language.</p>
</li>
</ul>
<h1 id="heading-decouple-the-software-from-container-details">Decouple the software from container details</h1>
<p>This is exactly what happens when using AOT, and it is possible for .NET, Go, and Rust. It may be achievable for Node.js with some over-engineering, but to be honest I tried and never had success.</p>
<h1 id="heading-llrt">LLRT</h1>
<p>Recently AWS released the LLRT runtime, which brings real optimisation and comes very close to the best-performing runtimes on AWS Lambda. LLRT comes with some limitations, or capabilities restricted by design, as mentioned in the documentation.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/awslabs/llrt">https://github.com/awslabs/llrt</a></div>
<p> </p>
<p>The runtime provides only basic functionality, as indicated in the documentation, but for me it covers most of my functions' needs.</p>
<h2 id="heading-more-needed">More needed</h2>
<p>As Node.js is built around modules and packages, if a function needs more capabilities for a use case we can add the relevant package and ship it as part of the function package. This is actually how it works for all the packages I use.</p>
<h1 id="heading-example">Example</h1>
<p>To start my experimentation I found a great repository letting me run my tests with CDK and include the LLRT runtime in the bundling process.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tmokmss/cdk-lambda-llrt">https://github.com/tmokmss/cdk-lambda-llrt</a></div>
<p> </p>
<p>The construct uses an <strong><em>afterBundling</em></strong> hook to add the LLRT binary to the package.</p>
<pre><code class="lang-typescript">         afterBundling: <span class="hljs-function">(<span class="hljs-params">i, o</span>) =&gt;</span> [
            <span class="hljs-comment">// Download llrt binary from GitHub release and cache it</span>
            <span class="hljs-string">`if [ ! -e <span class="hljs-subst">${i}</span>/.tmp/<span class="hljs-subst">${arch}</span>/bootstrap ]; then
              mkdir -p <span class="hljs-subst">${i}</span>/.tmp/<span class="hljs-subst">${arch}</span>
              cd <span class="hljs-subst">${i}</span>/.tmp/<span class="hljs-subst">${arch}</span>
              curl -L -o llrt_temp.zip <span class="hljs-subst">${binaryUrl}</span>
              unzip llrt_temp.zip
              rm -rf llrt_temp.zip
             fi`</span>,
            <span class="hljs-string">`cp <span class="hljs-subst">${i}</span>/.tmp/<span class="hljs-subst">${arch}</span>/bootstrap <span class="hljs-subst">${o}</span>/`</span>,
          ],
</code></pre>
<p>I created a sample repository for both NodeJs20 and LLRT lambda functions using CDK.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/aws-lambda-nodejs-llrt">https://github.com/XaaXaaX/aws-lambda-nodejs-llrt</a></div>
<p> </p>
<p>To deploy the functions, just run the following command:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> cdk &amp;&amp; npm i &amp;&amp; npm run cdk:app deploy
</code></pre>
<p>Looking at the cdk path, the <strong><em>.tmp</em></strong> folder contains the LLRT bootstrap for ARM64.</p>
<blockquote>
<p>The example uses Function URLs to simplify testing</p>
</blockquote>
<p>Invoking the LLRT Function URL and looking at the CloudWatch logs gives interesting insights</p>
<p><strong>Init: 54.52 ms</strong></p>
<p><strong>Duration: 1.63 ms</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708008807357/c047b4d4-eb0d-4a44-b902-a52357ad2c33.png" alt class="image--center mx-auto" /></p>
<p>I was curious to invoke the warm container again, and the result looked weird to me.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708009092944/28c6c970-cd95-49b8-8389-1842114fac7c.png" alt class="image--center mx-auto" /></p>
<p>Trying again and again, the duration kept changing, and at maximum I had a report as below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708009314982/c995d724-7e85-4ac9-aac0-7fe5076f1daf.png" alt class="image--center mx-auto" /></p>
<p>This is 7 times more than the previous invocations, but it is still great performance.</p>
<blockquote>
<p>The execution environment, while responding to requests continuously, has a single-digit duration of around 1.4 to 1.9 ms; but if I leave the execution environment idle for a while and then reuse it, the first duration is in two-digit milliseconds, while the average duration remains really interesting.</p>
</blockquote>
<p>I would like to compare this with the managed Node.js 20 runtime by invoking the Node.js function's Function URL.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708009998695/295ac7ce-0488-4d7f-ae82-96e75d2a5617.png" alt class="image--center mx-auto" /></p>
<p>Interesting results: this is the classic Node.js behavior that I observe in production, so no surprise. But when sending more invocations I started to focus more and more on the details, and something interesting was happening that I never experienced while observing Node.js 18.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708010186147/b45ef7fd-5636-417a-b23d-13194940f4b0.png" alt class="image--center mx-auto" /></p>
<p>The Node.js 20 durations are great for the same code; I had the same results as LLRT with the managed runtime.</p>
<p>The other point that got my attention was memory consumption.</p>
<p>LLRT: 20MB to 25MB</p>
<p>Nodejs: 89MB - 93MB</p>
<p>And a last point was the <strong>Billed Duration</strong>; this is normal, as LLRT is not a managed runtime and you are billed for the init phase plus the duration.</p>
<h1 id="heading-estimation">Estimation</h1>
<p>Finally, I was curious to estimate the cost of running LLRT and Node.js.</p>
<p>As I had no more credits in my accounts, I first planned to estimate from the pricing page using a scenario, but finally I decided to run a solid test using Artillery.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">config:</span>
  <span class="hljs-attr">target:</span> <span class="hljs-string">https://XXXXXXXXXXXXXXX.lambda-url.eu-west-1.on.aws</span>
  <span class="hljs-attr">phases:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">duration:</span> <span class="hljs-number">1200</span>
      <span class="hljs-attr">arrivalRate:</span> <span class="hljs-number">100</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">"Test Nodejs 20"</span>
  <span class="hljs-attr">environments:</span>
    <span class="hljs-attr">production:</span>
      <span class="hljs-attr">target:</span> <span class="hljs-string">https://YYYYYYYYYYYYYYYYY.lambda-url.eu-west-1.on.aws</span>
      <span class="hljs-attr">phases:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">duration:</span> <span class="hljs-number">1200</span>
          <span class="hljs-attr">arrivalRate:</span> <span class="hljs-number">100</span>
          <span class="hljs-attr">name:</span> <span class="hljs-string">"Test LLRT"</span>
<span class="hljs-attr">scenarios:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">flow:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">get:</span>
          <span class="hljs-attr">url:</span> <span class="hljs-string">"/"</span>
</code></pre>
<p>Memory Configured : 128Mb</p>
<p>Architecture: ARM64</p>
<p>Request count: ~6 000 000</p>
<p>I used Artillery to run a spike test and verified the results with the following CloudWatch Logs Insights query.</p>
<pre><code class="lang-bash">filter ispresent(@duration) |
stats sum(@billedDuration), sum(@maxMemoryUsed/1024/1024), count(*) as requestcount
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Runtime</td><td>Duration (GB-seconds)</td><td>Duration price ($)</td><td>Requests</td><td>Request price ($)</td></tr>
</thead>
<tbody>
<tr>
<td>NodeJs 20.x</td><td>19001</td><td>~0.2533</td><td>6 090 000 000</td><td>1218</td></tr>
<tr>
<td>LLRT</td><td>19783</td><td>~0.2637</td><td>6 090 000 000</td><td>1218</td></tr>
</tbody>
</table>
</div><p>This shows well that the pricing is close; for a typical business with these sorts of loads it is just equivalent. To be honest, LLRT is in an experimental phase and I would not adopt it for production yet, and what LLRT offers we can mostly achieve using managed runtimes. The only gap is the self-contained package, which can be handy in lots of use cases, ideally for the software code that I run.</p>
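<p>As a sanity check, the duration prices in the table can be reproduced from the public Lambda pricing (assumed here: ~$0.0000133334 per GB-second on ARM64 and $0.20 per million requests; check the pricing page for current numbers in your region):</p>
<pre><code class="lang-typescript">// Rough Lambda cost model (ARM64 prices assumed from the public pricing page).
const PRICE_PER_GB_SECOND = 0.0000133334; // USD, ARM64
const PRICE_PER_MILLION_REQUESTS = 0.2;   // USD

function durationCost(gbSeconds: number): number {
  return gbSeconds * PRICE_PER_GB_SECOND;
}

function requestCost(requests: number): number {
  return (requests / 1_000_000) * PRICE_PER_MILLION_REQUESTS;
}

// Figures from the table above:
const nodeDurationCost = durationCost(19001); // ~0.2533 USD
const llrtDurationCost = durationCost(19783); // ~0.2637 USD
const requestsCost = requestCost(6_090_000_000); // ~1218 USD
</code></pre>
<p>Note that both functions were configured with 128 MB, and Lambda bills on configured memory, not consumed memory, which is why LLRT's lower memory footprint does not change the bill here.</p>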
<h1 id="heading-a-final-word">A final word</h1>
<p>Often, in companies, we define a tech radar that considers all historical, strategic, and technical aspects; the final consideration is the roadmap. There are a lot of details behind deciding whether to adopt a programming language, tooling, or vendor services. The cost of moving to a new programming language is often higher than improving practices and optimization (at least in my experience), but the fact that tech evolves is evident, and if we do not question ourselves, internally or externally, we will never grow.</p>
<p>Going to Rust is a nice move. Using LLRT is fascinating. Go is great, and C# has, for me, the greatest and cleanest syntax among the languages I have worked with, such as Node.js, Python, F#, Delphi, VB, C++, and Java (it is quite close to Java). But I moved to the Node.js ecosystem after 20 years of .NET programming just because of tradeoffs, and the tradeoffs are everywhere; accepting tradeoffs as a fact is also important. Do we stop C# or serverless? For sure, C# is the loser in this game. Do we adopt LLRT, or do we improve our current software? For sure, improving is faster than moving to LLRT and doing lots of time-consuming tests.</p>
<p>Adopting something new in a business is not done for fun; it must have a measurable impact on the system.</p>
]]></content:encoded></item><item><title><![CDATA[AWS Step Functions Distributed Map]]></title><description><![CDATA[The AWS Serverless ecosystem had a lot of power for a longtime and AWS StepFunctions was for a significant years part of. The decomposition of a big system is a golden concept of design where any single component brings a value and that collaboration...]]></description><link>https://blogs.serverlessfolks.com/aws-step-functions-distributed-map</link><guid isPermaLink="true">https://blogs.serverlessfolks.com/aws-step-functions-distributed-map</guid><dc:creator><![CDATA[Omid Eidivandi]]></dc:creator><pubDate>Wed, 06 Mar 2024 23:06:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714487410241/a678ea7d-293f-4403-91d3-8016fb1eb2a4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The AWS Serverless ecosystem had a lot of power for a longtime and AWS StepFunctions was for a significant years part of. The decomposition of a big system is a golden concept of design where any single component brings a value and that collaboration gives the real meaning to all those components. The choreography is a way of collaborating but it s not the best fit for all cases as we need to collaborate in an orchestrated manner and a coordinator pattern need to be the center of that system to achieve an optimised collaboration. AWS StepFunctions is part of Serverless ecosystem that helps achieve that desired coordination.</p>
<p>But Step Functions is no longer just a Serverless orchestration service; it now enables more significant results in different distributed scenarios. With the release of Distributed Map, it became a good fit for distributed data manipulation, such as large files (JSON, CSV) or collections of objects in an S3 bucket.</p>
<h2 id="heading-distributed-map">Distributed Map</h2>
<p>In general terms, a distributed map works by dividing a large asset into chunks and letting them be processed in isolation and simultaneously.</p>
<p>In technical terms, Distributed Map parallelizes the processing of batches of items in isolated logical contexts called <strong>child workflow executions</strong>. This isolation also helps contain the errors that any batch of items may encounter during processing.</p>
<p>Distributed Map helps control the collaboration between different services via refined configurations such as <strong>MaxConcurrency</strong>.</p>
<h2 id="heading-configuration">Configuration</h2>
<p>Some of the most useful configuration options are explained here; for a complete picture, I recommend referring to the <a target="_blank" href="https://docs.aws.amazon.com/step-functions/latest/dg/use-dist-map-orchestrate-large-scale-parallel-workloads.html">AWS Documentation</a>.</p>
<p><strong><em>MaxConcurrency</em></strong>: MaxConcurrency (default 1000) specifies the number of concurrent child workflow executions that can run simultaneously.</p>
<p><strong><em>ItemReader:</em></strong> Indicates the data source the distributed map uses to fetch items. This can be an S3 bucket, a CSV file, or a JSON file.</p>
<p><strong><em>ItemBatcher:</em></strong> Defines the maximum number of items, or the maximum bytes of input, that the distributed map passes to each child workflow execution.</p>
<p><strong><em>ItemProcessor:</em></strong> Represents the workflow definition and the states used to process each batch of items, as well as the ProcessorConfig, which indicates whether the child state machine is STANDARD or EXPRESS and which execution mode is used.</p>
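<p>As a minimal sketch of how these options fit together, the snippet below assembles a hypothetical Distributed Map state definition as a Python dict. The bucket name, state names, and chosen values are assumptions for illustration, not the article's actual configuration.</p>

```python
import json

# A minimal Distributed Map state definition assembled as a Python dict.
# Bucket name, state names, and values are illustrative placeholders.
distributed_map_state = {
    "Type": "Map",
    "MaxConcurrency": 1000,  # concurrent child workflow executions
    "ItemReader": {          # where the dataset comes from
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": {"Bucket": "my-dataset-bucket"},
    },
    "ItemBatcher": {"MaxItemsPerBatch": 1000},  # items handed to each child
    "ItemProcessor": {       # workflow executed once per batch
        "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
        "StartAt": "ProcessBatch",
        "States": {
            "ProcessBatch": {"Type": "Pass", "End": True},
        },
    },
}

print(json.dumps(distributed_map_state, indent=2))
```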
<h2 id="heading-how-distribution-works">How distribution works</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705177485323/1778f154-3833-4dd8-9640-602209638bd5.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>The Reader reads the dataset per the ItemReader config</p>
</li>
<li><p>The Batcher groups the data into arrays of items per the ItemBatcher config</p>
</li>
<li><p>The Item Processor distributes the batches of items and launches the child workflow executions, giving a batch of items as input.</p>
</li>
</ul>
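<p>The three steps above can be simulated locally with a rough sketch. The helper names, the item URIs, and the batch size below are illustrative assumptions; in reality the service performs this distribution for you.</p>

```python
# A rough, local simulation of the read -> batch -> process pipeline
# (names and numbers are illustrative, not the service's internals).
def batch_items(items, max_items_per_batch):
    """Group items into batches, mimicking the ItemBatcher step."""
    return [items[i:i + max_items_per_batch]
            for i in range(0, len(items), max_items_per_batch)]

def process_batch(batch):
    """Stand-in for one child workflow execution handling one batch."""
    return {"processed": len(batch)}

dataset = [f"s3://bucket/object-{n}.json" for n in range(65_000)]  # Reader output
batches = batch_items(dataset, max_items_per_batch=5_000)          # Batcher output
results = [process_batch(b) for b in batches]                      # Item Processor

print(len(batches))  # 13 child workflow executions
```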
<blockquote>
<p>You can also configure an Item Selector and a Result Writer; to learn more, refer to the <a target="_blank" href="https://docs.aws.amazon.com/step-functions/latest/dg/use-dist-map-orchestrate-large-scale-parallel-workloads.html#map-state-distributed-additional-fields">Documentation</a></p>
</blockquote>
<h2 id="heading-map-run-resource">Map Run resource</h2>
<p>The Map Run resource behaves as a coordinator for child workflow executions, handling the following details.</p>
<ul>
<li><p>Concurrency</p>
</li>
<li><p>Batching</p>
</li>
<li><p>Keeping Track of Child execution States</p>
</li>
</ul>
<p>In practice, when you define MaxConcurrency or MaxItemsPerBatch as in our example, the Map Run groups items while ensuring that the batch payload does not exceed the 256 KB limit, and, based on the resulting batches, allocates concurrent child executions.</p>
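<p>The effect of the 256 KB cap can be illustrated with a small sketch. The key format, item sizes, and byte accounting below are rough assumptions, not the service's exact algorithm; the point is that the byte cap can trim a batch well below MaxItemsPerBatch.</p>

```python
import json

# Step Functions caps state input/output payloads at 256 KB (hard limit).
PAYLOAD_LIMIT_BYTES = 256 * 1024

def effective_batch(items, max_items_per_batch):
    """Fill a batch until either the item cap or the byte cap is reached."""
    batch, size = [], 2  # 2 bytes for the surrounding JSON array brackets
    for item in items[:max_items_per_batch]:
        encoded = len(json.dumps(item)) + 1  # rough: item plus separator
        if size + encoded > PAYLOAD_LIMIT_BYTES:
            break
        batch.append(item)
        size += encoded
    return batch

# 5000 object keys of ~130 bytes each: the byte cap kicks in first.
keys = [f"s3://bucket/{'x' * 100}/obj-{n:05}.json" for n in range(5_000)]
batch = effective_batch(keys, max_items_per_batch=5_000)
print(len(batch) < 5_000)  # True: the payload limit trims the batch
```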
<h2 id="heading-try-it-on-your-own">Try it on your own</h2>
<p>This article's source code is publicly accessible here. If you would like to try the examples and the different patterns, follow the instructions in the README.md file in this repository.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/XaaXaaX/s3-objects-manipulation-distributed-map">https://github.com/XaaXaaX/s3-objects-manipulation-distributed-map</a></div>
<h2 id="heading-using-simple-distributed-maps">Using Simple Distributed Maps</h2>
<p>In this example we read a large number of files from an S3 bucket (65K JSON objects of 200 KB each) and look in detail at how the workflow behaves.</p>
<p>The parent DISTRIBUTED map configuration has the following details</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Configuration</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td>MaxItemsPerBatch</td><td>5000</td></tr>
<tr>
<td>MaxConcurrency</td><td>10000</td></tr>
</tbody>
</table>
</div><p>The INLINE map configuration, inside the parent Distributed Map, has the following details</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Configuration</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td>MaxConcurrency</td><td>40 ( Max Recommendation )</td></tr>
</tbody>
</table>
</div><p>The example is illustrated in the following figure</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705354457142/6ff47842-3bc7-446c-bc14-b60d872e791f.png" alt class="image--center mx-auto" /></p>
<p>Running this example takes around 10 minutes and results in a failed status after that time, due to the execution history limit of 25,000 events (a hard quota).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705354740141/ddf34944-d377-48bf-ac36-a5bfee15b02c.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705354781566/693327f2-460d-4638-877c-fd750304fea7.png" alt class="image--center mx-auto" /></p>
<p>Changing the MaxItemsPerBatch batching configuration is one of the most straightforward solutions to this limitation. After setting MaxItemsPerBatch to 1000 items, processing the same S3 objects results in a success status with a duration of <strong>05:40.531 (5.5 minutes)</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705358279995/f9cf44a8-6e48-4bf6-93ba-41b59ecbec7d.png" alt class="image--center mx-auto" /></p>
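<p>A back-of-envelope estimate suggests why the smaller batches fit. Assuming each item processed by the INLINE map contributes roughly five history events (an assumed figure, not an exact one from the service), a 5000-item batch brushes against the 25,000-event quota, while a 1000-item batch stays well under it:</p>

```python
# Back-of-envelope check of the 25,000-event history quota.
# EVENTS_PER_ITEM is a rough assumption, not an exact service figure.
HISTORY_QUOTA = 25_000
EVENTS_PER_ITEM = 5  # assumed: state entered/scheduled/succeeded/exited, etc.

def fits_history(items_per_batch, events_per_item=EVENTS_PER_ITEM, overhead=100):
    """Estimate whether an INLINE map over one batch stays under the quota."""
    return items_per_batch * events_per_item + overhead <= HISTORY_QUOTA

print(fits_history(5_000))  # False: ~25,100 estimated events exceed the quota
print(fits_history(1_000))  # True: ~5,100 estimated events fit comfortably
```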
<p>This result can reasonably cover a variety of scenarios, but for scenarios with larger amounts of data it can take a long time, and it does not seem the best fit in terms of operations, performance, and requirements.</p>
<h2 id="heading-using-nested-distributed-map">Using Nested Distributed Map</h2>
<p>In this example, the configuration is partially the same as before: we read a large number of files from an S3 bucket (65K JSON objects of 200 KB each). The only difference is that each child workflow execution is encapsulated in a second (nested) Distributed Map, adding a second level of parallelism at the batch level.</p>
<p>The parent DISTRIBUTED map configuration has the following details</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Configuration</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td>MaxItemsPerBatch</td><td>5000</td></tr>
<tr>
<td>MaxConcurrency</td><td>10000</td></tr>
</tbody>
</table>
</div><p>The nested DISTRIBUTED map configuration has the following details</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Configuration</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td>MaxItemsPerBatch</td><td>50</td></tr>
<tr>
<td>MaxConcurrency</td><td>1000</td></tr>
</tbody>
</table>
</div><p>The nested INLINE map configuration has the following details</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Configuration</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td>MaxConcurrency</td><td>40 ( Max Recommendation )</td></tr>
</tbody>
</table>
</div><p>The example is illustrated in the following figure</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705359304742/1186542c-dd7b-47c2-a0de-0ce0fe6960b1.png" alt class="image--center mx-auto" /></p>
<p>Running an execution creates a first <strong>Map Run</strong> resource (more info <a target="_blank" href="https://docs.aws.amazon.com/step-functions/latest/dg/use-dist-map-orchestrate-large-scale-parallel-workloads.html#dist-map-orchestrate-parallel-workloads-key-terms">here</a>)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705359830859/ce4bbc3b-7681-4e5f-a127-d7496185df99.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705360787684/cc20f276-02d1-4d62-8d1c-7475c74784b5.png" alt class="image--center mx-auto" /></p>
<p>Looking at the Map Run resource, there is a listing of executions, each with a batch of items, as below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705359974029/a3d98714-06d0-4039-b813-a7406598c1a0.png" alt class="image--center mx-auto" /></p>
<p>In this example, looking at any single execution with around 1800 items, the execution is itself a nested distributed map. In this way we create a top-down Distributed Map hierarchy with two levels.</p>
<p>Here is an example of what a child execution looks like, with a workflow state containing an INLINE Map state.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705360986669/c5692a79-b291-4543-b0f3-51ac973de10d.png" alt class="image--center mx-auto" /></p>
<p>This nested Distributed Map also has a listing of second-level child workflow executions, but this time with a limited number of items per batch, around 50 for each execution.</p>
<p>This is a reasonable workload for an INLINE Map, given its limited concurrency and the history quota.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705361182783/5a85743b-306f-4832-a1d6-565fba22cb12.png" alt class="image--center mx-auto" /></p>
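<p>Ignoring the payload-based trimming observed earlier, the fan-out implied by the configuration tables above can be sketched as simple arithmetic (the breakdown itself is an estimate, not a measurement):</p>

```python
# Fan-out arithmetic for the two-level hierarchy under the configurations
# above; item counts are the article's, the breakdown is a sketch.
total_items = 65_000
parent_batch = 5_000   # parent MaxItemsPerBatch
nested_batch = 50      # nested MaxItemsPerBatch

parent_executions = -(-total_items // parent_batch)   # ceiling division
nested_per_parent = -(-parent_batch // nested_batch)  # nested children per parent
total_nested = parent_executions * nested_per_parent

print(parent_executions, nested_per_parent, total_nested)  # 13 100 1300
```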
<p>Looking at this execution list, the start and end times of all executions are approximately close, showing the parallelism and resulting in similar execution durations (in our example, around 2 minutes); looking at the parent Distributed Map durations, we notice the same.</p>
<p>The execution ran for a duration of 01:20.959 (about 1.5 minutes) with a success result.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705362014428/0919d766-0659-4db6-ba2d-ba027bd4e00b.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Distributed Map is a great feature and shows off the power of AWS Step Functions well. It can be used in a variety of situations where performance, simplicity, and process isolation are concerns.</p>
<p>In this article we gained a better understanding of how to process large amounts of data in a scalable, reliable, and performant manner.</p>
]]></content:encoded></item></channel></rss>