Introduction to AWS Networking for Data Engineers

Basic Principles of Cloud Networking

Understanding Networking in Cloud Computing

  • Networking involves setting up secure and efficient communication between cloud resources.

  • Key aspects include permissions, security, and connectivity between resources.

  • Proficiency in cloud networking builds through understanding core concepts and hands-on practice.

Importance of Networking in Data Engineering

  • Data engineers leverage networking to design and manage cloud-based data systems.

  • Networking ensures secure and seamless connections between data pipelines, databases, and applications.

  • Example: Deploying web apps with backend databases requires well-architected networking setups.

Core Networking Components

  • Virtual Private Clouds (VPCs): Isolated cloud networks for specific use cases.

  • Subnets: Subdivisions of a VPC to group resources based on access and security needs.

  • Gateways: Provide connectivity between resources and the Internet or other networks.

  • Route Tables: Define traffic routing rules for subnets.

  • Security Groups and Network Access Control Lists (NACLs): Filter inbound and outbound traffic. Security groups are stateful and apply to individual resources; NACLs are stateless and apply to whole subnets.
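
To make the role of a route table concrete, the sketch below models how a VPC picks a route: among the routes whose destination range contains the target IP, the most specific one (longest prefix) wins. The route list and the target names local and internet-gateway are illustrative assumptions, not an AWS API.

```python
import ipaddress

# Hypothetical route table for a public subnet: (destination CIDR, target).
ROUTES = [
    ("10.0.0.0/16", "local"),             # traffic that stays inside the VPC
    ("0.0.0.0/0", "internet-gateway"),    # everything else
]


def pick_target(destination_ip, routes=ROUTES):
    """Return the target of the most specific route matching destination_ip."""
    ip = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in routes:
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no matching route, so the traffic cannot be delivered
    # The longest prefix is the most specific match.
    return max(matches)[1]


print(pick_target("10.0.2.15"))      # local
print(pick_target("93.184.216.34"))  # internet-gateway
```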

Setting Up a VPC in AWS

What is a VPC?

  • A Virtual Private Cloud (VPC) is a virtual network in the cloud that is logically isolated from other networks.

  • Each AWS account comes with a default VPC in every Region; however, custom VPCs are preferred for real-world applications.

Steps to Create a VPC

  • Access the AWS Console and open the VPC dashboard.

  • Click “Create VPC,” provide a name, and define an IPv4 CIDR block.

  • Example: Use 10.0.0.0/16 for the CIDR block, which gives resources private IPs in the range 10.0.0.0 to 10.0.255.255.
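
The console steps above can also be scripted. The following is a minimal sketch, assuming the ec2 argument is a boto3 EC2 client (for example boto3.client("ec2")) with credentials already configured. The helper name create_named_vpc is hypothetical. It checks the CIDR block locally before calling create_vpc.

```python
import ipaddress


def create_named_vpc(ec2, name, cidr):
    """Create a VPC with an IPv4 CIDR block and a Name tag; return its ID.

    `ec2` is assumed to be a boto3 EC2 client.
    """
    # ip_network raises ValueError for a malformed block such as 10.0.1.5/16.
    network = ipaddress.ip_network(cidr)
    # AWS accepts VPC IPv4 blocks between /16 and /28.
    if network.version != 4 or not 16 <= network.prefixlen <= 28:
        raise ValueError("VPC IPv4 CIDR blocks must be between /16 and /28")
    response = ec2.create_vpc(
        CidrBlock=cidr,
        TagSpecifications=[
            {"ResourceType": "vpc", "Tags": [{"Key": "Name", "Value": name}]}
        ],
    )
    return response["Vpc"]["VpcId"]


# Usage, assuming boto3 is installed and AWS credentials are available:
#   import boto3
#   vpc_id = create_named_vpc(boto3.client("ec2"), "project-1", "10.0.0.0/16")
```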

Key Considerations for VPCs

  • A VPC can span all Availability Zones (AZs) within its Region, whereas each subnet resides in exactly one AZ.

  • Resources in the same VPC can communicate, while cross-VPC communication requires additional configuration.

  • Assign descriptive names to VPCs to identify them easily (e.g., project-1).

Subnets: Dividing the VPC

What Are Subnets?

  • Subnets are smaller networks within a VPC, used to group resources based on their access and security requirements.

  • Subnets can be either public (routed to an Internet Gateway) or private (no direct route to the Internet).

Importance of Subnets

  • Subnets control how resources communicate and enhance security.

  • Placing resources across multiple AZs improves availability and provides redundancy if one AZ fails.

Creating Subnets

  • Access the Subnet section in the VPC dashboard and select the VPC (e.g., project-1) for which subnets will be created.

  • Define public and private subnets for different AZs (e.g., us-east-1a and us-east-1b).

  • Example Subnet IP Ranges:

    • Public Subnet 1: 10.0.1.0/24

    • Private Subnet 1: 10.0.2.0/24

    • Public Subnet 2: 10.0.3.0/24

    • Private Subnet 2: 10.0.4.0/24
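
Before creating subnets, it can help to confirm that every planned range fits inside the VPC and that no two ranges overlap. Below is a minimal sketch using Python's standard ipaddress module with the example ranges above. The dictionary keys and the function name validate_subnets are illustrative.

```python
import ipaddress

VPC_CIDR = "10.0.0.0/16"

# The four example ranges from the list above.
SUBNET_CIDRS = {
    "public-1": "10.0.1.0/24",
    "private-1": "10.0.2.0/24",
    "public-2": "10.0.3.0/24",
    "private-2": "10.0.4.0/24",
}


def validate_subnets(vpc_cidr, subnet_cidrs):
    """Raise ValueError if a subnet falls outside the VPC or overlaps another."""
    vpc = ipaddress.ip_network(vpc_cidr)
    networks = {name: ipaddress.ip_network(c) for name, c in subnet_cidrs.items()}
    for name, net in networks.items():
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} ({net}) is not inside {vpc}")
    names = list(networks)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if networks[first].overlaps(networks[second]):
                raise ValueError(f"{first} overlaps {second}")
    return True


print(validate_subnets(VPC_CIDR, SUBNET_CIDRS))  # True
```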

Benefits of Subnet Architecture

  • Public subnets host resources like NAT Gateways that require Internet access.

  • Private subnets host sensitive resources like databases (Amazon RDS) and compute instances (EC2) for security.

CIDR Notation: IP Address Allocation

What is CIDR Notation?

  • CIDR (Classless Inter-Domain Routing) specifies IP address ranges and subnet sizes.

  • Format: X.X.X.X/Y (e.g., 10.0.0.0/16), where Y represents the number of bits used for the network prefix.

CIDR Breakdown

  • Example: 10.0.0.0/16

    • The first 16 bits (10.0) are fixed, defining the network prefix.

    • The remaining 16 bits are variable, giving 2^16 = 65,536 addresses in the VPC.
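
The same breakdown can be reproduced with Python's standard ipaddress module. This is a small sketch, and the function name describe_cidr is an illustrative choice.

```python
import ipaddress


def describe_cidr(cidr):
    """Split a CIDR block into prefix bits, host bits, and its address range."""
    network = ipaddress.ip_network(cidr)
    host_bits = network.max_prefixlen - network.prefixlen  # 32 - 16 for a /16
    return {
        "prefix_bits": network.prefixlen,
        "host_bits": host_bits,
        "netmask": str(network.netmask),
        "total_addresses": 2 ** host_bits,
        "first": str(network[0]),
        "last": str(network[-1]),
    }


info = describe_cidr("10.0.0.0/16")
print(info["prefix_bits"], info["host_bits"])  # 16 16
print(info["netmask"])                         # 255.255.0.0
print(info["total_addresses"])                 # 65536
print(info["first"], info["last"])             # 10.0.0.0 10.0.255.255
```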

Subnet CIDR Design

  • Public subnet CIDR: 10.0.1.0/24 (256 addresses, 10.0.1.0 to 10.0.1.255; AWS reserves 5 in every subnet, leaving 251 usable).

  • Private subnet CIDR: 10.0.2.0/24 (256 addresses, 10.0.2.0 to 10.0.2.255; again 251 usable).
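
When sizing subnets, note that AWS reserves five addresses in every subnet: the first four and the last one. The sketch below uses the standard ipaddress module to list them and to count what is left for resources. The function names are illustrative.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def reserved_addresses(cidr):
    """Return the five addresses AWS reserves in a subnet."""
    network = ipaddress.ip_network(cidr)
    # Network address, VPC router, DNS, reserved for future use, broadcast.
    return [str(network[i]) for i in (0, 1, 2, 3, -1)]


def usable_addresses(cidr):
    """Return how many addresses remain for resources in a subnet."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AWS_RESERVED_PER_SUBNET


print(reserved_addresses("10.0.1.0/24"))
print(usable_addresses("10.0.1.0/24"))  # 251
```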

Practical Scenario: Web App with EC2 and RDS

Overview of the Example Scenario

  • A web application runs on an EC2 instance, querying data from an RDS database.

  • Resources are deployed in a single VPC with public and private subnets for security and scalability.

Networking Design

  • VPC configuration:

    • Public subnets for NAT Gateways and Internet-facing components.

    • Private subnets for EC2 instances and RDS databases.

  • Resources:

    • EC2 instance in private subnet for enhanced security.

    • RDS database isolated in private subnet to avoid direct Internet access.

High Availability and Redundancy

  • Resources are distributed across two AZs:

    • Public Subnet 1 in us-east-1a and Public Subnet 2 in us-east-1b.

    • Private Subnet 1 in us-east-1a and Private Subnet 2 in us-east-1b.

  • Spreading each tier across two AZs lets the application fail over if one AZ is disrupted.
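
One way to sanity-check a layout like this is to ask whether every tier still has a subnet left when a given AZ is lost. The following sketch encodes the layout above as plain data. The tier labels and function names are illustrative.

```python
from collections import defaultdict

# (tier, subnet name, availability zone), taken from the layout above.
LAYOUT = [
    ("public", "Public Subnet 1", "us-east-1a"),
    ("public", "Public Subnet 2", "us-east-1b"),
    ("private", "Private Subnet 1", "us-east-1a"),
    ("private", "Private Subnet 2", "us-east-1b"),
]


def azs_per_tier(layout):
    """Map each tier to the set of AZs in which it has a subnet."""
    tiers = defaultdict(set)
    for tier, _name, az in layout:
        tiers[tier].add(az)
    return dict(tiers)


def survives_az_loss(layout, failed_az):
    """Return True if every tier keeps at least one subnet outside failed_az."""
    return all(azs - {failed_az} for azs in azs_per_tier(layout).values())


print(survives_az_loss(LAYOUT, "us-east-1a"))  # True
```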

Next Steps: Internet Connectivity

Current State of the VPC

  • The VPC is isolated, with no Internet connectivity for deployed resources.

  • EC2 and RDS instances in private subnets are inaccessible from the Internet by default.

Introducing Gateways

  • Internet Gateway (IGW): Enables traffic between the Internet and resources in public subnets that have public IP addresses.

  • NAT Gateway: Allows resources in private subnets to initiate outbound Internet connections while blocking connections initiated from the Internet.

Configuring Gateways

  • Attach an Internet Gateway to the VPC and add a default route (0.0.0.0/0) pointing to it in the public subnets' route table.

  • Deploy NAT Gateways in public subnets and point the private subnets' default route at them to carry outgoing traffic.
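
The two gateway steps can be sketched as scripted calls. This is a minimal outline, assuming the ec2 argument is a boto3 EC2 client and that the VPC and subnet IDs already exist. The function names are hypothetical, and error handling and cleanup are omitted.

```python
def connect_public_subnet(ec2, vpc_id, public_subnet_id):
    """Attach an Internet Gateway and route the public subnet through it."""
    igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)
    rtb_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    # Default route: all non-local traffic goes to the Internet Gateway.
    ec2.create_route(
        RouteTableId=rtb_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id
    )
    ec2.associate_route_table(RouteTableId=rtb_id, SubnetId=public_subnet_id)
    return igw_id, rtb_id


def connect_private_subnet(ec2, vpc_id, public_subnet_id, private_subnet_id):
    """Create a NAT Gateway in a public subnet and route a private subnet to it."""
    # A public NAT Gateway needs an Elastic IP address.
    allocation_id = ec2.allocate_address(Domain="vpc")["AllocationId"]
    nat = ec2.create_nat_gateway(
        SubnetId=public_subnet_id, AllocationId=allocation_id
    )
    nat_id = nat["NatGateway"]["NatGatewayId"]
    # The gateway must be available before a route can target it.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])
    rtb_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(
        RouteTableId=rtb_id, DestinationCidrBlock="0.0.0.0/0", NatGatewayId=nat_id
    )
    ec2.associate_route_table(RouteTableId=rtb_id, SubnetId=private_subnet_id)
    return nat_id, rtb_id
```

The NAT Gateway is placed in the public subnet because it needs the Internet Gateway route itself; only the private subnet's route table points at the NAT Gateway.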

Preparing for Application Deployment

  • After configuring gateways, the network will be ready for deploying the EC2 instance, RDS database, and Application Load Balancer.