Embark on a comprehensive journey to build a complete data pipeline on the AWS platform. In this practical course, you will gain hands-on experience with every stage of the pipeline, from acquiring data with the Twitter API to analyzing, storing, and visualizing it.
Course Overview
You will learn how to build a sentiment analysis function powered by a machine learning model and deploy it on AWS Lambda. The course also covers setting up a Postgres database with Amazon RDS. To visualize the results, you'll develop an interactive dashboard with Streamlit and learn to deploy it using Amazon Elastic Container Registry (ECR) and Amazon Elastic Container Service (ECS). You'll also be introduced to Poetry, a tool for managing project dependencies.
Course Structure
Twitter API Integration
The Twitter API is an excellent gateway to open data. You will learn to configure API access and retrieve tweets from a user's feed for further processing. You will also look closely at the API configuration and at the format of the JSON data (payload) the API returns.
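As a rough illustration of the retrieval step, the sketch below assumes the Twitter API v2 user timeline endpoint (`GET /2/users/{id}/tweets`) with app-only bearer-token authentication and uses only the Python standard library; the course may use a client library such as Tweepy instead. The function names are hypothetical. `build_timeline_request` prepares the authenticated request without sending it, and `parse_timeline` pulls tweet IDs and text out of the returned JSON payload.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Twitter API v2 "user Tweet timeline" endpoint with an app-only
# bearer token. The course may use a client library instead.
API_BASE = "https://api.twitter.com/2"


def build_timeline_request(user_id: str, bearer_token: str,
                           max_results: int = 10) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a user's recent tweets."""
    if not 5 <= max_results <= 100:  # range this endpoint accepts
        raise ValueError("max_results must be between 5 and 100")
    query = urllib.parse.urlencode(
        {"max_results": max_results, "tweet.fields": "created_at,lang"}
    )
    url = f"{API_BASE}/users/{user_id}/tweets?{query}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {bearer_token}"}
    )


def parse_timeline(payload: str) -> list:
    """Extract (id, text) pairs; the 'data' key is absent when no tweets match."""
    body = json.loads(payload)
    return [(tweet["id"], tweet["text"]) for tweet in body.get("data", [])]
```

To actually send the request, pass it to `urllib.request.urlopen` and hand the decoded response body to `parse_timeline`.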
Setting Up RDS Database
Data storage is crucial for any platform. You will set up a Postgres database in Amazon RDS and learn why it makes sense to store tweets as JSON in it. Get hands-on practice with virtual private clouds (VPCs) and configure your VPC so that the database is reachable from the internet. Learn to use pgAdmin to create tables and run queries against the database.
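One way to store tweets as JSON is sketched below, assuming PostgreSQL's `JSONB` type and psycopg2-style `%s` placeholders; the table, column, and function names are hypothetical. Keeping the full payload in a `JSONB` column alongside a few extracted columns means fields you did not plan for can still be queried later, for example with `payload->>'lang'`.

```python
import json
from typing import Optional, Tuple

# Hypothetical schema: one row per tweet, raw API payload kept as JSONB.
CREATE_TWEETS_TABLE = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id   BIGINT PRIMARY KEY,
    created_at TIMESTAMPTZ,
    payload    JSONB NOT NULL
);
"""

# %s placeholders follow the psycopg2 / DB-API "format" paramstyle.
INSERT_TWEET = """
INSERT INTO tweets (tweet_id, created_at, payload)
VALUES (%s, %s, %s::jsonb)
ON CONFLICT (tweet_id) DO NOTHING;
"""


def tweet_to_row(tweet: dict) -> Tuple[int, Optional[str], str]:
    """Turn one tweet object from the API into parameters for INSERT_TWEET."""
    # Tweet IDs arrive as strings but fit in a 64-bit BIGINT.
    return (int(tweet["id"]), tweet.get("created_at"), json.dumps(tweet))
```

The `CREATE TABLE` statement can be run once from pgAdmin's Query Tool; from Python, each tweet would be inserted with something like `cursor.execute(INSERT_TWEET, tweet_to_row(tweet))`. The `ON CONFLICT ... DO NOTHING` clause makes repeated fetches of the same feed idempotent.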
Implementing NLP with Lambda
Use the Natural Language Toolkit (NLTK) library to perform text analysis with a pre-built machine learning algorithm. You will create a Lambda function that retrieves tweets, analyzes their sentiment, and saves the results in your database. Learn to supply the function's dependencies through Lambda layers, including how to import pre-built Klayers and create your own custom layers. Finally, discover how to trigger the Lambda function automatically with Amazon EventBridge.
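The Lambda function's core logic might look like the sketch below. It assumes NLTK's pre-built VADER analyzer (`SentimentIntensityAnalyzer`) as the model and the commonly used ±0.05 thresholds on its compound score; the course may use a different NLTK model. `fetch_tweets` and `save` are hypothetical hooks standing in for the Twitter and database code. They are injectable, along with the scorer, so the logic can be tested locally without NLTK or network access.

```python
import json
from typing import Callable, Dict, Iterable, List, Optional

# Conventional cut-offs on VADER's compound score, which lies in [-1, 1].
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05

Scorer = Callable[[str], Dict[str, float]]


def label_from_compound(compound: float) -> str:
    """Map a compound score to a coarse sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def vader_scorer() -> Scorer:
    """NLTK's VADER analyzer. Assumes nltk and its 'vader_lexicon' data are
    shipped in a Lambda layer (e.g. with NLTK_DATA pointing at the data)."""
    from nltk.sentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores


def analyze(tweets: Iterable[dict], scorer: Scorer) -> List[dict]:
    """Score each tweet's text and attach a label."""
    results = []
    for tweet in tweets:
        compound = scorer(tweet["text"])["compound"]
        results.append({"id": tweet["id"], "compound": compound,
                        "label": label_from_compound(compound)})
    return results


def lambda_handler(event, context, *, scorer: Optional[Scorer] = None,
                   fetch_tweets: Optional[Callable[[], List[dict]]] = None,
                   save: Optional[Callable[[List[dict]], None]] = None):
    """Lambda calls this as lambda_handler(event, context).
    fetch_tweets and save are hypothetical hooks for the Twitter and RDS code."""
    if fetch_tweets is None or save is None:
        raise NotImplementedError("wire in your tweet fetch and database save functions")
    results = analyze(fetch_tweets(), scorer or vader_scorer())
    save(results)
    return {"statusCode": 200, "body": json.dumps({"processed": len(results)})}
```

Because the handler ignores the incoming event, the same code runs whether EventBridge invokes it on a schedule expression such as `rate(1 hour)` or `cron(0 12 * * ? *)`, or you trigger it manually with a test event.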
Dependency Management and Streamlit App Development
Visualize the results by creating a Streamlit application. Set up a local development environment with Anaconda3 and create a conda virtual environment. Manage project dependencies with Poetry as you work through the provided Git repository. We will guide you step by step through the application code and show you how to run it in a new virtual environment for testing.
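A minimal sketch of the dashboard side, assuming the sentiment labels produced by the Lambda function have already been loaded from the database: `summarize` is plain Python, while `render` imports Streamlit and pandas lazily and draws a title, a metric, and a bar chart. The function names and the three-label scheme are assumptions, not the course's actual application code.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("positive", "neutral", "negative")  # assumed label set


def summarize(labels: Iterable[str]) -> Dict[str, int]:
    """Count tweets per sentiment label, always reporting all three labels."""
    counts = Counter(labels)
    return {label: counts[label] for label in LABELS}


def render(labels: Iterable[str]) -> None:
    """Draw the dashboard; imports are lazy so summarize() works without Streamlit."""
    import pandas as pd
    import streamlit as st

    counts = summarize(labels)
    st.title("Tweet sentiment")
    st.metric("Tweets analyzed", sum(counts.values()))
    # Index = label, single column = count; bar_chart uses the index as the x-axis.
    st.bar_chart(pd.DataFrame({"tweets": counts}))
```

In an `app.py`, you would call `render(...)` at module level with labels from a hypothetical database loader, add the dependencies with `poetry add streamlit pandas`, and start the app inside the Poetry-managed environment with `poetry run streamlit run app.py`.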
Deploying Streamlit Application in ECS
Once your visualization is complete, learn to work with Docker images and containers on AWS. Create a repository in Elastic Container Registry (ECR) and set up the AWS CLI. Learn how to create IAM user groups and restrict their access with IAM policies. After building your Docker image, push it to ECR, configure an ECS Fargate cluster, and deploy your Streamlit application as a task on the platform.
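Before the application can run as a Fargate task, ECS needs a task definition that points at the image in ECR. The sketch below builds one as a Python dictionary, assuming a single container listening on Streamlit's default port 8501 and the smallest Fargate size; the family name, container name, account ID, and role ARN are placeholders, and the size table covers only a subset of valid Fargate combinations.

```python
import json

# Subset of valid Fargate sizes: CPU units -> allowed task memory in MiB.
FARGATE_SIZES = {
    "256": {"512", "1024", "2048"},
    "512": {"1024", "2048", "3072", "4096"},
    "1024": {str(mb) for mb in range(2048, 8193, 1024)},
}
STREAMLIT_PORT = 8501  # Streamlit's default server port


def ecr_image_uri(account_id: str, region: str, repository: str,
                  tag: str = "latest") -> str:
    """Image URI format for a private ECR repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def fargate_task_definition(image_uri: str, execution_role_arn: str,
                            cpu: str = "256", memory: str = "512") -> dict:
    """Task definition for one Streamlit container on Fargate."""
    if memory not in FARGATE_SIZES.get(cpu, set()):
        raise ValueError(f"unsupported Fargate size: cpu={cpu}, memory={memory}")
    return {
        "family": "streamlit-dashboard",  # hypothetical name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # the only network mode Fargate supports
        "cpu": cpu,
        "memory": memory,
        # Role ECS assumes to pull the image from ECR (and write logs).
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [{
            "name": "streamlit",
            "image": image_uri,
            "essential": True,
            "portMappings": [{"containerPort": STREAMLIT_PORT, "protocol": "tcp"}],
        }],
    }
```

Written to a file with `json.dump`, the dictionary can be registered with `aws ecs register-task-definition --cli-input-json file://taskdef.json`. Pushing the image beforehand requires authenticating Docker against ECR, for example with `aws ecr get-login-password --region REGION | docker login --username AWS --password-stdin ACCOUNT.dkr.ecr.REGION.amazonaws.com`.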