In this engineering project, you'll learn to trace user movements through their phone scans using Elasticsearch. The goal is to use Elasticsearch as a search system to analyze a dataset in which 100,000 users visit stores and make 1,000,000 scans.
Create Your Dataset
Use Python and Pandas to build your own dataset from an open San Francisco stores dataset, which contains over 140,000 stores with their names and coordinates. You'll reduce it to 10,000 selected stores and generate 100,000 fictional users, each making 10 check-ins on average, for 1,000,000 check-ins in total. Once the data is prepared, you'll upload it to Elasticsearch and build a Streamlit user interface to visualize it.
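The preparation steps described here might look roughly like the sketch below. It is only an illustration under stated assumptions: the column names (`business_id`, `name`, `zip_code`, `latitude`, `longitude`, `device_id`) and the helper `build_checkins` are hypothetical, and the course may generate its data differently. The demo uses tiny sizes; the project's figures would be `n_stores=10_000`, `n_users=100_000`, `avg_checkins=10`.

```python
import numpy as np
import pandas as pd


def build_checkins(stores: pd.DataFrame, n_stores: int, n_users: int,
                   avg_checkins: int, seed: int = 42) -> pd.DataFrame:
    """Sample stores and generate random check-ins (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Keep a random subset of the full store list.
    selected = stores.sample(n=n_stores, random_state=seed).reset_index(drop=True)

    # n_users * avg_checkins rows yields the requested average per user;
    # individual users may end up with more or fewer check-ins.
    n_checkins = n_users * avg_checkins
    checkins = pd.DataFrame({
        "device_id": rng.integers(0, n_users, size=n_checkins),
        "store_idx": rng.integers(0, n_stores, size=n_checkins),
    })

    # Attach the store name and coordinates to every check-in.
    merged = checkins.merge(selected, left_on="store_idx", right_index=True)
    return merged.drop(columns="store_idx")


# Tiny stand-in for the open San Francisco stores dataset (made-up rows).
stores = pd.DataFrame({
    "business_id": [1, 2, 3, 4],
    "name": ["Cafe A", "Shop B", "Market C", "Bakery D"],
    "zip_code": ["94102", "94103", "94107", "94110"],
    "latitude": [37.78, 37.77, 37.76, 37.75],
    "longitude": [-122.41, -122.42, -122.40, -122.43],
})

checkins = build_checkins(stores, n_stores=2, n_users=5, avg_checkins=10)
```

Drawing device and store indices independently keeps the sketch short; a real dataset would likely also carry a timestamp per check-in so that movements can be ordered.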
Application Interface Features
- Search by store name
- Search by ZIP code to filter stores by area
- Search by business ID for visit analysis
- Search and track by Device ID to observe specific user movements
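As a rough illustration, the four searches above could map onto Elasticsearch Query DSL bodies like the ones sketched here. The field names (`name`, `zip_code`, `business_id`, `device_id`, `scanned_at`) and the function names are assumptions made for this example, not the course's actual index schema; the term queries also assume those fields are mapped as `keyword` or numeric types.

```python
def store_name_query(name: str) -> dict:
    # Analyzed full-text match on a text field.
    return {"query": {"match": {"name": name}}}


def zip_code_query(zip_code: str) -> dict:
    # Exact match: restricts results to stores in one ZIP code.
    return {"query": {"term": {"zip_code": zip_code}}}


def business_visits_query(business_id: int) -> dict:
    # All check-ins recorded for a single business.
    return {"query": {"term": {"business_id": business_id}}}


def device_track_query(device_id: str, size: int = 100) -> dict:
    # One device's check-ins in chronological order, so the hits
    # can be drawn as a path on a map. "scanned_at" is a hypothetical
    # timestamp field.
    return {
        "query": {"term": {"device_id": device_id}},
        "sort": [{"scanned_at": {"order": "asc"}}],
        "size": size,
    }


track_body = device_track_query("device-42", size=50)
```

Each dictionary can be sent as the body of a search request; keeping them as plain functions makes it easy to wire them to interface controls later.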
Skills and Learning Outcomes
Throughout this project, you'll develop the ability to:
- Transform data stored in Parquet format and upload it to Elasticsearch
- Use Kibana to manage indices and search documents
- Design an interactive interface using Streamlit with controls, Folium maps, and tables
- Configure pages and run complex queries on Elasticsearch
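To make the transform-and-upload step more concrete, here is a minimal sketch of turning records into bulk actions. The helper `to_bulk_actions`, the index name `checkins`, and the field names are hypothetical. With Pandas, the records could come from `pd.read_parquet(path).to_dict("records")`, and the resulting actions could be passed to `elasticsearch.helpers.bulk(client, actions)`; neither call is made here so the example runs without a cluster.

```python
from typing import Iterable, Iterator


def to_bulk_actions(records: Iterable[dict], index_name: str) -> Iterator[dict]:
    """Yield one bulk action per record (hypothetical helper)."""
    for i, rec in enumerate(records):
        doc = dict(rec)  # copy so the caller's record is left untouched
        # Combine coordinates into the {"lat", "lon"} object form that
        # geo_point fields accept, which is convenient for map queries.
        doc["location"] = {"lat": doc.pop("latitude"), "lon": doc.pop("longitude")}
        yield {"_index": index_name, "_id": i, "_source": doc}


# Made-up sample records standing in for rows read from a Parquet file.
records = [
    {"device_id": "device-1", "business_id": 10, "latitude": 37.77, "longitude": -122.42},
    {"device_id": "device-2", "business_id": 11, "latitude": 37.78, "longitude": -122.41},
]

actions = list(to_bulk_actions(records, "checkins"))
```

A generator is used so that a million check-ins need not be held in memory as action dictionaries all at once.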
Course Program Overview
- Preparation of the San Francisco dataset with 10,000 stores
- Generation of 100,000 fictional user profiles
- Integration of user data with store information
- Creation of 1,000,000 app check-ins
- Data preparation for Elasticsearch upload
- Data upload to Elasticsearch
- Streamlit application development including maps, filters, and tables
- Page configuration and Elasticsearch query execution
Prerequisites and Recommendations
It is recommended to complete the “Log Analysis in Elasticsearch” course to gain foundational knowledge in Elasticsearch. Additionally, consider taking the Pandas lessons from the course “Python for Data Engineers” to enhance your data manipulation skills.
A machine with at least 8 GB of RAM is recommended for this project.