Resources

Data collection

Tools and resources to analyze the Peruvian Annual Economic Survey
Stata, Python
Repository that explains the structure of the Peruvian Annual Economic Survey (EEA) 2012–2022 and provides tools for its analysis. It includes Stata do-files, Python scripts, and processing templates to help researchers clean, organize, and analyze the dataset using reproducible workflows.
Decode and process survey responses from phone recordings
Python
Python-based tool for decoding Dual-Tone Multi-Frequency (DTMF) tones from phone survey recordings. It extracts numeric responses from audio data, which supports privacy-preserving data collection in sensitive contexts such as gender or health studies, and it integrates with platforms like SurveyCTO.
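To illustrate the idea behind DTMF decoding, here is a minimal sketch that uses the Goertzel algorithm to find the strongest low and high tone in a short audio frame. It is not the repository's implementation: the function names, the 8000 Hz sample rate, and the 50 ms frame length are assumptions made for the example, and a real decoder would also need silence detection and noise thresholds.

```python
import math

# Standard DTMF grid (Hz): every key is one low tone plus one high tone.
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, sample_rate, freq):
    """Estimate the signal power near `freq` with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_frame(samples, sample_rate=8000):
    """Return the keypad symbol whose two tones dominate the frame.

    Assumes the frame really contains a key press (no silence check).
    """
    low = max(LOW_FREQS, key=lambda f: goertzel_power(samples, sample_rate, f))
    high = max(HIGH_FREQS, key=lambda f: goertzel_power(samples, sample_rate, f))
    return KEYPAD[LOW_FREQS.index(low)][HIGH_FREQS.index(high)]


def synth_tone(key, sample_rate=8000, duration=0.05):
    """Generate a clean test tone for `key` (hypothetical helper for the demo)."""
    for row, keys in enumerate(KEYPAD):
        if key in keys:
            low, high = LOW_FREQS[row], HIGH_FREQS[keys.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    n = int(sample_rate * duration)
    return [
        math.sin(2.0 * math.pi * low * t / sample_rate)
        + math.sin(2.0 * math.pi * high * t / sample_rate)
        for t in range(n)
    ]


if __name__ == "__main__":
    # Decode a synthetic sequence of key presses, one frame per key.
    print("".join(decode_frame(synth_tone(k)) for k in "2024"))
```

Evaluating only the eight DTMF frequencies is cheaper than a full FFT, which is why Goertzel is a common choice for this task.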
Extract information from PDF invoices
Python
Python code for extracting structured data from PDF files, including encrypted ones, without needing a decryption key. It is well suited to processing electronic invoices from Peru and Chile and converting them into clean, analysis-ready Excel files.
Consolidate information from multiple CVs
Python

Python script that extracts structured information from resumes in PDF format, such as name, email, and education, and automatically consolidates it into an Excel file. Spanish version.
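As a rough sketch of the consolidation step, the example below pulls a name and an email out of resume text and gathers the results into one table. It makes several assumptions that are not taken from the repository: the PDF text has already been extracted, the first non-empty line is treated as the name (a crude heuristic), the function names are hypothetical, and the output is CSV rather than Excel so that the example needs only the standard library.

```python
import csv
import io
import re

# Simple email pattern; good enough for a sketch, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def parse_resume(text):
    """Extract a few fields from the plain text of one resume."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    match = EMAIL_RE.search(text)
    return {
        "name": lines[0] if lines else "",  # heuristic: first non-empty line
        "email": match.group(0) if match else "",
    }


def consolidate(resume_texts):
    """Return one CSV document with a row per resume."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "email"])
    writer.writeheader()
    for text in resume_texts:
        writer.writerow(parse_resume(text))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "Ana Pérez\nEconomist\nContact: ana.perez@example.com\n"
    print(consolidate([sample]))
```

Swapping the CSV writer for an Excel writer would only change `consolidate`; the per-resume parsing stays the same.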

Experiments & Coursework

Random Forest for the early detection of acute respiratory infections (IRA)
Python
This project implements a Random Forest regressor for the early detection of acute respiratory infections (IRA, by the Spanish acronym) in Peru. The model predicts the weekly incidence rate by region using climate and historical case data, excluding the pandemic years (2020–2021). Developed in collaboration with Mónica Lozada.
Applied microeconometrics course materials
Stata, Spanish
Teaching materials for a workshop on applied microeconometrics for undergraduate students at the Universidad Nacional de Ingeniería (UNI). The repository includes examples and implementations in Stata covering OLS, instrumental variables, regression discontinuity, difference-in-differences, and discrete choice models.