We are seeking a Senior Data Engineer to lead a critical pilot project modernizing our enterprise customer data consolidation, moving it from SQL Server to our Databricks-based data lake. The role combines deep Oracle PL/SQL expertise with modern PySpark development and supports our East region initiative across three billing platforms.
## Key Responsibilities
### Primary Project (Enterprise Customer Table Pilot)
- Design and implement data consolidation solutions moving from SQL Server to Databricks data lake
- Work with business stakeholders and cross-functional teams to define enterprise customer table specifications
- Determine the optimal approach for data processing, either within existing Oracle systems or in the data lake environment
- Collaborate with enterprise data lake team to leverage existing PySpark resources and infrastructure
- Produce modified data inputs for the new enterprise customer table consolidation process
- Ensure data quality and consistency across three different billing platform feeds
### Secondary Pilot (Code Conversion)
- Convert existing Oracle/PL/SQL code to PySpark for data lake processing
- Evaluate feasibility of migrating current data warehouse operations to PySpark
- Provide proof-of-concept for future large-scale migration initiatives
- Test and validate converted code performance in the data lake environment
### Team Development & Knowledge Transfer
- Train and mentor existing PL/SQL team members on PySpark technologies
- Work independently with minimal supervision while collaborating effectively with stakeholders
- Provide technical leadership and architectural guidance for data processing solutions
- Document best practices and create knowledge base for future PySpark implementations
## Required Technical Skills
### Core Requirements
- **10+ years** of back-end development experience
- **Expert-level PL/SQL and Oracle** database development
- **7+ years of PySpark** experience with data lake implementations
- Strong experience with **Databricks** platform
- Proficiency in **data modeling and schema design**
- Experience with **data pipeline development**
- Custom data warehouse development experience
### Preferred Technologies
- **Delta Lake** experience
- **Apache Airflow** for job scheduling and pipeline orchestration
- **GCP (Google Cloud Platform)** - our primary cloud environment
- **AWS or Azure** cloud experience (transferable)
- Data warehouse and ETL/ELT processes
- Experience with enterprise-scale data integration projects
## Technical Environment
- **Current Stack**: Oracle-based custom data warehouse, PL/SQL processing
- **Target Stack**: Databricks data lake, PySpark, Delta Lake, GCP
- **Integration Points**: Three separate billing systems, SQL Server consolidation layer
- **Data Volume**: Enterprise-scale customer data across multiple regions
## Business Context & Domain Knowledge
- Support for multiple regions: East, Texas (largest), and Panera
- Integration challenges across three separate billing systems with different data formats
- Enterprise-level customer data consolidation and reporting requirements
- Migration from legacy SQL Server data warehouse to modern data lake architecture
- Focus on sales reporting and sales-count reporting
- Experience with acquired company data integration challenges preferred
## Required Competencies
### Technical Leadership
- Ability to analyze existing systems and recommend architectural improvements
- Experience designing scalable data processing solutions
- Strong debugging and troubleshooting skills across multiple platforms
- Code review and quality assurance capabilities
### Business Acumen
- Understanding of enterprise data warehouse concepts
- Experience with customer data management and consolidation
- Knowledge of sales reporting and business intelligence requirements
- Familiarity with multi-system integration challenges
### Communication & Collaboration
- Excellent stakeholder management skills
- Ability to translate technical concepts to business users
- Experience working with cross-functional teams
- Strong documentation and knowledge sharing abilities
## Work Arrangement
- **Hybrid role**: 3 days on-site (Monday, Tuesday, Thursday) in Houston, TX
- 2 days remote work
- Candidates willing to relocate to Houston will be considered
- **Office Location**: Houston, Texas (specific location to be provided)