Are you an engineer who is passionate about enabling business-impacting
research and analytics through reliable and transparent processes, and durable
systems? Are you happiest when collaborating with engineers, quants, and
traders to deliver trust and confidence in data, software and outcomes? CTC is
seeking experienced specialists for our Data Engineering team. We are
responsible for the development, infrastructure, and operations of CTC's
common data platform. You will combine your experience with the team's to
operate and improve our data platform, pipelines, and processes in close
collaboration with quants and traders at our dynamic, data-powered trading
firm.
Responsibilities
- Provide leadership, mentorship, and hands-on help to the team responsible for break-fix, uptime, and reliability of a petabyte-scale cloud data platform
- Manage several production data environments with a high level of service availability for data ETL pipelines
- Collaborate with technology teams, researchers, and traders to perform root cause analysis and re-instrument triggers to prevent future pipeline degradation and outages
- Implement automation for manual processes required to meet SLAs, and take a central role in developing new insights into internal tools
- Diagnose and resolve critical production issues, running incident tracking, analysis, and reporting
- Define a data quality management process and maintain quality standards across our ETL pipelines
- Build automation and processes that improve team efficiency, reduce the impact of mistakes, and eliminate recurring incidents
Qualifications
- Experience operating as a Manager, providing day-to-day guidance and support
- Expertise in Python and SQL
- Experience managing large-scale data pipelines and highly distributed data platforms
- Cloud (Azure, AWS) and Kubernetes experience required, including infrastructure as code with tools such as Terraform
- Experience with observability tools such as Datadog, Prometheus, Splunk Enterprise, SignalFx, or CloudWatch preferred
- Familiarity with distributed computing tools and techniques in the Python ecosystem, including PySpark and Dask
- Experience analyzing and efficiently navigating large (TB+) data sets