
Senior/Staff Big Data Storage and Computing Engineer, Recommendation Data Ecosystem
Job Description
Our team plays a crucial role in the data ecosystem of the TikTok Recommendation System, focusing on creating offline and real-time data storage solutions for large-scale recommendation, search, and advertising businesses, serving over 1 billion users. The core goals of the team are to ensure high system reliability, uninterrupted service, and smooth data processing. We are committed to building a storage and computing infrastructure that can adapt to various data sources and meet diverse storage requirements, ultimately providing efficient, cost-effective, and user-friendly data storage and management tools for the business.
Responsibilities
- Architecture Design and Implementation: Design and implement offline and real-time data architectures for large-scale recommendation, search, and advertising systems based on Paimon and Flink. Ensure efficient data processing and storage to meet the strict requirements of the business for data timeliness and accuracy.
- System Construction and Optimization: Design and implement flexible, scalable, stable, and high-performance storage systems and computing models. Use Paimon as the storage foundation and combine it with the powerful computing capabilities of Flink. Continuously optimize system performance to cope with the challenges brought by business growth.
- Troubleshooting and Stability Assurance: Troubleshoot production systems. For problems that arise in the Paimon-Flink architecture during operation, design and implement the necessary mechanisms and tools, such as data consistency assurance and exception recovery, to ensure the overall stability of the production system.
- Distributed System Construction: Build industry-leading distributed systems, including offline and online storage based on Paimon and batch and stream processing frameworks based on Flink, providing solid and reliable infrastructure support for massive data and large-scale business systems.
Minimum Qualifications:
- A bachelor's degree or above in computer science, software engineering, or related fields, with experience in building scalable systems.
- Thorough understanding of Paimon and Flink, with the ability to read and work with them at the source-code level.
- In-depth understanding of at least one data lake technology (such as Paimon), with practical implementation and customization experience, which should be highlighted in the resume.
- Proficiency in programming languages such as Java, C++, and Scala, with strong coding and problem-solving abilities.
- Experience in data warehouse modeling, with the ability to design efficient data models for complex business scenarios.
Preferred Qualifications:
- Experience with other big-data systems/frameworks (such as Hive, HBase, Kudu, etc.) and with handling large-scale data (PB-level and above).
- Willingness to take on complex problems and to explore problems without clear solutions.
- Passion for learning new technologies, with the ability to quickly master them and apply them in practical work.
- Familiarity with the principles of HDFS and knowledge of columnar storage formats such as Parquet and ORC.