About the company

Data Engineer R&D (hybrid) (VAC-A2343)

Nicosia
Permanent

28 days ago Recruiting Agent

Job description

Our client, a cybersecurity company in Nicosia, is looking to hire an experienced Data Engineer with a background in large language models (LLMs) to join its Research and Development team. This role is crucial for developing and maintaining the scalable data pipelines and infrastructure that support the training and deployment of LLMs. The ideal candidate will combine data engineering skills with a deep understanding of what it takes to manage data for LLMs and other advanced models, from preprocessing through to optimization for performance at scale.


Job Duties

  • Design, build, and maintain scalable and efficient data pipelines specifically tailored for training and deploying large language models
  • Work closely with data scientists and machine learning engineers to understand data requirements for LLM projects, including data collection, processing, and storage needs
  • Implement and manage data ingestion routines from a variety of sources, ensuring data quality and accessibility for LLM training
  • Optimize data infrastructure to support the computational demands of LLMs, including performance tuning and scalability improvements
  • Develop tools and processes for monitoring and analyzing data pipeline performance and data quality, ensuring the integrity and availability of data
  • Collaborate with cross-functional teams to ensure seamless integration of LLMs into production environments, including support for model versioning, deployment, and monitoring
  • Stay abreast of the latest developments in large language models, data engineering practices, and technologies to continually improve pipeline efficiency and model performance
  • Ensure compliance with data governance and security policies throughout the data lifecycle, from ingestion to model deployment

Job Requirements

  • At least 2 years of proven experience as a Data Engineer, with specific experience working on projects involving large language models
  • Strong expertise in data modelling, ETL processes, and data pipeline tools
  • Proficiency in programming languages commonly used in data engineering and machine learning, such as Python and SQL
  • Experience with big data technologies (e.g., Hadoop, Spark) and cloud services (e.g., AWS, Google Cloud, Azure) for machine learning and data processing workloads
  • Knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes) for deploying and managing LLM applications
  • Familiarity with machine learning operations (MLOps) practices for managing the lifecycle of machine learning models, including large language models
  • Excellent problem-solving skills, with the ability to work independently and as part of a team in a fast-paced environment
  • Strong communication skills, with the ability to explain complex technical concepts to non-technical stakeholders
  • Fluency in Greek and English

Working hours

Working hours are 9am-6pm with a 20-minute break, and Friday afternoons are off. The role follows a hybrid working arrangement.

TO APPLY for this job opportunity, send your CV (in English, please) and include the reference: Data Engineer R&D (hybrid) - VAC-A2343. We look forward to hearing from you!