Big Data Distributed Systems Engineer (Remote)

Other
Salary: Competitive Salary
Job Type: Full time
Experience: Senior Level

Insikt AI

Big Data Distributed Systems Engineer | Insikt AI | Spain

We are Insikt Intelligence, and our passion is building easy-to-use Artificial Intelligence tools that help companies and public institutions extract vital intelligence from digital data sources by applying Natural Language Processing and Machine Learning techniques to live data streams.

Insikt’s mission is to harness our technology for the good of society by providing tools with the potential to prevent crime and counter harmful content. Insikt was recently acquired by Logically, a UK company that shares the same mission and purpose.

Position Overview:

We are seeking a talented and experienced Big Data / Distributed Systems Engineer to join our dynamic team. In this role, you will be responsible for designing, implementing, and optimizing large-scale data processing pipelines using Google Cloud Platform (GCP), Apache Spark, and GraphFrames. Your work will be crucial in enabling our data-driven initiatives, with a particular focus on analyzing and understanding social networks.

Key Responsibilities:

– Design and Development: Architect, develop, and maintain scalable and reliable data processing pipelines using GCP, Apache Spark, and GraphFrames, with a focus on social network data.

– Data Integration: Collaborate with data scientists, analysts, and other stakeholders to integrate diverse data sources into our big data platform, including data from social networks.

– Optimization: Optimize data processing jobs for performance, cost-efficiency, and reliability on distributed systems.

– Social Network Analysis: Apply techniques for analyzing social network structures, community detection, and network dynamics.

– Monitoring and Troubleshooting: Implement monitoring solutions and troubleshoot complex distributed systems issues to ensure the smooth operation of data pipelines.

– Collaboration: Work closely with cross-functional teams to understand data requirements, provide technical insights, and contribute to the overall architecture of the data platform.

Key Skills and Qualifications:

  • Google Cloud Platform (GCP): Extensive experience with GCP services related to big data, such as BigQuery, Dataproc, Dataflow, and Cloud Storage.
  • Apache Spark: Strong proficiency in using Apache Spark for large-scale data processing, including experience with Spark SQL, DataFrames, and RDDs.
  • GraphFrames: Hands-on experience with GraphFrames for graph processing and analytics within the Spark ecosystem.
  • Social Network Analysis (Nice to Have): Experience or strong interest in social network analysis, including techniques such as centrality measures, community detection, and network visualization.
  • Databricks (Nice to Have): Experience with Databricks for building and managing big data pipelines, particularly within a cloud environment.
  • Programming: Proficiency in programming languages such as Python, Scala, or Java.
  • Distributed Systems: Solid understanding of distributed systems concepts, including fault tolerance, scalability, and data consistency.
  • Data Management: Experience with data modelling, ETL processes, and data warehousing solutions.
  • Problem-Solving: Strong analytical skills with the ability to solve complex technical challenges in a distributed environment.
  • Team Collaboration: Excellent communication and teamwork skills, with a proactive approach to problem-solving and knowledge sharing.

Preferred Qualifications:

  • Machine Learning Pipelines: Experience integrating big data processing pipelines with machine learning workflows.
  • Social Network Analysis Tools: Familiarity with tools and libraries used specifically for social network analysis, such as GraphX or similar.
  • Databricks: Hands-on experience with Databricks is highly desirable.
  • Certifications: Relevant certifications in GCP, Apache Spark, Databricks, or social network analysis are a plus.

Tagged as: remote, remote job, virtual, Virtual Job, virtual position, Work at Home, work from home

When applying, state that you found this job on the Pangian.com Remote Network.