Hari Krishna Aitha

Copenhagen

Summary

Data Engineer with over 7 years of experience in designing and optimizing data solutions across AWS, Azure, and GCP. Expertise in managing large-scale data pipelines and ensuring data integrity while enhancing performance. Recognized for strong time management and problem-solving abilities, contributing to team success and organizational growth.

Overview

8 years of professional experience

Work History

Citi Bank

04.2024 - Current

Job overview

  • Company Overview: Citi Bank is a global financial institution offering banking, credit, investment, and wealth management services to individuals, businesses, and governments
  • Design, build, and maintain scalable ETL/ELT pipelines using Azure Data Factory (ADF) and Azure Synapse Analytics
  • Automate data ingestion and transformation processes for structured, semi-structured, and unstructured data
  • Developed PySpark applications for ETL operations across multiple data pipelines
  • Used PySpark to improve the performance of existing Hadoop algorithms, working with SparkContext, Spark SQL, DataFrames, and pair RDDs
  • Analyzed and developed a modern data solution with Azure PaaS services to enable data visualization
  • Assessed the application's current production state and the impact of new installations on existing business processes
  • Developed Spark Streaming programs to process near-real-time data from Kafka with both stateless and stateful transformations
  • Develop data pipelines and workflows using Azure Databricks to process and transform large volumes of data, using languages such as Python, Scala, and SQL
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster
  • Created data tables using PyQt to display customer and policy information and to add, delete, and update customer records
  • Work with query languages such as SQL, programming languages such as Python and C#, and scripting languages such as PowerShell, Power Query M, and Windows batch commands
  • Utilized Elasticsearch and Kibana for indexing and visualizing the real-time analytics results, enabling stakeholders to gain actionable insights quickly
  • Involved in various phases of the Software Development Lifecycle (SDLC), including requirements gathering, design, development, deployment, and analysis of the application
  • Involved in loading data into Cassandra NoSQL Database
  • Develop ETL/ELT processes to prepare data for analytics and reporting
  • Implement data governance policies to ensure data quality, consistency, and compliance
  • Use Azure Purview or similar tools for data cataloging, lineage tracking, and metadata management
  • Ensure data security by implementing encryption, access controls, and monitoring using Azure Security Center and Azure Key Vault
  • Monitor data pipelines and systems for performance, reliability, and cost
  • Work closely with data scientists to provide clean, structured data for machine learning models and advanced analytics
  • Use Azure Monitor, Log Analytics, and Application Insights to troubleshoot issues and optimize resource usage
  • Manage and optimize Azure cloud infrastructure, including virtual machines, storage accounts, and networking components
  • Support business analysts by enabling access to data through tools like Power BI or Tableau
  • Use Infrastructure as Code (IaC) tools like Terraform or Azure Resource Manager (ARM) templates for deployment and management
  • Work with IoT data streams from sensors and equipment used in oil and gas operations
  • Use Azure IoT Hub, Azure Stream Analytics, or Apache Kafka for real-time data processing and analytics
  • Document data pipelines, architectures, and processes for future reference and onboarding of new team members
  • Implemented Synapse integration with Azure Databricks notebooks, which reduced development work by about half
  • Achieved performance improvements in Synapse loading by implementing dynamic partition switching
  • Built and configured Jenkins slaves for parallel job execution
  • Installed and configured Jenkins for continuous integration and performed continuous deployments
  • Successfully managed data migration projects, including importing and exporting data to and from MongoDB, ensuring data integrity and consistency throughout the process
  • Worked on Jenkins pipelines to run various steps including unit, integration and static analysis tools
  • Skilled in monitoring servers using Nagios, CloudWatch, and the ELK Stack (Elasticsearch and Kibana)
  • Extensively used Azure Athena to ingest structured data from Azure Blob Storage into systems such as Azure Synapse Analytics and to generate reports
  • Developed and maintained data models and schemas within Snowflake, including the creation of tables, views, and materialized views to support business reporting and analytics requirements
  • Good experience with Continuous Integration and Continuous Delivery (CI/CD) of applications using Bamboo
  • Technologies Used: Analytics, API, Athena, Azure, Azure Synapse Analytics, Blob Storage, Cassandra, CI/CD, Data Lake, Elasticsearch, ETL, Java, Jenkins, Kafka, PaaS, PySpark, Python, Scala, Snowflake, Spark, Spark Streaming, SQL

Lundbeck

09.2022 - 03.2024

Job overview

  • Company Overview: Lundbeck is a global pharmaceutical company specializing in brain diseases, focusing on innovative treatments for psychiatric and neurological disorders
  • Enhanced data pipelines for performance, scalability, and reliability, leveraging modern tools and frameworks such as Apache Airflow, Spark, and cloud-native services
  • Designed, developed, and implemented performant ETL pipelines using the Python API of Apache Spark
  • Created Lambda functions with Boto3 to deregister unused AMIs in all application regions, reducing costs for EC2 resources
  • Worked with the Hive data warehouse infrastructure: created tables, implemented partitioning and bucketing for data distribution, and wrote and optimized HQL queries
  • Imported real-time weblogs using Kafka as the messaging system, ingested the data into Spark Streaming, performed data quality checks, and flagged records as bad or passable
  • Responsible for estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster
  • Worked with AWS Terraform templates in maintaining the infrastructure as code
  • Involved in various phases of the Software Development Lifecycle (SDLC), including requirements gathering, design, development, deployment, and analysis of the application
  • Used Django Evolution and manual SQL modifications to modify Django models while retaining all data, with the site in production mode
  • Used Jira for ticketing and tracking issues and Jenkins for continuous integration and continuous deployment
  • Ensured data integrity and consistency during migration, resolving compatibility issues with T-SQL scripting
  • Dockerized applications by creating Docker images from Dockerfiles and collaborated with the development support team to set up a continuous deployment environment using Docker
  • Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications
  • Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near-real-time log analysis and end-to-end transaction monitoring
  • Migrated the data from Amazon Redshift data warehouse to Snowflake for further financial reporting
  • Stored the log files in AWS S3
  • Enabled versioning on S3 buckets where highly sensitive information is stored
  • Technologies Used: APIs, AWS, CI/CD, Data Lake, Docker, EC2, Elasticsearch, ETL, HBase, Java, Jenkins, Jira, Kafka, Lambda, Python, Redshift, S3, Snowflake, Spark, Spark Streaming, SQL

Aon

11.2019 - 08.2022

Job overview

  • Company Overview: Aon is a global professional services firm providing risk management, insurance, retirement, and health consulting solutions
  • Designed and implemented efficient data models for BigQuery to support analytics and reporting needs
  • Optimized schemas for performance, scalability, and cost-efficiency
  • Extensively worked on Hive, creating numerous internal and external tables to meet analysis requirements
  • Involved in data validation and reporting using Power BI
  • Created Data Studio reports to review billing and service usage, optimizing queries and contributing to cost-saving measures
  • Worked on NoSQL databases such as HBase, integrated with PySpark for processing and persisting real-time streaming data
  • Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery
  • Created an Amazon VPC with a public-facing subnet for web servers with internet access and backend databases and application servers in a private subnet with no internet access
  • Developed an end-to-end solution that involved ingesting sales data from multiple sources, transforming and aggregating it using Azure Databricks, and visualizing insights through Tableau dashboards
  • Good knowledge of Cloud Shell for various tasks and for deploying services
  • Created batch and real time pipelines using Spark as the main processing framework
  • Used Python to write data into JSON files for testing Django websites and created scripts for data modelling and data import/export
  • Experienced with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK
  • Managed large datasets using pandas DataFrames and SQL
  • Involved in developing data ingestion pipelines on Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL
  • Worked with Cosmos DB (SQL API and MongoDB API)
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments
  • Used Sqoop import/export to ingest raw data into Google Cloud Storage by spinning up Cloud Dataproc clusters
  • Used Google Cloud Dataflow with the Python SDK to deploy streaming and batch jobs in GCP for custom cleaning of text and JSON files and writing them to BigQuery
  • Involved in setting up the Apache Airflow service in GCP
  • Technologies Used: Airflow, Apache, API, Azure, BigQuery, Cosmos DB, Data Factory, GCP, HBase, HDInsight, JS, PySpark, Python, SDK, Spark, Spark SQL, SQL, Sqoop, Tableau, VPC

Nokia

05.2017 - 10.2019

Job overview

  • Company Overview: Nokia is a global technology company specializing in telecommunications, networking, and 5G infrastructure solutions
  • Implemented data validation and cleansing processes to maintain data accuracy and reliability
  • Monitored and resolved data pipeline errors to ensure seamless data flow
  • Used AWS Lambda to perform data validation, filtering, sorting, and other transformations on every data change in a database table and loaded the transformed data into another data store, AWS S3, for raw file storage
  • Worked on big data integration and analytics based on Hadoop, Solr, PySpark, Kafka, Storm, and webMethods
  • Worked on partitioning Kafka messages and setting up replication factors in the Kafka cluster
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations
  • Developed Spark applications for batch processing using PySpark
  • Involved in the entire project lifecycle, including design, development, deployment, testing, implementation, and support
  • Developed database triggers and stored procedures using T-SQL cursors and tables
  • Implemented Apache Airflow for workflow automation and task scheduling, and created DAGs and tasks
  • Built scalable data infrastructure on cloud platforms, such as AWS, using Kubernetes and Docker
  • Conducted query optimization and performance tuning tasks, such as query profiling, indexing, and utilizing Snowflake's automatic clustering to improve query response times and reduce costs
  • Created CI/CD pipelines with Jenkins and deployed applications on AWS EC2 using Docker containers
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB
  • Technologies Used: AWS, CI/CD, Cluster, Data Factory, Docker, DynamoDB, EC2, EMR, ETL, Jenkins, Kafka, Kubernetes, Data Lake, Lambda, PySpark, S3, Snowflake, Spark, SQL, Sqoop, Storm

Education

Kakatiya University

Master of Science in Computer Applications Development
05.2008

Skills

  • AWS Services: S3, EC2, EMR, Redshift, RDS, Lambda, Kinesis, SNS, SQS, AMI, IAM, CloudFormation
  • Hadoop Components / Big Data: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos, PySpark, Airflow, Snowflake, Spark components
  • Databases: Oracle, Microsoft SQL Server, MySQL, DB2, Teradata
  • Programming Languages: Java, Scala, Impala, Python
  • Web Servers: Apache Tomcat, WebLogic
  • IDE: Eclipse, Dreamweaver
  • NoSQL Databases: HBase, Cassandra, MongoDB
  • Methodologies: Agile (Scrum), Waterfall, UML, Design Patterns, SDLC
  • Currently Exploring: Apache Flink, Drill, Tachyon
  • Cloud Services: AWS, Azure (Azure Data Factory, ETL/ELT/SSIS, Azure Data Lake Storage, Azure Databricks), GCP
  • Teamwork and collaboration
  • ETL Tools: Talend Open Studio & Talend Enterprise Platform
  • Reporting and ETL Tools: Tableau, Power BI, AWS Glue, SSIS, SSRS, Informatica, DataStage
  • Friendly, positive attitude

Timeline

Citi Bank
04.2024 - Current

Lundbeck
09.2022 - 03.2024

Aon
11.2019 - 08.2022

Nokia
05.2017 - 10.2019

Kakatiya University

Master of Science in Computer Applications Development