Mohamed ElSayed Kandil

Summary

Detail-oriented team player with strong organizational skills, able to handle multiple projects simultaneously with a high degree of accuracy. Dependable, with a positive attitude and a willingness to take on added responsibilities to meet team goals.

Overview

6 years of professional experience

5 years of post-secondary education

Work History

Principal HPC Systems Engineer

Brightskies Technologies

01.2022 - Current

Responsible for setting up high-performance computing (HPC) platforms
Perform technical planning, hardware sizing, application workload assessment, system integration, verification and validation, and supportability and effectiveness analyses for total systems
Manage and monitor all installed systems and infrastructure
Install, configure, test and maintain operating systems, application software and system management tools
Proactively ensure the highest levels of systems and infrastructure availability
Monitor and test application performance for potential bottlenecks, identify possible solutions, and work with developers to implement those fixes
Maintain security, backup, and redundancy strategies
Write and maintain custom scripts to increase system efficiency and reduce the manual intervention time required for tasks
Liaise with vendors and other IT personnel for problem resolution
Lead a small team of engineers and guide them through projects.

Senior HPC Systems Engineer

Brightskies Technologies

10.2017 - 01.2020

Design and conduct POCs to showcase the proposed HPC solutions
Deploy & administer turnkey HPC solutions
Strong experience with cluster management tools such as xCAT, Bright Computing, and CMU
Strong experience in deploying and administering PBSpro workload manager, Torque, Altair Access Web, and Altair Control
Compiling and running HPC benchmarks, optimizing the results, and identifying cluster bottlenecks and problems
Installation and configuration of HPC production clusters
Advanced knowledge of the distributed file systems BeeGFS and Lustre
Automating configuration and provisioning Infrastructure using Ansible
Containerize applications using Docker
Continuous integration and deployment using GitLab CI and Jenkins.

Cloud Systems Engineer

Brightskies Technologies

10.2017 - 01.2020

Implementation of a POC for on-demand HPC clusters with Bright Computing OpenStack
Competent in implementing OpenStack infrastructure (Horizon, Nova, Glance, Keystone, Swift, Cinder, Neutron)
Installation and configuration of virtualized environments based on vSphere 6
Installation and configuration of NSX
Installation and configuration of vCloud Director and integration with Active Directory
Managing vCloud Director infrastructure through the service provider portal and tenant portal
Integration of vCloud Director with the vROps tenant app
Upgrade of vCloud Director from version 9.5 to version 9.7
Installation and configuration of vRealize Orchestrator
Implementation and configuration of plugins and connections from vRO to the components of the VMware Cloud stack (vCenter, NSX, VCD, AD)
Building designs and flowcharts to automate workload creation for VCD products offered as IaaS through vRO

Application System Engineer

Pharmaoverseas

05.2017 - 10.2017

Responsible for the management of the ERP environment: application, database (DB2), operating systems (AIX, SUSE), servers (IBM Power Servers), and storage (Flash System)
Configuring, monitoring, tuning and troubleshooting the ERP technical environment
Collaborate to resolve SAP transport and source code problems
Install new / rebuild existing servers and configure hardware, peripherals, services, settings, directories, storage, etc., in accordance with standards and project/operational requirements
Perform daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups
Perform regular security monitoring to identify any possible intrusions
Implement an optimal ERP configuration to maximize system performance and availability
Install and configure all required SAP database servers, operating systems, and application servers.

Education

B.Sc. - Communication and Electronics Engineering

Alexandria University

01.2008 - 01.2012

Professional Diploma

Information Technology Institute (ITI)

01.2016 - 01.2017

Post-graduate

Skills

Bright Computing, xCAT, CMU

Slurm, PBSPro, Altair Access Web, Altair Control, Torque, Moab

Scientific Application compiling

BeeGFS, Lustre, Spectrum Scale (GPFS)

Bash, Python, COS, Data Structure

Docker, Docker Compose, Git, Gitlab-CI, Ansible, Jenkins, Vagrant, Terraform, YAML, API

System/Network Administration

Linux (RHEL, Ubuntu, Fedora, SUSE), UNIX Solaris, AIX, RHEL HA (Pacemaker, Corosync)

Cloud and Virtualization

VMware (ESXi, vCenter, vCloud Director, vRealize Orchestrator, vROps)

OpenStack (Horizon, Nova, Glance, Keystone, Swift, Cinder, Neutron, Kayobe, Kolla)

AWS, OCI, KVM, Citrix

InfluxDB, Grafana, Telegraf

Projects

Dammam 7 HPC Cluster Aramco Saudi Oct 2020 – Present

Project description

The following activities are managed within the project scope:

Supporting the STCS project delivery team in delivering the cluster to the operations team:

Supporting the definition of delivery criteria and ensuring that deliverables are delivered correctly
Supporting various performance tests and preparing the cluster for production
Participating in developing monitoring systems and automated availability reports
Supporting the installation and configuration of a replicated central authentication system for the different categories of cluster nodes
Supporting the implementation team in integrating the authentication system into the cluster OS images
Designing and implementing central replicated DNS servers to maintain unified hostname schemas for the processing cluster, storage cluster and general-purpose application nodes
Configuring the cluster to comply with Aramco security requirements

Operating the Dammam 7 cluster:

Installing and configuring HPC applications and creating module files
Assisting Aramco users in installing their applications and troubleshooting errors
Maintaining the cluster's job scheduler and troubleshooting the causes of job failures
Changing the OS images to add the required packages
Tuning the scheduler configuration: controlling the maximum resources allowed, and adding cgroups configurations, epilogs and prologs for job tracking, cleanup and stability
Adding nodes to the GPFS cluster and creating filesets as required
Stabilizing the cluster, adding more health checks to the nodes, and configuring triggers to exclude faulty nodes
Installing more monitoring tools (InfluxDB, Grafana) for visibility into different cluster components

Supporting the STCS application team in developing utility apps:

Deploying a full development environment on the STCS cloud environment
Providing the STCS development team with API collections to help them integrate the application with Slurm
Participating in the design process and choosing the most secure way to integrate the application with Aramco databases and with the Dammam 7 cluster
Assisting the development team in creating job templates and passing input to the jobs from the GUI application
Assisting the development team in improving the application and adding more functionality for managing jobs through the GUI application

Visualization HPC Clusters Aramco Saudi Jul 2020 – Aug 2020

Project description

The following activities are managed within the project scope:

Deployment Planning

HPC cluster Implementation

All implementation phases are managed via Ansible playbooks to automate provisioning and configuration.
Operating system image deployment for each cluster, on up to 136 nodes.
Management node HA.

HPC Cluster UAEU Mar 2020 – April 2020

Project description

Implementation of an HPC cluster stack using the following technologies:

Storage Cluster HA with RHEL (Pacemaker and Corosync).
Storage Box PowerVault ME4 series.
Cluster Management using Bright Computing version 9.0.
Cluster workload management using PBSpro, Altair Access Web, and Altair Control.
Mellanox for InfiniBand network.
Cluster benchmarking with HPL and burn check for the cluster.

HPC Cluster UAEU Jun 2018 – Jul 2018

Project description

Implementation of an HPC cluster stack using the following technologies:

Cluster Management using Bright Computing version 7.3.
Cluster workload management using PBSpro.
HPL and Burn check for the cluster.

Cloud Revamp VFE Feb 2019 – Dec 2019

Project description
Implementation of a cloud service provider platform using the VMware technology stack, with an automation layer built on vRealize Orchestrator.
Gather and discuss VFE's business needs and translate them into workflows, using the built-in workflows and chaining them together.
Provide the prerequisites for the testing environment with the iQuest Hybris marketplace.
Provide iQuest with a simple workflow document showing the input and output parameters, along with the IDs of the workflows that will be triggered from the Hybris side.

Technologies:

vRO, VCD, NSX-v, vROps, ESXi and vCenter

Accomplishments

Oracle Cloud Infrastructure Certified Architect Associate
Oracle Cloud Infrastructure Foundation Certified Associate
Red Hat Certified System Engineer RHCE (Certificate number: 180-054-934)
Red Hat Certified System Administrator RHCSA (Certificate number: 180-054-934)
Introduction to Computer Science and Programming (edX, MITx).
