5 February 2026
Assess, monitor, and provide end-to-end observability for critical applications across all related components, and record performance history to drive improvements.
Own application availability in the production environment.
Ensure applications run reliably 24/7.
Investigate and analyze production incidents, and provide solutions to resolve them.
Research and propose improvements that increase application availability, covering short-term, medium-term, and long-term solutions.
Collect data and perform capacity planning to ensure application sustainability.
Automate monitoring, notifications, and common incident responses, taking a preventive approach to problem handling.
Monitoring and Logging: Implement and manage monitoring and logging tools (e.g., Prometheus, ELK stack, Grafana) to ensure system performance, reliability, and security.
Work closely with development and operations teams to troubleshoot issues, optimize performance, and maintain high availability of services.
Develop scripts and automation tools to simplify routine tasks and increase efficiency.
Maintain clear and detailed documentation of core applications and processes to ensure knowledge sharing and easy troubleshooting.
Bachelor's degree in Computer Science, Computer Engineering, Information Technology, ICT or a related field (or equivalent work experience).
Proven experience as a Site Reliability Engineer (SRE), DevOps Engineer, or in a similar role.
Strong knowledge of cloud platforms (Azure, AWS), virtualization and container technologies.
Knowledge of and experience with enterprise IT infrastructure; experience in the financial services industry (FSI), such as banking or brokerage, is preferred.
Expertise in Linux, Docker, Kubernetes, CI/CD, Jenkins, JMeter, HashiCorp Vault, and other open-source tools.
Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.