Senior Engineer, Systems

Health Support Center

Brentwood, TN | Posted 7/1/2025 | Full Time

Summary: Responsible for designing, implementing, and maintaining enterprise monitoring and observability solutions, with a focus on LogicMonitor and related monitoring platforms. Works closely with IT teams and stakeholders to ensure full infrastructure visibility, reduce alert noise, automate monitoring workflows, and improve system performance. This role plays a key part in establishing best practices for logging, metrics, and alerting, while leveraging automation to optimize monitoring efficiency. Provides technical leadership and contributes to the long-term monitoring strategy for the organization.

Qualifications:

Education:

  • Bachelor's degree in Engineering, Mathematics, or Computer Science, or equivalent technical experience.

Licenses/Certification:

  • CompTIA Server+ or Network+ (preferred).
  • ITIL Foundation Certification (preferred).
  • Relevant monitoring certifications (e.g., LogicMonitor, Splunk, Datadog, or SolarWinds) are a plus but not required.

Experience:

  • 4-7 years of experience in IT monitoring, observability, or systems engineering in a mid-to-large-scale enterprise environment.
  • Hands-on experience with monitoring platforms such as LogicMonitor, SolarWinds, Datadog, Splunk, Nagios, or similar tools.
  • Experience configuring alert rules, event-based triggers, dynamic thresholds, anomaly detection, and dashboard visualizations.
  • Strong understanding of IT infrastructure components, including servers, networks, cloud environments, and virtualization.
  • Experience working with RBAC policies, IT service management tools (e.g., ServiceNow), and automation workflows.
  • Scripting or automation experience (PowerShell, Python, API integrations) is a plus.

Essential Functions:

Monitoring Platform Administration & Optimization

  • Manage and optimize monitoring platforms, ensuring proper infrastructure coverage and alerting configurations.
  • Develop and maintain monitoring standards for IT infrastructure, including servers, applications, and network devices.
  • Refine alerting processes by reducing noise, tuning thresholds, and improving actionable insights.
  • Manage daily monitoring administration tasks, including adding/removing devices, troubleshooting issues, and maintaining system integrity.

Incident Management & Proactive Monitoring

  • Analyze monitoring data to detect performance bottlenecks and outages.
  • Implement proactive alerting strategies to identify issues before they impact hospital operations, ensuring rapid response to potential failures.
  • Configure anomaly detection and predictive analytics to recognize early warning signs of system degradation.
  • Develop dashboards and automated reports to provide real-time visibility into infrastructure health and application performance.

Collaboration & Process Improvement

  • Work with IT teams (server, network, security, and applications) to align monitoring strategies and ensure visibility across environments.
  • Implement RBAC policies to provide team-specific access and monitoring configurations.
  • Document monitoring policies and best practices, ensuring knowledge-sharing across teams.

Automation & Integrations

  • Create automation and integrations using scripts or APIs to enhance monitoring workflows and reporting.
  • Develop event-driven automation to reduce manual intervention in monitoring-related tasks.
  • Ensure monitoring platforms are seamlessly integrated with ITSM tools for efficient incident tracking and resolution.

Knowledge/Skills/Abilities:

  • Strong expertise in LogicMonitor, Datadog, SolarWinds, Splunk, or similar monitoring platforms.
  • Understanding of metrics, logs, and traces as part of a comprehensive observability strategy.
  • Experience configuring alert rules, anomaly detection, event-based monitoring, and trend analysis.
  • Familiarity with RBAC policies, ITSM tools (e.g., ServiceNow), and API integrations.
  • Basic scripting and automation skills (PowerShell, Python, or other scripting languages) preferred.
  • Strong analytical skills to identify trends, detect anomalies, and troubleshoot monitoring issues.
  • Effective communication skills to collaborate with IT teams and business stakeholders.
  • Ability to work independently and within a team, taking ownership of monitoring initiatives.
  • Approximate percent of time required to travel: 10%

ACKNOWLEDGEMENT:

  • This description is designed to indicate the general nature and level of work for this position. It is not intended to describe minor duties or other responsibilities that may be periodically assigned.
  • You agree to conduct your job responsibilities in accordance with the standards set out in the Employee Handbook, Company's Code of Business Conduct, its policies and procedures, applicable federal and state laws, and applicable professional standards.

JOB LOCATION:

Brentwood, TN 37027
