Manufacturing today operates at the speed of software, where shifts are faster, data is constant, and the cost of failure is higher than ever. Teams are coordinating across multiple sites, products turn over faster, and the amount of data flowing from people, parts, machines, and suppliers keeps multiplying. In that environment, a Manufacturing Execution System (MES) cannot simply keep the lights on. It must be relentlessly fast, available when needed, and ready to scale. Otherwise, it becomes a bottleneck that slows production and erodes trust.
The stakes are real. According to Oxford Economics, unplanned downtime costs the Global 2000 (the world’s largest companies) $400 billion annually. For manufacturing specifically, Aberdeen Research has found that unplanned downtime costs U.S. manufacturers an average of $260,000 per hour. In the automotive industry, where production lines are highly complex, a single hour of downtime can cost as much as $2.3 million. When production stops, so do cash flow and commitments. That’s why performance and reliability aren’t “nice-to-have” qualities for an MES; they’re financial and compliance imperatives.
Legacy MES deployments struggle here. Many were built for a slower era, not for real-time data, multi-site coordination, or today’s constant demand for integration. The result is slow dashboards, fragile integrations, and performance problems that surface at the worst possible times. When that happens, operators wait, supervisors lose visibility, and leadership loses confidence.
This post shares what we (and industry leaders) measure, the standards we hold ourselves to, and a practical checklist you can use to evaluate your current MES—no matter who your provider is.
Reliability means your system is there when the factory needs it, keeps data accurate, processes transactions correctly, and recovers quickly when something breaks. Availability (often stated as “uptime”) is one measure of reliability, but others include data integrity, error rates, and recovery times.
A quick reference for uptime “nines” (approximate downtime allowed per year):

| Availability | Downtime per year |
|---|---|
| 99% (“two nines”) | ~3.7 days |
| 99.9% (“three nines”) | ~8.8 hours |
| 99.99% (“four nines”) | ~53 minutes |
| 99.999% (“five nines”) | ~5 minutes |
For a plant that runs two or three shifts, “just a few hours” of downtime can be the difference between hitting a shipment window and missing a quarter’s target.
Reliability isn’t only a software concept; it’s rooted in manufacturing standards and discipline.
In practice, reliability means consistent uptime, accurate data, secure operations, and quick recovery, all supported by disciplined processes across IT and OT.
Performance is the speed and responsiveness operators and supervisors feel in the moment. It shows up in whether dashboards load instantly or stall under peak demand. That difference is felt in every shipment and every quarter. Scalability refers to the ability to grow products, users, sites, and data volumes without slowing down or re-architecting the entire system.
A practical way to measure performance is by looking at percentiles, such as how fast the system responds 95% or 99% of the time. This is often written as P95 or P99. Averages can hide the bad moments, and if the slowest responses happen during peak hours, operators will feel it, compounding into missed shipments and wasted hours. Using percentiles within SLIs (Service Level Indicators) and setting SLOs (Service Level Objectives) is a practical way to capture these distributions.
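To make the percentile idea concrete, here is a minimal sketch in Python (the latency samples and the one-second target are hypothetical, not measurements from any particular MES) showing how P95/P99 differ from an average:

```python
# Minimal sketch: compute average, P95, and P99 from response-time samples
# and compare against an illustrative SLO target. All values are hypothetical.
import random

def percentile(samples, pct):
    """Nearest-rank percentile: the value below which `pct` percent of samples fall."""
    ordered = sorted(samples)
    rank = max(0, round(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

# Simulated dashboard response times (ms): mostly fast, with a slow tail at peak
latencies_ms = [random.gauss(300, 80) for _ in range(10_000)]
latencies_ms += [random.gauss(2500, 400) for _ in range(200)]

avg = sum(latencies_ms) / len(latencies_ms)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

SLO_P95_MS = 1_000  # illustrative target: 95% of reads complete within one second

print(f"avg={avg:.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")
print("P95 target met" if p95 <= SLO_P95_MS else "P95 target missed")
```

In a run like this the average looks healthy while P99 exposes the slow tail operators actually feel, which is exactly why targets should be stated in percentile terms rather than averages.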
Human factors matter too. People generally perceive system response times in three key ranges: about 0.1 seconds feels “instant,” about 1 second preserves a sense of flow, and around 10 seconds starts to break attention. These limits are a useful guide for setting performance targets: critical reads should feel sub-second, while complex, multi-step operations may take longer but should always show progress and never block the line.
Where can legacy MES deployments bog down? Typically at the points modern architectures are designed to protect: databases that can only scale vertically, dashboards that query the primary database on every refresh, integrations with no back-pressure, and batch interfaces that can’t keep pace with real-time data.
Modern MES platforms avoid these traps with horizontal scaling, read replicas, short-TTL caching, back-pressure, and streaming/event architectures, paired with production-grade observability to catch issues before users do. A practical way to track system health is through the Four Golden Signals (latency, traffic, errors, and saturation), or in plain terms: response speed, traffic volume, error frequency, and resource strain.
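As one concrete illustration, here is a minimal sketch of the short-TTL caching pattern mentioned above, applied to a read-heavy dashboard query; the `load_kpis_from_db` function and the 5-second TTL are assumptions for illustration, not a description of any specific product:

```python
# Minimal sketch: short-TTL caching so bursts of identical dashboard reads
# don't all hit the primary database. Names, values, and the TTL are illustrative.
import time

CACHE_TTL_SECONDS = 5.0
_cache: dict = {}  # key -> (timestamp, value)

def load_kpis_from_db(line_id: str) -> dict:
    # Placeholder for an expensive read against the primary database
    time.sleep(0.2)
    return {"line": line_id, "oee": 0.87, "wip": 142}

def cached_kpis(line_id: str) -> dict:
    """Serve repeated reads from a short-lived cache; refresh when stale."""
    now = time.monotonic()
    entry = _cache.get(line_id)
    if entry is not None and now - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]                      # fresh enough: serve from cache
    value = load_kpis_from_db(line_id)       # miss or stale: go to the source
    _cache[line_id] = (now, value)
    return value
```

A short TTL keeps dashboards near-real-time while absorbing the burst of identical reads that peak hours produce; real implementations differ mainly in TTL length and how invalidation is handled.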
The most credible MES providers share the same metrics they track internally. That way, customers can validate results with their own teams and bring clear data to leadership. Concretely, the categories that matter most for an MES are availability and uptime, response-time percentiles (P95/P99), throughput under peak load, error rates and data integrity, and recovery objectives.
This isn’t theoretical. Cloud reliability frameworks provide a playbook: design for failure, define SLIs/SLOs, and test DR regularly, including multi-AZ and multi-Region patterns when the business requires it.
Even if you don’t publish every datapoint, the habit of instrumenting, reviewing, and communicating these metrics builds trust with plant leadership and IT/OT alike. It also helps teams make sane trade-offs (for example, when to precompute vs. compute on demand, or when to invest in a hot standby).
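As a small worked example of what that habit looks like, here is a minimal sketch (hypothetical numbers) that turns an availability SLO into a monthly error budget, the kind of figure that makes a review with plant leadership concrete:

```python
# Minimal sketch: convert an availability SLO into a monthly error budget
# and compare it with observed downtime. All numbers are hypothetical.
SLO_AVAILABILITY = 0.999           # e.g., a 99.9% monthly availability target
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes in a 30-day month

error_budget_min = (1 - SLO_AVAILABILITY) * MINUTES_PER_MONTH  # ~43.2 minutes
observed_downtime_min = 12.0       # downtime recorded so far this month

remaining_min = error_budget_min - observed_downtime_min
print(f"Monthly error budget: {error_budget_min:.1f} min")
print(f"Remaining: {remaining_min:.1f} min "
      f"({remaining_min / error_budget_min:.0%} of budget left)")
```

A number like the remaining budget turns “the system was mostly up” into a shared, quantitative basis for the trade-offs mentioned above.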
When evaluating a modern MES (or reviewing your own), it’s reasonable to anchor expectations to well-documented cloud and software standards: the same cloud reliability frameworks, SLI/SLO practices, and disaster recovery patterns referenced above. These are not vendor claims; they are public benchmarks you can bring into internal conversations to show whether your MES is meeting or exceeding industry standards.
Use this checklist to cut through the noise in vendor evaluations. If you are already a customer, use it to benchmark your system and show progress to your teams:

- Availability: Is there a published uptime target, and does historical uptime back it up?
- Responsiveness: Are response times reported at P95/P99, not just as averages, and do critical reads feel sub-second at peak?
- Scalability: Can the system add products, users, sites, and data volume without slowing down or being re-architected?
- Data integrity and error rates: Are transactions processed accurately, and are errors tracked and reviewed?
- Recovery: Is there a documented, regularly tested disaster recovery plan with clear recovery objectives?
- Observability: Are the Four Golden Signals instrumented, and are the key metrics shared with customers on a regular cadence?
From the beginning, we’ve favored an “observer perspective”: publish what we measure and invite customers to hold us to it. In practice, we treat uptime, P95/P99 latency, throughput, and recovery objectives as first-class SLIs, and we review them with customers on a regular cadence.
Architecturally, we align with well-understood cloud reliability patterns—multi-AZ databases, short-TTL caching with smart invalidation, streaming for hot paths, and read-optimized APIs—so the platform stays responsive at peak. We map our observability to the Four Golden Signals and maintain a disaster recovery plan that is tested, documented, and adjusted as customers scale. None of this is unique to us; it reflects what modern MES leadership requires.
Performance, reliability, and scalability are not abstract engineering ideals. They are the difference between confident, first-time-right builds and a day of triage. With downtime costing industries billions per year and single hours reaching seven figures in some sectors, the bar for MES is rising fast. Buyers should use the checklist above to assess new systems. Customers should use it to validate results and reinforce wins inside their own organizations. Either way, expect vendors (including us) to present real numbers, not just claims.
If you’d like to see the standards we hold ourselves to—and how we test against them in environments like yours—reach out. We’re happy to share details and compare notes.