Hi, looks interesting...<p>Some questions:<p>These are all based on what we do at Kentik. We run a large distributed column store fed by high-speed data ingest layers, plus a relatively simpler core cluster of portal/high-level metadata nodes. Ingest nodes are 20-core with little disk. Data nodes are 36-core with 24x 2 TB disks on ZFS, which we rely on for compression.<p>We use Docker, but on top of known equipment types with deterministic latency and performance. Machines are netbooted, and much of the system is controlled via Puppet push. Typical clusters range from 10 to the low hundreds of nodes. We do on-prem deployments, but they typically run on bare metal and use our control stack.<p>So with that as background...<p>For an application that needs high-speed native performance (disk in particular) and persistent disk to run, how does Gravitational handle those requirements at deployment time?<p>Is it possible to add metrics to dashboards that on-prem 'operators' can access? Can those be sent through logged mail gateways so that SaaS companies can watch on-prem status remotely, at least at an aggregate level (dashboards plus perhaps some log excerpts)? Can there be part-time (customer-controlled), logged SSH gateways for supervised and/or logged remote access by SaaSco engineers?<p>How do you handle versioning, deployment, and distributed rollout to running systems with a 24x7x100% uptime expectation? Is there support for partially upgrading only some nodes in specific roles, followed by automated and/or human checks? What about rollback?<p>Thanks, and good luck with the company!