Centralize Monitor Server table definitons on backend#6321
Centralize Monitor Server table definitons on backend#6321DomGarguilo merged 4 commits intoapache:mainfrom
Conversation
|
Tried running this locally and in the web browser debug console I noticed for every page I visited on the monitor it always fetched I first noticed it on the tserver page, it would fetch TABLET_SERVERS, SCAN_SERVERS, and MANAGERS. Then tried visiting the main page and it fetched SCAN_SERVERS and MANAGERS. |
| } | ||
|
|
||
| private static List<Metric> tabletServerMetrics() { | ||
| return metricList(MetricDocSection.GENERAL_SERVER, MetricDocSection.TABLET_SERVER, |
There was a problem hiding this comment.
As an experiment I changed this method to return the following and ran the monitor. It displayed those columns (plus the commons ones prepended) in that order. Not suggesting this change, just sharing the experiment it was neat to see this working and really easy to change.
return List.of(Metric.TSERVER_MINC_RUNNING, Metric.TSERVER_MINC_QUEUED, Metric.TSERVER_HOLD, Metric.SCAN_ZOMBIE_THREADS, Metric.BLOCKCACHE_DATA_REQUESTCOUNT, Metric.BLOCKCACHE_DATA_HITCOUNT, Metric.BLOCKCACHE_DATA_EVICTIONCOUNT);Not something for this PR, but w/ the experiment above realized we need scan queued, running, and failed metrics. Some of those metrics may be there in the form of thread pool metrics, but those are from micrometer and not currently collected by the monitor. May also be good to have failed MinC metric. Then for scan and minc we could have running/queued/failed and those would be really good indications of tserver health. Probably want to eventually put the smallest set of metrics in the table for all servers that indicates if a server is healthy, then maybe we could have a per server page that one could click on and see all that the monitor knows about a specific server.
There was a problem hiding this comment.
it was neat to see this working and really easy to change.
Great! This was the goal so I'm glad things were easy.
we need scan queued, running, and failed metrics
Yea this seems like a good follow on. Should we create a ticket to track this?
Probably want to eventually put the smallest set of metrics in the table for all servers that indicates if a server is healthy, then maybe we could have a per server page that one could click on and see all that the monitor knows about a specific server.
This seems like a good idea to me. The goal with these changes is to make it easy to figure out what info to put in each table however I am not sure where the best place to discuss or document that info per table is.
There was a problem hiding this comment.
The goal with these changes is to make it easy to figure out what info to put in each table
For tablet servers I would like to try to focus on a few things w.r.t. metrics, are scans,writes, conditional updates failing or backed up? Hope to try experimenting w/ that in a follow on to this. If those high level indicators show a problem then would need to dig deeper to figure out why.
|
|
||
| // Keep common columns and filter out metric columns that have no values | ||
| this.columns = requestedColumns.stream() | ||
| .filter(col -> isCommonColumn(col.key()) || presentMetricNames.contains(col.key())) |
There was a problem hiding this comment.
Seems like this will only show columns that have a metric present on at least one server? Would it be possible to always show all the requested columns even if no server currently has metrics? Seems like this could cause columns to blink in and out of existence when auto refresh is enabled.
There was a problem hiding this comment.
Yea this is definitely an issue. Its a tradeoff. The filter was added to avoid a very wide table full of empty columns but filtering out empty columns is not a good longterm solution. The better longterm solution is to to make the per-table metric Lists smaller and more intentional then always show that list even if that means empty columns.
Maybe for now ill remove that filter so it makes it super obvious there is work to be done to refine the list for each table.
I'm pretty sure you're correct. With this PR the calls the navbar is making need to pull these full table model JSONs to get the data they need. I think it will be a worthwhile follow on to make a dedicated endpoint to return this info. That will be a lot more lightweight and not send a ton of unnecessary data. |
This PR centralizes all of the Monitor table definitions into one place on the backend. We can now more or less define a monitor table via a
List<Metric>where that list gets mapped 1:1 to the view on the front end. All order and filtering of the columns takes place next to thisListon the backend.columns,data(rows),status(used for banners and status LEDs), and timestamprecord Columnso the backend can define and send each columnsenum ServerTableto serve as the IDs for each table and to be used as the API param in/rest-v2/servers/view;table=<table>ServersView. There should be onServersViewper UI tableServersViewwhere each tables shape is defined. These methods return aList<Metric>which maps directly to the frontend table columns. This is where we define which and in what order the columns will be for each table.columns.jsand its supporting code. Now the frontend builds the table headers and DataTables column config directly from the returned backend column definitionsNow to add a new table to the UI:
ServersViewfor that table/rest-v2/servers/view;table=<NEW_TABLE>