Skip to content

Distributes compaction coordination across managers#6324

Open
keith-turner wants to merge 3 commits intoapache:mainfrom
keith-turner:dist-coord2
Open

Distributes compaction coordination across managers#6324
keith-turner wants to merge 3 commits intoapache:mainfrom
keith-turner:dist-coord2

Conversation

@keith-turner
Copy link
Copy Markdown
Contributor

For an overview of how this works see the javadoc in CompactionCoordinator.

For an overview of how this works see the javadoc in
CompactionCoordinator.
@keith-turner keith-turner added this to the 4.0.0 milestone Apr 17, 2026
@keith-turner
Copy link
Copy Markdown
Contributor Author

This PR squashed all the changes from #6217 in to a single commit. Opened a new PR because that one has a lot of noise. A lot of prerequisite changes were made for this change and those are all linked to #6217.

SecurityErrorCode.PERMISSION_DENIED).asThriftException();
}
ResourceGroupId groupId = ResourceGroupId.of(job.group);
LOG.trace("reserveCompactionJob called for group {} by compactor {}", groupId,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include the ECID here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added it in ed7ef38

public void addJobs(TInfo tinfo, TCredentials credentials, List<TResolvedCompactionJob> tjobs)
throws TException {
if (!security.canPerformSystemActions(credentials)) {
LOG.warn("Thrift call attempted to add job and did not have proper access. {}",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not throw an exception here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a oneway thrift method. Added a comment in ed7ef38 and changed to log level from warn to error.

List<HostAndPort> sortedUniqueHost) {
}

public synchronized CoordinatorLocations getLocations(boolean useCache) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I could tell, 'true' is only used as an argument

zooCache.clear(Constants.ZMANAGER_COORDINATOR);
}
byte[] serializedMap = zooCache.get(Constants.ZMANAGER_COORDINATOR);
var type = new TypeToken<Map<String,String>>() {}.getType();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could probably be a private static final var

byte[] serializedMap = zooCache.get(Constants.ZMANAGER_COORDINATOR);
var type = new TypeToken<Map<String,String>>() {}.getType();
Map<String,String> stringMap = GSON.get().fromJson(new String(serializedMap, UTF_8), type);
Map<ResourceGroupId,HostAndPort> locations = new HashMap<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Map<ResourceGroupId,HostAndPort> locations = new HashMap<>();
Map<ResourceGroupId,HostAndPort> locations = new HashMap<>(stringMap.size());

sleepTime = maxSleepTime;
}

UtilWaitThread.sleep(sleepTime);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need a property here like the TABLE_SUSPEND_DURATION, but for Coordinators. There is a case where we don't want to be too responsive to a Manager going away and coming back. Bouncing the Manager sometimes clears up some state issues. I would think that we only want to recompute the group assignments if the number of compaction groups has changed or the number of managers has changed, and then only after some suspend duration.

private void setupAssistantMetrics(MetricsProducer... producers) {
MetricsInfo metricsInfo = getContext().getMetricsInfo();
metricsInfo.addMetricsProducers(producers);
// TODO should tests compaction metrics from multiple managers
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO remaining

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants