Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix corrupted archive with checksum validation, synchronized lock and API enhancement #51

Merged
merged 8 commits into from
Dec 6, 2024
Prev Previous commit
Next Next commit
Change response for synchronized generation and add timeout error han…
…dling
yma96 committed Dec 4, 2024
commit c6a9cd93eb571b31e5aee6520ff4db87a08cb77f
Original file line number Diff line number Diff line change
@@ -64,6 +64,9 @@
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
@@ -153,44 +156,59 @@ public void destroy()
IOUtils.closeQuietly( client, null );
}

public void generate( HistoricalContentDTO content )
public boolean generate( HistoricalContentDTO content )
{
String buildConfigId = content.getBuildConfigId();
Object lock = buildConfigLocks.computeIfAbsent( buildConfigId, k -> new Object() );
synchronized ( lock )
{
while ( isInProgress( buildConfigId ) )
if ( isInProgress( buildConfigId ) )
{
logger.info( "There is already generation process in progress for buildConfigId {}, try lock wait.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just throw warning to user that there is already process in progress, please try later ? sorry if I miss some requirements. ; -)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sswguo You mean don't let the coming process wait, just warn to user and return? Not sure whether PNC care the response of archive generation since this should be at the end of build, if that's the point I will comment to confirm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can confirm that which case will send two same archive requests in a short time.

Copy link
Member Author

@yma96 yma96 Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sswguo probably large build with large archive file, since PNC UI doesn't allow concurrent builds with the same config ID, the previous build might be already done but generation with download was still in progress, then another build was in progress and used the same workspace to process download or generation, so we need to know if they want to handle the generation response themselves or just let archive go with the right thing, comment on JIRA and discuss on meeting.

buildConfigId );
try
{
lock.wait();
}
catch ( InterruptedException e )
{
Thread.currentThread().interrupt();
return;
}
// Conflicted generation, just return with expected http response immediately
return false;
}
recordInProgress( content.getBuildConfigId() );
generateExecutor.execute( () -> {

recordInProgress( buildConfigId );
Future<?> future = generateExecutor.submit( () -> {
try
{
doGenerate( content );
}
catch ( Exception e )
{
// If error happens on generation, remove the status to make sure coming generation
removeStatus( buildConfigId );
logger.error( "Generation failed for buildConfigId {}", buildConfigId, e );
}
finally
{
synchronized ( lock )
{
recordCompleted( content.getBuildConfigId() );
buildConfigLocks.remove( buildConfigId );
lock.notifyAll();
logger.info( "lock released, buildConfigId {}", buildConfigId );
}
recordCompleted( buildConfigId );
buildConfigLocks.remove( buildConfigId );
logger.info( "lock released, buildConfigId {}", buildConfigId );
}
} );
try
{
future.get( 60, TimeUnit.MINUTES );
}
catch ( TimeoutException e )
{
// If timeout happens on generation, cancel and remove the status to make sure coming generation
future.cancel( true );
removeStatus( buildConfigId );
logger.error( "Generation timeout for buildConfigId {}", buildConfigId, e );
}
catch ( InterruptedException | ExecutionException e )
{
// If future task level error happens on generation, cancel and remove the status to make sure coming generation
future.cancel( true );
removeStatus( buildConfigId );
logger.error( "Generation future task level failed for buildConfigId {}", buildConfigId, e );
}
}
return true;
}

protected Boolean doGenerate( HistoricalContentDTO content )
@@ -622,15 +640,20 @@ private void restoreGenerateStatusFromDisk() throws IOException
}
}

private void recordInProgress( String buildConfigId )
private void recordInProgress( final String buildConfigId )
{
treated.remove( buildConfigId );
treated.put( buildConfigId, ArchiveStatus.inProgress.getArchiveStatus() );
}

private void recordCompleted( String buildConfigId )
private void recordCompleted( final String buildConfigId )
{
treated.remove( buildConfigId );
treated.put( buildConfigId, ArchiveStatus.completed.getArchiveStatus() );
}

private void removeStatus( final String buildConfigId )
{
treated.remove( buildConfigId );
}
}
Original file line number Diff line number Diff line change
@@ -51,6 +51,7 @@

import static jakarta.ws.rs.core.MediaType.APPLICATION_JSON;
import static jakarta.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM;
import static jakarta.ws.rs.core.Response.Status.CONFLICT;
import static jakarta.ws.rs.core.Response.Status.NOT_FOUND;
import static jakarta.ws.rs.core.Response.accepted;
import static jakarta.ws.rs.core.Response.noContent;
@@ -74,6 +75,7 @@ public class ArchiveManageResources

@Operation( description = "Generate archive based on tracked content" )
@APIResponse( responseCode = "202", description = "The archive created request is accepted" )
@APIResponse( responseCode = "409", description = "The archive created request is conflicted" )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if 409 is good response to PNC as this may fail the request. We should let them know if we agree to use this.

Copy link
Member Author

@yma96 yma96 Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't add @michalovjan to review, could you help to confirm on this? Also comment on JIRA https://issues.redhat.com/browse/MMENG-4256.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yma96, I'd rather have just 202 Accepted if that's okay.

But if you're keen on having 409, we'd need to add that to repository-driver so that we don't unnecessarily retry requests. It's doable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michalovjan make sense, I'll change to 202 accepted to avoid more code change from PNC.

@RequestBody( description = "The tracked content definition JSON", name = "body", required = true,
content = @Content( mediaType = APPLICATION_JSON,
example = "{" + "\"buildConfigId\": \"XXX\"," + "\"downloads\":" + "[{"
@@ -104,11 +106,17 @@ public Uni<Response> create( final @Context UriInfo uriInfo, final InputStream b
return fromResponse( message );
}

controller.generate( content );
boolean accepted = controller.generate( content );
return Uni.createFrom()
.item( accepted().type( MediaType.TEXT_PLAIN )
.entity( "Archive created request is accepted." )
.build() );
.item( accepted ?
accepted().type( MediaType.TEXT_PLAIN )
.entity( "Archive created request is accepted." )
.build() :
Response.status( CONFLICT )
.type( MediaType.TEXT_PLAIN )
.entity(
"Another generation with same build config ID was already in progress, request is conflicted." )
.build() );
}

@Operation( description = "Get the status of generating archive based on build config Id" )