
p4 sync sometimes produces no file, or empty file, when readFile is used immediately after

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: p4-plugin

      We have a build system in which the master hands out jobs to the slaves. Each slave runs p4sync on a version-information text file to determine which application file it needs to download, via a second p4sync call, before starting its build process.

      In some cases the p4 sync call completes (we have verbose output on) and reports which version of the file it found, with revision information. We then immediately call readFile, and either a NoSuchFileException is thrown or the file is empty.

      09:08:31 java.nio.file.NoSuchFileException: /Users/current_user/.jenkins-LightWeight/workspace/job_name-617/ProjectVersion.txt
      09:08:31       at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
      09:08:31       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      09:08:31       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
      09:08:31       at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
      09:08:31       at java.nio.file.Files.newByteChannel(Files.java:361)
      09:08:31       at java.nio.file.Files.newByteChannel(Files.java:407)
      09:08:31       at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
      09:08:31       at java.nio.file.Files.newInputStream(Files.java:152)
      09:08:31       at hudson.FilePath$Read.invoke(FilePath.java:1906)
      09:08:31       at hudson.FilePath$Read.invoke(FilePath.java:1898)
      09:08:31       at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2918)
      09:08:31       at hudson.remoting.UserRequest.perform(UserRequest.java:210)
      09:08:31       at hudson.remoting.UserRequest.perform(UserRequest.java:53)
      09:08:31       at hudson.remoting.Request$2.run(Request.java:364)
      09:08:31       at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      09:08:31 Caused: java.io.IOException
      09:08:31       at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:169)
      09:08:31       at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
      09:08:31       at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
      09:08:31       at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
      09:08:31       at java.io.InputStreamReader.read(InputStreamReader.java:184)
      09:08:31       at java.io.Reader.read(Reader.java:140)
      09:08:31       at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2001)
      09:08:31       at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1980)
      09:08:31       at org.apache.commons.io.IOUtils.copy(IOUtils.java:1957)
      09:08:31       at org.apache.commons.io.IOUtils.copy(IOUtils.java:1907)
      09:08:31       at org.apache.commons.io.IOUtils.toString(IOUtils.java:778)
      09:08:31       at org.apache.commons.io.IOUtils.toString(IOUtils.java:803)
      09:08:31       at org.jenkinsci.plugins.workflow.steps.ReadFileStep$Execution.run(ReadFileStep.java:97)
      09:08:31       at org.jenkinsci.plugins.workflow.steps.ReadFileStep$Execution.run(ReadFileStep.java:86)
      09:08:31       at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:49)
      09:08:31       at hudson.security.ACL.impersonate(ACL.java:290)
      09:08:31       at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:46)
      09:08:31       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      09:08:31       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      09:08:31       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      09:08:31       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      09:08:31       at java.lang.Thread.run(Thread.java:748)

      To work around this, I added retries and sleeps. The idea is to call readFile repeatedly until the file is there (120 tries, 1 second apart); if that fails, the p4 sync is attempted again. Usually this works on the second try; if not, the retry logic takes over again. In some rarer cases the file is empty, so we throw an error and the process starts over.

      Is there any reason the file system would have a long delay between p4 sync completing and the file becoming available? We don't have the parallel option turned on for p4, and no other threads on any of the machines interact with the workspaces. This workaround works for now, but we have no idea why the file would be missing or empty. About 1 in 50 to 1 in 100 of our jobs run into this issue, seemingly at random.

      Is there a better way to call readFile after a p4sync that guarantees the file is in fact there and has contents?

      Example code:

       

      stage('Example Stage')
      {
      	steps
      	{
      		script
      		{		
      			// Depot path to our project version file
      			def projectVersionFilePath = "//${PathToTrunk}/ProjectSettings/ProjectVersion.txt"
      			echo 'projectVersionFilePath - ' + projectVersionFilePath
      
      			def projectVersionFileContent = null;
      
      			def MAX_PROJECT_VERSION_SYNC_ATTEMPTS = 5;
      			def PROJECT_VERSION_SYNC_SLEEP_TIME = 5;
      
      			int numSyncAttempts = 0;
      
      			try {
      				// We have this retry, because in some cases
      				// even after 120 retries (about 120 seconds)
      				// it still can't open the file
      				retry(MAX_PROJECT_VERSION_SYNC_ATTEMPTS)	{
      					if (numSyncAttempts > 0) {						
      						sleep PROJECT_VERSION_SYNC_SLEEP_TIME
      					}
      
      					numSyncAttempts++;	
      
      					// In all cases this will tell us the file and revision number that we want
      					p4sync  charset: 'none',
      						credential: 'VALID_CREDENTIAL',
      						format: 'Jenkins-${NODE_NAME}-${JOB_NAME}-${EXECUTOR_NUMBER}',
      						populate:  forceClean(have: false, parallel: [enable: false, minbytes: '1024', minfiles: '1', threads: '4'], pin: '', quiet: false),
      						source: depotSource(projectVersionFilePath)
      
      					int numReadAttempts = 0;
      
      					def MAX_PROJECT_VERSION_READ_ATTEMPTS = 120;
      					def PROJECT_VERSION_READ_SLEEP_TIME = 1;					
      
      					// We have this retry because sometimes
      					// the file isn't available to be read even if the p4 sync has completed "correctly"
      					retry(MAX_PROJECT_VERSION_READ_ATTEMPTS) {
      
      						if (numReadAttempts > 0) {							
      							sleep PROJECT_VERSION_READ_SLEEP_TIME
      						}
      												
      						numReadAttempts++;		
      
      						// this will in some cases fail, for reasons unknown to me
      						projectVersionFileContent = readFile 'ProjectVersion.txt';	
      						
      						// in stranger situations the file will read,
      						// but will be empty
      						if (0 == projectVersionFileContent.length()) {
      							error '0 == projectVersionFileContent.length()'
      						}												
      					}
      				}
      			} catch (e) {
      				echo 'failed getting ProjectVersion.txt'
      				throw e;	
      			}
      
      			echo 'PROJECT VERSION : ' + projectVersionFileContent;
      		}
      	}
      }
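
      The inner read loop above could also be written as a single polling step. The following is only a sketch, not a fix for the underlying problem: it assumes the standard Pipeline steps timeout, waitUntil, fileExists and readFile, it must run inside a script block after the p4sync call, and the 2-minute limit is an assumed value chosen to match the 120 one-second attempts. Note that waitUntil backs off between attempts rather than sleeping a fixed second, and an outer retry around p4sync would still be needed if the file never appears.

```groovy
// Sketch only: poll for the synced file instead of nesting retry/sleep blocks.
// 'ProjectVersion.txt' comes from the example above; the 2-minute limit is an assumption.
def projectVersionFileContent = ''

timeout(time: 2, unit: 'MINUTES') {
    waitUntil {
        // Both steps resolve the path relative to the current workspace directory.
        if (!fileExists('ProjectVersion.txt')) {
            return false
        }
        projectVersionFileContent = readFile('ProjectVersion.txt')
        // Treat an empty or whitespace-only file as "not ready yet".
        return projectVersionFileContent.trim().length() > 0
    }
}

echo "PROJECT VERSION : ${projectVersionFileContent.trim()}"
```

      If the timeout expires, the timeout step aborts the block, so the surrounding retry and catch logic from the example would still handle that case.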
      

       

          [JENKINS-54234] p4 sync sometimes produces no file, or empty file, when readFile is used immediately after

          Paul Allen added a comment -

          Hi Sean, I'm no expert on Jenkinsfile Groovy scripts, but the User Groups might be able to help.  I don't normally run p4sync inside a script block, but it seems to make no difference.

          Here is a simple test Jenkinsfile: 

          pipeline {
             agent any
             stages {
                stage('Stage') {
                   steps {
                      script {
                         p4sync(credential: 'phooey', 
                            populate: forceClean(parallel: [enable: true, minbytes: '1024', minfiles: '1', threads: '4']), 
                            source: depotSource('//depot/...'))
          
                         readFile 'projA/Project.txt'
                      }
                   }
                }
             }
          } 

          I have enabled parallel sync just in case, but have not been able to reproduce your issue (//depot/... represents about 1600 files).  The p4sync step seems to block until all the files are synced.

          Have you tried checkout instead of p4sync?

          The p4sync step extends SCMStep, which seems to use AbstractSynchronousNonBlockingStepExecution, but it is not clear whether this runs asynchronously in the Jenkinsfile script.

          sean richer added a comment -

          Thanks, p4paul. I'm going to look into checkout and how to configure it with Perforce. I'll report back by Monday (10/29) with what I find out. Thanks for the quick reply! Our situation is really strange, because we're only trying to sync a single file, and it sometimes fails for reasons I don't yet understand. I'll look up AbstractSynchronousNonBlockingStepExecution as well. Thanks!

          Karl Wirth added a comment -

          Hi sricher_bfg. Did you manage to try the test? If you did, are still having problems, and would like another set of eyes on the issue, please send an email to support@perforce.com mentioning this Jenkins issue.

          sean richer added a comment -

          Thanks, p4karl. The issue is intermittent. More often than not, running p4 sync again fixes it, but the inner retry loop that checks for the file almost always fails. I see it in our logs occasionally. I asked around internally, and it sounds like it has more to do with a race condition in which the p4 plugin does not have a workspace ready for the executors in time. I will dig into this a little more. Apparently the original engineer believes the issue will become worse and more pronounced as we add more nodes to the cluster, which I'm about to do. Can I update you as we scale if it keeps being a problem?

            p4karl Karl Wirth
            sricher_bfg sean richer