• Type: Improvement
    • Resolution: Fixed
    • Priority: Critical
    • Platform: All, OS: All

      Seems that the usual way jobs work on Hudson is that there is one job for
      building a project, and if I then want a subsequent job to run all unit
      tests, that job needs to download the artifacts produced by the build job
      from Hudson. As a consequence, people usually add separate targets to their
      build scripts for running unit tests from Hudson, which download the build
      artifacts over the network.
      I think it would be nice if things could work the way users would do it when
      running tests locally - i.e. use the usual targets for building and then simply
      run a target for unit tests in the same workspace. This could be achieved by
      providing the subsequent jobs an exact snapshot of the workspace at the time
      the parent job finished running.

          [JENKINS-682] Clone workspace between jobs
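The Hudson-specific workaround the report describes usually boils down to a test target that first downloads the build job's archived artifact before running anything. A minimal sketch — all paths are illustrative, and the download is simulated with a local copy so the sketch is self-contained; in a real setup it would be a fetch against a permalink such as `http://<hudson>/job/<build-job>/lastSuccessfulBuild/artifact/...`:

```shell
#!/bin/sh
# Stand-in for the Hudson-specific "test" target described above: fetch the
# build job's artifact, then run tests against it. The cp simulates the
# network download users must otherwise script and maintain themselves.
set -e
mkdir -p /tmp/jenkins682/upstream /tmp/jenkins682/test-ws
echo "jar-bytes" > /tmp/jenkins682/upstream/app.jar                   # what the build job archived
cp /tmp/jenkins682/upstream/app.jar /tmp/jenkins682/test-ws/app.jar   # simulated download
cd /tmp/jenkins682/test-ws
test -f app.jar && echo "unit tests would now run against app.jar"
```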

          Kohsuke Kawaguchi added a comment -

          The problem with doing this is that then you won't be able to run the main build
          and tests in parallel, so it increases the turn-around time.

          What's the difference between setting up two jobs with the same workspace, as
          opposed to setting up one job that does both the build and the test?

          mmatula added a comment -

          I see. I was not sure how it is implemented - i.e. whether different slaves
          access the same physical workspace. I thought each slave checks out its own
          workspace for a given job in which case tests and a new build could still run in
          parallel.
          I thought the difference between running it as a single job would be exactly
          that possibility to start another build before tests (and possibly other
          subsequent jobs such as coverage, findbugs, javadoc generation, etc.) finished
          running.


          Kohsuke Kawaguchi added a comment -

          Every slave uses a workspace on local file system, so different slaves get
          different workspaces.

          The problem I was trying to point out in sharing a workspace is that if a new
          build starts while a test run is in progress, it will most likely mess up. For
          example, a build might try to overwrite the file that a test is using. So in
          general, I don't see how you can reliably run multiple tasks over the same
          workspace in parallel.

          Or maybe what you are really saying is a slightly different model, where the
          execution will go like:

          ws #0 <--- build #N --><-- test #N --->
          ws #1 <--- build #N+1 --><-- test #N+1 --->

          ?

          gradopado added a comment -

          Have a look at Buildbot, where each build is composed of several build steps.
          Steps are tied to one another, no overlapping.

          The first one is mostly a cvs-checkout, and then you are free to do in steps what
          you think is necessary. Each step has its own result and a directive for how to
          proceed in the WARNING and ERROR cases. This concept is nice.


          mmatula added a comment -

          Kohsuke, what I would like is to be able to configure my job in such a way that
          it could assume the workspace is in the same state as it was after its parent
          job. So, if some job is triggered by another job, then I thought it would be good
          if there was a way for the triggered job to inherit the workspace of the
          triggering job, so that people would not have to do additional "magic" specific
          to Hudson (and maintain additional Hudson-specific targets) in their build
          scripts. Hudson does know that a given job is triggered by another job, and most
          likely it is going to use the artifacts that the other job produces. So I thought
          it would be nice if those could be provided to the subsequent jobs automatically.

          As a user who does not know much about Hudson internals, I cannot suggest how
          to implement it. One way (the basic one) it could be done is what you
          showed in your ASCII diagram. Another possible way would be to have a
          mechanism to push a snapshot of the workspace from a job to the other jobs it
          triggers (potentially to other machines if those jobs run on other systems). I
          guess this would be the least limiting when it comes to parallel execution.


          Kohsuke Kawaguchi added a comment -

          A note to myself: the recent introduction of the Resource interface can be used
          as a lock mechanism for multiple jobs to use the same workspace.

          This is convenient for some use cases, like occasionally running a lengthy task
          (like "mvn site") where normally a quick CI build runs.

          akostadinov added a comment -

          I want to mention the case where the workspace is on NFS.

          For many of our jobs we lose most of the time checking out the sources, so being
          able to share a workspace saves a considerable amount of time.

          That is the case for a multiple-configuration project, or when the job can be
          separated into a few tasks.

          I think being able to mark arbitrary jobs as sequential would work
          for us.


          akostadinov added a comment -

          adding myself to the cc


          Kohsuke Kawaguchi added a comment -

          This is a commonly asked feature, and this one is from my colleague, so bumping
          up the priority a bit.

          See another recent discussion at
          http://www.nabble.com/Efficiently-using-Hudson-tf4649823.html

          Kohsuke Kawaguchi added a comment -

          Another recent discussion about this:
          http://www.nabble.com/Multiple-jobs-per-project---to14341510.html

          Kohsuke Kawaguchi added a comment -

          Issue 398 has been marked as a duplicate of this issue.

          hoshposh added a comment -

          Adding myself on the CC list for this issue.


          Lloyd Chang added a comment -

          Adding myself as a cc


          jameslivingston added a comment -

          adding myself as cc

          pgweiss added a comment -

          What would be sufficient for me would not necessarily be the workspace of the
          job that triggers me, but rather the build artifacts. Even better would be if
          I could pass the build artifact URLs as parameters to the triggered job.


          fhoare added a comment -

          adding myself as cc


          skaze added a comment -

          adding myself to cc


          skaze added a comment -

          I would like to describe another scenario where having the ability to share a
          workspace between jobs would be useful.

          Note I want to keep using the built-in Maven 2 project type and not have to drop
          to a 'free-style software project' and use shell scripting to get this job done.

          Simply put, I need to run Maven more than once on the project to finish my build.
          The build needs to do the following:

          • Run a full build and generate results for all projects (i.e. "mvn install -fae
            -Dmaven.test.failure.ignore=true"), yes, ignoring all failures.
          • Run some data collection mojos across the now fully built project hierarchy
            and then produce some reports (i.e. "mvn site -fae
            -Dmaven.test.failure.ignore=true site:deploy")

          Firstly a normal build is run and a number of code tools (checkstyle, pmd,
          clover, cobertura, findbugs, javadoc, jxr) do their thing. These are my own
          modified versions of the plugins, all changed to decouple their
          analysis functionality from their reporting functionality. The reason we have
          decoupled analysis from reporting is the next stage.

          The second invocation of mvn runs our custom plugins again (they're bound to the
          site and pre-site phases), but this time in 'aggregate' mode. These 'aggregate'
          mojos have the job of looking through the module hierarchy and aggregating
          (pulling up and merging) the various result files they find (e.g.
          checkstyle-results.xml) up the hierarchy, producing new analysis files as they
          go (i.e. for the parent pom projects). The result of this is that at every level
          of the project hierarchy one can see the aggregated results from JXR, javadoc,
          checkstyle, pmd, clover, tests, etc.

          Note, this is very different to the standard Maven reporting plugins' aggregate
          feature (i.e. <javadoc><aggregate>=true); when one uses these standard plugins
          in aggregate mode only the top-most project gets the aggregated report, and all
          the other modules in the hierarchy do not generate any report at all.

          Once the pre-site phase has run, the standard site plugin kicks in. This in turn
          runs all our reporting mojos and we get some rather lovely fully aggregated
          multi-tier reports for javadoc, jxr, checkstyle, et al.

          Ways of doing this kind of thing with Hudson:

          1) Preferred option? - Be able to define multiple build actions, i.e. don't
          chain jobs but chain multiple build commands, a bit like the batch task
          functionality. Thus you are in the same workspace as it's the same job. Note
          this would need some kind of 'continue if failed' functionality. I can see how
          this stretches the Hudson job model a bit, so it may not be viable for design
          reasons.

          2) Call a batch task on the same project when the main build finishes - this is
          what I'm trying to do at the moment, but unlike the 'Build other projects'
          post-build action, the 'Invoke batch tasks of other projects' action does not
          have a 'Trigger even if the build is unstable' option, which would allow us to
          only call the batch task if the main build was successful (i.e. if the first mvn
          run fails I do not want to call the batch task).

          3) Use separate jobs: one that does the initial mvn build and then another, that
          is downstream from it, shares its workspace, and does the data aggregation
          and site building. I think this one fits most easily into the Hudson model...

          Comments welcomed...

          John

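Option 1 above hinges on 'continue if failed' semantics between chained build commands. A sketch of how that chaining could look as a single shell step — the mvn invocations are echoed stand-ins written to a log so the control flow is visible and the sketch runs anywhere:

```shell
#!/bin/sh
# Sketch of option 1: two chained build commands in one job and workspace.
# The first invocation ignores test failures internally (-Dmaven.test.failure.ignore);
# the reporting step runs only if the first command did not hard-fail.
LOG=/tmp/jenkins682-option1.log
: > "$LOG"
run_build()   { echo "mvn install -fae -Dmaven.test.failure.ignore=true" >> "$LOG"; }
run_reports() { echo "mvn site -fae -Dmaven.test.failure.ignore=true site:deploy" >> "$LOG"; }
if run_build; then
    run_reports                      # a 'continue if failed' flag would relax this condition
else
    echo "hard build failure - skipping aggregation" >> "$LOG"
fi
cat "$LOG"
```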

          chrishapke added a comment -

          My understanding of this issue would be that the child job would get a copy of
          the parent workspace (when the parent job successfully finished). This would
          allow parallel/overlapping execution of the jobs as proposed e.g. in
          http://hudson.gotdns.com/wiki/display/JENKINS/Splitting+a+big+job+into+smaller+jobs .

          I would suppose that making this copy is much faster than checking out and
          rebuilding the project in the child(s) again. Maybe better than copying:
          creating an archive when the parent job successfully finished, and extracting
          the archive when the child(s) start(s) - so the children always get a valid
          version, at any time.

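The archive-and-extract idea can be sketched with plain tar. Directory names here are made up for illustration; this is not how Hudson itself behaves:

```shell
#!/bin/sh
# Parent job: snapshot its workspace into an archive on success.
# Child job: extract that archive into its own workspace before building,
# so it always starts from a complete, valid copy.
set -e
mkdir -p /tmp/ws682/parent /tmp/ws682/child
echo "built artifact" > /tmp/ws682/parent/build-output.txt   # produced by the parent build
tar -C /tmp/ws682/parent -czf /tmp/ws682/snapshot.tar.gz .   # archive on success
tar -C /tmp/ws682/child  -xzf /tmp/ws682/snapshot.tar.gz     # child restores the snapshot
cat /tmp/ws682/child/build-output.txt
```

Because the child extracts from an archive taken at a fixed point, the parent can already be running its next build without disturbing the child.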

          Tom Jordan added a comment -

          adding self in CC


          kaosko added a comment -

          adding myself to cc


          Matthew Webber added a comment -

          One additional use case for this proposal:

          I have a project that checks out from a Subversion repository if anything has
          changed. That is the only thing the project does. If I go to the project page
          and look at "Recent Changes", I can see what changes were checked in to the
          repository since the last checkout.

          The project then triggers a downstream project which does a build and various
          tests. If the build fails I would like Hudson to email everyone who
          committed changes in the last checkout. This is easy to do if the build/test is
          done in the same project as the checkout, but does not work if the checkout and
          build are separate projects (naturally enough, since the build project does not
          know what changed).

          It would be useful if the ability to share a workspace included the ability to
          pass changes to downstream projects.

          aberrant80 added a comment -

          Since it's close to hitting the 2-year mark since this issue was created, is it
          safe to assume that there are no immediate plans to resolve this?

          I think what many of the watchers and users concerned with this issue really
          want is nothing more than HOW to split a job into "build" and "test". A concrete
          example of how to pass things from one job to another job would be very much
          appreciated. All I can google up is people saying that a heavy job SHOULD be
          split, WITHOUT actually being clear on HOW to properly do what they're
          recommending.


          mdonohue added a comment -

          There is a wiki page which explains how to split up a large job:
          http://wiki.jenkins-ci.org/display/JENKINS/Splitting+a+big+job+into+smaller+jobs


          mdonohue added a comment -

          Issue 3977 has been marked as a duplicate of this issue.


          jhm added a comment -

          When using only one node, you could configure the upstream projects to use a
          fixed workspace (the one of the base project; configure job > extended project
          settings > configure working directory).


          hdalmeid added a comment -

          Added myself to CC


          peter_schuetze added a comment -

          adding myself as cc

          Andrew Bayer added a comment -

          A thought:

          A new SCM plugin that takes another project as the SCM source - it would keep track of the last build of that other project for polling purposes, and would copy in that other project's workspace for the checkout (and inherit the upstream project's changelog too). It's not perfect - duplicating the workspace means more disk space used, for example. But it seems like it might work - thoughts?


          akostadinov added a comment -

          The SCM idea sounds good. Although it would be perfect if it were possible to force the depending jobs to execute on the same slave so they can use the exact same workspace. This appears complicated to me but could probably be achieved as a separate effort.

          I mean there could be a separate plug-in to force dependent jobs to execute on the same slave as a given job. And the custom workspace plug-in could be used to set the workspace of the desired job. So what you suggest would probably be enough.


          mdonohue added a comment -

          A workspace SCM solves part of the problem, but I had assumed those asking for a shared workspace wanted a single space that handles both read and write activity. The workspace SCM solves the read problem, but doesn't address the issue of shared writes - your artifacts still get spread around in each individual job workspace.


          akostadinov added a comment -

          If you execute on a single slave, then you can change the workspace. If you're executing on different slaves, you can point the SCM plug-in to the last build in the sequence (if builds run one after another).

          If multiple builds should use virtually the same workspace and execute in parallel, then I'm not sure how the individual results could be merged into a single location. If the concern is artifacts, then probably a merge-back mask would be needed so the new plug-in knows which files to return to the original workspace?

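The "merge back mask" suggestion could look something like this: after a parallel child build finishes, only files matching an agreed pattern are copied back into the original workspace. The paths and the pattern are hypothetical, chosen to match the checkstyle-results.xml example earlier in the thread:

```shell
#!/bin/sh
# Hypothetical merge-back: return only *-results.xml files from the child
# workspace to the parent workspace, leaving scratch files behind.
set -e
mkdir -p /tmp/mb682/parent /tmp/mb682/child
echo "<results/>" > /tmp/mb682/child/checkstyle-results.xml   # result the parent wants back
echo "temp data"  > /tmp/mb682/child/scratch.tmp              # build-local noise
cp /tmp/mb682/child/*-results.xml /tmp/mb682/parent/          # the mask: *-results.xml
ls /tmp/mb682/parent
```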

          Andrew Bayer added a comment -

          Yeah, the problem with actually sharing a workspace is doing so across slaves - unless you set up a network share available to all slaves and put the workspace there (which, actually, is how we do things in my setup, though we're not using one workspace in multiple jobs), I just don't see an elegant way to let both upstream and downstream jobs write to the same workspace. Most of the use cases I'm seeing mentioned here are linear - A runs, then B runs in the same workspace as A so it can reuse the results of A's build, etc. It doesn't handle the circular use case, but, frankly, that's a bit of a crazy use case. For that fairly rare and fairly strange use case, I think the right approach would be to lock both jobs to the same slave and then use a custom workspace - sure, that creates some limitations, but I think they're reasonable.

          I've still got some design kinks to work out - most notably, what to do if B tries to run but there's no workspace of A available, and what to do if B tries to run while A is already running. I'm wondering whether it might make sense to have a publisher that will keep an archive of the most recent completed build's workspace of A, and then B pulls down that archived workspace. We wouldn't have to worry about concurrency collisions with that approach.


          peter_schuetze added a comment -

          I don't see any use in having two jobs sharing a workspace when only one can write to that workspace. Then I can already do the same stuff with a custom workspace and the locks-and-latches plugin, or even simpler, have one job do all the work.

          For me the idea of creating two jobs is that they can run at the same time. Which actually means that they have two different versions of the same workspace.

          You also have to be careful with holding only the most recent workspace. Because what happens if job A is done and updates the most recent workspace while job B is still working on the previous one? This must be supported.

          akostadinov added a comment -

          IMHO keeping an archive will be more error prone. If B wants to run and there is no A workspace, then A should be executed first and B blocked till A completes. If A tries to run during B's checkout, it should be blocked until the checkout finishes.

          The hard part here would be how to return B to the queue so A can execute. I'm not much into the code, but the SCM plug-in in B could probably:
          1. start A
          2. queue B
          3. fail the current build but block sending notifications
          4. remove the failed build
          5. decrement the last build number by 1
          Or something along these lines?
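          The blocking behavior in the steps above can be sketched as a toy simulation. This is plain Python threading, not the actual Hudson queue API - the job functions and log messages are illustrative assumptions:

```python
import threading

# Toy simulation (NOT the real Hudson queue API) of the ordering rule
# described above: if B is triggered but A's workspace doesn't exist yet,
# A runs first and B blocks until A completes.

a_done = threading.Event()   # "A's workspace exists and is complete"
log = []

def run_job_a():
    log.append("A: build started")
    log.append("A: build finished")
    a_done.set()             # publish A's workspace

def run_job_b():
    if not a_done.is_set():  # no workspace from A yet
        log.append("B: waiting for A")
        builder = threading.Thread(target=run_job_a)
        builder.start()
        a_done.wait()        # block until A completes
        builder.join()
    log.append("B: running tests against A's workspace")

run_job_b()
```

          Re-queuing the failed B build, as in steps 2-5, is the part this sketch deliberately leaves out.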


          mdonohue added a comment -

          Another interpretation of this would be a nice UI that sets up locks-and-latches plus a custom workspace properly for a group of jobs with a single setting, instead of managing these things on each job individually, which leaves open the possibility of configuration skew.

          Even in the read-only case, a common use-case appears to be sharing the IO cost among jobs by having a common parent do checkout or some other IO intensive task, and then the other jobs can just read that data without having to copy it anywhere, since it lives in a shared workspace.

          But this use-case is in contrast to the original description, so I think the workspace-copy idea is in line with what was originally described. We'll have to have new bugs for the other use-cases.


          peter_schuetze added a comment -

          @akostadinov:
          Where is the advantage over configuring

          • either one big job
          • or a custom workspace + locks-and-latches?

          I think the main use case that needs to be supported is described on the following page:
          http://wiki.jenkins-ci.org/display/JENKINS/Splitting+a+big+job+into+smaller+jobs

          Andrew Bayer added a comment -

          @akostadinov: I'm leaning towards the archiving approach because it better fits the SCM model - you're getting the controlled workspace of the previous build of job A as the workspace for a build of job B. I want to make sure this plugin can handle these use cases cleanly:

          • Job B uses Job A's workspace, but both Job A and Job B have concurrent builds enabled.
          • Jobs B, C, and D use Job A's workspace, and all need to get identical copies of Job A's workspace.
          • Job B uses Job A's workspace, but Job B doesn't automatically run right after Job A finishes - it runs at a later point in time, when kicked off manually, and by that time, Job A's workspace has been cleaned out, or the slave Job A ran on is no longer available, etc.

          Archiving Job A's workspace works in all of those use cases, and keeps the overall picture smoother. There'll be a publisher extension to turn on in Job A to archive the workspace, and an SCM extension to use in Job B, which will let you choose a parent project to use as the SCM source from a list of jobs with the publisher enabled. The SCM extension will handle polling by checking to see if there's a new archive of the parent project workspace (and that the parent project isn't in the process of writing that archive, so we don't get a partial archive), and it'll handle checkout by pulling down the archive and expanding it. I'm not yet sure exactly how the inherited changelogs will work, since this plugin won't know or care what the parent project's SCM is and the changelog is determined by the SCM, but I'll figure that out. =)

          This doesn't solve every case, this isn't a perfect solution, but I think it fits the most common use case requested - run a build, and then kick off build(s) of 1..n additional projects to run tests against the results of the first build. So I'm gonna implement it - and me implementing it doesn't mean someone else can't write a plugin to fit another use case. =)
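          The publisher/SCM pairing described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code - the tar.gz format and the write-to-temp-file-then-rename trick (standing in for "B never sees a partial archive") are assumptions:

```python
import os
import tarfile
import tempfile

# Minimal sketch of the mechanism described above -- not the plugin's
# actual implementation. Archive format and atomicity strategy are
# assumptions for illustration.

def archive_workspace(workspace_dir, archive_path):
    """Publisher side (job A): snapshot the workspace atomically."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(archive_path))
    os.close(fd)
    with tarfile.open(tmp, "w:gz") as tar:
        tar.add(workspace_dir, arcname=".")
    # Atomic on POSIX: readers see the old archive or the new one, never a partial file.
    os.rename(tmp, archive_path)

def poll_for_changes(archive_path, last_checkout_time):
    """SCM polling side (job B): is there a newer archive than our last checkout?"""
    return os.path.exists(archive_path) and os.path.getmtime(archive_path) > last_checkout_time

def checkout_workspace(archive_path, workspace_dir):
    """SCM checkout side (job B): expand the parent's archived workspace."""
    os.makedirs(workspace_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(workspace_dir)
```

          The changelog-inheritance part is left out here, since (as noted above) it depends on the parent project's SCM.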


          akostadinov added a comment -

          @peter_schuetze - I'm not sure sharing of workspaces is exactly what's needed to support this use case. Actually it seems sharing has a pretty broad meaning...

          @abayer - how will your archiving be better than setting A to archive the whole workspace as artifacts and then telling B to download those? I've always seen sharing of workspaces as implying fewer Hudson IO operations and less disk utilization.

          I fully agree, though, that all of the proposed solutions solve some problems, and that everyone with the knowledge and time to implement any of them is welcome to do so.

          Regards.


          mdonohue added a comment -

          Given the direction this is going, I think a better issue summary is "Clone workspace between jobs"


          Andrew Bayer added a comment -

          @akostadinov - The biggest difference is that it'll be formalized - there won't be a need to do the zipping/archiving/downloading/extracting manually, and there'll be an automatically defined relationship between the parent build and the child build. Also, we'll be able to have the child build inherit the parent build's SCM changelog. You're right in that this won't do much for lowering IO/disk space usage, but it will answer most of the use cases mentioned in earlier comments here, so that's what I'm going after.


          Andrew Bayer added a comment -

          Hmm - I now remember looking at this vaguely before and stumbling across hudson.fsp.WorkspaceSnapshotSCM. But I still haven't seen anything actually using that, so I'm not sure whether to start from scratch or not.


          Arnaud Héritier added a comment -

          The main thing missing in Hudson to resolve this issue (and many others) is the ability to define several "builds" in a job, as in Continuum, for example: http://continuum.apache.org/docs/1.3.5/user_guides/managing_builddef/builddefProject.html
          The idea is to reuse the SCM & workspace for several builds. For each build we can define triggers and builders.
          In Hudson, the ideal would be to be able to have, in one job, several sets of Build configuration (Triggers + Settings + Environment + Post-build).
          This could be backward compatible by proposing only one set by default.
          I know this is an important change because it touches the underlying code model.

          Andrew Bayer added a comment -

          The challenge with that model is the slave situation - since each slave has its own workspace area, how do we have several jobs using the same workspace across multiple slaves? I'm by no means averse to the kind of thing you're talking about, I'm just not sure at all how to do it within Hudson.


          mdonohue added a comment -

          @aheritier

          I believe what you describe is a distinct enhancement request from JENKINS-682 (this issue).
          Careful reading of the original description and the early comments indicates a desire to automatically clone the entire workspace to downstream jobs. Sharing a single workspace doesn't yet have an issue filed, as far as I'm aware.


          Romain Seguy added a comment -

          In my company we split jobs in several task-oriented jobs such as 01-SCM, 02-Build, 03-UnitTesting, 04-Deploy, 05-RegressionTesting, etc.
          Each of these jobs depends heavily on the result of the previous one (+ generally the SCM one), that is, on the content of the other jobs' workspaces. We currently achieve this with some shell scripting, looking at the workspace path to see whether we're running on a slave or the master, to copy files from one job to another. This means all the jobs must be assigned to the same node (or otherwise you have to use the Copy To Slave plugin).

          My idea some months ago was to create some kind of $WORKSPACE[<job name>] variable which would make it easy to point at upstream/downstream job workspaces. This does not address the issue regarding "one commonly shared" workspace, nor the node assignment one, but I think it could be a first easy & useful solution to set up.

          We can also think about a new out-of-the-box build step (or a wrapper, etc.) "Grab from other job", with job name and includes/excludes (Ant style).
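          A "Grab from other job" step with includes/excludes could look roughly like the sketch below. Everything here is hypothetical, not an existing Hudson API: the function name and arguments are invented, and Python's fnmatch is a simplified stand-in for real Ant pattern semantics (no proper `**` handling):

```python
import fnmatch
import os
import shutil

# Hypothetical sketch of a "Grab from other job" build step: copy files
# from another job's workspace, filtered by glob-style include/exclude
# patterns. fnmatch is only an approximation of Ant's pattern matching.

def grab_from_job(src_workspace, dest_workspace, includes, excludes=()):
    for root, _dirs, files in os.walk(src_workspace):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, src_workspace)
            rel_unix = rel.replace(os.sep, "/")  # match on forward-slash paths
            if not any(fnmatch.fnmatch(rel_unix, p) for p in includes):
                continue  # not selected by any include pattern
            if any(fnmatch.fnmatch(rel_unix, p) for p in excludes):
                continue  # explicitly excluded
            target = os.path.join(dest_workspace, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(path, target)  # preserve timestamps/permissions
```

          The same filtering idea would apply whether the source is another job's live workspace on the same node or an expanded archive of it.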


          mmatula added a comment - edited

          @mdonohue

          I did mean sharing the workspace when I filed this issue. Whether it is done by ensuring these related jobs run on the same slave, or by cloning the exact state of the workspace to another slave that runs the subsequent job, is an implementation detail.

          Or maybe not - I see the difference is that, in the case of cloning, the workspace of the previous job would not be affected by the subsequent one - which may be desirable, but not for the use case I had in mind. When filing this issue I was mostly concerned about not being able to easily share the workspace of a job with subsequent jobs, so they could use the results of the previous job without doing additional Hudson-specific magic in my build scripts.


          Andrew Bayer added a comment -

          @mmatula - does the approach I proposed (archive the workspace - or a selected subset thereof - from one job's build and then one or more other jobs taking that archive and exploding it as the basis of their workspace) fit your request? I want to be sure before I really dive into coding this up. =)


          mmatula added a comment -

          @abayer - Yes, that looks good. BTW, re the changelog - I guess it should just inherit the parent job's changelog.


          Andrew Bayer added a comment -

          FYI, I've got this working (more or less - still a couple things to test and docs/tests to write) - it's up at http://github.com/abayer/hudson-clone-workspace-scm-plugin, though it'll need 1.350 to be released before I can release it.


          Andrew Bayer added a comment -

          I've released the clone-workspace-scm plugin, which contains both a publisher for archiving workspaces and an SCM for, well, using those archived workspaces as SCM sources. It does need Hudson 1.350 or later - it'll blow up with earlier versions. I've got a wiki page up at http://wiki.jenkins-ci.org/display/JENKINS/Clone+Workspace+SCM+Plugin, but I need to do more work on the documentation. And as I mentioned before, the actual source is at GitHub (http://github.com/abayer/hudson-clone-workspace-scm-plugin) rather than the Hudson SVN repo.

          The plugin should show up in the Update Center within the next 6-12 hours.


          erwan_q added a comment -

          Is this plugin compatible with matrix projects? If not, would it be possible to make it so?


          Andrew Bayer added a comment -

          Not sure what you mean - matrix projects should be able to use the SCM part of it, at the very least. I'm not 100% sure how the publisher/archiver would work with a matrix project, though.


            Assignee: abayer (Andrew Bayer)
            Reporter: mmatula
            Votes: 42
            Watchers: 30
