[JENKINS-41177] Identical job configuration can have different config.xml based on edit method.

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: core
    • None
    • Environment: Jenkins 2.14 on Windows Server 2012, Job Configuration History Plugin 2.15, Groovy 2.4, Java 8
      Jenkins 2.46.1 on Ubuntu 16.04

      When I use the CLI or REST API, Jenkins saves a config.xml that differs from the XML I provide. Specifically, it removes the first line break, adds carriage returns, and replaces empty XML tags with self-closing ones.

      I manage Jenkins instances for my organization. Each has a "test" instance that mirrors the prod one. Developers can experiment on the test instance with things like job changes and interactions with new plugins, and we then port the changes to the prod instance, where configure permissions are disabled. For a first-time copy the Job Import Plugin works, but for edits to existing jobs I have to either duplicate the changes manually or script something.

      I have tried this with both the CLI jar and a Groovy REST client. I fetch the config.xml from the test and prod servers and diff them, which lets me quickly see what the changes are. If they are approved, the script updates the prod instance (update-job for the CLI, POST for REST).

      If a job is edited in the Jenkins UI, the config.xml is saved with newlines only. When it is updated through one of the remote tools, it is saved with carriage returns and newlines. So if I fetch the XML I just uploaded and diff it against what I sent, the two register as completely different.

      The retrieved config.xml files therefore contain carriage returns if the last modification was a remote update, or only newlines if the last edit was made in the UI.
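To illustrate why a line-ending change alone makes every line look different, here is a minimal Python sketch. The config strings are hypothetical stand-ins for a UI save (LF) and a remote update (CRLF), and `normalize_newlines` is an illustrative helper, not part of Jenkins or any plugin.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical example: the same config saved with LF (UI) and CRLF (remote update).
ui_saved = "<project>\n  <description>demo</description>\n</project>\n"
api_saved = "<project>\r\n  <description>demo</description>\r\n</project>\r\n"

# A naive line-by-line comparison sees every line as changed...
print(ui_saved.splitlines(keepends=True) == api_saved.splitlines(keepends=True))  # False
# ...but the contents agree once the line endings are normalized.
print(normalize_newlines(ui_saved) == normalize_newlines(api_saved))  # True
```

This only removes the carriage-return noise. It does nothing about the missing line break after the preamble or about self-closing tags.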

      I have checked the requests I am sending, and they contain no carriage returns. I looked at the code for CLI saves, and it creates the file from the parsed XML model rather than from the supplied XML string. The result removes the line break between the XML preamble and the root <project> tag, ends each line with a carriage return, and writes out self-closing tags. [edit: the last sentence describes observed behaviour, not something seen in the code]

      This messes up the Job Configuration History plugin, because a one-line change is displayed as if every line had changed. It also interferes with my local diff before I send the update.

      The two images attached (missing-eol and exact-same-config) are from back to back runs of the update script, meaning the configurations are exactly the same. The config history plugin thinks the entire file has been changed. Even with a manual correction for the carriage returns, my local diff shows differences because of the tag differences.
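Normalizing carriage returns still leaves the tag-level differences: <foo/> and <foo></foo> differ textually, and so do the two placements of <project>. One way to compare such files is XML Canonicalization (C14N 2.0), available in Python 3.8+ as `xml.etree.ElementTree.canonicalize`. The sketch below is an assumption about how a script could do this. It is not what Jenkins or the history plugin does, and the two inputs are hypothetical. `strip_text=True` trims whitespace around text, which hides indentation but would also hide leading or trailing whitespace changes inside values such as shell steps.

```python
import xml.etree.ElementTree as ET


def canonical_config(xml_text: str) -> str:
    """Return a C14N 2.0 form of a config.xml string (Python 3.8+).

    Canonicalization drops the XML declaration, the parser normalizes CRLF
    to LF, and empty elements are written as <tag></tag>, so <foo/> and
    <foo></foo> compare equal. strip_text=True also trims whitespace
    around text content (lossy for whitespace-sensitive values).
    """
    return ET.canonicalize(xml_data=xml_text, strip_text=True)


# Hypothetical inputs mimicking a UI save and a CLI/REST save of the same job.
ui_saved = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    "<project>\n"
    "  <description></description>\n"
    "  <disabled>false</disabled>\n"
    "</project>\n"
)
api_saved = (
    "<?xml version='1.0' encoding='UTF-8'?><project>\r\n"
    "  <description/>\r\n"
    "  <disabled>false</disabled>\r\n"
    "</project>\r\n"
)

print(ui_saved == api_saved)                                      # False
print(canonical_config(ui_saved) == canonical_config(api_saved))  # True
```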

      The third (ui-save-no-change) is from a manual save in the UI (click "configure", click "save" with no changes). With no content change, we see that the <project> tag is again on its own line and

      If I'm not mistaken about the cause, is it feasible to have Jenkins save the configuration consistently regardless of how the configuration has changed?

          Paul G added a comment -

          Thanks. I may be missing something, then. It looks like Jenkins loses the formatting of CLI/REST data because the XML is parsed and written to disk from the in-memory model rather than from what was received through either API. It seems that Jenkins writes the XML files in different ways (or that different in-memory representations persist differently) depending on what requested the save (a UI button click vs. a received API update) rather than on the content of the save. It is not a case of the received data being different each time.

          Dirk Thomas added a comment -

          We are having the same problem with our Jenkins instance. Locally we work around it by normalizing the XML we receive from the master and showing the diff of the normalized data (https://github.com/ros-infrastructure/ros_buildfarm/blob/16bc85c4e312fb1975d061c6e749fb4af9e3c4ff/ros_buildfarm/jenkins.py#L229-L249). But the Job Configuration History plugin shows many unnecessary changes, which makes it harder to see the actual differences. Please see the attached picture, which shows a huge diff even though nothing actually changed.

          If this is considered "not important enough" for the devs to address, it would be great if someone could shed some light on why the output is different and provide some pointers to the code involved. Maybe we can come up with a patch for it.
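The actual ros_buildfarm workaround is in the linked jenkins.py and is not reproduced here. As an independent sketch of the same idea (normalize first, then diff), the Python 3.9+ code below re-indents both configs with `xml.etree.ElementTree.indent` and diffs the results with `difflib`. The helper names and sample configs are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def normalized_lines(xml_text: str) -> list[str]:
    """Parse a config.xml string and re-indent it in one fixed style (Python 3.9+)."""
    root = ET.fromstring(xml_text)
    root.tail = None
    # ET.indent replaces whitespace-only text between elements with fresh
    # indentation, so the original indentation and CRLF/LF no longer matter.
    ET.indent(root, space="  ")
    # Empty elements are always written as <tag />, whether the input had
    # <tag/> or <tag></tag>.
    return ET.tostring(root, encoding="unicode").splitlines()


def config_diff(old_xml: str, new_xml: str) -> str:
    """Unified diff of two normalized configs; empty if only formatting differs."""
    diff = difflib.unified_diff(
        normalized_lines(old_xml),
        normalized_lines(new_xml),
        fromfile="old/config.xml",
        tofile="new/config.xml",
        lineterm="",
    )
    return "\n".join(diff)


ui_saved = (
    "<?xml version='1.0' encoding='UTF-8'?>\n<project>\n"
    "  <description></description>\n  <disabled>false</disabled>\n</project>\n"
)
api_saved = (
    "<?xml version='1.0' encoding='UTF-8'?><project>\r\n"
    "  <description/>\r\n  <disabled>false</disabled>\r\n</project>\r\n"
)
api_changed = api_saved.replace("false", "true")

print(config_diff(ui_saved, api_saved) == "")  # True: formatting-only differences vanish
print(config_diff(ui_saved, api_changed))       # only the <disabled> line changes
```

The same normalization could be run on each stored revision before it is shown in a history view. Comments in the XML are dropped by the default parser, which is acceptable for a diff view but not for writing files back.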

          Jesse Glick added a comment -

          It is intentional that the provided XML is parsed and the model resaved, so do not expect your formatting to survive intact. I do not know offhand why there would be any difference in formatting relative to doConfigSubmit-saved configurations, though. Possibly the Document object improperly retains some memory of the origin’s formatting when using CLI/REST uploads.

          Dirk Thomas added a comment -

          I don't mind if the server decides to "modify" the style of the XML before it is written to the config file. But I do mind that it deterministically uses two different approaches depending on whether the config is edited through the web UI or the remote API. That makes the job config history much less useful and clutters the diff.

          Daniel Beck added a comment -

          This is why XML diff tools exist. I'd consider this a bug in Job Config History.
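For reference, an XML-aware comparison does not have to be elaborate. The Python sketch below is a hypothetical helper, not any particular XML diff tool. It walks two element trees and reports differences in tags, attributes, text and child count, while ignoring indentation, line endings and <foo/> vs. <foo></foo>. It assumes config.xml has no mixed content, so element tails are ignored, and it compares children by position.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


def _text(value):
    # Treat None and whitespace-only text as equal; both come from formatting.
    return (value or "").strip()


def xml_differences(a: ET.Element, b: ET.Element, path: str = "") -> Iterator[str]:
    """Yield human-readable differences between two element trees (hypothetical helper)."""
    here = f"{path}/{a.tag}"
    if a.tag != b.tag:
        yield f"{here}: tag {a.tag!r} != {b.tag!r}"
        return
    if a.attrib != b.attrib:
        yield f"{here}: attributes {a.attrib} != {b.attrib}"
    if _text(a.text) != _text(b.text):
        yield f"{here}: text {_text(a.text)!r} != {_text(b.text)!r}"
    children_a, children_b = list(a), list(b)
    if len(children_a) != len(children_b):
        yield f"{here}: {len(children_a)} children != {len(children_b)}"
    # Children are compared by position; reordered elements show up as changes.
    for child_a, child_b in zip(children_a, children_b):
        yield from xml_differences(child_a, child_b, here)


old = ET.fromstring("<project>\n  <description></description>\n  <disabled>false</disabled>\n</project>")
new = ET.fromstring("<project><description/><disabled>true</disabled></project>")
for difference in xml_differences(old, new):
    print(difference)  # /project/disabled: text 'false' != 'true'
```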

          Dirk Thomas added a comment -

          IMO the task of the Job Config History is to store a history of job configurations. Because Jenkins doesn't store the same configuration deterministically, the plugin would need more complex diff logic. Since Jenkins already changes the formatting of the XML it receives through e.g. the remote API, I don't see why it shouldn't store the content deterministically. A user might even want to compare the current config file on the filesystem with a backup from some time ago. Why should that take more effort than absolutely necessary?

          Even if you disagree, I am just asking for some pointers to where these two ways of updating the config files happen in the code. I am more than happy to work on a patch myself if nobody else considers this necessary or useful. So far I have found these locations, which seem relevant. I just don't see yet where the two paths diverge:

          • https://github.com/jenkinsci/jenkins/blob/25ef87d2e243fbe7e6f582b08f00574b032a5fb9/core/src/main/java/hudson/model/AbstractItem.java#L609 is where an update coming from the web UI is handled,
          • which gets passed down here: https://github.com/jenkinsci/jenkins/blob/25ef87d2e243fbe7e6f582b08f00574b032a5fb9/core/src/main/java/hudson/model/AbstractItem.java#L666
          • and finally the content is written to the config file in https://github.com/jenkinsci/jenkins/blob/25ef87d2e243fbe7e6f582b08f00574b032a5fb9/core/src/main/java/jenkins/util/xml/XMLUtils.java#L209

          What I am not sure about is where the remote API config comes in and which code path it takes until it is written to the config file. Any pointers would help. Thanks.

          Sam Peretz added a comment -

          I'm not sure what the remote API is used for, but we see these inexplicable differences in XML format between modifying the config from the web UI and generating it from a seed job (using the Groovy DSL). There are things like "<foo/>" vs. "<foo></foo>", and spacing differences, that render the diffs unusable for no apparent reason.

          Dirk Thomas added a comment (edited) -

          jglick, danielbeck: could you please help us out with some pointers here? Even if you don't consider this a problem, pointers would help others look into it and come up with a patch.

          Jesse Glick added a comment -

          I would consider it a bug, albeit a very minor one. I do not have any tips offhand. You would need to fire up a debugger.

          Sorin Sbarnea added a comment -

          That's a core bug because the API should return a normalised and predictable XML format, regardless what tool was used to change the config.

          I find it unreasonable, and close to impossible, to have to use an external XML comparison tool just to see what really changed in the config.

          I see two ways to solve this: either make Jenkins reformat the config on save, or change the read API to reformat it on read. I am not sure which one is better, but either should address the random fake changes in the configuration history. These can be a real issue when jobs are configured using configuration management tools, because the fake change history gets so long that it is impossible to spot real changes.
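Either option comes down to passing every config through one deterministic serializer. Below is a Python 3.9+ sketch of such a policy, for example for a configuration management tool that normalizes configs on read before storing or comparing them. It is not how Jenkins core serializes jobs, which happens in Java. The fixed declaration string, the two-space indentation and the name `normalize_config` are all assumptions for illustration. The function is idempotent, so re-saving an unchanged job yields byte-identical output.

```python
import xml.etree.ElementTree as ET

# Assumed output style for this sketch only; not taken from Jenkins.
DECLARATION = "<?xml version='1.0' encoding='UTF-8'?>"


def normalize_config(xml_text: str) -> str:
    """Re-serialize a config.xml string in a single, deterministic style (Python 3.9+).

    Declaration on its own line, two-space indentation, <tag /> for empty
    elements, LF line endings and a trailing newline. Note: the default
    parser drops XML comments and processing instructions.
    """
    root = ET.fromstring(xml_text)
    root.tail = None
    ET.indent(root, space="  ")
    return DECLARATION + "\n" + ET.tostring(root, encoding="unicode") + "\n"


# Hypothetical UI-style and CLI/REST-style saves of the same job.
ui_saved = (
    "<?xml version='1.0' encoding='UTF-8'?>\n<project>\n"
    "  <description></description>\n  <disabled>false</disabled>\n</project>\n"
)
api_saved = (
    "<?xml version='1.0' encoding='UTF-8'?><project>\r\n"
    "  <description/>\r\n  <disabled>false</disabled>\r\n</project>\r\n"
)

print(normalize_config(ui_saved) == normalize_config(api_saved))  # True
print(normalize_config(normalize_config(api_saved)) == normalize_config(api_saved))  # True
```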

            manandbytes Mykola Nikishov
            sdp0et Paul G
            Votes: 6
            Watchers: 10