Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
From a useless Git Diff to a useful one (coderwall.com)
35 points by bitsweet on Oct 30, 2013 | hide | past | favorite | 31 comments


The problem is not the diff but the "and besides that".

You should make two different commits, changing indentation and changing a feature are different changes and should be tracked independently.


The real issue is that in practice it is often difficult to separate these two changes. You may correct code in one place which requires correcting code elsewhere in a cascading effect. If you limit your correction to just one area (in a single commit) then you introduce the possibility of breaking the build. The other issue is that if you fix all the whitespace stuff first you may find the owners are not willing to pull in your commit for various reasons. I often find in open source the smaller the change the more likely it is to be merged. It's a complicated area and unfortunately often the "two different commits" strategy doesn't encompass the full complexity.

Oh and another cool feature is worddiffs for when you're editing text.


Whitespace changes in an open source project are the kind of thing that depends on the code style enforced at the project level - certainly not something you would submit as a newcomer. Do you happen to have an example of what you're explaining here?


My editor is setup to automatically trim trailing spaces. When I edit files from other devs in the group, it's not uncommon for my commits to have whitespace removal in addition to code changes. I really don't think that's a big deal.

Now doing a code format to tweak indenting and brace positioning, etc., that should definitely be put into a separate commit.


The real issue is that in practice it is often difficult to separate these two changes.

??? No it's not


Indeed, when I come across empty lines, white space at the end of the lines etc. I remove it all, commit it and then start actually changing the code. White space in diffs is distracting from the actual changes.


Same goes for --ignore-space-at-eol

The better solution would be to teach people to stop being careless about trailing whitespace in their commits.


Well, if people used a decent editor to start with . . . but that's not a flame war I want to start (at least not today). I've been tempted to turn on checks in Jenkins that would automatically reject certain "broken" changesets: whitespace at EOL (easy enough to check by using the already provided .git/hooks/pre-commit.sample), reduction in number of unit/regression tests (can be overridden by hand, after conferring with at least one other developer), etc.

We have these awesomely powerful systems, with this awesomely powerful software, that oftentimes I'm depressed by people doing things by hand. Heck, if I was really not willing to deal with whitespace retards, I'd just make a hook on the server auto indent and whitespace cleanup all changes before commit.


Can you do that with a server hook? I've looked into this before and thought it had to be a hook in the client repository, as it would modify the commit hash. This is problematic because it requires everyone to add it to all of their repositories themselves.

I'd love to be wrong on this!


I could be wrong (I haven't delved too deeply into this), but I think it requires a code review system (such as Gerrit) hooked up to a CI (such as Jenkins). Then you have the code review system require the CI "sign off" (whatever criteria you please) on changesets before they can be approved by a human; no sign off by the CI, no merging the changeset. Of course, if the CI tests are dependent upon something in the repo, this can be defeated . . .

As usual, it comes down to trust: if you can't trust your developers to write good code or not subvert the system, you either need to educate them, or work with better developers (fire them, or quit and work somewhere better). The point at which you have to start enforcing things by machine fiat can be helpful to catch honest mistakes, but it can also be a sign of a dysfunctional team.


This is what you call a "boil the ocean" solution.

All that it takes is that every single user behaves exactly correctly. That's an indication of a poor design. Tools should naturally guide people to the correct usage and be moderately forgiving of minor examples of incorrect usage.


For whitespace fixes (making a mix of tabs vs spaces consistent), I agree with you.

For "I wrapped these 5 lines in a conditional and adjusted the indentation per code standard" in a language like C# I disagree with you, the first commit is ugly and leaves the codebase (temporarily) out of sync with codebase style, which may fail the build or fail to submit in the first place if you enforce such things via precommit hook or build process.

For the same operation in python, where semantics changes are attached to indentation level changes, the first commit is BROKEN, WRONG, AND BAD. The build not failing the first commit reflects poorly on the unit tests.

Increasing the readability of a diff is a noble goal. Splitting the diff in 'twain can help, but it's not a full answer. Neither is keeping commits as small as possible: simple renames can easily touch 30 files at a time, to say nothing of adding parameters or more complicated refactorings which must be committed together or leaves the codebase in a broken state.

Other tools include highlighting difference within a line instead of merely at a per-line level. Ignoring whitespace changes in languages which aren't whitespace sensitive can also help: I personally find the two highlighted lines of an if statement and it's end more informative than the seven highlighted lines which include it's functionally unchanged body that's merely changed scope.


I assume if you're using `git diff`, you're not comparing adjacent commits. And asking git for "diff these two commits but ignore this third one" is nontrivial.


Perhaps this should be a problem that can be done for you automatically per commit? (via a commit hook or something)


There's a tool that will split your git commits into whitespace/non-whitespace pairs:

https://github.com/dmnd/git-whiteout

(Can't vouch for it; I just assumed that someone would have automated this by now so I dug around for the tool)


Why? Is there a good justification for this other than that git by default is bad about showing diffs well?


Only reason why is to make pretty diffs. Now git has options so people don't have to deal with anal retentive details like that any more. The best option is of course to have a pre-commit hook that strips trailing whitespace from lines. The absolutely worst option is to force people to spend brain cycles dealing with that stuff.


Not sure if it does exactly the same thing, but on GitHub, if you add ?w=1 to the end of a diff URL, it'll show it with whitespace ignored.


My personal favorite: --patience

It tends to produce more "this looks like what I did" diffs, with code at least.


Or use a diff tool that can do this automatically..? [1] Diff without the ability to edit the changes you see is only about 10% as useful as the ability to edit those diffs inline, change how it's decided to compare the two (or three) files (match THIS line with THIS OTHER line...), or trivially revert the changes a line at a time. [2]

I hop back and forth between GUI-based editors and the command line. Some things are just better in a GUI, and diff is one of those things--especially any nontrivial diff.

[1] http://www.scootersoftware.com/

[2] I'm sure you can do this in vi and/or emacs as well. I leave how to do that as an exercise to the reader.


Well I don't disagree, even if everybody else does. I've long been a user of Araxis Merge, which is not only visual, but also lets you edit the diffs as you go, which comes in super handy for all the reasons that you state.

I didn't realise how spoiled I was until I started using git, and found the hard way how limiting the text diff format can be. Column-oriented changes in particular are very difficult to figure out, and even for more readable sets of changes I always found it a bit hard to get a rough idea of how spread out or clustered together the changes are. git gui and SourceTree try their best to help, but ultimately they're just displaying a prettified version of the same sometimes-unhelpful view of things.

On the other hand, if you're making small changes to a lot of files, a text diff is just the thing. It would be hyperbole to state that there's nothing worse than telling Araxis Merge to do a folder compare and find out that 1,000+ files have had 1 line changed in them - but it is rather annoying, and very time-consuming to investigate thoroughly. You need a text diff for that sort of thing.

But I still always reach for Araxis first...


As I said, I hop back and forth between GUI and command line; I haven't personally needed to deal with a folder diff with 1000+ files with one line changed each, but if I did, command line seems like a reasonable option.

Beyond Compare (linked above) lets you set patterns to ignore, which has been incredibly useful for when an autogenerated time/date stamp is changed in 1000+ files; I can say "ignore that change; tell me what changed that DOESN'T match the pattern".

You can do the same thing with grep on the command line of course. But then you lose out on all the GUI awesomeness.

I've used Araxis, and it's Even More Awesome, but 5x the cost (last I checked) as Beyond Compare, and I didn't think it was 5x as awesome. BC has gotten much cooler since then too (with comparison plug-ins, and bitmap compare -- let's see diff compare two bitmaps and tell me which pixels have changed! -- and DOC file compare, and normalized XML file compare, and so on ;).


`git difftool` is pretty much just like `git diff`, except that it shows the diff with your tool of choice.


On Mercurial, all of those options are just "hg diff -w".

You can also ignore whitespace when picking commits apart with hg record (akin to git add -p, I think).


Also everyone should be using tig: http://jonas.nitro.dk/tig/


I'd really like a diff option that ignores changes that don't alter the AST.

That's pretty outside the scope of git diff or gnu diff, though.


An alias for bash

  gitdiff() {
        git diff --patience --ignore-space-at-eol -b -w --ignore-blank-lines $1
  }


I am being pedantic, but that would be a function and not an alias.

This one is simple enough to be defined as an alias:

    alias gitdiff='git diff --patience --ignore-space-at-eol -b -w --ignore-blank-lines'
This[1] discusses aliases vs functions (just to make my reply a bit more useful [may be not for you, but in general] and to me feel less guilty.)

[1] http://unix.stackexchange.com/questions/30925/in-bash-when-t...


oops... yes. I'll stop posting on HN before I've eaten.


why make a bash function, when git has it's own alias feature?

Also, you're only accepting one argument, where there could be many depending on your refspec and path.


For those wondering: in ~/.gitconfig or similar,

  [alias]
    df = diff --patience --ignore-space-at-eol -b -w --ignore-blank-lines




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: