commands: teach 'delay' capability to `git-lfs-filter-process(1)` #2511

Merged
merged 20 commits into master from delay on Aug 24, 2017

Conversation

@ttaylorr
Member

ttaylorr commented Aug 15, 2017

This pull request teaches the git-lfs-filter-process(1) command the 'delay' capability that was proposed by @larsxschneider in #1632 (comment) and added to Git in git/git@51b8aec.

This implementation allows the process filter to respond with status=delayed when asked to smudge the contents of a file that it does not already have in the local .git/lfs/objects cache.

If any items were delayed, Git will later ask for the available blobs with the list_available_blobs command, to which Git LFS will respond with the immediately available blobs, wait until at least one becomes available, or respond with an empty set indicating that the transfer is complete.

To implement this, I added a new delayedSmudge helper function, which behaves like smudge(), except that it adds the pointer to the *tq.TransferQueue given as an argument instead of writing the object's contents immediately.
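As a rough sketch of that split (the Pointer type and enqueue callback below are stand-ins for the real lfs.Pointer and *tq.TransferQueue; the cache path follows the usual two-level OID sharding under .git/lfs/objects):

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Pointer stands in for lfs.Pointer: the OID and size parsed from the
// pointer blob that Git hands to the filter.
type Pointer struct {
	Oid  string
	Size int64
}

// delayedSmudge (sketch): if the object is already present under
// .git/lfs/objects, smudge it immediately; otherwise enqueue it on the
// transfer queue and report the checkout as delayed.
func delayedSmudge(ptr *Pointer, filename string, enqueue func(*Pointer, string)) (delayed bool) {
	path := filepath.Join(".git", "lfs", "objects",
		ptr.Oid[0:2], ptr.Oid[2:4], ptr.Oid)

	if _, err := os.Stat(path); err == nil {
		// Cached locally: write the contents now and answer status=success.
		fmt.Printf("smudge %s from %s\n", filename, path)
		return false
	}

	// Not cached: queue the download and answer status=delayed, so Git
	// will come back for this path via list_available_blobs later.
	enqueue(ptr, filename)
	return true
}

func main() {
	var queued []*Pointer
	ptr := &Pointer{Oid: "c5a39d8045fd604bdcf307bbe06e7e41dd59a2399dc2a1807e4a6e3dfef5148b", Size: 6}
	delayed := delayedSmudge(ptr, "1/test11791.bin", func(p *Pointer, name string) {
		queued = append(queued, p)
	})
	fmt.Println("delayed:", delayed, "queued:", len(queued))
}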

Here are some benchmarks (times in seconds) from running git clone against GitHub.com and a local copy of GitHub:

lfs.concurrenttransfers     =1         =3         =8         =16        =32

v2.13.2
  github.com                29.2297    -          -          -          -
  github.dev                18.4657    -          -          -          -

v2.14.1.next
  github.com                45.6183    28.6867    14.9877    13.8893    13.7767
  github.dev                 9.02767    7.41867    7.70833    7.33933    7.71067

For a delta of (v2.13.2 at =1 minus v2.14.1.next at each setting):

lfs.concurrenttransfers     =1          =3          =8          =16         =32
github.com                  -16.3886     0.543      14.242      15.3404     15.453
github.dev                    9.43803   11.04703    10.75737    11.12637    10.75503

Excluding outliers:

            v2.13.2    v2.14.1.next    delta           speedup
github.com  29.2297    17.8351         11.4s faster    1.64x
github.dev  18.4657     7.840934       10.6s faster    2.36x

Three big performance changes contribute to this improvement:

  1. #2483: Infinitely buffering the tq.(*TransferQueue).Add() function, which allows the TransferQueue to accept new items sent by Git instead of waiting until the first batch of items was transferred.
  2. #2506: Increasing the default number of concurrently running transfers from 3 to 8 allows LFS to take advantage of more network bandwidth and process items faster.
  3. Not blocking on list_available_blobs until tq.Wait() returns. Instead, we read from tq.Watch() until a read would block (via a select with a default case; see the sketch below). This allows Git to perform the checkout while the tq.(*TransferQueue) is still transferring items.
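Here is a self-contained sketch of that non-blocking drain, matching the shape of the readAvailable function discussed in the review below (the Transfer type is a stand-in for *tq.Transfer):

// Transfer stands in for *tq.Transfer.
type Transfer struct{ Name string }

// readAvailable drains every transfer that has already finished, without
// spinning. If nothing is ready yet, it blocks for exactly one item, then
// hands back what it has so Git can keep checking files out.
func readAvailable(ch <-chan *Transfer) []*Transfer {
	ts := make([]*Transfer, 0, 100)
	for {
		select {
		case t, ok := <-ch:
			if !ok {
				// Queue closed: whatever we return is the final list; an
				// empty one tells Git the transfers are complete.
				return ts
			}
			ts = append(ts, t)
		default:
			if len(ts) > 0 {
				return ts
			}
			// Nothing ready yet: block until at least one blob becomes
			// available, or the queue closes.
			t, ok := <-ch
			if !ok {
				return ts
			}
			return append(ts, t)
		}
	}
}

The default case is what keeps this from busy-waiting: if at least one transfer is already buffered we return it immediately, and only when nothing is ready do we fall back to a single blocking read.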

Following this, I'd like to deprecate the git lfs clone command, since git clone is now as fast or faster than git lfs clone as of the latest tip of git.git.

EDIT: each number is the average of 5 runs of a given command, with outliers removed. The benchmark script is benchmark.sh; the averaging script is average-timings.sh.

Closes: #2466.


/cc @git-lfs/core
/cc @larsxschneider

@ttaylorr ttaylorr added the review label Aug 15, 2017

@ttaylorr ttaylorr added this to the v2.3.0 milestone Aug 15, 2017

@ttaylorr ttaylorr requested review from technoweenie and larsxschneider Aug 15, 2017

@technoweenie

Looking great. I pointed out a few things below:

commands/command_filter_process.go
+ status = delayedStausFromErr(err)
+ } else {
+ status = statusFromErr(err)
+ }


@technoweenie

technoweenie Aug 16, 2017

Member

IMO, this nested if is complicated. Would something like this work? It looks like calling Flush() on nil is a no-op anyway.

if delayed {
  s.WriteStatus(delayedStausFromErr(err))
} else if ferr := w.Flush(); ferr != nil {
  s.WriteStatus(statusFromErr(ferr))
} else {
  s.WriteStatus(statusFromErr(err))
}

Continuing to declare status outside of that scope is fine too, but it didn't look like it was needed anywhere else.

BTW delayedStausFromErr has a typo :)


@ttaylorr

ttaylorr Aug 16, 2017

Member

Way better -- thanks for this suggestion. Changed things up in 1001e1c (and I got that typo!)


commands/command_smudge.go
+//
+// delayedSmudge returns the number of bytes written, whether the checkout was
+// delayed, the *lfs.Pointer that was smudged, and an error, if one occurred.
+func delayedSmudge(s *git.FilterProcessScanner, to io.Writer, from io.Reader, tq *tq.TransferQueue, filename string, skip bool, filter *filepathfilter.Filter) (int64, bool, *lfs.Pointer, error) {


@technoweenie

technoweenie Aug 16, 2017

Member

The tq parameter clobbers the tq package name. Not critical of course, but may be worth changing.


@ttaylorr

ttaylorr Aug 16, 2017

Member

Good call -- I think we should be fine for now, since we don't use any other exported members from the tq package, but this will be a good way to prevent an ambiguous diff later on if that changes. Covered this one in: a9942cd.


commands/command_filter_process.go
+ }
+ }
+ case "list_available_blobs":
+ closeOnce.Do(func() { go q.Wait() })


@technoweenie

technoweenie Aug 16, 2017

Member

I could use some clarification here. It's my understanding that a transfer queue can accept new transfers until Wait() is called. If I'm interpreting the protocol docs correctly, then the process should look like this:

  1. Git sends all pointers through smudge with can-delay=1
  2. Git sends list_available_blobs. The transfer queue is closed.
  3. Git sends more smudge commands for each available blob, never calling list_available_blobs again.
  4. Profit!!

@ttaylorr

ttaylorr Aug 16, 2017

Member

🤔 not quite -- I think the confusion has to do with the readAvailable() func. readAvailable does as many reads as possible before <-tq.Watch() becomes blocking, responding to a list_available_blobs command with any blobs that it currently has ready, or blocking until at least one blob is ready.

The process looks like this:

  1. Git sends all pointers through command=smudge with can-delay=1.
  2. Git issues a list_available_blobs command, indicating that no more new blobs will be sent as above. The transfer queue is accordingly closed.
  3. We read as many currently available blobs as we can before blocking, or block until a) at least one blob becomes available, or b) the transfer queue has no more blobs to send, in which case we send an empty list, and the process is complete.
  4. Git sends a command=smudge with can-delay=0 for each blob in the list above.
  5. Git issues another list_available_blobs command.
  6. See step 3.

I did my best to clarify this in 8816463.
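Expanding the closeOnce.Do(func() { go q.Wait() }) line from the diff above, the arm handling steps 2 and 3 might look roughly like this (watch, writeList, and the pathname formatting are stand-ins for the real channel and protocol writer, not the exact git-lfs code):

case "list_available_blobs":
	// Step 2: Git has sent its last delayed smudge, so close the queue
	// exactly once. Wait() runs in a goroutine so transfers keep
	// completing while Git checks out the blobs we have already listed.
	closeOnce.Do(func() { go q.Wait() })

	// Step 3: reply with everything that has finished so far, blocking
	// only if nothing is ready yet. An empty list tells Git we're done.
	var paths []string
	for _, t := range readAvailable(watch) {
		paths = append(paths, fmt.Sprintf("pathname=%s", t.Name))
	}
	writeList(paths)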


+ if !ok {
+ return ts
+ }
+ return append(ts, t)


@technoweenie

technoweenie Aug 16, 2017

Member

Does this part need to exist? If ts is empty, the for should start another iteration that'll attempt to read from the channel. Very possible there's some detail about select that I'm forgetting...


@ttaylorr

ttaylorr Aug 17, 2017

Member

I think so, though the duplication is unfortunate. Specifically, this return append could be turned into an append only, but the blocking <-ch read causes the enclosing for loop to not spin.


@technoweenie

technoweenie Aug 17, 2017

Member

Jeez, the return makes all the difference in the world. Thanks.


test/test-track.sh
@@ -29,6 +29,10 @@ begin_test "track"
mkdir -p a/b
+ if [ ! -d .git/info ]; then
+ mkdir .git/info
+ fi


@technoweenie

technoweenie Aug 17, 2017

Member

FWIW I think you could replace this with mkdir -p .git/info.


@ttaylorr

ttaylorr Aug 17, 2017

Member

Good call -- I got this one in: bc376b8.


@technoweenie

Really excited to see this land!

@larsxschneider


larsxschneider Aug 20, 2017

Contributor

As suggested by @ttaylorr, I added a progress indicator for filtering content in Git and posted the patch to the Git mailing list.

If you clone a repo it would look like this:

$ git clone https://server/some/repo.git
Cloning into 'repo'...
remote: Counting objects: 6077, done.
remote: Total 6077 (delta 0), reused 0 (delta 0), pack-reused 6077
Receiving objects: 100% (6077/6077), 6.00 MiB | 1.24 MiB/s, done.
Resolving deltas: 100% (4177/4177), done.
Filtering content: 100% (62/62), 7.63 MiB | 304.00 KiB/s, done.

If you check out a commit that references many LFS files that you don't have locally on your machine yet, then you would see something like this:

$ git checkout mybranch
Filtering content: 100% (62/62), 7.63 MiB | 312.00 KiB/s, done.
Switched to branch 'mybranch'

A nice side effect is that users can easily see the bandwidth performance of their LFS servers 🙂


@larsxschneider


larsxschneider Aug 20, 2017

Contributor

There might be a bug in the current (7115b57) delay implementation. Try these steps:

(1) Check out a branch of a repo with files in LFS
(2) Delete all files in that repo except for the .git folder
(3) Run git checkout -f HEAD

The checkout operation should be blazingly fast, as all objects should already be available locally. However, it looks as if Git LFS downloads all objects again.


@ttaylorr


ttaylorr Aug 20, 2017

Member

> The checkout operation should be blazingly fast, as all objects should already be available locally. However, it looks as if Git LFS downloads all objects again.

@larsxschneider great catch, this was definitely a bug. With c04745c:

~/D/git-lfs-test (master) $ find .git/lfs/objects -type f | wc -l
     187

~/D/git-lfs-test (master) $ rm -rf ./000 ./001 ./002 ./003 ./004 ./005 ./006 ./007 ./008 ./009 ./010 ./script

~/D/git-lfs-test (master!) $ time git checkout -f HEAD
Your branch is up-to-date with 'origin/master'.
git checkout -f HEAD  0.05s user 0.07s system 84% cpu 0.146 total

EDIT: the above commit had a bug where it would not smudge pointers for which the local object was cached. I fixed that in: 50ecd67.


commands/command_smudge.go
@@ -63,6 +63,15 @@ func delayedSmudge(s *git.FilterProcessScanner, to io.Writer, from io.Reader, q
q.Add(filename, path, ptr.Oid, ptr.Size)
return 0, true, ptr, nil
}
+
+ // Write 'statusFromErr(nil)', even though 'perr != nil', since
+ // we are about to write non-delayed smudged contents to "to".


@larsxschneider

larsxschneider Aug 20, 2017

Contributor

The fix seems to work! Thank you 👍! However, I'm still trying to decipher the comment 😄. perr should be nil here, no? TBH I don't really understand the same comment a few lines above either. Can you give me a hint?


@larsxschneider

larsxschneider Aug 20, 2017

Contributor

Plus, could it be that this was the reason for the degraded performance in the lfs.concurrenttransfers=1 case?


@ttaylorr

ttaylorr Aug 21, 2017

Member

Ah, good question -- I think the comment is out of date but the code is correct here: this case occurs when we have the contents locally and can smudge without delay. It is certainly possible that this could affect cases where lfs.concurrenttransfers=1, but I think this would be negligible. I fixed this up in 4cefa24.


@larsxschneider


larsxschneider Aug 20, 2017

Contributor

Cloning repositories via SSH with a large number of LFS files seems to consistently result in a "batch response: Bad credentials" error:

$ git clone git@github.com:larsxschneider/lfstest-manyfiles.git testrepo
Cloning into 'testrepo'...
warning: templates not found /Users/lars/share/git-core/templates
remote: Counting objects: 15012, done.
remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012
Receiving objects: 100% (15012/15012), 2.02 MiB | 1.57 MiB/s, done.
Downloading 1/test11791.bin (6 B)00), 17.69 KiB | 0 bytes/s
Error downloading object: 1/test11791.bin (c5a39d8): Smudge error: Error downloading 1/test11791.bin (c5a39d8045fd604bdcf307bbe06e7e41dd59a2399dc2a1807e4a6e3dfef5148b): batch response: Bad credentials

Errors logged to /Users/lars/Temp/lfstest/testrepo/.git/lfs/objects/logs/20170820T190408.392912345.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: 1/test11791.bin: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Plus, in case of an error Git LFS prints Downloading 1/test11791.bin (6 B) over the Filtering content: 100% (62/62), 7.63 MiB | 312.00 KiB/s progress.
Can/should we disable the Downloading... output in Git LFS?


@technoweenie


technoweenie Aug 20, 2017

Member

> Cloning repositories via SSH with a large number of LFS files seems to consistently result in a "batch response: Bad credentials" error:

Hmm, I wouldn't think this PR would affect that. It's probably an unrelated issue. Can you try with an HTTPS remote (just to confirm whether this is a general transfer queue issue, an ssh auth issue, or a delay filter issue)?

As for the SSH thing, my hunch is that the expiration timeout is not being respected. Is this happening at around the 10 minute mark?

$ ssh git@github.com git-lfs-authenticate github/git-lfs-test download
{
  "href": "https://lfs.github.com/github/git-lfs-test",
  "header": {
    "Authorization": "RemoteAuth booya"
  },
  "expires_at": "2017-08-20T18:10:12Z",
  "expires_in": 599
}

Two potential spots that I can think of:

  • Does the Batch API try to refresh the Authorization header after it expires? I thought it recognizes that and automatically retries. Maybe there's a race condition, and it should refresh the token if expires_in is < 5 or something.
  • Does the SSH cred cacher respect the expires_in? It should be falling back to a real ssh call after expires_in seconds have passed (see the sketch below).
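For the second bullet, the guard might amount to something like this (the types and names here are illustrative only, not the actual git-lfs credential cacher):

package main

import (
	"fmt"
	"time"
)

// sshCreds is purely illustrative: an Authorization header obtained via
// git-lfs-authenticate, plus its expiry.
type sshCreds struct {
	Authorization string
	ExpiresAt     time.Time
}

// usable reports whether the cached token may still be sent, refusing it a
// little early ("< 5" as suggested above) so that a batch request started
// just before expiry doesn't arrive with a dead token.
func (c *sshCreds) usable(now time.Time) bool {
	const slack = 5 * time.Second
	return now.Add(slack).Before(c.ExpiresAt)
}

func main() {
	c := &sshCreds{
		Authorization: "RemoteAuth booya",
		// expires_in: 599 from the git-lfs-authenticate response above
		ExpiresAt: time.Now().Add(599 * time.Second),
	}
	fmt.Println("usable now:", c.usable(time.Now()))
	fmt.Println("usable after 10m:", c.usable(time.Now().Add(10*time.Minute)))
}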

> Can/should we disable the Downloading... output in Git LFS?

I think we still need some output for legacy smudges. The filter should disable them (or even implement its own progress output).


@larsxschneider


larsxschneider Aug 20, 2017

Contributor

@technoweenie it seems to work without issues for HTTPS remotes and the SSH error happens exactly at the 10min mark.


@technoweenie


technoweenie Aug 20, 2017

Member

> it seems to work without issues for HTTPS remotes and the SSH error happens exactly at the 10min mark.

Got it. For this PR, let's focus on the delay filter itself if possible. I've opened up #2519 as a place to discuss and fix the SSH timeout issue you experienced. Thanks for bringing it up :)


commands/command_filter_process.go
- s.WriteStatus(statusFromErr(nil))
+ if !(req.Header["command"] == "smudge" && req.Header["can-delay"] == "1") && !(req.Header["command"] == "list_available_blobs") {
+ s.WriteStatus(statusFromErr(nil))
+ }


@larsxschneider

larsxschneider Aug 24, 2017

Contributor

Nit: I feel this makes the protocol a bit hard to read.
I would rather add s.WriteStatus(statusFromErr(nil)) twice in the switch statement below.
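A rough sketch of that restructuring (hypothetical; the real arms in command_filter_process.go carry more logic than shown):

switch req.Header["command"] {
case "clean":
	// ... clean the blob, then report success explicitly:
	s.WriteStatus(statusFromErr(nil))
case "smudge":
	// ... delayed smudges answer status=delayed from their own path;
	// non-delayed smudges write their status here ...
case "list_available_blobs":
	// ... status is written once the available list has been flushed ...
}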


@ttaylorr

ttaylorr Aug 24, 2017

Member

I agree, changed in: 8c9a180.


+// 2. There is one item available, or ...
+// 3. The 'tq.TransferQueue' is completed.
+func readAvailable(ch <-chan *tq.Transfer) []*tq.Transfer {
+ ts := make([]*tq.Transfer, 0, 100)


@larsxschneider

larsxschneider Aug 24, 2017

Contributor

I am curious, why do we initialize the slice with a capacity of 100?


@ttaylorr

ttaylorr Aug 24, 2017

Member

Since the transfer queue has a batch size of 100, and Watch() is buffered at 100 items, all 100 could be ready at once. This is done to avoid growing the slice.


@larsxschneider

larsxschneider Aug 25, 2017

Contributor

Wouldn't it make sense to use a constant here?
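For illustration, that might look like the following (the constant name is hypothetical; the queue's real batch size lives in the tq package):

// watchBuffer mirrors the transfer queue's batch size: Watch() is buffered
// to the same depth, so at most this many transfers can be ready at once.
const watchBuffer = 100

ts := make([]*Transfer, 0, watchBuffer)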


commands/command_filter_process.go
+ if err != nil && err != io.EOF {
+ return "error"
+ }
+ return "delayed"


@larsxschneider

larsxschneider Aug 24, 2017

Contributor

nit: I wouldn't duplicate the error code here. how about:

func delayedStatusFromErr(err error) string {
    status := statusFromErr(err)
    if status == "success" {
        return "delayed"
    }
    return status
}

@ttaylorr

ttaylorr Aug 24, 2017

Member

Good call. I was hesitant to write this in the way you suggested originally, because it binds the implementation of this function to the protocol having the value "success". That said, I think this change is well motivated, and I am going to follow up with a future pull request to make these values non protocol-specific.
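One hypothetical shape for that follow-up is to keep a typed status inside LFS and translate it to the protocol's wire strings only at the boundary:

// filterStatus is a hypothetical internal status type, decoupled from the
// filter-process protocol's wire strings.
type filterStatus int

const (
	statusSuccess filterStatus = iota
	statusDelayed
	statusError
)

// wire translates the internal status to the protocol value at the boundary.
func (s filterStatus) wire() string {
	switch s {
	case statusDelayed:
		return "delayed"
	case statusError:
		return "error"
	default:
		return "success"
	}
}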


+//
+// delayedSmudge returns the number of bytes written, whether the checkout was
+// delayed, the *lfs.Pointer that was smudged, and an error, if one occurred.
+func delayedSmudge(s *git.FilterProcessScanner, to io.Writer, from io.Reader, q *tq.TransferQueue, filename string, skip bool, filter *filepathfilter.Filter) (int64, bool, *lfs.Pointer, error) {


@larsxschneider

larsxschneider Aug 24, 2017

Contributor

nit: line break?


@ttaylorr

ttaylorr Aug 24, 2017

Member

Even though this is a long line, I can't think of any other function signatures that we wrap over multiple lines in LFS. For the sake of consistency, I'm going to leave this here. This long line has the added benefit of pointing out a smell that should be fixed.


test/test-malformed-pointers.sh
@@ -83,7 +83,9 @@ begin_test "empty pointers"
pushd .. >/dev/null
clone_repo "$reponame" "$reponame-assert"
- [ "0" -eq "$(grep -c "empty.dat" clone.log)" ]
+ # The below assertion is known to be broken on Git v2.14.next.


@larsxschneider

larsxschneider Aug 24, 2017

Contributor

Why is that? Does this require further investigation?


@larsxschneider

larsxschneider Aug 24, 2017

Contributor

Plus, we could switch back to testing master again, no?


@ttaylorr

ttaylorr Aug 24, 2017

Member

@larsxschneider I'm not sure, but it appears that this is fixed on 'master', whatever was causing it, so I removed the assertion skip in 8bd7aac.


@ttaylorr

ttaylorr Aug 24, 2017

Member

I'll follow up in a separate PR to remove the test on the 'next' branch.


@larsxschneider

I added a few nits that you can address if you want (most relevant from my point of view is the delayedStatusFromErr suggestion).

I am happy to see that merged 👍

@ttaylorr


ttaylorr Aug 24, 2017

Member

🙇 thanks for the review, everybody!


@ttaylorr ttaylorr merged commit ef961a1 into master Aug 24, 2017

4 checks passed

GitHub CLA: @ttaylorr has accepted the GitHub Contributor License Agreement.
ci/circleci: Your tests passed on CircleCI!
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed

@ttaylorr ttaylorr deleted the delay branch Aug 24, 2017

@jjgod


jjgod Sep 14, 2017

What git version do I need for this?

~$ git lfs version
git-lfs/2.3.0 (GitHub; darwin amd64; go 1.8.3; git 4dd2bf73)
~$ git --version
git version 2.14.1

Yet it doesn't seem to work: no concurrent transfers when I do git clone.

For cloning the same repository:

git lfs clone:

real	0m28.256s
user	0m24.853s
sys	0m9.224s

git clone:

real	2m5.616s
user	0m15.168s
sys	0m10.439s

@larsxschneider


larsxschneider Sep 15, 2017

Contributor

@jjgod You need the upcoming Git version 2.15 😄
See https://twitter.com/kit3bus/status/908579232422486016


@DJm00n


DJm00n Oct 31, 2017

@larsxschneider

$ git --version
git version 2.15.0.windows.1
$ git lfs version
git-lfs/2.3.4 (GitHub; windows amd64; go 1.8.3; git d2f6752f)

Clone speed is fine, but there is no progress output and it looks like a hang:

Speed:
git clone

real 0m34.317s
user 0m0.000s
sys 0m0.000s

git lfs clone

Git LFS: (16 of 16 files) 130.94 MB / 130.94 MB
real    0m38.161s
user    0m0.000s
sys     0m0.015s

@DJm00n


DJm00n Oct 31, 2017

@larsxschneider Strange, it seems like git/git@52f1d62 was merged into Git 2.15 already. macOS output is missing the progress meter too.


@larsxschneider


larsxschneider Oct 31, 2017

Contributor

@DJm00n


DJm00n Oct 31, 2017

@larsxschneider, missed that thread. On your test repo, git clone https://github.com/larsxschneider/lfstest-manyfiles prints the "Filtering content" progress on my MacBook too, but not on my vsonline test repo. So it looks like a more general Git issue, not a Git-for-Windows-only one.

MacBook-Pro:Develop djm00n$ git version
git version 2.15.0
MacBook-Pro:Develop djm00n$ git lfs version
git-lfs/2.3.4 (GitHub; darwin amd64; go 1.9.1)

Git and Git LFS are installed from Homebrew.


@DJm00n


DJm00n Dec 4, 2017

@larsxschneider just updated to git 2.15.1 and still no progress showing on clone. Any updates?
If any info/logs are needed, just ask and I'll provide them.
My test repo: https://dimitriy-ryazantcev.visualstudio.com/DefaultCollection/_git/Test1 (gittest1/gittest999!)


@larsxschneider


larsxschneider Dec 4, 2017

Contributor

Thanks @DJm00n ! I can reproduce the issue with your repo! Let's see if I can find the problem...


@larsxschneider


larsxschneider Dec 4, 2017

Contributor

@DJm00n I think I found and fixed the issue. Let's see what the Git experts think: https://public-inbox.org/git/20171204203647.30546-1-lars.schneider@autodesk.com/


@ttaylorr


ttaylorr Dec 5, 2017

Member

> @DJm00n I think I found and fixed the issue. Let's see what the Git experts think: https://public-inbox.org/git/20171204203647.30546-1-lars.schneider@autodesk.com/

@larsxschneider Excellent find, and thanks for keeping us up-to-date.


@awmcclain


awmcclain May 9, 2018

Sorry for the late question, but does this mean that by upgrading LFS & Git, the speed will increase for git pulls where a large number of added files are tracked?


@ttaylorr


ttaylorr May 9, 2018

Member

> but does this mean that by upgrading LFS & Git, the speed will increase for git pulls where a large number of added files are tracked?

That's correct!

