Merge branch 'ds/commit-graph'
authorJunio C Hamano <gitster@pobox.com>
Tue, 8 May 2018 06:59:20 +0000 (15:59 +0900)
committerJunio C Hamano <gitster@pobox.com>
Tue, 8 May 2018 06:59:20 +0000 (15:59 +0900)
Precompute and store information necessary for ancestry traversal
in a separate file to optimize graph walking.

* ds/commit-graph:
  commit-graph: implement "--append" option
  commit-graph: build graph from starting commits
  commit-graph: read only from specific pack-indexes
  commit: integrate commit graph with commit parsing
  commit-graph: close under reachability
  commit-graph: add core.commitGraph setting
  commit-graph: implement git commit-graph read
  commit-graph: implement git-commit-graph write
  commit-graph: implement write_commit_graph()
  commit-graph: create git-commit-graph builtin
  graph: add commit graph design document
  commit-graph: add format document
  csum-file: refactor finalize_hashfile() method
  csum-file: rename hashclose() to finalize_hashfile()

30 files changed:
.gitignore
Documentation/config.txt
Documentation/git-commit-graph.txt [new file with mode: 0644]
Documentation/technical/commit-graph-format.txt [new file with mode: 0644]
Documentation/technical/commit-graph.txt [new file with mode: 0644]
Makefile
alloc.c
builtin.h
builtin/commit-graph.c [new file with mode: 0644]
builtin/index-pack.c
builtin/pack-objects.c
bulk-checkin.c
cache.h
command-list.txt
commit-graph.c [new file with mode: 0644]
commit-graph.h [new file with mode: 0644]
commit.c
commit.h
config.c
contrib/completion/git-completion.bash
csum-file.c
csum-file.h
environment.c
fast-import.c
git.c
pack-bitmap-write.c
pack-write.c
packfile.c
packfile.h
t/t5318-commit-graph.sh [new file with mode: 0755]

index 8d9a458..b2a1ae4 100644 (file)
@@ -35,6 +35,7 @@
 /git-clone
 /git-column
 /git-commit
+/git-commit-graph
 /git-commit-tree
 /git-config
 /git-count-objects
index 2659153..cb4c93f 100644 (file)
@@ -898,6 +898,10 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.commitGraph::
+       Enable git commit graph feature. Allows reading from the
+       commit-graph file.
+
 core.sparseCheckout::
        Enable "sparse checkout" feature. See section "Sparse checkout" in
        linkgit:git-read-tree[1] for more information.
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
new file mode 100644 (file)
index 0000000..4c97b55
--- /dev/null
@@ -0,0 +1,94 @@
+git-commit-graph(1)
+===================
+
+NAME
+----
+git-commit-graph - Write and verify Git commit graph files
+
+
+SYNOPSIS
+--------
+[verse]
+'git commit-graph read' [--object-dir <dir>]
+'git commit-graph write' <options> [--object-dir <dir>]
+
+
+DESCRIPTION
+-----------
+
+Manage the serialized commit graph file.
+
+
+OPTIONS
+-------
+--object-dir::
+       Use given directory for the location of packfiles and commit graph
+       file. This parameter exists to specify the location of an alternate
+       that only has the objects directory, not a full .git directory. The
+       commit graph file is expected to be at <dir>/info/commit-graph and
+       the packfiles are expected to be in <dir>/pack.
+
+
+COMMANDS
+--------
+'write'::
+
+Write a commit graph file based on the commits found in packfiles.
++
+With the `--stdin-packs` option, generate the new commit graph by
+walking objects only in the specified pack-indexes. (Cannot be combined
+with --stdin-commits.)
++
+With the `--stdin-commits` option, generate the new commit graph by
+walking commits starting at the commits specified in stdin as a list
+of OIDs in hex, one OID per line. (Cannot be combined with
+--stdin-packs.)
++
+With the `--append` option, include all commits that are present in the
+existing commit-graph file.
+
+'read'::
+
+Read a graph file given by the commit-graph file and output basic
+details about the graph file. Used for debugging purposes.
+
+
+EXAMPLES
+--------
+
+* Write a commit graph file for the packed commits in your local .git folder.
++
+------------------------------------------------
+$ git commit-graph write
+------------------------------------------------
+
+* Write a graph file, extending the current graph file using commits
+* in <pack-index>.
++
+------------------------------------------------
+$ echo <pack-index> | git commit-graph write --stdin-packs
+------------------------------------------------
+
+* Write a graph file containing all reachable commits.
++
+------------------------------------------------
+$ git show-ref -s | git commit-graph write --stdin-commits
+------------------------------------------------
+
+* Write a graph file containing all commits in the current
+* commit-graph file along with those reachable from HEAD.
++
+------------------------------------------------
+$ git rev-parse HEAD | git commit-graph write --stdin-commits --append
+------------------------------------------------
+
+* Read basic information from the commit-graph file.
++
+------------------------------------------------
+$ git commit-graph read
+------------------------------------------------
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644 (file)
index 0000000..ad6af81
--- /dev/null
@@ -0,0 +1,97 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of its parents. We
+  reserve zero as special, and can be used to mark a generation
+  number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+These positional references are stored as unsigned 32-bit integers
+corresponding to the array position withing the list of commit OIDs. We
+use the most-significant bit for special purposes, so we can store at most
+(1 << 31) - 1 (around 2 billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+  4-byte signature:
+      The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+      Currently, the only valid version is 1.
+
+  1-byte Hash Version (1 = SHA-1)
+      We infer the hash length (H) from this value.
+
+  1-byte number (C) of "chunks"
+
+  1-byte (reserved for later use)
+     Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+      First 4 bytes describe the chunk id. Value 0 is a terminating label.
+      Other 8 bytes provide the byte-offset in current file for chunk to
+      start. (Chunks are ordered contiguously in the file, so you can infer
+      the length using the next chunk position if necessary.) Each chunk
+      ID appears at most once.
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+      The ith entry, F[i], stores the number of OIDs with first
+      byte at most i. Thus F[255] stores the total
+      number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+      The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+    * The first H bytes are for the OID of the root tree.
+    * The next 8 bytes are for the positions of the first two parents
+      of the ith commit. Stores value 0xffffffff if no parent in that
+      position. If there are more than two parents, the second value
+      has its most-significant bit on and the other bits store an array
+      position into the Large Edge List chunk.
+    * The next 8 bytes store the generation number of the commit and
+      the commit time in seconds since EPOCH. The generation number
+      uses the higher 30 bits of the first 4 bytes, while the commit
+      time uses the 32 bits of the second 4 bytes, along with the lowest
+      2 bits of the lowest byte, storing the 33rd and 34th bit of the
+      commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+      This list of 4-byte values store the second through nth parents for
+      all octopus merges. The second parent value in the commit data stores
+      an array position within this list along with the most-significant bit
+      on. Starting at that array position, iterate through this list of commit
+      positions for the parents until reaching a value with the most-significant
+      bit on. The other bits correspond to the position of the last parent.
+
+TRAILER:
+
+       H-byte HASH-checksum of all of the above.
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
new file mode 100644 (file)
index 0000000..0550c6d
--- /dev/null
@@ -0,0 +1,163 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in lexi-
+cographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+   the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and observing the following property:
+
+    If A and B are commits with generation numbers N and M, respectively,
+    and N <= M, then A cannot reach B. That is, we know without searching
+    that B is not an ancestor of A because it is further from a root commit
+    than A.
+
+    Conversely, when checking if A is an ancestor of B, then we only need
+    to walk commits until all commits on the walk boundary have generation
+    number at most N. If we walk commits using a priority queue seeded by
+    generation numbers, then we always expand the boundary commit with highest
+    generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+    If A and B are commits with commit time X and Y, respectively, and
+    X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+Design Details
+--------------
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+  .git/objects/info directory. This could be stored in the info directory
+  of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+  so a future change of hash algorithm does not require a change in format.
+
+Future Work
+-----------
+
+- The commit graph feature currently does not honor commit grafts. This can
+  be remedied by duplicating or refactoring the current graft logic.
+
+- The 'commit-graph' subcommand does not have a "verify" mode that is
+  necessary for integration with fsck.
+
+- The file format includes room for precomputed generation numbers. These
+  are not currently computed, so all generation numbers will be marked as
+  0 (or "uncomputed"). A later patch will include this calculation.
+
+- After computing and storing generation numbers, we must make graph
+  walks aware of generation numbers to gain the performance benefits they
+  enable. This will mostly be accomplished by swapping a commit-date-ordered
+  priority queue with one ordered by generation number. The following
+  operations are important candidates:
+
+    - paint_down_to_common()
+    - 'log --topo-order'
+
+- Currently, parse_commit_gently() requires filling in the root tree
+  object for a commit. This passes through lookup_tree() and consequently
+  lookup_object(). Also, it calls lookup_commit() when loading the parents.
+  These method calls check the ODB for object existence, even if the
+  consumer does not need the content. For example, we do not need the
+  tree contents when computing merge bases. Now that commit parsing is
+  removed from the computation time, these lookup operations are the
+  slowest operations keeping graph walks from being fast. Consider
+  loading these objects without verifying their existence in the ODB and
+  only loading them fully when consumers need them. Consider a method
+  such as "ensure_tree_loaded(commit)" that fully loads a tree before
+  using commit->tree.
+
+- The current design uses the 'commit-graph' subcommand to generate the graph.
+  When this feature stabilizes enough to recommend to most users, we should
+  add automatic graph writes to common operations that create many commits.
+  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
+  commands.
+
+- A server could provide a commit graph file as part of the network protocol
+  to avoid extra calculations by clients. This feature is only of benefit if
+  the user is willing to trust the file, because verifying the file is correct
+  is as hard as computing it from scratch.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+    Chromium work item for: Serialized Commit Graph
+
+[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+    An abandoned patch that introduced generation numbers.
+
+[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+    Discussion about generation numbers on commits and how they interact
+    with fsck.
+
+[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
+    More discussion about generation numbers and not storing them inside
+    commit objects. A valuable quote:
+
+    "I think we should be moving more in the direction of keeping
+     repo-local caches for optimizations. Reachability bitmaps have been
+     a big performance win. I think we should be doing the same with our
+     properties of commits. Not just generation numbers, but making it
+     cheap to access the graph structure without zlib-inflating whole
+     commit objects (i.e., packv4 or something like the "metapacks" I
+     proposed a few years ago)."
+
+[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+    A patch to remove the ahead-behind calculation from 'status'.
index 46a215d..f89f2db 100644 (file)
--- a/Makefile
+++ b/Makefile
@@ -816,6 +816,7 @@ LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
@@ -995,6 +996,7 @@ BUILTIN_OBJS += builtin/clone.o
 BUILTIN_OBJS += builtin/column.o
 BUILTIN_OBJS += builtin/commit-tree.o
 BUILTIN_OBJS += builtin/commit.o
+BUILTIN_OBJS += builtin/commit-graph.o
 BUILTIN_OBJS += builtin/config.o
 BUILTIN_OBJS += builtin/count-objects.o
 BUILTIN_OBJS += builtin/credential.o
diff --git a/alloc.c b/alloc.c
index 12afadf..cf4f8b6 100644 (file)
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
        struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
        c->object.type = OBJ_COMMIT;
        c->index = alloc_commit_index();
+       c->graph_pos = COMMIT_NOT_FROM_GRAPH;
        return c;
 }
 
index 3f3fdfc..4e0f647 100644 (file)
--- a/builtin.h
+++ b/builtin.h
@@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const char *prefix);
 extern int cmd_clean(int argc, const char **argv, const char *prefix);
 extern int cmd_column(int argc, const char **argv, const char *prefix);
 extern int cmd_commit(int argc, const char **argv, const char *prefix);
+extern int cmd_commit_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_config(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
new file mode 100644 (file)
index 0000000..37420ae
--- /dev/null
@@ -0,0 +1,171 @@
+#include "builtin.h"
+#include "config.h"
+#include "dir.h"
+#include "lockfile.h"
+#include "parse-options.h"
+#include "commit-graph.h"
+
+static char const * const builtin_commit_graph_usage[] = {
+       N_("git commit-graph [--object-dir <objdir>]"),
+       N_("git commit-graph read [--object-dir <objdir>]"),
+       N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+       NULL
+};
+
+static const char * const builtin_commit_graph_read_usage[] = {
+       N_("git commit-graph read [--object-dir <objdir>]"),
+       NULL
+};
+
+static const char * const builtin_commit_graph_write_usage[] = {
+       N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+       NULL
+};
+
+static struct opts_commit_graph {
+       const char *obj_dir;
+       int stdin_packs;
+       int stdin_commits;
+       int append;
+} opts;
+
+static int graph_read(int argc, const char **argv)
+{
+       struct commit_graph *graph = NULL;
+       char *graph_name;
+
+       static struct option builtin_commit_graph_read_options[] = {
+               OPT_STRING(0, "object-dir", &opts.obj_dir,
+                       N_("dir"),
+                       N_("The object directory to store the graph")),
+               OPT_END(),
+       };
+
+       argc = parse_options(argc, argv, NULL,
+                            builtin_commit_graph_read_options,
+                            builtin_commit_graph_read_usage, 0);
+
+       if (!opts.obj_dir)
+               opts.obj_dir = get_object_directory();
+
+       graph_name = get_commit_graph_filename(opts.obj_dir);
+       graph = load_commit_graph_one(graph_name);
+
+       if (!graph)
+               die("graph file %s does not exist", graph_name);
+       FREE_AND_NULL(graph_name);
+
+       printf("header: %08x %d %d %d %d\n",
+               ntohl(*(uint32_t*)graph->data),
+               *(unsigned char*)(graph->data + 4),
+               *(unsigned char*)(graph->data + 5),
+               *(unsigned char*)(graph->data + 6),
+               *(unsigned char*)(graph->data + 7));
+       printf("num_commits: %u\n", graph->num_commits);
+       printf("chunks:");
+
+       if (graph->chunk_oid_fanout)
+               printf(" oid_fanout");
+       if (graph->chunk_oid_lookup)
+               printf(" oid_lookup");
+       if (graph->chunk_commit_data)
+               printf(" commit_metadata");
+       if (graph->chunk_large_edges)
+               printf(" large_edges");
+       printf("\n");
+
+       return 0;
+}
+
+static int graph_write(int argc, const char **argv)
+{
+       const char **pack_indexes = NULL;
+       int packs_nr = 0;
+       const char **commit_hex = NULL;
+       int commits_nr = 0;
+       const char **lines = NULL;
+       int lines_nr = 0;
+       int lines_alloc = 0;
+
+       static struct option builtin_commit_graph_write_options[] = {
+               OPT_STRING(0, "object-dir", &opts.obj_dir,
+                       N_("dir"),
+                       N_("The object directory to store the graph")),
+               OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
+                       N_("scan pack-indexes listed by stdin for commits")),
+               OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
+                       N_("start walk at commits listed by stdin")),
+               OPT_BOOL(0, "append", &opts.append,
+                       N_("include all commits already in the commit-graph file")),
+               OPT_END(),
+       };
+
+       argc = parse_options(argc, argv, NULL,
+                            builtin_commit_graph_write_options,
+                            builtin_commit_graph_write_usage, 0);
+
+       if (opts.stdin_packs && opts.stdin_commits)
+               die(_("cannot use both --stdin-commits and --stdin-packs"));
+       if (!opts.obj_dir)
+               opts.obj_dir = get_object_directory();
+
+       if (opts.stdin_packs || opts.stdin_commits) {
+               struct strbuf buf = STRBUF_INIT;
+               lines_nr = 0;
+               lines_alloc = 128;
+               ALLOC_ARRAY(lines, lines_alloc);
+
+               while (strbuf_getline(&buf, stdin) != EOF) {
+                       ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
+                       lines[lines_nr++] = strbuf_detach(&buf, NULL);
+               }
+
+               if (opts.stdin_packs) {
+                       pack_indexes = lines;
+                       packs_nr = lines_nr;
+               }
+               if (opts.stdin_commits) {
+                       commit_hex = lines;
+                       commits_nr = lines_nr;
+               }
+       }
+
+       write_commit_graph(opts.obj_dir,
+                          pack_indexes,
+                          packs_nr,
+                          commit_hex,
+                          commits_nr,
+                          opts.append);
+
+       return 0;
+}
+
+int cmd_commit_graph(int argc, const char **argv, const char *prefix)
+{
+       static struct option builtin_commit_graph_options[] = {
+               OPT_STRING(0, "object-dir", &opts.obj_dir,
+                       N_("dir"),
+                       N_("The object directory to store the graph")),
+               OPT_END(),
+       };
+
+       if (argc == 2 && !strcmp(argv[1], "-h"))
+               usage_with_options(builtin_commit_graph_usage,
+                                  builtin_commit_graph_options);
+
+       git_config(git_default_config, NULL);
+       argc = parse_options(argc, argv, prefix,
+                            builtin_commit_graph_options,
+                            builtin_commit_graph_usage,
+                            PARSE_OPT_STOP_AT_NON_OPTION);
+
+       if (argc > 0) {
+               if (!strcmp(argv[0], "read"))
+                       return graph_read(argc, argv);
+               if (!strcmp(argv[0], "write"))
+                       return graph_write(argc, argv);
+       }
+
+       usage_with_options(builtin_commit_graph_usage,
+                          builtin_commit_graph_options);
+}
index 78e66b9..a2cd29d 100644 (file)
@@ -1271,7 +1271,7 @@ static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned cha
                            nr_objects - nr_objects_initial);
                stop_progress_msg(&progress, msg.buf);
                strbuf_release(&msg);
-               hashclose(f, tail_hash, 0);
+               finalize_hashfile(f, tail_hash, 0);
                hashcpy(read_hash, pack_hash);
                fixup_pack_header_footer(output_fd, pack_hash,
                                         curr_pack, nr_objects,
index 4bdae5a..24b1c6c 100644 (file)
@@ -837,11 +837,11 @@ static void write_pack_file(void)
                 * If so, rewrite it like in fast-import
                 */
                if (pack_to_stdout) {
-                       hashclose(f, oid.hash, CSUM_CLOSE);
+                       finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
                } else if (nr_written == nr_remaining) {
-                       hashclose(f, oid.hash, CSUM_FSYNC);
+                       finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
                } else {
-                       int fd = hashclose(f, oid.hash, 0);
+                       int fd = finalize_hashfile(f, oid.hash, 0);
                        fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
                                                 nr_written, oid.hash, offset);
                        close(fd);
index de1f404..c0bc8de 100644 (file)
@@ -36,9 +36,9 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
                unlink(state->pack_tmp_name);
                goto clear_exit;
        } else if (state->nr_written == 1) {
-               hashclose(state->f, oid.hash, CSUM_FSYNC);
+               finalize_hashfile(state->f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
        } else {
-               int fd = hashclose(state->f, oid.hash, 0);
+               int fd = finalize_hashfile(state->f, oid.hash, 0);
                fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
                                         state->nr_written, oid.hash,
                                         state->offset);
diff --git a/cache.h b/cache.h
index 6f14cfb..601a49c 100644 (file)
--- a/cache.h
+++ b/cache.h
@@ -806,6 +806,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_commit_graph;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
index a1fad28..835c589 100644 (file)
@@ -34,6 +34,7 @@ git-clean                               mainporcelain
 git-clone                               mainporcelain           init
 git-column                              purehelpers
 git-commit                              mainporcelain           history
+git-commit-graph                        plumbingmanipulators
 git-commit-tree                         plumbingmanipulators
 git-config                              ancillarymanipulators
 git-count-objects                       ancillaryinterrogators
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644 (file)
index 0000000..3fc1e0d
--- /dev/null
@@ -0,0 +1,741 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "revision.h"
+#include "sha1-lookup.h"
+#include "commit-graph.h"
+#include "object-store.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
+#define GRAPH_PARENT_MISSING 0x7fffffff
+#define GRAPH_EDGE_LAST_MASK 0x7fffffff
+#define GRAPH_PARENT_NONE 0x70000000
+
+#define GRAPH_LAST_EDGE 0x80000000
+
+#define GRAPH_FANOUT_SIZE (4 * 256)
+#define GRAPH_CHUNKLOOKUP_WIDTH 12
+#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
+                       GRAPH_OID_LEN + 8)
+
+char *get_commit_graph_filename(const char *obj_dir)
+{
+       return xstrfmt("%s/info/commit-graph", obj_dir);
+}
+
+static struct commit_graph *alloc_commit_graph(void)
+{
+       struct commit_graph *g = xcalloc(1, sizeof(*g));
+       g->graph_fd = -1;
+
+       return g;
+}
+
+struct commit_graph *load_commit_graph_one(const char *graph_file)
+{
+       void *graph_map;
+       const unsigned char *data, *chunk_lookup;
+       size_t graph_size;
+       struct stat st;
+       uint32_t i;
+       struct commit_graph *graph;
+       int fd = git_open(graph_file);
+       uint64_t last_chunk_offset;
+       uint32_t last_chunk_id;
+       uint32_t graph_signature;
+       unsigned char graph_version, hash_version;
+
+       if (fd < 0)
+               return NULL;
+       if (fstat(fd, &st)) {
+               close(fd);
+               return NULL;
+       }
+       graph_size = xsize_t(st.st_size);
+
+       if (graph_size < GRAPH_MIN_SIZE) {
+               close(fd);
+               die("graph file %s is too small", graph_file);
+       }
+       graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+       data = (const unsigned char *)graph_map;
+
+       graph_signature = get_be32(data);
+       if (graph_signature != GRAPH_SIGNATURE) {
+               error("graph signature %X does not match signature %X",
+                     graph_signature, GRAPH_SIGNATURE);
+               goto cleanup_fail;
+       }
+
+       graph_version = *(unsigned char*)(data + 4);
+       if (graph_version != GRAPH_VERSION) {
+               error("graph version %X does not match version %X",
+                     graph_version, GRAPH_VERSION);
+               goto cleanup_fail;
+       }
+
+       hash_version = *(unsigned char*)(data + 5);
+       if (hash_version != GRAPH_OID_VERSION) {
+               error("hash version %X does not match version %X",
+                     hash_version, GRAPH_OID_VERSION);
+               goto cleanup_fail;
+       }
+
+       graph = alloc_commit_graph();
+
+       graph->hash_len = GRAPH_OID_LEN;
+       graph->num_chunks = *(unsigned char*)(data + 6);
+       graph->graph_fd = fd;
+       graph->data = graph_map;
+       graph->data_len = graph_size;
+
+       last_chunk_id = 0;
+       last_chunk_offset = 8;
+       chunk_lookup = data + 8;
+       for (i = 0; i < graph->num_chunks; i++) {
+               uint32_t chunk_id = get_be32(chunk_lookup + 0);
+               uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+               int chunk_repeated = 0;
+
+               chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
+
+               if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+                       error("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32),
+                             (uint32_t)chunk_offset);
+                       goto cleanup_fail;
+               }
+
+               switch (chunk_id) {
+               case GRAPH_CHUNKID_OIDFANOUT:
+                       if (graph->chunk_oid_fanout)
+                               chunk_repeated = 1;
+                       else
+                               graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
+                       break;
+
+               case GRAPH_CHUNKID_OIDLOOKUP:
+                       if (graph->chunk_oid_lookup)
+                               chunk_repeated = 1;
+                       else
+                               graph->chunk_oid_lookup = data + chunk_offset;
+                       break;
+
+               case GRAPH_CHUNKID_DATA:
+                       if (graph->chunk_commit_data)
+                               chunk_repeated = 1;
+                       else
+                               graph->chunk_commit_data = data + chunk_offset;
+                       break;
+
+               case GRAPH_CHUNKID_LARGEEDGES:
+                       if (graph->chunk_large_edges)
+                               chunk_repeated = 1;
+                       else
+                               graph->chunk_large_edges = data + chunk_offset;
+                       break;
+               }
+
+               if (chunk_repeated) {
+                       error("chunk id %08x appears multiple times", chunk_id);
+                       goto cleanup_fail;
+               }
+
+               if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
+               {
+                       graph->num_commits = (chunk_offset - last_chunk_offset)
+                                            / graph->hash_len;
+               }
+
+               last_chunk_id = chunk_id;
+               last_chunk_offset = chunk_offset;
+       }
+
+       return graph;
+
+cleanup_fail:
+       munmap(graph_map, graph_size);
+       close(fd);
+       exit(1);
+}
+
+/* global storage */
+static struct commit_graph *commit_graph = NULL;
+
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+       char *graph_name;
+
+       if (commit_graph)
+               return;
+
+       graph_name = get_commit_graph_filename(obj_dir);
+       commit_graph = load_commit_graph_one(graph_name);
+
+       FREE_AND_NULL(graph_name);
+}
+
+static int prepare_commit_graph_run_once = 0;
+static void prepare_commit_graph(void)
+{
+       struct alternate_object_database *alt;
+       char *obj_dir;
+
+       if (prepare_commit_graph_run_once)
+               return;
+       prepare_commit_graph_run_once = 1;
+
+       obj_dir = get_object_directory();
+       prepare_commit_graph_one(obj_dir);
+       prepare_alt_odb(the_repository);
+       for (alt = the_repository->objects->alt_odb_list;
+            !commit_graph && alt;
+            alt = alt->next)
+               prepare_commit_graph_one(alt->path);
+}
+
+static void close_commit_graph(void)
+{
+       if (!commit_graph)
+               return;
+
+       if (commit_graph->graph_fd >= 0) {
+               munmap((void *)commit_graph->data, commit_graph->data_len);
+               commit_graph->data = NULL;
+               close(commit_graph->graph_fd);
+       }
+
+       FREE_AND_NULL(commit_graph);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
+{
+       return bsearch_hash(oid->hash, g->chunk_oid_fanout,
+                           g->chunk_oid_lookup, g->hash_len, pos);
+}
+
+static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+                                                uint64_t pos,
+                                                struct commit_list **pptr)
+{
+       struct commit *c;
+       struct object_id oid;
+       hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+       c = lookup_commit(&oid);
+       if (!c)
+               die("could not find commit %s", oid_to_hex(&oid));
+       c->graph_pos = pos;
+       return &commit_list_insert(c, pptr)->next;
+}
+
+static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
+{
+       struct object_id oid;
+       uint32_t edge_value;
+       uint32_t *parent_data_ptr;
+       uint64_t date_low, date_high;
+       struct commit_list **pptr;
+       const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
+
+       item->object.parsed = 1;
+       item->graph_pos = pos;
+
+       hashcpy(oid.hash, commit_data);
+       item->tree = lookup_tree(&oid);
+
+       date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
+       date_low = get_be32(commit_data + g->hash_len + 12);
+       item->date = (timestamp_t)((date_high << 32) | date_low);
+
+       pptr = &item->parents;
+
+       edge_value = get_be32(commit_data + g->hash_len);
+       if (edge_value == GRAPH_PARENT_NONE)
+               return 1;
+       pptr = insert_parent_or_die(g, edge_value, pptr);
+
+       edge_value = get_be32(commit_data + g->hash_len + 4);
+       if (edge_value == GRAPH_PARENT_NONE)
+               return 1;
+       if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {
+               pptr = insert_parent_or_die(g, edge_value, pptr);
+               return 1;
+       }
+
+       parent_data_ptr = (uint32_t*)(g->chunk_large_edges +
+                         4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
+       do {
+               edge_value = get_be32(parent_data_ptr);
+               pptr = insert_parent_or_die(g,
+                                           edge_value & GRAPH_EDGE_LAST_MASK,
+                                           pptr);
+               parent_data_ptr++;
+       } while (!(edge_value & GRAPH_LAST_EDGE));
+
+       return 1;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+       if (!core_commit_graph)
+               return 0;
+       if (item->object.parsed)
+               return 1;
+
+       prepare_commit_graph();
+       if (commit_graph) {
+               uint32_t pos;
+               int found;
+               if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
+                       pos = item->graph_pos;
+                       found = 1;
+               } else {
+                       found = bsearch_graph(commit_graph, &(item->object.oid), &pos);
+               }
+
+               if (found)
+                       return fill_commit_in_graph(item, commit_graph, pos);
+       }
+
+       return 0;
+}
+
+static void write_graph_chunk_fanout(struct hashfile *f,
+                                    struct commit **commits,
+                                    int nr_commits)
+{
+       int i, count = 0;
+       struct commit **list = commits;
+
+       /*
+        * Write the first-level table (the list is sorted,
+        * but we use a 256-entry lookup to be able to avoid
+        * having to do eight extra binary search iterations).
+        */
+       for (i = 0; i < 256; i++) {
+               while (count < nr_commits) {
+                       if ((*list)->object.oid.hash[0] != i)
+                               break;
+                       count++;
+                       list++;
+               }
+
+               hashwrite_be32(f, count);
+       }
+}
+
+static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
+                                  struct commit **commits, int nr_commits)
+{
+       struct commit **list = commits;
+       int count;
+       for (count = 0; count < nr_commits; count++, list++)
+               hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static const unsigned char *commit_to_sha1(size_t index, void *table)
+{
+       struct commit **commits = table;
+       return commits[index]->object.oid.hash;
+}
+
+static void write_graph_chunk_data(struct hashfile *f, int hash_len,
+                                  struct commit **commits, int nr_commits)
+{
+       struct commit **list = commits;
+       struct commit **last = commits + nr_commits;
+       uint32_t num_extra_edges = 0;
+
+       while (list < last) {
+               struct commit_list *parent;
+               int edge_value;
+               uint32_t packedDate[2];
+
+               parse_commit(*list);
+               hashwrite(f, (*list)->tree->object.oid.hash, hash_len);
+
+               parent = (*list)->parents;
+
+               if (!parent)
+                       edge_value = GRAPH_PARENT_NONE;
+               else {
+                       edge_value = sha1_pos(parent->item->object.oid.hash,
+                                             commits,
+                                             nr_commits,
+                                             commit_to_sha1);
+
+                       if (edge_value < 0)
+                               edge_value = GRAPH_PARENT_MISSING;
+               }
+
+               hashwrite_be32(f, edge_value);
+
+               if (parent)
+                       parent = parent->next;
+
+               if (!parent)
+                       edge_value = GRAPH_PARENT_NONE;
+               else if (parent->next)
+                       edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+               else {
+                       edge_value = sha1_pos(parent->item->object.oid.hash,
+                                             commits,
+                                             nr_commits,
+                                             commit_to_sha1);
+                       if (edge_value < 0)
+                               edge_value = GRAPH_PARENT_MISSING;
+               }
+
+               hashwrite_be32(f, edge_value);
+
+               if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
+                       do {
+                               num_extra_edges++;
+                               parent = parent->next;
+                       } while (parent);
+               }
+
+               if (sizeof((*list)->date) > 4)
+                       packedDate[0] = htonl(((*list)->date >> 32) & 0x3);
+               else
+                       packedDate[0] = 0;
+
+               packedDate[1] = htonl((*list)->date);
+               hashwrite(f, packedDate, 8);
+
+               list++;
+       }
+}
+
+static void write_graph_chunk_large_edges(struct hashfile *f,
+                                         struct commit **commits,
+                                         int nr_commits)
+{
+       struct commit **list = commits;
+       struct commit **last = commits + nr_commits;
+       struct commit_list *parent;
+
+       while (list < last) {
+               int num_parents = 0;
+               for (parent = (*list)->parents; num_parents < 3 && parent;
+                    parent = parent->next)
+                       num_parents++;
+
+               if (num_parents <= 2) {
+                       list++;
+                       continue;
+               }
+
+               /* Since num_parents > 2, this initializer is safe. */
+               for (parent = (*list)->parents->next; parent; parent = parent->next) {
+                       int edge_value = sha1_pos(parent->item->object.oid.hash,
+                                                 commits,
+                                                 nr_commits,
+                                                 commit_to_sha1);
+
+                       if (edge_value < 0)
+                               edge_value = GRAPH_PARENT_MISSING;
+                       else if (!parent->next)
+                               edge_value |= GRAPH_LAST_EDGE;
+
+                       hashwrite_be32(f, edge_value);
+               }
+
+               list++;
+       }
+}
+
+static int commit_compare(const void *_a, const void *_b)
+{
+       const struct object_id *a = (const struct object_id *)_a;
+       const struct object_id *b = (const struct object_id *)_b;
+       return oidcmp(a, b);
+}
+
+struct packed_commit_list {
+       struct commit **list;
+       int nr;
+       int alloc;
+};
+
+struct packed_oid_list {
+       struct object_id *list;
+       int nr;
+       int alloc;
+};
+
+static int add_packed_commits(const struct object_id *oid,
+                             struct packed_git *pack,
+                             uint32_t pos,
+                             void *data)
+{
+       struct packed_oid_list *list = (struct packed_oid_list*)data;
+       enum object_type type;
+       off_t offset = nth_packed_object_offset(pack, pos);
+       struct object_info oi = OBJECT_INFO_INIT;
+
+       oi.typep = &type;
+       if (packed_object_info(pack, offset, &oi) < 0)
+               die("unable to get type of object %s", oid_to_hex(oid));
+
+       if (type != OBJ_COMMIT)
+               return 0;
+
+       ALLOC_GROW(list->list, list->nr + 1, list->alloc);
+       oidcpy(&(list->list[list->nr]), oid);
+       list->nr++;
+
+       return 0;
+}
+
+static void add_missing_parents(struct packed_oid_list *oids, struct commit *commit)
+{
+       struct commit_list *parent;
+       for (parent = commit->parents; parent; parent = parent->next) {
+               if (!(parent->item->object.flags & UNINTERESTING)) {
+                       ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
+                       oidcpy(&oids->list[oids->nr], &(parent->item->object.oid));
+                       oids->nr++;
+                       parent->item->object.flags |= UNINTERESTING;
+               }
+       }
+}
+
+static void close_reachable(struct packed_oid_list *oids)
+{
+       int i;
+       struct commit *commit;
+
+       for (i = 0; i < oids->nr; i++) {
+               commit = lookup_commit(&oids->list[i]);
+               if (commit)
+                       commit->object.flags |= UNINTERESTING;
+       }
+
+       /*
+        * As this loop runs, oids->nr may grow, but not more
+        * than the number of missing commits in the reachable
+        * closure.
+        */
+       for (i = 0; i < oids->nr; i++) {
+               commit = lookup_commit(&oids->list[i]);
+
+               if (commit && !parse_commit(commit))
+                       add_missing_parents(oids, commit);
+       }
+
+       for (i = 0; i < oids->nr; i++) {
+               commit = lookup_commit(&oids->list[i]);
+
+               if (commit)
+                       commit->object.flags &= ~UNINTERESTING;
+       }
+}
+
+void write_commit_graph(const char *obj_dir,
+                       const char **pack_indexes,
+                       int nr_packs,
+                       const char **commit_hex,
+                       int nr_commits,
+                       int append)
+{
+       struct packed_oid_list oids;
+       struct packed_commit_list commits;
+       struct hashfile *f;
+       uint32_t i, count_distinct = 0;
+       char *graph_name;
+       int fd;
+       struct lock_file lk = LOCK_INIT;
+       uint32_t chunk_ids[5];
+       uint64_t chunk_offsets[5];
+       int num_chunks;
+       int num_extra_edges;
+       struct commit_list *parent;
+
+       oids.nr = 0;
+       oids.alloc = approximate_object_count() / 4;
+
+       if (append) {
+               prepare_commit_graph_one(obj_dir);
+               if (commit_graph)
+                       oids.alloc += commit_graph->num_commits;
+       }
+
+       if (oids.alloc < 1024)
+               oids.alloc = 1024;
+       ALLOC_ARRAY(oids.list, oids.alloc);
+
+       if (append && commit_graph) {
+               for (i = 0; i < commit_graph->num_commits; i++) {
+                       const unsigned char *hash = commit_graph->chunk_oid_lookup +
+                               commit_graph->hash_len * i;
+                       hashcpy(oids.list[oids.nr++].hash, hash);
+               }
+       }
+
+       if (pack_indexes) {
+               struct strbuf packname = STRBUF_INIT;
+               int dirlen;
+               strbuf_addf(&packname, "%s/pack/", obj_dir);
+               dirlen = packname.len;
+               for (i = 0; i < nr_packs; i++) {
+                       struct packed_git *p;
+                       strbuf_setlen(&packname, dirlen);
+                       strbuf_addstr(&packname, pack_indexes[i]);
+                       p = add_packed_git(packname.buf, packname.len, 1);
+                       if (!p)
+                               die("error adding pack %s", packname.buf);
+                       if (open_pack_index(p))
+                               die("error opening index for %s", packname.buf);
+                       for_each_object_in_pack(p, add_packed_commits, &oids);
+                       close_pack(p);
+               }
+               strbuf_release(&packname);
+       }
+
+       if (commit_hex) {
+               for (i = 0; i < nr_commits; i++) {
+                       const char *end;
+                       struct object_id oid;
+                       struct commit *result;
+
+                       if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end))
+                               continue;
+
+                       result = lookup_commit_reference_gently(&oid, 1);
+
+                       if (result) {
+                               ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);
+                               oidcpy(&oids.list[oids.nr], &(result->object.oid));
+                               oids.nr++;
+                       }
+               }
+       }
+
+       if (!pack_indexes && !commit_hex)
+               for_each_packed_object(add_packed_commits, &oids, 0);
+
+       close_reachable(&oids);
+
+       QSORT(oids.list, oids.nr, commit_compare);
+
+       count_distinct = 1;
+       for (i = 1; i < oids.nr; i++) {
+               if (oidcmp(&oids.list[i-1], &oids.list[i]))
+                       count_distinct++;
+       }
+
+       if (count_distinct >= GRAPH_PARENT_MISSING)
+               die(_("the commit graph format cannot write %d commits"), count_distinct);
+
+       commits.nr = 0;
+       commits.alloc = count_distinct;
+       ALLOC_ARRAY(commits.list, commits.alloc);
+
+       num_extra_edges = 0;
+       for (i = 0; i < oids.nr; i++) {
+               int num_parents = 0;
+               if (i > 0 && !oidcmp(&oids.list[i-1], &oids.list[i]))
+                       continue;
+
+               commits.list[commits.nr] = lookup_commit(&oids.list[i]);
+               parse_commit(commits.list[commits.nr]);
+
+               for (parent = commits.list[commits.nr]->parents;
+                    parent; parent = parent->next)
+                       num_parents++;
+
+               if (num_parents > 2)
+                       num_extra_edges += num_parents - 1;
+
+               commits.nr++;
+       }
+       num_chunks = num_extra_edges ? 4 : 3;
+
+       if (commits.nr >= GRAPH_PARENT_MISSING)
+               die(_("too many commits to write graph"));
+
+       graph_name = get_commit_graph_filename(obj_dir);
+       fd = hold_lock_file_for_update(&lk, graph_name, 0);
+
+       if (fd < 0) {
+               struct strbuf folder = STRBUF_INIT;
+               strbuf_addstr(&folder, graph_name);
+               strbuf_setlen(&folder, strrchr(folder.buf, '/') - folder.buf);
+
+               if (mkdir(folder.buf, 0777) < 0)
+                       die_errno(_("cannot mkdir %s"), folder.buf);
+               strbuf_release(&folder);
+
+               fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
+
+               if (fd < 0)
+                       die_errno("unable to create '%s'", graph_name);
+       }
+
+       f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+
+       hashwrite_be32(f, GRAPH_SIGNATURE);
+
+       hashwrite_u8(f, GRAPH_VERSION);
+       hashwrite_u8(f, GRAPH_OID_VERSION);
+       hashwrite_u8(f, num_chunks);
+       hashwrite_u8(f, 0); /* unused padding byte */
+
+       chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+       chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+       chunk_ids[2] = GRAPH_CHUNKID_DATA;
+       if (num_extra_edges)
+               chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
+       else
+               chunk_ids[3] = 0;
+       chunk_ids[4] = 0;
+
+       chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+       chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
+       chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
+       chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+       chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+
+       for (i = 0; i <= num_chunks; i++) {
+               uint32_t chunk_write[3];
+
+               chunk_write[0] = htonl(chunk_ids[i]);
+               chunk_write[1] = htonl(chunk_offsets[i] >> 32);
+               chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+               hashwrite(f, chunk_write, 12);
+       }
+
+       write_graph_chunk_fanout(f, commits.list, commits.nr);
+       write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
+       write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+       write_graph_chunk_large_edges(f, commits.list, commits.nr);
+
+       close_commit_graph();
+       finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
+       commit_lock_file(&lk);
+
+       free(oids.list);
+       oids.alloc = 0;
+       oids.nr = 0;
+}
diff --git a/commit-graph.h b/commit-graph.h
new file mode 100644 (file)
index 0000000..e1d8580
--- /dev/null
@@ -0,0 +1,46 @@
+#ifndef COMMIT_GRAPH_H
+#define COMMIT_GRAPH_H
+
+#include "git-compat-util.h"
+
+char *get_commit_graph_filename(const char *obj_dir);
+
+/*
+ * Given a commit struct, try to fill the commit struct info, including:
+ *  1. tree object
+ *  2. date
+ *  3. parents.
+ *
+ * Returns 1 if and only if the commit was found in the packed graph.
+ *
+ * See parse_commit_buffer() for the fallback after this call.
+ */
+int parse_commit_in_graph(struct commit *item);
+
+struct commit_graph {
+       int graph_fd;
+
+       const unsigned char *data;
+       size_t data_len;
+
+       unsigned char hash_len;
+       unsigned char num_chunks;
+       uint32_t num_commits;
+       struct object_id oid;
+
+       const uint32_t *chunk_oid_fanout;
+       const unsigned char *chunk_oid_lookup;
+       const unsigned char *chunk_commit_data;
+       const unsigned char *chunk_large_edges;
+};
+
+struct commit_graph *load_commit_graph_one(const char *graph_file);
+
+void write_commit_graph(const char *obj_dir,
+                       const char **pack_indexes,
+                       int nr_packs,
+                       const char **commit_hex,
+                       int nr_commits,
+                       int append);
+
+#endif
index ca474a7..5704911 100644 (file)
--- a/commit.c
+++ b/commit.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "tag.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "pkt-line.h"
 #include "utf8.h"
 #include "diff.h"
@@ -383,6 +384,8 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
                return -1;
        if (item->object.parsed)
                return 0;
+       if (parse_commit_in_graph(item))
+               return 0;
        buffer = read_object_file(&item->object.oid, &type, &size);
        if (!buffer)
                return quiet_on_missing ? -1 :
index 0fb8271..e57ae4b 100644 (file)
--- a/commit.h
+++ b/commit.h
@@ -9,6 +9,8 @@
 #include "string-list.h"
 #include "pretty.h"
 
+#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
+
 struct commit_list {
        struct commit *item;
        struct commit_list *next;
@@ -21,6 +23,7 @@ struct commit {
        timestamp_t date;
        struct commit_list *parents;
        struct tree *tree;
+       uint32_t graph_pos;
 };
 
 extern int save_commit_buffer;
index 5687e2d..40d46b3 100644 (file)
--- a/config.c
+++ b/config.c
@@ -1293,6 +1293,11 @@ static int git_default_core_config(const char *var, const char *value)
                return 0;
        }
 
+       if (!strcmp(var, "core.commitgraph")) {
+               core_commit_graph = git_config_bool(var, value);
+               return 0;
+       }
+
        if (!strcmp(var, "core.sparsecheckout")) {
                core_apply_sparse_checkout = git_config_bool(var, value);
                return 0;
index 01dd9ff..86a13fc 100644 (file)
@@ -875,6 +875,7 @@ __git_list_porcelain_commands ()
                check-ref-format) : plumbing;;
                checkout-index)   : plumbing;;
                column)           : internal helper;;
+               commit-graph)     : plumbing;;
                commit-tree)      : plumbing;;
                count-objects)    : infrequent;;
                credential)       : credentials;;
@@ -2345,6 +2346,7 @@ _git_config ()
                core.bigFileThreshold
                core.checkStat
                core.commentChar
+               core.commitGraph
                core.compression
                core.createObject
                core.deltaBaseCacheLimit
index 5eda7fb..53ce37f 100644 (file)
@@ -53,7 +53,7 @@ void hashflush(struct hashfile *f)
        }
 }
 
-int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
+int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int flags)
 {
        int fd;
 
@@ -61,11 +61,11 @@ int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
        the_hash_algo->final_fn(f->buffer, &f->ctx);
        if (result)
                hashcpy(result, f->buffer);
-       if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
-               /* write checksum and close fd */
+       if (flags & CSUM_HASH_IN_STREAM)
                flush(f, f->buffer, the_hash_algo->rawsz);
-               if (flags & CSUM_FSYNC)
-                       fsync_or_die(f->fd, f->name);
+       if (flags & CSUM_FSYNC)
+               fsync_or_die(f->fd, f->name);
+       if (flags & CSUM_CLOSE) {
                if (close(f->fd))
                        die_errno("%s: sha1 file error on close", f->name);
                fd = 0;
index 992e5c0..c5a2e33 100644 (file)
@@ -26,14 +26,15 @@ struct hashfile_checkpoint {
 extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *);
 extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 
-/* hashclose flags */
-#define CSUM_CLOSE     1
-#define CSUM_FSYNC     2
+/* finalize_hashfile flags */
+#define CSUM_CLOSE             1
+#define CSUM_FSYNC             2
+#define CSUM_HASH_IN_STREAM    4
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
 extern struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp);
-extern int hashclose(struct hashfile *, unsigned char *, unsigned int);
+extern int finalize_hashfile(struct hashfile *, unsigned char *, unsigned int);
 extern void hashwrite(struct hashfile *, const void *, unsigned int);
 extern void hashflush(struct hashfile *f);
 extern void crc32_begin(struct hashfile *);
index fd970b8..80542e2 100644 (file)
@@ -65,6 +65,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
+int core_commit_graph;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
index 05d1079..34edf3f 100644 (file)
@@ -973,7 +973,7 @@ static void end_packfile(void)
                struct tag *t;
 
                close_pack_windows(pack_data);
-               hashclose(pack_file, cur_pack_oid.hash, 0);
+               finalize_hashfile(pack_file, cur_pack_oid.hash, 0);
                fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
                                    pack_data->pack_name, object_count,
                                    cur_pack_oid.hash, pack_size);
diff --git a/git.c b/git.c
index 70798f6..bab6bbf 100644 (file)
--- a/git.c
+++ b/git.c
@@ -392,6 +392,7 @@ static struct cmd_struct commands[] = {
        { "clone", cmd_clone },
        { "column", cmd_column, RUN_SETUP_GENTLY },
        { "commit", cmd_commit, RUN_SETUP | NEED_WORK_TREE },
+       { "commit-graph", cmd_commit_graph, RUN_SETUP },
        { "commit-tree", cmd_commit_tree, RUN_SETUP | NO_PARSEOPT },
        { "config", cmd_config, RUN_SETUP_GENTLY | DELAY_PAGER_CONFIG },
        { "count-objects", cmd_count_objects, RUN_SETUP },
index 41ae27f..9d1bb05 100644 (file)
@@ -534,7 +534,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
        if (options & BITMAP_OPT_HASH_CACHE)
                write_hash_cache(f, index, index_nr);
 
-       hashclose(f, NULL, CSUM_FSYNC);
+       finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 
        if (adjust_shared_perm(tmp_file.buf))
                die_errno("unable to make temporary bitmap file readable");
index d775c74..a9d46bc 100644 (file)
@@ -170,8 +170,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
        }
 
        hashwrite(f, sha1, the_hash_algo->rawsz);
-       hashclose(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
-                           ? CSUM_CLOSE : CSUM_FSYNC));
+       finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE |
+                                   ((opts->flags & WRITE_IDX_VERIFY)
+                                   ? 0 : CSUM_FSYNC));
        return index_name;
 }
 
index 0bc67d0..6c3ddc3 100644 (file)
@@ -304,7 +304,7 @@ void close_pack_index(struct packed_git *p)
        }
 }
 
-static void close_pack(struct packed_git *p)
+void close_pack(struct packed_git *p)
 {
        close_pack_windows(p);
        close_pack_fd(p);
@@ -1869,7 +1869,7 @@ int has_pack_index(const unsigned char *sha1)
        return 1;
 }
 
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
+int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
 {
        uint32_t i;
        int r = 0;
index a92c0b2..9c2f885 100644 (file)
@@ -65,6 +65,7 @@ extern void close_pack_index(struct packed_git *);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 extern void close_pack_windows(struct packed_git *);
+extern void close_pack(struct packed_git *);
 extern void close_all_packs(struct raw_object_store *o);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
@@ -154,6 +155,7 @@ typedef int each_packed_object_fn(const struct object_id *oid,
                                  struct packed_git *pack,
                                  uint32_t pos,
                                  void *data);
+extern int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn, void *data);
 extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags);
 
 /*
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
new file mode 100755 (executable)
index 0000000..a380419
--- /dev/null
@@ -0,0 +1,224 @@
+#!/bin/sh
+
+test_description='commit graph'
+. ./test-lib.sh
+
+test_expect_success 'setup full repo' '
+       mkdir full &&
+       cd "$TRASH_DIRECTORY/full" &&
+       git init &&
+       git config core.commitGraph true &&
+       objdir=".git/objects"
+'
+
+test_expect_success 'write graph with no packs' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git commit-graph write --object-dir . &&
+       test_path_is_file info/commit-graph
+'
+
+test_expect_success 'create commits and repack' '
+       cd "$TRASH_DIRECTORY/full" &&
+       for i in $(test_seq 3)
+       do
+               test_commit $i &&
+               git branch commits/$i
+       done &&
+       git repack
+'
+
+graph_git_two_modes() {
+       git -c core.graph=true $1 >output
+       git -c core.graph=false $1 >expect
+       test_cmp output expect
+}
+
+graph_git_behavior() {
+       MSG=$1
+       DIR=$2
+       BRANCH=$3
+       COMPARE=$4
+       test_expect_success "check normal git operations: $MSG" '
+               cd "$TRASH_DIRECTORY/$DIR" &&
+               graph_git_two_modes "log --oneline $BRANCH" &&
+               graph_git_two_modes "log --topo-order $BRANCH" &&
+               graph_git_two_modes "log --graph $COMPARE..$BRANCH" &&
+               graph_git_two_modes "branch -vv" &&
+               graph_git_two_modes "merge-base -a $BRANCH $COMPARE"
+       '
+}
+
+graph_git_behavior 'no graph' full commits/3 commits/1
+
+graph_read_expect() {
+       OPTIONAL=""
+       NUM_CHUNKS=3
+       if test ! -z $2
+       then
+               OPTIONAL=" $2"
+               NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
+       fi
+       cat >expect <<- EOF
+       header: 43475048 1 1 $NUM_CHUNKS 0
+       num_commits: $1
+       chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
+       EOF
+       git commit-graph read >output &&
+       test_cmp expect output
+}
+
+test_expect_success 'write graph' '
+       cd "$TRASH_DIRECTORY/full" &&
+       graph1=$(git commit-graph write) &&
+       test_path_is_file $objdir/info/commit-graph &&
+       graph_read_expect "3"
+'
+
+graph_git_behavior 'graph exists' full commits/3 commits/1
+
+test_expect_success 'Add more commits' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git reset --hard commits/1 &&
+       for i in $(test_seq 4 5)
+       do
+               test_commit $i &&
+               git branch commits/$i
+       done &&
+       git reset --hard commits/2 &&
+       for i in $(test_seq 6 7)
+       do
+               test_commit $i &&
+               git branch commits/$i
+       done &&
+       git reset --hard commits/2 &&
+       git merge commits/4 &&
+       git branch merge/1 &&
+       git reset --hard commits/4 &&
+       git merge commits/6 &&
+       git branch merge/2 &&
+       git reset --hard commits/3 &&
+       git merge commits/5 commits/7 &&
+       git branch merge/3 &&
+       git repack
+'
+
+# Current graph structure:
+#
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+test_expect_success 'write graph with merges' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git commit-graph write &&
+       test_path_is_file $objdir/info/commit-graph &&
+       graph_read_expect "10" "large_edges"
+'
+
+graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
+graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3
+graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3
+
+test_expect_success 'Add one more commit' '
+       cd "$TRASH_DIRECTORY/full" &&
+       test_commit 8 &&
+       git branch commits/8 &&
+       ls $objdir/pack | grep idx >existing-idx &&
+       git repack &&
+       ls $objdir/pack| grep idx | grep -v --file=existing-idx >new-idx
+'
+
+# Current graph structure:
+#
+#      8
+#      |
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'write graph with new commit' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git commit-graph write &&
+       test_path_is_file $objdir/info/commit-graph &&
+       graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'write graph with nothing new' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git commit-graph write &&
+       test_path_is_file $objdir/info/commit-graph &&
+       graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'build graph from latest pack with closure' '
+       cd "$TRASH_DIRECTORY/full" &&
+       cat new-idx | git commit-graph write --stdin-packs &&
+       test_path_is_file $objdir/info/commit-graph &&
+       graph_read_expect "9" "large_edges"
+'
+
+graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'build graph from commits with closure' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git tag -a -m "merge" tag/merge merge/2 &&
+       git rev-parse tag/merge >commits-in &&
+       git rev-parse merge/1 >>commits-in &&
+       cat commits-in | git commit-graph write --stdin-commits &&
+       test_path_is_file $objdir/info/commit-graph &&
+       graph_read_expect "6"
+'
+
+graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'build graph from commits with append' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git rev-parse merge/3 | git commit-graph write --stdin-commits --append &&
+       test_path_is_file $objdir/info/commit-graph &&
+       graph_read_expect "10" "large_edges"
+'
+
+graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'setup bare repo' '
+       cd "$TRASH_DIRECTORY" &&
+       git clone --bare --no-local full bare &&
+       cd bare &&
+       git config core.commitGraph true &&
+       baredir="./objects"
+'
+
+graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2
+
+test_expect_success 'write graph in bare repo' '
+       cd "$TRASH_DIRECTORY/bare" &&
+       git commit-graph write &&
+       test_path_is_file $baredir/info/commit-graph &&
+       graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2
+
+test_done