repository - tortoise svn

What is the percentage of authorship in the TortoiseSVN statistics?

In the statistics section of TortoiseSVN there is something called the percentage of authorship. What is it? How is it calculated, and how can it be useful?

The percentage of authorship is a metric that aims to quantify each committer's contribution.

In theory, it should be based on line changes, aggregated over the whole history of the file with a decreasing weight. In addition, some kind of heuristic can be applied to reduce the weight of whitespace-only changes, such as indentation fixes. Roughly speaking, this metric should answer the question "which person should I talk to if I want to understand / fix / improve this part of the code".

In practice, here is the actual code:

    void CStatGraphDlg::GatherData()
    {
        // Sanity check
        if ((m_parAuthors==NULL)||(m_parDates==NULL)||(m_parFileChanges==NULL))
            return;
        m_nTotalCommits = m_parAuthors->GetCount();
        m_nTotalFileChanges = 0;

        // Update m_nWeeks and m_minDate
        UpdateWeekCount();

        // Now create a mapping that holds the information per week.
        m_commitsPerUnitAndAuthor.clear();
        m_filechangesPerUnitAndAuthor.clear();
        m_commitsPerAuthor.clear();

        int interval = 0;
        __time64_t d = (__time64_t)m_parDates->GetAt(0);
        int nLastUnit = GetUnit(d);
        double AllContributionAuthor = 0;

        // Now loop over all weeks and gather the info
        for (LONG i=0; i<m_nTotalCommits; ++i)
        {
            // Find the interval number
            __time64_t commitDate = (__time64_t)m_parDates->GetAt(i);
            int u = GetUnit(commitDate);
            if (nLastUnit != u)
                interval++;
            nLastUnit = u;

            // Find the authors name
            CString sAuth = m_parAuthors->GetAt(i);
            if (!m_bAuthorsCaseSensitive)
                sAuth = sAuth.MakeLower();
            tstring author = tstring(sAuth);

            // Increase total commit count for this author
            m_commitsPerAuthor[author]++;

            // Increase the commit count for this author in this week
            m_commitsPerUnitAndAuthor[interval][author]++;
            CTime t = m_parDates->GetAt(i);
            m_unitNames[interval] = GetUnitLabel(nLastUnit, t);

            // Increase the file change count for this author in this week
            int fileChanges = m_parFileChanges->GetAt(i);
            m_filechangesPerUnitAndAuthor[interval][author] += fileChanges;
            m_nTotalFileChanges += fileChanges;

            //calculate Contribution Author
            double contributionAuthor = CoeffContribution((int)m_nTotalCommits - i -1) * fileChanges;
            AllContributionAuthor += contributionAuthor;
            m_PercentageOfAuthorship[author] += contributionAuthor;
        }

        // Find first and last interval number.
        if (!m_commitsPerUnitAndAuthor.empty())
        {
            IntervalDataMap::iterator interval_it = m_commitsPerUnitAndAuthor.begin();
            m_firstInterval = interval_it->first;
            interval_it = m_commitsPerUnitAndAuthor.end();
            --interval_it;
            m_lastInterval = interval_it->first;

            // Sanity check - if m_lastInterval is too large it could freeze TSVN and take up all memory!!!
            assert(m_lastInterval >= 0 && m_lastInterval < 10000);
        }
        else
        {
            m_firstInterval = 0;
            m_lastInterval = -1;
        }

        // Get a list of authors names
        LoadListOfAuthors(m_commitsPerAuthor);

        // Calculate percent of Contribution Authors
        for (std::list<tstring>::iterator it = m_authorNames.begin(); it != m_authorNames.end(); ++it)
        {
            m_PercentageOfAuthorship[*it] = (m_PercentageOfAuthorship[*it] *100)/ AllContributionAuthor;
        }

        // All done, now the statistics pages can retrieve the data and
        // extract the information to be shown.
    }

The metric was inspired by some ideas from Git statistics (I copied them here because I find them interesting and links break easily):

Terminology

There are four types of users: Maintainers, Developers, Bug-fixers, and regular Users. The first three are all Contributors.

Name: Maintainer (Contributor)
Description: The Maintainer reviews commits and branches from other Contributors and decides which ones to integrate into a 'master' branch.

Name: Developer (Contributor)
Description: The Developer contributes enhancements to the project, e.g. they add new content or improve existing content.

Name: Bug-fixer (Contributor)
Description: The Bug-fixer locates 'bugs' (as something unwanted that needs to be corrected) in the content and 'fixes' them.

Name: User
Description: The User uses the content, be it in their daily work or every now and then for a specific purpose.

Use cases

A model where other Contributors review commits is assumed in all use cases. When references are made to a Contributor addressing another Contributor to adjust their behavior as the result of data mined, it should be kept in mind that the Contributor should foremost be the one to do this. Using this information to, say, spend more time checking one's own commits for bugs when working on a specific part of the content of one's own accord is often more effective than doing so only after being asked. </disclaimer>? :P

Name: Finding a Contributor that is active in a specific bit of content.
Description: Whenever a Contributor needs to know about other Contributors that are active in a specific part of the content they query git for this information. This could be used to figure out whom to send a copy of a commit (someone who has recently worked on the content a commit modifies is likely to be interested in such a commit). This information may be easily gathered with, say, git blame. Aggregating its output (in the background if need be to maintain speedy response times), it is trivial to determine whether a Contributor has more commits/lines of change than a predefined amount. The main difference with git blame is that its output is aggregated over the history of the content, for a specific Contributor, whereas git blame only shows the latest changes.

Name: Finding which commits touch the parts of the content that a commit touches.
Description: There are several reasons that one might want to know which commit touches the parts of the content that a commit touches. This may be implemented similar to how git blame works, only instead of 'stopping' after having found the author of a line, the search continues up to a certain date in the past.

Name: Integrating the found 'bug introducing' commit with the git commit message system.
Description: When a Bug-fixer sends out a commit to fix a bug it might be useful for them to find out where exactly the bug was introduced. Using the 'which commit touched the content this commit touches' technique, optional candidates may be retrieved. After picking which of the found commits caused the bug, this information may then automatically be added to the commit's description. This not only allows the Bug-fixer to make clear the origin of their commit, but also makes it possible to later unambiguously determine a bug/fix pair. Note that this is automated: no user input is required to determine which commit caused the bug, only the picking of 'cause' commits requires input from the user.

Name: Finding the Author that introduces a lot of/almost no bugs to the content.
Description: Contributors might be interested to know which of the Developers introduce a lot of bugs, or on the contrary, which introduce almost no bugs to the content. This information is highly relevant to the Maintainer as they may now focus the time they spend on reviewing commits on those that stem from Developers that turn out to often introduce bugs. On the other hand, Developers that usually do not introduce bugs need less reviewing time. While such information is usually known to the experienced Maintainer (as they know their main contributors well), it can be helpful to new maintainers, or as a pointer that the opinion of the Maintainer about a specific Developer needs to be adjusted. Bug-fixers on the other hand can use this information to address the Developer that introduces most of the bugs they fix, perhaps with advice on how to prevent future bugs from being introduced.

Name: Finding the Contributor that accepted a lot of/almost no bugs into the content.
Description: Similar to finding the Authors that write the bugs, there are other Contributors that 'accept' the commit. Either passively, by not commenting when the commit is sent out for review, or actively, by 'acknowledging' (acked-by), 'signing off' (signed-off-by) or 'testing' (tested-by) a commit. When actively doing so, this can later be traced and then be used in the same ways as for Authors.

Name: Finding parts of the content in which a lot of bugs are introduced and fixed
Description: When a Developer decides to change part of the content, it would be interesting for them to know that many before them introduced bugs when working on that part of the content. Knowing this, the Developer might ask for all such buggy commits to try and learn from the mistakes made by others and prevent making the same mistake. A Maintainer might use this information to spend extra time reviewing a commit from a 'bug prone' part of the content.

Name: Finding parts of the content a particular Contributor introduces a lot of/almost no bugs to.
Description: When trying to decide whether to ask a specific Contributor to work on part of the content it might be useful to know not only how actively they work on that part of the content, but also if they introduced a lot of bugs to that part, or perhaps fixed many. Similar to the more general case, this can be split out between modifying content and 'accepting' modifications. This information may be used to decide to ask a Contributor to spend more time on a specific part of the content before sending in a commit for review.

Name: Finding how many bugs were introduced/fixed in a period of time
Description: As bugs are recognized by their fixes, it is always possible to match a bug to its fix. Both commits have a time stamp and with those the time between bug and fix can be calculated. Aggregating this data over all known bug(fixes), the amount of unfixed bugs may be found over a specified period of time. For example, finding the amount of fixed bugs between two releases, or how many bugs were not fixed within one release cycle. This number might then be calculated over several time frames (say, each release), after which it is possible to track 'content quality' throughout releases. If this information is then graphed one can find extremes in this figure (for example, a release cycle in which a lot of bugs were fixed, or one that introduced many). Knowing this, the Contributors may then determine the cause of such and learn from that.

Name: Finding how much work a contributor has done over a period of time.
Description: When working in a team in which everybody is expected to do approximately the same amount of work it is interesting to see how much work each Contributor actually does. This allows the team to discuss any extremes and attempt to handle these so as to distribute the work more evenly. When work is being done by a large group of people it is interesting to know the most active Contributors since these usually are the ones with most knowledge on the content. The other way around, it is possible to determine if a specific Contributor is 'active enough' for a specific task (such as mentoring).

Name: Finding whether a Contributor is mostly a Developer or a Bug-fixer.
Description: To all Contributors it is interesting to know if they spend most of their time fixing bugs, or contributing enhancements to the content. This information could also be queried over a specific time frame, for example 'weekends vs. workdays' or 'holidays vs. non-holidays'.
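
The first use case above mentions aggregating the output of git blame per Contributor. As a rough illustration, here is a small sketch that counts blamed lines per author from text produced by `git blame --line-porcelain <file>`, where the header block in front of every blamed line contains a line starting with "author ". The function name is mine, and the sketch only covers the "latest changes" view that git blame gives; it does not aggregate over history the way the quoted text proposes:

```cpp
#include <map>
#include <sstream>
#include <string>

// Counts blamed lines per author in the text produced by
// `git blame --line-porcelain <file>`.
std::map<std::string, int> CountBlamedLines(const std::string& porcelain) {
    const std::string prefix = "author ";
    std::map<std::string, int> linesPerAuthor;
    std::istringstream in(porcelain);
    std::string line;
    while (std::getline(in, line)) {
        // Headers such as "author-mail" do not match because of the trailing
        // space in the prefix; content lines start with a TAB, so they never match.
        if (line.compare(0, prefix.size(), prefix) == 0) {
            ++linesPerAuthor[line.substr(prefix.size())];
        }
    }
    return linesPerAuthor;
}
```

Comparing each author's count against a threshold then answers the "more lines of change than a predefined amount" question from that use case, at least for the current state of the file.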