Leadership in AI Assisted Engineering – Justin Reock, DX (acq. Atlassian)

GenAI Impact & Adoption: A Deep Dive for Senior Executives

Key Concepts:

GenAI (Generative AI): Artificial intelligence models capable of generating new content (text, code, images, etc.).
DX AI Measurement Framework: A framework for evaluating the impact of GenAI, encompassing Utilization, Impact, and Cost.
Psychological Safety: A belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes. Crucial for team performance.
SDLC (Software Development Life Cycle): The process of planning, creating, testing, and deploying software.
Temperature (in AI models): A parameter controlling the randomness of the model’s output. Lower values = more deterministic, higher values = more creative.
Telemetry Metrics: Data automatically collected from API usage, providing insights into tool adoption but potentially lacking context.
Experience Sampling: Gathering data directly from users about their experiences with AI tools (e.g., through PR forms).
MTTR (Mean Time To Resolution): The average time taken to resolve an incident or issue.

I. Current Impact & Productivity Paradox

The current impact of GenAI is highly variable and difficult to quantify. Initial reports conflict: Google claims a 10% productivity increase, while a METR study found a 19% decrease in productivity among developers using AI code assistance tools. This discrepancy highlights a crucial point: engineers feel more productive, but the data often shows the opposite. This “induced flow” creates a positive perception that does not align with actual output.

Industry averages, based on a large sample size, suggest modest positive trends: a 7.5% increase in documentation quality and a 3.4% increase in code quality with 25% AI adoption. DX’s internal data corroborates these findings, showing a 2.6-6% increase in change confidence and a 1% reduction in change failure rate. However, breaking down these averages per company reveals extreme volatility. Some organizations experience 20% increases in change confidence, while others see 20% decreases. The stakes are high: a 2-percentage-point increase in change failure rate against an industry benchmark of 4% (that is, from 4% to 6%) can translate to shipping 50% more defects.
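The arithmetic behind the "50% more defects" figure can be checked with a short sketch. It assumes the number of changes shipped stays constant, so defects scale with the change failure rate; the function name and the 1,000-change volume are illustrative and not from the talk.

```python
def relative_defect_increase(baseline_cfr: float, new_cfr: float) -> float:
    """Relative growth in failed changes, assuming change volume is unchanged."""
    if baseline_cfr <= 0:
        raise ValueError("baseline_cfr must be positive")
    return (new_cfr - baseline_cfr) / baseline_cfr


baseline_cfr = 0.04   # industry benchmark cited in the talk
new_cfr = 0.06        # benchmark plus 2 percentage points
changes_shipped = 1000  # hypothetical volume, for illustration only

increase = relative_defect_increase(baseline_cfr, new_cfr)
print(f"failed changes: {baseline_cfr * changes_shipped:.0f} -> {new_cfr * changes_shipped:.0f}")
print(f"relative increase: {increase:.0%}")
```

A small absolute shift in the rate is a large relative shift because the baseline is low.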

II. Factors Influencing GenAI Success & Failure

The variability in GenAI impact stems from several key factors:

Top-Down Mandates: Forcing 100% AI adoption without proper support is ineffective. Compliance without genuine integration yields minimal results.
Lack of Education & Enablement: Simply providing the technology without training and guidance leads to underutilization and suboptimal results.
Inadequate Metric Measurement: Organizations struggle to define and track meaningful metrics to assess GenAI’s true impact. Focusing solely on utilization is insufficient.

DORA research emphasizes the importance of clear AI policies and dedicated time for experimentation. A Bayesian posterior distribution analysis highlights these factors as key drivers of positive outcomes.

III. Integrating GenAI Across the SDLC – Beyond Code Completion

The speaker stresses that code writing is often not the primary bottleneck in software development. GenAI’s potential lies in unblocking issues across the entire SDLC. This requires:

Creative Problem Solving for Data Security: Addressing concerns about data exfiltration by leveraging secure infrastructure like Bedrock and Fireworks AI.
Open Communication about Metrics: Transparency regarding why metrics are being collected and how they will be used to improve processes.
Reducing Fear of Job Displacement: Framing GenAI as a tool to augment engineers, not replace them, increasing throughput and overall business value.
Establishing Compliance & Trust: Ensuring responsible and ethical AI implementation.
Tying AI Skills to Employee Success: Recognizing and developing new skill sets required for effective GenAI utilization.

IV. Reducing Fear & Building Psychological Safety

Drawing on Google’s Project Aristotle (2012), the speaker emphasizes the critical role of psychological safety in high-performing teams. Engineers need to feel comfortable experimenting with AI without fear of negative consequences. The speaker points to SWE-bench data, which demonstrates that even advanced AI agents still require human intervention for approximately two-thirds of tasks, reinforcing the augmentation narrative. Transparency about intent (using AI to help developers, not replace them) is crucial.

V. Measuring GenAI Impact: Speed & Quality

Effective measurement requires focusing on two key levers: speed and quality. Specific metrics include:

PR Throughput (Velocity)
Change Failure Rate
Overall Perception of Quality
Change Confidence
Maintainability

The speaker outlines three types of metrics:

Telemetry Metrics: API-derived data (e.g., accept vs. suggest rates). The speaker acknowledges their limitations due to potential inaccuracies.
Experience Sampling: Gathering direct user feedback through surveys and PR forms (e.g., “I used AI to generate this PR”).
Self-Reported Data/Surveys: Well-designed surveys with high participation rates (90%+) that treat developer experience as a systems problem, not a people problem (citing W. Edwards Deming’s principle that 90-95% of productivity is determined by the system).
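As a rough illustration of the first and third metric types, the sketch below computes an accept-vs-suggest rate from telemetry events and checks survey participation against the 90% target. The `SuggestionEvent` record, its fields, and the sample numbers are hypothetical; real tools expose telemetry in their own formats.

```python
from dataclasses import dataclass


@dataclass
class SuggestionEvent:
    developer: str
    accepted: bool  # True if the developer kept the AI suggestion


def acceptance_rate(events: list[SuggestionEvent]) -> float:
    """Fraction of shown suggestions that were accepted (0.0 if none were shown)."""
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)


def participation_rate(responded: int, invited: int) -> float:
    """Share of invited developers who answered the survey."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return responded / invited


# Hypothetical sample data, for illustration only.
events = [
    SuggestionEvent("dev-a", True),
    SuggestionEvent("dev-a", False),
    SuggestionEvent("dev-b", True),
    SuggestionEvent("dev-b", True),
]
print(f"acceptance rate: {acceptance_rate(events):.0%}")
print(f"meets 90% participation target: {participation_rate(46, 50) >= 0.90}")
```

An accepted suggestion may still be edited or deleted later, which is one reason telemetry alone can mislead and is paired with sampling and surveys.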

VI. The DX AI Measurement Framework & Maturity Curve

DX has developed the DX AI Measurement Framework, inspired by DORA and DevEx, which categorizes GenAI impact into three dimensions: Utilization, Impact, and Cost. This framework represents a maturity curve:

Utilization: Tracking adoption rates (e.g., percentage of AI-assisted pull requests).
Impact: Correlating utilization with improvements in velocity and quality.
Cost: Analyzing the financial implications of GenAI implementation (though this area is still developing).
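The step from Utilization to Impact can be sketched as follows: compute the share of AI-assisted pull requests, then correlate utilization with a velocity metric across teams. This is a minimal sketch under stated assumptions, not DX's method; the function names and per-team figures are hypothetical, and a correlation on its own does not establish that AI usage caused the change.

```python
from math import sqrt


def ai_assisted_share(pr_is_ai_assisted: list[bool]) -> float:
    """Utilization: fraction of pull requests flagged as AI-assisted."""
    if not pr_is_ai_assisted:
        return 0.0
    return sum(pr_is_ai_assisted) / len(pr_is_ai_assisted)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical per-team data: AI utilization and weekly PRs per engineer.
utilization = [0.10, 0.25, 0.40, 0.60]
pr_throughput = [3.0, 3.4, 3.1, 4.2]

print(f"utilization of one team: {ai_assisted_share([True, False, True, True]):.0%}")
print(f"utilization vs throughput correlation: {pearson(utilization, pr_throughput):.2f}")
```

The same correlation can be run against quality metrics such as change failure rate, so that speed gains are not read in isolation.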

VII. Practical Applications & Case Studies

Morgan Stanley (DevGen.AI): Automating the creation of specifications for modernizing legacy code (COBOL, mainframe, Perl), saving approximately 300,000 hours annually.
Zapier: Utilizing AI-powered bots to accelerate engineer onboarding, reducing time to effectiveness from 90 days to 2 weeks, and driving increased hiring.
Spotify: Streamlining incident response by automatically gathering and delivering relevant context to SRE channels, significantly reducing MTTR.

VIII. Compliance, Trust & System Prompts

Maintaining compliance and trust requires a robust feedback loop for system prompts (also known as Cursor rules or agent markdown files). A dedicated team should manage and continuously improve these prompts to ensure consistent and reliable AI behavior. Understanding the temperature parameter in AI models is also crucial for controlling the balance between determinism and creativity.
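To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax, the standard way sampling temperature reshapes a model's next-token probabilities. The logits are made-up values, and real model APIs apply this internally rather than exposing it as a function like this one.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At low temperature nearly all probability mass lands on the top candidate, so output is close to deterministic; at higher temperature the distribution flattens and less likely tokens are sampled more often.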

IX. Conclusion & Next Steps

The key takeaway is that GenAI’s success hinges on a holistic approach that prioritizes education, measurement, and integration across the entire SDLC. Organizations should:

Distribute the AI strategy playbook as a reference guide.
Establish a method for measuring GenAI impact.
Track adoption and correlate it with key performance indicators.
Iterate on best practices and use cases.

The speaker concludes by emphasizing that AI is not a threat to engineers’ jobs, but rather an opportunity to enhance their skills and increase the overall throughput of the business.