Support Stack E13: Fin Handles the Easy Stuff… Now QA Gets Harder

As AI takes on more of your support volume, something subtle but important shifts: Your team stops handling simple, repetitive tickets and starts handling only the most complex, high-stakes conversations.

That raises the bar for quality.

In this episode of Support Stack, I’m joined by Thomas Hils, Head of Support at Wingspan, to explore how QA needs to evolve in an AI-first support environment.

Thomas walks through how he built a practical QA system using Claude Cowork and Intercom, designed to handle this new reality. Instead of adding another tool or increasing manual workload, he focused on creating a structured, repeatable approach that combines AI with human judgement.

We cover:

  • Why higher Fin resolution rates make QA more important, not less

  • The gap between QA rubrics written for humans and what AI can actually interpret

  • How to structure context so AI can meaningfully evaluate conversations

  • The role of calibration, and why it remains essential

  • How AI can surface inconsistencies in how we score quality as humans

  • Extending QA beyond teammates to review Fin’s own responses

A key takeaway from this conversation is that AI doesn’t remove the need for QA. It changes where the effort needs to go. Less time on volume, more time on judgement, consistency, and edge cases.

If you’re using Intercom, experimenting with Fin, or thinking about how your support team evolves as automation improves, this episode gives you a clear, practical example of what that shift looks like.

Subscribe for daily CX + Intercom insights

Episode transcript

Conor Pendergrast (00:00)

Hello and welcome to episode 13 of Support Stack.

So Thomas Hils, welcome to Support Stack. How are you doing today?

Thomas Hils (00:14)

Doing great, pretty stoked to be here and to chat through some of this stuff with you.

Conor Pendergrast (00:18)

I am excited. So, you are Head of Support at Wingspan, is that correct?

Thomas Hils (00:24)

That's right, for the past two years now.

Conor Pendergrast (00:27)

Excellent. And so you and I spoke because I saw you in a customer community that we're both part of, called Support Driven, and you were talking all about how you were using Claude, in particular Claude Cowork and Claude Code, to really dig into Intercom and get some pretty sophisticated workflows done that would have been utterly impossible before now. So we're going to do two episodes together, which I'm really pleased about.

And today is the first one. So tell me, what are you solving for today?

Thomas Hils (01:04)

Yeah, I think everyone kind of knows the importance, and the relative pain, of doing QA for your team. And that's been something that's really been on my mind lately as we've been growing and scaling, wanting to make sure that that quality bar is really where we want it to be. But I wanted to do that in a way that felt, honestly, easy for me and my time, as I'm pulled in a lot of places, but didn't require an expensive tool.

Since Klaus, we lost them, rest in peace, to Zendesk. They were one of the best tools out there, in my opinion. I still have my socks from them with the little cats all over them. So I was looking for...

Conor Pendergrast (01:39)

Yes.

I'm confident that if I look around this room, even, I think, 10 or 12 years after I met some of the people from Klaus at a conference in Serbia, I could find some cats somewhere around here.

Thomas Hils (02:00)

They had great swag and just a great demeanor, so much respect to them. But what was great about that tool for me is it was at the right price point for smaller teams, where you're at that size where you do have to do a substantial amount of QA to keep up with everyone, but you can't be dropping 50, 60 bucks per head on your QA tool. And so I had tried a few different ways to solve this problem, mostly as toys.

I was using kind of Lovable-style AI tools to build an app that would pull tickets from Intercom, which is our ticketing system, and help me QA those. And that turned out to be a nightmare. It's everything people joke about AI tooling being, where you can get a slick interface, but once you actually start getting into the specifics of it, it just chokes constantly. And then Claude Cowork came out, and that kind of

pushed me towards where I have landed now.

Conor Pendergrast (03:00)

Great. So this might be a good point to pause and just explain what Claude Cowork is for people who are not deep into the Claude ecosystem. Do you have an easy definition for Cowork? Because I couldn't tell you what the difference is; I just know that I use it a little bit.

Thomas Hils (03:20)

Yeah, I think, you know, Anthropic does a lot of things, right? I don't think they've really been able to clearly articulate the value of Claude Cowork. I know when it first came out, I was just confused, as I couldn't understand why I would use this, but it seemed like they were pouring a lot of resources into it. So that's what drew me to it. And what I'd say is, at a high level, Claude Cowork is Claude Code, but for people who aren't necessarily engineers. And what it can do that

Conor Pendergrast (03:48)

Hmm.

Thomas Hils (03:49)

the Claude chat interface can't do is a few things. First, it exists in a local virtual machine. So it's running its own kind of virtual computer, where, one, that's a safety measure, but, two, it's allowed to do more things on your computer. Which at first is kind of abstract, but you'll see a little bit of what it's done later on to get some context there.

Second, it's multi-agent. So you might ask it a question, and Claude can determine whether to spin up multiple agents to help solve that. So it can handle some bulky, intensive tasks. For example, I was just syncing a bunch of local documentation back to a hosted source. And instead of doing that one by one, as Claude would if I asked it in a chat, it spun up, you know, 10 agents, and each of them took a chunk of these docs and could do that work for you.

So none of that was a pithy description of Claude Cowork, but it is a little bit closer to the difference.
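To make that fan-out pattern concrete for readers, here is a minimal sketch in plain Python of the same idea: splitting a pile of documents across workers instead of handling them one at a time. The `sync_doc` function and file names are hypothetical stand-ins for illustration, not what Cowork actually runs under the hood.

```python
# Minimal sketch of the fan-out pattern described above: rather than
# syncing documents one at a time, hand the list to a pool of workers.
# `sync_doc` is a hypothetical stand-in for the real per-document work.
from concurrent.futures import ThreadPoolExecutor

def sync_doc(path: str) -> str:
    # Placeholder: push one local doc back to the hosted source.
    return f"synced {path}"

docs = [f"docs/page_{i}.md" for i in range(50)]

# Ten workers, each picking up docs as they become free -- roughly the
# "ten agents, each taking a chunk" behaviour described above.
with ThreadPoolExecutor(max_workers=10) as pool:
    for result in pool.map(sync_doc, docs):
        print(result)
```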

Conor Pendergrast (04:51)

Okay, okay, that's great. So you found out about Claude Cowork at the same time, I guess, as you were trying to think of more efficient ways of doing quality assurance on the conversations your team is having with customers. And so maybe it's important to pause for a moment as well and ask why. Like, you're a head of support, you know this, and any other head of support or director of support probably knows it, but I feel like there's a point

that we're getting to now where, for teams who weren't traditionally doing QA, it's probably a good time to start, especially when you're getting good use out of Fin. Because, I don't know how you're finding this, but certainly for the people I'm working with, once you hit that sort of 60% resolution rate, it's mostly very difficult interactions that end up with your teammates. So is...

Thomas Hils (05:46)

Absolutely.

Conor Pendergrast (05:47)

Was that one of the triggers as well for looking at getting better reach for your QA?

Thomas Hils (05:53)

Absolutely. We've hit that kind of 60% mark

for the tickets that we do feed into Fin. And that does leave the most complicated or emotionally charged tickets for the team. And where there was a time that, you know, I would say the mix was more like 60, 70% easy tickets ending up in the individual agents' buckets, now probably 70, 80% of those tickets are quite tricky. And to us that meant,

Conor Pendergrast (06:05)

Hmm.

Thomas Hils (06:24)

especially as a team that's growing, excuse me, a product that's growing and changing quite quickly, we had to be certain we were giving the best possible answers. We couldn't just expect these trickier tickets to be a small minority of the work. So I wanted to make sure that that consistency was there, and at the same time that we're also sharing knowledge as we learn different troubleshooting mechanisms, or different changes to the platform, that would affect

how we deal with these very complex tickets.

Conor Pendergrast (06:53)

Of course, of course. So can you talk us through the first thing you did? The first way you approached this: was it first using Claude Cowork, or did you try an earlier version through Claude?

Thomas Hils (07:06)

You know, I went right to Cowork. Part of what inspired this was I wanted to understand Cowork. I knew there was something to this tool. There was a reason that people wanted it, or that they at least built it, and I wanted to experiment with it. And this was the kind of problem that I had that I thought I could throw at it. One of the first things that stood out to me, of course, is that Claude has a connection directly to Intercom through the Intercom MCP. And some of the biggest struggles

I'd been having with the kind of ticket QA tools I was building were that ability to cleanly pull in tickets and then format them, and the conversation inside of them, correctly. I don't know how many folks have worked with the Intercom API, but conversations are returned in these chunks, and it's a bit of a nightmare to structure them the way that you want. So my first thing was just like, hey Claude, can you grab five tickets for me to QA?

And of course it returned five random tickets from anywhere in our Intercom workspace. And that's when I realized, okay, there's a spark here, but there's a lot of defining and scoping that I needed to provide to get what I actually wanted out of it.
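For readers who haven't wrestled with this: the Intercom API returns a conversation as an initial source message plus a list of conversation parts with HTML bodies. A minimal sketch of flattening that into a readable transcript might look like the following; the token and conversation ID are placeholders, and the field names follow the public REST API, so check the docs for your API version.

```python
# Minimal sketch: pull one Intercom conversation and flatten its parts
# into a plain transcript. INTERCOM_TOKEN and CONV_ID are placeholders.
import os
import re
import requests

TOKEN = os.environ["INTERCOM_TOKEN"]
CONV_ID = "123456789"

resp = requests.get(
    f"https://api.intercom.io/conversations/{CONV_ID}",
    headers={"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"},
)
resp.raise_for_status()
conv = resp.json()

def strip_html(html: str) -> str:
    # Crude tag stripping; fine for a QA transcript, not for rendering.
    return re.sub(r"<[^>]+>", " ", html or "").strip()

# The opening message lives under "source"; replies and notes live in
# the nested "conversation_parts" list.
lines = [f"{conv['source']['author']['type']}: {strip_html(conv['source']['body'])}"]
for part in conv["conversation_parts"]["conversation_parts"]:
    if part.get("body"):  # skip bodiless events like assignments
        lines.append(f"{part['author']['type']}: {strip_html(part['body'])}")

print("\n".join(lines))
```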

Conor Pendergrast (08:11)

Hahaha

Yeah, OK, so this wasn't you just being like, hey, Claude, go and QA five conversations. Give me a score, and then I'll just pass it straight to the person who interacted with the customer.

Thomas Hils (08:35)

Yeah, I had hoped it could be that easy. The next step was feeding it, of course, our internal QA scorecard, which really highlights the difference, I think, between how a human reads that type of content and how a machine reads it. Because even then, just saying, hey, all right, here's our QA scorecard, can you score these tickets? That wasn't giving great results either. It's very interesting.
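A hypothetical illustration of that gap: the same rubric line, written the way a human reads it, then rewritten as the kind of explicit, checkable rule an LLM can apply. The attribute and wording are invented for illustration, not Wingspan's actual scorecard.

```python
# Hypothetical illustration: a rubric line a human fills in from context
# has to become an explicit, checkable rule before an LLM applies it well.
human_rubric = "Leave internal notes on complex conversations."

llm_rubric = {
    "attribute": "internal_notes",
    "applies_when": (
        "The conversation involves troubleshooting across 3+ customer "
        "replies OR an escalation; simple back-and-forth does not count "
        "as complex, regardless of message count."
    ),
    "pass": "An internal note summarises the state and next steps.",
    "fail": "No note exists on a conversation meeting the criteria above.",
    "not_applicable": "Conversation resolved in one or two exchanges.",
}
```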

Conor Pendergrast (08:59)

Of course, yeah, of course. I talk about this all the time: the content you write for an AI agent is very different from what you write for a human teammate. We as people are a lot more forgiving of stranger structures, indirect references, and all of the language that we all use every day. But an LLM is quite literal in a lot of ways.

And I'm so used to saying that on the Fin and content side of it, the help article side of it, the content snippet side of it. But when you say it that way, of course, you have to have a QA structure that makes sense to an LLM. So was it just a process of feeding that in, and then you saw the mistakes that Claude was making?

Thomas Hils (09:53)

Yeah, immediately it was pretty clear there were a few things that were lacking. First is nuance; humans kind of fill in that context. One of the obvious examples,

and this is still actually a thing I haven't entirely squashed, is that a best practice of ours is, in a lengthy conversation about a complicated issue, you should leave internal notes, so that if another person has to pick it up, they know what to do. Claude's like, "okay, where's the note?"

on just about every conversation. It doesn't have a great sense of what complexity is. It didn't understand what the length really meant in this regard. Whereas a human QAing can say, "okay, this isn't that hard of a ticket, even if there is a significant amount of back and forth." Claude had that simple heuristic, and it was applying it quite aggressively. Second to that is just a lack of understanding of the process. We started off

Conor Pendergrast (10:28)

Yeah.

Thomas Hils (10:52)

feeding it

product knowledge. I've been very lucky that our CTO, my boss, has set up a very thorough internal knowledge base written for AI, for use across the company. It references our code base, human-written definitions of each feature and how it's expected to work, and even where in the code to look to understand these different features better. And that is

a foundational change for us; it's really improved everything we do when it comes to AI. But even that wasn't quite enough. You could say, hey, look at this ticket, give it a score based off of the rubric, and here's how you know if they're providing accurate information. I thought that would be a slam dunk and really solve all of my problems, but even that didn't quite get me where I needed to be.

Conor Pendergrast (11:47)

Okay, so why don't we, yeah, why don't we go through it and see what it looks like with an example.

Thomas Hils (11:48)

Alright.

Yeah, so I've got a ticket link here. The skill can either grab tickets for you, or, in this case, we're using one version of the skill

that takes a specific input link. That can be helpful if I find a particular ticket that I want to QA, versus having it grab random ones. And so, yeah, we can run this raw-score test. So it's going to go ahead and score this ticket without the calibration and context that we've added subsequently.

Claude does run a little slow, so it might take a second here. But it's also quite thorough. The second version of this is quite thorough, where it's going to read through a pretty significant amount of product information, a pretty significant amount of calibration work, and then give you a more organized and meaningful review. What has been tricky is just

getting that content to be as bite-sized and understandable as we would like it to be, so that Claude doesn't have to constantly think over eight or nine different docs. But that kind of long-form context structuring is something that I'm learning a lot about to make these tools work,

though it isn't something I knew very much about going in.

Conor Pendergrast (13:09)

Yeah.

Yeah. I mean, it really makes me appreciate the work that Intercom has done to have Fin interact with customer conversations so effectively, especially when you have, you know, hundreds of public articles and maybe dozens or hundreds of snippets. And then you also have all the guidance and procedures and everything else that Fin could possibly interact with, and it just runs through it all in seconds, every time.

Thomas Hils (13:40)

Yeah.

Conor Pendergrast (13:40)

Okay, cool.

So this is summarizing what the conversation is about.

Thomas Hils (13:45)

Yes, so it starts off with a quick summary, for my context. This would typically be a link so that I can open the ticket up myself. And how I have this run is as a scheduled task. So every morning I wake up to five of these, all ready to go, so that kind of lengthy running isn't as noticeable to me in my day-to-day. Exactly.

Conor Pendergrast (14:06)

You don't notice it. Yeah.

Thomas Hils (14:09)

So you can see it scores by the different elements of our rubric. So, for what we've told it is important to us, it gives a note for each of those individual

attributes. And then it puts everything together into an overall coaching note. Now, for our process, we do have one element that's pass/fail, and that's our compliance checks. These are just regulatory requirements where, if those things aren't done, you know, that's kind of a failure here.

Conor Pendergrast (14:22)

Mm-hmm.

Of course.

Thomas Hils (14:38)

So you can see it, you know, we score out of five, where this idea of three is essentially doing what you were expected to do. Four is going a little bit above and beyond. And then five is just these amazing tickets. And what I did find in this early version of the skill is that Claude can just be overly nice. It does try to inflate scores quite a bit. So here we can see this worked out to a 3.5.
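A minimal sketch of what a per-ticket review record and the pass/fail compliance gate described here could look like; the field names and scores are invented for illustration.

```python
# Hypothetical shape of a per-ticket review, with the pass/fail compliance
# check applied as a hard gate on top of the 1-5 attribute scores.
review = {
    "ticket_url": "https://app.intercom.com/...",   # placeholder
    "scores": {"accuracy": 4, "tone": 3, "internal_notes": 3},
    "compliance_pass": True,
    "coaching_note": "Clear fix, but summarise next steps in a note.",
}

def overall(review: dict) -> float | str:
    if not review["compliance_pass"]:
        return "FAIL"  # compliance is a gate, not an input to the average
    scores = review["scores"].values()
    return round(sum(scores) / len(scores), 1)

print(overall(review))  # 3.3
```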

Conor Pendergrast (15:01)

Okay, interesting.

Thomas Hils (15:08)

And now if we scroll up and do this again with the score-ticket skill, where it's going to use that fully calibrated version,

we can see a bit of a difference here.

Conor Pendergrast (15:21)

Got it. Okay, so this is the version that you worked towards. Is this what you're using today, or is it just closer to what you're using today?

Thomas Hils (15:30)

This is a bit closer to what I'm using today.

So I've actually done quite a bit more work on top of this to improve the overall process. One is that some of this troubleshooting and process documentation originally lived in this kind of local folder of mine; now it lives in a support process database on Notion that's being ingested. And our CTO is doing some fancy work to

Conor Pendergrast (15:54)

Okay.

Thomas Hils (16:00)

frame them in a database that AI can access a little bit more quickly. So it doesn't have to read through all of these docs. Instead, it can kind of find the right one a bit more quickly. ⁓ So it's going to run very much a similar process. You'll see that it is ⁓ asking me some things that it typically doesn't. But it's going to see if there's a matching troubleshooting doc after it's identified the issue.

Conor Pendergrast (16:11)

right.

Mm-hmm.

Thomas Hils (16:25)

you'll see that it's ⁓ reading them in parallel. This is kind of some of that element of multiple agents occurring at once. ⁓ And then it runs.
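The "find the right doc" idea can be sketched very simply: a small tagged index that narrows the model's reading to matching troubleshooting docs instead of all of them. The index entries here are invented examples.

```python
# Sketch of the "find the right doc" idea: a tagged index lets the model
# read one matching troubleshooting doc instead of every doc it has.
# Index entries are invented for illustration.
index = [
    {"doc": "docs/payouts_delayed.md", "tags": {"payout", "delay", "bank"}},
    {"doc": "docs/tax_forms.md", "tags": {"1099", "tax", "w9"}},
]

def match_docs(issue_keywords: set[str]) -> list[str]:
    # Return every doc whose tags overlap the keywords pulled from the ticket.
    return [e["doc"] for e in index if e["tags"] & issue_keywords]

print(match_docs({"payout", "missing"}))  # ['docs/payouts_delayed.md']
```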

Conor Pendergrast (16:32)

Yeah, and it's doing more, but it feels faster. And maybe part of that is because the progress is showing what it's doing, so it feels faster that way.

Thomas Hils (16:44)

Yeah, it's interesting to dive into the thinking sometimes. In this case we won't, because it's not anonymized. But it's cool to see how Claude thinks about this. It'll reference things that it's been told before; it'll reason it out. And sometimes I refer to that, actually, when we do calibration, me and the bot together, just to say: you drew this conclusion from that, and this is why that's not quite right.

Conor Pendergrast (17:13)

Yeah. Yeah.

Thomas Hils (17:13)

You'll also see

it mentioning that it's checking the ESC.

So one of the other parts of this process is, if there is a Linear issue linked to the conversation, which is how we handle our escalated tickets, it will pull that Linear issue and check its content as well. That makes sure that, you know, we catch whether we've created an ESC we shouldn't have, or whether there's something we could learn from that Linear issue that could apply to other things.

Now it's thinking out loud about what it knows that the raw score didn't. So, some things that it struggled with in the past: correctly identifying the AI bot that we use in Fin, for one. It wasn't always able to do that well, so we had to work through that to get it to understand those differences, and to understand what notes are and what we mean if a note comes from someone who's not being reviewed in the process.
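For the Linear step, here is a minimal sketch of pulling a linked issue so its content can be reviewed alongside the conversation. Linear exposes a GraphQL API; the API key and issue ID below are placeholders, and this is an assumption about how one might wire it up, not Thomas's actual skill.

```python
# Sketch of fetching a linked Linear issue so the QA pass can check its
# content too. LINEAR_API_KEY and the issue ID are placeholders; consult
# Linear's GraphQL docs for the auth format your key type expects.
import os
import requests

query = """
query Issue($id: String!) {
  issue(id: $id) { title description state { name } }
}
"""

resp = requests.post(
    "https://api.linear.app/graphql",
    headers={"Authorization": os.environ["LINEAR_API_KEY"]},
    json={"query": query, "variables": {"id": "ISSUE_ID"}},
)
resp.raise_for_status()
print(resp.json()["data"]["issue"])
```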

Conor Pendergrast (17:45)

Right.

Thomas Hils (18:09)

And then, of course, it's going to go ahead and reference these calibration docs. So you can see it thinking through that logic and the difference between these scores. So ultimately we're getting

a 3.2 here, which is actually a pretty significant divergence.

Conor Pendergrast (18:23)

Hmm, so it dropped down

from I think it was 3.8 before. Yeah, there it is. So interesting, isn't it?

Thomas Hils (18:30)

Yeah, and so it also talks through that logic here. In this fully calibrated version, it doesn't have to compare the old and the new anymore, but it does give me that context. And from here, it asks, generally, if I want to submit. And that is where we would do further calibration. So I would have this ticket open in another window; I would review it myself,

Conor Pendergrast (18:44)

Of course.

Thomas Hils (18:58)

to some degree, although I do less of that now as this is calibrated ⁓ much more strongly to what we're expecting. And if in this conversation we have a divergence, we have an exceptional ticket, Claude can use that, whether it creates a calibration in a kind of running log of what we've disagreed upon or what I've called out, or it can take kind of like a really, really great ticket where there's like a five.

and use that as context for like what I really mean by exemplary because that's a word that's very hard to define for AI.
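A sketch of what that running calibration log could look like as append-only JSON lines, one record per disagreement or exemplar; the shape and wording are hypothetical.

```python
# Hypothetical shape of the running calibration log: each disagreement
# or exemplar becomes a record that can be re-read on future reviews.
import json

entry = {
    "ticket": "https://app.intercom.com/...",  # placeholder
    "claude_score": 4.0,
    "human_score": 3.0,
    "lesson": (
        "Long threads caused by customer latency are not 'complex'; "
        "do not require an internal note for them."
    ),
    "kind": "disagreement",  # or "exemplar" for a reference five
}

# Append-only: the log only ever grows, so past calibrations are kept.
with open("calibration_log.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")
```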

Conor Pendergrast (19:31)

Mmm.

Of course. Yeah. It doesn't have the years and years of experience of delivering great customer support, and of seeing that whole range of customer support interactions, that you would have, that support leaders would have. Okay. So I'm curious: like you said, you don't as often now dive into the conversations yourself to do that manual review and compare it to how Claude has assessed the conversation in the calibrated version.

What are the signs for you that you should give it another pass, that you should take your time and really dig into it as well?

Thomas Hils (20:12)

Yeah, that's a great question. One of the first indicators for me is, I do read the summary of the ticket, and I'm looking for processes where, from the understanding I've built up, I don't think they're defined as well as they could be. So I know that this is complicated, there are a lot of variations, and while I have gone through this level of calibration and process creation to a pretty significant extent, there are just still edge cases in the product. And I've

Conor Pendergrast (20:27)

Mm-hmm.

Thomas Hils (20:41)

kind of got a mental model now of where I think Claude has a very good understanding of things, versus those it does not, and I'll jump in there. Often I will look at something and see that the score seems pretty high and its reasoning doesn't necessarily land with me. In those situations, I'll often ask it to just explain the score that it gave.

Conor Pendergrast (20:47)

Yeah.

Thomas Hils (21:10)

And what's interesting to me is that often, you know, after you ask it to deep-dive on a score, it will go ahead and suggest an adjustment on its own. Part of me thinks that's a limitation of the context it can have at any given point while it's running through these, and there are probably architecture things that I'd love to do to this to make it more accurate in the future. But even then, when that happens, I ask it to suggest

changes to either our calibration or our process docs to prevent this type of drift in the future.

Conor Pendergrast (21:48)

Yeah, of course. That looping back around is really important. These are not perfect machines,

especially because fundamentally they're driven by us and everything that we've ever done. And, as you pointed out earlier, they do err on the side of being nice in a lot of cases, which is hard to hear, but we do have to make sure that they're giving honest assessments if we're going to use them

and rely on them. So you calibrate against this. By calibrate, what we mean here is you also check the work that Claude is doing and correct against it. Do you involve other people in that as well? Or is it just you and your grand vision and your strong fist of QA?

Thomas Hils (22:39)

That's a great question. And you mentioned biases: what I've created here is a machine that, to some degree, QAs like I do, but that means it's got all of my shortcomings. Yeah, exactly. And there are lots of things that I miss, or maybe didn't think through, in doing these. So what I've started doing is having members of my team run manual QAs and then submit those to me.

Conor Pendergrast (22:50)

Yeah, it's the Thomas QA approach.

Thomas Hils (23:07)

And then I have Claude run through this process, except Claude will score it first. Then I will provide my human scores and this person's scores. And that gives a more diverse set of opinions, and helps me both catch logical shortcomings in my thinking and also identify the drift in understanding of what is defined in a QA rubric.
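A minimal sketch of that three-way comparison: Claude scores blind, then the two human scorecards are laid alongside, and any attribute with a wide spread is flagged for the calibration session. The attribute names and scores are invented.

```python
# Sketch of the three-way calibration: Claude scores first (blind), then
# the two human scorecards are compared attribute by attribute, and wide
# spreads are flagged for discussion. All numbers are invented.
claude = {"accuracy": 4, "tone": 3, "internal_notes": 2}
thomas = {"accuracy": 4, "tone": 4, "internal_notes": 4}
teammate = {"accuracy": 3, "tone": 4, "internal_notes": 4}

for attr in claude:
    scores = (claude[attr], thomas[attr], teammate[attr])
    spread = max(scores) - min(scores)
    if spread >= 2:  # a two-point gap is worth talking through
        print(f"{attr}: {scores} -- discuss in calibration")
```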

Conor Pendergrast (23:19)

Okay.

Thomas Hils (23:37)

It reminds me of that thought exercise people have done, where they ask you to give instructions for making a peanut butter and jelly sandwich, and the person will follow them very literally. It's that, but the same for both humans and AI: both of them will follow the instructions to the best of their knowledge, and it's interesting how they fail differently. One thing that's come up in these three-person, or

Conor Pendergrast (23:46)

Yes.

Thomas Hils (24:06)

two-person, one-robot, calibration sessions is the human desire to say that something is better than average, so giving it that four, but not being able to tie it to something explicit. Whereas when Claude gives a four now, there is something it is explicitly calling out, and it will do that every single time. I think when a human reads a ticket, often we're basing it on a

feeling, even when we're trying to be very neutral and fair. And that was so interesting to see as a call-out in these calibrations: Claude being quite blunt, like, hey, I think you're just giving it a four without having a real reason. And, you know, that was good feedback for us.

Conor Pendergrast (24:52)

Yeah.

Yeah, yeah, that's very tricky to hear from something that you have trained yourself.

Thomas Hils (25:02)

Yeah. Honestly, it's been interesting to think through QA through this lens, programmatically. What does it mean when you're teaching a computer this type of work, versus what it meant when someone taught me to do QA and I taught my team to do QA? It is almost a very different process.

Conor Pendergrast (25:22)

Yeah. So, okay, I'm interested: this is a bit of an aside, but do you apply this to your version of Fin at all? Or is this only for your team at the moment?

Thomas Hils (25:34)

Yeah, I've gone a little above and beyond, maybe, with how I'm applying this. So I actually have a scheduled process now that's based off of this, because why reinvent the wheel? Except when it looks for tickets, it's only looking for Fin conversations. So AI-handled conversations, in particular ones that don't pass off to a human, are kind of my first scope, because humans will flag if there's something not quite right about

Conor Pendergrast (25:53)

Okay.

Thomas Hils (26:04)

Fin's responses. So it'll take a look at those and apply the same kind of criteria as the ticket QA. It'll look at the process documentation, all of our materials, and assess whether or not it thinks Fin has done a good job. And the second part of that process is another skill I've built, where I've collected all this knowledge about how to use Fin. Granted, things change so quickly with them that it's quite difficult to keep up.

Conor Pendergrast (26:31)

Yes. Yeah.

Thomas Hils (26:32)

But it understands the concept of guidance and snippets and your customer-facing docs, and what those things are each best for. And so when it identifies a Fin answer that's not quite great, it creates content that it suggests using to make that answer useful. That part was actually a little more tricky. I exported all of our knowledge content, which you can do by just asking Claude to write a script to download all of it,

Conor Pendergrast (26:42)

Hmm.

Thomas Hils (27:00)

but there's not a programmatic way to get your guidance or your snippets. And if you're like me, you might have a lot of snippets that are poorly organized. So I let Claude run.
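The "export all knowledge content" script might look roughly like this sketch, which pages through Intercom's Articles API and writes each body to a local file for Claude to read. The token is a placeholder, and pagination details can differ by API version, so treat this as an assumption to check against Intercom's docs.

```python
# Sketch of exporting all help-center articles for local use. Articles
# are listable via the API (unlike guidance/snippets, as noted above).
# INTERCOM_TOKEN is a placeholder; verify pagination for your API version.
import os
import requests

TOKEN = os.environ["INTERCOM_TOKEN"]
headers = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"}

page, articles = 1, []
while True:
    resp = requests.get(
        "https://api.intercom.io/articles",
        headers=headers,
        params={"page": page, "per_page": 50},
    )
    resp.raise_for_status()
    body = resp.json()
    articles.extend(body["data"])
    if not body.get("pages", {}).get("next"):
        break  # no further pages
    page += 1

os.makedirs("articles", exist_ok=True)
for a in articles:
    # Write each article body to a local file for Claude to read.
    with open(f"articles/{a['id']}.html", "w") as f:
        f.write(a.get("body") or "")
print(f"exported {len(articles)} articles")
```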

Conor Pendergrast (27:09)

There is no way of organizing content snippets apart from poorly. It's just poorly.

Thomas Hils (27:16)

Yeah, so that was, you know, another problem I've had in the back of my head. So I had Claude run overnight, essentially, because it did take quite some time. Claude can use Chrome, and it can very slowly navigate a webpage. Yes. Yes. And so I hadn't used that before, so I thought this would be a neat opportunity when I realized I couldn't programmatically get this information. So I kind of walked it through the structure of the guidance and snippet

Conor Pendergrast (27:28)

Yeah, was this the Claude Chrome extension?

Thomas Hils (27:45)

pages in Fin, and had it download all of those and create local copies. So it knows what guidance already exists; it's got kind of a little map it's made itself of what things might be impacting it. And when it pulls a ticket, it actually gets all of Fin's reasoning, what resources were used to make a decision, as well. And at the end of this, if it finds a content gap or a guidance gap,

it suggests an addition or an amendment to get better answers out of it. That's only something I've started doing in the last few weeks, but I've already seen some fairly good results from it.

Conor Pendergrast (28:26)

That's a brilliant example. I've played a small bit with the Claude Chrome extension, and it is very slow, I agree, so I'm not surprised it ran overnight. But that is a very clever way of doing it; now I'm thinking, okay, where could I get this working for me? Okay, that was a bit of an aside, but I think it's worth it, because we were talking about QA for teammates, but as we referenced earlier,

if you have a high resolution rate, if you have a high involvement rate, Fin is probably your most significant team member from a conversation volume perspective, and applying the same QA approach, or a modified QA approach, to it is probably really important as well.

Thomas Hils (29:18)

Absolutely. And I wish there were a way that we could give Intercom all of the same context that I'm able to give through Claude, but it just isn't practical with the current tooling. So it's nice to know that we have a way to run through these things with that expanded knowledge.

Conor Pendergrast (29:25)

Yeah.

Yeah, absolutely. OK, so what's next for this setup in Claude Cowork? And if someone wanted to create this themselves, how would you recommend they get started?

Thomas Hils (29:53)

Yeah, yeah, it is a tricky thing. I've learned a lot of the wrong ways to do this, so I can definitely walk a little bit through where I started. First, I

know a few things now. One, I would set that goal and intention out very clearly to Claude, that you're looking to build a system to QA tickets, and give it a lot of context. But prior to even that step, I think there are much more fundamental steps,

which are to make sure that you have context and knowledge in a form that can feed into this decision-making process. I could have made a version of this without the knowledge base that our CTO created, and it certainly would have worked, but having that was super critical. And then from there, what I did was pass over all of our existing human-designed documentation about support processes, which, I should note, was just lacking.

Conor Pendergrast (30:35)

Yeah.

Thomas Hils (30:50)

So don't let your lack of content here be a blocker to this; this is also a method of generating that content. You want to set your Claude up for success. Before even running a single command, the next thing I would suggest is really, really looking at your QA rubric and thinking through that gap of context between what

Claude, an AI, will understand versus what a human will understand from those statements; that is, I think, very, very important. And once you feel good about that structure, that's when I would just dive in and start building with Claude. One misconception I come across with my team a lot is that they feel like they don't know how to build something in Claude. And what I think is most important is that you don't need to know.

It would be helpful to have the knowledge I have now and go back and do it again, but this was a process of talking it out and occasionally asking for help. The other piece I would recommend is: start local. You want to keep this process as simple as possible. And whether your documentation exists somewhere like Notion, which a lot of teams will use, or it's in your Google Drive, I strongly recommend bringing that

Conor Pendergrast (32:02)

Mm-hmm.

Thomas Hils (32:12)

content local, putting it into a folder, and working out of that folder with Claude Cowork. It allows you to choose a specific place to look, and you can start there, in this kind of contained environment, where you have structured all of the content in a way that it understands. And doing that is a matter of saying: hey, these are my goals and my expectations, here's that content, how can you put this together to best deliver on them?

Because Claude's really good at knowing what Claude needs. And I think that's really critical to

Conor Pendergrast (32:44)

Yes.

Thomas Hils (32:46)

ask as you go along.
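A sketch of the kind of contained local working folder being described, with everything Claude needs in one place; the names are illustrative, not a prescribed layout.

```python
# Illustrative scaffold for the "start local" advice: one contained
# folder holding everything the QA process reads and writes. Names are
# invented; adapt to your own content.
from pathlib import Path

layout = [
    "qa/rubric.md",              # the LLM-rewritten scorecard
    "qa/calibration_log.jsonl",  # running record of disagreements
    "docs/processes/",           # support process docs (e.g. from Notion)
    "docs/product/",             # product knowledge exports
    "output/scores/",            # per-ticket JSON reviews land here
]

for item in layout:
    p = Path(item)
    if item.endswith("/"):
        p.mkdir(parents=True, exist_ok=True)
    else:
        p.parent.mkdir(parents=True, exist_ok=True)
        p.touch()
```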

Conor Pendergrast (32:48)

Yeah. And you can just do that in Cowork. By chance, I just did this yesterday in Cowork, where I set up a recurring skill, a scheduled skill, that's what it's called. And what I did was I went into Cowork and said: please do this. I am going to create this as a scheduled skill, so ask me questions that help you build that scheduled skill and make the most out of it.

And the other part of it that I said, which is actually the calibration idea you mentioned as well, was: include in that scheduled skill a question, or a set of questions, to ask me every day to improve the overall process and make it more effective every day. And it doesn't have to be a whole lot more complicated than that. Mine, I did jump straight to asking it to run a process, but I think

this is a lot more detailed than the thing that I was trying to do. And so this approach is, I think, a lot more effective: talk about it with Claude first before you get into the, hey, now go and get the conversations.

Thomas Hils (33:59)

Absolutely. And you can start small. This was a process of many, many weeks for me, to build up a structure and format that fit the needs that we have and my comfort level with a given amount of AI involvement. You might want more human interaction, and this could be totally redesigned so that it walks you through every single score and asks you more questions about it, so that you get the exact sort of process that you want.

And that's just a matter of figuring out what works for you and then asking for that to be built out. What I do strongly suggest is that cycle of self-improvement. So when it QAs a ticket and can't find a matching process, that gets flagged upfront and interrupts the whole QA cycle for me, so that we can make sure we have a clear understanding of that process, versus just diving right in.

Conor Pendergrast (34:53)

Yes, absolutely. Okay. Is there anything else you wanted to share about this, the skill, or Claude Cowork, or anything else like that, Thomas, before we wrap up?

Thomas Hils (35:03)

Yeah, I forgot one key piece. For me, I think reporting is so critical, and what this quite clearly lacks is a clear reporting structure. So what actually happens with these scores at the end is there's a JSON file, and it doesn't matter how familiar you are with JSON files; I didn't choose that, Claude did. And I then asked Claude to build me a dashboard so that it can just...

Conor Pendergrast (35:14)

Good point.

Thomas Hils (35:28)

report on this stuff for me. It generates a PDF that I can send to my agents, so that they get these actually nice-looking versions of this. And at the same time, I can go and look at these different attributes over time. I can slice and dice against the category that was set in Intercom, or even the subcategories, and get a sense of where we struggle as a team, by category, by QA score point, all of that fun stuff, and just have a very historical record.
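A sketch of the reporting side under a hypothetical file layout like the earlier sketches: load the per-ticket JSON reviews and slice mean scores by Intercom category and rubric attribute.

```python
# Sketch of the reporting step: read per-ticket JSON reviews and slice
# mean scores by category and rubric attribute. The file layout and
# field names are the hypothetical ones used in the earlier sketches.
import json
from pathlib import Path
import pandas as pd

rows = []
for path in Path("output/scores").glob("*.json"):
    r = json.loads(path.read_text())
    for attr, score in r["scores"].items():
        rows.append({"category": r.get("category"), "attribute": attr,
                     "score": score, "date": r.get("date")})

df = pd.DataFrame(rows)
# Mean score per (category, attribute): where the team struggles, by topic.
print(df.groupby(["category", "attribute"])["score"].mean().round(2))
```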

Conor Pendergrast (35:33)

Super.

Thomas Hils (35:57)

Happy to show that, if that would be helpful.

Conor Pendergrast (35:59)

All right, so you're showing... this is the QA dashboard.

Thomas Hils (35:59)

So, this...

Yeah, you can see some early thinking

here, where part of me was like, maybe I'll want to submit reviews in this tool. I don't, it turns out. You can see the definition of the rubric here as well. And I added this section more to watch it as it evolved, because during conversations I have asked it explicitly to adjust things or add things to it, and being able to keep track of everything across Claude conversations can be tricky,

Conor Pendergrast (36:11)

Yeah.

Yeah.

Thomas Hils (36:32)

especially because these are kind of intensive processes where you run out of context pretty quickly, so splitting up into new conversations is great. And then you can just see kind of a full database of how we're tracking categories and subcategories, and then even the individual reports that I generate from here. I click this little shareable button, and it goes out into the world and does things. The PDF took

Conor Pendergrast (36:36)

Mm-hmm.

Thomas Hils (36:59)

probably the most time of the whole dashboarding, getting it to produce a not-ugly PDF, but some of that's just

how I am, with snobbery.

Conor Pendergrast (37:09)

That's fascinating. I do appreciate this is well beyond... again, if you're watching this and you haven't got anywhere close to starting with this, let's just consider this months-down-the-road building.

So this is locally hosted, right? It's on your machine? Yeah, yeah. So at least you haven't just deployed it online or something like that.

Thomas Hils (37:25)

Yeah.

Super. It was surprisingly

easy to create this. But yes,

don't be intimidated. This was stuff that I've spent hours working on over the past month and a half, two months now. And you iterate slowly upon it and build out what you need.

Conor Pendergrast (37:42)

Yeah.

I will call this out, actually, as a bit of a risk of Claude Cowork and of Claude Code: you can very easily just keep going with stuff. So before you start, know what the actual end goal is that you want to get to, and then gently resist the "hey, do you want me to do this now?" (Claude doesn't seem to do it as much as I've seen ChatGPT do it). Just

decide where you want to get to, get there, and then give yourself a nice long break. And yeah, you don't need to keep tinkering all the way.

Thomas Hils (38:26)

I think that is such an important point. There's always another thing you can do. And once you get the ball rolling, it is exciting and fun to see what you can do just from your words. You know, the fact that you can produce all of this tooling without writing any code is just like a superpower. And sometimes I'm a little drunk on that superpower.

Conor Pendergrast (38:45)

Yeah.

Yes, me too. Yes. In a lot of ways, it feels like something that I've... I mean, it's definitely something that I've always wanted to do. And I think on the support side, we've had to beg for technical resources and engineering resources for decades. Well, decades in my case. But yeah, now it's just totally magic. Thomas, this was fantastic. You've got another episode where you're going to come back and show us something else later, so

anyone who is watching now and enjoyed this, I think, will enjoy the next episode. But we won't talk too much about that now. So click that lovely subscribe button for now. But Thomas, where can we find you on the internet, if people are interested in what you're doing?

Thomas Hils (39:32)

Yeah, the best place to find me is my personal website, thomas.town. It's got all of my contact information, if you need me. Yeah, I tried to buy thomas.everything for a while.

Conor Pendergrast (39:44)

So it's t-h-o-m-a-s dot t-o-w-n. That's a really good one. Well, visit Thomas Town anytime you want. You can find me and my daily email list all about Intercom and Fin at customer success dot c-x slash daily. As I said, click that subscribe button. If you enjoyed this episode, click the like button, share it somewhere on the internet, feed my ravenous ego.

Thomas Hils (39:48)

That's right, yeah.

Conor Pendergrast (40:10)

and we will be back in the next episode of Support Stack. Thank you and goodbye!

Next: Support Stack Solo Ep 03: How to Build a Fin Procedure in Intercom