
On - FC Validation Session - 2026-01-28

Metadata

  • Date: 2026-01-28
  • Company: On
  • External Participants: Federico Morando (Treasury Intern), Amanda Mitt (Treasury Analyst), Lucia Galan Caceres (Treasury Team Lead)
  • Palm Participants: Emma, Giannis, Art Koci, Simon, Jennifer Pearson
  • Type: Customer Call
  • Domain Areas: Cash Forecasting, Forecast Performance, Variance Analysis, IC Activity

Summary

Context

Second forecast validation session with On's treasury team. Palm data specialists (Simon, Art) presented how the forecasting models work and upcoming improvements. Federico is working on forecast comparison dashboards for a meeting with Christoph at the end of February.

Key Discussion Points

  • How Palm forecasting works: Weighted ensemble of ~20 models per category; highest-probability value selected; weights adjusted based on model performance
  • Coming improvements: Surfacing evaluation metrics to customers, better handling of noisy data, exogenous factors (holidays, salary dates), user interaction to input known patterns
  • Recategorization impact: Forecasts regenerate daily; when categories change, Palm backfills ~2 years of forecast history with new categories (data preserved, not deleted)
  • Two workstreams for Federico: (1) Forecast vs Forecast comparison, (2) Operational cash + investments position chart
  • Category-level accuracy: Team wants to track which categories forecast well vs poorly - currently lack this visibility
  • Intercompany in variance: At global level, IC nets to zero. Entity-level shows misleading variances without IC forecast.
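The weighted-ensemble mechanism described above (many models per category, with the historically best model getting the most say) can be sketched in a few lines. This is an illustrative reconstruction only, not Palm's actual implementation; the function name and the inverse-error weighting scheme are assumptions.

```python
def ensemble_forecast(model_values, model_errors):
    """Combine per-model forecasts for one category into a single value.

    model_values: list of point forecasts, one per model (~20 in practice)
    model_errors: recent absolute error per model (lower = better performer)

    Each model's weight is the inverse of its recent error, normalised to
    sum to 1, so the historically best model dominates the blend -- an
    assumed scheme, standing in for whatever weighting Palm actually uses.
    """
    eps = 1e-9  # avoid division by zero for a (near-)perfect model
    raw = [1.0 / (e + eps) for e in model_errors]
    total = sum(raw)
    weights = [w / total for w in raw]
    return sum(w * v for w, v in zip(weights, model_values))
```

For example, a model with error 1.0 gets 25x the weight of a model with error 25.0, so the blended value lands close to the better model's forecast.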

Pain Points

  • Lack of visibility into category-level forecast accuracy - can't see which categories drive variance
  • Confusion about whether recategorizing transactions "rewrites history" - makes paper trail harder to follow
  • Entity-level variance looks wrong when intercompany isn't forecasted (but global is fine since IC nets to zero)

Feature Requests & Needs

  • Category-level forecast accuracy tracking (not just global balance)
  • When comparing forecast vs actuals, exclude categories that aren't forecasted (like intercompany)
  • Long-term: Forecast intercompany (hard because it must net to zero across counterparties)
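The first two requests above can be illustrated with a minimal sketch, assuming categories are simple name-to-amount dicts. Both helpers (`variance_excluding_unforecasted`, `ic_nets_to_zero`) are hypothetical names for illustration, not existing Palm features.

```python
def variance_excluding_unforecasted(forecast, actuals):
    """Variance of actuals vs forecast, dropping unforecasted categories.

    forecast / actuals: dicts mapping category name -> amount.
    Categories absent from the forecast (e.g. intercompany) are excluded
    from the actuals first, so they cannot distort entity-level variance.
    """
    comparable = {c: a for c, a in actuals.items() if c in forecast}
    return sum(comparable.values()) - sum(forecast.values())


def ic_nets_to_zero(entity_ic_flows, tol=1e-6):
    """Check that intercompany flows across entities net to ~0 globally."""
    return abs(sum(entity_ic_flows.values())) < tol
```

This mirrors the discussion: entity-level variance is computed only over forecasted categories, while the net-zero check explains why the global balance is unaffected by intercompany.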

Jobs & Desired Outcomes

Job: Demonstrate forecast accuracy to non-treasury stakeholders

Desired Outcomes:

  • Minimize confusion when showing variance reports to leadership
  • Increase confidence in forecast data by explaining what's included/excluded
  • Reduce skepticism from stakeholders when entity-level numbers look off

Job: Track forecast improvement over time as categorization improves

Desired Outcomes:

  • Minimize effort to compare forecast accuracy across different time periods
  • Increase visibility into which categories drive accuracy improvements
  • Reduce ambiguity about whether changes helped or hurt forecast quality

Domain Insights

  • Forecast timing: Palm runs forecasts daily (not just weekly), picking up category changes next day
  • History backfill: On major category changes, Palm backfills ~2 years of forecast history
  • Christoph meeting: End of February deadline driving this work; showing global-level only
  • Palm Chat preview: AI agent that queries Palm data, creates visualizations from natural language prompts
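The backfill behaviour noted above (on major category changes, ~2 years of forecast history is re-labelled rather than deleted) could look roughly like this. Field names, the two-year window, and the 365-day year are simplifying assumptions for illustration.

```python
from datetime import date, timedelta


def backfill_categories(history, mapping, today, years=2):
    """Relabel forecast rows within the backfill window.

    history: list of dicts with 'date' and 'category' keys
    mapping: old category name -> new category name
    Rows older than the window keep their original category; nothing is
    deleted, matching the "data preserved, not rewritten away" point.
    """
    cutoff = today - timedelta(days=365 * years)
    out = []
    for row in history:
        row = dict(row)  # copy, so the original history stays intact
        if row["date"] >= cutoff and row["category"] in mapping:
            row["category"] = mapping[row["category"]]
        out.append(row)
    return out
```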

Action Items

  • [ ] Federico + Giannis to continue forecast comparison work
  • [ ] Explore category-level accuracy tracking (may not be ready for Feb 25)
  • [ ] Palm kicking off intercompany improvement project

Notable Quotes

"It's hugely important to understand what drives the forecast. Which categories can we track accurately? Which categories are by definition harder to track and how can we improve them?" - Giannis

"Long term strategy should be to somehow forecast intercompany because otherwise we need to add that manually if we want to use this operationally." - Amanda

"The tricky part is the counterparties and forecasting in pairs." - Emma


Full Transcript

Me: Hi. Just gonna share a little AI.
Them: Do we have tl;dv? I actually don't know how to... Go. I think it's a good idea.
Me: Hello.
Them: To add tl;dv. Yeah, but I need to log in with Chrome because I don't have this extension here. I can bring out the pen and paper, no problem.
Me: Hey, Amanda.
Them: Hey. Sorry. Amanda is super loud. Can you hear us okay? Hey, Federico. Hey, everyone. Hello. I hope you don't mind, but I might record these two sessions just so we can get some good feedback. Yes. We're going to wait for Lucia to join. Have you guys met Simon and Art before? I think it's the first time, for me at least. Nice to meet you. Nice to meet you guys. So I think the purpose... Then you can kick it off. Are we waiting for Lucia for this session, or is she just going to join the next? I think she was going to join one or the other. Okay, I need this one. But I think we can slowly start. And we just got to kick off. Simon and Art, data specialists. They've prepared some slides about how the forecast is prepared and some metrics to measure forecast accuracy. But if you have any questions during the session at all, just jump in. Before we hand over to... No, I think we can kick it off. Over to you. Well, the purpose of the session, like Jenn said, is to work on the, let's say, forecast comparison. We had a couple of talks with Federico already, and I think it's great to touch the topics of what metrics are important to track regarding forecast accuracy, and how to track this, let's say, over time as well. Where we are right now, at least from Palm's side, and what other things are coming soon regarding forecasting. And I think Simon and Art have created a sort of presentation just to cover all these topics. So I'll give the floor to them. Did we share the slides as well, or do you have them? You can show them. All right, let me find them. How do I go into presenter mode in Canva? Top right, present.
Me: Top right.
Them: Fair enough. It will not... Is it properly visible now?
Me: Yes.
Them: Don't mind the date in the left corner because I did a thing. But anyhow, the situation of forecasting right now is quite straightforward. So we're looking at it per category. We make the choice of which value has the highest probability, and then we select that value, sort of. And we have a bunch of models where we apply a stored weight on the different ones. So let's say we have 20 models, but one is really good, so it gets the most say in the decision, right? So it gets a higher weight. And then we choose which models give the best ability to be close to the actual value. This is already evaluation. And then we do this every week, and so on and so forth. Does that make sense? All right. What will it be like soon, though? So we want to have more informed customers, of course. So we will provide you, soonish, with the evaluation metrics, both for data sets and for a training session. I don't know what to call it properly, but a training session, sure. And then you can see, okay, it seems that during this iteration of the models the signal is very low. What does that mean? And then we can have a back and forth, or you can ask Mr. Shetty Kitty. He's also very well familiar with this stuff. So you can get more bearing and understanding on what's going on, as well as some intuition, with having these metrics surfaced. And do keep in mind, all of these metrics are for past periods, because we can't really know the actual values of the future yet. So reality versus training will be different, but still indicative. For example, if our models were really good a month ago and now they're, like, imperfect, let's say, then maybe a flag should be raised, a little warning flag. We will also handle noisy data better, because as you might know, the data is extremely fluctuating on some categories and accounts, and we want to be able to handle that better. And to score them, be able to put into the data that, okay,
this is a Dutch holiday, for example. That means that pay will be one day later. So we can model what's called exogenous factors, and then there are probably a billion different reasons why things are on a specific date, right? But our goal is to model this close to reality, and autonomously as well, and hopefully with, let's say, some user interaction, so we can take that into regard as well. So let's say, you know, the salary is the 25th every month, or the Friday closest to the 25th. Then you can just click that information in somehow, and that will help the models better understand your environment. So still, soon, a little bit of interaction with your data. And Art, you can maybe talk about Palm Chat. Yeah. So Palm Chat is a slightly different project to forecasting. What Palm Chat is, is a bunch of AI agents that sort of read our data, and then you can just ask LLMs the questions you have about forecasting, transactions, and different stuff. I'm just going to show a very simple use case with On's data. Or maybe, actually, Giannis, do you want to showcase what Palm Chat actually is? I think you're the biggest user. This is currently only an internal tool, but we're working hard to make this available for On, so you guys can actually use it as well. Before we head to that, I'm sure that you have questions regarding the forecasting, either what we have right now or what's coming. So if you have any questions, please feel free to fire away. Question from Federico? Go. I was going to... Sorry. No, I think you'll get deeper into the models later, like, analytically, so maybe it's better to do it then. But I was wondering more: okay, do the different models all consider the same time frame, like, historically, in the analysis? And do they have, like, different settings, probably about seasonality? So I wanted to understand that. Probably you were going to introduce it later and it just came up earlier, but I was really curious about that.
For me, it was a little bit tricky finding the level at which you guys want an explanation, but we can go deep if you want. So right now, stuff like seasonality, and which data is being trained on. So, first and foremost, all data is being trained on, which is not good, because we want to have automatic cutoffs, and we have that in the pipe. But then finding the patterns, et cetera, et cetera, we have that in the works as well. But now we rely on models, just regular time series estimators; we rely on them finding the patterns. But we found that if we model this more by hand and provide it as features, that would be more informative for the models. So right now we're considering seasonality, but in the future we will model it a little bit more and then provide it as information. And that will be something. If the salary is on the 25th every month, then we can assume that will hold in the future, and that will increase the chances, let's say, that the model can predict that next month: okay, the 25th, here we go. Rather than: the latest one was a one-off, week 23 instead of week 24, but it should have been week 24. So then it assumes, okay, we're having some one-offs here, why is that? And then it tries to predict something in the middle, which is smart, but it's not what we want. And these things, like, it's just not adding to it. What we want to do instead is provide more informative features for it, a little bit more handcrafted features, let's say a better understanding of the world in On's context. Nice. Thanks, Federico. Sorry, Jen. Go for it, and then I'll ask. One of the questions that I had is that as we're making changes to the forecast on our side with the changes that you described, and then the team at On are also making changes when they recategorize things, providing prompt updates and things like that, isolating the changes between the forecast one week to the next week, which is based on that and also the actuals, I think would be helpful.
So there are kind of three different points where the forecast is changing every week. My question is specifically around categorization, because that's where we're seeing quite big changes, especially as we've changed the categories and recategorized all transactions, too, and are generally recategorizing regularly. So when you regenerate the forecast every week, there's a forecast every Monday for 13 weeks, that is locked at that point in time. But then if the categories change, that forecast does change, is that correct? When you compare forecast v forecast, if you are changing historic forecasts by recategorizing transactions, that will muddy the waters a little bit on the impact, is that correct? And then leading on from that: if the team go in and recategorize loads of transactions this week, will they see the impact of that again on Monday? Does that make sense? I had quite a few questions in there. Do you want me to answer this, Simon? Yeah, go ahead. Yeah. So we actually are running forecasts every day. So if you change categories today, the next day the forecasting will usually take all of the new categories; like, we overwrite the history in our data. However, on most recategorizations, we are also running a backfill. So I know recently there has been a category change for On. So in that instance, we actually recreate the whole history with the new categories, not for the full history, but for the last two years, I think. So technically, yes, but then we're always making sure that we are rewriting the history with the new categories and updating them.
Me: Yes.
Them: I hope that makes sense. I guess the best practice, really, is to update categories as soon as possible. Because if you're updating the old transactions and you're rewriting history, that does make the paper trail a bit harder to follow if those numbers are moving. So a better cadence on the recategorizations will just make the understandability of the history of the forecast easier. Yes. Although forecasting is heavily dependent on categorization, it is sort of like an isolated system. So whatever the categories are, the model will pull the latest categories from that day or from that instance onwards. If you want accurate forecasts that are linked to proper categories, then we need to update these categories as soon as possible. Ideally, as I said, we can always rewrite the history of forecasts. But yeah, if there are changes, then we should also update the categories as soon as possible.
Me: Would love to explore this more in detail. Up until now, an assumption has been that you would actually like to keep a lot of your history intact so that you can see... even if there are variances in your history, you can have the explanation being, well, we recategorised a bunch of these transactions and therefore from that point on the forecast looks like this, so that you keep that log of events that explains the evolution of your forecasts and why they're more accurate today than they were six months ago. And then it will be easier to see the progress you're driving from the act of recategorizing transactions, for example. But super happy to dive more into this. I think it's a very relevant aspect of how the whole Palm system works as a whole. Like, it's a holistic experience.
Them: Yeah. Just to make things clear, we're not deleting data. We're just backfilling the forecasts when we change categories. If you want an actual comparison on how it looked a year ago, then you can always do that.
Me: Very cool.
Them: So I would like to also establish a baseline for Federico's work, basically, because we had a discussion with Federico last week and, let's say, agreed, and we saw basically in the Google Sheets that Federico is working on two tracks. One track is the forecast vs forecast comparison, which establishes, let's say, the differences between the different forecast versions for the specific forecast dates that we see for the 13-week horizon that we produce. Track one is that. And the second part is basically the nice chart that you're creating between the operational cash and investments cash position over time. So over time meaning over the next 13-week horizon. So basically, these are the two baselines that we have right now. And based on what Simon and Art mentioned here, I think it's fair to say as well that there are more metrics that we can include and track when it comes to forecast accuracy. And I think the guys here already mentioned a couple of metrics that are very important to track. And Federico, if you would like, we can pick it up together next week to start tracking those values as well. And the other thing that I want to bring to the table: right now we are tracking global balances, and I discussed it with Amanda also last week that this is also the most important part. I'm not sure that's going to be possible for the 25th of February, because I guess it's a lot of work, but what would be extremely important is to track the forecast accuracy at a lower level as well, meaning in categories. So it's hugely important to understand what drives, let's say, the forecast. Which categories can we track accurately? Which categories are by definition harder to track, and how can we improve them? And I think right now we don't have this visibility at the level that we want. And at the same time, to see how, with your categorization, you also improve the algorithm on the back end.
So to compare, basically, two dates one year apart, let's say, the difference in forecast accuracy is also important. And to identify also where the forecast accuracy improves and where it declines, because that can also signal that either there is a mistake in data categorization, or more data needs to be fed into the models to provide higher accuracy for the forecast. Or it could also be something on our side that we need to have a closer look at. So let's start with the baseline, which is the global balance and the forecast balances for the 13 weeks, plus the forecast over forecast comparison for the different forecast versions. And if we have the time and the capacity, and that goes to you, Federico, as well, let's try to go into a deeper level to find those insights too. Does this sound like a good plan? Yeah, sorry. I think a question that I have as well is... Can you hear me? Am I choppy, or is it... Yes. When you're tracking the global balance, for example, we know we struggle with this intercompany movement because we don't forecast them. So, for example, I think some entities show a negative balance in the quarter, which makes sense because we don't forecast intercompany, but we'll cover this one. When you do the variance analysis, do you also exclude the intercompany movements somehow, like making a formula with the category out of the balance, or do you just do a comparison like we do, a very simple kind of forecast balance versus actual balance? I think that's super interesting for us to understand, because sometimes I think the forecast will look very wrong if you see just the balance, but it's actually true because it's excluding the intercompany. But if somehow we could visualize that, I think it would make us feel a little bit more confident with the data, especially when showing it to other non-treasury stakeholders. Art, Simon, would you like to pick it up? Well, I had a hard time hearing.
But we're talking basically about can we show intercompany transactions well and can we handle them well for the total variance calculations?
Me: I'm hearing you'd like to have, like, one for operational performance, and then maybe we can layer on or remove the intercompany stuff.
Them: No, I think what Amanda said is, like, if we're not forecasting one category, let it be intercompany or something else, do we also exclude it from the actuals that we compare the forecast against? Because there will always be...
Me: That wouldn't make sense, for sure, 100%. But I'm also curious: is the global level enough for you, or is it the entity level that you need?
Them: I mean, at the global level, the intercompany should be zero.
Me: Then it will net each other out. So then it's not that.
Them: Level, I guess. Right, Amanda?
Me: But from my understanding, you have a meeting with Christoph end of February, right? And you would like to showcase...
Them: Yeah.
Me: Only global. So then, for that meeting, intercompany is less relevant.
Them: Yeah.
Me: So this.
Them: I mean, Amanda, has this scope changed? Because I think we're only showing global right now, right? Yeah, we're showing global, but I was just curious how you visualize the variances and how you check them if the balance ends up being so different in reality from the target. But yes, to your point. So we're not going to report on this just yet, because we are looking only at the global. But when we speak about the usability of the forecasting tool in the future, then the entity level becomes relevant. Either we start creating forecasts for intercompany, and then we need to make sure they net to zero so the global is not mistaken, or we don't, but then for the variance analysis... But to be honest, I think the long-term strategy should be to somehow forecast intercompany, because otherwise we need to add that manually if we want to use this operationally, right?
Me: 100% align. We're actually going to kick start a project right now to take a look at how we can improve on our intercompany.
Them: Yeah. It's a tricky one because you want to make it zero. So that should be a hard-coded thing. But then where do you take or add?
Me: The tricky part is the counterparties and the forecasting in pairs. So that's where... 100%. Cool, but let's chat more about this if it's not a super prioritized thing for now. Cool. Good. Amazing.
Them: Us being capable of modeling our world, more or less, is a start for doing the intercompany analysis work as well. So it's a little bit of a stepping stone, but these things are quite tricky, I see that. We are also five minutes out, but we have some more to show. We haven't shown Palm Chat yet, and we also have one and a half more slides. Should we... could we get on with that? Absolutely. Thanks for the ping, Simon. So back to the conversation that Art started previously. We have a prototype of Palm Chat, which I basically explained verbally last time. So what we can do here: I created a prompt, and I inserted basically the image that I copy-pasted from the Google Sheet that you have, which is the cash plus investments cash position with the minimum operational cash line and the total cash full overview. What the system will do is, basically... first, so, the prompting is very simple. Now it's a bit more detailed, but in the end it shouldn't need to be that detailed; we specify basically what we need to show: the operational cash plus the investments for the horizon. So now it's 13 weeks, and basically you can also instruct where you would like to draw the minimum operational cash line. And what the system will do is go and track, basically find, the right data sets and the right information from the backend, let's say, because the information already lives in the databases. So the only thing that we need to do is just find the right ones, instruct the system in the correct way to fetch those data, and bring it to the surface. And after that you can do anything, basically, with the data. You can create visuals without needing to go to drag-and-drops or selecting data sets yourself. You just do it. Just mention to the system, I need a stacked area, for example, with these values, and it will just execute it. And of course you can iterate on the chart yourself and instruct it better in upcoming versions.
But the only thing we want you to do in the end is to write the right prompts and check if, let's say, the data are reflecting reality, at least in the first versions. It takes a couple of minutes to run, but it should eventually create the right data sets. Yeah. Basically, the idea is that now you can have, like, a personal data analyst. I think I'm using it almost daily to get data analysis and quick answers based on our data. Basically, what this does is run the queries that normally data scientists or analysts would run, and it does it for you, without too much complaining, and probably much faster. But, yeah, this is still a work in progress. We're going to try and make it a bit faster, a bit more... It's quite accurate, but sometimes it goes in loops until it finds the correct result. But, yeah, I think Giannis is probably much better at using this than I am at this point. And that's the beauty of it. You only need to specify what the end goal is. Do not specify how the system needs to work towards that goal; just mention the goal, and the system, the agent, will find the best way to reach it. If you see above, it says, hey, I found some really large numbers, 107 billion. What is this? It doesn't look correct. Let me go find the right data set. And it indeed went to find the right data set, and it says, okay, this looks more reasonable in the end. And now it has the data. As soon as it has the data, it kicks off a different agent to draw, basically, the charts for you. That's super cool. Like, even if it's still beta, you basically have your personal analyst agent. You cannot imagine how much we're using it for different purposes as well: for debugging, for quick answers, for developing new tools. And here you go. Basically, you have the total cash. It also detects when the maturity date from the investments is, and after the last maturity date, it sets them to zero.
And it pops them up, I guess, into the operating cash. And voila, let's say. Any questions? So basically, it bases itself on all the data present in Palm. Basically, you don't have to map anything; you can use basically everything. As you said, everything that lives as data in Palm, you can use. And the nice thing is, if you have additional data, like a file from AR/AP, you can drag and drop them to, let's say, Claude. It will analyze those data as well. It will surface and combine those data together and create additional insights, not only dependent on the Palm database. Nice. It's really cool. If you have any other questions, let's say regarding what to prompt, we can set up another session and you can fire away your questions towards the agent and see what it comes up with. But, yeah, this is, let's say, the little demo. Super curious. It's quite good at explaining why forecasts are bad. That's always good. Thank you. That's really cool. All right, I think we have another session with Federico, which I think lives in a separate link. Can I join that one? But thank you so much. Nice to meet you, Art, Simon. See you next time. Lucia, nice to see you again. Long time no see. Yes. See you soon. Bye. See you.