This question is about Facebook Stories. Let's say, as an analyst, you have found that very few users click through the Stories functionality. The feature was launched with the hope of creating a lot of engagement and momentum, but on this particular metric we are not seeing much engagement, and the CTR is quite low. Your task is to measure the success of this feature without using any A/B testing. Let's go through the process: the testing strategy or analysis you would use, and also the metrics you would use to determine the success of Stories as a feature.
Hello everyone, this is Hanif from Interview Query. As a quick intro, I have been in tech, particularly data science, machine learning, and analytics, for the past 10 years or so, with companies like GroupM, Amazon, Meta, and Albertsons, as well as some startups. I've been helping candidates and acting as a career coach in data science for about five years, and today I'm hosting Priya for a mock interview. Priya, before we get started, do you want to introduce yourself and talk about your career, your interest in data science in general, and how you got into this field?

Hi, this is Priya. I'm currently working as a product analytics manager at Walmart, and I have been in analytics for almost eight years now. The reason I love analytics and data science is that it's mostly about the user journey and user behavior. I'm really passionate about studying user behavior: I mix user psychology with the data, and I marry the two together with the product. For me, the target is always how we can actually make a business impact out of the data we are seeing. I started my career as a Tableau analyst, then moved to the United States to pursue my master's and learn more of the technical details. After that I worked at realtor.com, at Atlassian, and at Twitter, and now I'm working at Walmart. With this Interview Query session I'm definitely hoping to get constructive feedback on my structured thinking and on how I think about user behavior and user analysis.
Awesome, it was great to learn about your background. So let's get started with the question, a typical one for a data science role at Facebook, which is a product analytics archetype.

Let me think through it. Okay, I am someone working on Facebook Stories, and very few users click through the Stories functionality. My first question would be what we actually mean by that. Actually, just a second: I want to understand the feature first. What are Facebook Stories? I'm assuming a Facebook Story is a status that a user uploads and that stays up for 24 hours. So there are two segments of users: those who create stories and those who view them. A user creates a story: he taps a button and uploads something, which can be a video, a photo, or just general wishes. Then viewers can view it: through Messenger, where they see a dot; on the profile picture; or through a News Feed update. After viewing the story they can react, comment, and conversations can start. I'm assuming that's what a Facebook Story is. And when we talk about the Stories functionality and say very few users are clicking, what part are we actually talking about? Is it viewers who are looking at the story and dropping off, or what exactly is happening?

First of all, your understanding of Stories is correct. But people react to stories in different ways: they could just pass through and hit the next button, they may react with a like or any kind of emoji, and they may share stories. The one particular action where we saw low numbers is the click-through rate, which is basically clicking on the story, maybe going through a link related to that story. That's the functionality that is not showing good results, at least relative to the baseline we anticipated, maybe compared with other products or features where we monitor CTR, like ads. That's the observation so far.

Okay, so just to be clear: within the story, the click-through rate to some other link is low and not meeting expectations. Is that what we mean? Yes, exactly. Got it.
So, since I'm tasked with measuring the success of Facebook Stories: Stories was launched so that people can view content, feedback and interactions can start, creators can receive more attention, the overall engagement score goes up, and hence the average time a user spends on Facebook increases. That brings more business revenue, and it serves Facebook's mission of connecting people and fostering interactions. That's a short summary of the goal of Facebook Stories, both for the business and for the user; let me write it down as the goal of Facebook Stories.

Now, there are different parts to measuring its success. First, what is the top of the funnel: how many users are we acquiring? How many users are actually viewing stories and reacting to them? And how many users are creating stories again, meaning they enjoy creating stories so much that they engage with the feature again and again, which is the retention side. As the saying goes, anything that cannot be measured cannot be improved, so let me list a few success metrics: number of users creating stories, or the percentage of daily active users creating stories, and the percentage of users clicking stories. Now, stories may get created, but if the content is not interesting enough, people can close it even before the story finishes. So: what percentage of viewers complete the story, meaning they watch it through the entire watch time? That's one of the key engagement metrics. Then, what percentage of creators receive any feedback or interaction? It can be a like, a love, a message, any kind of feedback. And what percentage of users create stories again? This also includes shares, since you mentioned sharing is an action.

Overall, all of this should lead to an increase in the average time a user spends on the platform. So here is one of the key success metrics: average time spent on the platform. This is the metric I will measure success against; since we can't use a standard A/B test, I'll think through some other techniques, but this is the key metric. I'd also look at the percentage of time spent on Facebook that goes to creating or viewing stories. For example, if a user spends 10 or 15 minutes on Facebook on average, maybe 3 of those minutes are attributable to Stories; among all the features, I would measure what percentage of time spent is attributed to this one. Average time spent on the platform drives user interactions and my business goal, so even if the click-through rate of the Stories functionality is not meeting the baseline, if Stories increases the overall average time spent on Facebook, then Stories is successful.

But there is a risk that average time spent on Stories increases while average time spent on News Feed decreases. So there are counter metrics I would look at: because of launching Stories, these shouldn't degrade. They can be the average engagement score on other Facebook features. When I say engagement score, I mean a combined score of the likes, loves, posts, everything: average engagement on News Feed, average engagement on Groups, average engagement on the Timeline. That's what I meant by the different Facebook features: News Feed, Groups, Timeline, all of them. Those would be the counter metrics.
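To make these definitions concrete, here is a minimal pandas sketch of how a few of these funnel metrics might be computed from a hypothetical per-user daily table. The schema and column names, such as `created_story` and `time_on_fb_min`, are illustrative assumptions, not Facebook's actual data model.

```python
import pandas as pd

# Hypothetical daily snapshot of active users; this schema is an
# illustrative assumption, not Facebook's real data model.
dau = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "created_story": [1, 0, 0, 1, 0],        # created >= 1 story today
    "viewed_story": [1, 1, 0, 1, 1],         # viewed >= 1 story today
    "completed_view": [1, 0, 0, 1, 1],       # watched a story to the end
    "got_feedback": [1, 0, 0, 0, 0],         # received a like/love/message
    "time_on_stories_min": [3.0, 1.0, 0.0, 4.0, 2.0],
    "time_on_fb_min": [15.0, 10.0, 8.0, 20.0, 12.0],
})

metrics = {
    # Top of the funnel: share of DAU creating / viewing stories
    "pct_dau_creating": dau["created_story"].mean(),
    "pct_dau_viewing": dau["viewed_story"].mean(),
    # Engagement depth among viewers and creators
    "completion_rate": dau.loc[dau["viewed_story"] == 1, "completed_view"].mean(),
    "feedback_rate": dau.loc[dau["created_story"] == 1, "got_feedback"].mean(),
    # The primary success metric and the share of it attributed to Stories
    "avg_time_on_platform_min": dau["time_on_fb_min"].mean(),
    "pct_time_from_stories": dau["time_on_stories_min"].sum() / dau["time_on_fb_min"].sum(),
}
print(metrics)
```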
Now, to measure the impact on average time spent on the platform, we don't want to use a standard A/B test. We know A/B tests are the gold standard for causal inference, for establishing that Facebook Stories is causing an increase in average time spent. If we can't use them, can we use other causal techniques? That's what I would think through: how to measure success without an A/B test. One causal technique I would suggest is mediation modeling. Mediation modeling is essentially running regressions to make sure that Stories is an important mediating variable: users are on the Facebook platform, because they are on the platform they use Stories, and because they use Stories, average time spent increases. That's what mediation means here.

So, to confirm: your main metric, the one you're going to monitor as the measure of success for Stories, is average time spent? Yes, that's the main metric: average time spent on the Facebook platform. And the counter metric is that engagement on the other features shouldn't go down.
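A minimal sketch of that mediation-style analysis with statsmodels, using the classic three-regression decomposition: total effect, mediator model, and outcome model. All variable names and the synthetic data are assumptions for illustration, and since nothing here is randomized, this suggests rather than proves a causal path.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (assumed names): 'exposure' could be sessions per day,
# 'stories' is minutes on Stories (the mediator), 'time_spent' is total minutes.
exposure = rng.normal(5, 1.5, n)
stories = 0.8 * exposure + rng.normal(0, 1, n)            # exposure -> mediator
time_spent = 2.0 * exposure + 1.5 * stories + rng.normal(0, 2, n)

X = sm.add_constant(exposure)

total = sm.OLS(time_spent, X).fit()      # (1) total effect:   Y ~ X
med = sm.OLS(stories, X).fit()           # (2) mediator model: M ~ X
direct = sm.OLS(time_spent,
                sm.add_constant(np.column_stack([exposure, stories]))).fit()
                                         # (3) outcome model:  Y ~ X + M

indirect = med.params[1] * direct.params[2]   # a * b: effect routed through Stories
print(f"total effect      : {total.params[1]:.2f}")
print(f"direct effect     : {direct.params[1]:.2f}")
print(f"indirect (stories): {indirect:.2f}")
```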
Then, to see engagement, there is the frequency side: number of stories created per user, and on the viewer side, number of stories viewed per user. All these metrics, number of stories created, number of stories viewed, will ultimately increase the average time spent, which is the main metric I was pointing to. Fair enough, there are a lot of supporting metrics, like average time spent on Stories and everything I've put down here, but ultimately they lead either to an increase in the frequency of user visits or to an increase in overall time spent, the depth and the breadth of the time users are spending on the Facebook platform.

So, to summarize what we did so far in terms of the relevant metrics: is it fair to say that, while some indicators were not satisfactory when we launched this, you're chasing other metrics that you think are more relevant for gauging the success of Facebook Stories? I just want to confirm we're on the same page: we probably shouldn't define the success of Stories by CTR, and there are different, better ways to define success. Is that the right understanding?

Yes, not just by CTR. There are different ways to define success, because ultimately what we want is people revisiting to view stories again and viewing more stories. A click-through inside a story, unless it is an advertisement Facebook is getting paid for, is not that useful for the business. What matters is that people spend more time on the platform, view more ads, and click the ads on the Facebook platform; that is the click-through I would care about. A user posting a link and other viewers going to that link is not the key metric for me. People spending more time and clicking on the ads Facebook shows them, which ultimately drives click-through rate and click-through revenue, is what's important.
And that is why I have kept just one success metric. Even in an A/B test we have a standard criterion: there is one primary metric that should be neutral or positive, and should not go negative, for the launch decision. Here that is average time spent, and supporting it are all the secondary metrics I have underlined.

Now, how do I measure this without an A/B test? Apart from the above, there can be two different cohorts: users who use Facebook Stories and users who use all the features except Stories. Those users can be made very much alike; we can construct the two cohorts manually so that users come from the same continent and the same place, spend the same average time on the other features, and have the same average number of friends. After creating those matched clusters, we measure the average time spent for users with Stories versus those without. This is what I would call a matching-methods technique: matching two sets of users who are essentially identical, except that one set uses Stories and one doesn't use Stories at all, and I'm sure there will be a cohort of users like that. So that's one approach without an A/B test.
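Here is a toy version of that matching idea: exact matching on a few coarse covariates, then comparing average time spent between matched Stories users and non-users within each cell. The covariates and data are invented; a real analysis would match on many more attributes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 10_000

users = pd.DataFrame({
    "region": rng.choice(["NA", "EU", "APAC"], n),
    "friends_bucket": rng.choice(["low", "mid", "high"], n),
    "uses_stories": rng.integers(0, 2, n),
})
# Synthetic outcome: heavier users in the 'high' bucket, small Stories lift
users["time_spent"] = (
    10
    + 3 * (users["friends_bucket"] == "high")
    + 1.5 * users["uses_stories"]
    + rng.normal(0, 2, n)
)

# Exact matching: only compare within cells that contain both cohorts
cells = (users.groupby(["region", "friends_bucket", "uses_stories"])["time_spent"]
              .mean()
              .unstack("uses_stories")
              .dropna())                    # keep cells with both cohorts
effect = (cells[1] - cells[0]).mean()       # avg within-cell difference
print(f"matched estimate of Stories lift: {effect:.2f} min")
```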
Another thing to check is whether we see a sudden shift in time spent once Stories launched. This can be done with an interrupted time series method: we look before and after, creating different batches of time. Every company has a forecasting baseline, say the average time users would have been expected to spend in the months of July and August. If we are going above that baseline, did the jump happen at the time of the Stories launch? That's the interrupted time series method, and I would love to use it here.

Then we can build a model, and by model I mean a tree-based model, a random forest or similar, where we look at feature importance to find out whether Stories is one of the key features driving time spent. That model can be a regression against average time spent, or a classification against zero or one, where zero means below a threshold of time spent and one means above it. These are a few of the methods I would use, and again, whether I'm building the tree-based model, doing matching methods, or running an interrupted time series, I'm making sure the target is average time spent on the Facebook platform and that the effect is attributable to Stories.

Got it. Why don't we dissect some of these approaches and dive deeper into the methodologies? For example, on the matching methods: what do you do after you find the matched counterparts on both sides? What would be the next step?
Okay, so say there is a group of users living in San Francisco who have almost the same number of friends and almost the same demographics and behavior as the Stories users in San Francisco, and overall the average time spent of the users with Stories is higher. We know the percentage of time spent on Stories is counted in that, and it has increased. I would then be inclined to declare it a success, because that was the testing strategy for determining the success of Stories: first identify two classes of users who are exactly the same except for one difference, that one uses Stories and one doesn't use Stories at all. That comparison helps me determine the success of Stories.

Are there any other checks you would use in that comparison? Would you use any test statistics to call it a success or inconclusive? Is that even applicable to this case?

Yes. Regarding test statistics: when we are creating the two clusters of users, we can use a chi-square test to check whether the two populations we are comparing are statistically different from each other. I would have to recall the details, but I used it long back in my career, where we used chi-square to identify whether two segments of a population are actually the same or actually different; that helps validate that we created similar segments. Then, to compare the metric itself, average time spent across the two clusters, we can use a t-test, where we look at the p-value and set a significance level: if the p-value falls within that 5% region we don't expect, the difference is beyond what we'd attribute to chance relative to the first cohort. It's a simple t-test; I would use the t-statistic.
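A quick sketch of those two checks with scipy on dummy numbers: a chi-square test on cohort composition across regions, and a two-sample t-test on average time spent. The counts and distributions are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Chi-square: are the two cohorts distributed similarly across regions?
# Rows = cohort (Stories / no Stories), columns = per-region user counts.
contingency = np.array([
    [480, 310, 210],   # Stories users per region (dummy counts)
    [500, 295, 205],   # matched non-users per region
])
chi2, p_comp, dof, expected = stats.chi2_contingency(contingency)
print(f"cohort composition: chi2={chi2:.2f}, p={p_comp:.3f}")

# t-test: is average time spent different between the matched cohorts?
time_stories = rng.normal(14.5, 4, 1000)
time_control = rng.normal(13.8, 4, 1000)
t, p_diff = stats.ttest_ind(time_stories, time_control)
print(f"time spent: t={t:.2f}, p={p_diff:.4f}  (significant at 5% if p < 0.05)")
```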
Got it. And are you familiar with propensity score matching? Yes, I'm familiar: we give a score to each user and then match users across the cohorts, so it is regression-based. A counterpart of mine did this at Twitter, where they scored different users and then compared users with the same scores: a user with a score of 0.2 in cluster A is matched with a user with a score of 0.2 in cluster B. The score is the probability, or propensity, of the user to take a particular action. That's my understanding; correct me if I'm wrong. That's right, and potentially it would be applicable here as well: if we define the right action, build a regression on it, and then use the scores for matching.
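And a minimal propensity score matching sketch under the same kind of assumptions: fit a logistic regression for the probability of using Stories given user covariates, match each Stories user to the nearest non-user on that score, and compare outcomes. Every feature and number here is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
n = 4000

# Synthetic covariates (assumed): [friend_count, avg_sessions, account_age]
X = rng.normal(size=(n, 3))
# Stories adoption depends on covariates, i.e. confounded, as in real life
p_adopt = 1 / (1 + np.exp(-(0.8 * X[:, 0] + 0.5 * X[:, 1])))
treated = (rng.random(n) < p_adopt).astype(int)
time_spent = 12 + X @ np.array([1.0, 2.0, 0.5]) + 1.5 * treated + rng.normal(0, 2, n)

# 1) Propensity score: P(uses Stories | covariates)
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# 2) Match each treated user to the nearest control on the score
t_idx, c_idx = np.where(treated == 1)[0], np.where(treated == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[c_idx].reshape(-1, 1))
_, match = nn.kneighbors(ps[t_idx].reshape(-1, 1))
matched_controls = c_idx[match.ravel()]

# 3) Average treatment effect on the treated (ATT)
att = (time_spent[t_idx] - time_spent[matched_controls]).mean()
print(f"PSM estimate of Stories lift: {att:.2f} min")
```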
Let's talk about another approach you mentioned: regression-based methods, or any sort of modeling that can include our confounding variables as inputs, with Stories usage as a factor. Can you elaborate? How do you make sure the result is conclusive, or at least that your model can explain the difference? Basically, how do you avoid biases, and what are the practical considerations in building that regression model?

Sure. Let us assume Facebook has a threshold on average time spent that separates what it considers good from bad. I would categorize users into zero and one, and I'm talking about daily active users: say users who spend 5 minutes or less are zeros, and users who spend more than 5 minutes are ones. The 5 minutes is just a hypothetical example; the real cutoff depends on the frequency distribution of Facebook's actual data. Maybe 80% of users spend 10 minutes; I don't know what it currently is at Facebook, but depending on the data there will be a threshold bar between zero and one: users who are not spending enough time versus users who are.

Now I build a model against that zero/one label, putting in all the user touchpoints: geographic characteristics, behavioral characteristics, and user demographics. That can be age, number of friends, number of groups joined, everything. Whether a user uses Facebook Stories, zero or one, would be one of the features. Then I look at the top 10 features by importance, the feature importance you get from tree-based models. If the Stories feature comes in the top 10 features, I would conclude that Stories is having a significant impact on how much time a user spends on average.
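A sketch of that setup on made-up features: label users zero or one by a time-spent threshold, fit a random forest, and check whether a `uses_stories` flag ranks near the top of the importances. The feature names and effect sizes are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
n = 8000

df = pd.DataFrame({
    "uses_stories": rng.integers(0, 2, n),
    "friend_count": rng.poisson(200, n),
    "groups_joined": rng.poisson(5, n),
    "account_age_yr": rng.uniform(0, 15, n),
})
# Synthetic time spent, with a genuine contribution from Stories usage
time_spent = (
    5 + 2.5 * df["uses_stories"] + 0.01 * df["friend_count"]
    + 0.3 * df["groups_joined"] + rng.normal(0, 2, n)
)
y = (time_spent > np.median(time_spent)).astype(int)   # 1 = above threshold

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(df, y)
importance = pd.Series(model.feature_importances_, index=df.columns)
print(importance.sort_values(ascending=False))   # is uses_stories near the top?
```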
Got it. And are there any other techniques you'd use, in addition to feature importance, which is of course very important? Any other statistics-based approach to identify the significance of a particular feature in the prediction?

Feature importance is one of them. If we are fitting a simple regression model, we usually look at the R-squared and the adjusted R-squared and check that they are close to one. Let me try to recall the other practical checks. When we build a model, we also usually create a correlation matrix across the features, and we usually reject features that have too low a correlation.

Right, and of course there are techniques to report a p-value per feature, given a model: you can use an F-test and per-coefficient tests to report those p-values, which represent the significance of each feature. There is always a debate here between the machine learning community and the statistics community. The machine learning community doesn't care much about feature significance; they put everything in as long as the predictive power is satisfactory. Statisticians are strict about making sure that whatever we include is statistically justified from the feature perspective. But for this particular application it's a valid analysis: you check whether your feature of interest is even significant for the prediction, and that's a nice filter to have before getting to feature importance or other analyses. You could stack approaches: a simple regression model for significance, combined with other techniques if you want to rank features. You can always rank with regression too, but some people prefer tree-based methods as a more systematic way to do feature importance analysis and ranking.

Yes, and with a tree-based method, instead of regression coefficients I get a score for each feature: the importances across all the features total 100%, and I can see what percentage my feature is driving. Nowadays many libraries just give you the ranking, positive or negative, and with what percentage contribution. Of course, something like SHAP is also applicable here. Yes, SHAP values, exactly.
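That per-feature significance check might look like this with statsmodels: the fit reports a t-based p-value per coefficient plus the overall F-test and R-squared, so you can filter on whether the Stories flag is significant before moving on to importance rankings. Data and names are again synthetic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000

# Illustrative features: Stories flag plus two confounders
uses_stories = rng.integers(0, 2, n)
friend_count = rng.poisson(200, n)
groups_joined = rng.poisson(5, n)
time_spent = (
    5 + 2.5 * uses_stories + 0.01 * friend_count
    + 0.3 * groups_joined + rng.normal(0, 2, n)
)

X = sm.add_constant(np.column_stack([uses_stories, friend_count, groups_joined]))
fit = sm.OLS(time_spent, X).fit()

# Per-coefficient p-values (t-tests) and the overall model F-test
print("p-values:", np.round(fit.pvalues, 4))   # [const, stories, friends, groups]
print(f"R^2={fit.rsquared:.3f}  adj R^2={fit.rsquared_adj:.3f}  "
      f"F p-value={fit.f_pvalue:.2e}")
```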
You also mentioned interrupted time series; I'd like to discuss that a bit as well. Any considerations about it? What would be your model of choice when doing that time series prediction?

Interrupted time series is something I have read about, because many people are using it nowadays; I have not used it myself. What I understand is that you compare the observed result against the forecast model you already had: the forecast said your average time spent was supposed to be 14 minutes, but suddenly it is 15, or maybe 14.5. When did that half-minute increase actually happen? Did it happen at the time of the Stories launch? That's what we measure against; it's essentially measuring success immediately pre and post the launch of the feature.

That's right. In some communities they call that forecast a counterfactual: it's not the observed reality, it's what would have happened without the intervention coming from Stories, and that becomes your benchmark to compare against.

Yes. And this would be my least preferred method, because forecasts are never 100% reliable, and measuring something against a forecast can't be much more reliable than the forecast itself. We don't have any more data points; we are just measuring pre and post and comparing against the forecasted baseline.
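One common way to operationalize that is segmented regression: regress the daily metric on time, a post-launch indicator, and time since launch, then read off the level and slope changes at the launch date. Below is a minimal sketch on synthetic daily data; the launch day and effect sizes are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
days = np.arange(120)
launch = 60                                  # hypothetical Stories launch day

post = (days >= launch).astype(float)        # 1 after launch
since = np.clip(days - launch, 0, None)      # days elapsed since launch

# Synthetic daily avg time spent: trend + level jump + slope change at launch
y = 14 + 0.01 * days + 0.6 * post + 0.02 * since + rng.normal(0, 0.3, len(days))

X = sm.add_constant(np.column_stack([days, post, since]))
fit = sm.OLS(y, X).fit()

b = fit.params
print(f"pre-trend slope : {b[1]:.3f} min/day")
print(f"level change    : {b[2]:.3f} min at launch (p={fit.pvalues[2]:.3f})")
print(f"slope change    : {b[3]:.3f} min/day after launch (p={fit.pvalues[3]:.3f})")
```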
Yes, one of the issues with that approach is that, depending on how strong your forecast is and how far you've gone to capture different seasonalities, you may hit a time frame where external factors come into the picture that you never captured before. The real data reflects them, but your counterfactual has no memory of that seasonal event. Those are some of the corner cases, and there are many more. You also don't know the usage pattern of Stories across the population: people may adopt it at different times, in different cohorts, and it's not clear how this method would account for that. So it's maybe not the best approach. With a synthetic control approach, or the matching you mentioned, you have more control: you know in a more deterministic way whether each user has been using the feature or not, rather than treating the entire population the same way.

Yes. And even though we cannot use a standard A/B test, if I had to make this decision pre-launch, I would maybe launch the feature in a few of the continents and keep a few of them out. Then we can use a difference-in-differences approach: geo testing across locations rather than a standard A/B test.

That's an interesting approach; I really like it. But again, a potential pitfall is making sure the two regions you pick are comparable, particularly in the pre-period, meaning their behavior follows pretty much the same trend. If you arbitrarily pick two locations, you may not satisfy the conditions for that approach. That's why people sometimes do it in a more refined way and pick combinations of urban areas, suburban areas, and similar types of countries with their similar counterparts, rather than an entire big region at the continent level. It may need more refinement at a lower level to make sure your groups are comparable with each other.

Yes, that is very difficult to attain, but those are the methods I would use if my matching methods and the p-value-based methods don't work and I don't see any conclusive results from them.
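A minimal difference-in-differences calculation for that geo-rollout idea, assuming we have pre- and post-launch average time spent for a launch region and a comparable holdout region. The numbers are invented, and the estimate is only valid under the parallel-trends assumption just discussed.

```python
# Hypothetical pre/post averages of time spent (minutes) per region.
# 'treated' got the Stories launch; 'control' is the holdout geo.
treated_pre, treated_post = 13.2, 14.4
control_pre, control_post = 12.9, 13.3

# DiD: change in treated minus change in control, netting out any shared
# trend (valid only if the two geos would have moved in parallel).
did = (treated_post - treated_pre) - (control_post - control_pre)
print(f"difference-in-differences estimate: {did:.2f} min")   # 0.80 min
```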
Cool, that's awesome. I think we can wrap up the solution piece and talk about feedback. First of all, I enjoyed the conversation. You definitely showed good intuition about the business value and the functionality, and about why certain metrics are important for this particular feature. You covered a lot of metrics; of course, in terms of applicable metrics you can go on forever. There are other possibilities, like unique viewers per story, or more metrics around the interaction between people and stories; there's a lot we could consider. But your list was comprehensive enough and covered all the main points, particularly the focus on something beyond Stories itself as the success metric for Stories. That's something some people miss. Anyone can talk about success metrics for a particular feature in general, but especially at a big company, in a crowded product space with a lot of features working together, it's easy to lose focus on the north star of what we're trying to achieve overall, rather than taking a narrow view of one particular feature's success. That's a big point, and you addressed it well.

And yes, you covered several of what we typically call causal inference approaches. There are more, but we covered some of the most established ones. In general, I would definitely brush up on the details of some of those approaches and the corner cases around causal inference, but you did well in covering different approaches with different tastes and flavors of causality. That's it for my feedback; in general, you did a great job. Thank you.

Thank you so much. I also thank you, Hanif; I really liked interacting with you and chatting about the different approaches, their pros and cons, and the assumptions we need to make. I think I am ready to pick up Judea Pearl's book on causal inference again.

That's good motivation. Awesome, thank you so much. Thank you.