Soridata - Data handling Transparency


On this page, we cover all aspects of data handling on the site: how data is stored, treated and maintained, and how the different stats and values are created.

Server Actions by Time

The server automatically performs several actions depending on the time of day. Server time is UTC-3 (Brazil/Brasilia, Brazil/São Paulo, Argentina/Buenos Aires), and the video update regime uses UTC-7 (PST), the same as Youtube.

Here are some highlights:

  • All important actions are performed every day, usually at 0:00 or 12:00 server time.
  • Videos released in the last 3 years are updated every day (priority updates, meaning they are updated before anything else); the rest are updated every other day.
  • Audio-Videos are updated as fast as possible, a little every hour throughout the day. As of 2024, it takes a little over 10 days to update them all. Since late 2025, however, Audio-Videos have been handled with lower priority.
  • Circle Chart data is updated on the day it is released; Billboard data is updated on Tuesdays (the day after it is released).
  • Melon Chart data is updated a few hours after it is released, around 5:00 server time.
  • Awards are meant to be updated a few hours after they are announced, but the automatic update sometimes fails and takes longer. Currently they are added manually.
  • All averages and calculations for graphics, trends and awards are done daily or in real time, with a few stats computed only monthly (on the first day of the month).
  • Some server-side page caches are reset at 0:00 server time, others at 0:00 PST.

Youtube Music Videos

TL;DR: We update views and likes every day for videos released in the last 3 years, and every other day for others. Videos that are offline are marked as dead and no longer updated, but we check if they came back online every month (automatically).

Every 15 minutes, our server automatically checks a number of videos using the Youtube API. We have allocated 23 hours for normal video updates, and one hour (the last of the PST day) to guarantee that videos near milestones, and videos released in the last 3 weeks, are precisely updated.
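
As a rough illustration of the schedule above, 23 hours of runs every 15 minutes gives 92 runs per day. The sketch below works out how many videos each run would need to handle if the day's queue were split evenly; the even split and the function names are assumptions for illustration, not the site's actual scheduler.

```python
import math

RUN_INTERVAL_MINUTES = 15  # from the text: the server checks videos every 15 minutes
NORMAL_UPDATE_HOURS = 23   # from the text: 23 hours are allocated to normal updates


def runs_per_day(hours: int = NORMAL_UPDATE_HOURS,
                 interval_minutes: int = RUN_INTERVAL_MINUTES) -> int:
    """Number of scheduled runs that fit in the normal update window."""
    return hours * 60 // interval_minutes


def videos_per_run(videos_due_today: int) -> int:
    """Smallest batch size that clears the day's queue (assumes an even split)."""
    return math.ceil(videos_due_today / runs_per_day())
```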

The data retrieved is global views/likes, and not just Korean views.

The data retrieved from the Youtube API are:

  • Views: If the video is within its first 3 weeks, a precise daily view count is gathered every day around 23:00 PST. Otherwise, daily views are calculated as the difference between the new views and the last recorded views, divided by the time elapsed between updates. While most videos are updated about 24 hours apart on average, server pace can cause the interval to vary, so it is important to normalize the data. For instance, if a video is updated 26 hours after its last update, the daily views are calculated as (New Views - Old Views) / (26 / 24), resulting in the correct 24h average views.
  • Likes: the same rules as above, if the video has likes enabled.
  • Region restrictions: Youtube can return either the countries in which the video is allowed or the countries in which it is blocked. Videos are kept regardless of region restrictions, with those restrictions displayed on the video details. The exception is videos blocked in the US, which are not kept because that is the main demographic (we might change that rule later).
  • The following data are also retrieved when the video is registered: the ID of the channel that posted the video, the time it was published on Youtube, and the video duration. All except the duration are saved in the database; the duration is used only to validate the minimum length requirement of 2 minutes.
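
The 24-hour normalization described for views and likes can be sketched as follows. The function name and the error handling are assumptions for illustration; only the formula comes from the text.

```python
def normalized_daily_views(new_views: int, old_views: int, hours_elapsed: float) -> float:
    """Scale the change in views to a 24-hour equivalent.

    Equivalent to (new_views - old_views) / (hours_elapsed / 24).
    """
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    return (new_views - old_views) * 24 / hours_elapsed


# A video that gained 260,000 views over 26 hours averages 240,000 per 24 hours.
example = normalized_daily_views(1_260_000, 1_000_000, 26)
```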

Music Videos have 6 months of daily statistics preserved (from the publish date to the 181st day). Holes in these stats can happen if the video is temporarily unavailable, if the server fails to retrieve data on that day, or for videos older than December 2020. In fact, some videos have daily data as far back as June 2020, but a severe server downtime of 5 days in December 2020 (and several weeks between October and December that year when the daily system was not active) prevents this period from being considered contiguous. Notice, however, that only videos from the last 3 years are updated daily, with older videos updated once every other day.

Videos that return a negative number of daily views or likes will have that daily stat stored as 1 instead of the negative number, which usually means the following days will be used to "catch up" on the deducted views or likes. This is done to differentiate it from an actual 0 views/likes, and to prevent negative data from being stored (this is a design decision, not a technical limitation). Negative daily views are common and happen to 5 to 10 videos every day on average (usually because Youtube deducts views it detected as coming from bots or as otherwise illegitimate). Negative likes are much rarer and happen very sporadically.
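
A minimal sketch of the storage rule above, with a hypothetical function name: a negative daily change is stored as 1, while a real 0 stays 0 so the two cases remain distinguishable.

```python
def stored_daily_delta(daily_delta: int) -> int:
    """Return the value to store for one day's change in views or likes."""
    # Negative changes (e.g. Youtube removing views) are recorded as 1, not 0,
    # so they can be told apart from days with genuinely no new views.
    return 1 if daily_delta < 0 else daily_delta
```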

Every time a video is updated, the server also checks whether a milestone has been passed (multiples of 100 million views) and stores that milestone and its date in the database.
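
The milestone check can be sketched as comparing the view count before and after an update. The function below is an illustrative assumption (including how it handles an update that crosses more than one milestone), not the site's code.

```python
MILESTONE_STEP = 100_000_000  # from the text: milestones are multiples of 100 million views


def milestones_crossed(old_views: int, new_views: int,
                       step: int = MILESTONE_STEP) -> list[int]:
    """List every milestone passed when views move from old_views to new_views."""
    first = old_views // step + 1   # first milestone strictly above the old count
    last = new_views // step        # last milestone at or below the new count
    return [n * step for n in range(first, last + 1)]
```

An empty list means no milestone was passed, which is also what a view deduction produces.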

Database Video types

TL;DR: The Main video is the Music Video with the most views (there can be exact duplicates or different Music Video versions). Alternate videos are videos like Dance practice, Performance, Live acoustics and so on. Audio-Videos are Music-only (most of them available on Youtube Music). When checking stats for a video, "Main views" covers all Main and Duplicate videos, while "Total" also includes every other type.

Videos can be a "main" video (meaning the main MV or only version of a song), a "duplicate" video (exact copy of the "main" video), or an "alternate" video, which is usually a different version/video type of the same song (dance versions, acoustic versions, etc). These are filled out by collaborators at time of registration but can be changed any time.

Upon registration, collaborators can also choose if a video is a Music Video or an Audio Video (usually B-sides). Audio Videos do not have an actual video but only audio, although some might have a simple visualization or lyrics showing.

A main music video is automatically switched with its duplicate if one of two conditions occurs: the 'main' video is removed from Youtube (it becomes a 'dead' video, which is kept only as historical/statistical data), or the 'duplicate' video garners more views than the 'main' video, thus becoming the new 'main'.
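
The two switch conditions can be sketched as a small decision function. The `Video` class and its field names are hypothetical, and the rule that a dead duplicate is never promoted is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class Video:
    video_id: str
    views: int
    is_dead: bool = False  # True once the video is no longer available on Youtube


def pick_main(main: Video, duplicate: Video) -> Video:
    """Return the video that should hold the 'main' role."""
    if duplicate.is_dead:
        # Assumption: an unavailable duplicate is never promoted.
        return main
    if main.is_dead or duplicate.views > main.views:
        return duplicate
    return main
```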

While Music Videos that are offline are kept on the database only marked as 'dead' and stop being updated, Audio Videos are automatically deleted from the database if they go offline. A video must fail two consecutive days to be marked 'dead' or deleted.

Once a month (on the 1st day of each month), all Music Videos marked as 'dead' are re-checked through the Youtube API, so that if a video was only temporarily private/blocked and is now back online, it will be "revived".

Promoted Views in Videos (Non-organic views)

TL;DR: Data comes from Youtube Charts; if a video didn't chart, we can't tell its exact Non-organic views (NOV). NO algorithm is used to calculate non-organic views, only to render the daily graphic, since Youtube doesn't release daily values. Only 6 weeks are monitored, with rare exceptions. Data starting January 2021 is trustworthy; before that we have some holes in the daily data, and no daily data exists prior to July 2020.

Daisuki was the first site to raise the issue of major promoted views on mainstream videos, back in 2019. Over the following years, Daisuki tweaked and improved not only the method to detect and calculate promoted views, but also added further methods and improvements in precision. You can read all about the methods and statistics used in this article (this method is no longer used to calculate the total NOV, but is used to render the daily graphics).

In March 2023, we decided to roll back and use Youtube Charts as the main source, since it took too much time and effort to explain why and how Statistics and Probability work. Our statistical model is only used to render the daily graphic; while it might show discrepancies on individual days, the total NOV for each week is correct, and only the distribution over the days might be off.

For every video with more views than the amount needed to chart on Youtube Charts (usually around 4M views), the following logic is applied:

1. If the video charts, we know the total views (since we monitor the video every day) and the real views reported in the Youtube Charts. The adviews are therefore Total Views - Real Views, which gives a precise estimate.

2. If the video does not chart, we only know the total views, not the real views. However, we do know that all views above the view count of the 100th place (the last video that charts) are adviews, or the video would have charted. This is the Minimum adviews. There might be more NOV, but we can't tell since the video didn't chart.

3. The Minimum estimate (or the precise estimate when a video charts) is the most realistic NOV we can calculate, and is as trustworthy as Youtube Charts.

Only the 6 weeks after release are monitored; it's not feasible to monitor all videos forever.
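
The two cases above can be sketched as a single function for one chart week. The names are hypothetical, and clamping the result at zero is an added assumption for videos whose totals fall below the reference value.

```python
from typing import Optional


def non_organic_views(total_views: int,
                      charted_views: Optional[int],
                      last_place_views: int) -> tuple[int, bool]:
    """Return (nov, is_precise) for one week.

    total_views      -- views gained that week, from daily monitoring
    charted_views    -- views reported by Youtube Charts, or None if the video did not chart
    last_place_views -- views of the 100th (last) charting video that week
    """
    if charted_views is not None:
        # Case 1: the video charted, so the estimate is precise.
        return max(total_views - charted_views, 0), True
    # Case 2: the video did not chart, so only a minimum can be given.
    return max(total_views - last_place_views, 0), False
```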

Below is a brief technical explanation of the Stat&Prob algorithm used to render the views (it is only used for the graphic display, not the estimates):

  1. An expected number of Views and Views per Like for the video is calculated based on the statistics and formulas explained in the article linked above, which use all videos for which we have daily views and likes data;
  2. The Views/Likes are calculated based on our statistical formula. On top of the VPL formula, an extra amount of views is allowed on the first day to compensate for the fact that we use a simple power law, which does not accurately describe the excess views of the first few days.
  3. Once the graphic is generated, a multiplier is calculated so that the NOV displayed, added up day by day, equals the total NOV from Youtube Charts (this guarantees that the total NOV displayed for the week is indeed the total calculated from Youtube Charts). Thus, even if there are slight daily deviations from reality, the total is correct.
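
Step 3 amounts to rescaling the modelled daily values so they sum to the chart-derived total. A minimal sketch, with hypothetical names and an assumed fallback when the model predicts no NOV at all:

```python
def rescale_to_total(daily_estimates: list[float], chart_total: float) -> list[float]:
    """Scale modelled daily NOV so the values add up to the chart-derived total."""
    modelled_total = sum(daily_estimates)
    if modelled_total == 0:
        # Assumption: with nothing to scale, the daily values are left at zero.
        return [0.0 for _ in daily_estimates]
    multiplier = chart_total / modelled_total
    return [value * multiplier for value in daily_estimates]
```

The shape of the daily curve is preserved; only its height changes, which is why the weekly total stays correct even when individual days deviate.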

Youtube Charts are released every Saturday. Therefore, we can only update Non-organic views on recent videos when Youtube Charts are released. Keep that in mind: The last week or so might still not have been accounted for.

The main algorithm accuracy (for the graphics) is calculated based on all videos that have Youtube Charts and is about 85% correct, so even though the daily graphics might have deviations, it is mostly correct.

Videos from 2021 onwards have trustworthy data. Videos released between October 2020 and December 2020 might not have NOV calculated because the server's daily statistics were not working. Only some videos released between June 2020 and October 2020 have data available.

CircleChart Physical Sales

TL;DR: Circle releases this chart on different days every month; we monitor it every day and update when it is released. This is the Circle Album Chart, not the Circle Retail Album Sales (introduced in 2021, works more like Hanteo). We also monitor and collate the Yearly totals released by Circle in January. For more on the differences between Circle and Hanteo, check our glossary.

CircleChart releases two sets of Physical Sales data. One is the original system, in which distribution minus returns has been reported since the old days of MIAK (the chart's name before 2010; it was then called Gaon until 2022). The other, introduced in 2021, is Retail Sales, which is more similar to the Hanteo charts, where direct retail sales are reported. However, CircleChart's network of stores is smaller and therefore less accurate than both Hanteo and their original method. Since historically only the Distribution data was released, that is the one used on the site. Because returns have to be compensated and the data goes through an internal audit, their charts are usually delayed: the Monthly charts are usually released over a week after the end of each month, while the yearly audit can take 2 weeks and is released in mid-January.

Circle doesn't have a centralized organization, which means artists do not have a unique ID, and each time an artist appears on any of their charts the name can be (and often is) spelled differently. We have complex algorithms in place to match every artist they mention to our unique ID. We have also been going through all data since 2010 to double-check that everything has been matched correctly; so far we have checked 2010, 2011, and 2023 onwards.

Physical Sales are stored in two ways: for each item (album), sales data are stored per month and per year. Then the total sales for the artist are also stored per month and per year.

The yearly sales are stored on top of the monthly sales because CircleChart applies corrections to their yearly charts based on returns. Also, since sales that do not enter the monthly top 100 are not detected, some of these sales might only show up in the yearly top 100. When you check item or artist data per month, you will see the monthly data (there is no way to apply the corrected yearly data to the monthly data). When checking the yearly sales, the data shown will be either the corrected yearly value, if the item was on the yearly top 100, or the sum of the monthly tables.
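
The choice between the corrected yearly value and the monthly sum can be sketched as below. The function and parameter names are hypothetical.

```python
from typing import Optional


def yearly_sales(monthly_sales: list[int], yearly_chart_value: Optional[int]) -> int:
    """Pick the yearly figure to display for one item.

    yearly_chart_value is the corrected value from the yearly top 100,
    or None if the item did not appear on it.
    """
    if yearly_chart_value is not None:
        return yearly_chart_value
    return sum(monthly_sales)
```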

It is important to consider that, with everything included, the totals are the minimum sales, since small sales are not seen on either the monthly or the yearly charts. Most artists have some tiny amount of sales for months or even years after release, but there is no way to record those unless CircleChart one day publishes total sales per artist.

If you are looking for complete sales data, always keep in mind there are two sources for Korean data: Circle (officially linked to the government and audited), and Hanteo (from a commercial organization and not audited). For more on the differences between Circle and Hanteo, check our glossary.

CircleChart Digital Streaming, Billboard Chart

TL;DR: Circle releases this chart on Wednesdays. This is the Circle Digital Chart (which includes downloads), not the Circle Streaming Chart, because we believe it's a better Digital benchmark than just streams. Billboard releases on Monday and is fetched on Tuesday; this is the top 200 Global chart.

CircleChart has historically released digital streaming data, but their method of calculating it has changed several times, and in some years they did not supply a number of streams (or scores), only the rank. Since for those years we only have the order of the chart, and since their score formula is unstable across the years, we calculate a score based on chart position. It is also extremely important to note that we use their Digital chart instead of their Streaming chart. The Digital score is more complete because it includes not only streaming, but also digital sales, ringtone sales and BGM sales (all digital goods). As with the Circle streaming scores, this has changed over the years, and for the same reason we still have to rely on one centralized score system of our own (see below).

As usual with CircleChart releases, this chart is delayed by a few days after the end of the week. It is usually released on the following Wednesday. The system automatically detects a new week and retrieves it, using the same method mentioned above for Physical Sales to match the artist name with the database name. The system also uses the song name to try to match it with one of the artist's MVs.

The weekly data is stored in two arrays: one stores the actual chart, with the original artist and song names and the resolved artist and MV IDs. The second stores artists by total score per week, using the following formula to calculate the score:

  • First place gets 200 points
  • Second place gets 150 points
  • Third place onwards gets 100 minus the position, thus 97 to 1 points.

Each artist has a total score based on the sum of all songs that charted.

Since late 2023, we have been going through all the weekly streaming data since 2010 to make sure all artist and MV links are correct. For the most part, everything from 2023 onwards has been verified as it was released.

For Billboard Chart, since it includes 200 entries, the score is:

  • First place gets 400 points
  • Second place gets 300 points
  • Third place onwards gets 200 minus the position, thus 197 to 1 points.
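
Both point tables follow the same pattern relative to chart size (first place gets twice the chart size, second place 1.5 times, the rest the chart size minus the position), so they can be sketched with one function. This generalization and the names used are an observation for illustration, not the site's code; summing per artist follows the rule stated for the Circle chart.

```python
from collections import defaultdict


def position_score(position: int, chart_size: int) -> int:
    """Points for a chart position (chart_size is 100 for Circle, 200 for Billboard)."""
    if not 1 <= position <= chart_size:
        raise ValueError("position is outside the chart")
    if position == 1:
        return 2 * chart_size
    if position == 2:
        return chart_size * 3 // 2
    return chart_size - position


def artist_totals(entries: list[tuple[str, int]], chart_size: int) -> dict[str, int]:
    """Sum the points of every charting entry per artist; entries are (artist, position)."""
    totals: dict[str, int] = defaultdict(int)
    for artist, position in entries:
        totals[artist] += position_score(position, chart_size)
    return dict(totals)
```

Taken literally, the formula gives the very last position (100 or 200) zero points.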

Contrary to the Circle Chart, only the Artist is resolved on Billboard because the naming can be different and would require a whole new matching system. Besides, we are only interested in when an artist charts, not each song, for Billboard.

Artist Trending System

TL;DR: This system measures which artists have the highest score (trend) in the last couple of weeks. It uses Circle Digital (weekly), Awards, Sales (monthly), Youtube Likes and Billboard Chart to rank them every day.

Data starting from the previous week's Monday is considered, with the following weights:

  • Circle Digital score (uses most recent week): 30%
  • Music Show Awards (all since previous week's Monday): 25%
  • Youtube Likes on Main videos: 20%
  • Billboard Global Chart score: 15%
  • Circle Physical Sales (uses most recent month): 10%

For Circle Digital, Circle Sales and Billboard Global Chart, the most recent data available is used. For Youtube video likes, all since the previous week's Monday are used (only for Main videos). For Awards, all awards since the previous week's Monday are used (thus, since last week's The Show).

Each metric is ranked in a percentile. The best of each rank will receive full score for that metric (for instance, #1 on Circle Digital gets the full 30% points for Digital). For likes, all likes on all videos are counted and the highest score will be for the Artist with most likes. For Awards, all awards are counted and full score is given for the artist with most awards (so if the Artist with most awards has 6 awards, that will be a full score).
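
A minimal sketch of how the weights could combine into one score. The weights come from the list above; the exact percentile definition (share of artists with an equal or lower value, and 0 for artists with no activity) and all names are assumptions for illustration.

```python
WEIGHTS = {
    "circle_digital": 0.30,
    "awards": 0.25,
    "likes": 0.20,
    "billboard": 0.15,
    "sales": 0.10,
}


def percentile_ranks(values: dict[str, float]) -> dict[str, float]:
    """Map each artist to the share of artists with an equal or lower value."""
    count = len(values)
    ranks = {}
    for artist, value in values.items():
        if value <= 0:
            ranks[artist] = 0.0  # assumption: no activity earns no points
        else:
            at_or_below = sum(1 for other in values.values() if other <= value)
            ranks[artist] = at_or_below / count
    return ranks


def trending_scores(metrics: dict[str, dict[str, float]]) -> dict[str, float]:
    """Weighted sum of per-metric percentile ranks; 1.0 is the maximum score."""
    scores: dict[str, float] = {}
    for metric, weight in WEIGHTS.items():
        for artist, rank in percentile_ranks(metrics.get(metric, {})).items():
            scores[artist] = scores.get(artist, 0.0) + weight * rank
    return scores
```

Under this definition the leader of each metric always has a percentile of 1.0 and so receives that metric's full weight.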

This score is calculated once a day.


Historic Monthly View Statistics

TL;DR: We don't have daily or monthly views for old videos, and for recent videos only 6 months, so we use a statistical model to distribute the views since release. This is only used on the main Statistics page to show Views per month since 2008.

On the Statistics page, you can see Monthly View statistics for all videos in the database. Those views are NOT simply all the views of each MV added to the month of its release, but rather a DISTRIBUTION of the views across 60 months, starting on the month of release.

The calculation of how to distribute the views follows the complete analysis of all videos since 2020, as displayed in the last section of the data analysis page. Since this distribution rarely changes, it is hard-coded and is only updated once every couple of years.

All videos except promotional ones are included in the graphic. A few exceptions that never followed the predicted falloff (like Gangnam Style and Boom Boom) have 12 months of hard-coded values and only then follow the same principle as normal videos. Unfortunately, as of now, Non-organic views are included.
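
Distributing a video's total views over 60 months can be sketched as a weighted split. The weights below are a made-up placeholder curve; the site's real hard-coded distribution is not given here, and the function name is hypothetical.

```python
MONTHS = 60  # from the text: views are spread across 60 months from release

# Placeholder falloff, NOT the site's real distribution: month m gets weight 1/(m+1).
PLACEHOLDER_WEIGHTS = [1 / (month + 1) for month in range(MONTHS)]


def distribute_views(total_views: int, weights: list[float]) -> list[float]:
    """Split total_views across months in proportion to the given weights."""
    weight_sum = sum(weights)
    return [total_views * weight / weight_sum for weight in weights]


monthly = distribute_views(1_000_000, PLACEHOLDER_WEIGHTS)
```

Because the weights are normalized, the monthly values always add back up to the video's total views, whatever curve is plugged in.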

Historic Top 30 videos

This was the first major statistic the site started compiling, back in 2016, but unfortunately a bug in the original counting voided a whole year of data. The current data goes back to May 2017.

On the first day of each month, the top 30 videos in each category are recorded (with totals) and stored in the database so that the table can be compiled. Because of the way the data is stored, changes in artist names or music names do NOT affect the data in this table, so these changes cause weird events where a song drops from the list (as if deleted from the internet) and re-enters with the new name. There is nothing we can do.

Also, since the site was small in scope up to late 2019, some artists had little representation in the database, and once users started filling in the void, their totals increased. These increases were not actual additional views, but simply more videos being registered in the database. The data can be considered accurate starting around 2020.

As with the historic monthly views, Non-organic views are included, so be aware of that when checking these tables, especially from 2019 onwards, when the TrueView practice in K-pop MVs became commonplace.

Time to Success

The "Time to Success" shown for artists is calculated as the time from the artist's debut date to the first occurrence of one of the following, whichever comes first:

  • Music Show Award
  • PAK
  • An item being sold 100.000 units
  • #1 song in CircleChart Weekly
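
The "whichever comes first" rule can be sketched as taking the earliest of the milestone dates. Returning the result in days and using None for milestones not yet reached are assumptions for illustration.

```python
from datetime import date
from typing import Optional


def time_to_success(debut: date, milestone_dates: list[Optional[date]]) -> Optional[int]:
    """Days from debut to the earliest reached milestone, or None if none was reached.

    milestone_dates holds the first date of each criterion (award, PAK,
    100.000 units sold, #1 weekly song), with None for those never achieved.
    """
    reached = [d for d in milestone_dates if d is not None]
    if not reached:
        return None
    return (min(reached) - debut).days
```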

"Time to Success" is not shared among the members of a group: each member has their own Time to Success based only on their solo activities.

Mainstream Level

TL;DR: The database has too many artists, so some of us prefer to filter for only the more mainstream ones. This section details how we calculate who is mainstream based on Streaming. While Circle only has data since 2010, artists with no releases/streams since 2010 are probably no longer mainstream.

Mainstream Level is calculated to allow users only interested in the Mainstream artists to have a lighter site content. Approximately 50% of content is hidden when the maximum filter is active.

The Mainstream level can range from 1 (highest) to 3 (lowest) and is calculated as follows:

  • Rank 1: Total Circle Score of 7000 or more.
  • Rank 2: Total Circle Score of 1000 or more.
  • Rank 3: Artists that do not fit any of the above.

After that, a few increases are performed based on sales:

  • Artists ranked level 2 but with over 150.000 units sold are upgraded to level 1
  • Artists ranked level 3 but with over 15.000 units sold are upgraded to level 2

Another upgrade pass is then performed based on total views (all MVs together):

  • Artists ranked level 2 but with over 200M views are upgraded to level 1
  • Artists ranked level 3 but with over 100M views are upgraded to level 2

Sub-units, soloists and collabs also have special upgrade rules:

  • Sub-units will have their level upgraded to the same as the main group if lower
  • Soloists have their level upgraded to the same as the main group if lower
  • Collabs will have their level upgraded to the same as the highest level member if the collab level was lower.
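
The passes above can be sketched in order: base level from the Circle score, then the sales pass, then the views pass, then inheritance from related artists. The names are hypothetical, and reading each pass as a single-step upgrade (a level 3 artist moves at most to level 2 within one pass) is an assumption.

```python
def base_level(circle_score: int) -> int:
    """Level from the total Circle score: 1 is the highest, 3 the lowest."""
    if circle_score >= 7000:
        return 1
    if circle_score >= 1000:
        return 2
    return 3


def upgrade(level: int, value: int, to_level_1: int, to_level_2: int) -> int:
    """Apply one upgrade pass; each pass moves an artist up by at most one level."""
    if level == 2 and value > to_level_1:
        return 1
    if level == 3 and value > to_level_2:
        return 2
    return level


def mainstream_level(circle_score: int, units_sold: int, total_views: int,
                     related_levels: tuple[int, ...] = ()) -> int:
    """related_levels: the main group's level (sub-units, soloists) or members' levels (collabs)."""
    level = base_level(circle_score)
    level = upgrade(level, units_sold, 150_000, 15_000)
    level = upgrade(level, total_views, 200_000_000, 100_000_000)
    for related in related_levels:
        level = min(level, related)  # a lower number means a higher level
    return level
```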

Artist Success (Stars) System

Artists have a Success Rating (Stars) assigned to them, from zero to 5 stars.

This system is kept rather simple to avoid controversy. Each star represents one of the following:

  • Sales up to 500.000
  • Streaming (score) up to 700
  • Awards up to 5
  • Organic Views up to 250.000.000
  • Yearly awards up to 7

The threshold values are based on a subjective cut-off for the top ~150 artists. For instance, ~150 artists have 500.000 sales or more, ~150 artists have a streaming score above 700, and so on. These values are subject to change. Each star can be partially filled, so an artist with 250.000 sales and a streaming score of 350 would have 0.5 stars for sales and 0.5 stars for streaming, which add up to 1 star.
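
The partial-star rule can be sketched as a capped ratio per category, summed over the five categories. The thresholds come from the list above; the dictionary keys and function name are hypothetical.

```python
STAR_THRESHOLDS = {
    "sales": 500_000,
    "streaming": 700,
    "awards": 5,
    "organic_views": 250_000_000,
    "yearly_awards": 7,
}


def star_rating(stats: dict[str, float]) -> float:
    """Sum of per-category fractions, each capped at one full star (maximum 5)."""
    return sum(
        min(stats.get(category, 0) / threshold, 1.0)
        for category, threshold in STAR_THRESHOLDS.items()
    )
```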

Server Costs, Ads and Donations

In April 2023, I added a Buy me a Coffee Donation page and a Paypal Donation page to try to recoup the server costs. Below are the figures for each year.

We currently use a high-level shared webhost and a support server (my home PC) that stays on 24/7, performs support actions every 6 hours, generates one backup a day, and monitors the main server's uptime.

Advertisements are activated when the Donation total is no longer enough to pay for the current month. Otherwise, to keep AdSense active, advertisements are displayed on a few pages (mostly pages that returning users don't use) just to generate statistics and keep the algorithm updated about site usage, so that when ads are needed it won't take several days to pick up. This "passive" income is in the order of $0.01 a day.

During the Donald Duck presidency in the USA and his illegal tariffs against Brazil (my country), ads will run for residents of the US. Brazil has already raised this matter with the WTO, and the US is likely to pay or default; either way, it is yet another diplomatic blunder for them. The revenue from these ads does not count towards server maintenance. Russia is also served ads for its incursion into Ukraine, and Israel for its crimes against humanity.

Year | Hosting/Domain Costs | Donations | Ads
2023 | $ 200 ($15/mo) | $ 204 | -
2024 | $ 144 ($18/mo) up to August | |
     | $ 139 ($39/mo = $32 host + $2 domain + $5 backup) September onwards | |
     | TOTAL: $ 283 | $ 283 | -
2025 | $ 78 ($39/mo = $32 host + $2 domain + $5 fee) up to end of February | |
     | $ 108 ($18/mo = $9 host + $4 domain + $5 fee) up to mid September | |
     | $ 133 ($33/mo = $25 host + $4 domain + $5 fee) mid September to end of year | |
     | TOTAL: $ 319 | $ 316 | $ 4
2026 | $ 396 ($33/mo = $25 host + $3 domain + $5 fee) | |
     | TOTAL: $ 396 | $ 95 |

Note: 2025 is cheaper because it was our first year with the new host, and the promotional price is less than half the normal price. From 2026 onwards it should be $33/mo, which totals $ 396 a year.
