Comments on https://lichess.org/@/saychessclassical/blog/are-chess-improvers-causing-a-lichess-tactic-rating-deflation/fkspU3LA
<Comment deleted by user>
For better visualization you can use the alpha parameter in the plotting function :)
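To illustrate the suggestion: when tens of thousands of puzzle-rating points are plotted on top of each other, a low alpha (transparency) value makes the density visible. This is a minimal sketch with synthetic data, assuming a matplotlib scatter plot like the one in the blog post.

```python
# Minimal sketch: alpha reveals point density in an overplotted scatter.
# The data here is synthetic, purely for illustration.
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.normal(1500, 300, 20_000)  # synthetic puzzle ratings
days = rng.uniform(0, 365, 20_000)       # synthetic day-of-year values

plt.scatter(days, ratings, s=4, alpha=0.05)  # alpha=1 would be an unreadable blob
plt.xlabel("day")
plt.ylabel("puzzle rating")
plt.savefig("ratings.png")
```

With `alpha=0.05`, a point only appears solid where roughly twenty points overlap, so clusters stand out instead of saturating.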
Timestamps would be nice in the puzzle database, and possibly some past history of lichess posting it. The live database has more data in it than the inert one (such as theme votes, a personal interest of mine).
Would there be any way to attribute, among the multi-themed feedback on any puzzle, which theme drives the difficulty as estimated by the rating? Yes, thematic entry is easier, but not by much at higher levels. The entry themes might not be the themes that make the rating.
I usually use theme entry after batches of random mix, based on the weakness dashboard (it would be nice to have sample weights there, not only performance ratings, upon hover, to know whether a performance rating comes from a small sample; note to lichess).
So I think the theme-entry hypothesis might have complicating factors to it. But it is interesting to ask such questions of the datasets.
There should be a rating-change drop because of the pool of players at each rating. Fewer players are rated 2000 than 1500. At 1500 it is not as obvious whether players are failing to progress or are over-inflated, because they might simply have a recent rating. It would have been odd to see the red line horizontal or rising.
Assuming ... only a small handful are doing the higher-rated puzzles compared to the lower-rated ones. The lower-rated puzzles are probably less stable in rating than the higher-rated puzzles.
Is there any way to objectively evaluate the rating of a particular puzzle? Not manually, of course, but with the help of some program. I would like to know exactly how difficult the puzzle I have solved (or failed to solve) really is.
@Persona_L said in #6:
> Is there any way to objectively evaluate the rating of a particular puzzle? Not manually, of course, but with the help of some program. I would like to know exactly how difficult the puzzle I have solved (or failed to solve) really is.

What is objective? I think the question is really about intrinsic versus extrinsic measures. Chess has no measure of difficulty based on chess itself. There are only indirect measures, such as Elo (for engines, usually over unspecified pools; those pool characteristics don't even come in the fine print alongside the Elo "king of the mountain", which might be a very narrow skyscraper), or pools of puzzles acting as players against the human players. What is objectivity, if not multiple subjectivities combined?
There is a tendency to think that automated, programmed subjectivity amounts to objectivity. It might have high precision (and even then, watch out for the fine print, also not shared much, such as programmed random behavior on flat evaluation landscapes, which is why a lot of puzzles are filtered out not just at the initial challenge position but at the depth of the accepted solution).
Since puzzles are not only about mate outcomes, using the engine programmers' subjectivity, or the Elo of engine pools that all share similar programmed subjectivity (we will never know how diverse they are, because nobody asks them, or makes them compete to be diverse), is like asking a population of humans to adapt to the uncharacterized subjectivity of a few engine designers (far fewer than there are players). I would not blindly trust a machine that is still not self-programmable, not self-critical, and blind to how representative its sandbox is of the wilderness of chess. I would trust certain things, but the fine print would have to come with them, in chess-land terms (with some math or stats about it).
So for now I think the human-versus-puzzle pool (machine selection of the challenge, plus the unique solution segment that grants success or failure) is the most statistically objective measure we have.
Someone could build a more chess-realistic measure of difficulty, but it would have to be multidimensional, as chess is. The closest thing I find going in that direction is the excellent thematic decomposition that lichess has seeded for us to keep developing.
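To make the "human-versus-puzzle pool" idea concrete: Lichess actually rates puzzles with Glicko-2 against the solver pool, but the mechanism can be illustrated with a simplified Elo-style update, where the puzzle is treated as a player that "wins" when the human fails it. This is a hedged sketch, not the actual Lichess implementation; the K-factor and starting rating are arbitrary.

```python
# Simplified Elo-style illustration of how a puzzle's rating emerges
# from results against rated human solvers (Lichess really uses Glicko-2).
def expected_score(puzzle_r: float, player_r: float) -> float:
    """Probability the puzzle 'wins', i.e. the player fails it."""
    return 1.0 / (1.0 + 10 ** ((player_r - puzzle_r) / 400))

def update(puzzle_r: float, player_r: float, player_failed: bool, k: float = 32) -> float:
    """Move the puzzle rating toward the observed result."""
    actual = 1.0 if player_failed else 0.0
    return puzzle_r + k * (actual - expected_score(puzzle_r, player_r))

# A puzzle starts at 1500 and meets three solvers.
r = 1500.0
for player_r, failed in [(1600, True), (1400, False), (1800, True)]:
    r = update(r, player_r, failed)
# r has drifted upward: the puzzle beat stronger players than expected.
```

The point of the sketch is the commenter's argument in miniature: the number is "objective" only relative to the pool of humans who happened to attempt the puzzle.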
Is Lichess about chess or an orgy of data science? Can you cut away all the amateur data scientist crap, please!
It's just that sometimes the rating of the puzzle of the day changes a lot. But that doesn't make it any easier or harder, does it? And as a result, at the end of the day I don't know at which rating I solved the puzzle: the morning rating, before everyone solved it, or the evening rating, after everyone had already solved it. It's confusing.
Thanks for the answer! @dboing
@dboing said in #4:
> Timestamps would be nice in the puzzle database, and possibly some past history of lichess posting it. The live database has more data in it than the inert one (such as theme votes, a personal interest of mine).
>
> Would there be any way to attribute, among the multi-themed feedback on any puzzle, which theme drives the difficulty as estimated by the rating? Yes, thematic entry is easier, but not by much at higher levels. The entry themes might not be the themes that make the rating.
>
> I usually use theme entry after batches of random mix, based on the weakness dashboard (it would be nice to have sample weights there, not only performance ratings, upon hover, to know whether a performance rating comes from a small sample; note to lichess).
>
> So I think the theme-entry hypothesis might have complicating factors to it. But it is interesting to ask such questions of the datasets.

One way would be to calculate the success rate based on the entry point (mix or themed). Then this score would show how big the hint is.
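The proposed measurement can be sketched in a few lines: group solve attempts by how the solver reached the puzzle (random "mix" versus a themed queue) and compare success rates. The record fields and the tiny dataset below are invented for illustration; the real puzzle database does not record the entry point, which is exactly what the comment is asking for.

```python
# Hypothetical sketch: success rate per entry point (mix vs themed).
# The "entry" field is an assumption; Lichess does not currently export it.
from collections import defaultdict

attempts = [
    {"entry": "mix", "solved": True},
    {"entry": "mix", "solved": False},
    {"entry": "themed", "solved": True},
    {"entry": "themed", "solved": True},
]

totals = defaultdict(lambda: [0, 0])  # entry -> [solved count, attempt count]
for a in attempts:
    totals[a["entry"]][0] += a["solved"]
    totals[a["entry"]][1] += 1

rates = {entry: solved / n for entry, (solved, n) in totals.items()}
# The gap between the two rates estimates how big the thematic hint is.
hint_size = rates["themed"] - rates["mix"]
```

On real data one would also want to control for puzzle rating and solver rating, since themed queues may simply serve different difficulty mixes.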